Tag Archives: AWS CloudFormation

How to create and retrieve secrets managed in AWS Secrets Manager using AWS CloudFormation template

Post Syndicated from Apurv Awasthi original https://aws.amazon.com/blogs/security/how-to-create-and-retrieve-secrets-managed-in-aws-secrets-manager-using-aws-cloudformation-template/

AWS Secrets Manager now integrates with AWS CloudFormation so you can create and retrieve secrets securely using CloudFormation. This integration makes it easier to automate provisioning your AWS infrastructure. For example, without any code changes, you can generate unique secrets for your resources with every execution of your CloudFormation template. This also improves the security of your infrastructure by storing secrets securely, encrypting them automatically, and making rotation easier to enable.

Secrets Manager helps you protect the secrets needed to access your applications, services, and IT resources. In this post, I show how you can get the benefits of Secrets Manager for resources provisioned through CloudFormation. First, I describe the new Secrets Manager resource types supported in CloudFormation. Next, I show a sample CloudFormation template that launches a MySQL database on Amazon Relational Database Service (RDS). This template uses the new resource types to create, rotate, and retrieve the credentials (user name and password) of the database superuser required to launch the MySQL database.

Why use Secrets Manager with CloudFormation?

CloudFormation helps you model your AWS resources as templates and execute these templates to provision AWS resources at scale. Some AWS resources require secrets as part of the provisioning process. For example, to provision a MySQL database, you must provide the credentials for the database superuser. You can use Secrets Manager, the AWS dedicated secrets management service, to create and manage such secrets.

Secrets Manager makes it easier to rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle. You can now reference Secrets Manager in your CloudFormation templates to create unique secrets with every invocation of your template. By default, Secrets Manager encrypts these secrets with encryption keys that you own and control. Secrets Manager ensures the secret isn’t logged or persisted by CloudFormation by using a dynamic reference to the secret. You can configure Secrets Manager to rotate your secrets automatically without disrupting your applications. Secrets Manager offers built-in integrations for rotating credentials for all Amazon RDS databases and supports extensibility with AWS Lambda so you can meet your custom rotation requirements.

New Secrets Manager resource types supported in CloudFormation

  1. AWS::SecretsManager::Secret — Create a secret and store it in Secrets Manager.
  2. AWS::SecretsManager::ResourcePolicy — Create a resource-based policy and attach it to a secret. Resource-based policies enable you to control access to secrets (a minimal example follows this list).
  3. AWS::SecretsManager::SecretTargetAttachment — Update the secret with the connection details of the target service, such as an RDS database. This link is required before rotation can be enabled.
  4. AWS::SecretsManager::RotationSchedule — Configure Secrets Manager to rotate the secret automatically on a schedule, using the Lambda rotation function you specify.
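
For example, a resource-based policy could be attached to the secret that the sample template below creates, to deny accidental deletion. This snippet isn't part of the sample walkthrough; it's a minimal illustrative sketch of the resource type:

MySecretResourcePolicy:
  Type: AWS::SecretsManager::ResourcePolicy
  Properties:
    SecretId: !Ref MyRDSInstanceRotationSecret
    ResourcePolicy:
      Version: '2012-10-17'
      Statement:
        - Effect: Deny
          Principal: '*'
          Action: secretsmanager:DeleteSecret
          Resource: '*'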

How to use Secrets Manager in CloudFormation

Now that you’re familiar with the new Secrets Manager resource types supported in CloudFormation, I’ll show how you can use these in a CloudFormation template. I will use a sample template that creates a MySQL database in Amazon RDS and uses Secrets Manager to create the credentials for the superuser. The template also configures the secret to rotate every 30 days automatically.

  1. Create a stack on the AWS CloudFormation console by copying the following sample template.
    
    ---
    Description: "How to create and retrieve secrets securely using an AWS CloudFormation template"
    Resources:
    
    # Create a secret with the username admin and a randomly generated password in JSON.  
      MyRDSInstanceRotationSecret:
        Type: AWS::SecretsManager::Secret
        Properties:
          Description: 'This is the secret for my RDS instance'
          GenerateSecretString:
            SecretStringTemplate: '{"username": "admin"}'
            GenerateStringKey: 'password'
            PasswordLength: 16
            ExcludeCharacters: '"@/'
    
    
    
    # Create a MySQL database of size t2.micro.
    # The secret (username and password for the superuser) will be dynamically 
    # referenced. This ensures CloudFormation will not log or persist the resolved 
    # value. 
      MyDBInstance:
        Type: AWS::RDS::DBInstance
        Properties:
          AllocatedStorage: 20
          DBInstanceClass: db.t2.micro
          Engine: mysql
          MasterUsername: !Join ['', ['{{resolve:secretsmanager:', !Ref MyRDSInstanceRotationSecret, ':SecretString:username}}' ]]
          MasterUserPassword: !Join ['', ['{{resolve:secretsmanager:', !Ref MyRDSInstanceRotationSecret, ':SecretString:password}}' ]]
          BackupRetentionPeriod: 0
          DBInstanceIdentifier: 'rotation-instance'
    
    
    
    # Update the referenced secret with properties of the RDS database.
    # This is required to enable rotation. To learn more, visit our documentation
    # https://docs.aws.amazon.com/secretsmanager/latest/userguide/rotating-secrets.html
      SecretRDSInstanceAttachment:
        Type: AWS::SecretsManager::SecretTargetAttachment
        Properties:
          SecretId: !Ref MyRDSInstanceRotationSecret
          TargetId: !Ref MyDBInstance
          TargetType: AWS::RDS::DBInstance
    
    
    
    # Schedule rotating the secret every 30 days. 
    # Note, the first rotation is triggered immediately. 
    # This enables you to verify that rotation is configured appropriately.
    # Subsequent rotations are scheduled according to the configured rotation. 
      MySecretRotationSchedule:
        Type: AWS::SecretsManager::RotationSchedule
        DependsOn: SecretRDSInstanceAttachment
        Properties:
          SecretId: !Ref MyRDSInstanceRotationSecret
          RotationLambdaARN: <% replace-with-lambda-arn %>
          RotationRules:
            AutomaticallyAfterDays: 30
     
    

  2. Next, execute the stack.
     
    Figure 1: Execute the stack

  3. After you execute the stack, open the RDS console to verify the database, rotation-instance, has been successfully created.
     
    Figure 2: Verify the database has been created

  4. Open the Secrets Manager console and verify the stack successfully created the secret, MyRDSInstanceRotationSecret. You can also check this from the AWS CLI, as shown after this list.
     
    Figure 3: Verify the stack successfully created the secret
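
To make the same check from the AWS CLI, pass the secret's ARN (or its generated name), because CloudFormation generates the physical secret name for you. This is a sketch; substitute the value from your own stack:

aws secretsmanager describe-secret --secret-id <secret-arn>

aws secretsmanager get-secret-value --secret-id <secret-arn> --query SecretString --output text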

Summary

I showed you how to create and retrieve secrets in CloudFormation. This improves the security of your infrastructure and makes it easier to automate infrastructure provisioning. To get started managing secrets, open the Secrets Manager console. To learn more, read How to Store, Distribute, and Rotate Credentials Securely with Secrets Manager or refer to the Secrets Manager documentation.

If you have comments about this post, submit them in the Comments section below. If you have questions about anything in this post, start a new thread on the Secrets Manager forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Apurv Awasthi

Apurv is the product manager for credentials management services at AWS, including AWS Secrets Manager and IAM Roles. He enjoys the “Day 1” culture at Amazon because it aligns with his experience building startups in the sports and recruiting industries. Outside of work, Apurv enjoys hiking. He holds an MBA from UCLA and an MS in computer science from the University of Kentucky.

ICYMI: Serverless Q3 2018

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/icymi-serverless-q3-2018/

Welcome to the third edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

If you didn’t see them, catch our Q1 ICYMI and Q2 ICYMI posts for what happened then.

So, what might you have missed this past quarter? Here’s the recap.

AWS Amplify CLI

In August, AWS Amplify launched the AWS Amplify Command Line Interface (CLI) toolchain for developers.

The AWS Amplify CLI enables developers to build, test, and deploy full web and mobile applications based on AWS Amplify directly from their CLI. It has built-in helpers for configuring AWS services such as Amazon Cognito for auth, Amazon S3 and Amazon DynamoDB for storage, and Amazon API Gateway for APIs. With these helpers, developers can configure AWS services to interact with applications built in popular web frameworks such as React.

Get started with the AWS Amplify CLI toolchain.

New features

Rejoice Microsoft application developers: AWS Lambda now supports .NET Core 2.1 and PowerShell Core!

AWS SAM had a few major enhancements to help in both testing and debugging functions. The team launched support to locally emulate an endpoint for Lambda so that you can run automated tests against your functions. This differs from the existing functionality that emulated a proxy similar to API Gateway in front of your function. Combined with the new improved support for ‘sam local generate-event’ to generate over 50 different payloads, you can now test Lambda function code that would be invoked by almost all of the various services that interface with Lambda today. On the operational front, AWS SAM can now fetch, tail, and filter logs generated by your functions running live on AWS. Finally, with integration with Delve, a debugger for the Go programming language, you can more easily debug your applications locally.
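
As a rough sketch of how these pieces fit together on the command line (the function and stack names are placeholders, and exact flags may differ by SAM CLI version):

# Emulate a local Lambda endpoint so automated tests can invoke functions
sam local start-lambda

# Generate a sample event payload (for example, an S3 PUT event) and invoke a function with it
sam local generate-event s3 put > event.json
sam local invoke MyFunction --event event.json

# Fetch and tail the logs of a deployed function
sam logs --name MyFunction --stack-name my-stack --tail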

If you’re part of an organization that uses AWS Service Catalog, you can now launch applications based on AWS SAM, too.

The AWS Serverless Application Repository launched new search improvements to make it even faster to find serverless applications that you can deploy.

In July, AWS AppSync added HTTP resolvers so that now you can query your REST APIs via GraphQL! API Inception! AWS AppSync also added new built-in scalar types to help with data validation at the GraphQL layer instead of having to do this in code that you write yourself. For building your GraphQL-based applications on AWS AppSync, an enhanced no-code GraphQL API builder enables you to model your data, and the service generates your GraphQL schema, Amazon DynamoDB tables, and resolvers for your backend. The team also published a Quick Start for using Amazon Aurora as a data source via a Lambda function. Finally, the service is now available in the Asia Pacific (Seoul) Region.

Amazon API Gateway announced support for AWS X-Ray!

With X-Ray integrated in API Gateway, you can trace and profile application workflows starting at the API layer and going through the backend. You can control the sample rates at a granular level.

API Gateway also announced improvements to usage plans that allow for method level throttling, request/response parameter and status overrides, and higher limits for the number of APIs per account for regional, private, and edge APIs. Finally, the team added support for the OpenAPI 3.0 API specification, the next generation of OpenAPI 2, formerly known as Swagger.

AWS Step Functions is now available in the Asia Pacific (Mumbai) Region. You can also build workflows visually with Step Functions and trigger them directly with AWS IoT Rules.

AWS Lambda@Edge now makes the HTTP request body for POST and PUT requests available.

AWS CloudFormation announced Macros, a feature that enables customers to extend the functionality of AWS CloudFormation templates by calling out to transformations that Lambda powers. Macros are the same technology that enables SAM to exist.

Serverless posts

July:

August:

September:

Tech Talks

We hold several Serverless tech talks throughout the year, so look out for them in the Serverless section of the AWS Online Tech Talks page. Here are the three tech talks that we delivered in Q3:

Twitch

We’ve been busy streaming deeply technical content to you the past few months! Check out awesome sessions like this one by AWS’s Heitor Lessa and Jason Barto diving deep into Continuous Learning for ML and the entire “Build on Serverless” playlist.

For information about upcoming broadcasts and recent live streams, keep an eye on AWS on Twitch for more Serverless videos and on the Join us on Twitch AWS page.

For AWS Partners

In September, we announced the AWS Serverless Navigate program for AWS APN Partners. Via this program, APN Partners can gain a deeper understanding of the AWS Serverless Platform, including many of the services mentioned in this post. The program’s phases help partners learn best practices such as the Well-Architected Framework, business and technical concepts, and growing their business’s ability to better support AWS customers in their serverless projects.

Check out more at AWS Serverless Navigate.

In other news

AWS re:Invent 2018 is coming in just a few weeks! For November 26–30 in Las Vegas, Nevada, join tens of thousands of AWS customers to learn, share ideas, and see exciting keynote announcements. The agenda for Serverless talks contains over 100 sessions where you can hear about serverless applications and technologies from fellow AWS customers, AWS product teams, solutions architects, evangelists, and more.

Register for AWS re:Invent now!

Want to get a sneak peek into what you can expect at re:Invent this year? Check out the awesome re:Invent Guides put out by AWS Community Heroes. AWS Community Hero Eric Hammond (@esh on Twitter) published one for advanced serverless attendees that you will want to read before the big event.

What did we do at AWS re:Invent 2017? Check out our recap: Serverless @ re:Invent 2017.

Still looking for more?

The Serverless landing page has lots of information. The resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. Check it out!

Extending AWS CloudFormation with AWS Lambda Powered Macros

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/cloudformation-macros/

Today I’m really excited to show you a powerful new feature of AWS CloudFormation called Macros. CloudFormation Macros allow developers to extend the native syntax of CloudFormation templates by calling out to AWS Lambda powered transformations. This is the same technology that powers the popular Serverless Application Model functionality, but the transforms run in your own accounts, on your own Lambda functions, and they’re completely customizable. CloudFormation, if you’re new to AWS, is an absolutely essential tool for modeling and defining your infrastructure as code (YAML or JSON). It is a core building block for all of AWS, and many of our services depend on it.

There are two major steps for using macros. First, we need to define a macro, which, of course, we do with a CloudFormation template. Second, to use the created macro in our template, we need to add it as a transform for the entire template or call it directly. Throughout this post, I use the terms macro and transform somewhat interchangeably. Ready to see how this works?

Creating a CloudFormation Macro

Creating a macro has two components: a definition and an implementation. To create the definition of a macro, we create a CloudFormation resource of type AWS::CloudFormation::Macro that specifies which Lambda function to use and what the macro should be called.

Type: "AWS::CloudFormation::Macro"
Properties:
  Description: String
  FunctionName: String
  LogGroupName: String
  LogRoleARN: String
  Name: String

The Name of the macro must be unique throughout the region, and the Lambda function referenced by FunctionName must be in the same region the macro is being created in. When you execute the macro template, it makes that macro available for other templates to use. The implementation of the macro is fulfilled by a Lambda function. Macros can be in their own templates or grouped with others, but you won’t be able to use a macro in the same template you’re registering it in. The Lambda function receives a JSON payload that looks something like this:

{
    "region": "us-east-1",
    "accountId": "$ACCOUNT_ID",
    "fragment": { ... },
    "transformId": "$TRANSFORM_ID",
    "params": { ... },
    "requestId": "$REQUEST_ID",
    "templateParameterValues": { ... }
}

The fragment portion of the payload contains either the entire template or the relevant fragments of the template – depending on how the transform is invoked from the calling template. The fragment will always be in JSON, even if the template is in YAML.

The Lambda function is expected to return a simple JSON response:

{
    "requestId": "$REQUEST_ID",
    "status": "success",
    "fragment": { ... }
}

The requestId needs to be the same as the one received in the input payload, and if status contains any value other than success (case-insensitive), the changeset will fail to create. The fragment must contain valid CloudFormation JSON for the transformed template. Even if your function performs no action, it still needs to return the fragment so that it’s included in the final template.

Using CloudFormation Macros


To use a macro, we simply call out to Fn::Transform with the required parameters. If we want a macro to process the whole template, we can include it in the template’s list of transforms the same way we would with SAM: Transform: [Echo]. When we execute this template, the transforms are collected into a changeset by calling out to each macro’s specified function and assembling the final template.

Let’s imagine we have a dummy Lambda function called EchoFunction that just logs the data passed to it and returns the fragment unchanged. We define the macro as a normal CloudFormation resource, like this:

EchoMacro:
  Type: "AWS::CloudFormation::Macro"
  Properties:
    FunctionName: arn:aws:lambda:us-east-1:1234567:function:EchoFunction
    Name: EchoMacro

The code for the lambda function could be as simple as this:

def lambda_handler(event, context):
    print(event)
    return {
        "requestId": event['requestId'],
        "status": "success",
        "fragment": event["fragment"]
    }

Then, after deploying this function and executing the macro template, we can invoke the macro in a transform at the top level of any other template like this:

AWSTemplateFormatVersion: 2010-09-09
Transform: [EchoMacro, AWS::Serverless-2016-10-31]
Resources:
  FancyTable:
    Type: AWS::Serverless::SimpleTable

The CloudFormation service creates a changeset for the template by first calling the Echo macro we defined and then the AWS::Serverless transform. Macros are executed in the order they appear in the Transform list.

We could also invoke the macro using the Fn::Transform intrinsic function which allows us to pass in additional parameters. For example:

AWSTemplateFormatVersion: 2010-09-09
Resources:
  MyS3Bucket:
    Type: 'AWS::S3::Bucket'
    Fn::Transform:
      Name: EchoMacro
      Parameters:
        Key: Value

The inline transform will have access to all of its sibling nodes and all of its children nodes. Transforms are processed from deepest to shallowest which means top-level transforms are executed last. Since I know most of you are going to ask: no you cannot include macros within macros – but nice try.
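
To make fragment manipulation concrete, here is a sketch (not one of the launch examples) of a handler that, when used as a top-level transform, adds a DeletionPolicy to every resource that doesn’t already declare one:

def lambda_handler(event, context):
    fragment = event['fragment']

    # As a top-level transform, the fragment is the entire template, so walk the
    # Resources section and add a DeletionPolicy wherever one is missing.
    for resource in fragment.get('Resources', {}).values():
        resource.setdefault('DeletionPolicy', 'Retain')

    return {
        "requestId": event['requestId'],
        "status": "success",
        "fragment": fragment
    }

The same pattern applies to inline transforms, except that the fragment passed in is only the section being transformed rather than the whole template.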

When you go to execute the CloudFormation template it would simply ask you to create a changeset and you could preview the output before deploying.
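
From the AWS CLI, that flow might look like the following sketch; the stack, changeset, and file names are placeholders, and the CAPABILITY_AUTO_EXPAND capability acknowledges that macros will transform the template:

aws cloudformation create-change-set \
--stack-name echo-demo \
--change-set-name preview \
--change-set-type CREATE \
--template-body file://template.yaml \
--capabilities CAPABILITY_AUTO_EXPAND

aws cloudformation describe-change-set --stack-name echo-demo --change-set-name preview
aws cloudformation execute-change-set --stack-name echo-demo --change-set-name preview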

Example Macros

We’re launching a number of reference macros to help developers get started and I expect many people will publish others. These four are the winners from a little internal hackathon we had prior to releasing this feature:

  • PyPlate – Allows you to inline Python in your templates (Jay McConnel – Partner SA)
  • ShortHand – Defines a short-hand syntax for common CloudFormation resources (Steve Engledow – Solutions Builder)
  • StackMetrics – Adds CloudWatch metrics to stacks (Steve Engledow and Jason Gregson – Global SA)
  • String Functions – Adds common string functions to your templates (Jay McConnel – Partner SA)

Here are a few ideas I thought of that might be fun for someone to implement:

If you end up building something cool I’m more than happy to tweet it out!

Available Now

CloudFormation Macros are available today in all AWS Regions that have AWS Lambda. There is no additional CloudFormation charge for macros, meaning you are billed only the normal AWS Lambda function charges. The documentation has more information that may be helpful.

This is one of my favorite new features for CloudFormation and I’m excited to see some of the amazing things our customers will build with it. The real power here is that you can extend your existing infrastructure as code with code. The possibilities enabled by this new functionality are virtually unlimited.

Randall

Protecting your API using Amazon API Gateway and AWS WAF — Part 2

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/protecting-your-api-using-amazon-api-gateway-and-aws-waf-part-2/

This post courtesy of Heitor Lessa, AWS Specialist Solutions Architect – Serverless

In Part 1 of this blog, we described how to protect your API provided by Amazon API Gateway using AWS WAF. In this post, we show how to use API keys between an Amazon CloudFront distribution and API Gateway to secure access to your API, in addition to your preferred authorization (AuthZ) mechanism already set up in API Gateway. For more information about AuthZ mechanisms in API Gateway, see Secure API Access with Amazon Cognito Federated Identities, Amazon Cognito User Pools, and Amazon API Gateway.

We also extend the AWS CloudFormation stack previously used to automate the creation of the following necessary resources of this solution:

The following are alternative solutions to using an API key, depending on your security requirements:

  • Using a randomly generated HTTP secret header in CloudFront and verifying it with API Gateway request validation
  • Signing incoming requests with Lambda@Edge and verifying them with API Gateway Lambda authorizers

Requirements

To follow along, you need full permissions to create, update, and delete API Gateway, CloudFront, Lambda, and CloudWatch Events through AWS CloudFormation.

Extending the existing AWS CloudFormation stack

First, click here to download the full template. Then follow these steps to update the existing AWS CloudFormation stack:

  1. Go to the AWS Management Console and open the AWS CloudFormation console.
  2. Select the stack that you created in Part 1, right-click it, and select Update Stack.
  3. For option 2, choose Choose file and select the template that you downloaded.
  4. Fill in the required parameters as shown in the following image.

Here’s more information about these parameters:

  • API Gateway to send traffic to – We use the same API Gateway URL as in Part 1 except without the URL scheme (https://): cxm45444t9a.execute-api.us-east-2.amazonaws.com/prod
  • Rotating API Keys – We define Daily and use 2018-04-03 as the timestamp value to append to the API key name

Continue with the AWS CloudFormation console to complete the operation. It might take a couple of minutes to update the stack because CloudFront takes time to propagate changes across all of its points of presence.

Enabling API Keys in the example Pet Store API

While the stack completes in the background, let’s enable the use of API Keys in the API that CloudFront will send traffic to.

  1. Go to the AWS Management Console and open the API Gateway console.
  2. Select the API that you created in Part 1 and choose Resources.
  3. Under /pets, choose GET and then choose Method Request.
  4. For API Key Required, choose the dropdown menu and choose true.
  5. To save this change, select the highlighted check mark as shown in the following image.

Next, we need to deploy these changes so that requests sent to /pets fail if an API key isn’t present.

  1. Choose Actions and select Deploy API.
  2. Choose the Deployment stage dropdown menu and select the stage you created in Part 1.
  3. Add a deployment description such as “Requires API Keys under /pets” and choose Deploy.

When the deployment succeeds, you’re redirected to the API Gateway Stage page. There you can use the Invoke URL to test if the following request fails due to not having an API key.
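
For example, calling the Invoke URL with curl (the API ID below is a placeholder) should now return a 403 because no x-api-key header is present:

curl -i https://{api-id}.execute-api.us-east-2.amazonaws.com/prod/pets

HTTP/1.1 403 Forbidden
{"message":"Forbidden"}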

This failure is expected and proves that our deployed changes are working. Next, let’s try to access the same API but this time through our CloudFront distribution.

  1. From the AWS Management Console, open the AWS Cloudformation console.
  2. Select the stack that you created in Part 1 and choose Outputs at the bottom left.
  3. On the CFDistribution line, copy the URL. Before you paste it into a new browser tab or window, append ‘/pets’ to it.

As opposed to our first attempt without an API key, we receive a JSON response from the PetStore API. This is because CloudFront is injecting an API key before it forwards the request to the PetStore API. The following image demonstrates both of these tests:

  1. Successful request when accessing the API through CloudFront
  2. Unsuccessful request when accessing the API directly through its Invoke URL

This works as a secret between CloudFront and API Gateway, which could be any agreed random secret that can be rotated like an API key. However, it’s important to know that the API key is a feature to track or meter API consumers’ usage. It’s not a secure authorization mechanism and therefore should be used only in conjunction with an API Gateway authorizer.

Rotating API keys

API keys are automatically rotated based on the schedule (e.g., daily or monthly) that you chose when updating the AWS CloudFormation stack. This requires no maintenance or intervention on your part. In this section, we explain how this process works under the hood and what you can do if you want to manually trigger an API key rotation.

The AWS CloudFormation template that we downloaded and used to update our stack does the following, in addition to what was deployed in Part 1.

Introduce a Timestamp parameter that is appended to the API key name

Parameters:
  Timestamp:
    Type: String
    Description: Fill in this format <Year>-<Month>-<Day>
    Default: 2018-04-02

Create an API Gateway API key and usage plan, associate the new key with the API given as a parameter, and configure the CloudFront distribution to send a custom header when forwarding traffic to API Gateway

CFDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Logging:
        IncludeCookies: 'false'
        Bucket: !Sub ${S3BucketAccessLogs}.s3.amazonaws.com
        Prefix: cloudfront-logs
      Enabled: 'true'
      Comment: API Gateway Regional Endpoint Blog post
      Origins:
        -
          Id: APIGWRegional
          DomainName: !Select [0, !Split ['/', !Ref ApiURL]]
          CustomOriginConfig:
            HTTPPort: 443
            OriginProtocolPolicy: https-only
          OriginCustomHeaders:
            - 
              HeaderName: x-api-key
              HeaderValue: !Ref ApiKey
              ...

ApiUsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  Properties:
    Description: CloudFront usage only
    UsagePlanName: CloudFront_only
    ApiStages:
      - 
        ApiId: !Select [0, !Split ['.', !Ref ApiURL]]
        Stage: !Select [1, !Split ['/', !Ref ApiURL]]

ApiKey: 
  Type: "AWS::ApiGateway::ApiKey"
  Properties: 
    Name: !Sub "CloudFront-${Timestamp}"
    Description: !Sub "CloudFormation API Key ${Timestamp}"
    Enabled: true

ApiKeyUsagePlan:
  Type: "AWS::ApiGateway::UsagePlanKey"
  Properties:
    KeyId: !Ref ApiKey
    KeyType: API_KEY
    UsagePlanId: !Ref ApiUsagePlan

As shown in the ApiKey resource, we append the given Timestamp to Name as well as use it in the API Gateway usage plan key resource. This means that whenever the Timestamp parameter changes, AWS CloudFormation triggers a resource replacement and updates every resource that depends on that API key. In this case, that includes the AWS CloudFront configuration and API Gateway usage plan.

But what does the rotation schedule that you chose at the beginning of this blog mean in this example?

Create a scheduled activity to trigger a Lambda function on a given schedule

Parameters:
...
  ApiKeyRotationSchedule: 
    Description: Schedule to rotate API Keys e.g. Daily, Monthly, Bimonthly basis
    Type: String
    Default: Daily
    AllowedValues:
      - Daily
      - Fortnightly
      - Monthly
      - Bimonthly
      - Quarterly
    ConstraintDescription: Must be any of the available options

Mappings: 

  ScheduleMap: 
    CloudwatchEvents: 
      Daily: "rate(1 day)"
      Fortnightly: "rate(14 days)"
      Monthly: "rate(30 days)"
      Bimonthly: "rate(60 days)"
      Quarterly: "rate(90 days)"

Resources:
...
  RotateApiKeysScheduledJob: 
    Type: "AWS::Events::Rule"
    Properties: 
      Description: "ScheduledRule"
      ScheduleExpression: !FindInMap [ScheduleMap, CloudwatchEvents, !Ref ApiKeyRotationSchedule]
      State: "ENABLED"
      Targets: 
        - 
          Arn: !GetAtt RotateApiKeysFunction.Arn
          Id: "RotateApiKeys"

The resource RotateApiKeysScheduledJob shows that the schedule that you selected through a dropdown menu when updating the AWS CloudFormation stack is actually converted to a CloudWatch Events rule. This in turn triggers a Lambda function that is defined in the same template.

RotateApiKeysFunction:
      Type: "AWS::Lambda::Function"
      Properties:
        Handler: "index.lambda_handler"
        Role: !GetAtt RotateApiKeysFunctionRole.Arn
        Runtime: python3.6
        Environment:
          Variables:
            StackName: !Ref "AWS::StackName"
        Code:
          ZipFile: !Sub |
            import datetime
            import os

            import boto3
            from botocore.exceptions import ClientError

            session = boto3.Session()
            cfn = session.client('cloudformation')
            
            timestamp = datetime.date.today()            
            params = {
                'StackName': os.getenv('StackName'),
                'UsePreviousTemplate': True,
                'Capabilities': ["CAPABILITY_IAM"],
                'Parameters': [
                    {
                      'ParameterKey': 'ApiURL',
                      'UsePreviousValue': True
                    },
                    {
                      'ParameterKey': 'ApiKeyRotationSchedule',
                      'UsePreviousValue': True
                    },
                    {
                      'ParameterKey': 'Timestamp',
                      'ParameterValue': str(timestamp)
                    },
                ],                
            }

            def lambda_handler(event, context):
              """ Updates CloudFormation Stack with a new timestamp and returns CloudFormation response"""
              try:
                  response = cfn.update_stack(**params)
              except ClientError as err:
                  if "No updates are to be performed" in err.response['Error']['Message']:
                      return {"message": err.response['Error']['Message']}
                  else:
                      raise Exception("An error happened while updating the stack: {}".format(err))          
  
              return response

All this Lambda function does is trigger an AWS CloudFormation stack update via the API (exactly what you did through the console, but programmatically) and update the Timestamp parameter. As a result, it rotates the API key and updates the CloudFront distribution configuration.

This gives you enough flexibility to change the API key rotation schedule at any time without maintaining or writing any code. You can also manually update the stack and rotate the keys by updating the AWS CloudFormation stack’s Timestamp parameter.
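
For example, a manual rotation from the AWS CLI might look like the following sketch; the stack name is a placeholder and the remaining parameters keep their previous values:

aws cloudformation update-stack \
--stack-name api-gateway-waf-part2 \
--use-previous-template \
--capabilities CAPABILITY_IAM \
--parameters ParameterKey=ApiURL,UsePreviousValue=true \
ParameterKey=ApiKeyRotationSchedule,UsePreviousValue=true \
ParameterKey=Timestamp,ParameterValue=2018-10-01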

Next Steps

We hope you found the information in this blog helpful. You can use it to understand how to create a mechanism to allow traffic only from CloudFront to API Gateway and avoid bypassing the AWS WAF rules that Part 1 set up.

Keep the following important notes in mind about this solution:

  • It assumes that you already have a strong AuthZ mechanism, managed by API Gateway, to control access to your API.
  • The API Gateway usage plan and other resources created in this solution work only for APIs created in the same account (the ApiURL parameter).
  • If you already use API keys for tracking API usage, consider using either of the following solutions as a replacement:
    • Use a random HTTP header value in the CloudFront origin configuration and verify it with API Gateway request model validation instead of API keys alone.
    • Combine Lambda@Edge and an API Gateway custom authorizer to sign and verify incoming requests using a shared secret known only to the two. This is a more advanced technique.

Managing Amazon SNS Subscription Attributes with AWS CloudFormation

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/managing-amazon-sns-subscription-attributes-with-aws-cloudformation/

This post is courtesy of Otavio Ferreira, Manager, Amazon SNS, AWS Messaging.

Amazon SNS is a fully managed pub/sub messaging and event-driven computing service that can decouple distributed systems and microservices. By default, when your publisher system posts a message to an Amazon SNS topic, all systems subscribed to the topic receive a copy of the message. By using Amazon SNS subscription attributes, you can customize this default behavior and make Amazon SNS fit your use cases even more naturally. The available set of Amazon SNS subscription attributes includes FilterPolicy, DeliveryPolicy, and RawMessageDelivery.

You can manually manage your Amazon SNS subscription attributes via the AWS Management Console or programmatically via AWS Development Tools (SDK and AWS CLI). Now you can automate their provisioning via AWS CloudFormation templates as well. AWS CloudFormation lets you use a simple text file to model and provision all the Amazon SNS resources for your messaging use cases, across AWS Regions and accounts, in an automated and secure manner.

The following sections describe how you can simultaneously create Amazon SNS subscriptions and set their attributes via AWS CloudFormation templates.

Setting the FilterPolicy attribute

The FilterPolicy attribute is valid in the context of message filtering, regardless of the delivery protocol, and defines which type of message the subscriber expects to receive from the topic. Hence, by applying the FilterPolicy attribute, you can offload the message-filtering logic from subscribers and the message-routing logic from publishers.

To set the FilterPolicy attribute in your AWS CloudFormation template, use the syntax in the following JSON snippet. This snippet creates an Amazon SNS subscription whose endpoint is an AWS Lambda function. Simultaneously, this code also sets a subscription filter policy that matches messages carrying an attribute whose key is “pet” and value is either “dog” or “cat.”

{
   "Resources": {
      "mySubscription": {
         "Type" : "AWS::SNS::Subscription",
         "Properties" : {
            "Protocol": "lambda",
            "Endpoint": "arn:aws:lambda:us-east-1:000000000000:function:SavePet",
            "TopicArn": "arn:aws:sns:us-east-1:000000000000:PetTopic",
            "FilterPolicy": {
               "pet": ["dog", "cat"]
            }
         }
      }
   }
}

Setting the DeliveryPolicy attribute

The DeliveryPolicy attribute is valid in the context of message delivery to HTTP endpoints and defines a delivery-retry policy. By applying the DeliveryPolicy attribute, you can control the maximum number of retries the subscriber expects, the time delay between each retry, and the backoff function. You should fine-tune these values based on the traffic volume your subscribing HTTP server can handle.

To set the DeliveryPolicy attribute in your AWS CloudFormation template, use the syntax in the following JSON snippet. This snippet creates an Amazon SNS subscription whose endpoint is an HTTP address. The code also sets a delivery policy capped at 10 retries for this subscription, with a linear backoff function.

{
   "Resources": {
      "mySubscription": {
         "Type" : "AWS::SNS::Subscription",
         "Properties" : {
            "Protocol": "https",
            "Endpoint": "https://api.myendpoint.ca/pets",
            "TopicArn": "arn:aws:sns:us-east-1:000000000000:PetTopic",
            "DeliveryPolicy": {
               "healthyRetryPolicy": {
                  "numRetries": 10,
                  "minDelayTarget": 10,
                  "maxDelayTarget": 30,
                  "numMinDelayRetries": 3,
                  "numMaxDelayRetries": 7,
                  "numNoDelayRetries": 0,
                  "backoffFunction": "linear"
               }
            }
         }
      }
   }
}

Setting the RawMessageDelivery attribute

The RawMessageDelivery attribute is valid in the context of message delivery to Amazon SQS queues and HTTP endpoints. This Boolean attribute eliminates the need for the subscriber to process the JSON formatting that is created by default to decorate all published messages with Amazon SNS metadata. When you set RawMessageDelivery to true, you get two outcomes. First, your message is delivered as is, with no metadata added. Second, your message attributes propagate from Amazon SNS to Amazon SQS, when the subscribing endpoint is an Amazon SQS queue.

To set the RawMessageDelivery attribute in your AWS CloudFormation template, use the syntax in the following JSON snippet. This snippet creates an Amazon SNS subscription whose endpoint is an Amazon SQS queue. This code also enables raw message delivery for the subscription, which prevents Amazon SNS metadata from being added to the message payload.

{
   "Resources": {
      "mySubscription": {
         "Type" : "AWS::SNS::Subscription",
         "Properties" : {
            "Protocol": "sqs",
            "Endpoint": "arn:aws:sqs:us-east-1:000000000000:PetQueue",
            "TopicArn": "arn:aws:sns:us-east-1:000000000000:PetTopic",
            "RawMessageDelivery": "true"
         }
      }
   }
}

Applying subscription attributes in a use case

Here’s how everything comes together. The following example is based on a car dealer company, which operates with the following distributed systems hosted on Amazon EC2 instances:

  • Car-Dealer-System – Front-office system that takes orders placed by car buyers
  • ERP-System – Enterprise resource planning, the back-office system that handles finance, accounting, human resources, and related business activities
  • CRM-System – Customer relationship management, the back-office system responsible for storing car buyers’ profile information and running sales workflows
  • SCM-System – Supply chain management, the back-office system that handles inventory tracking and demand forecast and planning

 

Whenever an order is placed in the car dealer system, this event is broadcasted to all back-office systems interested in this type of event. As shown in the preceding diagram, the company applied AWS Messaging services to decouple their distributed systems, promoting more scalability and maintainability for their architecture. The queues and topic used are the following:

  • Car-Sales – Amazon SNS topic that receives messages from the car dealer system. All orders placed by car buyers are published to this topic, then delivered to subscribers (two Amazon SQS queues and one HTTP endpoint).
  • ERP-Integration – Amazon SQS queue that feeds the ERP system with orders published by the car dealer system. The ERP pulls messages from this queue to track revenue and trigger related bookkeeping processes.
  • CRM-Integration – Amazon SQS queue that feeds the CRM system with orders published by the car dealer system. The CRM pulls messages from this queue to track car buyers’ interests and update sales workflows.

The company created the following three Amazon SNS subscriptions:

  • The first subscription refers to the ERP-Integration queue. This subscription has the RawMessageDelivery attribute set to true. Hence, no metadata is added to the message payload, and message attributes are propagated from Amazon SNS to Amazon SQS.
  • The second subscription refers to the CRM-Integration queue. Like the first subscription, this one also has the RawMessageDelivery attribute set to true. Additionally, it has the FilterPolicy attribute set to {“buyer-class”: [“vip”]}. This policy defines that only orders placed by VIP buyers are managed in the CRM system, and orders from other buyers are filtered out. A sketch of this subscription follows the list.
  • The third subscription points to the HTTP endpoint that serves the SCM-System. Unlike ERP and CRM, the SCM system provides its own HTTP API. Therefore, its HTTP endpoint was subscribed to the topic directly, without a queue in between. This subscription has a DeliveryPolicy that caps the number of retries at 20, with an exponential backoff function.
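
A sketch of that second subscription in template form might look like the following; the topic and queue ARNs are illustrative, and both attributes are set on the same AWS::SNS::Subscription resource:

{
   "Resources": {
      "crmSubscription": {
         "Type" : "AWS::SNS::Subscription",
         "Properties" : {
            "Protocol": "sqs",
            "Endpoint": "arn:aws:sqs:us-east-1:000000000000:CRM-Integration",
            "TopicArn": "arn:aws:sns:us-east-1:000000000000:Car-Sales",
            "RawMessageDelivery": "true",
            "FilterPolicy": {
               "buyer-class": ["vip"]
            }
         }
      }
   }
}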

The company didn’t want to create all these resources manually, though. They wanted to turn this infrastructure into versionable code, and the ability to quickly spin up and tear down this infrastructure in an automated manner. Therefore, they created an AWS CloudFormation template to manage these AWS messaging resources: Amazon SNS topic, Amazon SNS subscriptions, Amazon SNS subscription attributes, and Amazon SQS queues.

Executing the AWS CloudFormation template

Now you’re ready to execute this AWS CloudFormation template yourself. To bootstrap this architecture in your AWS account:

    1. Download the sample AWS CloudFormation template from the repository.
    2. Go to the AWS CloudFormation console.
    3. Choose Create Stack.
    4. For Select Template, choose to upload a template to Amazon S3, and choose Browse.
    5. Select the template you downloaded and choose Next.
    6. For Specify Details:
      • Enter the following stack name: Car-Dealer-Stack.
      • Enter the HTTP endpoint to be subscribed to your topic. If you don’t have an HTTP endpoint, create a temp one.
      • Choose Next.
    7. For Options, choose Next.
    8. For Review, choose Create.
    9. Wait until your stack creation process is complete.

Now that all the infrastructure is in place, verify the Amazon SNS subscriptions attributes set by the AWS CloudFormation template as follows:

  1. Go to the Amazon SNS console.
  2. Choose Topics and then select the ARN associated with Car-Sales.
  3. Verify the first subscription:
    • Select the subscription related to ERP-Integration (Amazon SQS protocol).
    • Choose Other subscription actions and then choose Edit subscription attributes.
    • Note that raw message delivery is enabled, and choose Cancel to go back.
  4. Verify the second subscription:
    • Select the subscription related to CRM-Integration (Amazon SQS protocol).
    • Choose Other subscription actions and then choose Edit subscription attributes.
    • Note that raw message delivery is enabled and then choose Cancel to go back.
    • Choose Other subscription actions and then choose Edit subscription filter policy.
    • Note that the filter policy is set, and then choose Cancel to go back.
  5. Verify the third subscription:
    • Select the subscription related to SCM-System (HTTP protocol).
    • Choose Other subscription actions and then choose Edit subscription delivery policy.
    •  Choose Advanced view.
    • Note that an exponential delivery retry policy is set, and then choose Cancel to go back.

Now that you have verified all subscription attributes, you can delete your AWS CloudFormation stack as follows:

  1. Go to the AWS CloudFormation console.
  2. In the list of stacks, select Car-Dealer-Stack.
  3. Choose Actions, choose Delete Stack, and then choose Yes Delete.
  4. Wait for the stack deletion process to complete.

That’s it! At this point, you have deleted all Amazon SNS and Amazon SQS resources created in this exercise from your AWS account.

Summary

AWS CloudFormation templates enable the simultaneous creation of Amazon SNS subscriptions and their attributes (such as FilterPolicy, DeliveryPolicy, and RawMessageDelivery) in an automated and secure manner. AWS CloudFormation support for Amazon SNS subscription attributes is available now in all AWS Regions.

For information about pricing, see AWS CloudFormation Pricing. For more information on setting up Amazon SNS resources via AWS CloudFormation templates, see:

Amazon SageMaker Updates – Tokyo Region, CloudFormation, Chainer, and GreenGrass ML

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/sagemaker-tokyo-summit-2018/

Today, at the AWS Summit in Tokyo we announced a number of updates and new features for Amazon SageMaker. Starting today, SageMaker is available in Asia Pacific (Tokyo)! SageMaker also now supports CloudFormation. A new machine learning framework, Chainer, is now available in the SageMaker Python SDK, in addition to MXNet and TensorFlow. Finally, support for running Chainer models on several devices was added to AWS Greengrass Machine Learning.
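
As a minimal sketch of the new CloudFormation support, a notebook instance could be declared like this; the role ARN is a placeholder for an execution role you would supply:

Resources:
  MyNotebookInstance:
    Type: AWS::SageMaker::NotebookInstance
    Properties:
      InstanceType: ml.t2.medium
      RoleArn: arn:aws:iam::123456789012:role/SageMakerExecutionRole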

Amazon SageMaker Chainer Estimator


Chainer is a popular, flexible, and intuitive deep learning framework. Chainer networks work on a “Define-by-Run” scheme, where the network topology is defined dynamically via forward computation. This is in contrast to many other frameworks, which work on a “Define-and-Run” scheme where the topology of the network is defined separately from the data. A lot of developers enjoy the Chainer scheme since it allows them to write their networks with native Python constructs and tools.

Luckily, using Chainer with SageMaker is just as easy as using a TensorFlow or MXNet estimator. In fact, it might even be a bit easier, since it’s likely you can take your existing scripts and use them to train on SageMaker with very few modifications. With TensorFlow or MXNet, users have to implement a train function with a particular signature. With Chainer, your scripts can be a little more portable, as you can simply read from a few environment variables like SM_MODEL_DIR, SM_NUM_GPUS, and others. We can wrap our existing script in an if __name__ == '__main__': guard and invoke it locally or on SageMaker.


import argparse
import os

if __name__ =='__main__':

    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=64)
    parser.add_argument('--learning-rate', type=float, default=0.05)

    # Data, model, and output directories
    parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])

    args, _ = parser.parse_known_args()

    # ... load from args.train and args.test, train a model, write model to args.model_dir.

Then, we can run that script locally or use the SageMaker Python SDK to launch it on some GPU instances in SageMaker. The hyperparameters are passed to the script as command-line arguments, and the environment variables above are populated automatically. When we call fit, the input channels we pass are exposed through the SM_CHANNEL_* environment variables.


from sagemaker.chainer.estimator import Chainer
# Create my estimator
chainer_estimator = Chainer(
    entry_point='example.py',
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',
    hyperparameters={'epochs': 10, 'batch-size': 64}
)
# Train my estimator
chainer_estimator.fit({'train': train_input, 'test': test_input})

# Deploy my estimator to a SageMaker Endpoint and get a Predictor
predictor = chainer_estimator.deploy(
    instance_type="ml.m4.xlarge",
    initial_instance_count=1
)

Now, instead of bringing your own Docker container for training and hosting with Chainer, you can just maintain your script. You can see the full sagemaker-chainer-containers on GitHub. One of my favorite features of the new container is built-in chainermn for easy multi-node distribution of your Chainer training jobs.

There’s a lot more documentation and information available in both the README and the example notebooks.

AWS GreenGrass ML with Chainer

AWS Greengrass ML now includes a pre-built Chainer package for all devices powered by Intel Atom, NVIDIA Jetson TX2, and Raspberry Pi. So, now Greengrass ML provides pre-built packages for TensorFlow, Apache MXNet, and Chainer! You can train your models on SageMaker and then easily deploy them to any Greengrass-enabled device using Greengrass ML.

JAWS UG

I want to give a quick shout out to all of our wonderful and inspirational friends in the JAWS UG who attended the AWS Summit in Tokyo today. I’ve very much enjoyed seeing your pictures of the summit. Thanks for making Japan an amazing place for AWS developers! I can’t wait to visit again and meet with all of you.

Randall

Amazon Neptune Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-neptune-generally-available/

Amazon Neptune is now Generally Available in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. At the core of Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latencies. Neptune supports two popular graph models, Property Graph and RDF, through Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune can be used to power everything from recommendation engines and knowledge graphs to drug discovery and network security. Neptune is fully-managed, with automatic minor version upgrades, backups, encryption, and failover. I wrote about Neptune in detail for AWS re:Invent last year, and customers have been using the preview and providing great feedback that the team has used to prepare the service for GA.

Now that Amazon Neptune is generally available there are a few changes from the preview:

Launching an Amazon Neptune Cluster

Launching a Neptune cluster is as easy as navigating to the AWS Management Console and clicking create cluster. Of course you can also launch with CloudFormation, the CLI, or the SDKs.
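
A minimal CloudFormation sketch for a cluster with one instance might look like the following; the identifier and instance class are only illustrative:

Resources:
  NeptuneCluster:
    Type: AWS::Neptune::DBCluster
    Properties:
      DBClusterIdentifier: my-neptune-cluster
  NeptuneInstance:
    Type: AWS::Neptune::DBInstance
    Properties:
      DBClusterIdentifier: !Ref NeptuneCluster
      DBInstanceClass: db.r4.large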

You can monitor your cluster health and the health of individual instances through Amazon CloudWatch and the console.

Additional Resources

We’ve created two repos with some additional tools and examples here. You can expect continuous development on these repos as we add additional tools and examples.

  • Amazon Neptune Tools Repo
    This repo has a useful tool for converting GraphML files into Neptune compatible CSVs for bulk loading from S3.
  • Amazon Neptune Samples Repo
    This repo has a really cool example of building a collaborative filtering recommendation engine for video game preferences.

Purpose Built Databases

There’s an industry trend where we’re moving more and more onto purpose-built databases. Developers and businesses want to access their data in the format that makes the most sense for their applications. As cloud resources make transforming large datasets easier with tools like AWS Glue, we have a lot more options than we used to for accessing our data. With tools like Amazon Redshift, Amazon Athena, Amazon Aurora, Amazon DynamoDB, and more we get to choose the best database for the job or even enable entirely new use-cases. Amazon Neptune is perfect for workloads where the data is highly connected across data rich edges.

I’m really excited about graph databases and I see a huge number of applications. Looking for ideas of cool things to build? I’d love to build a web crawler in AWS Lambda that uses Neptune as the backing store. You could further enrich it by running Amazon Comprehend or Amazon Rekognition on the text and images found and creating a search engine on top of Neptune.

As always, feel free to reach out in the comments or on Twitter to provide any feedback!

Randall

Protecting your API using Amazon API Gateway and AWS WAF — Part I

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/protecting-your-api-using-amazon-api-gateway-and-aws-waf-part-i/

This post courtesy of Thiago Morais, AWS Solutions Architect

When you build web applications or expose any data externally, you probably look for a platform where you can build highly scalable, secure, and robust REST APIs. As APIs are publicly exposed, there are a number of best practices for providing a secure mechanism to consumers using your API.

Amazon API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, authorization and access control, monitoring, and API version management.

In this post, I show you how to take advantage of the regional API endpoint feature in API Gateway, so that you can create your own Amazon CloudFront distribution and secure your API using AWS WAF.

AWS WAF is a web application firewall that helps protect your web applications from common web exploits that could affect application availability, compromise security, or consume excessive resources.

As you make your APIs publicly available, you are exposed to attackers trying to exploit your services in several ways. The AWS security team published a whitepaper solution using AWS WAF, How to Mitigate OWASP’s Top 10 Web Application Vulnerabilities.

Regional API endpoints

Edge-optimized APIs are endpoints that are accessed through a CloudFront distribution created and managed by API Gateway. Before the launch of regional API endpoints, this was the default option when creating APIs using API Gateway. It primarily helped to reduce latency for API consumers that were located in different geographical locations than your API.

When API requests predominantly originate from an Amazon EC2 instance or other services within the same AWS Region as the API is deployed, a regional API endpoint typically lowers the latency of connections. It is recommended for such scenarios.

For better control around caching strategies, customers can use their own CloudFront distribution for regional APIs. They also have the ability to use AWS WAF protection, as I describe in this post.

Edge-optimized API endpoint

The following diagram is an illustrated example of the edge-optimized API endpoint where your API clients access your API through a CloudFront distribution created and managed by API Gateway.

Regional API endpoint

For the regional API endpoint, your customers access your API from the same Region in which your REST API is deployed. This helps you to reduce request latency and particularly allows you to add your own content delivery network, as needed.

Walkthrough

In this section, you implement the following steps:

  • Create a regional API using the PetStore sample API.
  • Create a CloudFront distribution for the API.
  • Test the CloudFront distribution.
  • Set up AWS WAF and create a web ACL.
  • Attach the web ACL to the CloudFront distribution.
  • Test AWS WAF protection.

Create the regional API

For this walkthrough, use an existing PetStore API. All new APIs launch by default as the regional endpoint type. To change the endpoint type for your existing API, choose the cog icon on the top right corner:

After you have created the PetStore API on your account, deploy a stage called “prod” for the PetStore API.

On the API Gateway console, select the PetStore API and choose Actions, Deploy API.

For Stage name, type prod and add a stage description.

Choose Deploy and the new API stage is created.

Use the following AWS CLI command to update your API from edge-optimized to regional:

aws apigateway update-rest-api \
--rest-api-id {rest-api-id} \
--patch-operations op=replace,path=/endpointConfiguration/types/EDGE,value=REGIONAL

A successful response looks like the following:

{
    "description": "Your first API with Amazon API Gateway. This is a sample API that integrates via HTTP with your demo Pet Store endpoints", 
    "createdDate": 1511525626, 
    "endpointConfiguration": {
        "types": [
            "REGIONAL"
        ]
    }, 
    "id": "{api-id}", 
    "name": "PetStore"
}

After you change your API endpoint to regional, you can now assign your own CloudFront distribution to this API.

Create a CloudFront distribution

To make things easier, I have provided an AWS CloudFormation template to deploy a CloudFront distribution pointing to the API that you just created. Click the button to deploy the template in the us-east-1 Region.
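Under the hood, the core of such a template is a CloudFront distribution whose origin is your regional API endpoint. A minimal sketch of that resource follows (names and settings are illustrative, not the exact template provided; the template you launch also configures access logging):

ApiDistribution:
  Type: "AWS::CloudFront::Distribution"
  Properties:
    DistributionConfig:
      Enabled: true
      Origins:
        # The {api-id}.execute-api.us-east-1.amazonaws.com FQDN you supply as a parameter
        - Id: RegionalApiOrigin
          DomainName: !Ref APIGWEndpoint
          CustomOriginConfig:
            OriginProtocolPolicy: https-only
      DefaultCacheBehavior:
        TargetOriginId: RegionalApiOrigin
        ViewerProtocolPolicy: https-only
        ForwardedValues:
          QueryString: true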

For Stack name, enter RegionalAPI. For APIGWEndpoint, enter your API FQDN in the following format:

{api-id}.execute-api.us-east-1.amazonaws.com

After you fill out the parameters, choose Next to continue the stack deployment. It takes a couple of minutes to finish the deployment. After it finishes, the Outputs tab lists the following items:

  • A CloudFront domain URL
  • An S3 bucket for CloudFront access logs

Output from CloudFormation

Test the CloudFront distribution

To see if the CloudFront distribution was configured correctly, use a web browser and enter the URL from your distribution, with the following parameters:

https://{your-distribution-url}.cloudfront.net/{api-stage}/pets

You should get the following output:

[
  {
    "id": 1,
    "type": "dog",
    "price": 249.99
  },
  {
    "id": 2,
    "type": "cat",
    "price": 124.99
  },
  {
    "id": 3,
    "type": "fish",
    "price": 0.99
  }
]

Set up AWS WAF and create a web ACL

With the new CloudFront distribution in place, you can now start setting up AWS WAF to protect your API.

For this demo, you deploy the AWS WAF Security Automations solution, which provides fine-grained control over the requests attempting to access your API.

For more information about deployment, see Automated Deployment. If you prefer, you can launch the solution directly into your account using the following button.

For CloudFront Access Log Bucket Name, add the name of the bucket created during the deployment of the CloudFormation stack for your CloudFront distribution.

The solution allows you to adjust thresholds and also choose which automations to enable to protect your API. After you finish configuring these settings, choose Next.

To start the deployment process in your account, follow the creation wizard and choose Create. It takes a few minutes to finish the deployment. You can follow the creation process through the CloudFormation console.

After the deployment finishes, you can see the new web ACL deployed on the AWS WAF console, AWSWAFSecurityAutomations.

Attach the AWS WAF web ACL to the CloudFront distribution

With the solution deployed, you can now attach the AWS WAF web ACL to the CloudFront distribution that you created earlier.

To assign the newly created AWS WAF web ACL, go back to your CloudFront distribution. After you open your distribution for editing, choose General, Edit.

Select the new AWS WAF web ACL that you created earlier, AWSWAFSecurityAutomations.

Save the changes to your CloudFront distribution and wait for the deployment to finish.

Test AWS WAF protection

To validate the AWS WAF Web ACL setup, use Artillery to load test your API and see AWS WAF in action.

To install Artillery on your machine, run the following command:

$ npm install -g artillery

After the installation completes, you can check if Artillery installed successfully by running the following command:

$ artillery -V
1.6.0-12

As of the time of publication, Artillery is at version 1.6.0-12.

One of the WAF web ACL rules that you have set up is a rate-based rule. By default, it is set up to block any requester that exceeds 2,000 requests within a 5-minute period.
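For reference, a rate-based rule like this one can be declared in a CloudFormation template roughly as follows (a sketch only; the Security Automations solution creates and manages the actual rule for you):

HTTPFloodRateRule:
  Type: "AWS::WAF::RateBasedRule"
  Properties:
    Name: "HTTPFloodRule"
    MetricName: "HTTPFloodRule"
    # Count requests per originating IP address
    RateKey: "IP"
    # Block IPs that exceed this many requests in a 5-minute window
    RateLimit: 2000

Now try the rule out.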

First, use cURL to query your distribution and see the API output:

$ curl -s https://{distribution-name}.cloudfront.net/prod/pets
[
  {
    "id": 1,
    "type": "dog",
    "price": 249.99
  },
  {
    "id": 2,
    "type": "cat",
    "price": 124.99
  },
  {
    "id": 3,
    "type": "fish",
    "price": 0.99
  }
]

Based on the test above, the result looks good. But what if you max out the 2000 requests in under 5 minutes?

Run the following Artillery command:

artillery quick -n 2000 --count 10  https://{distribution-name}.cloudfront.net/prod/pets

This fires 2,000 requests at your API from each of 10 concurrent users. For brevity, I am not posting the Artillery output here.

After Artillery finishes its execution, try to run the cURL request again and see what happens:

$ curl -s https://{distribution-name}.cloudfront.net/prod/pets

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
Request blocked.
<BR clear="all">
<HR noshade size="1px">
<PRE>
Generated by cloudfront (CloudFront)
Request ID: [removed]
</PRE>
<ADDRESS>
</ADDRESS>
</BODY></HTML>

As you can see from the output above, the request was blocked by AWS WAF. Your IP address is removed from the blocked list after it falls below the request limit rate.

Conclusion

In this first part, you saw how to use the new API Gateway regional API endpoint together with Amazon CloudFront and AWS WAF to secure your API from a series of attacks.

In the second part, I will demonstrate some other techniques to protect your API using API keys and Amazon CloudFront custom headers.

From Framework to Function: Deploying AWS Lambda Functions for Java 8 using Apache Maven Archetype

Post Syndicated from Ryosuke Iwanaga original https://aws.amazon.com/blogs/compute/from-framework-to-function-deploying-aws-lambda-functions-for-java-8-using-apache-maven-archetype/

As a serverless computing platform that supports Java 8 runtime, AWS Lambda makes it easy to run any type of Java function simply by uploading a JAR file. To help define not only a Lambda serverless application but also Amazon API Gateway, Amazon DynamoDB, and other related services, the AWS Serverless Application Model (SAM) allows developers to use a simple AWS CloudFormation template.

AWS provides the AWS Toolkit for Eclipse that supports both Lambda and SAM. AWS also gives customers an easy way to create Lambda functions and SAM applications in Java using the AWS Command Line Interface (AWS CLI). After you build a JAR file, all you have to do is type the following commands:

aws cloudformation package 
aws cloudformation deploy

To consolidate these steps, customers can use an archetype from Apache Maven. An archetype is a predefined project template that makes getting started with function development exceptionally simple.

In this post, I introduce a Maven archetype that allows you to create a skeleton of AWS SAM for a Java function. Using this archetype, you can generate a sample Java code example and an accompanying SAM template to deploy it on AWS Lambda by a single Maven action.

Prerequisites

Make sure that the following software is installed on your workstation:

  • Java
  • Maven
  • AWS CLI
  • (Optional) AWS SAM CLI

Install Archetype

After you’ve set up those packages, install Archetype with the following commands:

git clone https://github.com/awslabs/aws-serverless-java-archetype
cd aws-serverless-java-archetype
mvn install

These are one-time operations, so you don’t run them for every new package. If you’d like, you can add Archetype to your company’s Maven repository so that other developers can use it later.

With those packages installed, you’re ready to develop your new Lambda function.

Start a project

Now that you have the archetype, customize it and run the code:

cd /path/to/project_home
# The -DarchetypeRepository=local flag forces use of the local Maven repository.
# Omit -DinteractiveMode=false to be prompted for the remaining properties interactively.
mvn archetype:generate \
  -DarchetypeGroupId=com.amazonaws.serverless.archetypes \
  -DarchetypeArtifactId=aws-serverless-java-archetype \
  -DarchetypeVersion=1.0.0 \
  -DarchetypeRepository=local \
  -DinteractiveMode=false \
  -DgroupId=YOUR_GROUP_ID \
  -DartifactId=YOUR_ARTIFACT_ID \
  -Dversion=YOUR_VERSION \
  -DclassName=YOUR_CLASSNAME

You should have a directory called YOUR_ARTIFACT_ID that contains the files and folders shown below:

├── event.json
├── pom.xml
├── src
│   └── main
│       ├── java
│       │   └── Package
│       │       └── Example.java
│       └── resources
│           └── log4j2.xml
└── template.yaml

The sample code is a working example. If you install the SAM CLI, you can invoke the function locally with the command below:

cd YOUR_ARTIFACT_ID
mvn -P invoke verify
[INFO] Scanning for projects...
[INFO]
[INFO] ---------------------------< com.riywo:foo >----------------------------
[INFO] Building foo 1.0
[INFO] --------------------------------[ jar ]---------------------------------
...
[INFO] --- maven-jar-plugin:3.0.2:jar (default-jar) @ foo ---
[INFO] Building jar: /private/tmp/foo/target/foo-1.0.jar
[INFO]
[INFO] --- maven-shade-plugin:3.1.0:shade (shade) @ foo ---
[INFO] Including com.amazonaws:aws-lambda-java-core:jar:1.2.0 in the shaded jar.
[INFO] Replacing /private/tmp/foo/target/lambda.jar with /private/tmp/foo/target/foo-1.0-shaded.jar
[INFO]
[INFO] --- exec-maven-plugin:1.6.0:exec (sam-local-invoke) @ foo ---
2018/04/06 16:34:35 Successfully parsed template.yaml
2018/04/06 16:34:35 Connected to Docker 1.37
2018/04/06 16:34:35 Fetching lambci/lambda:java8 image for java8 runtime...
java8: Pulling from lambci/lambda
Digest: sha256:14df0a5914d000e15753d739612a506ddb8fa89eaa28dcceff5497d9df2cf7aa
Status: Image is up to date for lambci/lambda:java8
2018/04/06 16:34:37 Invoking Package.Example::handleRequest (java8)
2018/04/06 16:34:37 Decompressing /tmp/foo/target/lambda.jar
2018/04/06 16:34:37 Mounting /private/var/folders/x5/ldp7c38545v9x5dg_zmkr5kxmpdprx/T/aws-sam-local-1523000077594231063 as /var/task:ro inside runtime container
START RequestId: a6ae19fe-b1b0-41e2-80bc-68a40d094d74 Version: $LATEST
Log output: Greeting is 'Hello Tim Wagner.'
END RequestId: a6ae19fe-b1b0-41e2-80bc-68a40d094d74
REPORT RequestId: a6ae19fe-b1b0-41e2-80bc-68a40d094d74	Duration: 96.60 ms	Billed Duration: 100 ms	Memory Size: 128 MB	Max Memory Used: 7 MB

{"greetings":"Hello Tim Wagner."}


[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.452 s
[INFO] Finished at: 2018-04-06T16:34:40+09:00
[INFO] ------------------------------------------------------------------------

This Maven goal invokes sam local invoke -e event.json, so you can see the sample output greeting Tim Wagner.

To deploy this application to AWS, you need an Amazon S3 bucket to upload your package. You can use the following command to create a bucket if you want:

aws s3 mb s3://YOUR_BUCKET --region YOUR_REGION

Now, you can deploy your application by just one command!

mvn deploy \
    -DawsRegion=YOUR_REGION \
    -Ds3Bucket=YOUR_BUCKET \
    -DstackName=YOUR_STACK
[INFO] Scanning for projects...
[INFO]
[INFO] ---------------------------< com.riywo:foo >----------------------------
[INFO] Building foo 1.0
[INFO] --------------------------------[ jar ]---------------------------------
...
[INFO] --- exec-maven-plugin:1.6.0:exec (sam-package) @ foo ---
Uploading to aws-serverless-java/com.riywo:foo:1.0/924732f1f8e4705c87e26ef77b080b47  11657 / 11657.0  (100.00%)
Successfully packaged artifacts and wrote output template to file target/sam.yaml.
Execute the following command to deploy the packaged template
aws cloudformation deploy --template-file /private/tmp/foo/target/sam.yaml --stack-name <YOUR STACK NAME>
[INFO]
[INFO] --- maven-deploy-plugin:2.8.2:deploy (default-deploy) @ foo ---
[INFO] Skipping artifact deployment
[INFO]
[INFO] --- exec-maven-plugin:1.6.0:exec (sam-deploy) @ foo ---

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - archetype
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 37.176 s
[INFO] Finished at: 2018-04-06T16:41:02+09:00
[INFO] ------------------------------------------------------------------------

Maven automatically creates a shaded JAR file, uploads it to your S3 bucket, rewrites template.yaml into a packaged template (target/sam.yaml), and creates or updates the CloudFormation stack.

To customize the process, modify the pom.xml file. For example, to avoid typing values for awsRegion, s3Bucket, or stackName, write them inside pom.xml and check it in to your VCS, as shown in the sketch after the command below. Afterward, you and the rest of your team can deploy the function by typing just the following command:

mvn deploy
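For example, you might add the following to pom.xml (a sketch, assuming the archetype’s POM resolves these values as Maven properties named after the -D options shown earlier):

<properties>
  <awsRegion>us-east-1</awsRegion>
  <s3Bucket>YOUR_BUCKET</s3Bucket>
  <stackName>YOUR_STACK</stackName>
</properties>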

Options

The Lambda Java 8 runtime supports several handler types: POJO, simple type, and stream. The default option of this archetype is the POJO style, which requires you to create request and response classes; the archetype generates these for you by default. If you want to use another handler type, specify the handlerType property as shown below:

## POJO type (default)
mvn archetype:generate \
 ...
 -DhandlerType=pojo

## Simple type - String
mvn archetype:generate \
 ...
 -DhandlerType=simple

## Stream type
mvn archetype:generate \
 ...
 -DhandlerType=stream

See documentation for more details about handlers.
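For reference, a simple-type handler that accepts and returns a String looks roughly like this (a sketch, not the code that the archetype generates):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

// Simple type: Lambda deserializes the input into a String and
// serializes the returned String as the function result.
public class Example implements RequestHandler<String, String> {
    @Override
    public String handleRequest(String input, Context context) {
        return "Hello " + input + ".";
    }
}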

Also, the Lambda Java 8 runtime supports two logging options: Log4j 2 and LambdaLogger. This archetype creates a LambdaLogger implementation by default, but you can use Log4j 2 if you prefer:

## LambdaLogger (default)
mvn archetype:generate \
 ...
 -Dlogger=lambda

## Log4j 2
mvn archetype:generate \
 ...
 -Dlogger=log4j2

If you use LambdaLogger, you can delete ./src/main/resources/log4j2.xml. See documentation for more details.

Conclusion

So, what’s next? Develop your Lambda function locally, and then type a single command: mvn deploy!

With this archetype code example, available in the GitHub repo, you should be able to deploy Lambda functions for Java 8 in a snap. If you have any questions or comments, please submit them below or leave them on GitHub.

Implement continuous integration and delivery of serverless AWS Glue ETL applications using AWS Developer Tools

Post Syndicated from Prasad Alle original https://aws.amazon.com/blogs/big-data/implement-continuous-integration-and-delivery-of-serverless-aws-glue-etl-applications-using-aws-developer-tools/

AWS Glue is an increasingly popular way to develop serverless ETL (extract, transform, and load) applications for big data and data lake workloads. Organizations that transform their ETL applications to cloud-based, serverless ETL architectures need a seamless, end-to-end continuous integration and continuous delivery (CI/CD) pipeline: from source code, to build, to deployment, to product delivery. Having a good CI/CD pipeline can help your organization discover bugs before they reach production and deliver updates more frequently. It can also help developers write quality code and automate the ETL job release management process, mitigate risk, and more.

AWS Glue is a fully managed data catalog and ETL service. It simplifies and automates the difficult and time-consuming tasks of data discovery, conversion, and job scheduling. AWS Glue crawls your data sources and constructs a data catalog using pre-built classifiers for popular data formats and data types, including CSV, Apache Parquet, JSON, and more.

When you are developing ETL applications using AWS Glue, you might come across some of the following CI/CD challenges:

  • Iterative development with unit tests
  • Continuous integration and build
  • Pushing the ETL pipeline to a test environment
  • Pushing the ETL pipeline to a production environment
  • Testing ETL applications using real data (live test)
  • Exploring and validating data

In this post, I walk you through a solution that implements a CI/CD pipeline for serverless AWS Glue ETL applications supported by AWS Developer Tools (including AWS CodePipeline, AWS CodeCommit, and AWS CodeBuild) and AWS CloudFormation.

Solution overview

The following diagram shows the pipeline workflow:

This solution uses AWS CodePipeline, which lets you orchestrate and automate the test and deploy stages for ETL application source code. The solution consists of a pipeline that contains the following stages:

1.) Source Control: In this stage, the AWS Glue ETL job source code and the AWS CloudFormation template file for deploying the ETL jobs are both committed to version control. I chose to use AWS CodeCommit for version control.

To get the ETL job source code and AWS CloudFormation template, download the gluedemoetl.zip file. This solution is developed based on a previous post, Build a Data Lake Foundation with AWS Glue and Amazon S3.

2.) LiveTest: In this stage, all resources—including AWS Glue crawlers, jobs, S3 buckets, roles, and other resources that are required for the solution—are provisioned, deployed, live tested, and cleaned up.

The LiveTest stage includes the following actions:

  • Deploy: In this action, all the resources that are required for this solution (crawlers, jobs, buckets, roles, and so on) are provisioned and deployed using an AWS CloudFormation template.
  • AutomatedLiveTest: In this action, all the AWS Glue crawlers and jobs are executed and data exploration and validation tests are performed. These validation tests include, but are not limited to, record counts in both raw tables and transformed tables in the data lake and any other business validations. I used AWS CodeBuild for this action.
  • LiveTestApproval: This action is included for the cases in which a pipeline administrator approval is required to deploy/promote the ETL applications to the next stage. The pipeline pauses in this action until an administrator manually approves the release.
  • LiveTestCleanup: In this action, all the LiveTest stage resources, including test crawlers, jobs, roles, and so on, are deleted using the AWS CloudFormation template. This action helps minimize cost by ensuring that the test resources exist only for the duration of the AutomatedLiveTest and LiveTestApproval actions.

3.) DeployToProduction: In this stage, all the resources are deployed using the AWS CloudFormation template to the production environment.

Try it out

This code pipeline takes approximately 20 minutes to complete the LiveTest stage (up to the LiveTestApproval action, in which manual approval is required).

To get started with this solution, choose Launch Stack:

This creates the CI/CD pipeline with all of its stages, as described earlier. It performs an initial commit of the sample AWS Glue ETL job source code to trigger the first release change.

In the AWS CloudFormation console, choose Create. After the template finishes creating resources, you see the pipeline name on the stack Outputs tab.

After that, open the CodePipeline console and select the newly created pipeline. Initially, your pipeline’s CodeCommit stage shows that the source action failed.

Allow a few minutes for your new pipeline to detect the initial commit applied by the CloudFormation stack creation. As soon as the commit is detected, your pipeline starts. You will see the successful stage completion status as soon as the CodeCommit source stage runs.

In the CodeCommit console, choose Code in the navigation pane to view the solution files.

Next, you can watch how the pipeline goes through the Deploy and AutomatedLiveTest actions of the LiveTest stage, until it finally reaches the LiveTestApproval action.

At this point, if you check the AWS CloudFormation console, you can see that a new template has been deployed as part of the LiveTest deploy action.

At this point, make sure that the AWS Glue crawlers and the AWS Glue job ran successfully. Also check whether the corresponding databases and external tables have been created in the AWS Glue Data Catalog. Then verify that the data is validated using Amazon Athena, as shown following.

Open the AWS Glue console, and choose Databases in the navigation pane. You will see the following databases in the Data Catalog:

Open the Amazon Athena console, and run the following queries. Verify that the record counts are matching.

SELECT count(*) FROM "nycitytaxi_gluedemocicdtest"."data";
SELECT count(*) FROM "nytaxiparquet_gluedemocicdtest"."datalake";

The following shows the raw data:

The following shows the transformed data:

The pipeline pauses the action until the release is approved. After validating the data, manually approve the revision on the LiveTestApproval action on the CodePipeline console.

Add comments as needed, and choose Approve.

The LiveTestApproval stage now appears as Approved on the console.

After the revision is approved, the pipeline proceeds to use the AWS CloudFormation template to destroy the resources that were deployed in the LiveTest deploy action. This helps reduce cost and ensures a clean test environment on every deployment.

Production deployment is the final stage. In this stage, all the resources—AWS Glue crawlers, AWS Glue jobs, Amazon S3 buckets, roles, and so on—are provisioned and deployed to the production environment using the AWS CloudFormation template.

After successfully running the whole pipeline, feel free to experiment with it by changing the source code stored on AWS CodeCommit. For example, if you modify the AWS Glue ETL job to generate an error, it should make the AutomatedLiveTest action fail. Or if you change the AWS CloudFormation template to make its creation fail, it should affect the LiveTest deploy action. The objective of the pipeline is to ensure that all changes deployed to production work as expected.

Conclusion

In this post, you learned how easy it is to implement CI/CD for serverless AWS Glue ETL solutions with AWS developer tools like AWS CodePipeline and AWS CodeBuild at scale. Implementing such solutions can help you accelerate ETL development and testing at your organization.

If you have questions or suggestions, please comment below.

Additional Reading

If you found this post useful, be sure to check out Implement Continuous Integration and Delivery of Apache Spark Applications using AWS and Build a Data Lake Foundation with AWS Glue and Amazon S3.

About the Authors

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable big data, machine learning, artificial intelligence, and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as advanced edge computing and machine learning at the edge. In his spare time, he enjoys spending time with his family.

 
Luis Caro is a Big Data Consultant for AWS Professional Services. He works with our customers to provide guidance and technical assistance on big data projects, helping them improve the value of their solutions when using AWS.


Implementing safe AWS Lambda deployments with AWS CodeDeploy

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/implementing-safe-aws-lambda-deployments-with-aws-codedeploy/

This post courtesy of George Mao, AWS Senior Serverless Specialist – Solutions Architect

AWS Lambda and AWS CodeDeploy recently made it possible to automatically shift incoming traffic between two function versions based on a preconfigured rollout strategy. This new feature allows you to gradually shift traffic to the new function. If there are any issues with the new code, you can quickly roll back and control the impact to your application.

Previously, you had to manually move 100% of traffic from the old version to the new version. Now, you can have CodeDeploy automatically execute pre- or post-deployment tests and automate a gradual rollout strategy. Traffic shifting is built right into the AWS Serverless Application Model (SAM), making it easy to define and deploy your traffic shifting capabilities. SAM is an extension of AWS CloudFormation that provides a simplified way of defining serverless applications.

In this post, I show you how to use SAM, CloudFormation, and CodeDeploy to accomplish an automated rollout strategy for safe Lambda deployments.

Scenario

For this walkthrough, you write a Lambda application that returns a count of the S3 buckets that you own. You deploy it and use it in production. Later on, you receive requirements that tell you that you need to change your Lambda application to count only buckets that begin with the letter “a”.

Before you make the change, you need to be sure that your new Lambda application works as expected. If it does have issues, you want to minimize the number of impacted users and roll back easily. To accomplish this, you create a deployment process that publishes the new Lambda function, but does not send any traffic to it. You use CodeDeploy to execute a PreTraffic test to ensure that your new function works as expected. After the test succeeds, CodeDeploy automatically shifts traffic gradually to the new version of the Lambda function.

Your Lambda function is exposed as a REST service via an Amazon API Gateway deployment. This makes it easy to test and integrate.

Prerequisites

To execute the SAM and CloudFormation deployment, you must have the following IAM permissions:

  • cloudformation:*
  • lambda:*
  • codedeploy:*
  • iam:create*

You may use the AWS SAM Local CLI or the AWS CLI to package and deploy your Lambda application. If you choose to use SAM Local, be sure to install it onto your system. For more information, see AWS SAM Local Installation.

All of the code used in this post can be found in this GitHub repository: https://github.com/aws-samples/aws-safe-lambda-deployments.

Walkthrough

For this post, use SAM to define your resources because it comes with built-in CodeDeploy support for safe Lambda deployments.  The deployment is handled and automated by CloudFormation.

SAM allows you to define your Serverless applications in a simple and concise fashion, because it automatically creates all necessary resources behind the scenes. For example, if you do not define an execution role for a Lambda function, SAM automatically creates one. SAM also creates the CodeDeploy application necessary to drive the traffic shifting, as well as the IAM service role that CodeDeploy uses to execute all actions.

Create a SAM template

To get started, write your SAM template and call it template.yaml.

AWSTemplateFormatVersion : '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: An example SAM template for Lambda Safe Deployments.

Resources:

  returnS3Buckets:
    Type: AWS::Serverless::Function
    Properties:
      Handler: returnS3Buckets.handler
      Runtime: nodejs6.10
      AutoPublishAlias: live
      Policies:
        - Version: "2012-10-17"
          Statement: 
          - Effect: "Allow"
            Action: 
              - "s3:ListAllMyBuckets"
            Resource: '*'
      DeploymentPreference:
          Type: Linear10PercentEvery1Minute
          Hooks:
            PreTraffic: !Ref preTrafficHook
      Events:
        Api:
          Type: Api
          Properties:
            Path: /test
            Method: get

  preTrafficHook:
    Type: AWS::Serverless::Function
    Properties:
      Handler: preTrafficHook.handler
      Policies:
        - Version: "2012-10-17"
          Statement: 
          - Effect: "Allow"
            Action: 
              - "codedeploy:PutLifecycleEventHookExecutionStatus"
            Resource:
              !Sub 'arn:aws:codedeploy:${AWS::Region}:${AWS::AccountId}:deploymentgroup:${ServerlessDeploymentApplication}/*'
        - Version: "2012-10-17"
          Statement: 
          - Effect: "Allow"
            Action: 
              - "lambda:InvokeFunction"
            Resource: !Ref returnS3Buckets.Version
      Runtime: nodejs6.10
      FunctionName: 'CodeDeployHook_preTrafficHook'
      DeploymentPreference:
        Enabled: false
      Timeout: 5
      Environment:
        Variables:
          NewVersion: !Ref returnS3Buckets.Version

This template creates two functions:

  • returnS3Buckets
  • preTrafficHook

The returnS3Buckets function is where your application logic lives. It’s a simple piece of code that uses the AWS SDK for JavaScript in Node.JS to call the Amazon S3 listBuckets API action and return the number of buckets.

'use strict';

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

exports.handler = (event, context, callback) => {
	console.log("I am here! " + context.functionName  +  ":"  +  context.functionVersion);

	s3.listBuckets(function (err, data){
		if(err){
			console.log(err, err.stack);
			callback(null, {
				statusCode: 500,
				body: "Failed!"
			});
		}
		else{
			var allBuckets = data.Buckets;

			console.log("Total buckets: " + allBuckets.length);
			callback(null, {
				statusCode: 200,
				body: allBuckets.length
			});
		}
	});	
}

Review the key parts of the SAM template that defines returnS3Buckets:

  • The AutoPublishAlias attribute instructs SAM to automatically publish a new version of the Lambda function for each new deployment and link it to the live alias.
  • The Policies attribute specifies additional policy statements that SAM adds onto the automatically generated IAM role for this function. The first statement provides the function with permission to call listBuckets.
  • The DeploymentPreference attribute configures the type of rollout pattern to use. In this case, you are shifting traffic in a linear fashion, moving 10% of traffic every minute to the new version. For more information about supported patterns, see Serverless Application Model: Traffic Shifting Configurations.
  • The Hooks attribute specifies that you want to execute the preTrafficHook Lambda function before CodeDeploy automatically begins shifting traffic. This function should perform validation testing on the newly deployed Lambda version. This function invokes the new Lambda function and checks the results. If you’re satisfied with the tests, instruct CodeDeploy to proceed with the rollout via an API call to: codedeploy.putLifecycleEventHookExecutionStatus.
  • The Events attribute defines an API-based event source that can trigger this function. It accepts requests on the /test path using an HTTP GET method.

The preTrafficHook function performs the validation testing; here is its implementation:
'use strict';

const AWS = require('aws-sdk');
const codedeploy = new AWS.CodeDeploy({apiVersion: '2014-10-06'});
var lambda = new AWS.Lambda();

exports.handler = (event, context, callback) => {

	console.log("Entering PreTraffic Hook!");
	
	// Read the DeploymentId & LifecycleEventHookExecutionId from the event payload
    var deploymentId = event.DeploymentId;
	var lifecycleEventHookExecutionId = event.LifecycleEventHookExecutionId;

	var functionToTest = process.env.NewVersion;
	console.log("Testing new function version: " + functionToTest);

	// Perform validation of the newly deployed Lambda version
	var lambdaParams = {
		FunctionName: functionToTest,
		InvocationType: "RequestResponse"
	};

	var lambdaResult = "Failed";
	lambda.invoke(lambdaParams, function(err, data) {
		if (err){	// an error occurred
			console.log(err, err.stack);
			lambdaResult = "Failed";
		}
		else{	// successful response
			var result = JSON.parse(data.Payload);
			console.log("Result: " +  JSON.stringify(result));

			// Check the response for valid results
			// The response will be a JSON payload with statusCode and body properties. ie:
			// {
			//		"statusCode": 200,
			//		"body": 51
			// }
			if(result.body == 9){	
				lambdaResult = "Succeeded";
				console.log ("Validation testing succeeded!");
			}
			else{
				lambdaResult = "Failed";
				console.log ("Validation testing failed!");
			}

			// Complete the PreTraffic Hook by sending CodeDeploy the validation status
			var params = {
				deploymentId: deploymentId,
				lifecycleEventHookExecutionId: lifecycleEventHookExecutionId,
				status: lambdaResult // status can be 'Succeeded' or 'Failed'
			};
			
			// Pass AWS CodeDeploy the prepared validation test results.
			codedeploy.putLifecycleEventHookExecutionStatus(params, function(err, data) {
				if (err) {
					// Validation failed.
					console.log('CodeDeploy Status update failed');
					console.log(err, err.stack);
					callback("CodeDeploy Status update failed");
				} else {
					// Validation succeeded.
					console.log('Codedeploy status updated successfully');
					callback(null, 'Codedeploy status updated successfully');
				}
			});
		}  
	});
}

The hook is hardcoded to check that the number of S3 buckets returned is 9.

Review the key parts of the SAM template that defines preTrafficHook:

  • The Policies attribute specifies additional policy statements that SAM adds onto the automatically generated IAM role for this function. The first statement provides permissions to call the CodeDeploy PutLifecycleEventHookExecutionStatus API action. The second statement provides permissions to invoke the specific version of the returnS3Buckets function to test.
  • This function has traffic shifting features disabled by setting the DeploymentPreference option to false.
  • The FunctionName attribute explicitly tells CloudFormation what to name the function. Otherwise, CloudFormation creates the function with the default naming convention: [stackName]-[FunctionName]-[uniqueID].  Name the function with the “CodeDeployHook_” prefix because the CodeDeployServiceRole role only allows InvokeFunction on functions named with that prefix.
  • Set the Timeout attribute to allow enough time to complete your validation tests.
  • Use an environment variable to inject the ARN of the newest deployed version of the returnS3Buckets function. The ARN allows the function to know the specific version to invoke and perform validation testing on.

Deploy the function

Your SAM template is all set and the code is written—you’re ready to deploy the function for the first time. Here’s how to do it via the SAM CLI. Replace sam with aws cloudformation to use the AWS CLI with CloudFormation instead.

First, package the function. This command returns a CloudFormation importable file, packaged.yaml.

sam package --template-file template.yaml --s3-bucket mybucket --output-template-file packaged.yaml

Now deploy everything:

sam deploy --template-file packaged.yaml --stack-name mySafeDeployStack --capabilities CAPABILITY_IAM

At this point, both Lambda functions have been deployed within the CloudFormation stack mySafeDeployStack. The returnS3Buckets function has been deployed as Version 1:

SAM automatically created a few things, including the CodeDeploy application, with the deployment pattern that you specified (Linear10PercentEvery1Minute). There is currently one deployment group, with no action, because no deployments have occurred. SAM also created the IAM service role that this CodeDeploy application uses:

There is a single managed policy attached to this role, which allows CodeDeploy to invoke any Lambda function that begins with “CodeDeployHook_”.
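The effect of that policy can be sketched as follows (illustrative, not the verbatim managed policy):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": "arn:aws:lambda:*:*:function:CodeDeployHook_*"
        }
    ]
}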

An API called safeDeployStack has been set up. It targets your Lambda function with the /test resource using the GET method. When you test the endpoint, API Gateway executes the returnS3Buckets function, which returns the number of S3 buckets that you own. In this case, it’s 51.

Publish a new Lambda function version

Now implement the requirements change, which is to make returnS3Buckets count only buckets that begin with the letter “a”. The code now looks like the following (see returnS3BucketsNew.js in GitHub):

'use strict';

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

exports.handler = (event, context, callback) => {
	console.log("I am here! " + context.functionName  +  ":"  +  context.functionVersion);

	s3.listBuckets(function (err, data){
		if(err){
			console.log(err, err.stack);
			callback(null, {
				statusCode: 500,
				body: "Failed!"
			});
		}
		else{
			var allBuckets = data.Buckets;

			console.log("Total buckets: " + allBuckets.length);
			//callback(null, allBuckets.length);

			//  New Code begins here
			var counter=0;
			for(var i  in allBuckets){
				if(allBuckets[i].Name[0] === "a")
					counter++;
			}
			console.log("Total buckets starting with a: " + counter);

			callback(null, {
				statusCode: 200,
				body: counter
			});
			
		}
	});	
}

Repackage and redeploy with the same two commands as earlier:

sam package --template-file template.yaml --s3-bucket mybucket --output-template-file packaged.yaml

sam deploy --template-file packaged.yaml --stack-name mySafeDeployStack --capabilities CAPABILITY_IAM

CloudFormation understands that this is a stack update instead of an entirely new stack. You can see that reflected in the CloudFormation console:

During the update, CloudFormation deploys the new Lambda function as version 2 and adds it to the “live” alias. There is no traffic routing there yet. CodeDeploy now takes over to begin the safe deployment process.

The first thing CodeDeploy does is invoke the preTrafficHook function. Verify that this happened by reviewing the Lambda logs and metrics:

The function should progress successfully, invoke Version 2 of returnS3Buckets, and finally invoke the CodeDeploy API with a success code. After this occurs, CodeDeploy begins the predefined rollout strategy. Open the CodeDeploy console to review the deployment progress (Linear10PercentEvery1Minute):

Verify the traffic shift

During the deployment, verify that the traffic shift has started to occur by running the test periodically. As the deployment shifts toward the new version, a larger percentage of the responses return 9 instead of 51: 9 is the count of buckets that begin with the letter “a”, and 51 is the total bucket count.

A minute later, you see 10% more traffic shifting to the new version. The whole process takes 10 minutes to complete. After completion, open the Lambda console and verify that the “live” alias now points to version 2:

After 10 minutes, the deployment is complete and CodeDeploy signals success to CloudFormation and completes the stack update.

Check the results

If you invoke the function alias manually, you see the results of the new implementation.

aws lambda invoke --function-name [lambda arn to live alias] out.txt

You can also execute the prod stage of your API and verify the results by issuing an HTTP GET to the invoke URL:
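For example, assuming SAM’s default stage name of Prod and the /test path defined in the template (substitute your own API ID and Region):

$ curl -s https://{api-id}.execute-api.{region}.amazonaws.com/Prod/test
9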

Summary

This post has shown you how you can safely automate your Lambda deployments using the Lambda traffic shifting feature. You used the Serverless Application Model (SAM) to define your Lambda functions and configured CodeDeploy to manage your deployment patterns. Finally, you used CloudFormation to automate the deployment and updates to your function and PreTraffic hook.

Now that you know all about this new feature, you’re ready to begin automating Lambda deployments with confidence that things will work as designed. I look forward to hearing about what you’ve built with the AWS Serverless Platform.

AWS AppSync – Production-Ready with Six New Features

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-appsync-production-ready-with-six-new-features/

If you build (or want to build) data-driven web and mobile apps and need real-time updates and the ability to work offline, you should take a look at AWS AppSync. Announced in preview form at AWS re:Invent 2017 and described in depth here, AWS AppSync is designed for use in iOS, Android, JavaScript, and React Native apps. AWS AppSync is built around GraphQL, an open, standardized query language that makes it easy for your applications to request the precise data that they need from the cloud.

I’m happy to announce that the preview period is over and that AWS AppSync is now generally available and production-ready, with six new features that will simplify and streamline your application development process:

Console Log Access – You can now see the CloudWatch Logs entries that are created when you test your GraphQL queries, mutations, and subscriptions from within the AWS AppSync Console.

Console Testing with Mock Data – You can now create and use mock context objects in the console for testing purposes.

Subscription Resolvers – You can now create resolvers for AWS AppSync subscription requests, just as you can already do for query and mutate requests.

Batch GraphQL Operations for DynamoDB – You can now make use of DynamoDB’s batch operations (BatchGetItem and BatchWriteItem) across one or more tables in your resolver functions.

CloudWatch Support – You can now use Amazon CloudWatch Metrics and CloudWatch Logs to monitor calls to the AWS AppSync APIs.

CloudFormation Support – You can now define your schemas, data sources, and resolvers using AWS CloudFormation templates.

A Brief AppSync Review
Before diving in to the new features, let’s review the process of creating an AWS AppSync API, starting from the console. I click Create API to begin:

I enter a name for my API and (for demo purposes) choose to use the Sample schema:

The schema defines a collection of GraphQL object types. Each object type has a set of fields, with optional arguments:

If I was creating an API of my own I would enter my schema at this point. Since I am using the sample, I don’t need to do this. Either way, I click on Create to proceed:

The GraphQL schema type defines the entry points for the operations on the data. All of the data stored on behalf of a particular schema must be accessible using a path that begins at one of these entry points. The console provides me with an endpoint and key for my API:

It also provides me with guidance and a set of fully functional sample apps that I can clone:

When I clicked Create, AWS AppSync created a pair of Amazon DynamoDB tables for me. I can click Data Sources to see them:

I can also see and modify my schema, issue queries, and modify an assortment of settings for my API.

Let’s take a quick look at each new feature…

Console Log Access
The AWS AppSync Console already allows me to issue queries and to see the results, and now provides access to relevant log entries. In order to see the entries, I must enable logs (as detailed below), open up the LOGS section, and check the checkbox. Here’s a simple mutation query that adds a new event. I enter the query and click the arrow to test it:

I can click VIEW IN CLOUDWATCH for a more detailed view:

To learn more, read Test and Debug Resolvers.

Console Testing with Mock Data
You can now create a context object in the console that will be passed to one of your resolvers for testing purposes. I’ll add a testResolver item to my schema:

Then I locate it on the right-hand side of the Schema page and click Attach:

I choose a data source (this is for testing and the actual source will not be accessed), and use the Put item mapping template:

Then I click Select test context, choose Create New Context, assign a name to my test content, and click Save (as you can see, the test context contains the arguments from the query along with values to be returned for each field of the result):

After I save the new Resolver, I click Test to see the request and the response:

Subscription Resolvers
Your AWS AppSync application can monitor changes to any data source using the @aws_subscribe GraphQL schema directive and defining a Subscription type. The AWS AppSync client SDK connects to AWS AppSync using MQTT over WebSockets, and the application is notified after each mutation. You can now attach resolvers (which convert GraphQL payloads into the protocol needed by the underlying storage system) to your subscription fields and perform authorization checks when clients attempt to connect. This allows you to perform the same fine-grained authorization routines across queries, mutations, and subscriptions.
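For example, a subscription field tied to a putPost mutation might be declared in your schema like this (a sketch; the field name is an assumption):

type Subscription {
  onPutPost(id: ID): Post
    @aws_subscribe(mutations: ["putPost"])
}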

To learn more about this feature, read Real-Time Data.

Batch GraphQL Operations
Your resolvers can now make use of DynamoDB batch operations that span one or more tables in a Region. This allows you to use a list of keys in a single query, read records from multiple tables, write records in bulk to multiple tables, and conditionally write or delete related records across multiple tables.

In order to use this feature, the IAM role that you use to access your tables must grant access to DynamoDB’s BatchGetItem and BatchWriteItem operations.

To learn more, read the DynamoDB Batch Resolvers tutorial.

CloudWatch Logs Support
You can now tell AWS AppSync to log API requests to CloudWatch Logs. Click on Settings and Enable logs, then choose the IAM role and the log level:

CloudFormation Support
You can use the following CloudFormation resource types in your templates to define AWS AppSync resources:

AWS::AppSync::GraphQLApi – Defines an AppSync API in terms of a data source (an Amazon Elasticsearch Service domain or a DynamoDB table).

AWS::AppSync::ApiKey – Defines the access key needed to access the data source.

AWS::AppSync::GraphQLSchema – Defines a GraphQL schema.

AWS::AppSync::DataSource – Defines a data source.

AWS::AppSync::Resolver – Defines a resolver by referencing a schema and a data source, and includes a mapping template for requests.

Here’s a simple schema definition in YAML form:

  AppSyncSchema:
    Type: "AWS::AppSync::GraphQLSchema"
    DependsOn:
      - AppSyncGraphQLApi
    Properties:
      ApiId: !GetAtt AppSyncGraphQLApi.ApiId
      Definition: |
        schema {
          query: Query
          mutation: Mutation
        }
        type Query {
          singlePost(id: ID!): Post
          allPosts: [Post]
        }
        type Mutation {
          putPost(id: ID!, title: String!): Post
        }
        type Post {
          id: ID!
          title: String!
        }
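A resolver for the singlePost query in that schema might then be defined roughly as follows (a sketch; the data source name is an assumption):

  AppSyncResolver:
    Type: "AWS::AppSync::Resolver"
    DependsOn: AppSyncSchema
    Properties:
      ApiId: !GetAtt AppSyncGraphQLApi.ApiId
      TypeName: "Query"
      FieldName: "singlePost"
      DataSourceName: "PostsTable"
      RequestMappingTemplate: |
        {
          "version": "2017-02-28",
          "operation": "GetItem",
          "key": {
            "id": $util.dynamodb.toDynamoDBJson($ctx.args.id)
          }
        }
      ResponseMappingTemplate: "$util.toJson($ctx.result)"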

Available Now
These new features are available now and you can start using them today! Here are a couple of blog posts and other resources that you might find to be of interest:

Jeff;


How to retain system tables’ data spanning multiple Amazon Redshift clusters and run cross-cluster diagnostic queries

Post Syndicated from Karthik Sonti original https://aws.amazon.com/blogs/big-data/how-to-retain-system-tables-data-spanning-multiple-amazon-redshift-clusters-and-run-cross-cluster-diagnostic-queries/

Amazon Redshift is a data warehouse service that logs the history of the system in STL log tables. The STL log tables manage disk space by retaining only two to five days of log history, depending on log usage and available disk space.

To retain STL tables’ data for an extended period, you usually have to create a replica table for every system table. Then, for each table, you load the data from the system table into the replica at regular intervals. By maintaining replica tables for STL tables, you can run diagnostic queries on historical data from the STL tables. You then can derive insights from query execution times, query plans, and disk-spill patterns, and make better cluster-sizing decisions. However, refreshing replica tables with live data from STL tables at regular intervals requires schedulers such as Cron or AWS Data Pipeline. Also, these tables are specific to one cluster and they are not accessible after the cluster is terminated. This is especially true for transient Amazon Redshift clusters that last for only a finite period of ad hoc query execution.
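For example, the traditional approach for a single system table such as STL_QUERY might look like the following SQL, run on a schedule (a sketch; the history table name is illustrative):

-- One-time: seed a history table from the live system table
create table stl_query_history as select * from stl_query;

-- Scheduled: append only the rows that arrived since the last load
insert into stl_query_history
select * from stl_query
where starttime > (select max(starttime) from stl_query_history);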

In this blog post, I present a solution that exports system tables from multiple Amazon Redshift clusters into an Amazon S3 bucket. This solution is serverless, and you can schedule it as frequently as every five minutes. The AWS CloudFormation deployment template that I provide automates the solution setup in your environment. The system tables’ data in the Amazon S3 bucket is partitioned by cluster name and query execution date to enable efficient joins in cross-cluster diagnostic queries.

I also provide another CloudFormation template later in this post. This second template helps to automate the creation of tables in the AWS Glue Data Catalog for the system tables’ data stored in Amazon S3. After the system tables are exported to Amazon S3, you can run cross-cluster diagnostic queries on the system tables’ data and derive insights about query executions in each Amazon Redshift cluster. You can do this using Amazon QuickSight, Amazon Athena, Amazon EMR, or Amazon Redshift Spectrum.

You can find all the code examples in this post, including the CloudFormation templates, AWS Glue extract, transform, and load (ETL) scripts, and the resolution steps for common errors you might encounter in this GitHub repository.

Solution overview

The solution in this post uses AWS Glue to export system tables’ log data from Amazon Redshift clusters into Amazon S3. The AWS Glue ETL jobs are invoked at a scheduled interval by AWS Lambda. AWS Systems Manager, which provides secure, hierarchical storage for configuration data management and secrets management, maintains the details of Amazon Redshift clusters for which the solution is enabled. The last-fetched time stamp values for the respective cluster-table combination are maintained in an Amazon DynamoDB table.

The following diagram covers the key steps involved in this solution.

The solution as illustrated in the preceding diagram flows like this:

  1. The Lambda function, invoke_rs_stl_export_etl, is triggered at regular intervals, as controlled by Amazon CloudWatch. It looks up the AWS Systems Manager parameter store to get the details of the Amazon Redshift clusters for which system table export is enabled.
  2. The same Lambda function, based on the Amazon Redshift cluster details obtained in step 1, invokes the AWS Glue ETL job designated for the Amazon Redshift cluster. If an ETL job for the cluster is not found, the Lambda function creates one.
  3. The ETL job invoked for the Amazon Redshift cluster gets the cluster credentials from the parameter store. It gets from the DynamoDB table the last exported time stamp of when each of the system tables was exported from the respective Amazon Redshift cluster.
  4. The ETL job unloads the system tables’ data from the Amazon Redshift cluster into an Amazon S3 bucket.
  5. The ETL job updates the DynamoDB table with the last exported time stamp value for each system table exported from the Amazon Redshift cluster.
  6. The Amazon Redshift cluster system tables’ data is available in Amazon S3 and is partitioned by cluster name and date for running cross-cluster diagnostic queries.

Understanding the configuration data

This solution uses AWS Systems Manager parameter store to store the Amazon Redshift cluster credentials securely. The parameter store also securely stores other configuration information that the AWS Glue ETL job needs for extracting and storing system tables’ data in Amazon S3. Systems Manager comes with a default AWS Key Management Service (AWS KMS) key that it uses to encrypt the password component of the Amazon Redshift cluster credentials.

The following table explains the global parameters and cluster-specific parameters required in this solution. The global parameters are defined once and applicable at the overall solution level. The cluster-specific parameters are specific to an Amazon Redshift cluster and repeat for each cluster for which you enable this post’s solution. The CloudFormation template explained later in this post creates these parameters as part of the deployment process.

Parameter name Type Description
Global parameters (defined once and applied to all jobs)
redshift_query_logs.global.s3_prefix String The Amazon S3 path where the query logs are exported. Under this path, each exported table is partitioned by cluster name and date.
redshift_query_logs.global.tempdir String The Amazon S3 path that AWS Glue ETL jobs use for temporarily staging the data.
redshift_query_logs.global.role String The name of the role that the AWS Glue ETL jobs assume. Just the role name is sufficient; the complete Amazon Resource Name (ARN) is not required.
redshift_query_logs.global.enabled_cluster_list StringList A comma-separated list of cluster names for which system tables’ data export is enabled. This gives flexibility for a user to exclude certain clusters.
Cluster-specific parameters (defined for each cluster specified in the enabled_cluster_list parameter)
redshift_query_logs.<<cluster_name>>.connection String The name of the AWS Glue Data Catalog connection to the Amazon Redshift cluster. For example, if the cluster name is product_warehouse, the entry is redshift_query_logs.product_warehouse.connection.
redshift_query_logs.<<cluster_name>>.user String The user name that AWS Glue uses to connect to the Amazon Redshift cluster.
redshift_query_logs.<<cluster_name>>.password SecureString The password that AWS Glue uses to connect to the Amazon Redshift cluster, encrypted by a key that is managed in AWS KMS.

For example, suppose that you have two Amazon Redshift clusters, product-warehouse and category-management, for which the solution described in this post is enabled. In this case, the parameters shown in the following screenshot are created by the solution deployment CloudFormation template in the AWS Systems Manager parameter store.

Solution deployment

To make it easier for you to get started, I created a CloudFormation template that automatically configures and deploys the solution—only one step is required after deployment.

Prerequisites

To deploy the solution, you must have one or more Amazon Redshift clusters in a private subnet. This subnet must have a network address translation (NAT) gateway or a NAT instance configured, and also a security group with a self-referencing inbound rule for all TCP ports. For more information about why AWS Glue ETL needs the configuration it does, described previously, see Connecting to a JDBC Data Store in a VPC in the AWS Glue documentation.
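For reference, a self-referencing inbound rule can be expressed in CloudFormation roughly as follows (a sketch; the security group logical ID is illustrative):

GlueSelfReferencingIngress:
  Type: "AWS::EC2::SecurityGroupIngress"
  Properties:
    GroupId: !Ref DataStoreSecurityGroup
    # Self-referencing rule: allow all TCP traffic from members of the same group
    SourceSecurityGroupId: !Ref DataStoreSecurityGroup
    IpProtocol: tcp
    FromPort: 0
    ToPort: 65535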

To start the deployment, launch the CloudFormation template:

CloudFormation stack parameters

The following table lists and describes the parameters for deploying the solution to export query logs from multiple Amazon Redshift clusters.

Property Default Description
S3Bucket mybucket The bucket this solution uses to store the exported query logs, stage code artifacts, and perform unloads from Amazon Redshift. For example, the mybucket/extract_rs_logs/data prefix is used for storing all the exported query logs for each system table, partitioned by cluster. The mybucket/extract_rs_logs/temp/ prefix is used for temporarily staging the unloaded data from Amazon Redshift. The mybucket/extract_rs_logs/code prefix is used for storing all the code artifacts required for Lambda and the AWS Glue ETL jobs.
ExportEnabledRedshiftClusters Requires Input A comma-separated list of cluster names from which the system table logs need to be exported.
DataStoreSecurityGroups Requires Input A list of security groups with an inbound rule to the Amazon Redshift clusters provided in the parameter, ExportEnabledRedshiftClusters. These security groups should also have a self-referencing inbound rule on all TCP ports, as explained in Connecting to a JDBC Data Store in a VPC.

After you launch the template and create the stack, you see that the following resources have been created:

  1. AWS Glue connections for each Amazon Redshift cluster you provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  2. All parameters required for this solution created in the parameter store.
  3. The Lambda function that invokes the AWS Glue ETL jobs for each configured Amazon Redshift cluster at a regular interval of five minutes.
  4. The DynamoDB table that captures the last exported time stamps for each exported cluster-table combination.
  5. The AWS Glue ETL jobs to export query logs from each Amazon Redshift cluster provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  6. The IAM roles and policies required for the Lambda function and AWS Glue ETL jobs.

After the deployment

For each Amazon Redshift cluster for which you enabled the solution through the CloudFormation stack parameter, ExportEnabledRedshiftClusters, the automated deployment creates placeholder credentials that you must update after the deployment:

  1. Go to the parameter store.
  2. Note the parameters redshift_query_logs.<<cluster_name>>.user and redshift_query_logs.<<cluster_name>>.password that correspond to each Amazon Redshift cluster for which you enabled this solution. Edit these parameters to replace the placeholder values with the right credentials.

For example, if product-warehouse is one of the clusters for which you enabled system table export, you edit these two parameters with the right user name and password and choose Save parameter.
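If you prefer to update the credentials from a script instead of the console, a sketch like the following could do the same thing. The cluster name and credential values are placeholders for this example.

import boto3

ssm = boto3.client('ssm')
cluster_name = 'product-warehouse'  # placeholder cluster name

# Overwrite the placeholder user name and password created by the deployment.
ssm.put_parameter(
    Name='redshift_query_logs.{}.user'.format(cluster_name),
    Value='glue_etl_user',  # placeholder: the real database user name
    Type='String',
    Overwrite=True)
ssm.put_parameter(
    Name='redshift_query_logs.{}.password'.format(cluster_name),
    Value='replace-with-real-password',  # placeholder: the real database password
    Type='SecureString',
    Overwrite=True)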

Querying the exported system tables

Within a few minutes after the solution deployment, you should see Amazon Redshift query logs being exported to the Amazon S3 location, <<S3Bucket_you_provided>>/extract_redshift_query_logs/data/. Under that prefix, you should see the eight system tables partitioned by cluster name and date: stl_alert_event_log, stl_ddltext, stl_explain, stl_query, stl_querytext, stl_scan, stl_utilitytext, and stl_wlm_query.

To run cross-cluster diagnostic queries on the exported system tables, create external tables in the AWS Glue Data Catalog. To make it easier for you to get started, I provide a CloudFormation template that creates an AWS Glue crawler, which crawls the exported system tables stored in Amazon S3 and builds the external tables in the AWS Glue Data Catalog.

Launch this CloudFormation template to create external tables that correspond to the Amazon Redshift system tables. S3Bucket is the only input parameter required for this stack deployment. Provide the same Amazon S3 bucket name where the system tables’ data is being exported. After you successfully create the stack, you can see the eight tables in the database, redshift_query_logs_db, as shown in the following screenshot.

Now, navigate to the Athena console to run cross-cluster diagnostic queries. The following screenshot shows a diagnostic query executed in Athena that retrieves query alerts logged across multiple Amazon Redshift clusters.
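As a rough illustration of such a query, the following sketch submits a cross-cluster alert count to Athena with boto3. The cluster partition column and the query output location are assumptions, because the actual schema is built by the AWS Glue crawler; only the database name, redshift_query_logs_db, comes from this post.

import boto3

athena = boto3.client('athena')

# Hypothetical diagnostic query: count alert events per cluster from the crawled
# stl_alert_event_log table. The 'cluster' partition column is assumed for illustration.
query = """
SELECT cluster, event, COUNT(*) AS alert_count
FROM stl_alert_event_log
GROUP BY cluster, event
ORDER BY alert_count DESC
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={'Database': 'redshift_query_logs_db'},
    ResultConfiguration={'OutputLocation': 's3://mybucket/athena-results/'})  # placeholder output location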

You can build the following example Amazon QuickSight dashboard by running cross-cluster diagnostic queries in Athena to identify the hourly query count and the key query alert events across multiple Amazon Redshift clusters.

How to extend the solution

You can extend this post’s solution in two ways:

  • Add any new Amazon Redshift clusters that you spin up after you deploy the solution.
  • Add other system tables or custom query results to the list of exports from an Amazon Redshift cluster.

Extend the solution to other Amazon Redshift clusters

To extend the solution to more Amazon Redshift clusters, add the three cluster-specific parameters to the AWS Systems Manager parameter store, following the naming guidelines earlier in this post. Then modify the redshift_query_logs.global.enabled_cluster_list parameter to append the new cluster name to the comma-separated string.
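For example, a sketch like the following could register a hypothetical new cluster named sales-warehouse. The connection name is a placeholder; the user and password parameters follow the same put_parameter pattern shown earlier in this post.

import boto3

ssm = boto3.client('ssm')
new_cluster = 'sales-warehouse'  # hypothetical new cluster name

# Create the cluster-specific connection parameter (repeat for .user and .password).
ssm.put_parameter(
    Name='redshift_query_logs.{}.connection'.format(new_cluster),
    Value='sales-warehouse-glue-connection',  # placeholder AWS Glue connection name
    Type='String',
    Overwrite=True)

# Append the new cluster name to the comma-separated enabled cluster list.
current = ssm.get_parameter(
    Name='redshift_query_logs.global.enabled_cluster_list')['Parameter']['Value']
ssm.put_parameter(
    Name='redshift_query_logs.global.enabled_cluster_list',
    Value='{},{}'.format(current, new_cluster),
    Type='String',
    Overwrite=True)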

Extend the solution to add other tables or custom queries to an Amazon Redshift cluster

The current solution ships with the export functionality for the following Amazon Redshift system tables:

  • stl_alert_event_log
  • stl_ddltext
  • stl_explain
  • stl_query
  • stl_querytext
  • stl_scan
  • stl_utilitytext
  • stl_wlm_query

You can easily add another system table or custom query by adding a few lines of code to the AWS Glue ETL job, <<cluster_name>>_extract_rs_query_logs. For example, suppose that you want to export orders greater than $2,000 from the product-warehouse Amazon Redshift cluster. To do so, add the following five lines of code to the AWS Glue ETL job product-warehouse_extract_rs_query_logs, where product-warehouse is your cluster name:

  1. Get the last-processed time-stamp value. The function creates a value if it doesn’t already exist.

salesLastProcessTSValue = functions.getLastProcessedTSValue(trackingEntry="mydb.sales_2000", job_configs=job_configs)

  2. Run the custom query with the time stamp.

returnDF = functions.runQuery(query="select * from sales s join orders o on s.order_id = o.order_id where o.order_amnt > 2000 and s.sale_timestamp > '{}'".format(salesLastProcessTSValue), tableName="mydb.sales_2000", job_configs=job_configs)

  3. Save the results to Amazon S3.

functions.saveToS3(dataframe=returnDF, s3Prefix=s3Prefix, tableName="mydb.sales_2000", partitionColumns=["sale_date"], job_configs=job_configs)

  4. Get the latest time-stamp value from the returned data frame in Step 2.

latestTimestampVal = functions.getMaxValue(returnDF, "sale_timestamp", job_configs)

  5. Update the last-processed time-stamp value in the DynamoDB table.

functions.updateLastProcessedTSValue("mydb.sales_2000", latestTimestampVal[0], job_configs)

Conclusion

In this post, I demonstrate a serverless solution to retain the system tables’ log data across multiple Amazon Redshift clusters. By using this solution, you can incrementally export the data from system tables into Amazon S3. By performing this export, you can build cross-cluster diagnostic queries, build audit dashboards, and derive insights into capacity planning by using services such as Athena. I also demonstrate how you can extend this solution to other ad hoc query use cases or tables other than system tables by adding a few lines of code.


Additional Reading

If you found this post useful, be sure to check out Using Amazon Redshift Spectrum, Amazon Athena, and AWS Glue with Node.js in Production and Amazon Redshift – 2017 Recap.


About the Author

Karthik Sonti is a senior big data architect at Amazon Web Services. He helps AWS customers build big data and analytical solutions and provides guidance on architecture and best practices.