Tag Archives: AssumeRole

How to Patch, Inspect, and Protect Microsoft Windows Workloads on AWS—Part 1

Post Syndicated from Koen van Blijderveen original https://aws.amazon.com/blogs/security/how-to-patch-inspect-and-protect-microsoft-windows-workloads-on-aws-part-1/

Most malware tries to compromise your systems by using a known vulnerability that the maker of the operating system has already patched. To help prevent malware from affecting your systems, two security best practices are to apply all operating system patches to your systems and actively monitor your systems for missing patches. In case you do need to recover from a malware attack, you should make regular backups of your data.

In today’s blog post (Part 1 of a two-part post), I show how to keep your Amazon EC2 instances that run Microsoft Windows up to date with the latest security patches by using Amazon EC2 Systems Manager. Tomorrow in Part 2, I show how to take regular snapshots of your data by using Amazon EBS Snapshot Scheduler and how to use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any common vulnerabilities and exposures (CVEs).

What you should know first

To follow along with the solution in this post, you need one or more EC2 instances. You may use existing instances or create new instances. For the blog post, I assume this is an EC2 for Microsoft Windows Server 2012 R2 instance installed from the Amazon Machine Images (AMIs). If you are not familiar with how to launch an EC2 instance, see Launching an Instance. I also assume you launched or will launch your instance in a private subnet. A private subnet is not directly accessible via the internet, and access to it requires either a VPN connection to your on-premises network or a jump host in a public subnet (a subnet with access to the internet). You must make sure that the EC2 instance can connect to the internet using a network address translation (NAT) instance or NAT gateway to communicate with Systems Manager and Amazon Inspector. The following diagram shows how you should structure your Amazon Virtual Private Cloud (VPC). You should also be familiar with Restoring an Amazon EBS Volume from a Snapshot and Attaching an Amazon EBS Volume to an Instance.

Later on, you will assign tasks to a maintenance window to patch your instances with Systems Manager. To do this, the AWS Identity and Access Management (IAM) user you are using for this post must have the iam:PassRole permission. This permission allows this IAM user to assign tasks to pass their own IAM permissions to the AWS service. In this example, when you assign a task to a maintenance window, IAM passes your credentials to Systems Manager. This safeguard ensures that the user cannot use the creation of tasks to elevate their IAM privileges because their own IAM privileges limit which tasks they can run against an EC2 instance. You should also authorize your IAM user to use EC2, Amazon Inspector, Amazon CloudWatch, and Systems Manager. You can achieve this by attaching the following AWS managed policies to the IAM user you are using for this example: AmazonInspectorFullAccess, AmazonEC2FullAccess, and AmazonSSMFullAccess.

Architectural overview

The following diagram illustrates the components of this solution’s architecture.

Diagram showing the components of this solution's architecture

For this blog post, Microsoft Windows EC2 is Amazon EC2 for Microsoft Windows Server 2012 R2 instances with attached Amazon Elastic Block Store (Amazon EBS) volumes, which are running in your VPC. These instances may be standalone Windows instances running your Windows workloads, or you may have joined them to an Active Directory domain controller. For instances joined to a domain, you can be using Active Directory running on an EC2 for Windows instance, or you can use AWS Directory Service for Microsoft Active Directory.

Amazon EC2 Systems Manager is a scalable tool for remote management of your EC2 instances. You will use the Systems Manager Run Command to install the Amazon Inspector agent. The agent enables EC2 instances to communicate with the Amazon Inspector service and run assessments, which I explain in detail later in this blog post. You also will create a Systems Manager association to keep your EC2 instances up to date with the latest security patches.

You can use the EBS Snapshot Scheduler to schedule automated snapshots at regular intervals. You will use it to set up regular snapshots of your Amazon EBS volumes. EBS Snapshot Scheduler is a prebuilt solution by AWS that you will deploy in your AWS account. With Amazon EBS snapshots, you pay only for the actual data you store. Snapshots save only the data that has changed since the previous snapshot, which minimizes your cost.

You will use Amazon Inspector to run security assessments on your EC2 for Windows Server instance. In this post, I show how to assess if your EC2 for Windows Server instance is vulnerable to any of the more than 50,000 CVEs registered with Amazon Inspector.

In today’s and tomorrow’s posts, I show you how to:

  1. Launch an EC2 instance with an IAM role, Amazon EBS volume, and tags that Systems Manager and Amazon Inspector will use.
  2. Configure Systems Manager to install the Amazon Inspector agent and patch your EC2 instances.
  3. Take EBS snapshots by using EBS Snapshot Scheduler to automate snapshots based on instance tags.
  4. Use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any common vulnerabilities and exposures (CVEs).

Step 1: Launch an EC2 instance

In this section, I show you how to launch your EC2 instances so that you can use Systems Manager with the instances and use instance tags with EBS Snapshot Scheduler to automate snapshots. This requires three things:

  • Create an IAM role for Systems Manager before launching your EC2 instance.
  • Launch your EC2 instance with Amazon EBS and the IAM role for Systems Manager.
  • Add tags to instances so that you can automate policies for which instances you take snapshots of and when.

Create an IAM role for Systems Manager

Before launching your EC2 instance, I recommend that you first create an IAM role for Systems Manager, which you will use to update the EC2 instance you will launch. AWS already provides a preconfigured policy that you can use for your new role, and it is called AmazonEC2RoleforSSM.

  1. Sign in to the IAM console and choose Roles in the navigation pane. Choose Create new role.
    Screenshot of choosing "Create role"
  2. In the role-creation workflow, choose AWS service > EC2 > EC2 to create a role for an EC2 instance.
    Screenshot of creating a role for an EC2 instance
  3. Choose the AmazonEC2RoleforSSM policy to attach it to the new role you are creating.
    Screenshot of attaching the AmazonEC2RoleforSSM policy to the new role you are creating
  4. Give the role a meaningful name (I chose EC2SSM) and description, and choose Create role.
    Screenshot of giving the role a name and description

Launch your EC2 instance

To follow along, you need an EC2 instance that is running Microsoft Windows Server 2012 R2 and that has an Amazon EBS volume attached. You can use any existing instance you may have or create a new instance.

When launching your new EC2 instance, be sure that:

  • The operating system is Microsoft Windows Server 2012 R2.
  • You attach at least one Amazon EBS volume to the EC2 instance.
  • You attach the newly created IAM role (EC2SSM).
  • The EC2 instance can connect to the internet through a network address translation (NAT) gateway or a NAT instance.
  • You create the tags shown in the following screenshot (you will use them later).

If you are using an already launched EC2 instance, you can attach the newly created role as described in Easily Replace or Attach an IAM Role to an Existing EC2 Instance by Using the EC2 Console.

Add tags

The final step of configuring your EC2 instances is to add tags. You will use these tags to configure Systems Manager in Step 2 of this blog post and to configure Amazon Inspector in Part 2. For this example, I add a tag key, Patch Group, and set the value to Windows Servers. I could have other groups of EC2 instances that I treat differently by having the same tag key but a different tag value. For example, I might have a collection of other servers with the Patch Group tag key with a value of IAS Servers.

Screenshot of adding tags

Note: You must wait a few minutes until the EC2 instance becomes available before you can proceed to the next section.

At this point, you now have at least one EC2 instance you can use to configure Systems Manager, use EBS Snapshot Scheduler, and use Amazon Inspector.

Note: If you have a large number of EC2 instances to tag, you may want to use the EC2 CreateTags API rather than manually apply tags to each instance.

Step 2: Configure Systems Manager

In this section, I show you how to use Systems Manager to apply operating system patches to your EC2 instances, and how to manage patch compliance.

To start, I will provide some background information about Systems Manager. Then, I will cover how to:

  • Create the Systems Manager IAM role so that Systems Manager is able to perform patch operations.
  • Associate a Systems Manager patch baseline with your instance to define which patches Systems Manager should apply.
  • Define a maintenance window to make sure Systems Manager patches your instance when you tell it to.
  • Monitor patch compliance to verify the patch state of your instances.

Systems Manager is a collection of capabilities that helps you automate management tasks for AWS-hosted instances on EC2 and your on-premises servers. In this post, I use Systems Manager for two purposes: to run remote commands and apply operating system patches. To learn about the full capabilities of Systems Manager, see What Is Amazon EC2 Systems Manager?

Patch management is an important measure to prevent malware from infecting your systems. Most malware attacks look for vulnerabilities that are publicly known and in most cases are already patched by the maker of the operating system. These publicly known vulnerabilities are well documented and therefore easier for an attacker to exploit than having to discover a new vulnerability.

Patches for these new vulnerabilities are available through Systems Manager within hours after Microsoft releases them. There are two prerequisites to use Systems Manager to apply operating system patches. First, you must attach the IAM role you created in the previous section, EC2SSM, to your EC2 instance. Second, you must install the Systems Manager agent on your EC2 instance. If you have used a recent Microsoft Windows Server 2012 R2 AMI published by AWS, Amazon has already installed the Systems Manager agent on your EC2 instance. You can confirm this by logging in to an EC2 instance and looking for Amazon SSM Agent under Programs and Features in Windows. To install the Systems Manager agent on an instance that does not have the agent preinstalled or if you want to use the Systems Manager agent on your on-premises servers, see the documentation about installing the Systems Manager agent. If you forgot to attach the newly created role when launching your EC2 instance or if you want to attach the role to already running EC2 instances, see Attach an AWS IAM Role to an Existing Amazon EC2 Instance by Using the AWS CLI or use the AWS Management Console.

To make sure your EC2 instance receives operating system patches from Systems Manager, you will use the default patch baseline provided and maintained by AWS, and you will define a maintenance window so that you control when your EC2 instances should receive patches. For the maintenance window to be able to run any tasks, you also must create a new role for Systems Manager. This role is a different kind of role than the one you created earlier: Systems Manager will use this role instead of EC2. Earlier we created the EC2SSM role with the AmazonEC2RoleforSSM policy, which allowed the Systems Manager agent on our instance to communicate with the Systems Manager service. Here we need a new role with the policy AmazonSSMMaintenanceWindowRole to make sure the Systems Manager service is able to execute commands on our instance.

Create the Systems Manager IAM role

To create the new IAM role for Systems Manager, follow the same procedure as in the previous section, but in Step 3, choose the AmazonSSMMaintenanceWindowRole policy instead of the previously selected AmazonEC2RoleforSSM policy.

Screenshot of creating the new IAM role for Systems Manager

Finish the wizard and give your new role a recognizable name. For example, I named my role MaintenanceWindowRole.

Screenshot of finishing the wizard and giving your new role a recognizable name

By default, only EC2 instances can assume this new role. You must update the trust policy to enable Systems Manager to assume this role.

To update the trust policy associated with this new role:

  1. Navigate to the IAM console and choose Roles in the navigation pane.
  2. Choose MaintenanceWindowRole and choose the Trust relationships tab. Then choose Edit trust relationship.
  3. Update the policy document by copying the following policy and pasting it in the Policy Document box. As you can see, I have added the ssm.amazonaws.com service to the list of allowed Principals that can assume this role. Choose Update Trust Policy.
    {
       "Version":"2012-10-17",
       "Statement":[
          {
             "Sid":"",
             "Effect":"Allow",
             "Principal":{
                "Service":[
                   "ec2.amazonaws.com",
                   "ssm.amazonaws.com"
               ]
             },
             "Action":"sts:AssumeRole"
          }
       ]
    }

Associate a Systems Manager patch baseline with your instance

Next, you are going to associate a Systems Manager patch baseline with your EC2 instance. A patch baseline defines which patches Systems Manager should apply. You will use the default patch baseline that AWS manages and maintains. Before you can associate the patch baseline with your instance, though, you must determine if Systems Manager recognizes your EC2 instance.

Navigate to the EC2 console, scroll down to Systems Manager Shared Resources in the navigation pane, and choose Managed Instances. Your new EC2 instance should be available there. If your instance is missing from the list, verify the following:

  1. Go to the EC2 console and verify your instance is running.
  2. Select your instance and confirm you attached the Systems Manager IAM role, EC2SSM.
  3. Make sure that you deployed a NAT gateway in your public subnet to ensure your VPC reflects the diagram at the start of this post so that the Systems Manager agent can connect to the Systems Manager internet endpoint.
  4. Check the Systems Manager Agent logs for any errors.

Now that you have confirmed that Systems Manager can manage your EC2 instance, it is time to associate the AWS maintained patch baseline with your EC2 instance:

  1. Choose Patch Baselines under Systems Manager Services in the navigation pane of the EC2 console.
  2. Choose the default patch baseline as highlighted in the following screenshot, and choose Modify Patch Groups in the Actions drop-down.
    Screenshot of choosing Modify Patch Groups in the Actions drop-down
  3. In the Patch group box, enter the same value you entered under the Patch Group tag of your EC2 instance in “Step 1: Configure your EC2 instance.” In this example, the value I enter is Windows Servers. Choose the check mark icon next to the patch group and choose Close.Screenshot of modifying the patch group

Define a maintenance window

Now that you have successfully set up a role and have associated a patch baseline with your EC2 instance, you will define a maintenance window so that you can control when your EC2 instances should receive patches. By creating multiple maintenance windows and assigning them to different patch groups, you can make sure your EC2 instances do not all reboot at the same time. The Patch Group resource tag you defined earlier will determine to which patch group an instance belongs.

To define a maintenance window:

  1. Navigate to the EC2 console, scroll down to Systems Manager Shared Resources in the navigation pane, and choose Maintenance Windows. Choose Create a Maintenance Window.
    Screenshot of starting to create a maintenance window in the Systems Manager console
  2. Select the Cron schedule builder to define the schedule for the maintenance window. In the example in the following screenshot, the maintenance window will start every Saturday at 10:00 P.M. UTC.
  3. To specify when your maintenance window will end, specify the duration. In this example, the four-hour maintenance window will end on the following Sunday morning at 2:00 A.M. UTC (in other words, four hours after it started).
  4. Systems manager completes all tasks that are in process, even if the maintenance window ends. In my example, I am choosing to prevent new tasks from starting within one hour of the end of my maintenance window because I estimated my patch operations might take longer than one hour to complete. Confirm the creation of the maintenance window by choosing Create maintenance window.
    Screenshot of completing all boxes in the maintenance window creation process
  5. After creating the maintenance window, you must register the EC2 instance to the maintenance window so that Systems Manager knows which EC2 instance it should patch in this maintenance window. To do so, choose Register new targets on the Targets tab of your newly created maintenance window. You can register your targets by using the same Patch Group tag you used before to associate the EC2 instance with the AWS-provided patch baseline.
    Screenshot of registering new targets
  6. Assign a task to the maintenance window that will install the operating system patches on your EC2 instance:
    1. Open Maintenance Windows in the EC2 console, select your previously created maintenance window, choose the Tasks tab, and choose Register run command task from the Register new task drop-down.
    2. Choose the AWS-RunPatchBaseline document from the list of available documents.
    3. For Parameters:
      1. For Role, choose the role you created previously (called MaintenanceWindowRole).
      2. For Execute on, specify how many EC2 instances Systems Manager should patch at the same time. If you have a large number of EC2 instances and want to patch all EC2 instances within the defined time, make sure this number is not too low. For example, if you have 1,000 EC2 instances, a maintenance window of 4 hours, and 2 hours’ time for patching, make this number at least 500.
      3. For Stop after, specify after how many errors Systems Manager should stop.
      4. For Operation, choose Install to make sure to install the patches.
        Screenshot of stipulating maintenance window parameters

Now, you must wait for the maintenance window to run at least once according to the schedule you defined earlier. Note that if you don’t want to wait, you can adjust the schedule to run sooner by choosing Edit maintenance window on the Maintenance Windows page of Systems Manager. If your maintenance window has expired, you can check the status of any maintenance tasks Systems Manager has performed on the Maintenance Windows page of Systems Manager and select your maintenance window.

Screenshot of the maintenance window successfully created

Monitor patch compliance

You also can see the overall patch compliance of all EC2 instances that are part of defined patch groups by choosing Patch Compliance under Systems Manager Services in the navigation pane of the EC2 console. You can filter by Patch Group to see how many EC2 instances within the selected patch group are up to date, how many EC2 instances are missing updates, and how many EC2 instances are in an error state.

Screenshot of monitoring patch compliance

In this section, you have set everything up for patch management on your instance. Now you know how to patch your EC2 instance in a controlled manner and how to check if your EC2 instance is compliant with the patch baseline you have defined. Of course, I recommend that you apply these steps to all EC2 instances you manage.

Summary

In Part 1 of this blog post, I have shown how to configure EC2 instances for use with Systems Manager, EBS Snapshot Scheduler, and Amazon Inspector. I also have shown how to use Systems Manager to keep your Microsoft Windows–based EC2 instances up to date. In Part 2 of this blog post tomorrow, I will show how to take regular snapshots of your data by using EBS Snapshot Scheduler and how to use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any CVEs.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about or issues implementing this solution, start a new thread on the EC2 forum or the Amazon Inspector forum, or contact AWS Support.

– Koen

Capturing Custom, High-Resolution Metrics from Containers Using AWS Step Functions and AWS Lambda

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/capturing-custom-high-resolution-metrics-from-containers-using-aws-step-functions-and-aws-lambda/

Contributed by Trevor Sullivan, AWS Solutions Architect

When you deploy containers with Amazon ECS, are you gathering all of the key metrics so that you can correctly monitor the overall health of your ECS cluster?

By default, ECS writes metrics to Amazon CloudWatch in 5-minute increments. For complex or large services, this may not be sufficient to make scaling decisions quickly. You may want to respond immediately to changes in workload or to identify application performance problems. Last July, CloudWatch announced support for high-resolution metrics, up to a per-second basis.

These high-resolution metrics can be used to give you a clearer picture of the load and performance for your applications, containers, clusters, and hosts. In this post, I discuss how you can use AWS Step Functions, along with AWS Lambda, to cost effectively record high-resolution metrics into CloudWatch. You implement this solution using a serverless architecture, which keeps your costs low and makes it easier to troubleshoot the solution.

To show how this works, you retrieve some useful metric data from an ECS cluster running in the same AWS account and region (Oregon, us-west-2) as the Step Functions state machine and Lambda function. However, you can use this architecture to retrieve any custom application metrics from any resource in any AWS account and region.

Why Step Functions?

Step Functions enables you to orchestrate multi-step tasks in the AWS Cloud that run for any period of time, up to a year. Effectively, you’re building a blueprint for an end-to-end process. After it’s built, you can execute the process as many times as you want.

For this architecture, you gather metrics from an ECS cluster, every five seconds, and then write the metric data to CloudWatch. After your ECS cluster metrics are stored in CloudWatch, you can create CloudWatch alarms to notify you. An alarm can also trigger an automated remediation activity such as scaling ECS services, when a metric exceeds a threshold defined by you.

When you build a Step Functions state machine, you define the different states inside it as JSON objects. The bulk of the work in Step Functions is handled by the common task state, which invokes Lambda functions or Step Functions activities. There is also a built-in library of other useful states that allow you to control the execution flow of your program.

One of the most useful state types in Step Functions is the parallel state. Each parallel state in your state machine can have one or more branches, each of which is executed in parallel. Another useful state type is the wait state, which waits for a period of time before moving to the next state.

In this walkthrough, you combine these three states (parallel, wait, and task) to create a state machine that triggers a Lambda function, which then gathers metrics from your ECS cluster.

Step Functions pricing

This state machine is executed every minute, resulting in 60 executions per hour, and 1,440 executions per day. Step Functions is billed per state transition, including the Start and End state transitions, and giving you approximately 37,440 state transitions per day. To reach this number, I’m using this estimated math:

26 state transitions per-execution x 60 minutes x 24 hours

Based on current pricing, at $0.000025 per state transition, the daily cost of this metric gathering state machine would be $0.936.

Step Functions offers an indefinite 4,000 free state transitions every month. This benefit is available to all customers, not just customers who are still under the 12-month AWS Free Tier. For more information and cost example scenarios, see Step Functions pricing.

Why Lambda?

The goal is to capture metrics from an ECS cluster, and write the metric data to CloudWatch. This is a straightforward, short-running process that makes Lambda the perfect place to run your code. Lambda is one of the key services that makes up “Serverless” application architectures. It enables you to consume compute capacity only when your code is actually executing.

The process of gathering metric data from ECS and writing it to CloudWatch takes a short period of time. In fact, my average Lambda function execution time, while developing this post, is only about 250 milliseconds on average. For every five-second interval that occurs, I’m only using 1/20th of the compute time that I’d otherwise be paying for.

Lambda pricing

For billing purposes, Lambda execution time is rounded up to the nearest 100-ms interval. In general, based on the metrics that I observed during development, a 250-ms runtime would be billed at 300 ms. Here, I calculate the cost of this Lambda function executing on a daily basis.

Assuming 31 days in each month, there would be 535,680 five-second intervals (31 days x 24 hours x 60 minutes x 12 five-second intervals = 535,680). The Lambda function is invoked every five-second interval, by the Step Functions state machine, and runs for a 300-ms period. At current Lambda pricing, for a 128-MB function, you would be paying approximately the following:

Total compute

Total executions = 535,680
Total compute = total executions x (3 x $0.000000208 per 100 ms) = $0.334 per day

Total requests

Total requests = (535,680 / 1000000) * $0.20 per million requests = $0.11 per day

Total Lambda Cost

$0.11 requests + $0.334 compute time = $0.444 per day

Similar to Step Functions, Lambda offers an indefinite free tier. For more information, see Lambda Pricing.

Walkthrough

In the following sections, I step through the process of configuring the solution just discussed. If you follow along, at a high level, you will:

  • Configure an IAM role and policy
  • Create a Step Functions state machine to control metric gathering execution
  • Create a metric-gathering Lambda function
  • Configure a CloudWatch Events rule to trigger the state machine
  • Validate the solution

Prerequisites

You should already have an AWS account with a running ECS cluster. If you don’t have one running, you can easily deploy a Docker container on an ECS cluster using the AWS Management Console. In the example produced for this post, I use an ECS cluster running Windows Server (currently in beta), but either a Linux or Windows Server cluster works.

Create an IAM role and policy

First, create an IAM role and policy that enables Step Functions, Lambda, and CloudWatch to communicate with each other.

  • The CloudWatch Events rule needs permissions to trigger the Step Functions state machine.
  • The Step Functions state machine needs permissions to trigger the Lambda function.
  • The Lambda function needs permissions to query ECS and then write to CloudWatch Logs and metrics.

When you create the state machine, Lambda function, and CloudWatch Events rule, you assign this role to each of those resources. Upon execution, each of these resources assumes the specified role and executes using the role’s permissions.

  1. Open the IAM console.
  2. Choose Roles, create New Role.
  3. For Role Name, enter WriteMetricFromStepFunction.
  4. Choose Save.

Create the IAM role trust relationship
The trust relationship (also known as the assume role policy document) for your IAM role looks like the following JSON document. As you can see from the document, your IAM role needs to trust the Lambda, CloudWatch Events, and Step Functions services. By configuring your role to trust these services, they can assume this role and inherit the role permissions.

  1. Open the IAM console.
  2. Choose Roles and select the IAM role previously created.
  3. Choose Trust RelationshipsEdit Trust Relationships.
  4. Enter the following trust policy text and choose Save.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "events.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "states.us-west-2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Create an IAM policy

After you’ve finished configuring your role’s trust relationship, grant the role access to the other AWS resources that make up the solution.

The IAM policy is what gives your IAM role permissions to access various resources. You must whitelist explicitly the specific resources to which your role has access, because the default IAM behavior is to deny access to any AWS resources.

I’ve tried to keep this policy document as generic as possible, without allowing permissions to be too open. If the name of your ECS cluster is different than the one in the example policy below, make sure that you update the policy document before attaching it to your IAM role. You can attach this policy as an inline policy, instead of creating the policy separately first. However, either approach is valid.

  1. Open the IAM console.
  2. Select the IAM role, and choose Permissions.
  3. Choose Add in-line policy.
  4. Choose Custom Policy and then enter the following policy. The inline policy name does not matter.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [ "logs:*" ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [ "cloudwatch:PutMetricData" ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [ "states:StartExecution" ],
            "Resource": [
                "arn:aws:states:*:*:stateMachine:WriteMetricFromStepFunction"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [ "lambda:InvokeFunction" ],
            "Resource": "arn:aws:lambda:*:*:function:WriteMetricFromStepFunction"
        },
        {
            "Effect": "Allow",
            "Action": [ "ecs:Describe*" ],
            "Resource": "arn:aws:ecs:*:*:cluster/ECSEsgaroth"
        }
    ]
}

Create a Step Functions state machine

In this section, you create a Step Functions state machine that invokes the metric-gathering Lambda function every five (5) seconds, for a one-minute period. If you divide a minute (60) seconds into equal parts of five-second intervals, you get 12. Based on this math, you create 12 branches, in a single parallel state, in the state machine. Each branch triggers the metric-gathering Lambda function at a different five-second marker, throughout the one-minute period. After all of the parallel branches finish executing, the Step Functions execution completes and another begins.

Follow these steps to create your Step Functions state machine:

  1. Open the Step Functions console.
  2. Choose DashboardCreate State Machine.
  3. For State Machine Name, enter WriteMetricFromStepFunction.
  4. Enter the state machine code below into the editor. Make sure that you insert your own AWS account ID for every instance of “676655494xxx”
  5. Choose Create State Machine.
  6. Select the WriteMetricFromStepFunction IAM role that you previously created.
{
    "Comment": "Writes ECS metrics to CloudWatch every five seconds, for a one-minute period.",
    "StartAt": "ParallelMetric",
    "States": {
      "ParallelMetric": {
        "Type": "Parallel",
        "Branches": [
          {
            "StartAt": "WriteMetricLambda",
            "States": {
             	"WriteMetricLambda": {
                  "Type": "Task",
				  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitFive",
            "States": {
            	"WaitFive": {
            		"Type": "Wait",
            		"Seconds": 5,
            		"Next": "WriteMetricLambdaFive"
          		},
             	"WriteMetricLambdaFive": {
                  "Type": "Task",
				  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitTen",
            "States": {
            	"WaitTen": {
            		"Type": "Wait",
            		"Seconds": 10,
            		"Next": "WriteMetricLambda10"
          		},
             	"WriteMetricLambda10": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitFifteen",
            "States": {
            	"WaitFifteen": {
            		"Type": "Wait",
            		"Seconds": 15,
            		"Next": "WriteMetricLambda15"
          		},
             	"WriteMetricLambda15": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait20",
            "States": {
            	"Wait20": {
            		"Type": "Wait",
            		"Seconds": 20,
            		"Next": "WriteMetricLambda20"
          		},
             	"WriteMetricLambda20": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait25",
            "States": {
            	"Wait25": {
            		"Type": "Wait",
            		"Seconds": 25,
            		"Next": "WriteMetricLambda25"
          		},
             	"WriteMetricLambda25": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait30",
            "States": {
            	"Wait30": {
            		"Type": "Wait",
            		"Seconds": 30,
            		"Next": "WriteMetricLambda30"
          		},
             	"WriteMetricLambda30": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait35",
            "States": {
            	"Wait35": {
            		"Type": "Wait",
            		"Seconds": 35,
            		"Next": "WriteMetricLambda35"
          		},
             	"WriteMetricLambda35": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait40",
            "States": {
            	"Wait40": {
            		"Type": "Wait",
            		"Seconds": 40,
            		"Next": "WriteMetricLambda40"
          		},
             	"WriteMetricLambda40": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait45",
            "States": {
            	"Wait45": {
            		"Type": "Wait",
            		"Seconds": 45,
            		"Next": "WriteMetricLambda45"
          		},
             	"WriteMetricLambda45": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait50",
            "States": {
            	"Wait50": {
            		"Type": "Wait",
            		"Seconds": 50,
            		"Next": "WriteMetricLambda50"
          		},
             	"WriteMetricLambda50": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait55",
            "States": {
            	"Wait55": {
            		"Type": "Wait",
            		"Seconds": 55,
            		"Next": "WriteMetricLambda55"
          		},
             	"WriteMetricLambda55": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          }
        ],
        "End": true
      }
  }
}

Now you’ve got a shiny new Step Functions state machine! However, you might ask yourself, “After the state machine has been created, how does it get executed?” Before I answer that question, create the Lambda function that writes the custom metric, and then you get the end-to-end process moving.

Create a Lambda function

The meaty part of the solution is a Lambda function, written to consume the Python 3.6 runtime, that retrieves metric values from ECS, and then writes them to CloudWatch. This Lambda function is what the Step Functions state machine is triggering every five seconds, via the Task states. Key points to remember:

The Lambda function needs permission to:

  • Write CloudWatch metrics (PutMetricData API).
  • Retrieve metrics from ECS clusters (DescribeCluster API).
  • Write StdOut to CloudWatch Logs.

Boto3, the AWS SDK for Python, is included in the Lambda execution environment for Python 2.x and 3.x.

Because Lambda includes the AWS SDK, you don’t have to worry about packaging it up and uploading it to Lambda. You can focus on writing code and automatically take a dependency on boto3.

As for permissions, you’ve already created the IAM role and attached a policy to it that enables your Lambda function to access the necessary API actions. When you create your Lambda function, make sure that you select the correct IAM role, to ensure it is invoked with the correct permissions.

The following Lambda function code is generic. So how does the Lambda function know which ECS cluster to gather metrics for? Your Step Functions state machine automatically passes in its state to the Lambda function. When you create your CloudWatch Events rule, you specify a simple JSON object that passes the desired ECS cluster name into your Step Functions state machine, which then passes it to the Lambda function.

Use the following property values as you create your Lambda function:

Function Name: WriteMetricFromStepFunction
Description: This Lambda function retrieves metric values from an ECS cluster and writes them to Amazon CloudWatch.
Runtime: Python3.6
Memory: 128 MB
IAM Role: WriteMetricFromStepFunction

import boto3

def handler(event, context):
    cw = boto3.client('cloudwatch')
    ecs = boto3.client('ecs')
    print('Got boto3 client objects')
    
    Dimension = {
        'Name': 'ClusterName',
        'Value': event['ECSClusterName']
    }

    cluster = get_ecs_cluster(ecs, Dimension['Value'])
    
    cw_args = {
       'Namespace': 'ECS',
       'MetricData': [
           {
               'MetricName': 'RunningTask',
               'Dimensions': [ Dimension ],
               'Value': cluster['runningTasksCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'PendingTask',
               'Dimensions': [ Dimension ],
               'Value': cluster['pendingTasksCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'ActiveServices',
               'Dimensions': [ Dimension ],
               'Value': cluster['activeServicesCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'RegisteredContainerInstances',
               'Dimensions': [ Dimension ],
               'Value': cluster['registeredContainerInstancesCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           }
        ]
    }
    cw.put_metric_data(**cw_args)
    print('Finished writing metric data')
    
def get_ecs_cluster(client, cluster_name):
    cluster = client.describe_clusters(clusters = [ cluster_name ])
    print('Retrieved cluster details from ECS')
    return cluster['clusters'][0]

Create the CloudWatch Events rule

Now you’ve created an IAM role and policy, Step Functions state machine, and Lambda function. How do these components actually start communicating with each other? The final step in this process is to set up a CloudWatch Events rule that triggers your metric-gathering Step Functions state machine every minute. You have two choices for your CloudWatch Events rule expression: rate or cron. In this example, use the cron expression.

A couple key learning points from creating the CloudWatch Events rule:

  • You can specify one or more targets, of different types (for example, Lambda function, Step Functions state machine, SNS topic, and so on).
  • You’re required to specify an IAM role with permissions to trigger your target.
    NOTE: This applies only to certain types of targets, including Step Functions state machines.
  • Each target that supports IAM roles can be triggered using a different IAM role, in the same CloudWatch Events rule.
  • Optional: You can provide custom JSON that is passed to your target Step Functions state machine as input.

Follow these steps to create the CloudWatch Events rule:

  1. Open the CloudWatch console.
  2. Choose Events, RulesCreate Rule.
  3. Select Schedule, Cron Expression, and then enter the following rule:
    0/1 * * * ? *
  4. Choose Add Target, Step Functions State MachineWriteMetricFromStepFunction.
  5. For Configure Input, select Constant (JSON Text).
  6. Enter the following JSON input, which is passed to Step Functions, while changing the cluster name accordingly:
    { "ECSClusterName": "ECSEsgaroth" }
  7. Choose Use Existing Role, WriteMetricFromStepFunction (the IAM role that you previously created).

After you’ve completed with these steps, your screen should look similar to this:

Validate the solution

Now that you have finished implementing the solution to gather high-resolution metrics from ECS, validate that it’s working properly.

  1. Open the CloudWatch console.
  2. Choose Metrics.
  3. Choose custom and select the ECS namespace.
  4. Choose the ClusterName metric dimension.

You should see your metrics listed below.

Troubleshoot configuration issues

If you aren’t receiving the expected ECS cluster metrics in CloudWatch, check for the following common configuration issues. Review the earlier procedures to make sure that the resources were properly configured.

  • The IAM role’s trust relationship is incorrectly configured.
    Make sure that the IAM role trusts Lambda, CloudWatch Events, and Step Functions in the correct region.
  • The IAM role does not have the correct policies attached to it.
    Make sure that you have copied the IAM policy correctly as an inline policy on the IAM role.
  • The CloudWatch Events rule is not triggering new Step Functions executions.
    Make sure that the target configuration on the rule has the correct Step Functions state machine and IAM role selected.
  • The Step Functions state machine is being executed, but failing part way through.
    Examine the detailed error message on the failed state within the failed Step Functions execution. It’s possible that the
  • IAM role does not have permissions to trigger the target Lambda function, that the target Lambda function may not exist, or that the Lambda function failed to complete successfully due to invalid permissions.
    Although the above list covers several different potential configuration issues, it is not comprehensive. Make sure that you understand how each service is connected to each other, how permissions are granted through IAM policies, and how IAM trust relationships work.

Conclusion

In this post, you implemented a Serverless solution to gather and record high-resolution application metrics from containers running on Amazon ECS into CloudWatch. The solution consists of a Step Functions state machine, Lambda function, CloudWatch Events rule, and an IAM role and policy. The data that you gather from this solution helps you rapidly identify issues with an ECS cluster.

To gather high-resolution metrics from any service, modify your Lambda function to gather the correct metrics from your target. If you prefer not to use Python, you can implement a Lambda function using one of the other supported runtimes, including Node.js, Java, or .NET Core. However, this post should give you the fundamental basics about capturing high-resolution metrics in CloudWatch.

If you found this post useful, or have questions, please comment below.

Creating a Cost-Efficient Amazon ECS Cluster for Scheduled Tasks

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/creating-a-cost-efficient-amazon-ecs-cluster-for-scheduled-tasks/

Madhuri Peri
Sr. DevOps Consultant

When you use Amazon Relational Database Service (Amazon RDS), depending on the logging levels on the RDS instances and the volume of transactions, you could generate a lot of log data. To ensure that everything is running smoothly, many customers search for log error patterns using different log aggregation and visualization systems, such as Amazon Elasticsearch Service, Splunk, or other tool of their choice. A module needs to periodically retrieve the RDS logs using the SDK, and then send them to Amazon S3. From there, you can stream them to your log aggregation tool.

One option is writing an AWS Lambda function to retrieve the log files. However, because of the time that this function needs to execute, depending on the volume of log files retrieved and transferred, it is possible that Lambda could time out on many instances.  Another approach is launching an Amazon EC2 instance that runs this job periodically. However, this would require you to run an EC2 instance continuously, not an optimal use of time or money.

Using the new Amazon CloudWatch integration with Amazon EC2 Container Service, you can trigger this job to run in a container on an existing Amazon ECS cluster. Additionally, this would allow you to improve costs by running containers on a fleet of Spot Instances.

In this post, I will show you how to use the new scheduled tasks (cron) feature in Amazon ECS and launch tasks using CloudWatch events, while leveraging Spot Fleet to maximize availability and cost optimization for containerized workloads.

Architecture

The following diagram shows how the various components described schedule a task that retrieves log files from Amazon RDS database instances, and deposits the logs into an S3 bucket.

Amazon ECS cluster container instances are using Spot Fleet, which is a perfect match for the workload that needs to run when it can. This improves cluster costs.

The task definition defines which Docker image to retrieve from the Amazon EC2 Container Registry (Amazon ECR) repository and run on the Amazon ECS cluster.

The container image has Python code functions to make AWS API calls using boto3. It iterates over the RDS database instances, retrieves the logs, and deposits them in the S3 bucket. Many customers choose these logs to be delivered to their centralized log-store. CloudWatch Events defines the schedule for when the container task has to be launched.

Walkthrough

To provide the basic framework, we have built an AWS CloudFormation template that creates the following resources:

  • Amazon ECR repository for storing the Docker image to be used in the task definition
  • S3 bucket that holds the transferred logs
  • Task definition, with image name and S3 bucket as environment variables provided via input parameter
  • CloudWatch Events rule
  • Amazon ECS cluster
  • Amazon ECS container instances using Spot Fleet
  • IAM roles required for the container instance profiles

Before you begin

Ensure that Git, Docker, and the AWS CLI are installed on your computer.

In your AWS account, instantiate one Amazon Aurora instance using the console. For more information, see Creating an Amazon Aurora DB Cluster.

Implementation Steps

  1. Clone the code from GitHub that performs RDS API calls to retrieve the log files.
    git clone https://github.com/awslabs/aws-ecs-scheduled-tasks.git
  2. Build and tag the image.
    cd aws-ecs-scheduled-tasks/container-code/src && ls

    Dockerfile		rdslogsshipper.py	requirements.txt

    docker build -t rdslogsshipper .

    Sending build context to Docker daemon 9.728 kB
    Step 1 : FROM python:3
     ---> 41397f4f2887
    Step 2 : WORKDIR /usr/src/app
     ---> Using cache
     ---> 59299c020e7e
    Step 3 : COPY requirements.txt ./
     ---> 8c017e931c3b
    Removing intermediate container df09e1bed9f2
    Step 4 : COPY rdslogsshipper.py /usr/src/app
     ---> 099a49ca4325
    Removing intermediate container 1b1da24a6699
    Step 5 : RUN pip install --no-cache-dir -r requirements.txt
     ---> Running in 3ed98b30901d
    Collecting boto3 (from -r requirements.txt (line 1))
      Downloading boto3-1.4.6-py2.py3-none-any.whl (128kB)
    Collecting botocore (from -r requirements.txt (line 2))
      Downloading botocore-1.6.7-py2.py3-none-any.whl (3.6MB)
    Collecting s3transfer<0.2.0,>=0.1.10 (from boto3->-r requirements.txt (line 1))
      Downloading s3transfer-0.1.10-py2.py3-none-any.whl (54kB)
    Collecting jmespath<1.0.0,>=0.7.1 (from boto3->-r requirements.txt (line 1))
      Downloading jmespath-0.9.3-py2.py3-none-any.whl
    Collecting python-dateutil<3.0.0,>=2.1 (from botocore->-r requirements.txt (line 2))
      Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
    Collecting docutils>=0.10 (from botocore->-r requirements.txt (line 2))
      Downloading docutils-0.14-py3-none-any.whl (543kB)
    Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore->-r requirements.txt (line 2))
      Downloading six-1.10.0-py2.py3-none-any.whl
    Installing collected packages: six, python-dateutil, docutils, jmespath, botocore, s3transfer, boto3
    Successfully installed boto3-1.4.6 botocore-1.6.7 docutils-0.14 jmespath-0.9.3 python-dateutil-2.6.1 s3transfer-0.1.10 six-1.10.0
     ---> f892d3cb7383
    Removing intermediate container 3ed98b30901d
    Step 6 : COPY . .
     ---> ea7550c04fea
    Removing intermediate container b558b3ebd406
    Successfully built ea7550c04fea
  3. Run the CloudFormation stack and get the names for the Amazon ECR repo and S3 bucket. In the stack, choose Outputs.
  4. Open the ECS console and choose Repositories. The rdslogs repo has been created. Choose View Push Commands and follow the instructions to connect to the repository and push the image for the code that you built in Step 2. The screenshot shows the final result:
  5. Associate the CloudWatch scheduled task with the created Amazon ECS Task Definition, using a new CloudWatch event rule that is scheduled to run at intervals. The following rule is scheduled to run every 15 minutes:
    aws --profile default --region us-west-2 events put-rule --name demo-ecs-task-rule  --schedule-expression "rate(15 minutes)"

    {
        "RuleArn": "arn:aws:events:us-west-2:12345678901:rule/demo-ecs-task-rule"
    }
  6. CloudWatch requires IAM permissions to place a task on the Amazon ECS cluster when the CloudWatch event rule is executed, in addition to an IAM role that can be assumed by CloudWatch Events. This is done in three steps:
    1. Create the IAM role to be assumed by CloudWatch.
      aws --profile default --region us-west-2 iam create-role --role-name Test-Role --assume-role-policy-document file://event-role.json

      {
          "Role": {
              "AssumeRolePolicyDocument": {
                  "Version": "2012-10-17", 
                  "Statement": [
                      {
                          "Action": "sts:AssumeRole", 
                          "Effect": "Allow", 
                          "Principal": {
                              "Service": "events.amazonaws.com"
                          }
                      }
                  ]
              }, 
              "RoleId": "AROAIRYYLDCVZCUACT7FS", 
              "CreateDate": "2017-07-14T22:44:52.627Z", 
              "RoleName": "Test-Role", 
              "Path": "/", 
              "Arn": "arn:aws:iam::12345678901:role/Test-Role"
          }
      }

      The following is an example of the event-role.json file used earlier:

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {
                    "Service": "events.amazonaws.com"
                  },
                  "Action": "sts:AssumeRole"
              }
          ]
      }
    2. Create the IAM policy defining the ECS cluster and task definition. You need to get these values from the CloudFormation outputs and resources.
      aws --profile default --region us-west-2 iam create-policy --policy-name test-policy --policy-document file://event-policy.json

      {
          "Policy": {
              "PolicyName": "test-policy", 
              "CreateDate": "2017-07-14T22:51:20.293Z", 
              "AttachmentCount": 0, 
              "IsAttachable": true, 
              "PolicyId": "ANPAI7XDIQOLTBUMDWGJW", 
              "DefaultVersionId": "v1", 
              "Path": "/", 
              "Arn": "arn:aws:iam::123455678901:policy/test-policy", 
              "UpdateDate": "2017-07-14T22:51:20.293Z"
          }
      }

      The following is an example of the event-policy.json file used earlier:

      {
          "Version": "2012-10-17",
          "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ecs:RunTask"
                ],
                "Resource": [
                    "arn:aws:ecs:*::task-definition/"
                ],
                "Condition": {
                    "ArnLike": {
                        "ecs:cluster": "arn:aws:ecs:*::cluster/"
                    }
                }
            }
          ]
      }
    3. Attach the IAM policy to the role.
      aws --profile default --region us-west-2 iam attach-role-policy --role-name Test-Role --policy-arn arn:aws:iam::1234567890:policy/test-policy
  7. Associate the CloudWatch rule created earlier to place the task on the ECS cluster. The following command shows an example. Replace the AWS account ID and region with your settings.
    aws events put-targets --rule demo-ecs-task-rule --targets "Id"="1","Arn"="arn:aws:ecs:us-west-2:12345678901:cluster/test-cwe-blog-ecsCluster-15HJFWCH4SP67","EcsParameters"={"TaskDefinitionArn"="arn:aws:ecs:us-west-2:12345678901:task-definition/test-cwe-blog-taskdef:8"},"RoleArn"="arn:aws:iam::12345678901:role/Test-Role"

    {
        "FailedEntries": [], 
        "FailedEntryCount": 0
    }

That’s it. The logs now run based on the defined schedule.

To test this, open the Amazon ECS console, select the Amazon ECS cluster that you created, and then choose Tasks, Run New Task. Select the task definition created by the CloudFormation template, and the cluster should be selected automatically. As this runs, the S3 bucket should be populated with the RDS logs for the instance.

Conclusion

In this post, you’ve seen that the choices for workloads that need to run at a scheduled time include Lambda with CloudWatch events or EC2 with cron. However, sometimes the job could run outside of Lambda execution time limits or be not cost-effective for an EC2 instance.

In such cases, you can schedule the tasks on an ECS cluster using CloudWatch rules. In addition, you can use a Spot Fleet cluster with Amazon ECS for cost-conscious workloads that do not have hard requirements on execution time or instance availability in the Spot Fleet. For more information, see Powering your Amazon ECS Cluster with Amazon EC2 Spot Instances and Scheduled Events.

If you have questions or suggestions, please comment below.

Build a Serverless Architecture to Analyze Amazon CloudFront Access Logs Using AWS Lambda, Amazon Athena, and Amazon Kinesis Analytics

Post Syndicated from Rajeev Srinivasan original https://aws.amazon.com/blogs/big-data/build-a-serverless-architecture-to-analyze-amazon-cloudfront-access-logs-using-aws-lambda-amazon-athena-and-amazon-kinesis-analytics/

Nowadays, it’s common for a web server to be fronted by a global content delivery service, like Amazon CloudFront. This type of front end accelerates delivery of websites, APIs, media content, and other web assets to provide a better experience to users across the globe.

The insights gained by analysis of Amazon CloudFront access logs helps improve website availability through bot detection and mitigation, optimizing web content based on the devices and browser used to view your webpages, reducing perceived latency by caching of popular object closer to its viewer, and so on. This results in a significant improvement in the overall perceived experience for the user.

This blog post provides a way to build a serverless architecture to generate some of these insights. To do so, we analyze Amazon CloudFront access logs both at rest and in transit through the stream. This serverless architecture uses Amazon Athena to analyze large volumes of CloudFront access logs (on the scale of terabytes per day), and Amazon Kinesis Analytics for streaming analysis.

The analytic queries in this blog post focus on three common use cases:

  1. Detection of common bots using the user agent string
  2. Calculation of current bandwidth usage per Amazon CloudFront distribution per edge location
  3. Determination of the current top 50 viewers

However, you can easily extend the architecture described to power dashboards for monitoring, reporting, and trigger alarms based on deeper insights gained by processing and analyzing the logs. Some examples are dashboards for cache performance, usage and viewer patterns, and so on.

Following we show a diagram of this architecture.

Prerequisites

Before you set up this architecture, install the AWS Command Line Interface (AWS CLI) tool on your local machine, if you don’t have it already.

Setup summary

The following steps are involved in setting up the serverless architecture on the AWS platform:

  1. Create an Amazon S3 bucket for your Amazon CloudFront access logs to be delivered to and stored in.
  2. Create a second Amazon S3 bucket to receive processed logs and store the partitioned data for interactive analysis.
  3. Create an Amazon Kinesis Firehose delivery stream to batch, compress, and deliver the preprocessed logs for analysis.
  4. Create an AWS Lambda function to preprocess the logs for analysis.
  5. Configure Amazon S3 event notification on the CloudFront access logs bucket, which contains the raw logs, to trigger the Lambda preprocessing function.
  6. Create an Amazon DynamoDB table to look up partition details, such as partition specification and partition location.
  7. Create an Amazon Athena table for interactive analysis.
  8. Create a second AWS Lambda function to add new partitions to the Athena table based on the log delivered to the processed logs bucket.
  9. Configure Amazon S3 event notification on the processed logs bucket to trigger the Lambda partitioning function.
  10. Configure Amazon Kinesis Analytics application for analysis of the logs directly from the stream.

ETL and preprocessing

In this section, we parse the CloudFront access logs as they are delivered, which occurs multiple times in an hour. We filter out commented records and use the user agent string to decipher the browser name, the name of the operating system, and whether the request has been made by a bot. For more details on how to decipher the preceding information based on the user agent string, see user-agents 1.1.0 in the Python documentation.

We use the Lambda preprocessing function to perform these tasks on individual rows of the access log. On successful completion, the rows are pushed to an Amazon Kinesis Firehose delivery stream to be persistently stored in an Amazon S3 bucket, the processed logs bucket.

To create a Firehose delivery stream with a new or existing S3 bucket as the destination, follow the steps described in Create a Firehose Delivery Stream to Amazon S3 in the S3 documentation. Keep most of the default settings, but select an AWS Identity and Access Management (IAM) role that has write access to your S3 bucket and specify GZIP compression. Name the delivery stream CloudFrontLogsToS3.

Another pre-requisite for this setup is to create an IAM role that provides the necessary permissions our AWS Lambda function to get the data from S3, process it, and deliver it to the CloudFrontLogsToS3 delivery stream.

Let’s use the AWS CLI to create the IAM role using the following the steps:

  1. Create the IAM policy (lambda-exec-policy) for the Lambda execution role to use.
  2. Create the Lambda execution role (lambda-cflogs-exec-role) and assign the service to use this role.
  3. Attach the policy created in step 1 to the Lambda execution role.

To download the policy document to your local machine, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/preprocessiong-lambda/lambda-exec-policy.json  <path_on_your_local_machine>

To download the assume policy document to your local machine, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/preprocessiong-lambda/assume-lambda-policy.json  <path_on_your_local_machine>

Following is the lambda-exec-policy.json file, which is the IAM policy used by the Lambda execution role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CloudWatchAccess",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Sid": "S3Access",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::*"
            ]
        },
        {
            "Sid": "FirehoseAccess",
            "Effect": "Allow",
            "Action": [
                "firehose:ListDeliveryStreams",
                "firehose:PutRecord",
                "firehose:PutRecordBatch"
            ],
            "Resource": [
                "arn:aws:firehose:*:*:deliverystream/CloudFrontLogsToS3"
            ]
        }
    ]
}

To create the IAM policy used by Lambda execution role, type the following command.

aws iam create-policy --policy-name lambda-exec-policy --policy-document file://<path>/lambda-exec-policy.json

To create the AWS Lambda execution role and assign the service to use this role, type the following command.

aws iam create-role --role-name lambda-cflogs-exec-role --assume-role-policy-document file://<path>/assume-lambda-policy.json

Following is the assume-lambda-policy.json file, to grant Lambda permission to assume a role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

To attach the policy (lambda-exec-policy) created to the AWS Lambda execution role (lambda-cflogs-exec-role), type the following command.

aws iam attach-role-policy --role-name lambda-cflogs-exec-role --policy-arn arn:aws:iam::<your-account-id>:policy/lambda-exec-policy

Now that we have created the CloudFrontLogsToS3 Firehose delivery stream and the lambda-cflogs-exec-role IAM role for Lambda, the next step is to create a Lambda preprocessing function.

This Lambda preprocessing function parses the CloudFront access logs delivered into the S3 bucket and performs a few transformation and mapping operations on the data. The Lambda function adds descriptive information, such as the browser and the operating system that were used to make this request based on the user agent string found in the logs. The Lambda function also adds information about the web distribution to support scenarios where CloudFront access logs are delivered to a centralized S3 bucket from multiple distributions. With the solution in this blog post, you can get insights across distributions and their edge locations.

Use the Lambda Management Console to create a new Lambda function with a Python 2.7 runtime and the s3-get-object-python blueprint. Open the console, and on the Configure triggers page, choose the name of the S3 bucket where the CloudFront access logs are delivered. Choose Put for Event type. For Prefix, type the name of the prefix, if any, for the folder where CloudFront access logs are delivered, for example cloudfront-logs/. To invoke Lambda to retrieve the logs from the S3 bucket as they are delivered, select Enable trigger.

Choose Next and provide a function name to identify this Lambda preprocessing function.

For Code entry type, choose Upload a file from Amazon S3. For S3 link URL, type https.amazonaws.com//preprocessing-lambda/pre-data.zip. In the section, also create an environment variable with the key KINESIS_FIREHOSE_STREAM and a value with the name of the Firehose delivery stream as CloudFrontLogsToS3.

Choose lambda-cflogs-exec-role as the IAM role for the Lambda function, and type prep-data.lambda_handler for the value for Handler.

Choose Next, and then choose Create Lambda.

Table creation in Amazon Athena

In this step, we will build the Athena table. Use the Athena console in the same region and create the table using the query editor.

CREATE EXTERNAL TABLE IF NOT EXISTS cf_logs (
  logdate date,
  logtime string,
  location string,
  bytes bigint,
  requestip string,
  method string,
  host string,
  uri string,
  status bigint,
  referrer string,
  useragent string,
  uriquery string,
  cookie string,
  resulttype string,
  requestid string,
  header string,
  csprotocol string,
  csbytes string,
  timetaken bigint,
  forwardedfor string,
  sslprotocol string,
  sslcipher string,
  responseresulttype string,
  protocolversion string,
  browserfamily string,
  osfamily string,
  isbot string,
  filename string,
  distribution string
)
PARTITIONED BY(year string, month string, day string, hour string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://<pre-processing-log-bucket>/prefix/';

Creation of the Athena partition

A popular website with millions of requests each day routed using Amazon CloudFront can generate a large volume of logs, on the order of a few terabytes a day. We strongly recommend that you partition your data to effectively restrict the amount of data scanned by each query. Partitioning significantly improves query performance and substantially reduces cost. The Lambda partitioning function adds the partition information to the Athena table for the data delivered to the preprocessed logs bucket.

Before delivering the preprocessed Amazon CloudFront logs file into the preprocessed logs bucket, Amazon Kinesis Firehose adds a UTC time prefix in the format YYYY/MM/DD/HH. This approach supports multilevel partitioning of the data by year, month, date, and hour. You can invoke the Lambda partitioning function every time a new processed Amazon CloudFront log is delivered to the preprocessed logs bucket. To do so, configure the Lambda partitioning function to be triggered by an S3 Put event.

For a website with millions of requests, a large number of preprocessed logs can be delivered multiple times in an hour—for example, at the interval of one each second. To avoid querying the Athena table for partition information every time a preprocessed log file is delivered, you can create an Amazon DynamoDB table for fast lookup.

Based on the year, month, data and hour in the prefix of the delivered log, the Lambda partitioning function checks if the partition specification exists in the Amazon DynamoDB table. If it doesn’t, it’s added to the table using an atomic operation, and then the Athena table is updated.

Type the following command to create the Amazon DynamoDB table.

aws dynamodb create-table --table-name athenapartitiondetails \
--attribute-definitions AttributeName=PartitionSpec,AttributeType=S \
--key-schema AttributeName=PartitionSpec,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=100

Here the following is true:

  • PartitionSpec is the hash key and is a representation of the partition signature—for example, year=”2017”; month=”05”; day=”15”; hour=”10”.
  • Depending on the rate at which the processed log files are delivered to the processed log bucket, you might have to increase the ReadCapacityUnits and WriteCapacityUnits values, if these are throttled.

The other attributes besides PartitionSpec are the following:

  • PartitionPath – The S3 path associated with the partition.
  • PartitionType – The type of partition used (Hour, Month, Date, Year, or ALL). In this case, ALL is used.

Next step is to create the IAM role to provide permissions for the Lambda partitioning function. You require permissions to do the following:

  1. Look up and write partition information to DynamoDB.
  2. Alter the Athena table with new partition information.
  3. Perform Amazon CloudWatch logs operations.
  4. Perform Amazon S3 operations.

To download the policy document to your local machine, type following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/partitioning-lambda/lambda-partition-function-execution-policy.json  <path_on_your_local_machine>

To download the assume policy document to your local machine, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/partitioning-lambda/assume-lambda-policy.json <path_on_your_local_machine>

To create the Lambda execution role and assign the service to use this role, type the following command.

aws iam create-role --role-name lambda-cflogs-exec-role --assume-role-policy-document file://<path>/assume-lambda-policy.json

Let’s use the AWS CLI to create the IAM role using the following three steps:

  1. Create the IAM policy(lambda-partition-exec-policy) used by the Lambda execution role.
  2. Create the Lambda execution role (lambda-partition-execution-role)and assign the service to use this role.
  3. Attach the policy created in step 1 to the Lambda execution role.

To create the IAM policy used by Lambda execution role, type the following command.

aws iam create-policy --policy-name lambda-partition-exec-policy --policy-document file://<path>/lambda-partition-function-execution-policy.json

To create the Lambda execution role and assign the service to use this role, type the following command.

aws iam create-role --role-name lambda-partition-execution-role --assume-role-policy-document file://<path>/assume-lambda-policy.json

To attach the policy (lambda-partition-exec-policy) created to the AWS Lambda execution role (lambda-partition-execution-role), type the following command.

aws iam attach-role-policy --role-name lambda-partition-execution-role --policy-arn arn:aws:iam::<your-account-id>:policy/lambda-partition-exec-policy

Following is the lambda-partition-function-execution-policy.json file, which is the IAM policy used by the Lambda execution role.

{
    "Version": "2012-10-17",
    "Statement": [
      	{
            	"Sid": "DDBTableAccess",
            	"Effect": "Allow",
            	"Action": "dynamodb:PutItem"
            	"Resource": "arn:aws:dynamodb*:*:table/athenapartitiondetails"
        	},
        	{
            	"Sid": "S3Access",
            	"Effect": "Allow",
            	"Action": [
                		"s3:GetBucketLocation",
                		"s3:GetObject",
                		"s3:ListBucket",
                		"s3:ListBucketMultipartUploads",
                		"s3:ListMultipartUploadParts",
                		"s3:AbortMultipartUpload",
                		"s3:PutObject"
            	],
          		"Resource":"arn:aws:s3:::*"
		},
	              {
		      "Sid": "AthenaAccess",
      		"Effect": "Allow",
      		"Action": [ "athena:*" ],
      		"Resource": [ "*" ]
	      },
        	{
            	"Sid": "CloudWatchLogsAccess",
            	"Effect": "Allow",
            	"Action": [
                		"logs:CreateLogGroup",
                		"logs:CreateLogStream",
             	   	"logs:PutLogEvents"
            	],
            	"Resource": "arn:aws:logs:*:*:*"
        	}
    ]
}

Download the .jar file containing the Java deployment package to your local machine.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/partitioning-lambda/aws-lambda-athena-1.0.0.jar <path_on_your_local_machine>

From the AWS Management Console, create a new Lambda function with Java8 as the runtime. Select the Blank Function blueprint.

On the Configure triggers page, choose the name of the S3 bucket where the preprocessed logs are delivered. Choose Put for the Event Type. For Prefix, type the name of the prefix folder, if any, where preprocessed logs are delivered by Firehose—for example, out/. For Suffix, type the name of the compression format that the Firehose stream (CloudFrontLogToS3) delivers the preprocessed logs —for example, gz. To invoke Lambda to retrieve the logs from the S3 bucket as they are delivered, select Enable Trigger.

Choose Next and provide a function name to identify this Lambda partitioning function.

Choose Java8 for Runtime for the AWS Lambda function. Choose Upload a .ZIP or .JAR file for the Code entry type, and choose Upload to upload the downloaded aws-lambda-athena-1.0.0.jar file.

Next, create the following environment variables for the Lambda function:

  • TABLE_NAME – The name of the Athena table (for example, cf_logs).
  • PARTITION_TYPE – The partition to be created based on the Athena table for the logs delivered to the sub folders in S3 bucket based on Year, Month, Date, Hour, or Set this to ALL to use Year, Month, Date, and Hour.
  • DDB_TABLE_NAME – The name of the DynamoDB table holding partition information (for example, athenapartitiondetails).
  • ATHENA_REGION – The current AWS Region for the Athena table to construct the JDBC connection string.
  • S3_STAGING_DIR – The Amazon S3 location where your query output is written. The JDBC driver asks Athena to read the results and provide rows of data back to the user (for example, s3://<bucketname>/<folder>/).

To configure the function handler and IAM, for Handler copy and paste the name of the handler: com.amazonaws.services.lambda.CreateAthenaPartitionsBasedOnS3EventWithDDB::handleRequest. Choose the existing IAM role, lambda-partition-execution-role.

Choose Next and then Create Lambda.

Interactive analysis using Amazon Athena

In this section, we analyze the historical data that’s been collected since we added the partitions to the Amazon Athena table for data delivered to the preprocessing logs bucket.

Scenario 1 is robot traffic by edge location.

SELECT COUNT(*) AS ct, requestip, location FROM cf_logs
WHERE isbot='True'
GROUP BY requestip, location
ORDER BY ct DESC;

Scenario 2 is total bytes transferred per distribution for each edge location for your website.

SELECT distribution, location, SUM(bytes) as totalBytes
FROM cf_logs
GROUP BY location, distribution;

Scenario 3 is the top 50 viewers of your website.

SELECT requestip, COUNT(*) AS ct  FROM cf_logs
GROUP BY requestip
ORDER BY ct DESC;

Streaming analysis using Amazon Kinesis Analytics

In this section, you deploy a stream processing application using Amazon Kinesis Analytics to analyze the preprocessed Amazon CloudFront log streams. This application analyzes directly from the Amazon Kinesis Stream as it is delivered to the preprocessing logs bucket. The stream queries in section are focused on gaining the following insights:

  • The IP address of the bot, identified by its Amazon CloudFront edge location, that is currently sending requests to your website. The query also includes the total bytes transferred as part of the response.
  • The total bytes served per distribution per population for your website.
  • The top 10 viewers of your website.

To download the firehose-access-policy.json file, type the following.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/kinesisanalytics/firehose-access-policy.json  <path_on_your_local_machine>

To download the kinesisanalytics-policy.json file, type the following.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/kinesisanalytics/assume-kinesisanalytics-policy.json <path_on_your_local_machine>

Before we create the Amazon Kinesis Analytics application, we need to create the IAM role to provide permission for the analytics application to access Amazon Kinesis Firehose stream.

Let’s use the AWS CLI to create the IAM role using the following three steps:

  1. Create the IAM policy(firehose-access-policy) for the Lambda execution role to use.
  2. Create the Lambda execution role (ka-execution-role) and assign the service to use this role.
  3. Attach the policy created in step 1 to the Lambda execution role.

Following is the firehose-access-policy.json file, which is the IAM policy used by Kinesis Analytics to read Firehose delivery stream.

{
    "Version": "2012-10-17",
    "Statement": [
      	{
    	"Sid": "AmazonFirehoseAccess",
    	"Effect": "Allow",
    	"Action": [
       	"firehose:DescribeDeliveryStream",
        	"firehose:Get*"
    	],
    	"Resource": [
              "arn:aws:firehose:*:*:deliverystream/CloudFrontLogsToS3”
       ]
     }
}

Following is the assume-kinesisanalytics-policy.json file, to grant Amazon Kinesis Analytics permissions to assume a role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "kinesisanalytics.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

To create the IAM policy used by Analytics access role, type the following command.

aws iam create-policy --policy-name firehose-access-policy --policy-document file://<path>/firehose-access-policy.json

To create the Analytics execution role and assign the service to use this role, type the following command.

aws iam attach-role-policy --role-name ka-execution-role --policy-arn arn:aws:iam::<your-account-id>:policy/firehose-access-policy

To attach the policy (irehose-access-policy) created to the Analytics execution role (ka-execution-role), type the following command.

aws iam attach-role-policy --role-name ka-execution-role --policy-arn arn:aws:iam::<your-account-id>:policy/firehose-access-policy

To deploy the Analytics application, first download the configuration file and then modify ResourceARN and RoleARN for the Amazon Kinesis Firehose input configuration.

"KinesisFirehoseInput": { 
    "ResourceARN": "arn:aws:firehose:<region>:<account-id>:deliverystream/CloudFrontLogsToS3", 
    "RoleARN": "arn:aws:iam:<account-id>:role/ka-execution-role"
}

To download the Analytics application configuration file, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis//kinesisanalytics/kinesis-analytics-app-configuration.json <path_on_your_local_machine>

To deploy the application, type the following command.

aws kinesisanalytics create-application --application-name "cf-log-analysis" --cli-input-json file://<path>/kinesis-analytics-app-configuration.json

To start the application, type the following command.

aws kinesisanalytics start-application --application-name "cf-log-analysis" --input-configuration Id="1.1",InputStartingPositionConfiguration={InputStartingPosition="NOW"}

SQL queries using Amazon Kinesis Analytics

Scenario 1 is a query for detecting bots for sending request to your website detection for your website.

-- Create output stream, which can be used to send to a destination
CREATE OR REPLACE STREAM "BOT_DETECTION" (requesttime TIME, destribution VARCHAR(16), requestip VARCHAR(64), edgelocation VARCHAR(64), totalBytes BIGINT);
-- Create pump to insert into output 
CREATE OR REPLACE PUMP "BOT_DETECTION_PUMP" AS INSERT INTO "BOT_DETECTION"
--
SELECT STREAM 
    STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND) as requesttime,
    "distribution_name" as distribution,
    "request_ip" as requestip, 
    "edge_location" as edgelocation, 
    SUM("bytes") as totalBytes
FROM "CF_LOG_STREAM_001"
WHERE "is_bot" = true
GROUP BY "request_ip", "edge_location", "distribution_name",
STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND),
STEP("CF_LOG_STREAM_001".ROWTIME BY INTERVAL '1' SECOND);

Scenario 2 is a query for total bytes transferred per distribution for each edge location for your website.

-- Create output stream, which can be used to send to a destination
CREATE OR REPLACE STREAM "BYTES_TRANSFFERED" (requesttime TIME, destribution VARCHAR(16), edgelocation VARCHAR(64), totalBytes BIGINT);
-- Create pump to insert into output 
CREATE OR REPLACE PUMP "BYTES_TRANSFFERED_PUMP" AS INSERT INTO "BYTES_TRANSFFERED"
-- Bytes Transffered per second per web destribution by edge location
SELECT STREAM 
    STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND) as requesttime,
    "distribution_name" as distribution,
    "edge_location" as edgelocation, 
    SUM("bytes") as totalBytes
FROM "CF_LOG_STREAM_001"
GROUP BY "distribution_name", "edge_location", "request_date",
STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND),
STEP("CF_LOG_STREAM_001".ROWTIME BY INTERVAL '1' SECOND);

Scenario 3 is a query for the top 50 viewers for your website.

-- Create output stream, which can be used to send to a destination
CREATE OR REPLACE STREAM "TOP_TALKERS" (requestip VARCHAR(64), requestcount DOUBLE);
-- Create pump to insert into output 
CREATE OR REPLACE PUMP "TOP_TALKERS_PUMP" AS INSERT INTO "TOP_TALKERS"
-- Top Ten Talker
SELECT STREAM ITEM as requestip, ITEM_COUNT as requestcount FROM TABLE(TOP_K_ITEMS_TUMBLING(
  CURSOR(SELECT STREAM * FROM "CF_LOG_STREAM_001"),
  'request_ip', -- name of column in single quotes
  50, -- number of top items
  60 -- tumbling window size in seconds
  )
);

Conclusion

Following the steps in this blog post, you just built an end-to-end serverless architecture to analyze Amazon CloudFront access logs. You analyzed these both in interactive and streaming mode, using Amazon Athena and Amazon Kinesis Analytics respectively.

By creating a partition in Athena for the logs delivered to a centralized bucket, this architecture is optimized for performance and cost when analyzing large volumes of logs for popular websites that receive millions of requests. Here, we have focused on just three common use cases for analysis, sharing the analytic queries as part of the post. However, you can extend this architecture to gain deeper insights and generate usage reports to reduce latency and increase availability. This way, you can provide a better experience on your websites fronted with Amazon CloudFront.

In this blog post, we focused on building serverless architecture to analyze Amazon CloudFront access logs. Our plan is to extend the solution to provide rich visualization as part of our next blog post.


About the Authors

Rajeev Srinivasan is a Senior Solution Architect for AWS. He works very close with our customers to provide big data and NoSQL solution leveraging the AWS platform and enjoys coding . In his spare time he enjoys riding his motorcycle and reading books.

 

Sai Sriparasa is a consultant with AWS Professional Services. He works with our customers to provide strategic and tactical big data solutions with an emphasis on automation, operations & security on AWS. In his spare time, he follows sports and current affairs.

 

 


Related

Analyzing VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight

SAML for Your Serverless JavaScript Application: Part II

Post Syndicated from Bryan Liston original https://aws.amazon.com/blogs/compute/saml-for-your-serverless-javascript-application-part-ii/

Contributors: Richard Threlkeld, Gene Ting, Stefano Buliani

The full code for both scenarios—including SAM templates—can be found at the samljs-serverless-sample GitHub repository. We highly recommend you use the SAM templates in the GitHub repository to create the resources, opitonally you can manually create them.


This is the second part of a two part series for using SAML providers in your application and receiving short-term credentials to access AWS Services. These credentials can be limited with IAM roles so the users of the applications can perform actions like fetching data from databases or uploading files based on their level of authorization. For example, you may want to build a JavaScript application that allows a user to authenticate against Active Directory Federation Services (ADFS). The user can be granted scoped AWS credentials to invoke an API to display information in the application or write to an Amazon DynamoDB table.

Part I of this series walked through a client-side flow of retrieving SAML claims and passing them to Amazon Cognito to retrieve credentials. This blog post will take you through a more advanced scenario where logic can be moved to the backend for a more comprehensive and flexible solution.

Prerequisites

As in Part I of this series, you need ADFS running in your environment. The following configurations are used for reference:

  1. ADFS federated with the AWS console. For a walkthrough with an AWS CloudFormation template, see Enabling Federation to AWS Using Windows Active Directory, ADFS, and SAML 2.0.
  2. Verify that you can authenticate with user example\bob for both the ADFS-Dev and ADFS-Production groups via the sign-in page.
  3. Create an Amazon Cognito identity pool.

Scenario Overview

The scenario in the last blog post may be sufficient for many organizations but, due to size restrictions, some browsers may drop part or all of a query string when sending a large number of claims in the SAMLResponse. Additionally, for auditing and logging reasons, you may wish to relay SAML assertions via POST only and perform parsing in the backend before sending credentials to the client. This scenario allows you to perform custom business logic and validation as well as putting tracking controls in place.

In this post, we want to show you how these requirements can be achieved in a Serverless application. We also show how different challenges (like XML parsing and JWT exchange) can be done in a Serverless application design. Feel free to mix and match, or swap pieces around to suit your needs.

This scenario uses the following services and features:

  • Cognito for unique ID generation and default role mapping
  • S3 for static website hosting
  • API Gateway for receiving the SAMLResponse POST from ADFS
  • Lambda for processing the SAML assertion using a native XML parser
  • DynamoDB conditional writes for session tracking exceptions
  • STS for credentials via Lambda
  • KMS for signing JWT tokens
  • API Gateway custom authorizers for controlling per-session access to credentials, using JWT tokens that were signed with KMS keys
  • JavaScript-generated SDK from API Gateway using a service proxy to DynamoDB
  • RelayState in the SAMLRequest to ADFS to transmit the CognitoID and a short code from the client to your AWS backend

At a high level, this solution is similar to that of Scenario 1; however, most of the work is done in the infrastructure rather than on the client.

  • ADFS still uses a POST binding to redirect the SAMLResponse to API Gateway; however, the Lambda function does not immediately redirect.
  • The Lambda function decodes and uses an XML parser to read the properties of the SAML assertion.
  • If the user’s assertion shows that they belong to a certain group matching a specified string (“Prod” in the sample), then you assign a role that they can assume (“ADFS-Production”).
  • Lambda then gets the credentials on behalf of the user and stores them in DynamoDB as well as logging an audit record in a separate table.
  • Lambda then returns a short-lived, signed JSON Web Token (JWT) to the JavaScript application.
  • The application uses the JWT to get their stored credentials from DynamoDB through an API Gateway custom authorizer.

The architecture you build in this tutorial is outlined in the following diagram.

lambdasamltwo_1.png

First, a user visits your static website hosted on S3. They generate an ephemeral random code that is transmitted during redirection to ADFS, where they are prompted for their Active Directory credentials.

Upon successful authentication, the ADFS server redirects the SAMLResponse assertion, along with the code (as the RelayState) via POST to API Gateway.

The Lambda function parses the SAMLResponse. If the user is part of the appropriate Active Directory group (AWS-Production in this tutorial), it retrieves credentials from STS on behalf of the user.

The credentials are stored in a DynamoDB table called SAMLSessions, along with the short code. The user login is stored in a tracking table called SAMLUsers.

The Lambda function generates a JWT token, with a 30-second expiration time signed with KMS, then redirects the client back to the static website along with this token.

The client then makes a call to an API Gateway resource acting as a DynamoDB service proxy that retrieves the credentials via a DeleteItem call. To make this call, the client passes the JWT in the authorization header.

A custom authorizer runs to validate the token using the KMS key again as well as the original random code.

Now that the client has credentials, it can use these to access AWS resources.

Tutorial: Backend processing and audit tracking

Before you walk through this tutorial you will need the source code from the samljs-serverless-sample Github Repository. You should use the SAM template provided in order to streamline the process but we’ll outline how you you would manually create resources too. There is a readme in the repository with instructions for using the SAM template. Either way you will still perform the manual steps of KMS key configuration, ADFS enablement of RelayState, and Amazon Cognito Identity Pool creation. The template will automate the process in creating the S3 website, Lambda functions, API Gateway resources and DynamoDB tables.

We walk through the details of all the steps and configuration below for illustrative purposes in this tutorial calling out the sections that can be omitted if you used the SAM template.

KMS key configuration

To sign JWT tokens, you need an encrypted plaintext key, to be stored in KMS. You will need to complete this step even if you use the SAM template.

  1. In the IAM console, choose Encryption Keys, Create Key.
  2. For Alias, type sessionMaster.
  3. For Advanced Options, choose KMS, Next Step.
  4. For Key Administrative Permissions, select your administrative role or user account.
  5. For Key Usage Permissions, you can leave this blank as the IAM Role (next section) will have individual key actions configured. This allows you to perform administrative actions on the set of keys while the Lambda functions have rights to just create data keys for encryption/decryption and use them to sign JWTs.
  6. Take note of the Key ID, which is needed for the Lambda functions.

IAM role configuration

You will need an IAM role for executing your Lambda functions. If you are using the SAM template this can be skipped. The sample code in the GitHub repository under Scenario2 creates separate roles for each function, with limited permissions on individual resources when you use the SAM template. We recommend separate roles scoped to individual resources for production deployments. Your Lambda functions need the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1432927122000",
            "Effect": "Allow",
            "Action": [
                "dynamodb:PutItem",
                “dynamodb:GetItem”,
                “dynamodb:DeleteItem”,
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "kms:GenerateDataKey*",
                “kms:Decrypt”
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

Lambda function configuration

If you are not using the SAM template, create the following three Lambda functions from the GitHub repository in /Scenario2/lambda using the following names and environment variables. The Lambda functions are written in Node.js.

  • GenerateKey_awslabs_samldemo
  • ProcessSAML_awslabs_samldemo
  • SAMLCustomAuth_awslabs_samldemo

The functions above are built, packaged, and uploaded to Lambda. For two of the functions, this can be done from your workstation (the sample commands for each function assume OSX or Linux). The third will need to be built on an AWS EC2 instance running the current Lambda AMI.

GenerateKey_awslabs_samldemo

This function is only used one time to create keys in KMS for signing JWT tokens. The function calls GenerateDataKey and stores the encrypted CipherText blob as Base64 in DynamoDB. This is used by the other two functions for getting the PlainTextKey for signing with a Decrypt operation.

This function only requires a single file. It has the following environment variables:

  • KMS_KEY_ID: Unique identifier from KMS for your sessionMaster Key
  • SESSION_DDB_TABLE: SAMLSessions
  • ENC_CONTEXT: ADFS (or something unique to your organization)
  • RAND_HASH: us-east-1:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX

Navigate into /Scenario2/lambda/GenerateKey and run the following commands:

zip –r generateKey.zip .

aws lambda create-function --function-name GenerateKey_awslabs_samldemo --runtime nodejs4.3 --role LAMBDA_ROLE_ARN --handler index.handler --timeout 10 --memory-size 512 --zip-file fileb://generateKey.zip --environment Variables={SESSION_DDB_TABLE=SAMLSessions,ENC_CONTEXT=ADFS,RAND_HASH=us-east-1:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX,KMS_KEY_ID=<kms key="KEY" id="ID">}

SAMLCustomAuth_awslabs_samldemo

This is an API Gateway custom authorizer called after the client has been redirected to the website as part of the login workflow. This function calls a GET against the service proxy to DynamoDB, retrieving credentials. The function uses the KMS key signing validation of the JWT created in the ProcessSAML_awslabs_samldemo function and also validates the random code that was generated at the beginning of the login workflow.

You must install the dependencies before zipping this function up. It has the following environment variables:

  • SESSION_DDB_TABLE: SAMLSessions
  • ENC_CONTEXT: ADFS (or whatever was used in GenerateKey_awslabs_samldemo)
  • ID_HASH: us-east-1:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX

Navigate into /Scenario2/lambda/CustomAuth and run:

npm install

zip –r custom_auth.zip .

aws lambda create-function --function-name SAMLCustomAuth_awslabs_samldemo --runtime nodejs4.3 --role LAMBDA_ROLE_ARN --handler CustomAuth.handler --timeout 10 --memory-size 512 --zip-file fileb://custom_auth.zip --environment Variables={SESSION_DDB_TABLE=SAMLSessions,ENC_CONTEXT=ADFS,ID_HASH= us-east-1:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX }

ProcessSAML_awslabs_samldemo

This function is called when ADFS sends the SAMLResponse to API Gateway. The function parses the SAML assertion to select a role (based on a simple string search) and extract user information. It then uses this data to get short-term credentials from STS via AssumeRoleWithSAML and stores this information in a SAMLSessions table and tracks the user login via a SAMLUsers table. Both of these are DynamoDB tables but you could also store the user information in another AWS database type, as this is for auditing purposes. Finally, this function creates a JWT (signed with the KMS key) which is only valid for 30 seconds and is returned to the client as part of a 302 redirect from API Gateway.

This function needs to be built on an EC2 server running Amazon Linux. This function leverages two main external libraries:

  • nJwt: Used for secure JWT creation for individual client sessions to get access to their records
  • libxmljs: Used for XML XPath queries of the decoded SAMLResponse from AD FS

Libxmljs uses native build tools and you should run this on EC2 running the same AMI as Lambda and with Node.js v4.3.2; otherwise, you might see errors. For more information about current Lambda AMI information, see Lambda Execution Environment and Available Libraries.

After you have the correct AMI launched in EC2 and have SSH open to that host, install Node.js. Ensure that the Node.js version on EC2 is 4.3.2, to match Lambda. If your version is off, you can roll back with NVM.

After you have set up Node.js, run the following command:

yum install -y make gcc*

Now, create a /saml folder on your EC2 server and copy up ProcessSAML.js and package.json from /Scenario2/lambda/ProcessSAML to the EC2 server. Here is a sample SCP command:

cd ProcessSAML/

ls

package.json    ProcessSAML.js

scp -i ~/path/yourpemfile.pem ./* [email protected]:/home/ec2-user/saml/

Then you can SSH to your server, cd into the /saml directory, and run:

npm install

A successful build should look similar to the following:

lambdasamltwo_2.png

Finally, zip up the package and create the function using the following AWS CLI command and these environment variables. Configure the CLI with your credentials as needed.

  • SESSION_DDB_TABLE: SAMLSessions
  • ENC_CONTEXT: ADFS (or whatever was used in GenerateKeyawslabssamldemo)
  • PRINCIPAL_ARN: Full ARN of the AD FS IdP created in the IAM console
  • USER_DDB_TABLE: SAMLUsers
  • REDIRECT_URL: Endpoint URL of your static S3 website (or CloudFront distribution domain name if you did that optional step)
  • ID_HASH: us-east-1:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
zip –r saml.zip .

aws lambda create-function --function-name ProcessSAML_awslabs_samldemo --runtime nodejs4.3 --role LAMBDA_ROLE_ARN --handler ProcessSAML.handler --timeout 10 --memory-size 512 --zip-file fileb://saml.zip –environment Variables={USER_DDB_TABLE=SAMLUsers,SESSION_DDB_TABLE= SAMLSessions,REDIRECT_URL=<your S3 bucket and test page path>,ID_HASH=us-east-1:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX,ENC_CONTEXT=ADFS,PRINCIPAL_ARN=<your ADFS IdP ARN>}

If you built the first two functions on your workstation and created the ProcessSAML_awslabs_samldemo function separately in the Lambda console before building on EC2, you can update the code after building on with the following command:

aws lambda update-function-code --function-name ProcessSAML_awslabs_samldemo --zip-file fileb://saml.zip

Role trust policy configuration

This scenario uses STS directly to assume a role. You will need to complete this step even if you use the SAM template. Modify the trust policy, as you did before when Amazon Cognito was assuming the role. In the GitHub repository sample code, ProcessSAML.js is preconfigured to filter and select a role with “Prod” in the name via the selectedRole variable.

This is an example of business logic you can alter in your organization later, such as a callout to an external mapping database for other rules matching. In this tutorial, it corresponds to the ADFS-Production role that was created.

  1. In the IAM console, choose Roles and open the ADFS-Production Role.
  2. Edit the Trust Permissions field and replace the content with the following:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Federated": [
              "arn:aws:iam::ACCOUNTNUMBER:saml-provider/ADFS"
    ]
          },
          "Action": "sts:AssumeRoleWithSAML"
        }
      ]
    }

If you end up using another role (or add more complex filtering/selection logic), ensure that those roles have similar trust policy configurations. Also note that the sample policy above purposely uses an array for the federated provider matching the IdP ARN that you added. If your environment has multiple SAML providers, you could list them here and modify the code in ProcessSAML.js to process requests from different IdPs and grant or revoke credentials accordingly.

DynamoDB table creation

If you are not using the SAM template, create two DynamoDB tables:

  • SAMLSessions: Temporarily stores credentials from STS. Credentials are removed by an API Gateway Service Proxy to the DynamoDB DeleteItem call that simultaneously returns the credentials to the client.
  • SAMLUsers: This table is for tracking user information and the last time they authenticated in the system via ADFS.

The following AWS CLI commands creates the tables (indexed only with a primary key hash, called identityHash and CognitoID respectively):

aws dynamodb create-table \
    --table-name SAMLSessions \
    --attribute-definitions \
        AttributeName=group,AttributeType=S \
    --key-schema AttributeName=identityhash,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5
aws dynamodb create-table \
    --table-name SAMLUsers \
    --attribute-definitions \
        AttributeName=CognitoID,AttributeType=S \
    --key-schema AttributeName=CognitoID,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

After the tables are created, you should be able to run the GenerateKey_awslabs_samldemo Lambda function and see a CipherText key stored in SAMLSessions. This is only for convenience of this post, to demonstrate that you should persist CipherText keys in a data store and never persist plaintext keys that have been decrypted. You should also never log plaintext keys in your code.

API Gateway configuration

If you are not using the SAM template, you will need to create API Gateway resources. If you have created resources for Scenario 1 in Part I, then the naming of these resources may be similar. If that is the case, then simply create an API with a different name (SAMLAuth2 or similar) and follow these steps accordingly.

  1. In the API Gateway console for your API, choose Authorizers, Custom Authorizer.
  2. Select your region and enter SAMLCustomAuth_awslabs_samldemo for the Lambda function. Choose a friendly name like JWTParser and ensure that Identity token source is method.request.header.Authorization. This tells the custom authorizer to look for the JWT in the Authorization header of the HTTP request, which is specified in the JavaScript code on your S3 webpage. Save the changes.

    lambdasamltwo_3.png

Now it’s time to wire up the Lambda functions to API Gateway.

  1. In the API Gateway console, choose Resources, select your API, and then create a Child Resource called SAML. This includes a POST and a GET method. The POST method uses the ProcessSAML_awslabs_samldemo Lambda function and a 302 redirect, while the GET method uses the JWTParser custom authorizer with a service proxy to DynamoDB to retrieve credentials upon successful authorization.
  2. lambdasamltwo_4.png

  3. Create a POST method. For Integration Type, choose Lambda and add the ProcessSAML_awslabs_samldemo Lambda function. For Method Request, add headers called RelayState and SAMLResponse.

    lambdasamltwo_5.png

  4. Delete the Method Response code for 200 and add a 302. Create a response header called Location. In the Response Models section, for Content-Type, choose application/json and for Models, choose Empty.

    lambdasamltwo_6.png

  5. Delete the Integration Response section for 200 and add one for 302 that has a Method response status of 302. Edit the response header for Location to add a Mapping value of integration.response.body.location.

    lambdasamltwo_7.png

  6. Finally, in order for Lambda to capture the SAMLResponse and RelayState values, choose Integration Request.

  7. In the Body Mapping Template section, for Content-Type, enter application/x-www-form-urlencoded and add the following template:

    {
    "SAMLResponse" :"$input.params('SAMLResponse')",
    "RelayState" :"$input.params('RelayState')",
    "formparams" : $input.json('$')
    }

  8. Create a GET method with an Integration Type of Service Proxy. Select the region and DynamoDB as the AWS Service. Use POST for the HTTP method and DeleteItem for the Action. This is important as you leverage a DynamoDB feature to return the current records when you perform deletion. This simultaneously allows credentials in this system to not be stored long term and also allows clients to retrieve them. For Execution role, use the Lambda role from earlier or a new role that only has IAM scoped permissions for DeleteItem on the SAMLSessions table.

    lambdasamltwo_8.png

  9. Save this and open Method Request.

  10. For Authorization, select your custom authorizer JWTParser. Add in a header called COGNITO_ID and save the changes.

    lambdasamltwo_9.png

  11. In the Integration Request, add in a header name of Content-Type and a value for Mapped of ‘application/x-amzn-json-1.0‘ (you need the single quotes surrounding the entry).

  12. Next, in the Body Mapping Template section, for Content-Type, enter application/json and add the following template:

    {
        "TableName": "SAMLSessions",
        "Key": {
            "identityhash": {
                "S": "$input.params('COGNITO_ID')"
            }
        },
        "ReturnValues": "ALL_OLD"
    }

Inspect this closely for a moment. When your client passes the JWT in an Authorization Header to this GET method, the JWTParser Custom Authorizer grants/denies executing a DeleteItem call on the SAMLSessions table.

ADF

If it is granted, then there needs to be an item to delete the reference as a primary key to the table. The client JavaScript (seen in a moment) passes its CognitoID through as a header called COGNITO_ID that is mapped above. DeleteItem executes to remove the credentials that were placed there via a call to STS by the ProcessSAML_awslabs_samldemo Lambda function. Because the above action specifies ALL_OLD under the ReturnValues mapping, DynamoDB returns these credentials at the same time.

lambdasamltwo_10.png

  1. Save the changes and open your /saml resource root.
  2. Choose Actions, Enable CORS.
  3. In the Access-Control-Allow-Headers section, add COGNITO_ID into the end (inside the quotes and separated from other headers by a comma), then choose Enable CORS and replace existing CORS headers.
  4. When completed, choose Actions, Deploy API. Use the Prod stage or another stage.
  5. In the Stage Editor, choose SDK Generation. For Platform, choose JavaScript and then choose Generate SDK. Save the folder someplace close. Take note of the Invoke URL value at the top, as you need this for ADFS configuration later.

Website configuration

If you are not using the SAM template, create an S3 bucket and configure it as a static website in the same way that you did for Part I.

If you are using the SAM template this will automatically be created for you however the steps below will still need to be completed:

In the source code repository, edit /Scenario2/website/configs.js.

  1. Ensure that the identityPool value matches your Amazon Cognito Pool ID and the region is correct.
  2. Leave adfsUrl the same if you’re testing on your lab server; otherwise, update with the AD FS DNS entries as appropriate.
  3. Update the relayingPartyId value as well if you used something different from the prerequisite blog post.

Next, download the minified version of the AWS SDK for JavaScript in the Browser (aws-sdk.min.js) and place it along with the other files in /Scenario2/website into the S3 bucket.

Copy the files from the API Gateway Generated SDK in the last section to this bucket so that the apigClient.js is in the root directory and lib folder is as well. The imports for these scripts (which do things like sign API requests and configure headers for the JWT in the Authorization header) are already included in the index.html file. Consult the latest API Gateway documentation if the SDK generation process updates in the future

ADFS configuration

Now that the AWS setup is complete, modify your ADFS setup to capture RelayState information about the client and to send the POST response to API Gateway for processing. You will need to complete this step even if you use the SAM template.

If you’re using Windows Server 2008 with ADFS 2.0, ensure that Update Rollup 2 is installed before enabling RelayState. Please see official Microsoft documentation for specific download information.

  1. After Update Rollup 2 is installed, modify %systemroot%\inetpub\adfs\ls\web.config. If you’re on a newer version of Windows Server running AD FS 3.0, modify %systemroot%\ADFS\Microsoft.IdentityServer.Servicehost.exe.config.
  2. Find the section in the XML marked <Microsoft.identityServer.web> and add an entry for <useRelayStateForIdpInitiatedSignOn enabled="true">. If you have the proper ADFS rollup or version installed, this should allow the RelayState parameter to be accepted by the service provider.
  3. In the ADFS console, open Relaying Party Trusts for Amazon Web Services and choose Endpoints.
  4. For Binding, choose POST and for Invoke URL,enter the URL to your API Gateway from the stage that you noted earlier.

At this point, you are ready to test out your webpage. Navigate to the S3 static website Endpoint URL and it should redirect you to the ADFS login screen. If the user login has been recent enough to have a valid SAML cookie, then you should see the login pass-through; otherwise, a login prompt appears. After the authentication has taken place, you should quickly end up back at your original webpage. Using the browser debugging tools, you see “Successful DDB call” followed by the results of a call to STS that were stored in DynamoDB.

lambdasamltwo_11.png

As in Scenario 1, the sample code under /scenario2/website/index.html has a button that allows you to “ping” an endpoint to test if the federated credentials are working. If you have used the SAM template this should already be working and you can test it out (it will fail at first – keep reading to find out how to set the IAM permissions!). If not go to API Gateway and create a new Resource called /users at the same level of /saml in your API with a GET method.

lambdasamltwo_12.png

For Integration type, choose Mock.

lambdasamltwo_13.png

In the Method Request, for Authorization, choose AWS_IAM. In the Integration Response, in the Body Mapping Template section, for Content-Type, choose application/json and add the following JSON:

{
    "status": "Success",
    "agent": "${context.identity.userAgent}"
}

lambdasamltwo_14.png

Before using this new Mock API as a test, configure CORS and re-generate the JavaScript SDK so that the browser knows about the new methods.

  1. On the /saml resource root and choose Actions, Enable CORS.
  2. In the Access-Control-Allow-Headers section, add COGNITO_ID into the endpoint and then choose Enable CORS and replace existing CORS headers.
  3. Choose Actions, Deploy API. Use the stage that you configured earlier.
  4. In the Stage Editor, choose SDK Generation and select JavaScript as your platform. Choose Generate SDK.
  5. Upload the new apigClient.js and lib directory to the S3 bucket of your static website.

One last thing must be completed before testing (You will need to complete this step even if you use the SAM template) if the credentials can invoke this mock endpoint with AWS_IAM credentials. The ADFS-Production Role needs execute-api:Invoke permissions for this API Gateway resource.

  1. In the IAM console, choose Roles, and open the ADFS-Production Role.

  2. For testing, you can attach the AmazonAPIGatewayInvokeFullAccess policy; however, for production, you should scope this down to the resource as documented in Control Access to API Gateway with IAM Permissions.

  3. After you have attached a policy with invocation rights and authenticated with AD FS to finish the redirect process, choose PING.

If everything has been set up successfully you should see an alert with information about the user agent.

Final Thoughts

We hope these scenarios and sample code help you to not only begin to build comprehensive enterprise applications on AWS but also to enhance your understanding of different AuthN and AuthZ mechanisms. Consider some ways that you might be able to evolve this solution to meet the needs of your own customers and innovate in this space. For example:

  • Completing the CloudFront configuration and leveraging SSL termination for site identification. See if this can be incorporated into the Lambda processing pipeline.
  • Attaching a scope-down IAM policy if the business rules are matched. For example, the default role could be more permissive for a group but if the user is a contractor (username with –C appended) they get extra restrictions applied when assumeRoleWithSaml is called in the ProcessSAML_awslabs_samldemo Lambda function.
  • Changing the time duration before credentials expire on a per-role basis. Perhaps if the SAMLResponse parsing determines the user is an Administrator, they get a longer duration.
  • Passing through additional user claims in SAMLResponse for further logical decisions or auditing by adding more claim rules in the ADFS console. This could also be a mechanism to synchronize some Active Directory schema attributes with AWS services.
  • Granting different sets of credentials if a user has accounts with multiple SAML providers. While this tutorial was made with ADFS, you could also leverage it with other solutions such as Shibboleth and modify the ProcessSAML_awslabs_samldemo Lambda function to be aware of the different IdP ARN values. Perhaps your solution grants different IAM roles for the same user depending on if they initiated a login from Shibboleth rather than ADFS?

The Lambda functions can be altered to take advantage of these options which you can read more about here. For more information about ADFS claim rule language manipulation, see The Role of the Claim Rule Language on Microsoft TechNet.

We would love to hear feedback from our customers on these designs and see different secure application designs that you’re implementing on the AWS platform.

Streamline AMI Maintenance and Patching Using Amazon EC2 Systems Manager | Automation

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/streamline-ami-maintenance-and-patching-using-amazon-ec2-systems-manager-automation/

Here to tell you about using Automation for streamline AMI maintenance and patching is Taylor Anderson, a Senior Product Manager with EC2.

-Ana


 

Last December at re:Invent, we launched Amazon EC2 Systems Manager, which helps you automatically collect software inventory, apply OS patches, create system images, and configure Windows and Linux operating systems. These capabilities enable automated configuration and ongoing management of systems at scale, and help maintain software compliance for instances running in Amazon EC2 or on-premises.

One feature within Systems Manager is Automation, which can be used to patch, update agents, or bake applications into an Amazon Machine Image (AMI). With Automation, you can avoid the time and effort associated with manual image updates, and instead build AMIs through a streamlined, repeatable, and auditable process.

Recently, we released the first public document for Automation: AWS-UpdateLinuxAmi. This document allows you to automate patching of Ubuntu, CentOS, RHEL, and Amazon Linux AMIs, as well as automating the installation of additional site-specific packages and configurations.

More importantly, it makes it easy to get started with Automation, eliminating the need to first write an Automation document. AWS-UpdateLinuxAmi can also be used as a template when building your own Automation workflow. Windows users can expect the equivalent document―AWS-UpdateWindowsAmi―in the coming weeks.

AWS-UpdateLinuxAmi automates the following workflow:

  1. Launch a temporary EC2 instance from a source Linux AMI.
  2. Update the instance.
    • Invoke a user-provided, pre-update hook script on the instance (optional).
    • Update any AWS tools and agents on the instance, if present.
    • Update the instance’s distribution packages using the native package manager.
    • Invoke a user-provided post-update hook script on the instance (optional).
  3. Stop the temporary instance.
  4. Create a new AMI from the stopped instance.
  5. Terminate the instance.

Warning: Creation of an AMI from a running instance carries a risk that credentials, secrets, or other confidential information from that instance may be recorded to the new image. Use caution when managing AMIs created by this process.

Configuring roles and permissions for Automation

If you haven’t used Automation before, you need to configure IAM roles and permissions. This includes creating a service role for Automation, assigning a passrole to authorize a user to provide the service role, and creating an instance role to enable instance management under Systems Manager. For more details, see Configuring Access to Automation.

Executing Automation

      1. In the EC2 console, choose Systems Manager Services, Automations.
      2. Choose Run automation document
      3. Expand Document name and choose AWS-UpdateLinuxAmi.
      4. Choose the latest document version.
      5.  For SourceAmiId, enter the ID of the Linux AMI to update.
      6. For InstanceIamRole, enter the name of the instance role you created enabling Systems Manager to manage an instance (that is, it includes the AmazonEC2RoleforSSM managed policy). For more details, see Configuring Access to Automation.
      7.  For AutomationAssumeRole, enter the ARN of the service role you created for Automation. For more details, see Configuring Access to Automation.
      8.  Choose Run Automation.
      9. Monitor progress in the Automation Steps tab, and view step-level outputs.

After execution is complete, choose Description to view any outputs returned by the workflow. In this example, AWS-UpdateLinuxAmi returns the new AMI ID.

Next, choose Images, AMIs to view your new AMI.

There is no additional charge to use the Automation service, and any resources created by a workflow incur nominal charges. Note that if you terminate AWS-UpdateLinuxAmi before reaching the “Terminate Instance” step, shut down the temporary instance created by the workflow.

A CLI walkthrough of the above steps can be found at Automation CLI Walkthrough: Patch a Linux AMI.

Conclusion

Now that you’ve successfully run AWS-UpdateLinuxAmi, you may want to create default values for the service and instance roles. You can customize your workflow by creating your own Automation document based on AWS-UpdateLinuxAmi. For more details, see Create an Automaton Document. After you’ve created your document, you can write additional steps and add them to the workflow.

Example steps include:

      • Updating an Auto Scaling group with the new AMI ID (aws:invokeLambdaFunction action type)
      • Creating an encrypted copy of your new AMI (aws:encrypedCopy action type)
      • Validating your new AMI using Run Command with the RunPowerShellScript document (aws:runCommand action type)

Automation also makes a great addition to a CI/CD pipeline for application bake-in, and can be invoked as a CLI build step in Jenkins. For details on these examples, be sure to check out the Automation technical documentation. For updates on Automation, Amazon EC2 Systems Manager, Amazon CloudFormation, AWS Config, AWS OpsWorks and other management services, be sure to follow the all-new Management Tools blog.

 

New! Attach an AWS IAM Role to an Existing Amazon EC2 Instance by Using the AWS CLI

Post Syndicated from Apurv Awasthi original https://aws.amazon.com/blogs/security/new-attach-an-aws-iam-role-to-an-existing-amazon-ec2-instance-by-using-the-aws-cli/

AWS Identity and Access Management (IAM) roles enable your applications running on Amazon EC2 to use temporary security credentials that AWS creates, distributes, and rotates automatically. Using temporary credentials is an IAM best practice because you do not need to maintain long-term keys on your instance. Using IAM roles for EC2 also eliminates the need to use long-term AWS access keys that you have to manage manually or programmatically. Starting today, you can enable your applications to use temporary security credentials provided by AWS by attaching an IAM role to an existing EC2 instance. You can also replace the IAM role attached to an existing EC2 instance.

In this blog post, I show how you can attach an IAM role to an existing EC2 instance by using the AWS CLI.

Overview of the solution

In this blog post’s solution, I:

  1. Create an IAM role.
  2. Attach the IAM role to an existing EC2 instance that was originally launched without an IAM role.
  3. Replace the attached IAM role.

For the purpose of this post, I will use the placeholder, YourNewRole, to denote the newly created IAM role; the placeholder, YourNewRole-Instance-Profile, to denote the instance profile associated with this role; and the placeholder, YourInstanceId, to denote the existing instance. Be sure to replace these placeholders with the resource names from your account.

This post assumes you have set up the AWS Command Line Interface (CLI), have permissions to create an IAM role, and have permissions to call EC2 APIs.

Create an IAM role

Note: If you want to attach an existing IAM role, skip ahead to the “Attach the IAM role to an existing EC2 instance that was originally launched without an IAM role” section of this post. You also can create an IAM role using the console, and then skip ahead to the same section.

Before you can create an IAM role from the AWS CLI, you must create a trust policy. A trust policy permits AWS services such as EC2 to assume an IAM role on behalf of your application. To create the trust policy, copy the following policy and paste it in a text file that you save with the name, YourNewRole-Trust-Policy.json.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Now that you have created the trust policy, you are ready to create an IAM role that you can then attach to an existing EC2 instance.

To create an IAM role from the AWS CLI:

  1. Open the AWS CLI and call the create-role command to create the IAM role, YourNewRole, based on the trust policy, YourNewRole-Trust-Policy.json.
    $aws iam create-role --role-name YourNewRole --assume-role–policy-document file://YourNewRole-Trust-Policy.json
    

  2. Call the attach-role-policy command to grant this IAM role permission to access resources in your account. In this example, I assume your application requires read-only access to all Amazon S3 buckets in your account and objects inside the buckets. Therefore, I will use the AmazonS3ReadOnlyAccess AWS managed policy. For more information about AWS managed policies, see Working with Managed Policies.
    $aws iam attach-role-policy --role-name YourNewRole --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

  3. Call the create-instance-profile command, followed by the add-role-to-instance-profile command to create the IAM instance profile, YourNewRole-Instance-Profile. The instance profile allows EC2 to pass the IAM role, YourNewRole, to an EC2 instance. To learn more, see Using Instance Profiles.
    $aws iam create-instance-profile --instance-profile-name YourNewRole-Instance-Profile
    $aws iam add-role-to-instance-profile --role-name YourNewRole --instance-profile-name YourNewRole-Instance-Profile

You have successfully created the IAM role, YourNewRole.

Attach the IAM role to an existing EC2 instance that was originally launched without an IAM role

You are now ready to attach the IAM role, YourNewRole, to the EC2 instance, YourInstanceId. To attach the role:

  1. Call the associate-iam-instance-profile command to attach the instance profile, YourNewRole-Instance-Profile, for the newly created IAM role, YourNewRole, to your EC2 instance, YourInstanceId.
    $aws ec2 associate-iam-instance-profile --instance-id YourInstanceId --iam-instance-profile Name=YourNewRole-Instance-Profile

  2. You can verify that the IAM role is now attached to the instance by calling the describe-iam-instance-profile-association command.
    $aws ec2 describe-iam-instance-profile-associations

  3. Now, you can update your application to use the IAM role to access AWS resources and delete the long-term keys from your instance.

Replace the attached IAM role

If your role requirements change and you need to modify the permissions you granted your EC2 instance via the IAM role, you can replace the policy attached to the IAM role. However, this will also modify permissions for other EC2 instances that use this IAM role.

Instead, you could call replace-iam-instance-profile-association to replace the currently attached IAM role, YourNewRole, with another IAM role without terminating your EC2 instance. In the following example, I use the placeholder, YourCurrentAssociation-id, to denote the current iam-instance-profile-association instance, and the placeholder, YourReplacementRole-Instance-Profile, to denote the replacement instance profile you want to associate with that instance. Be sure to replace these placeholders with the appropriate association-id and the IAM instance profile name from your account.

$aws ec2 replace-iam-instance-profile-association --association-id YourCurrentAssociation-id --iam-instance-profile Name=YourReplacementRole-Instance-Profile 

Note: You can get YourCurrentAssociation-id by making the describe-iam-instance-profile-associations call.

Conclusion

As I have shown in this post, you can enable your applications to use temporary security credentials provided by AWS by attaching an IAM role to an existing EC2 instance, without relaunching the instance. You can also replace the IAM role attached to an EC2 instance, without terminating mission-critical workloads.

If you have comments about this post, submit them in the “Comments” section below. If you have questions or suggestions, please start a new thread on the IAM forum.

– Apurv

Managing Secrets for Amazon ECS Applications Using Parameter Store and IAM Roles for Tasks

Post Syndicated from Chris Barclay original https://aws.amazon.com/blogs/compute/managing-secrets-for-amazon-ecs-applications-using-parameter-store-and-iam-roles-for-tasks/

Thanks to my colleague Stas Vonholsky  for a great blog on managing secrets with Amazon ECS applications.

—–

As containerized applications and microservice-oriented architectures become more popular, managing secrets, such as a password to access an application database, becomes more challenging and critical.

Some examples of the challenges include:

  • Support for various access patterns across container environments such as dev, test, and prod
  • Isolated access to secrets on a container/application level rather than at the host level
  • Multiple decoupled services with their own needs for access, both as services and as clients of other services

This post focuses on newly released features that support further improvements to secret management for containerized applications running on Amazon ECS. My colleague, Matthew McClean, also published an excellent post on the AWS Security Blog, How to Manage Secrets for Amazon EC2 Container Service–Based Applications by Using Amazon S3 and Docker, which discusses some of the limitations of passing and storing secrets with container parameter variables.

Most secret management tools provide the following functionality:

  • Highly secured storage system
  • Central management capabilities
  • Secure authorization and authentication mechanisms
  • Integration with key management and encryption providers
  • Secure introduction mechanisms for access
  • Auditing
  • Secret rotation and revocation

Amazon EC2 Systems Manager Parameter Store

Parameter Store is a feature of Amazon EC2 Systems Manager. It provides a centralized, encrypted store for sensitive information and has many advantages when combined with other capabilities of Systems Manager, such as Run Command and State Manager. The service is fully managed, highly available, and highly secured.

Because Parameter Store is accessible using the Systems Manager API, AWS CLI, and AWS SDKs, you can also use it as a generic secret management store. Secrets can be easily rotated and revoked. Parameter Store is integrated with AWS KMS so that specific parameters can be encrypted at rest with the default or custom KMS key. Importing KMS keys enables you to use your own keys to encrypt sensitive data.

Access to Parameter Store is enabled by IAM policies and supports resource level permissions for access. An IAM policy that grants permissions to specific parameters or a namespace can be used to limit access to these parameters. CloudTrail logs, if enabled for the service, record any attempt to access a parameter.

While Amazon S3 has many of the above features and can also be used to implement a central secret store, Parameter Store has the following added advantages:

  • Easy creation of namespaces to support different stages of the application lifecycle.
  • KMS integration that abstracts parameter encryption from the application while requiring the instance or container to have access to the KMS key and for the decryption to take place locally in memory.
  • Stored history about parameter changes.
  • A service that can be controlled separately from S3, which is likely used for many other applications.
  • A configuration data store, reducing overhead from implementing multiple systems.
  • No usage costs.

Note: At the time of publication, Systems Manager doesn’t support VPC private endpoint functionality. To enforce stricter access to a Parameter Store endpoint from a private VPC, use a NAT gateway with a set Elastic IP address together with IAM policy conditions that restrict parameter access to a limited set of IP addresses.

IAM roles for tasks

With IAM roles for Amazon ECS tasks, you can specify an IAM role to be used by the containers in a task. Applications interacting with AWS services must sign their API requests with AWS credentials. This feature provides a strategy for managing credentials for your applications to use, similar to the way that Amazon EC2 instance profiles provide credentials to EC2 instances.

Instead of creating and distributing your AWS credentials to the containers or using the EC2 instance role, you can associate an IAM role with an ECS task definition or the RunTask API operation. For more information, see IAM Roles for Tasks.

You can use IAM roles for tasks to securely introduce and authenticate the application or container with the centralized Parameter Store. Access to the secret manager should include features such as:

  • Limited TTL for credentials used
  • Granular authorization policies
  • An ID to track the requests in the logs of the central secret manager
  • Integration support with the scheduler that could map between the container or task deployed and the relevant access privileges

IAM roles for tasks support this use case well, as the role credentials can be accessed only from within the container for which the role is defined. The role exposes temporary credentials and these are rotated automatically. Granular IAM policies are supported with optional conditions about source instances, source IP addresses, time of day, and other options.

The source IAM role can be identified in the CloudTrail logs based on a unique Amazon Resource Name and the access permissions can be revoked immediately at any time with the IAM API or console. As Parameter Store supports resource level permissions, a policy can be created to restrict access to specific keys and namespaces.

Dynamic environment association

In many cases, the container image does not change when moving between environments, which supports immutable deployments and ensures that the results are reproducible. What does change is the configuration: in this context, specifically the secrets. For example, a database and its password might be different in the staging and production environments. There’s still the question of how do you point the application to retrieve the correct secret? Should it retrieve prod.app1.secret, test.app1.secret or something else?

One option can be to pass the environment type as an environment variable to the container. The application then concatenates the environment type (prod, test, etc.) with the relative key path and retrieves the relevant secret. In most cases, this leads to a number of separate ECS task definitions.

When you describe the task definition in a CloudFormation template, you could base the entry in the IAM role that provides access to Parameter Store, KMS key, and environment property on a single CloudFormation parameter, such as “environment type.” This approach could support a single task definition type that is based on a generic CloudFormation template.

Walkthrough: Securely access Parameter Store resources with IAM roles for tasks

This walkthrough is configured for the North Virginia region (us-east-1). I recommend using the same region.

Step 1: Create the keys and parameters

First, create the following KMS keys with the default security policy to be used to encrypt various parameters:

  • prod-app1 –used to encrypt any secrets for app1.
  • license-key –used to encrypt license-related secrets.
aws kms create-key --description prod-app1 --region us-east-1
aws kms create-key --description license-code --region us-east-1

Note the KeyId property in the output of both commands. You use it throughout the walkthrough to identify the KMS keys.

The following commands create three parameters in Parameter Store:

  • prod.app1.db-pass (encrypted with the prod-app1 KMS key)
  • general.license-code (encrypted with the license-key KMS key)
  • prod.app2.user-name (stored as a standard string without encryption)
aws ssm put-parameter --name prod.app1.db-pass --value "AAAAAAAAAAA" --type SecureString --key-id "<key-id-for-prod-app1-key>" --region us-east-1
aws ssm put-parameter --name general.license-code --value "CCCCCCCCCCC" --type SecureString --key-id "<key-id-for-license-code-key>" --region us-east-1
aws ssm put-parameter --name prod.app2.user-name --value "BBBBBBBBBBB" --type String --region us-east-1

Step 2: Create the IAM role and policies

Now, create a role and an IAM policy to be associated later with the ECS task that you create later on.
The trust policy for the IAM role needs to allow the ecs-tasks entity to assume the role.

{
   "Version": "2012-10-17",
   "Statement": [
     {
       "Sid": "",
       "Effect": "Allow",
       "Principal": {
         "Service": "ecs-tasks.amazonaws.com"
       },
       "Action": "sts:AssumeRole"
     }
   ]
 }

Save the above policy as a file in the local directory with the name ecs-tasks-trust-policy.json.

aws iam create-role --role-name prod-app1 --assume-role-policy-document file://ecs-tasks-trust-policy.json

The following policy is attached to the role and later associated with the app1 container. Access is granted to the prod.app1.* namespace parameters, the encryption key required to decrypt the prod.app1.db-pass parameter and the license code parameter. The namespace resource permission structure is useful for building various hierarchies (based on environments, applications, etc.).

Make sure to replace <key-id-for-prod-app1-key> with the key ID for the relevant KMS key and <account-id> with your account ID in the following policy.

{
     "Version": "2012-10-17",
     "Statement": [
         {
             "Effect": "Allow",
             "Action": [
                 "ssm:DescribeParameters"
             ],
             "Resource": "*"
         },
         {
             "Sid": "Stmt1482841904000",
             "Effect": "Allow",
             "Action": [
                 "ssm:GetParameters"
             ],
             "Resource": [
                 "arn:aws:ssm:us-east-1:<account-id>:parameter/prod.app1.*",
                 "arn:aws:ssm:us-east-1:<account-id>:parameter/general.license-code"
             ]
         },
         {
             "Sid": "Stmt1482841948000",
             "Effect": "Allow",
             "Action": [
                 "kms:Decrypt"
             ],
             "Resource": [
                 "arn:aws:kms:us-east-1:<account-id>:key/<key-id-for-prod-app1-key>"
             ]
         }
     ]
 }

Save the above policy as a file in the local directory with the name app1-secret-access.json:

aws iam create-policy --policy-name prod-app1 --policy-document file://app1-secret-access.json

Replace <account-id> with your account ID in the following command:

aws iam attach-role-policy --role-name prod-app1 --policy-arn "arn:aws:iam::<account-id>:policy/prod-app1"

Step 3: Add the testing script to an S3 bucket

Create a file with the script below, name it access-test.sh and add it to an S3 bucket in your account. Make sure the object is publicly accessible and note down the object link, for example https://s3-eu-west-1.amazonaws.com/my-new-blog-bucket/access-test.sh

#!/bin/bash
#This is simple bash script that is used to test access to the EC2 Parameter store.
# Install the AWS CLI
apt-get -y install python2.7 curl
curl -O https://bootstrap.pypa.io/get-pip.py
python2.7 get-pip.py
pip install awscli
# Getting region
EC2_AVAIL_ZONE=`curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone`
EC2_REGION="`echo \"$EC2_AVAIL_ZONE\" | sed -e 's:\([0-9][0-9]*\)[a-z]*\$:\\1:'`"
# Trying to retrieve parameters from the EC2 Parameter Store
APP1_WITH_ENCRYPTION=`aws ssm get-parameters --names prod.app1.db-pass --with-decryption --region $EC2_REGION --output text 2>&1`
APP1_WITHOUT_ENCRYPTION=`aws ssm get-parameters --names prod.app1.db-pass --no-with-decryption --region $EC2_REGION --output text 2>&1`
LICENSE_WITH_ENCRYPTION=`aws ssm get-parameters --names general.license-code --with-decryption --region $EC2_REGION --output text 2>&1`
LICENSE_WITHOUT_ENCRYPTION=`aws ssm get-parameters --names general.license-code --no-with-decryption --region $EC2_REGION --output text 2>&1`
APP2_WITHOUT_ENCRYPTION=`aws ssm get-parameters --names prod.app2.user-name --no-with-decryption --region $EC2_REGION --output text 2>&1`
# The nginx server is started after the script is invoked, preparing folder for HTML.
if [ ! -d /usr/share/nginx/html/ ]; then
mkdir -p /usr/share/nginx/html/;
fi
chmod 755 /usr/share/nginx/html/

# Creating an HTML file to be accessed at http://<public-instance-DNS-name>/ecs.html
cat > /usr/share/nginx/html/ecs.html <<EOF
<!DOCTYPE html>
<html>
<head>
<title>App1</title>
<style>
body {padding: 20px;margin: 0 auto;font-family: Tahoma, Verdana, Arial, sans-serif;}
code {white-space: pre-wrap;}
result {background: hsl(220, 80%, 90%);}
</style>
</head>
<body>
<h1>Hi there!</h1>
<p style="padding-bottom: 0.8cm;">Following are the results of different access attempts as expirienced by "App1".</p>

<p><b>Access to prod.app1.db-pass:</b><br/>
<pre><code>aws ssm get-parameters --names prod.app1.db-pass --with-decryption</code><br/>
<code><result>$APP1_WITH_ENCRYPTION</result></code><br/>
<code>aws ssm get-parameters --names prod.app1.db-pass --no-with-decryption</code><br/>
<code><result>$APP1_WITHOUT_ENCRYPTION</result></code></pre><br/>
</p>

<p><b>Access to general.license-code:</b><br/>
<pre><code>aws ssm get-parameters --names general.license-code --with-decryption</code><br/>
<code><result>$LICENSE_WITH_ENCRYPTION</result></code><br/>
<code>aws ssm get-parameters --names general.license-code --no-with-decryption</code><br/>
<code><result>$LICENSE_WITHOUT_ENCRYPTION</result></code></pre><br/>
</p>

<p><b>Access to prod.app2.user-name:</b><br/>
<pre><code>aws ssm get-parameters --names prod.app2.user-name --no-with-decryption</code><br/>
<code><result>$APP2_WITHOUT_ENCRYPTION</result></code><br/>
</p>

<p><em>Thanks for visiting</em></p>
</body>
</html>
EOF

Step 4: Create a test cluster

I recommend creating a new ECS test cluster with the latest ECS AMI and ECS agent on the instance. Use the following field values:

  • Cluster name: access-test
  • EC2 instance type: t2.micro
  • Number of instances: 1
  • Key pair: No EC2 key pair is required, unless you’d like to SSH to the instance and explore the running container.
  • VPC: Choose the default VPC. If unsure, you can find the VPC ID with the IP range 172.31.0.0/16 in the Amazon VPC console.
  • Subnets: Pick a subnet in the default VPC.
  • Security group: Create a new security group with CIDR block 0.0.0.0/0 and port 80 for inbound access.

Leave other fields with the default settings.

Create a simple task definition that relies on the public NGINX container and the role that you created for app1. Specify the properties such as the available container resources and port mappings. Note the command option is used to download and invoke a test script that installs the AWS CLI on the container, runs a number of get-parameter commands, and creates an HTML file with the results.

Replace <account-id> with your account ID, <your-S3-URI> with a link to the S3 object created in step 3 in the following commands:

aws ecs register-task-definition --family access-test --task-role-arn "arn:aws:iam::<account-id>:role/prod-app1" --container-definitions name="access-test",image="nginx",portMappings="[{containerPort=80,hostPort=80,protocol=tcp}]",readonlyRootFilesystem=false,cpu=512,memory=490,essential=true,entryPoint="sh,-c",command="\"/bin/sh -c \\\"apt-get update ; apt-get -y install curl ; curl -O <your-S3-URI> ; chmod +x access-test.sh ; ./access-test.sh ; nginx -g 'daemon off;'\\\"\"" --region us-east-1

aws ecs run-task --cluster access-test --task-definition access-test --count 1 --region us-east-1

Verifying access

After the task is in a running state, check the public DNS name of the instance and navigate to the following page:

http://<ec2-instance-public-DNS-name>/ecs.html

You should see the results of running different access tests from the container after a short duration.

If the test results don’t appear immediately, wait a few seconds and refresh the page.
Make sure that inbound traffic for port 80 is allowed on the security group attached to the instance.

The results you see in the static results HTML page should be the same as running the following commands from the container.

prod.app1.key1

aws ssm get-parameters --names prod.app1.db-pass --with-decryption --region us-east-1
aws ssm get-parameters --names prod.app1.db-pass --no-with-decryption --region us-east-1

Both commands should work, as the policy provides access to both the parameter and the required KMS key.

general.license-code

aws ssm get-parameters --names general.license-code --no-with-decryption --region us-east-1
aws ssm get-parameters --names general.license-code --with-decryption --region us-east-1

Only the first command with the “no-with-decryption” parameter should work. The policy allows access to the parameter in Parameter Store but there’s no access to the KMS key. The second command should fail with an access denied error.

prod.app2.user-name

aws ssm get-parameters --names prod.app2.user-name –no-with-decryption --region us-east-1

The command should fail with an access denied error, as there are no permissions associated with the namespace for prod.app2.

Finishing up

Remember to delete all resources (such as the KMS keys and EC2 instance), so that you don’t incur charges.

Conclusion

Central secret management is an important aspect of securing containerized environments. By using Parameter Store and task IAM roles, customers can create a central secret management store and a well-integrated access layer that allows applications to access only the keys they need, to restrict access on a container basis, and to further encrypt secrets with custom keys with KMS.

Whether the secret management layer is implemented with Parameter Store, Amazon S3, Amazon DynamoDB, or a solution such as Vault or KeyWhiz, it’s a vital part to the process of managing and accessing secrets.

How to Detect and Automatically Remediate Unintended Permissions in Amazon S3 Object ACLs with CloudWatch Events

Post Syndicated from Mustafa Torun original https://aws.amazon.com/blogs/security/how-to-detect-and-automatically-remediate-unintended-permissions-in-amazon-s3-object-acls-with-cloudwatch-events/

Amazon S3 Access Control Lists (ACLs) enable you to specify permissions that grant access to S3 buckets and objects. When S3 receives a request for an object, it verifies whether the requester has the necessary access permissions in the associated ACL. For example, you could set up an ACL for an object so that only the users in your account can access it, or you could make an object public so that it can be accessed by anyone.

If the number of objects and users in your AWS account is large, ensuring that you have attached correctly configured ACLs to your objects can be a challenge. For example, what if a user were to call the PutObjectAcl API call on an object that is supposed to be private and make it public? Or, what if a user were to call the PutObject with the optional Acl parameter set to public-read, therefore uploading a confidential file as publicly readable? In this blog post, I show a solution that uses Amazon CloudWatch Events to detect PutObject and PutObjectAcl API calls in near real time and helps ensure that the objects remain private by making automatic PutObjectAcl calls, when necessary.

Note that this process is a reactive approach, a complement to the proactive approach in which you would use the AWS Identity and Access Management (IAM) policy conditions to force your users to put objects with private access (see Specifying Conditions in a Policy for more information). The reactive approach I present in this post is for “just in case” situations in which the change on the ACL is accidental and must be fixed.

Solution overview

The following diagram illustrates this post’s solution:

  1. An IAM or root user in your account makes a PutObjectAcl or PutObject call.
  2. S3 sends the corresponding API call event to both AWS CloudTrail and CloudWatch Events in near real time.
  3. A CloudWatch Events rule delivers the event to an AWS Lambda function.
  4. If the object is in a bucket in which all the objects need to be private and the object is not private anymore, the Lambda function makes a PutObjectAcl call to S3 to make the object private.

Solution diagram

To detect the PutObjectAcl call and modify the ACL on the object, I:

  1. Turn on object-level logging in CloudTrail for the buckets I want to monitor.
  2. Create an IAM execution role to be used when the Lambda function is being executed so that Lambda can make API calls to S3 on my behalf.
  3. Create a Lambda function that receives the PutObjectAcl API call event, checks whether the call is for a monitored bucket, and, if so, ensures the object is private.
  4. Create a CloudWatch Events rule that matches the PutObjectAcl API call event and invokes the Lambda function created in the previous step.

The remainder of this blog post details the steps of this solution’s deployment and the testing of its setup.

Deploying the solution

In this section, I follow the four solution steps outlined in the previous section to use CloudWatch Events to detect and fix unintended access permissions in S3 object ACLs automatically. I start with turning on object-level logging in CloudTrail for the buckets of interest.

I use the AWS CLI in this section. (To learn more about setting up the AWS CLI, see Getting Set Up with the AWS Command Line Interface.) Before you start, make sure your installed AWS CLI is up to date (specifically, you must use version 1.11.28 or newer). You can check the version of your CLI as follows.

$ aws --version

I run the following AWS CLI command to create an S3 bucket named everything-must-be-private. Remember to replace the placeholder bucket names with your own bucket names, in this instance and throughout the post (bucket names are global in S3).

$ aws s3api create-bucket \
--bucket everything-must-be-private

As the bucket name suggests, I want all the files in this bucket to be private. In the rest of this section, I detail above four steps for deploying the solution.

Step 1: Turn on object-level logging in CloudTrail for the S3 bucket

In this step, I create a CloudTrail trail and turn on object-level logging for the bucket, everything-must-be-private. My goal is to achieve the following setup, which are also illustrated in the following diagram:

  1. A requester makes an API call against this bucket.
  2. I let a CloudTrail trail named my-object-level-s3-trail receive the corresponding events.
  3. I log the events to a bucket named bucket-for-my-object-level-s3-trail and deliver them to CloudWatch Events.

Diagram2-012417-MT

First, I create a file named bucket_policy.json and populate it with the following policy. Remember to replace the placeholder account ID with your own account ID, in this instance and throughout the post.

{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {
            "Service": "cloudtrail.amazonaws.com"
        },
        "Action": "s3:GetBucketAcl",
        "Resource": "arn:aws:s3:::bucket-for-my-object-level-s3-trail"
    }, {
        "Effect": "Allow",
        "Principal": {
            "Service": "cloudtrail.amazonaws.com"
        },
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::bucket-for-my-object-level-s3-trail/AWSLogs/123456789012/*",
        "Condition": {
            "StringEquals": {
                "s3:x-amz-acl": "bucket-owner-full-control"
            }
        }
    }]
}

Then, I run the following commands to create the bucket for logging with the preceding policy so that CloudTrail can deliver log files to the bucket.

$ aws s3api create-bucket \
--bucket bucket-for-my-object-level-s3-trail

$ aws s3api put-bucket-policy \
--bucket bucket-for-my-object-level-s3-trail \
--policy file://bucket_policy.json

Next, I create a trail and start logging on the trail.

$ aws cloudtrail create-trail \
--name my-object-level-s3-trail \
--s3-bucket-name bucket-for-my-object-level-s3-trail

$ aws cloudtrail start-logging \
--name my-object-level-s3-trail

I then create a file named my_event_selectors.json and populate it with the following content.

[{
        "IncludeManagementEvents": false,
        "DataResources": [{
            "Values": [
                 "arn:aws:s3:::everything-must-be-private/"
            ],
            "Type": "AWS::S3::Object"
        }],
        "ReadWriteType": "All"
}]

By default, S3 object-level operations are not logged in CloudTrail. Only bucket-level operations are logged. As a result, I finish my trail setup by creating the event selector shown previously to have the object-level operations logged in those two buckets. Note that I explicitly set IncludeManagementEvents to false because I want only object-level operations to be logged.

$ aws cloudtrail put-event-selectors \
--trail-name my-object-level-s3-trail \
--event-selectors file://my_event_selectors.json

Step 2: Create the IAM execution role for the Lambda function

In this step, I create an IAM execution role for my Lambda function. The role allows Lambda to perform S3 actions on my behalf while the function is being executed. The role also allows CreateLogGroup, CreateLogStream, and PutLogEvents CloudWatch Logs APIs so that the Lambda function can write logs for debugging.

I start with putting the following trust policy document in a file named trust_policy.json.

{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Service": "lambda.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }]
}

Next, I run the following command to create the IAM execution role.

$ aws iam create-role \
--role-name AllowLogsAndS3ACL \
--assume-role-policy-document file://trust_policy.json

I continue by putting the following access policy document in a file named access_policy.json.

{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "s3:GetObjectAcl",
                "s3:PutObjectAcl"
            ],
            "Resource": "arn:aws:s3:::everything-must-be-private/*"
        }, {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        }]
}

Finally, I run the following command to define the access policy for the IAM execution role I have just created.

$ aws iam put-role-policy \
--role-name AllowLogsAndS3ACL \
--policy-name AllowLogsAndS3ACL \
--policy-document file://access_policy.json

Step 3: Create a Lambda function that processes the PutObjectAcl API call event

In this step, I create the Lambda function that processes the event. It also decides whether the ACL on the object needs to be changed, and, if so, makes a PutObjectAcl call to make it private. I start with adding the following code to a file named lambda_function.py.

from __future__ import print_function

import json
import boto3

print('Loading function')

s3 = boto3.client('s3')

bucket_of_interest = "everything-must-be-private"

# For a PutObjectAcl API Event, gets the bucket and key name from the event
# If the object is not private, then it makes the object private by making a
# PutObjectAcl call.
def lambda_handler(event, context):
    # Get bucket name from the event
    bucket = event['detail']['requestParameters']['bucketName']
    if (bucket != bucket_of_interest):
        print("Doing nothing for bucket = " + bucket)
        return
    
    # Get key name from the event
    key = event['detail']['requestParameters']['key']
    
    # If object is not private then make it private
    if not (is_private(bucket, key)):
        print("Object with key=" + key + " in bucket=" + bucket + " is not private!")
        make_private(bucket, key)
    else:
        print("Object with key=" + key + " in bucket=" + bucket + " is already private.")
    
# Checks an object with given bucket and key is private
def is_private(bucket, key):
    # Get the object ACL from S3
    acl = s3.get_object_acl(Bucket=bucket, Key=key)
    
    # Private object should have only one grant which is the owner of the object
    if (len(acl['Grants']) > 1):
        return False
    
    # If canonical owner and grantee ids do no match, then conclude that the object
    # is not private
    owner_id = acl['Owner']['ID']
    grantee_id = acl['Grants'][0]['Grantee']['ID']
    if (owner_id != grantee_id):
        return False
    return True

# Makes an object with given bucket and key private by calling the PutObjectAcl API.
def make_private(bucket, key):
    s3.put_object_acl(Bucket=bucket, Key=key, ACL="private")
    print("Object with key=" + key + " in bucket=" + bucket + " is marked as private.")

Next, I zip the lambda_function.py file into an archive named CheckAndCorrectObjectACL.zip and run the following command to create the Lambda function.

$ aws lambda create-function \
--function-name CheckAndCorrectObjectACL \
--zip-file fileb://CheckAndCorrectObjectACL.zip \
--role arn:aws:iam::123456789012:role/AllowLogsAndS3ACL \
--handler lambda_function.lambda_handler \
--runtime python2.7

Finally, I run the following command to allow CloudWatch Events to invoke my Lambda function for me.

$ aws lambda add-permission \
--function-name CheckAndCorrectObjectACL \
--statement-id AllowCloudWatchEventsToInvoke \
--action 'lambda:InvokeFunction' \
--principal events.amazonaws.com \
--source-arn arn:aws:events:us-east-1:123456789012:rule/S3ObjectACLAutoRemediate

Step 4: Create the CloudWatch Events rule

Now, I create the CloudWatch Events rule that is triggered when an event is received from S3. I start with defining the event pattern for my rule. I create a file named event_pattern.json and populate it with the following code.

{
	"detail-type": [
		"AWS API Call via CloudTrail"
	],
	"detail": {
		"eventSource": [
			"s3.amazonaws.com"
		],
		"eventName": [
			"PutObjectAcl",
			"PutObject"
		],
		"requestParameters": {
			"bucketName": [
				"everything-must-be-private"
			]
		}
	}
}

With this event pattern, I am configuring my rule so that it is triggered only when a PutObjectAcl or a PutObject API call event from S3 via CloudTrail is delivered for the objects in the bucket I choose. I finish my setup by running the following commands to create the CloudWatch Events rule and adding to it as a target the Lambda function I created in the previous step.

$ aws events put-rule \
--name S3ObjectACLAutoRemediate \
--event-pattern file://event_pattern.json

$ aws events put-targets \
--rule S3ObjectACLAutoRemediate \
--targets Id=1,Arn=arn:aws:lambda:us-east-1:123456789012:function:CheckAndCorrectObjectACL

Test the setup

From now on, whenever a user in my account makes a PutObjectAcl call against the bucket everything-must-be-private, S3 will deliver the corresponding event to CloudWatch Events via CloudTrail. The event must match the CloudWatch Events rule in order to be delivered to the Lambda function. Finally, the function checks if the object ACL is expected. If not, the function makes the object private.

For testing the setup, I create an empty file named MyCreditInfo in the bucket everything-must-be-private, and I check its ACL.

$ aws s3api put-object \
--bucket everything-must-be-private \
--key MyCreditInfo

$ aws s3api get-object-acl \
--bucket everything-must-be-private \
--key MyCreditInfo

In the response to my command, I see only one grantee, which is the owner (me). This means the object is private. Now, I add public read access to this object (which is supposed to stay private).

$ aws s3api put-object-acl \
--bucket everything-must-be-private \
--key MyCreditInfo \
--acl public-read

If I act quickly and describe the ACL on the object again by calling the GetObjectAcl API, I see another grantee that allows everybody to read this object.

{
	"Grantee": {
		"Type": "Group",
		"URI": "http://acs.amazonaws.com/groups/global/AllUsers"
	},
	"Permission": "READ"
}

When I describe the ACL again, I see that the grantee for public read access has been removed. Therefore, the file is private again, as it should be. You can also test the PutObject API call by putting another object in this bucket with public read access.

$ aws s3api put-object \
--bucket everything-must-be-private \
--key MyDNASequence \
--acl public-read

Conclusion

In this post, I showed how you can detect unintended public access permissions in the ACL of an S3 object and how to revoke them automatically with the help of CloudWatch Events. Keep in mind that object-level S3 API call events and Lambda functions are only a small set of the events and targets that are available in CloudWatch Events. To learn more, see Using CloudWatch Events.

If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about this post or how to implement the solution described, please start a new thread on the CloudWatch forum.

– Mustafa

SAML Identity Federation: Follow-Up Questions, Materials, Guides, and Templates from an AWS re:Invent 2016 Workshop (SEC306)

Post Syndicated from Quint Van Deman original https://aws.amazon.com/blogs/security/saml-identity-federation-follow-up-questions-materials-guides-and-templates-from-an-aws-reinvent-2016-workshop-sec306/

As part of the re:Source Mini Con for Security Services at AWS re:Invent 2016, we conducted a workshop focused on Security Assertion Markup Language (SAML) identity federation: Choose Your Own SAML Adventure: A Self-Directed Journey to AWS Identity Federation Mastery. As part of this workshop, attendees were able to submit their own federation-focused questions to a panel of AWS experts. In this post, I share the questions and answers from that workshop because this information can benefit any AWS customer interested in identity federation.

I have also made available the full set of workshop materials, lab guides, and AWS CloudFormation templates. I encourage you to use these materials to enrich your exploration of SAML for use with AWS.

Q: SAML assertions are limited to 50,000 characters. We often hit this limit by being in too many groups. What can AWS do to resolve this size-limit problem?

A: Because the SAML assertion is ultimately part of an API call, an upper bound must be in place for the assertion size.

On the AWS side, your AWS solution architect can log a feature request on your behalf to increase the maximum size of the assertion in a future release. The AWS service teams use these feature requests, in conjunction with other avenues of customer feedback, to plan and prioritize the features they deliver. To facilitate this process you need two things: the proposed higher value to which you’d like to see the maximum size raised, and a short written description that would help us understand what this increased limit would enable you to do.

On the AWS customer’s side, we often find that these cases are most relevant to centralized cloud teams that have broad, persistent access across many roles and accounts. This access is often necessary to support troubleshooting or simply as part of an individual’s job function. However, in many cases, exchanging persistent access for just-in-time access enables the same level of access but with better levels of visibility, reduced blast radius, and better adherence to the principle of least privilege. For example, you might implement a fast, efficient, and monitored workflow that allows you to provision a user into the necessary backend directory group for a short duration when needed in lieu of that user maintaining all of those group memberships on a persistent basis. This approach could effectively resolve the limit issue you are facing.

Q: Can we use OpenID Connect (OIDC) for federated authentication and authorization into the AWS Management Console? If so, does it have a similar size limit?

A: Currently, AWS support for OIDC is oriented around providing access to AWS resources from mobile or web applications, not access to the AWS Management Console. This is possible to do, but it requires the construction of a custom identity broker. In this solution, this broker would consume the OIDC identity, use its own logic to authenticate and authorize the user (thus being subject only to any size limits you enforce on the OIDC side), and use the sts:GetFederatedToken call to vend the user an AWS Security Token Service (STS) token for either AWS Management Console use or API/CLI use. During this sts:GetFederatedToken call, you attach a scoping policy with a limit of 2,048 characters. See Creating a URL that Enables Federated Users to Access the AWS Management Console (Custom Federation Broker) for additional details about custom identity brokers.

Q: We want to eliminate permanent AWS Identity and Access Management (IAM) access keys, but we cannot do so because of third-party tools. We are contemplating using HashiCorp Vault to vend permanent keys. Vault lets us tie keys to LDAP identities. Have you seen this work elsewhere? Do you think it will work for us?

A: For third-party tools that can run within Amazon EC2, you should use EC2 instance profiles to eliminate long-term credentials and their associated management (distribution, rotation, etc.). For third-party tools that cannot run within EC2, most customers opt to leverage their existing secrets-management tools and processes for the long-lived keys. These tools are often enhanced to make use of AWS APIs such as iam:GetCredentialReport (rotation information) and iam:{Create,Update,Delete}AccessKey (rotation operations). HashiCorp Vault is a popular tool with an available AWS Quick Start Reference Deployment, but any secrets management platform that is able to efficiently fingerprint the authorized resources and is extensible to work with the previously mentioned APIs will fill this need for you nicely.

Q: Currently, we use an Object Graph Navigation Library (OGNL) script in our identity provider (IdP) to build role Amazon Resource Names (ARNs) for the role attribute in the SAML assertion. The script consumes a list of distribution list display names from our identity management platform of which the user is a member. There is a 60-character limit on display names, which leaves no room for IAM pathing (which has a 512-character limit). We are contemplating a change. The proposed solution would make AWS API calls to get role ARNs from the AWS APIs. Have you seen this before? Do you think it will work? Does the AWS SAML integration support full-length role ARNs that would include up to 512 characters for the IAM path?

A: In most cases, we recommend that you use regular expression-based transformations within your IdP to translate a list of group names to a list of role ARNs for inclusion in the SAML assertion. Without pathing, you need to know the 12-digit AWS account number and the role name in order to be able to do so, which is accomplished using our recommended group-naming convention (AWS-Acct#-RoleName). With pathing (because “/” is not a valid character for group names within most directories), you need an additional source from which to pull this third data element. This could be as simple as an extra dimension in the group name (such as AWS-Acct#-Path-RoleName); however, that would multiply the number of groups required to support the solution. Instead, you would most likely derive the path element from a user attribute, dynamic group information, or even an external information store. It should work as long as you can reliably determine all three data elements for the user.

We do not recommend drawing the path information from AWS APIs, because the logic within the IdP is authoritative for the user’s authorization. In other words, the IdP should know for which full path role ARNs the user is authorized without asking AWS. You might consider using the AWS APIs to validate that the role actually exists, but that should really be an edge case. This is because you should integrate any automation that builds and provisions the roles with the frontend authorization layer. This way, there would never be a case in which the IdP authorizes a user for a role that doesn’t exist. AWS SAML integration supports full-length role ARNs.

Q: Why do AWS STS tokens contain a session token? This makes them incompatible with third-party tools that only support permanent keys. Is there a way to get rid of the session token to make temporary keys that contain only the access_key and the secret_key components?

A: The session token contains information that AWS uses to confirm that the AWS STS token is valid. There is no way to create a token that does not contain this third component. Instead, the preferred AWS mechanisms for distributing AWS credentials to third-party tools are EC2 instance profiles (in EC2), IAM cross-account trust (SaaS), or IAM access keys with secrets management (on-premises). Using IAM access keys, you can rotate the access keys as often as the third-party tool and your secrets management platform allow.

Q: How can we use SAML to authenticate and authorize code instead of having humans do the work? One proposed solution is to use our identity management (IDM) platform to generate X.509 certificates for identities, and then present these certificates to our IdP in order to get valid SAML assertions. This could then be included in an sts:AssumeRoleWithSAML call. Have you seen this working before? Do you think it will work for us?

A: Yes, when you receive a SAML assertion from your IdP by using your desired credential form (user name/password, X.509, etc.), you can use the sts:AssumeRoleWithSAML call to retrieve an AWS STS token. See How to Implement a General Solution for Federated API/CLI Access Using SAML 2.0 for a reference implementation.

Q: As a follow-up to the previous question, how can we get code using multi-factor authentication (MFA)? There is a gauth project that uses NodeJS to generate virtual MFAs. Code could theoretically get MFA codes from the NodeJS gauth server.

A: The answer depends on your choice of IdP and MFA provider. Generically speaking, you need to authenticate (either web-based or code-based) to the IdP using all of your desired factors before the SAML assertion is generated. This assertion can then include details of the authentication mechanism used as an additional attribute in role-assumption conditions within the trust policy in AWS. This lab guide from the workshop provides further details and a how-to guide for MFA-for-SAML.

This blog post clarifies some re:Invent 2016 attendees’ questions about SAML-based federation with AWS. For more information presented in the workshop, see the full set of workshop materials, lab guides, and CloudFormation templates. If you have follow-up questions, start a new thread in the IAM forum.

– Quint