Tag Archives: AWS Identity and Access Management (IAM)

Amazon Elasticsearch Service now supports VPC

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-elasticsearch-service-now-supports-vpc/

Starting today, you can connect to your Amazon Elasticsearch Service domains from within an Amazon VPC without the need for NAT instances or Internet gateways. VPC support for Amazon ES is easy to configure, reliable, and offers an extra layer of security. With VPC support, traffic between other services and Amazon ES stays entirely within the AWS network, isolated from the public Internet. You can manage network access using existing VPC security groups, and you can use AWS Identity and Access Management (IAM) policies for additional protection. VPC support for Amazon ES domains is available at no additional charge.

Getting Started

Creating an Amazon Elasticsearch Service domain in your VPC is easy. Follow all the steps you would normally follow to create your cluster and then select “VPC access”.

That’s it. There are no additional steps. You can now access your domain from within your VPC!

Things To Know

To support VPCs, Amazon ES places an endpoint into at least one subnet of your VPC. Amazon ES places an Elastic Network Interface (ENI) into the VPC for each data node in the cluster. Each ENI uses a private IP address from the IPv4 range of your subnet and receives a public DNS hostname. If you enable zone awareness, Amazon ES creates endpoints in two subnets in different availability zones, which provides greater data durability.

You need to set aside three times the number of IP addresses as the number of nodes in your cluster. You can divide that number by two if Zone Awareness is enabled. Ideally, you would create separate subnets just for Amazon ES.

A few notes:

  • Currently, you cannot move existing domains to a VPC or vice-versa. To take advantage of VPC support, you must create a new domain and migrate your data.
  • Currently, Amazon ES does not support Amazon Kinesis Firehose integration for domains inside a VPC.

To learn more, see the Amazon ES documentation.

Randall

How to Automatically Revert and Receive Notifications About Changes to Your Amazon VPC Security Groups

Post Syndicated from Rob Barnes original https://aws.amazon.com/blogs/security/how-to-automatically-revert-and-receive-notifications-about-changes-to-your-amazon-vpc-security-groups/

In a previous AWS Security Blog post, Jeff Levine showed how you can monitor changes to your Amazon EC2 security groups. The methods he describes in that post are examples of detective controls, which can help you determine when changes are made to security controls on your AWS resources.

In this post, I take that approach a step further by introducing an example of a responsive control, which you can use to automatically respond to a detected security event by applying a chosen security mitigation. I demonstrate a solution that continuously monitors changes made to an Amazon VPC security group, and if a new ingress rule (the same as an inbound rule) is added to that security group, the solution removes the rule and then sends you a notification after the changes have been automatically reverted.

The scenario

Let’s say you want to reduce your infrastructure complexity by replacing your Secure Shell (SSH) bastion hosts with Amazon EC2 Systems Manager (SSM). SSM allows you to run commands on your hosts remotely, removing the need to manage bastion hosts or rely on SSH to execute commands. To support this objective, you must prevent your staff members from opening SSH ports to your web server’s Amazon VPC security group. If one of your staff members does modify the VPC security group to allow SSH access, you want the change to be automatically reverted and then receive a notification that the change to the security group was automatically reverted. If you are not yet familiar with security groups, see Security Groups for Your VPC before reading the rest of this post.

Solution overview

This solution begins with a directive control to mandate that no web server should be accessible using SSH. The directive control is enforced using a preventive control, which is implemented using a security group rule that prevents ingress from port 22 (typically used for SSH). The detective control is a “listener” that identifies any changes made to your security group. Finally, the responsive control reverts changes made to the security group and then sends a notification of this security mitigation.

The detective control, in this case, is an Amazon CloudWatch event that detects changes to your security group and triggers the responsive control, which in this case is an AWS Lambda function. I use AWS CloudFormation to simplify the deployment.

The following diagram shows the architecture of this solution.

Solution architecture diagram

Here is how the process works:

  1. Someone on your staff adds a new ingress rule to your security group.
  2. A CloudWatch event that continually monitors changes to your security groups detects the new ingress rule and invokes a designated Lambda function (with Lambda, you can run code without provisioning or managing servers).
  3. The Lambda function evaluates the event to determine whether you are monitoring this security group and reverts the new security group ingress rule.
  4. Finally, the Lambda function sends you an email to let you know what the change was, who made it, and that the change was reverted.

Deploy the solution by using CloudFormation

In this section, you will click the Launch Stack button shown below to launch the CloudFormation stack and deploy the solution.

Prerequisites

  • You must have AWS CloudTrail already enabled in the AWS Region where you will be deploying the solution. CloudTrail lets you log, continuously monitor, and retain events related to API calls across your AWS infrastructure. See Getting Started with CloudTrail for more information.
  • You must have a default VPC in the region in which you will be deploying the solution. AWS accounts have one default VPC per AWS Region. If you’ve deleted your VPC, see Creating a Default VPC to recreate it.

Resources that this solution creates

When you launch the CloudFormation stack, it creates the following resources:

  • A sample VPC security group in your default VPC, which is used as the target for reverting ingress rule changes.
  • A CloudWatch event rule that monitors changes to your AWS infrastructure.
  • A Lambda function that reverts changes to the security group and sends you email notifications.
  • A permission that allows CloudWatch to invoke your Lambda function.
  • An AWS Identity and Access Management (IAM) role with limited privileges that the Lambda function assumes when it is executed.
  • An Amazon SNS topic to which the Lambda function publishes notifications.

Launch the CloudFormation stack

The link in this section uses the us-east-1 Region (the US East [N. Virginia] Region). Change the region if you want to use this solution in a different region. See Selecting a Region for more information about changing the region.

To deploy the solution, click the following Launch Stack button to launch the stack. After you click the button, you must sign in to the AWS Management Console if you have not already done so.

Click this "Launch Stack" button

Then:

  1. Choose Next to proceed to the Specify Details page.
  2. On the Specify Details page, type your email address in the Send notifications to box. This is the email address to which change notifications will be sent. (After the stack is launched, you will receive a confirmation email that you must accept before you can receive notifications.)
  3. Choose Next until you get to the Review page, and then choose the I acknowledge that AWS CloudFormation might create IAM resources check box. This confirms that you are aware that the CloudFormation template includes an IAM resource.
  4. Choose Create. CloudFormation displays the stack status, CREATE_COMPLETE, when the stack has launched completely, which should take less than two minutes.Screenshot showing that the stack has launched completely

Testing the solution

  1. Check your email for the SNS confirmation email. You must confirm this subscription to receive future notification emails. If you don’t confirm the subscription, your security group ingress rules still will be automatically reverted, but you will not receive notification emails.
  2. Navigate to the EC2 console and choose Security Groups in the navigation pane.
  3. Choose the security group created by CloudFormation. Its name is Web Server Security Group.
  4. Choose the Inbound tab in the bottom pane of the page. Note that only one rule allows HTTPS ingress on port 443 from 0.0.0.0/0 (from anywhere).Screenshot showing the "Inbound" tab in the bottom pane of the page
  1. Choose Edit to display the Edit inbound rules dialog box (again, an inbound rule and an ingress rule are the same thing).
  2. Choose Add Rule.
  3. Choose SSH from the Type drop-down list.
  4. Choose My IP from the Source drop-down list. Your IP address is populated for you. By adding this rule, you are simulating one of your staff members violating your organization’s policy (in this blog post’s hypothetical example) against allowing SSH access to your EC2 servers. You are testing the solution created when you launched the CloudFormation stack in the previous section. The solution should remove this newly created SSH rule automatically.
    Screenshot of editing inbound rules
  5. Choose Save.

Adding this rule creates an EC2 AuthorizeSecurityGroupIngress service event, which triggers the Lambda function created in the CloudFormation stack. After a few moments, choose the refresh button ( The "refresh" icon ) to see that the new SSH ingress rule that you just created has been removed by the solution you deployed earlier with the CloudFormation stack. If the rule is still there, wait a few more moments and choose the refresh button again.

Screenshot of refreshing the page to see that the SSH ingress rule has been removed

You should also receive an email to notify you that the ingress rule was added and subsequently reverted.

Screenshot of the notification email

Cleaning up

If you want to remove the resources created by this CloudFormation stack, you can delete the CloudFormation stack:

  1. Navigate to the CloudFormation console.
  2. Choose the stack that you created earlier.
  3. Choose the Actions drop-down list.
  4. Choose Delete Stack, and then choose Yes, Delete.
  5. CloudFormation will display a status of DELETE_IN_PROGRESS while it deletes the resources created with the stack. After a few moments, the stack should no longer appear in the list of completed stacks.
    Screenshot of stack "DELETE_IN_PROGRESS"

Other applications of this solution

I have shown one way to use multiple AWS services to help continuously ensure that your security controls haven’t deviated from your security baseline. However, you also could use the CIS Amazon Web Services Foundations Benchmarks, for example, to establish a governance baseline across your AWS accounts and then use the principles in this blog post to automatically mitigate changes to that baseline.

To scale this solution, you can create a framework that uses resource tags to identify particular resources for monitoring. You also can use a consolidated monitoring approach by using cross-account event delivery. See Sending and Receiving Events Between AWS Accounts for more information. You also can extend the principle of automatic mitigation to detect and revert changes to other resources such as IAM policies and Amazon S3 bucket policies.

Summary

In this blog post, I demonstrated how you can automatically revert changes to a VPC security group and have a notification sent about the changes. You can use this solution in your own AWS accounts to enforce your security requirements continuously.

If you have comments about this blog post or other ideas for ways to use this solution, submit a comment in the “Comments” section below. If you have implementation questions, start a new thread in the EC2 forum or contact AWS Support.

– Rob

Join Us for AWS IAM Day on Monday, October 16, in New York City

Post Syndicated from Craig Liebendorfer original https://aws.amazon.com/blogs/security/join-us-for-aws-iam-day-on-monday-october-16-in-new-york-city/

Join us in New York City at the AWS Pop-up Loft for AWS IAM Day on Monday, October 16, from 9:30 A.M.–4:15 P.M. Eastern Time. At this free technical event, you will learn AWS Identity and Access Management (IAM) concepts from IAM product managers, as well as tools and strategies you can use for controlling access to your AWS environment, such as the IAM policy language and IAM best practices. You also will take an IAM policy ninja dive deep into permissions and how to use IAM roles to delegate access to your AWS resources. Last, you will learn how to integrate Active Directory with AWS workloads.

You can attend one session or stay for the full day.

Learn more about the available sessions and register!

– Craig

Join Us for AWS IAM Day on Monday, October 9, in San Francisco

Post Syndicated from Craig Liebendorfer original https://aws.amazon.com/blogs/security/join-us-for-aws-iam-day-on-monday-october-9-in-san-francisco/

Join us in San Francisco at the AWS Pop-up Loft for AWS IAM Day on Monday, October 9, from 9:30 A.M.–4:15 P.M. Pacific Time. At this free technical event, you will learn AWS Identity and Access Management (IAM) concepts from IAM product managers, as well as tools and strategies you can use for controlling access to your AWS environment, such as the IAM policy language and IAM best practices. You also will take an IAM policy ninja dive deep into permissions and how to use IAM roles to delegate access to your AWS resources. Last, you will learn how to integrate Active Directory with AWS workloads.

You can attend one session or stay for the full day.

Learn more about the available sessions and register!

– Craig

Now Use AWS IAM to Delete a Service-Linked Role When You No Longer Require an AWS Service to Perform Actions on Your Behalf

Post Syndicated from Ujjwal Pugalia original https://aws.amazon.com/blogs/security/now-use-aws-iam-to-delete-a-service-linked-role-when-you-no-longer-require-an-aws-service-to-perform-actions-on-your-behalf/

Earlier this year, AWS Identity and Access Management (IAM) introduced service-linked roles, which provide you an easy and secure way to delegate permissions to AWS services. Each service-linked role delegates permissions to an AWS service, which is called its linked service. Service-linked roles help with monitoring and auditing requirements by providing a transparent way to understand all actions performed on your behalf because AWS CloudTrail logs all actions performed by the linked service using service-linked roles. For information about which services support service-linked roles, see AWS Services That Work with IAM. Over time, more AWS services will support service-linked roles.

Today, IAM added support for the deletion of service-linked roles through the IAM console and the IAM API/CLI. This means you now can revoke permissions from the linked service to create and manage AWS resources in your account. When you delete a service-linked role, the linked service no longer has the permissions to perform actions on your behalf. To ensure your AWS services continue to function as expected when you delete a service-linked role, IAM validates that you no longer have resources that require the service-linked role to function properly. This prevents you from inadvertently revoking permissions required by an AWS service to manage your existing AWS resources and helps you maintain your resources in a consistent state. If there are any resources in your account that require the service-linked role, you will receive an error when you attempt to delete the service-linked role, and the service-linked role will remain in your account. If you do not have any resources that require the service-linked role, you can delete the service-linked role and IAM will remove the service-linked role from your account.

In this blog post, I show how to delete a service-linked role by using the IAM console. To learn more about how to delete service-linked roles by using the IAM API/CLI, see the DeleteServiceLinkedRole API documentation.

Note: The IAM console does not currently support service-linked role deletion for Amazon Lex, but you can delete your service-linked role by using the Amazon Lex console. To learn more, see Service Permissions.

How to delete a service-linked role by using the IAM console

If you no longer need to use an AWS service that uses a service-linked role, you can remove permissions from that service by deleting the service-linked role through the IAM console. To delete a service-linked role, you must have permissions for the iam:DeleteServiceLinkedRole action. For example, the following IAM policy grants the permission to delete service-linked roles used by Amazon Redshift. To learn more about working with IAM policies, see Working with Policies.

{ 
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDeletionOfServiceLinkedRolesForRedshift",
            "Effect": "Allow",
            "Action": ["iam:DeleteServiceLinkedRole"],
            "Resource": ["arn:aws:iam::*:role/aws-service-role/redshift.amazonaws.com/AWSServiceRoleForRedshift*"]
	 }
    ]
}

To delete a service-linked role by using the IAM console:

  1. Navigate to the IAM console and choose Roles from the navigation pane.

Screenshot of the Roles page in the IAM console

  1. Choose the service-linked role you want to delete and then choose Delete role. In this example, I choose the  AWSServiceRoleForRedshift service-linked role.

Screenshot of the AWSServiceRoleForRedshift service-linked role

  1. A dialog box asks you to confirm that you want to delete the service-linked role you have chosen. In the Last activity column, you can see when the AWS service last used the service-linked role, which tells you when the linked service last used the service-linked role to perform an action on your behalf. If you want to continue to delete the service-linked role, choose Yes, delete to delete the service-linked role.

Screenshot of the "Delete role" window

  1. IAM then checks whether you have any resources that require the service-linked role you are trying to delete. While IAM checks, you will see the status message, Deletion in progress, below the role name. Screenshot showing "Deletion in progress"
  1. If no resources require the service-linked role, IAM deletes the role from your account and displays a success message on the console.

Screenshot of the success message

  1. If there are AWS resources that require the service-linked role you are trying to delete, you will see the status message, Deletion failed, below the role name.

Screenshot showing the "Deletion failed"

  1. If you choose View details, you will see a message that explains the deletion failed because there are resources that use the service-linked role.
    Screenshot showing details about why the role deletion failed
  2. Choose View Resources to view the Amazon Resource Names (ARNs) of the first five resources that require the service-linked role. You can delete the service-linked role only after you delete all resources that require the service-linked role. In this example, only one resource requires the service-linked role.

Conclusion

Service-linked roles make it easier for you to delegate permissions to AWS services to create and manage AWS resources on your behalf and to understand all actions the service will perform on your behalf. If you no longer need to use an AWS service that uses a service-linked role, you can remove permissions from that service by deleting the service-linked role through the IAM console. However, before you delete a service-linked role, you must delete all the resources associated with that role to ensure that your resources remain in a consistent state.

If you have any questions, submit a comment in the “Comments” section below. If you need help working with service-linked roles, start a new thread on the IAM forum or contact AWS Support.

– Ujjwal

AWS IAM Policy Summaries Now Help You Identify Errors and Correct Permissions in Your IAM Policies

Post Syndicated from Joy Chatterjee original https://aws.amazon.com/blogs/security/iam-policy-summaries-now-help-you-identify-errors-and-correct-permissions-in-your-iam-policies/

In March, we made it easier to view and understand the permissions in your AWS Identity and Access Management (IAM) policies by using IAM policy summaries. Today, we updated policy summaries to help you identify and correct errors in your IAM policies. When you set permissions using IAM policies, for each action you specify, you must match that action to supported resources or conditions. Now, you will see a warning if these policy elements (Actions, Resources, and Conditions) defined in your IAM policy do not match.

When working with policies, you may find that although the policy has valid JSON syntax, it does not grant or deny the desired permissions because the Action element does not have an applicable Resource element or Condition element defined in the policy. For example, you may want to create a policy that allows users to view a specific Amazon EC2 instance. To do this, you create a policy that specifies ec2:DescribeInstances for the Action element and the Amazon Resource Name (ARN) of the instance for the Resource element. When testing this policy, you find AWS denies this access because ec2:DescribeInstances does not support resource-level permissions and requires access to list all instances. Therefore, to grant access to this Action element, you need to specify a wildcard (*) in the Resource element of your policy for this Action element in order for the policy to function correctly.

To help you identify and correct permissions, you will now see a warning in a policy summary if the policy has either of the following:

  • An action that does not support the resource specified in a policy.
  • An action that does not support the condition specified in a policy.

In this blog post, I walk through two examples of how you can use policy summaries to help identify and correct these types of errors in your IAM policies.

How to use IAM policy summaries to debug your policies

Example 1: An action does not support the resource specified in a policy

Let’s say a human resources (HR) representative, Casey, needs access to the personnel files stored in HR’s Amazon S3 bucket. To do this, I create the following policy to grant all actions that begin with s3:List. In addition, I grant access to s3:GetObject in the Action element of the policy. To ensure that Casey has access only to a specific bucket and not others, I specify the bucket ARN in the Resource element of the policy.

Note: This policy does not grant the desired permissions.

This policy does not work. Do not copy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ThisPolicyDoesNotGrantAllListandGetActions",
            "Effect": "Allow",
            "Action": ["s3:List*",
                       "s3:GetObject"],
            "Resource": ["arn:aws:s3:::HumanResources"]
        }
    ]
}

After I create the policy, HRBucketPermissions, I select this policy from the Policies page to view the policy summary. From here, I check to see if there are any warnings or typos in the policy. I see a warning at the top of the policy detail page because the policy does not grant some permissions specified in the policy, which is caused by a mismatch among the actions, resources, or conditions.

Screenshot showing the warning at the top of the policy

To view more details about the warning, I choose Show remaining so that I can understand why the permissions do not appear in the policy summary. As shown in the following screenshot, I see no access to the services that are not granted by the IAM policy in the policy, which is expected. However, next to S3, I see a warning that one or more S3 actions do not have an applicable resource.

Screenshot showing that one or more S3 actions do not have an applicable resource

To understand why the specific actions do not have a supported resource, I choose S3 from the list of services and choose Show remaining. I type List in the filter to understand why some of the list actions are not granted by the policy. As shown in the following screenshot, I see these warnings:

  • This action does not support resource-level permissions. This means the action does not support resource-level permissions and requires a wildcard (*) in the Resource element of the policy.
  • This action does not have an applicable resource. This means the action supports resource-level permissions, but not the resource type defined in the policy. In this example, I specified an S3 bucket for an action that supports only an S3 object resource type.

From these warnings, I see that s3:ListAllMyBuckets, s3:ListBucketMultipartUploadsParts3:ListObjects , and s3:GetObject do not support an S3 bucket resource type, which results in Casey not having access to the S3 bucket. To correct the policy, I choose Edit policy and update the policy with three statements based on the resource that the S3 actions support. Because Casey needs access to view and read all of the objects in the HumanResources bucket, I add a wildcard (*) for the S3 object path in the Resource ARN.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "TheseActionsSupportBucketResourceType",
            "Effect": "Allow",
            "Action": ["s3:ListBucket",
                       "s3:ListBucketByTags",
                       "s3:ListBucketMultipartUploads",
                       "s3:ListBucketVersions"],
            "Resource": ["arn:aws:s3:::HumanResources"]
        },{
            "Sid": "TheseActionsRequireAllResources",
            "Effect": "Allow",
            "Action": ["s3:ListAllMyBuckets",
                       "s3:ListMultipartUploadParts",
                       "s3:ListObjects"],
            "Resource": [ "*"]
        },{
            "Sid": "TheseActionsRequireSupportsObjectResourceType",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::HumanResources/*"]
        }
    ]
}

After I make these changes, I see the updated policy summary and see that warnings are no longer displayed.

Screenshot of the updated policy summary that no longer shows warnings

In the previous example, I showed how to identify and correct permissions errors that include actions that do not support a specified resource. In the next example, I show how to use policy summaries to identify and correct a policy that includes actions that do not support a specified condition.

Example 2: An action does not support the condition specified in a policy

For this example, let’s assume Bob is a project manager who requires view and read access to all the code builds for his team. To grant him this access, I create the following JSON policy that specifies all list and read actions to AWS CodeBuild and defines a condition to limit access to resources in the us-west-2 Region in which Bob’s team develops.

This policy does not work. Do not copy. 
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ListReadAccesstoCodeServices",
            "Effect": "Allow",
            "Action": [
                "codebuild:List*",
                "codebuild:BatchGet*"
            ],
            "Resource": ["*"], 
             "Condition": {
                "StringEquals": {
                    "ec2:Region": "us-west-2"
                }
            }
        }
    ]	
}

After I create the policy, PMCodeBuildAccess, I select this policy from the Policies page to view the policy summary in the IAM console. From here, I check to see if the policy has any warnings or typos. I see an error at the top of the policy detail page because the policy does not grant any permissions.

Screenshot with an error showing the policy does not grant any permissions

To view more details about the error, I choose Show remaining to understand why no permissions result from the policy. I see this warning: One or more conditions do not have an applicable action. This means that the condition is not supported by any of the actions defined in the policy.

From the warning message (see preceding screenshot), I realize that ec2:Region is not a supported condition for any actions in CodeBuild. To correct the policy, I separate the list actions that do not support resource-level permissions into a separate Statement element and specify * as the resource. For the remaining CodeBuild actions that support resource-level permissions, I use the ARN to specify the us-west-2 Region in the project resource type.

CORRECT POLICY 
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "TheseActionsSupportAllResources",
            "Effect": "Allow",
            "Action": [
                "codebuild:ListBuilds",
                "codebuild:ListProjects",
                "codebuild:ListRepositories",
                "codebuild:ListCuratedEnvironmentImages",
                "codebuild:ListConnectedOAuthAccounts"
            ],
            "Resource": ["*"] 
        }, {
            "Sid": "TheseActionsSupportAResource",
            "Effect": "Allow",
            "Action": [
                "codebuild:ListBuildsForProject",
                "codebuild:BatchGet*"
            ],
            "Resource": ["arn:aws:codebuild:us-west-2:123456789012:project/*"] 
        }

    ]	
}

After I make the changes, I view the updated policy summary and see that no warnings are displayed.

Screenshot showing the updated policy summary with no warnings

When I choose CodeBuild from the list of services, I also see that for the actions that support resource-level permissions, the access is limited to the us-west-2 Region.

Screenshow showing that for the Actions that support resource-level permissions, the access is limited to the us-west-2 region.

Conclusion

Policy summaries make it easier to view and understand the permissions and resources in your IAM policies by displaying the permissions granted by the policies. As I’ve demonstrated in this post, you can also use policy summaries to help you identify and correct your IAM policies. To understand the types of warnings that policy summaries support, you can visit Troubleshoot IAM Policies. To view policy summaries in your AWS account, sign in to the IAM console and navigate to any policy on the Policies page of the IAM console or the Permissions tab on a user’s page.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about or suggestions for this solution, start a new thread on the IAM forum or contact AWS Support.

– Joy

Automate Your IT Operations Using AWS Step Functions and Amazon CloudWatch Events

Post Syndicated from Andy Katz original https://aws.amazon.com/blogs/compute/automate-your-it-operations-using-aws-step-functions-and-amazon-cloudwatch-events/


Rob Percival, Associate Solutions Architect

Are you interested in reducing the operational overhead of your AWS Cloud infrastructure? One way to achieve this is to automate the response to operational events for resources in your AWS account.

Amazon CloudWatch Events provides a near real-time stream of system events that describe the changes and notifications for your AWS resources. From this stream, you can create rules to route specific events to AWS Step Functions, AWS Lambda, and other AWS services for further processing and automated actions.

In this post, learn how you can use Step Functions to orchestrate serverless IT automation workflows in response to CloudWatch events sourced from AWS Health, a service that monitors and generates events for your AWS resources. As a real-world example, I show automating the response to a scenario where an IAM user access key has been exposed.

Serverless workflows with Step Functions and Lambda

Step Functions makes it easy to develop and orchestrate components of operational response automation using visual workflows. Building automation workflows from individual Lambda functions that perform discrete tasks lets you develop, test, and modify the components of your workflow quickly and seamlessly. As serverless services, Step Functions and Lambda also provide the benefits of more productive development, reduced operational overhead, and no costs incurred outside of when the workflows are actively executing.

Example workflow

As an example, this post focuses on automating the response to an event generated by AWS Health when an IAM access key has been publicly exposed on GitHub. This is a diagram of the automation workflow:

AWS proactively monitors popular code repository sites for IAM access keys that have been publicly exposed. Upon detection of an exposed IAM access key, AWS Health generates an AWS_RISK_CREDENTIALS_EXPOSED event in the AWS account related to the exposed key. A configured CloudWatch Events rule detects this event and invokes a Step Functions state machine. The state machine then orchestrates the automated workflow that deletes the exposed IAM access key, summarizes the recent API activity for the exposed key, and sends the summary message to an Amazon SNS topic to notify the subscribers―in that order.

The corresponding Step Functions state machine diagram of this automation workflow can be seen below:

While this particular example focuses on IT automation workflows in response to the AWS_RISK_CREDENTIALS_EXPOSEDevent sourced from AWS Health, it can be generalized to integrate with other events from these services, other event-generating AWS services, and even run on a time-based schedule.

Walkthrough

To follow along, use the code and resources found in the aws-health-tools GitHub repo. The code and resources include an AWS CloudFormation template, in addition to instructions on how to use it.

Launch Stack into N. Virginia with CloudFormation

The Step Functions state machine execution starts with the exposed keys event details in JSON, a sanitized example of which is provided below:

{
    "version": "0",
    "id": "121345678-1234-1234-1234-123456789012",
    "detail-type": "AWS Health Event",
    "source": "aws.health",
    "account": "123456789012",
    "time": "2016-06-05T06:27:57Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventArn": "arn:aws:health:us-east-1::event/AWS_RISK_CREDENTIALS_EXPOSED_XXXXXXXXXXXXXXXXX",
        "service": "RISK",
        "eventTypeCode": "AWS_RISK_CREDENTIALS_EXPOSED",
        "eventTypeCategory": "issue",
        "startTime": "Sat, 05 Jun 2016 15:10:09 GMT",
        "eventDescription": [
            {
                "language": "en_US",
                "latestDescription": "A description of the event is provided here"
            }
        ],
        "affectedEntities": [
            {
                "entityValue": "ACCESS_KEY_ID_HERE"
            }
        ]
    }
}

After it’s invoked, the state machine execution proceeds as follows.

Step 1: Delete the exposed IAM access key pair

The first thing you want to do when you determine that an IAM access key has been exposed is to delete the key pair so that it can no longer be used to make API calls. This Step Functions task state deletes the exposed access key pair detailed in the incoming event, and retrieves the IAM user associated with the key to look up API activity for the user in the next step. The user name, access key, and other details about the event are passed to the next step as JSON.

This state contains a powerful error-handling feature offered by Step Functions task states called a catch configuration. Catch configurations allow you to reroute and continue state machine invocation at new states depending on potential errors that occur in your task function. In this case, the catch configuration skips to Step 3. It immediately notifies your security team that errors were raised in the task function of this step (Step 1), when attempting to look up the corresponding IAM user for a key or delete the user’s access key.

Note: Step Functions also offers a retry configuration for when you would rather retry a task function that failed due to error, with the option to specify an increasing time interval between attempts and a maximum number of attempts.

Step 2: Summarize recent API activity for key

After you have deleted the access key pair, you’ll want to have some immediate insight into whether it was used for malicious activity in your account. Another task state, this step uses AWS CloudTrail to look up and summarize the most recent API activity for the IAM user associated with the exposed key. The summary is in the form of counts for each API call made and resource type and name affected. This summary information is then passed to the next step as JSON. This step requires information that you obtained in Step 1. Step Functions ensures the successful completion of Step 1 before moving to Step 2.

Step 3: Notify security

The summary information gathered in the last step can provide immediate insight into any malicious activity on your account made by the exposed key. To determine this and further secure your account if necessary, you must notify your security team with the gathered summary information.

This final task state generates an email message providing in-depth detail about the event using the API activity summary, and publishes the message to an SNS topic subscribed to by the members of your security team.

If the catch configuration of the task state in Step 1 was triggered, then the security notification email instead directs your security team to log in to the console and navigate to the Personal Health Dashboard to view more details on the incident.

Lessons learned

When implementing this use case with Step Functions and Lambda, consider the following:

  • One of the most important parts of implementing automation in response to operational events is to ensure visibility into the response and resolution actions is retained. Step Functions and Lambda enable you to orchestrate your granular response and resolution actions that provides direct visibility into the state of the automation workflow.
  • This basic workflow currently executes these steps serially with a catch configuration for error handling. More sophisticated workflows can leverage the parallel execution, branching logic, and time delay functionality provided by Step Functions.
  • Catch and retry configurations for task states allow for orchestrating reliable workflows while maintaining the granularity of each Lambda function. Without leveraging a catch configuration in Step 1, you would have had to duplicate code from the function in Step 3 to ensure that your security team was notified on failure to delete the access key.
  • Step Functions and Lambda are serverless services, so there is no cost for these services when they are not running. Because this IT automation workflow only runs when an IAM access key is exposed for this account (which is hopefully rare!), the total monthly cost for this workflow is essentially $0.

Conclusion

Automating the response to operational events for resources in your AWS account can free up the valuable time of your engineers. Step Functions and Lambda enable granular IT automation workflows to achieve this result while gaining direct visibility into the orchestration and state of the automation.

For more examples of how to use Step Functions to automate the operations of your AWS resources, or if you’d like to see how Step Functions can be used to build and orchestrate serverless applications, visit Getting Started on the Step Functions website.