If you’ve used Amazon CloudWatch Events to schedule the invocation of a Lambda function at regular intervals, you may have noticed that the highest frequency possible is one invocation per minute. However, in some cases, you may need to invoke Lambda more often than that. In this blog post, I’ll cover invoking a Lambda function every 10 seconds, but with some simple math you can change to whatever interval you like.
To achieve this, I’ll show you how to leverage Step Functions and Amazon Kinesis Data Streams.
The Solution
For this example, I’ve created a Step Functions State Machine that invokes our Lambda function 6 times, 10 seconds apart. Such State Machine is then executed once per minute by a CloudWatch Events Rule. This state machine is then executed once per minute by an Amazon CloudWatch Events rule. Finally, the Kinesis Data Stream triggers our Lambda function for each record inserted. The result is our Lambda function being invoked every 10 seconds, indefinitely.
Below is a diagram illustrating how the various services work together.
Step 1: My sampleLambda function doesn’t actually do anything, it just simulates an execution for a few seconds. This is the (Python) code of my dummy function:
import time
import random
def lambda_handler(event, context):
rand = random.randint(1, 3)
print('Running for {} seconds'.format(rand))
time.sleep(rand)
return True
Step 2:
The next step is to create a second Lambda function, that I called Iterator, which has two duties:
It keeps track of the current number of iterations, since Step Function doesn’t natively have a state we can use for this purpose.
It asynchronously invokes our Lambda function at every loops.
This is the code of the Iterator, adapted from here.
The state machine starts and sets the index at 0 and the count at 6.
Iterator function is invoked.
If the iterator function reached the end of the loop, the IsCountReached state terminates the execution, otherwise the machine waits for 10 seconds.
The machine loops back to the iterator.
Step 3: Create an Amazon CloudWatch Events rule scheduled to trigger every minute and add the state machine as its target. I’ve actually prepared an Amazon CloudFormation template that creates the whole stack and starts the Lambda invocations, you can find it here.
Performance
Let’s have a look at a sample series of invocations and analyse how precise the timing is. In the following chart I reported the delay (in excess of the expected 10-second-wait) of 30 consecutive invocations of my dummy function, when the Iterator is configured with a memory size of 1024MB.
Invocations Delay
Notice the delay increases by a few hundred milliseconds at every invocation. The good news is it accrues only within the same loop, 6 times; after that, a new CloudWatch Events kicks in and it resets.
This delay is due to the work that AWS Step Function does outside of the Wait state, the main component of which is the Iterator function itself, that runs synchronously in the state machine and therefore adds up its duration to the 10-second-wait.
As we can easily imagine, the memory size of the Iterator Lambda function does make a difference. Here are the Average and Maximum duration of the function with 256MB, 512MB, 1GB and 2GB of memory.
Average Duration
Maximum Duration
Given those results, I’d say that a memory of 1024MB is a good compromise between costs and performance.
Caveats
As mentioned, in our Amazon CloudWatch Events documentation, in rare cases a rule can be triggered twice, causing two parallel executions of the state machine. If that is a concern, we can add a task state at the beginning of the state machine that checks if any other executions are currently running. If the outcome is positive, then a choice state can immediately terminate the flow. Since the state machine is invoked every 60 seconds and runs for about 50, it is safe to assume that executions should all be sequential and any parallel executions should be treated as duplicates. The task state that checks for current running executions can be a Lambda function similar to the following:
AWS Config enables continuous monitoring of your AWS resources, making it simple to assess, audit, and record resource configurations and changes. AWS Config does this through the use of rules that define the desired configuration state of your AWS resources. AWS Config provides a number of AWS managed rules that address a wide range of security concerns such as checking if you encrypted your Amazon Elastic Block Store (Amazon EBS) volumes, tagged your resources appropriately, and enabled multi-factor authentication (MFA) for root accounts. You can also create custom rules to codify your compliance requirements through the use of AWS Lambda functions.
In this post we’ll show you how to use AWS Config to monitor our Amazon Simple Storage Service (S3) bucket ACLs and policies for violations which allow public read or public write access. If AWS Config finds a policy violation, we’ll have it trigger an Amazon CloudWatch Event rule to trigger an AWS Lambda function which either corrects the S3 bucket ACL, or notifies you via Amazon Simple Notification Service (Amazon SNS) that the policy is in violation and allows public read or public write access. We’ll show you how to do this in five main steps.
Enable AWS Config to monitor Amazon S3 bucket ACLs and policies for compliance violations.
Create an IAM Role and Policy that grants a Lambda function permissions to read S3 bucket policies and send alerts through SNS.
Create and configure a CloudWatch Events rule that triggers the Lambda function when AWS Config detects an S3 bucket ACL or policy violation.
Create a Lambda function that uses the IAM role to review S3 bucket ACLs and policies, correct the ACLs, and notify your team of out-of-compliance policies.
Verify the monitoring solution.
Note: This post assumes your compliance policies require the buckets you monitor not allow public read or write access. If you have intentionally open buckets serving static content, for example, you can use this post as a jumping-off point for a solution tailored to your needs.
At the end of this post, we provide an AWS CloudFormation template that implements the solution outlined. The template enables you to deploy the solution in multiple regions quickly.
Important: The use of some of the resources deployed, including those deployed using the provided CloudFormation template, will incur costs as long as they are in use. AWS Config Rules incur costs in each region they are active.
Architecture
Here’s an architecture diagram of what we’ll implement:
Figure 1: Architecture diagram
Step 1: Enable AWS Config and Amazon S3 Bucket monitoring
The following steps demonstrate how to set up AWS Config to monitor Amazon S3 buckets.
If this is your first time using AWS Config, select Get started. If you’ve already used AWS Config, select Settings.
In the Settings page, under Resource types to record, clear the All resources checkbox. In the Specific types list, select Bucket under S3.
Figure 2: The Settings dialog box showing the “Specific types” list
Choose the Amazon S3 bucket for storing configuration history and snapshots. We’ll create a new Amazon S3 bucket.
Figure 3: Creating an S3 bucket
If you prefer to use an existing Amazon S3 bucket in your account, select the Choose a bucket from your account radio button and, using the dropdown, select an existing bucket.
Figure 4: Selecting an existing S3 bucket
Under Amazon SNS topic, check the box next to Stream configuration changes and notifications to an Amazon SNS topic, and then select the radio button to Create a topic.
Alternatively, you can choose a topic that you have previously created and subscribed to.
Figure 5: Selecting a topic that you’ve previously created and subscribed to
If you created a new SNS topic you need to subscribe to it to receive notifications. We’ll cover this in a later step.
Under AWS Config role, choose Create a role (unless you already have a role you want to use). We’re using the auto-suggested role name.
Figure 6: Creating a role
Select Next.
Configure Amazon S3 bucket monitoring rules:
On the AWS Config rules page, search for S3 and choose the s3-bucket-publice-read-prohibited and s3-bucket-public-write-prohibited rules, then click Next.
Figure 7: AWS Config rules dialog
On the Review page, select Confirm. AWS Config is now analyzing your Amazon S3 buckets, capturing their current configurations, and evaluating the configurations against the rules we selected.
If you created a new Amazon SNS topic, open the Amazon SNS Management Console and locate the topic you created:
Figure 8: Amazon SNS topic list
Copy the ARN of the topic (the string that begins with arn:) because you’ll need it in a later step.
Select the checkbox next to the topic, and then, under the Actions menu, select Subscribe to topic.
Select Email as the protocol, enter your email address, and then select Create subscription.
After several minutes, you’ll receive an email asking you to confirm your subscription for notifications for this topic. Select the link to confirm the subscription.
Step 2: Create a Role for Lambda
Our Lambda will need permissions that enable it to inspect and modify Amazon S3 bucket ACLs and policies, log to CloudWatch Logs, and publishing to an Amazon SNS topic. We’ll now set up a custom AWS Identity and Access Management (IAM) policy and role to support these actions and assign them to the Lambda function we’ll create in the next section.
In the AWS Management Console, under Services, select IAM to access the IAM Console.
Create a policy with the following permissions, or copy the following policy:
Select Lambda from the list of services that will use this role.
Select the check box next to the policy you created previously, and then select Next: Review
Name your role, give it a description, and then select Create Role. In this example, we’re naming the role LambdaS3PolicySecuringRole.
Step 3: Create and Configure a CloudWatch Rule
In this section, we’ll create a CloudWatch Rule to trigger the Lambda function when AWS Config determines that your Amazon S3 buckets are non-compliant.
In the AWS Management Console, under Services, select CloudWatch.
On the left-hand side, under Events, select Rules.
Click Create rule.
In Step 1: Create rule, under Event Source, select the dropdown list and select Build custom event pattern.
Copy the following pattern and paste it into the text box:
The pattern matches events generated by AWS Config when it checks the Amazon S3 bucket for public accessibility.
We’ll add a Lambda target later. For now, select your Amazon SNS topic created earlier, and then select Configure details.
Figure 9: The “Create rule” dialog
Give your rule a name and description. For this example, we’ll name ours AWSConfigFoundOpenBucket
Click Create rule.
Step 4: Create a Lambda Function
In this section, we’ll create a new Lambda function to examine an Amazon S3 bucket’s ACL and bucket policy. If the bucket ACL is found to allow public access, the Lambda function overwrites it to be private. If a bucket policy is found, the Lambda function creates an SNS message, puts the policy in the message body, and publishes it to the Amazon SNS topic we created. Bucket policies can be complex, and overwriting your policy may cause unexpected loss of access, so this Lambda function doesn’t attempt to alter your policy in any way.
Get the ARN of the Amazon SNS topic created earlier.
In the AWS Management Console, under Services, select Lambda to go to the Lambda Console.
From the Dashboard, select Create Function. Or, if you were taken directly to the Functions page, select the Create Function button in the upper-right.
On the Create function page:
Choose Author from scratch.
Provide a name for the function. We’re using AWSConfigOpenAccessResponder.
The Lambda function we’ve written is Python 3.6 compatible, so in the Runtime dropdown list, select Python 3.6.
Under Role, select Choose an existing role. Select the role you created in the previous section, and then select Create function.
Figure 10: The “Create function” dialog
We’ll now add a CloudWatch Event based on the rule we created earlier.
In the Add triggers section, select CloudWatch Events. A CloudWatch Events box should appear connected to the left side of the Lambda Function and have a note that states Configuration required.
Figure 11: CloudWatch Events in the “Add triggers” section
From the Rule dropdown box, choose the rule you created earlier, and then select Add.
Scroll up to the Designer section and select the name of your Lambda function.
Delete the default code and paste in the following code:
import boto3
from botocore.exceptions import ClientError
import json
import os
ACL_RD_WARNING = "The S3 bucket ACL allows public read access."
PLCY_RD_WARNING = "The S3 bucket policy allows public read access."
ACL_WRT_WARNING = "The S3 bucket ACL allows public write access."
PLCY_WRT_WARNING = "The S3 bucket policy allows public write access."
RD_COMBO_WARNING = ACL_RD_WARNING + PLCY_RD_WARNING
WRT_COMBO_WARNING = ACL_WRT_WARNING + PLCY_WRT_WARNING
def policyNotifier(bucketName, s3client):
try:
bucketPolicy = s3client.get_bucket_policy(Bucket = bucketName)
# notify that the bucket policy may need to be reviewed due to security concerns
sns = boto3.client('sns')
subject = "Potential compliance violation in " + bucketName + " bucket policy"
message = "Potential bucket policy compliance violation. Please review: " + json.dumps(bucketPolicy['Policy'])
# send SNS message with warning and bucket policy
response = sns.publish(
TopicArn = os.environ['TOPIC_ARN'],
Subject = subject,
Message = message
)
except ClientError as e:
# error caught due to no bucket policy
print("No bucket policy found; no alert sent.")
def lambda_handler(event, context):
# instantiate Amazon S3 client
s3 = boto3.client('s3')
resource = list(event['detail']['requestParameters']['evaluations'])[0]
bucketName = resource['complianceResourceId']
complianceFailure = event['detail']['requestParameters']['evaluations'][0]['annotation']
if(complianceFailure == ACL_RD_WARNING or complianceFailure == PLCY_RD_WARNING):
s3.put_bucket_acl(Bucket = bucketName, ACL = 'private')
elif(complianceFailure == PLCY_RD_WARNING or complianceFailure == PLCY_WRT_WARNING):
policyNotifier(bucketName, s3)
elif(complianceFailure == RD_COMBO_WARNING or complianceFailure == WRT_COMBO_WARNING):
s3.put_bucket_acl(Bucket = bucketName, ACL = 'private')
policyNotifier(bucketName, s3)
return 0 # done
Scroll down to the Environment variables section. This code uses an environment variable to store the Amazon SNS topic ARN.
For the key, enter TOPIC_ARN.
For the value, enter the ARN of the Amazon SNS topic created earlier.
Under Execution role, select Choose an existing role, and then select the role created earlier from the dropdown.
Leave everything else as-is, and then, at the top, select Save.
Step 5: Verify it Works
We now have the Lambda function, an Amazon SNS topic, AWS Config watching our Amazon S3 buckets, and a CloudWatch Rule to trigger the Lambda function if a bucket is found to be non-compliant. Let’s test them to make sure they work.
We have an Amazon S3 bucket, myconfigtestbucket that’s been created in the region monitored by AWS Config, as well as the associated Lambda function. This bucket has no public read or write access set in an ACL or a policy, so it’s compliant.
Figure 12: The “Config Dashboard”
Let’s change the bucket’s ACL to allow public listing of objects:
Figure 13: Screen shot of “Permissions” tab showing Everyone granted list access
After saving, the bucket now has public access. After several minutes, the AWS Config Dashboard notes that there is one non-compliant resource:
Figure 14: The “Config Dashboard” shown with a non-compliant resource
In the Amazon S3 Console, we can see that the bucket no longer has public listing of objects enabled after the invocation of the Lambda function triggered by the CloudWatch Rule created earlier.
Figure 15: The “Permissions” tab showing list access no longer allowed
Notice that the AWS Config Dashboard now shows that there are no longer any non-compliant resources:
Figure 16: The “Config Dashboard” showing zero non-compliant resources
Now, let’s try out the Amazon S3 bucket policy check by configuring a bucket policy that allows list access:
Figure 17: A bucket policy that allows list access
A few minutes after setting this bucket policy on the myconfigtestbucket bucket, AWS Config recognizes the bucket is no longer compliant. Because this is a bucket policy rather than an ACL, we publish a notification to the SNS topic we created earlier that lets us know about the potential policy violation:
Figure 18: Notification about potential policy violation
Knowing that the policy allows open listing of the bucket, we can now modify or delete the policy, after which AWS Config will recognize that the resource is compliant.
Conclusion
In this post, we demonstrated how you can use AWS Config to monitor for Amazon S3 buckets with open read and write access ACLs and policies. We also showed how to use Amazon CloudWatch, Amazon SNS, and Lambda to overwrite a public bucket ACL, or to alert you should a bucket have a suspicious policy. You can use the CloudFormation template to deploy this solution in multiple regions quickly. With this approach, you will be able to easily identify and secure open Amazon S3 bucket ACLs and policies. Once you have deployed this solution to multiple regions you can aggregate the results using an AWS Config aggregator. See this post to learn more.
If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the AWS Config forum or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
We launched AWS Support a full decade ago, with Gold and Silver plans focused on Amazon EC2, Amazon S3, and Amazon SQS. Starting from that initial offering, backed by a small team in Seattle, AWS Support now encompasses thousands of people working from more than 60 locations.
A Quick Look Back Over the years, that offering has matured and evolved in order to meet the needs of an increasingly diverse base of AWS customers. We aim to support you at every step of your cloud adoption journey, from your initial experiments to the time you deploy mission-critical workloads and applications.
We have worked hard to make our support model helpful and proactive. We do our best to provide you with the tools, alerts, and knowledge that will help you to build systems that are secure, robust, and dependable. Here are some of our most recent efforts toward that goal:
Trusted Advisor S3 Bucket Policy Check – AWS Trusted Advisor provides you with five categories of checks and makes recommendations that are designed to improve security and performance. Earlier this year we announced that the S3 Bucket Permissions Check is now free, and available to all AWS users. If you are signed up for the Business or Professional level of AWS Support, you can also monitor this check (and many others) using Amazon CloudWatch Events. You can use this to monitor and secure your buckets without human intervention.
Personal Health Dashboard – This tool provides you with alerts and guidance when AWS is experiencing events that may affect you. You get a personalized view into the performance and availability of the AWS services that underlie your AWS resources. It also generates Amazon CloudWatch Events so that you can initiate automated failover and remediation if necessary.
Well Architected / Cloud Ops Review – We’ve learned a lot about how to architect AWS-powered systems over the years and we want to share everything we know with you! The AWS Well-Architected Framework provide proven, detailed guidance in critical areas including operational excellence, security, reliability, performance efficiency, and cost optimization. You can read the materials online and you can also sign up for the online training course. If you are signed up for Enterprise support, you can also benefit from our Cloud Ops review.
Infrastructure Event Management – If you are launching a new app, kicking off a big migration, or hosting a large-scale event similar to Prime Day, we are ready with guidance and real-time support. Our Infrastructure Event Management team will help you to assess the readiness of your environment and work with you to identify and mitigate risks ahead of time.
To learn more about how AWS customers have used AWS support to realize all of the benefits that I noted above, watch these videos (and find more on the Customer Testmonials page):
The Amazon retail site makes heavy use of AWS. You can read my post, Prime Day 2017 – Powered by AWS, to learn more about the process of preparing to sustain a record-setting amount of traffic and to accept a like number of orders.
Come and Join Us The AWS Support Team is in continuous hiring mode and we have openings all over the world! Here are a couple of highlights:
Amazon EC2 Spot Instances are spare compute capacity in the AWS Cloud available to you at steep discounts compared to On-Demand prices. The only difference between On-Demand Instances and Spot Instances is that Spot Instances can be interrupted by Amazon EC2 with two minutes of notification when EC2 needs the capacity back.
Customers have been taking advantage of Spot Instance interruption notices available via the instance metadata service since January 2015 to orchestrate their workloads seamlessly around any potential interruptions. Examples include saving the state of a job, detaching from a load balancer, or draining containers. Needless to say, the two-minute Spot Instance interruption notice is a powerful tool when using Spot Instances.
In January 2018, the Spot Instance interruption notice also became available as an event in Amazon CloudWatch Events. This allows targets such as AWS Lambda functions or Amazon SNS topics to process Spot Instance interruption notices by creating a CloudWatch Events rule to monitor for the notice.
In this post, I walk through an example use case for taking advantage of Spot Instance interruption notices in CloudWatch Events to automatically deregister Spot Instances from an Elastic Load Balancing Application Load Balancer.
When any of the Spot Instances receives an interruption notice, Spot Fleet sends the event to CloudWatch Events. The CloudWatch Events rule then notifies both targets, the Lambda function and SNS topic. The Lambda function detaches the Spot Instance from the Application Load Balancer target group, taking advantage of nearly a full two minutes of connection draining before the instance is interrupted. The SNS topic also receives a message, and is provided as an example for the reader to use as an exercise.
To complete this walkthrough, have the AWS CLI installed and configured, as well as the ability to launch CloudFormation stacks.
Launch the stack
Go ahead and launch the CloudFormation stack. You can check it out from GitHub, or grab the template directly. In this post, I use the stack name “spot-spin-cwe“, but feel free to use any name you like. Just remember to change it in the instructions.
Here are the details of the architecture being launched by the stack.
IAM permissions
Give permissions to a few components in the architecture:
The Lambda function
The CloudWatch Events rule
The Spot Fleet
The Lambda function needs basic Lambda function execution permissions so that it can write logs to CloudWatch Logs. You can use the AWS managed policy for this. It also needs to describe EC2 tags as well as deregister targets within Elastic Load Balancing. You can create a custom policy for these.
Finally, Spot Fleet needs permissions to request Spot Instances, tag, and register targets in Elastic Load Balancing. You can tap into an AWS managed policy for this.
Because you are taking advantage of the two-minute Spot Instance notice, you can tune the Elastic Load Balancing target group deregistration timeout delay to match. When a target is deregistered from the target group, it is put into connection draining mode for the length of the timeout delay: 120 seconds to equal the two-minute notice.
To capture the Spot Instance interruption notice being published to CloudWatch Events, create a rule with two targets: the Lambda function and the SNS topic.
The Lambda function does the heavy lifting for you. The details of the CloudWatch event are published to the Lambda function, which then uses boto3 to make a couple of AWS API calls. The first call is to describe the EC2 tags for the Spot Instance, filtering on a key of “TargetGroupArn”. If this tag is found, the instance is then deregistered from the target group ARN stored as the value of the tag.
import boto3
def handler(event, context):
instanceId = event['detail']['instance-id']
instanceAction = event['detail']['instance-action']
try:
ec2client = boto3.client('ec2')
describeTags = ec2client.describe_tags(Filters=[{'Name': 'resource-id','Values':[instanceId],'Name':'key','Values':['loadBalancerTargetGroup']}])
except:
print("No action being taken. Unable to describe tags for instance id:", instanceId)
return
try:
elbv2client = boto3.client('elbv2')
deregisterTargets = elbv2client.deregister_targets(TargetGroupArn=describeTags['Tags'][0]['Value'],Targets=[{'Id':instanceId}])
except:
print("No action being taken. Unable to deregister targets for instance id:", instanceId)
return
print("Detaching instance from target:")
print(instanceId, describeTags['Tags'][0]['Value'], deregisterTargets, sep=",")
return
SNS topic
Finally, you’ve created an SNS topic as an example target. For example, you could subscribe an email address to the SNS topic in order to receive email notifications when a Spot Instance interruption notice is received.
To proceed to creating your Spot Fleet request, use some of the resources that the CloudFormation stack created, to populate the Spot Fleet request launch configuration. You can find the values in the outputs values of the CloudFormation stack:
You can confirm that the Spot Fleet request was fulfilled by checking that ActivityStatus is “fulfilled”, or by checking that FulfilledCapacity is greater than or equal to TargetCapacity, while describing the request:
In order to test, you can take advantage of the fact that any interruption action that Spot Fleet takes on a Spot Instance results in a Spot Instance interruption notice being provided. Therefore, you can simply decrease the target size of your Spot Fleet from 2 to 1. The instance that is interrupted receives the interruption notice:
As soon as the interruption notice is published to CloudWatch Events, the Lambda function triggers and detaches the instance from the target group, effectively putting the instance in a draining state.
In conclusion, Amazon EC2 Spot Instance interruption notices are an extremely powerful tool when taking advantage of Amazon EC2 Spot Instances in your workloads, for tasks such as saving state, draining connections, and much more. I’d love to hear how you are using them in your own environment!
Chad Schmutzer Solutions Architect
Chad Schmutzer is a Solutions Architect at Amazon Web Services based in Pasadena, CA. As an extension of the Amazon EC2 Spot Instances team, Chad helps customers significantly reduce the cost of running their applications, growing their compute capacity and throughput without increasing budget, and enabling new types of cloud computing applications.
Amazon EMR enables data analysts and scientists to deploy a cluster running popular frameworks such as Spark, HBase, Presto, and Flink of any size in minutes. When you launch a cluster, Amazon EMR automatically configures the underlying Amazon EC2 instances with the frameworks and applications that you choose for your cluster. This can include popular web interfaces such as Hue workbench, Zeppelin notebook, and Ganglia monitoring dashboards and tools.
These web interfaces are hosted on the EMR master node and must be accessed using the public DNS name of the master node (master public DNS value). The master public DNS value is dynamically created, not very user friendly and is hard to remember— it looks something like ip-###-###-###-###.us-west-2.compute.internal. Not having a friendly URL to connect to the popular workbench or notebook interfaces may impact the workflow and hinder your gained agility.
Some customers have addressed this challenge through custom bootstrap actions, steps, or external scripts that periodically check for new clusters and register a friendlier name in DNS. These approaches either put additional burden on the data practitioners or require additional resources to execute the scripts. In addition, there is typically some lag time associated with such scripts. They often don’t do a great job cleaning up the DNS records after the cluster has terminated, potentially resulting in a security risk.
The solution in this post provides an automated, serverless approach to registering a friendly master node name for easy access to the web interfaces.
Before I dive deeper, I review these key services and how they are part of this solution.
CloudWatch Events
CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources. Using simple rules, you can match events and route them to one or more target functions or streams. An event can be generated in one of four ways:
From an AWS service when resources change state
From API calls that are delivered via AWS CloudTrail
From your own code that can generate application-level events
In this solution, I cover the first type of event, which is automatically emitted by EMR when the cluster state changes. Based on the state of this event, either create or update the DNS record in Route 53 when the cluster state changes to STARTING, or delete the DNS record when the cluster is no longer needed and the state changes to TERMINATED. For more information about all EMR event details, see Monitor CloudWatch Events.
Route 53 private hosted zones
A private hosted zone is a container that holds information about how to route traffic for a domain and its subdomains within one or more VPCs. Private hosted zones enable you to use custom DNS names for your internal resources without exposing the names or IP addresses to the internet.
Route 53 supports resource record sets with a wide range of record types. In this solution, you use a CNAME record that is used to specify a domain name as an alias for another domain (the ‘canonical’ domain). You use a friendly name of the cluster as the CNAME for the EMR master public DNS value.
You are using private hosted zones because an EMR cluster is typically deployed within a private subnet and is accessed either from within the VPC or from on-premises resources over VPN or AWS Direct Connect. To resolve domain names in private hosted zones from your on-premises network, configure a DNS forwarder, as described in How can I resolve Route 53 private hosted zones from an on-premises network via an Ubuntu instance?.
Lambda
Lambda is a compute service that lets you run code without provisioning or managing servers. Lambda executes your code only when needed and scales automatically to thousands of requests per second. Lambda takes care of high availability, and server and OS maintenance and patching. You pay only for the consumed compute time. There is no charge when your code is not running.
Lambda provides the ability to invoke your code in response to events, such as when an object is put to an Amazon S3 bucket or as in this case, when a CloudWatch event is emitted. As part of this solution, you deploy a Lambda function as a target that is invoked by CloudWatch Events when the event matches your rule. You also configure the necessary permissions based on the Lambda permissions model, including a Lambda function policy and Lambda execution role.
Putting it all together
Now that you have all of the pieces, you can put together a complete solution. The following diagram illustrates how the solution works:
Start with a user activity such as launching or terminating an EMR cluster.
EMR automatically sends events to the CloudWatch Events stream.
A CloudWatch Events rule matches the specified event, and routes it to a target, which in this case is a Lambda function. In this case, you are using the EMR Cluster State Change
The Lambda function performs the following key steps:
Get the clusterId value from the event detail and use it to call EMR. DescribeCluster API to retrieve the following data points:
MasterPublicDnsName – public DNS name of the master node
Locate the tag containing the friendly name to use as the CNAME for the cluster. The key name containing the friendly name should be The value should be specified as host.domain.com, where domain is the private hosted zone in which to update the DNS record.
Update DNS based on the state in the event detail.
If the state is STARTING, the function calls the Route 53 API to create or update a resource record set in the private hosted zone specified by the domain tag. This is a CNAME record mapped to MasterPublicDnsName.
Conversely, if the state is TERMINATED, the function calls the Route 53 API to delete the associated resource record set from the private hosted zone.
Deploying the solution
Because all of the components of this solution are serverless, use the AWS Serverless Application Model (AWS SAM) template to deploy the solution. AWS SAM is natively supported by AWS CloudFormation and provides a simplified syntax for expressing serverless resources, resulting in fewer lines of code.
Overview of the SAM template
For this solution, the SAM template has 76 lines of text as compared to 142 lines without SAM resources (and writing the template in YAML would be even slightly smaller). The solution can be deployed using the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SAM Local.
CloudFormation transforms help simplify template authoring by condensing a multiple-line resource declaration into a single line in your template. To inform CloudFormation that your template defines a serverless application, add a line under the template format version as follows:
Before SAM, you would use the AWS::Lambda::Function resource type to define your Lambda function. You would then need a resource to define the permissions for the function (AWS::Lambda::Permission), another resource to define a Lambda execution role (AWS::IAM::Role), and finally a CloudWatch Events resource (Events::Rule) that triggers this function.
With SAM, you need to define just a single resource for your function, AWS::Serverless::Function. Using this single resource type, you can define everything that you need, including function properties such as function handler, runtime, and code URI, as well as the required IAM policies and the CloudWatch event.
A few additional things to note in the code example:
CodeUri – Before you can deploy a SAM template, first upload your Lambda function code zip to S3. You can do this manually or use the aws cloudformation package CLI command to automate the task of uploading local artifacts to a S3 bucket, as shown later.
Lambda execution role and permissions – You are not specifying a Lambda execution role in the template. Rather, you are providing the required permissions as IAM policy documents. When the template is submitted, CloudFormation expands the AWS::Serverless::Function resource, declaring a Lambda function and an execution role. The created role has two attached policies: a default AWSLambdaBasicExecutionRole and the inline policy specified in the template.
CloudWatch Events rule – Instead of specifying a CloudWatch Events resource type, you are defining an event source object as a property of the function itself. When the template is submitted, CloudFormation expands this into a CloudWatch Events rule resource and automatically creates the Lambda resource-based permissions to allow the CloudWatch Events rule to trigger the function.
NOTE: If you are trying this solution outside of us-east-1, then you should download the necessary files, upload them to the buckets in your region, edit the script as appropriate and then run it or use the CLI deployment method below.
3.) Choose Next.
4.) On the Specify Details page, keep or modify the stack name and choose Next.
5.) On the Options page, choose Next.
6.) On the Review page, take the following steps:
Acknowledge the two Transform access capabilities. This allows the CloudFormation transform to create the required IAM resources with custom names.
Under Transforms, choose Create Change Set.
Wait a few seconds for the change set to be created before proceeding. The change set should look as follows:
7.) Choose Execute to deploy the template.
After the template is deployed, you should see four resources created:
After the package is successfully uploaded, the output should look as follows:
Uploading to 0f6d12c7872b50b37dbfd5a60385b854 1872 / 1872.0 (100.00%)
Successfully packaged artifacts and wrote output template to file serverless-output.template.
The CodeUri property in serverless-output.template is now referencing the packaged artifacts in the S3 bucket that you specified:
s3://<bucket>/0f6d12c7872b50b37dbfd5a60385b854
Use the aws cloudformation deploy CLI command to deploy the stack:
You should see the following output after the stack has been successfully created:
Waiting for changeset to be created...
Waiting for stack create/update to complete
Successfully created/updated stack – EmrDnsSetterCli
Validating results
To test the solution, launch an EMR cluster. The Lambda function looks for the cluster_name tag associated with the EMR cluster. Make sure to specify the friendly name of your cluster as host.domain.com where the domain is the private hosted zone in which to create the CNAME record.
Here is a sample CLI command to launch a cluster within a specific subnet in a VPC with the required tag cluster_name.
After the cluster is launched, log in to the Route 53 console. In the left navigation pane, choose Hosted Zones to view the list of private and public zones currently configured in Route 53. Select the hosted zone that you specified in the ZONE tag when you launched the cluster. Verify that the resource records were created.
You can also monitor the CloudWatch Events metrics that are published to CloudWatch every minute, such as the number of TriggeredRules and Invocations.
Now that you’ve verified that the Lambda function successfully updated the Route 53 resource records in the zone file, terminate the EMR cluster and verify that the records are removed by the same function.
Conclusion
This solution provides a serverless approach to automatically assigning a friendly name for your EMR cluster for easy access to popular notebooks and other web interfaces. CloudWatch Events also supports cross-account event delivery, so if you are running EMR clusters in multiple AWS accounts, all cluster state events across accounts can be consolidated into a single account.
I hope that this solution provides a small glimpse into the power of CloudWatch Events and Lambda and how they can be leveraged with EMR and other AWS big data services. For example, by using the EMR step state change event, you can chain various pieces of your analytics pipeline. You may have a transient cluster perform data ingest and, when the task successfully completes, spin up an ETL cluster for transformation and upload to Amazon Redshift. The possibilities are truly endless.
Message brokers can be used to solve a number of needs in enterprise architectures, including managing workload queues and broadcasting messages to a number of subscribers. Amazon MQ is a managed message broker service for Apache ActiveMQ that makes it easy to set up and operate message brokers in the cloud.
In this post, I discuss one approach to invoking AWS Lambda from queues and topics managed by Amazon MQ brokers. This and other similar patterns can be useful in integrating legacy systems with serverless architectures. You could also integrate systems already migrated to the cloud that use common APIs such as JMS.
For example, imagine that you work for a company that produces training videos and which recently migrated its video management system to AWS. The on-premises system used to publish a message to an ActiveMQ broker when a video was ready for processing by an on-premises transcoder. However, on AWS, your company uses Amazon Elastic Transcoder. Instead of modifying the management system, Lambda polls the broker for new messages and starts a new Elastic Transcoder job. This approach avoids changes to the existing application while refactoring the workload to leverage cloud-native components.
This solution uses Amazon CloudWatch Events to trigger a Lambda function that polls the Amazon MQ broker for messages. Instead of starting an Elastic Transcoder job, the sample writes the received message to an Amazon DynamoDB table with a time stamp indicating the time received.
Getting started
To start, navigate to the Amazon MQ console. Next, launch a new Amazon MQ instance, selecting Single-instance Broker and supplying a broker name, user name, and password. Be sure to document the user name and password for later.
For the purposes of this sample, choose the default options in the Advanced settings section. Your new broker is deployed to the default VPC in the selected AWS Region with the default security group. For this post, you update the security group to allow access for your sample Lambda function. In a production scenario, I recommend deploying both the Lambda function and your Amazon MQ broker in your own VPC.
After several minutes, your instance changes status from “Creation Pending” to “Available.” You can then visit the Details page of your broker to retrieve connection information, including a link to the ActiveMQ web console where you can monitor the status of your broker, publish test messages, and so on. In this example, use the Stomp protocol to connect to your broker. Be sure to capture the broker host name, for example:
<BROKER_ID>.mq.us-east-1.amazonaws.com
You should also modify the Security Group for the broker by clicking on its Security Group ID. Click the Edit button and then click Add Rule to allow inbound traffic on port 8162 for your IP address.
Deploying and scheduling the Lambda function
To simplify the deployment of this example, I’ve provided an AWS Serverless Application Model (SAM) template that deploys the sample function and DynamoDB table, and schedules the function to be invoked every five minutes. Detailed instructions can be found with sample code on GitHub in the amazonmq-invoke-aws-lambda repository, with sample code. I discuss a few key aspects in this post.
First, SAM makes it easy to deploy and schedule invocation of our function:
In the code, you include the URI, user name, and password for your newly created Amazon MQ broker. These allow the function to poll the broker for new messages on the sample queue.
stomp.connect(options, (error, client) => {
if (error) { /* do something */ }
let headers = {
destination: ‘/queue/SAMPLE_QUEUE’,
ack: ‘auto’
}
client.subscribe(headers, (error, message) => {
if (error) { /* do something */ }
message.readString(‘utf-8’, (error, body) => {
if (error) { /* do something */ }
let params = {
FunctionName: MyWorkerFunction,
Payload: JSON.stringify({
message: body,
timestamp: Date.now()
})
}
let lambda = new AWS.Lambda()
lambda.invoke(params, (error, data) => {
if (error) { /* do something */ }
})
}
})
})
Sending a sample message
For the purpose of this example, use the Amazon MQ console to send a test message. Navigate to the details page for your broker.
About midway down the page, choose ActiveMQ Web Console. Next, choose Manage ActiveMQ Broker to launch the admin console. When you are prompted for a user name and password, use the credentials created earlier.
At the top of the page, choose Send. From here, you can send a sample message from the broker to subscribers. For this example, this is how you generate traffic to test the end-to-end system. Be sure to set the Destination value to “SAMPLE_QUEUE.” The message body can contain any text. Choose Send.
You now have a Lambda function polling for messages on the broker. To verify that your function is working, you can confirm in the DynamoDB console that the message was successfully received and processed by the sample Lambda function.
First, choose Tables on the left and select the table name “amazonmq-messages” in the middle section. With the table detail in view, choose Items. If the function was successful, you’ll find a new entry similar to the following:
If there is no message in DynamoDB, check again in a few minutes or review the CloudWatch Logs group for Lambda functions that contain debug messages.
Alternative approaches
Beyond the approach described here, you may consider other approaches as well. For example, you could use an intermediary system such as Apache Flume to pass messages from the broker to Lambda or deploy Apache Camel to trigger Lambda via a POST to API Gateway. There are trade-offs to each of these approaches. My goal in using CloudWatch Events was to introduce an easily repeatable pattern familiar to many Lambda developers.
Summary
I hope that you have found this example of how to integrate AWS Lambda with Amazon MQ useful. If you have expertise or legacy systems that leverage APIs such as JMS, you may find this useful as you incorporate serverless concepts in your enterprise architectures.
To learn more, see the Amazon MQ website and Developer Guide. You can try Amazon MQ for free with the AWS Free Tier, which includes up to 750 hours of a single-instance mq.t2.micro broker and up to 1 GB of storage per month for one year.
AWS CodeDeploy is a service that automates application deployments to your compute infrastructure, including fleets of Amazon EC2 instances. AWS CodeDeploy can automatically deploy the latest app version to any new EC2 instance launched due to a scaling event. However, if your servers are not part of the Auto Scaling group, it might be a challenge to automate the code deployment for new EC2 launches. This is especially true if you are running your workload on an EC2 Spot Fleet. In this post, I’ll show you how to use AWS CodeDeploy, an Amazon CloudWatch Events rule, EC2 tags, and AWS Lambda to automate your deployments so that all EC2 instances run the same version of your application.
Solution overview:
An Amazon CloudWatch rule is triggered when a new Amazon EC2 instance is launched. The rule invokes an AWS Lambda function that extracts three tags:
· Name
· CodeDeployDeploymentGroup
· CodeDeployApplication
The Lambda function then uses the instance tags to add the EC2 instance to the deployment group. After the instance has been added, Lambda queries AWS CodeDeploy to retrieve the last successful deployment in that deployment group and synchronizes the EC2 instance with the latest code. The following diagram shows the sequence:
Note: For an EC2 Spot Fleet, the tags that you create for your Spot instance are not added automatically by the Spot service to fulfill the request. The Lambda function adds these tags after the Spot instance is launched.
Example:
Let’s take an example of an AWS CodeDeploy application cd-single-deploy-app, and its deployment group, cd-single-deploy-app-group, which has two instances in it. These instances are not part of an Auto Scaling group.
In the AWS CodeDeploy console, you’ll find the deployment ID, d-CSBCXR2GP. We will use this ID later in the testing phase. Our workflow ensures that when a new EC2 instance is launched, it joins the deployment group automatically and the latest version of the application is deployed to it.
You can configure the Lambda function, associated IAM role and policy, and the CloudWatch Events rule by running this CloudFormation template included in the code repository.
Step 3: In Amazon CloudWatch Events, set up a rule for running instances and configure the Lambda function as a target.
Step 4: In the AWS Lambda console, choose your Lambda function. On the Triggers tab, check that the trigger is enabled.
You can now test to ensure if a new EC2 instance is launched, it gets synchronized automatically with the latest code
Test:
A new EC2 instance is launched from the EC2 console, the AWS CloudFormation template, or the EC2 APIs with CodeDeployApplication, CodeDeployDeploymentGroup, and Name tag.
Go to CloudWatch Logs console to check the Lambda execution logs.
Now, go to CodeDeploy console to confirm that the new deployment was successful.
In this screenshot, you can see that the new EC2 instance was synched with latest code.
Summary:
In this post, I’ve demonstrated how to use AWS CodeDeploy, AWS Lambda, an Amazon CloudWatch Events rule, and EC2 tags to automate deployments to disparate EC2 instances. If you are running your workload on Spot instances or have a fleet of On-Demand EC2 instances as part of your application, you can use the steps in this blog post to solve the automation around the deployment for new instances.
As companies mature in their cloud journey, they implement layered security capabilities and practices in their cloud architectures. One such practice is to continually assess golden Amazon Machine Images (AMIs) for security vulnerabilities. AMIs provide the information required to launch an Amazon EC2 instance, which is a virtual server in the AWS Cloud. A golden AMI is an AMI that contains the latest security patches, software, configuration, and software agents that you need to install for logging, security maintenance, and performance monitoring. You can build and deploy golden AMIs in your environment, but the AMIs quickly become dated as new vulnerabilities are discovered.
A security best practice is to perform routine vulnerability assessments of your golden AMIs to identify if newly found vulnerabilities apply to them. If you identify a vulnerability, you can update your golden AMIs with the appropriate security patches, test the AMIs, and deploy the patched AMIs in your environment. In this blog post, I demonstrate how to use Amazon Inspector to set up such continuous vulnerability assessments to scan your golden AMIs routinely.
Solution overview
Amazon Inspector performs security assessments of Amazon EC2 instances by using AWS managed rules packages such as the Common Vulnerabilities and Exposures (CVEs) package. The solution in this post creates EC2 instances from golden AMIs and then runs an Amazon Inspector security assessment on the created instances. When the assessment results are available, the solution consolidates the findings and advises you about next steps. Furthermore, the solution schedules an Amazon CloudWatch Events rule to run the golden AMI vulnerability assessments on a regular basis.
The following solution diagram illustrates how this solution works.
Here’s how this solution works, as illustrated in the preceding diagram:
A scheduled CloudWatch Events event triggers the StartContinuousAssessmentAWS Lambda function, which starts the security assessment of your golden AMIs.
The StartContinuousAssessment Lambda function performs the following actions:
It reads a JSON parameter stored in the AWS Systems Manager (Systems Manager) Parameter Store. This JSON parameter contains the following metadata for each golden AMI:
InstanceType – A valid instance-type for launching an EC2 instance of the golden AMI.
Later in this blog post, I provide instructions for creating this JSON parameter.
For each AMI specified in the JSON parameter, the Lambda function creates an EC2 instance. When each instance starts, it installs the Amazon Inspector agent by using the user-data script provided in the JSON. The Lambda function then copies each golden AMI’s tags (you will assign custom metadata in the form of tags to each golden AMI when you set up the solution) to the corresponding EC2 instance. The function also adds a tag with the key of continuous-assessment-instance and value as true. This tag identifies EC2 instances that require regular security assessments. The Lambda function copies the AMI’s tags to the instance (and later, to the security findings found for the instance) to help you identify the golden AMIs for each security finding. After you analyze security findings, you can patch your golden AMIs.
The first time the StartContinuousAssessment function runs, it creates:
An Amazon Inspector assessment template: The template contains a reference to the Amazon Inspector assessment target created in the preceding step and the following AWS managed rules packages to evaluate:
For subsequent assessments, the StartContinuousAssessment function reuses the target and the template created during the first run of StartContinuousAssessment function.
Note: Amazon Inspector can start an assessment only after it finds at least one running Amazon Inspector agent. To allow EC2 instances to boot and the Amazon inspector agent to start, the Lambda function waits four minutes. Because the assessment runs for approximately one hour and boot time for EC2 instances typically takes a few minutes, all Amazon Inspector agents start before the assessment ends.
The Lambda function then runs the assessment. The Amazon Inspector agents collect behavior and configuration data, and pass it to Amazon Inspector. Amazon Inspector analyzes the data and generates Amazon Inspector findings, which are possible security findings you may need to address.
After the Lambda function completes the assessment, Amazon Inspector publishes an assessment-completion notification message to an Amazon SNS topic called ContinuousAssessmentCompleteTopic. SNS uses topics, which are communication channels for sending messages and subscribing to notifications.
The notification message published to SNS triggers the AnalyzeInspectorFindings Lambda function, which performs the following actions:
Associates the tags of each EC2 instance with security findings found for that EC2 instance. This enables you to identify the security findings using the app-name tag you specified for your golden AMIs. You can use the information provided in the findings to patch your golden AMIs.
Terminates all instances associated with the continuous-assessment-instance=true tag.
Aggregates the number of findings found for each EC2 instance by severity and then publishes a consolidated result to an SNS topic called ContinuousAssessmentResultsTopic.
How to deploy the solution
To deploy this solution, you must set it up in the AWS Region where you build your golden AMIs. If that AWS Region does not support Amazon Inspector, at the end of your continuous integration pipeline, you can copy your AMIs to an AWS Region where Amazon Inspector assessments are supported. To learn more about continuous integration pipelines, see What is Continuous Integration?
To deploy continuous golden AMI vulnerability assessments in your AWS account, follow these steps:
Tag your golden AMIs – Tagging your golden AMIs lets you search assessment result findings based on tags after Amazon Inspector completes an assessment.
Store your golden AMI metadata in the Systems Manager Parameter Store – Prepare and store the golden AMI metadata in the Systems Manager Parameter Store. The StartContinuousAssessment Lambda function reads golden AMI metadata and starts assessing for vulnerabilities.
Run the supplied AWS CloudFormation template and subscribe to an SNS topic to receive assessment results – Set up the infrastructure required to run vulnerability assessments and subscribe to an SNS topic to receive assessment results via email.
Test golden AMI vulnerability assessments – Ensure you have successfully set up the required resources to run vulnerability assessments.
Set up a CloudWatch Events rule for triggering continuous golden AMI vulnerability assessments – Schedule the execution of vulnerability assessments on a regular basis.
1. Tag your golden AMIs
You can search assessment findings based on golden AMI tags after Amazon Inspector completes an assessment.
To tag a golden AMI by using the AWS Management Console:
Choose your AMI from the list, and then choose Actions > Add/Edit Tags.
Choose Create Tag. In the Key column, type app-name. In the Value column, type your application name. Following the same steps, create the app-version and app-environment tags. Choose Save.
Now that you have tagged your golden AMIs, you need to create golden AMI metadata, which will be read by the StartContinuousAssessment function to initiate vulnerability assessments. You will store the golden AMI metadata in the Systems Manager Parameter Store.
2. Store your golden AMI metadata in the Systems Manager Parameter Store
This solution reads golden AMI metadata from a parameter stored in the Systems Manager Parameter Store. The metadata must be in JSON format and must contain the following information for each golden AMI:
Ami-Id
InstanceType
UserData
Step A: Find the AMI ID of your golden AMI.
An AMI ID uniquely identifies an AMI in an AWS Region and is a required parameter for launching an EC2 instance from a golden AMI. To find the AMI ID of your golden AMI:
Choose your AMI from the list and then note the corresponding value in the AMI ID column.
Step B: Find a compatible InstanceType for your golden AMI.
Each AMI has a list of compatible InstanceTypes. The InstanceType is a required parameter for launching an EC2 instance from a golden AMI. To find a compatible InstanceType for your golden AMI:
Choose Launch Instance. On the Choose an Amazon Machine Image (AMI) page, choose My AMIs.
Type the AMI ID that you noted in Step A in the Search my AMIs box, and then choose Enter.
The search result will contain your golden AMI. To choose it, choose Select.
Locate any available Instance Type and then note the corresponding value in the Type column.
Choose Cancel.
Note: Amazon Inspector will launch the chosen InstanceType every time the vulnerability assessment runs.
Step C: Create the user-data script to install and start the Amazon Inspector agent.
The user-data script automates the installation of software packages when an EC2 instance launches for the first time. In this step, you create an operating system specific, JSON-compatible user-data script that installs and starts the Amazon Inspector agent.
Identify the command that installs the Amazon Inspector agent
Based on Installing Amazon Inspector Agents, the following shell command installs the Amazon Inspector agent on an Amazon Linux-based EC2 instance.
Based on Running Commands on Your Linux Instance at Launch, you make a Linux shell script user-data compatible by prefixing it with a #!/bin/bash. In this step, you add the #!/bin/bash prefix to the script from the preceding step. The following is the user-data compatible version of the script from the preceding step.
The user-data script provided in the JSON metadata must be JSON-compatible, which you will do next.
Make the user-data script JSON compatible
To make the user-data script JSON compatible, you must replace all new-line characters with a \r\n\r\n sequence. The following is the JSON-compatible user-data script that you specify for your Amazon Linux-based golden AMI in Step D.
Repeat Steps A, B, and C to find the Ami Id, InstanceType, and UserData for each of your golden AMIs. When you have this metadata, you can create the JSON document of metadata for all your golden AMIs. The StartContinuousAssessment Lambda function reads this JSON to start golden AMI vulnerability assessments.
Step D: Create a JSON document of metadata of all your golden AMIs.
Use the following template to create a JSON document:
Replace all placeholder values with values corresponding to your first golden AMI. If your golden AMI is Amazon Linux-based, you can specify the userData as the JSON-compatible-user-data-for-Amazon-Linux-AMI from Step C.5. Next, replace the placeholder values for your second golden AMI. You can add more entries to your JSON document, if you have more than two golden AMIs.
Note: The total number of characters in the JSON document must be fewer than or equal to 4,096 characters, and the number of golden AMIs must be fewer than 500. You must verify whether your account has permissions to run one on-demand EC2 instance for each of your golden AMIs. For information about how to verify service limits, see Amazon EC2 Service Limits.
Now that you have created the JSON document of your golden AMIs, you will store the JSON document in a Systems Manager parameter. The StartContinuousAssessment Lambda function will read the metadata from this parameter.
Step E: Store the JSON in a Systems Manager parameter.
Expand Systems Manager Shared Resources in the navigation pane, and then choose Parameter Store.
Choose Create Parameter.
For Name, type ContinuousAssessmentInput.
In the Description field, type Continuous golden AMI vulnerability assessment process metadata.
For Type, choose String.
Paste the JSON that you created in Step D in the Value field.
Choose Create Parameter. After the system creates the parameter, choose Close.
To set up the remaining components required to run assessments, you will run a CloudFormation template and perform the configuration explained in the next section.
3. Run the CloudFormation template and subscribe to an SNS topic to receive assessment results
Next, create a CloudFormation stack using the provided CloudFormation template. Before you start, download the CloudFormation template to your computer.
On the Stacks page, choose AmazonInspectorAssessment.
In the Detail pane, choose Outputs to view the output of your stack.
After CloudFormation successfully creates a stack, the Outputs tab displays following results:
StartContinuousAssessmentLambdaFunction – The Value box displays the name of the StartContinuousAssessment function. You will run this function to trigger the entire workflow.
ContinuousAssessmentResultsTopic – The Value box displays the ContinuousAssessmentResultsTopic topic’s Amazon Resource Name (ARN), which you will use later.
To receive consolidated vulnerability assessment results in email, you must subscribe to ContinuousAssessmentResultsTopic.
Choose Create subscription. In the Topic ARN field, paste the ARN of ContinuousAssessmentResultsTopic that you noted in the previous section.
In the Protocol drop-down, choose Email.
In the Endpoint box, type the email address where you will receive notifications.
Choose Create subscription.
Navigate to your email application and open the message from AWS Notifications. Click the link to confirm your subscription to the SNS topic.
4. Test golden AMI vulnerability assessments
Before you schedule vulnerability assessments, you should test the process by running the StartContinuousAssessment function. In this test, you trigger a security assessment and monitor it. You then receive an email after the assessment has completed, which shows that vulnerability assessments have been successfully set up.
On Dashboard under Recent Assessment Runs, you will see an entry with the status, Collecting Data. This status indicates that Amazon Inspector agents are collecting data from instances running your golden AMIs. The agents collect data for an hour and then Amazon Inspector analyzes the collected data.
After Amazon Inspector completes the assessment, the status in the console changes to Analysis complete. Amazon Inspector then publishes an SNS message that triggers the AnalyzeInspectionReports Lambda function. When AnalyzeInspectionReports publishes results, you will receive an email containing consolidated assessment results. You also will be able to see the findings.
To see the findings in Amazon Inspector’s Findings section:
In the navigation pane, choose Assessment Runs. In the table on the Amazon Inspector – Assessment Runs page, choose the findings of the latest assessment run.
Choose the settings () icon and choose the appropriate tags to see the details of findings, as shown in the following screenshot. The findings also contain information about how you can address each underlying vulnerability.
Having verified that you have successfully set up all components of golden AMI vulnerability assessments, you now will schedule the vulnerability assessments to run on a regular basis to give you continual insight into the health of instances created from your golden AMIs.
5. Set up a CloudWatch Events rule for triggering continuous golden AMI vulnerability assessments
The last step is to create a CloudWatch Events rule to schedule the execution of the vulnerability assessments on a daily or weekly basis.
In the navigation pane, choose Rules > Create rule.
On the Event Source page, choose Schedule. Choose Fixed rate of and specify the interval (for example, 1 day).
For Targets, choose Add target and then choose Lambda function.
For Function, choose the StartContinuousAssessment function.
Choose Configure Input.
Choose Constant (JSON text).
In the box, paste the following JSON code.
{
"AMIsParamName": "ContinuousAssessmentInput"
}
Choose Configure details.
For Rule definition, type ContinuousGoldenAMIAssessmentTrigger for the name, and type as the description, This rule triggers the continuous golden AMI vulnerability assessment process.
Choose Create rule.
The vulnerability assessments are executed on the first occurrence of the schedule you chose while setting up the CloudWatch Events rule. After the vulnerability assessment is executed, you will receive an email to indicate that your continuous golden AMI vulnerability assessments are set up.
Summary
To get visibility into the security of your EC2 instances created from your golden AMIs, it is important that you perform security assessments of your golden AMIs on a regular basis. In this blog post, I have demonstrated how to set up vulnerability assessments, and the results of these continuous golden AMI vulnerability assessments can help you keep your environment up to date with security patches. To learn how to patch your golden AMIs, see Streamline AMI Maintenance and Patching Using Amazon EC2 Systems Manager.
If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about implementing the solution in this post, start a new thread on the Amazon Inspector forum or contact AWS Support.
Whether you want to review a Security, Compliance, & Identity track session you attended at AWS re:Invent 2017, or you want to experience a session for the first time, videos and slide decks from the Security, Compliance, & Identity track are now available.
Introductory
SID201: IAM for Enterprises: How Vanguard Strikes the Balance Between Agility, Governance, and Security
Introduced at AWS re:Invent 2017, Amazon GuardDuty is a managed threat detection service that continuously monitors for malicious or unauthorized behavior to help you protect your AWS accounts and workloads. In an AWS Blog post, Jeff Barr shows you how to enable GuardDuty to monitor your AWS resources continuously. That blog post shows how to get started with a single GuardDuty account and provides an overview of the features of the service. Your security team, though, will probably want to use GuardDuty to monitor a group of AWS accounts continuously.
In this post, I demonstrate how to use GuardDuty to monitor a group of AWS accounts and have their findings routed to another AWS account—the master account—that is owned by a security team. The method I demonstrate in this post is especially useful if your security team is responsible for monitoring a group of AWS accounts over which it does not have direct access—known as member accounts. In this solution, I simplify the work needed to enable GuardDuty in member accounts and configure findings by simplifying the process, which I do by enabling GuardDuty in the master account and inviting member accounts.
Enable GuardDuty in a master account and invite member accounts
To get started, you must enable GuardDuty in the master account, which will receive GuardDuty findings. The master account should be managed by your security team, and it will display the findings from all member accounts. The master account can be reverted later by removing any member accounts you add to it. Adding member accounts is a two-way handshake mechanism to ensure that administrators from both the master and member accounts formally agree to establish the relationship.
To enable GuardDuty in the master account and add member accounts:
To designate this account as the GuardDuty master account, start adding member accounts:
You can add individual accounts by choosing Add Account, or you can add a list of accounts by choosing Upload List (.csv).
Now, add the account ID and email address of the member account, and choose Add. (If you are uploading a list of accounts, choose Browse, choose the .csv file with the member accounts [one email address and account ID per line], and choose Add accounts.)
For security reasons, AWS checks to make sure each account ID is valid and that you’ve entered each member account’s email address that was used to create the account. If a member account’s account ID and email address do not match, GuardDuty does not send an invitation.
After you add all the member accounts you want to add, you will see them listed in the Member accounts table with a Status of Invite. You don’t have to individually invite each account—you can choose a group of accounts and when you choose to invite one account in the group, all accounts are invited.
When you choose Invite for each member account:
AWS checks to make sure the account ID is valid and the email address provided is the email address of the member account.
AWS sends an email to the member account email address with a link to the GuardDuty console, where the member account owner can accept the invitation. You can add a customized message from your security team. Account owners who receive the invitation must sign in to their AWS account to accept the invitation. The service also sends an invitation through the AWS Personal Health Dashboard in case the member email address is not monitored. This invitation appears in the member account under the AWS Personal Health Dashboard alert bell on the AWS Management Console.
A pending-invitation indicator is shown on the GuardDuty console of the member account, as shown in the following screenshot.
When the invitation is sent by email, it is sent to the account owner of the GuardDuty member account.
The account owner can click the link in the email invitation or the AWS Personal Health Dashboard message, or the account owner can sign in to their account and navigate to the GuardDuty console. In all cases, the member account displays the pending invitation in the member account’s GuardDuty console with instructions for accepting the invitation. The GuardDuty console walks the account owner through accepting the invitation, including enabling GuardDuty if it is not already enabled.
If you prefer to work in the AWS CLI, you can enable GuardDuty and accept the invitation. To do this, call CreateDetector to enable GuardDuty, and then call AcceptInvitation, which serves the same purpose as accepting the invitation in the GuardDuty console.
After the member account owner accepts the invitation, the Status in the master account is changed to Monitored. The status helps you track the status of each AWS account that you invite.
You have enabled GuardDuty on the member account, and all findings will be forwarded to the master account. You can now monitor the findings about GuardDuty member accounts from the GuardDuty console in the master account.
The member account owner can see GuardDuty findings by default and can control all aspects of the experience in the member account with AWS Identity and Access Management (IAM) permissions. Users with the appropriate permissions can end the multi-account relationship at any time by toggling the Accept button on the Accounts page. Note that ending the relationship changes the Status of the account to Resigned and also triggers a security finding on the side of the master account so that the security team knows the member account is no longer linked to the master account.
Working with GuardDuty findings
Most security teams have ticketing systems, chat operations, security information event management (SIEM) systems, or other security automation systems to which they would like to push GuardDuty findings. For this purpose, GuardDuty sends all findings as JSON-based messages through Amazon CloudWatch Events, a scalable service to which you can subscribe and to which AWS services can stream system events. To access these events, navigate to the CloudWatch Events console and create a rule that subscribes to the GuardDuty-related findings. You then can assign a target such as Amazon Kinesis Data Firehose that can place the findings in a number of services such as Amazon S3. The following screenshot is of the CloudWatch Events console, where I have a rule that pulls all events from GuardDuty and pushes them to a preconfigured AWS Lambda function.
The following example is a subset of GuardDuty findings that includes relevant context and information about the nature of a threat that was detected. In this example, the instanceId, i-00bb62b69b7004a4c, is performing Secure Shell (SSH) brute-force attacks against IP address 172.16.0.28. From a Lambda function, you can access any of the following fields such as the title of the finding and its description, and send those directly to your ticketing system.
You can use other AWS services to build custom analytics and visualizations of your security findings. For example, you can connect Kinesis Data Firehose to CloudWatch Events and write events to an S3 bucket in a standard format, which can be encrypted with AWS Key Management Service and then compressed. You also can use Amazon QuickSight to build ad hoc dashboards by using AWS Glue and Amazon Athena. Similarly, you can place the data from Kinesis Data Firehose in Amazon Elasticsearch Service, with which you can use tools such as Kibana to build your own visualizations and dashboards.
Like most other AWS services, GuardDuty is a regional service. This means that when you enable GuardDuty in an AWS Region, all findings are generated and delivered in that region. If you are regulated by a compliance regime, this is often an important requirement to ensure that security findings remain in a specific jurisdiction. Because customers have let us know they would prefer to be able to enable GuardDuty globally and have all findings aggregated in one place, we intend to give the choice of regional or global isolation as we evolve this new service.
Summary
In this blog post, I have demonstrated how to use GuardDuty to monitor a group of GuardDuty member accounts and aggregate security findings in a central master GuardDuty account. You can use this solution whether or not you have direct control over the member accounts.
If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about using GuardDuty, start a thread in the GuardDuty forum or contact AWS Support.
This post contributed by: Wangechi Dole, AWS Solutions Architect Milan Krasnansky, ING, Digital Solutions Developer, SGK Rian Mookencherry, Director – Product Innovation, SGK
Data processing and transformation is a common use case you see in our customer case studies and success stories. Often, customers deal with complex data from a variety of sources that needs to be transformed and customized through a series of steps to make it useful to different systems and stakeholders. This can be difficult due to the ever-increasing volume, velocity, and variety of data. Today, data management challenges cannot be solved with traditional databases.
Workflow automation helps you build solutions that are repeatable, scalable, and reliable. You can use AWS Step Functions for this. A great example is how SGK used Step Functions to automate the ETL processes for their client. With Step Functions, SGK has been able to automate changes within the data management system, substantially reducing the time required for data processing.
In this post, SGK shares the details of how they used Step Functions to build a robust data processing system based on highly configurable business transformation rules for ETL processes.
SGK: Building dynamic ETL pipelines
SGK is a subsidiary of Matthews International Corporation, a diversified organization focusing on brand solutions and industrial technologies. SGK’s Global Content Creation Studio network creates compelling content and solutions that connect brands and products to consumers through multiple assets including photography, video, and copywriting.
We were recently contracted to build a sophisticated and scalable data management system for one of our clients. We chose to build the solution on AWS to leverage advanced, managed services that help to improve the speed and agility of development.
The data management system served two main functions:
Ingesting a large amount of complex data to facilitate both reporting and product funding decisions for the client’s global marketing and supply chain organizations.
Processing the data through normalization and applying complex algorithms and data transformations. The system goal was to provide information in the relevant context—such as strategic marketing, supply chain, product planning, etc. —to the end consumer through automated data feeds or updates to existing ETL systems.
We were faced with several challenges:
Output data that needed to be refreshed at least twice a day to provide fresh datasets to both local and global markets. That constant data refresh posed several challenges, especially around data management and replication across multiple databases.
The complexity of reporting business rules that needed to be updated on a constant basis.
Data that could not be processed as contiguous blocks of typical time-series data. The measurement of the data was done across seasons (that is, combination of dates), which often resulted with up to three overlapping seasons at any given time.
Input data that came from 10+ different data sources. Each data source ranged from 1–20K rows with as many as 85 columns per input source.
These challenges meant that our small Dev team heavily invested time in frequent configuration changes to the system and data integrity verification to make sure that everything was operating properly. Maintaining this system proved to be a daunting task and that’s when we turned to Step Functions—along with other AWS services—to automate our ETL processes.
Solution overview
Our solution included the following AWS services:
AWS Step Functions: Before Step Functions was available, we were using multiple Lambda functions for this use case and running into memory limit issues. With Step Functions, we can execute steps in parallel simultaneously, in a cost-efficient manner, without running into memory limitations.
AWS Lambda: The Step Functions state machine uses Lambda functions to implement the Task states. Our Lambda functions are implemented in Java 8.
Amazon DynamoDB provides us with an easy and flexible way to manage business rules. We specify our rules as Keys. These are key-value pairs stored in a DynamoDB table.
Amazon RDS: Our ETL pipelines consume source data from our RDS MySQL database.
Amazon Redshift: We use Amazon Redshift for reporting purposes because it integrates with our BI tools. Currently we are using Tableau for reporting which integrates well with Amazon Redshift.
Amazon S3: We store our raw input files and intermediate results in S3 buckets.
Amazon CloudWatch Events: Our users expect results at a specific time. We use CloudWatch Events to trigger Step Functions on an automated schedule.
Solution architecture
This solution uses a declarative approach to defining business transformation rules that are applied by the underlying Step Functions state machine as data moves from RDS to Amazon Redshift. An S3 bucket is used to store intermediate results. A CloudWatch Event rule triggers the Step Functions state machine on a schedule. The following diagram illustrates our architecture:
Here are more details for the above diagram:
A rule in CloudWatch Events triggers the state machine execution on an automated schedule.
The state machine invokes the first Lambda function.
The Lambda function deletes all existing records in Amazon Redshift. Depending on the dataset, the Lambda function can create a new table in Amazon Redshift to hold the data.
The same Lambda function then retrieves Keys from a DynamoDB table. Keys represent specific marketing campaigns or seasons and map to specific records in RDS.
The state machine executes the second Lambda function using the Keys from DynamoDB.
The second Lambda function retrieves the referenced dataset from RDS. The records retrieved represent the entire dataset needed for a specific marketing campaign.
The second Lambda function executes in parallel for each Key retrieved from DynamoDB and stores the output in CSV format temporarily in S3.
Finally, the Lambda function uploads the data into Amazon Redshift.
To understand the above data processing workflow, take a closer look at the Step Functions state machine for this example.
We walk you through the state machine in more detail in the following sections.
Walkthrough
To get started, you need to:
Create a schedule in CloudWatch Events
Specify conditions for RDS data extracts
Create Amazon Redshift input files
Load data into Amazon Redshift
Step 1: Create a schedule in CloudWatch Events Create rules in CloudWatch Events to trigger the Step Functions state machine on an automated schedule. The following is an example cron expression to automate your schedule:
In this example, the cron expression invokes the Step Functions state machine at 3:00am and 2:00pm (UTC) every day.
Step 2: Specify conditions for RDS data extracts We use DynamoDB to store Keys that determine which rows of data to extract from our RDS MySQL database. An example Key is MCS2017, which stands for, Marketing Campaign Spring 2017. Each campaign has a specific start and end date and the corresponding dataset is stored in RDS MySQL. A record in RDS contains about 600 columns, and each Key can represent up to 20K records.
A given day can have multiple campaigns with different start and end dates running simultaneously. In the following example DynamoDB item, three campaigns are specified for the given date.
The state machine example shown above uses Keys 31, 32, and 33 in the first ChoiceState and Keys 21 and 22 in the second ChoiceState. These keys represent marketing campaigns for a given day. For example, on Monday, there are only two campaigns requested. The ChoiceState with Keys 21 and 22 is executed. If three campaigns are requested on Tuesday, for example, then ChoiceState with Keys 31, 32, and 33 is executed. MCS2017 can be represented by Key 21 and Key 33 on Monday and Tuesday, respectively. This approach gives us the flexibility to add or remove campaigns dynamically.
Step 3: Create Amazon Redshift input files When the state machine begins execution, the first Lambda function is invoked as the resource for FirstState, represented in the Step Functions state machine as follows:
As described in the solution architecture, the purpose of this Lambda function is to delete existing data in Amazon Redshift and retrieve keys from DynamoDB. In our use case, we found that deleting existing records was more efficient and less time-consuming than finding the delta and updating existing records. On average, an Amazon Redshift table can contain about 36 million cells, which translates to roughly 65K records. The following is the code snippet for the first Lambda function in Java 8:
public class LambdaFunctionHandler implements RequestHandler<Map<String,Object>,Map<String,String>> {
Map<String,String> keys= new HashMap<>();
public Map<String, String> handleRequest(Map<String, Object> input, Context context){
Properties config = getConfig();
// 1. Cleaning Redshift Database
new RedshiftDataService(config).cleaningTable();
// 2. Reading data from Dynamodb
List<String> keyList = new DynamoDBDataService(config).getCurrentKeys();
for(int i = 0; i < keyList.size(); i++) {
keys.put(”key" + (i+1), keyList.get(i));
}
keys.put(”key" + T,String.valueOf(keyList.size()));
// 3. Returning the key values and the key count from the “for” loop
return (keys);
}
The variable $.keyT represents the number of keys retrieved from DynamoDB. This variable determines which of the parallel branches should be executed. At the time of publication, Step Functions does not support dynamic parallel state. Therefore, choices under ChoiceState are manually created and assigned hardcoded StringEquals values. These values represent the number of parallel executions for the second Lambda function.
For example, if $.keyT equals 3, the second Lambda function is executed three times in parallel with keys, $key1, $key2 and $key3 retrieved from DynamoDB. Similarly, if $.keyT equals two, the second Lambda function is executed twice in parallel. The following JSON represents this parallel execution:
Step 4: Load data into Amazon Redshift The second Lambda function in the state machine extracts records from RDS associated with keys retrieved for DynamoDB. It processes the data then loads into an Amazon Redshift table. The following is code snippet for the second Lambda function in Java 8.
public class LambdaFunctionHandler implements RequestHandler<String, String> {
public static String key = null;
public String handleRequest(String input, Context context) {
key=input;
//1. Getting basic configurations for the next classes + s3 client Properties
config = getConfig();
AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
// 2. Export query results from RDS into S3 bucket
new RdsDataService(config).exportDataToS3(s3,key);
// 3. Import query results from S3 bucket into Redshift
new RedshiftDataService(config).importDataFromS3(s3,key);
System.out.println(input);
return "SUCCESS";
}
}
After the data is loaded into Amazon Redshift, end users can visualize it using their preferred business intelligence tools.
Lessons learned
At the time of publication, the 1.5–GB memory hard limit for Lambda functions was inadequate for processing our complex workload. Step Functions gave us the flexibility to chunk our large datasets and process them in parallel, saving on costs and time.
In our previous implementation, we assigned each key a dedicated Lambda function along with CloudWatch rules for schedule automation. This approach proved to be inefficient and quickly became an operational burden. Previously, we processed each key sequentially, with each key adding about five minutes to the overall processing time. For example, processing three keys meant that the total processing time was three times longer. With Step Functions, the entire state machine executes in about five minutes.
Using DynamoDB with Step Functions gave us the flexibility to manage keys efficiently. In our previous implementations, keys were hardcoded in Lambda functions, which became difficult to manage due to frequent updates. DynamoDB is a great way to store dynamic data that changes frequently, and it works perfectly with our serverless architectures.
Conclusion
With Step Functions, we were able to fully automate the frequent configuration updates to our dataset resulting in significant cost savings, reduced risk to data errors due to system downtime, and more time for us to focus on new product development rather than support related issues. We hope that you have found the information useful and that it can serve as a jump-start to building your own ETL processes on AWS with managed AWS services.
For more information about how Step Functions makes it easy to coordinate the components of distributed applications and microservices in any workflow, see the use case examples and then build your first state machine in under five minutes in the Step Functions console.
If you have questions or suggestions, please comment below.
Today, the following Security, Compliance, & Identity sessions, workshops, and chalk talks will be presented at AWS re:Invent 2017 in Las Vegas. All sessions are in the MGM Grand and all times are local. See the re:Invent Session Catalog for complete information about every session. You can also download the AWS re:Invent 2017 Mobile App for the latest updates and information.
If you are not attending re:Invent 2017, keep in mind that all videos of and slide decks from these sessions will be made available next week. We will publish a post on the Security Blog next week that links to all available videos and slide decks.
Threats to your IT infrastructure (AWS accounts & credentials, AWS resources, guest operating systems, and applications) come in all shapes and sizes! The online world can be a treacherous place and we want to make sure that you have the tools, knowledge, and perspective to keep your IT infrastructure safe & sound.
Amazon GuardDuty is designed to give you just that. Informed by a multitude of public and AWS-generated data feeds and powered by machine learning, GuardDuty analyzes billions of events in pursuit of trends, patterns, and anomalies that are recognizable signs that something is amiss. You can enable it with a click and see the first findings within minutes.
How it Works GuardDuty voraciously consumes multiple data streams, including several threat intelligence feeds, staying aware of malicious IP addresses, devious domains, and more importantly, learning to accurately identify malicious or unauthorized behavior in your AWS accounts. In combination with information gleaned from your VPC Flow Logs, AWS CloudTrail Event Logs, and DNS logs, this allows GuardDuty to detect many different types of dangerous and mischievous behavior including probes for known vulnerabilities, port scans and probes, and access from unusual locations. On the AWS side, it looks for suspicious AWS account activity such as unauthorized deployments, unusual CloudTrail activity, patterns of access to AWS API functions, and attempts to exceed multiple service limits. GuardDuty will also look for compromised EC2 instances talking to malicious entities or services, data exfiltration attempts, and instances that are mining cryptocurrency.
GuardDuty operates completely on AWS infrastructure and does not affect the performance or reliability of your workloads. You do not need to install or manage any agents, sensors, or network appliances. This clean, zero-footprint model should appeal to your security team and allow them to green-light the use of GuardDuty across all of your AWS accounts.
Findings are presented to you at one of three levels (low, medium, or high), accompanied by detailed evidence and recommendations for remediation. The findings are also available as Amazon CloudWatch Events; this allows you to use your own AWS Lambda functions to automatically remediate specific types of issues. This mechanism also allows you to easily push GuardDuty findings into event management systems such as Splunk, Sumo Logic, and PagerDuty and to workflow systems like JIRA, ServiceNow, and Slack.
A Quick Tour Let’s take a quick tour. I open up the GuardDuty Console and click on Get started:
Then I confirm that I want to enable GuardDuty. This gives it permission to set up the appropriate service-linked roles and to analyze my logs by clicking on Enable GuardDuty:
My own AWS environment isn’t all that exciting, so I visit the General Settings and click on Generate sample findings to move ahead. Now I’ve got some intriguing findings:
I can click on a finding to learn more:
The magnifying glass icons allow me to create inclusion or exclusion filters for the associated resource, action, or other value. I can filter for all of the findings related to this instance:
I can customize GuardDuty by adding lists of trusted IP addresses and lists of malicious IP addresses that are peculiar to my environment:
After I enable GuardDuty in my administrator account, I can invite my other accounts to participate:
Once the accounts decide to participate, GuardDuty will arrange for their findings to be shared with the administrator account.
I’ve barely scratched the surface of GuardDuty in the limited space and time that I have. You can try it out at no charge for 30 days; after that you pay based on the number of entries it processes from your VPC Flow, CloudTrail, and DNS logs.
Available Now Amazon GuardDuty is available in production form in the US East (Northern Virginia), US East (Ohio), US West (Oregon), US West (Northern California), EU (Ireland), EU (Frankfurt), EU (London), South America (São Paulo), Canada (Central), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Mumbai) Regions and you can start using it today!
This is a guest post by Rafi Ton, founder and CEO of NUVIAD. NUVIAD is, in their own words, “a mobile marketing platform providing professional marketers, agencies and local businesses state of the art tools to promote their products and services through hyper targeting, big data analytics and advanced machine learning tools.”
At NUVIAD, we’ve been using Amazon Redshift as our main data warehouse solution for more than 3 years.
We store massive amounts of ad transaction data that our users and partners analyze to determine ad campaign strategies. When running real-time bidding (RTB) campaigns in large scale, data freshness is critical so that our users can respond rapidly to changes in campaign performance. We chose Amazon Redshift because of its simplicity, scalability, performance, and ability to load new data in near real time.
Over the past three years, our customer base grew significantly and so did our data. We saw our Amazon Redshift cluster grow from three nodes to 65 nodes. To balance cost and analytics performance, we looked for a way to store large amounts of less-frequently analyzed data at a lower cost. Yet, we still wanted to have the data immediately available for user queries and to meet their expectations for fast performance. We turned to Amazon Redshift Spectrum.
In this post, I explain the reasons why we extended Amazon Redshift with Redshift Spectrum as our modern data warehouse. I cover how our data growth and the need to balance cost and performance led us to adopt Redshift Spectrum. I also share key performance metrics in our environment, and discuss the additional AWS services that provide a scalable and fast environment, with data available for immediate querying by our growing user base.
Amazon Redshift as our foundation
The ability to provide fresh, up-to-the-minute data to our customers and partners was always a main goal with our platform. We saw other solutions provide data that was a few hours old, but this was not good enough for us. We insisted on providing the freshest data possible. For us, that meant loading Amazon Redshift in frequent micro batches and allowing our customers to query Amazon Redshift directly to get results in near real time.
The benefits were immediately evident. Our customers could see how their campaigns performed faster than with other solutions, and react sooner to the ever-changing media supply pricing and availability. They were very happy.
However, this approach required Amazon Redshift to store a lot of data for long periods, and our data grew substantially. In our peak, we maintained a cluster running 65 DC1.large nodes. The impact on our Amazon Redshift cluster was evident, and we saw our CPU utilization grow to 90%.
Why we extended Amazon Redshift to Redshift Spectrum
Redshift Spectrum gives us the ability to run SQL queries using the powerful Amazon Redshift query engine against data stored in Amazon S3, without needing to load the data. With Redshift Spectrum, we store data where we want, at the cost that we want. We have the data available for analytics when our users need it with the performance they expect.
Seamless scalability, high performance, and unlimited concurrency
Scaling Redshift Spectrum is a simple process. First, it allows us to leverage Amazon S3 as the storage engine and get practically unlimited data capacity.
Second, if we need more compute power, we can leverage Redshift Spectrum’s distributed compute engine over thousands of nodes to provide superior performance – perfect for complex queries running against massive amounts of data.
Third, all Redshift Spectrum clusters access the same data catalog so that we don’t have to worry about data migration at all, making scaling effortless and seamless.
Lastly, since Redshift Spectrum distributes queries across potentially thousands of nodes, they are not affected by other queries, providing much more stable performance and unlimited concurrency.
Keeping it SQL
Redshift Spectrum uses the same query engine as Amazon Redshift. This means that we did not need to change our BI tools or query syntax, whether we used complex queries across a single table or joins across multiple tables.
An interesting capability introduced recently is the ability to create a view that spans both Amazon Redshift and Redshift Spectrum external tables. With this feature, you can query frequently accessed data in your Amazon Redshift cluster and less-frequently accessed data in Amazon S3, using a single view.
Leveraging Parquet for higher performance
Parquet is a columnar data format that provides superior performance and allows Redshift Spectrum (or Amazon Athena) to scan significantly less data. With less I/O, queries run faster and we pay less per query. You can read all about Parquet at https://parquet.apache.org/ or https://en.wikipedia.org/wiki/Apache_Parquet.
Lower cost
From a cost perspective, we pay standard rates for our data in Amazon S3, and only small amounts per query to analyze data with Redshift Spectrum. Using the Parquet format, we can significantly reduce the amount of data scanned. Our costs are now lower, and our users get fast results even for large complex queries.
What we learned about Amazon Redshift vs. Redshift Spectrum performance
When we first started looking at Redshift Spectrum, we wanted to put it to the test. We wanted to know how it would compare to Amazon Redshift, so we looked at two key questions:
What is the performance difference between Amazon Redshift and Redshift Spectrum on simple and complex queries?
Does the data format impact performance?
During the migration phase, we had our dataset stored in Amazon Redshift and S3 as CSV/GZIP and as Parquet file formats. We tested three configurations:
Amazon Redshift cluster with 28 DC1.large nodes
Redshift Spectrum using CSV/GZIP
Redshift Spectrum using Parquet
We performed benchmarks for simple and complex queries on one month’s worth of data. We tested how much time it took to perform the query, and how consistent the results were when running the same query multiple times. The data we used for the tests was already partitioned by date and hour. Properly partitioning the data improves performance significantly and reduces query times.
Simple query
First, we tested a simple query aggregating billing data across a month:
SELECT
user_id,
count(*) AS impressions,
SUM(billing)::decimal /1000000 AS billing
FROM <table_name>
WHERE
date >= '2017-08-01' AND
date <= '2017-08-31'
GROUP BY
user_id;
We ran the same query seven times and measured the response times (red marking the longest time and green the shortest time):
Execution Time (seconds)
Amazon Redshift
Redshift Spectrum CSV
Redshift Spectrum Parquet
Run #1
39.65
45.11
11.92
Run #2
15.26
43.13
12.05
Run #3
15.27
46.47
13.38
Run #4
21.22
51.02
12.74
Run #5
17.27
43.35
11.76
Run #6
16.67
44.23
13.67
Run #7
25.37
40.39
12.75
Average
21.53
44.82
12.61
For simple queries, Amazon Redshift performed better than Redshift Spectrum, as we thought, because the data is local to Amazon Redshift.
What was surprising was that using Parquet data format in Redshift Spectrum significantly beat ‘traditional’ Amazon Redshift performance. For our queries, using Parquet data format with Redshift Spectrum delivered an average 40% performance gain over traditional Amazon Redshift. Furthermore, Redshift Spectrum showed high consistency in execution time with a smaller difference between the slowest run and the fastest run.
Comparing the amount of data scanned when using CSV/GZIP and Parquet, the difference was also significant:
Data Scanned (GB)
CSV (Gzip)
135.49
Parquet
2.83
Because we pay only for the data scanned by Redshift Spectrum, the cost saving of using Parquet is evident and substantial.
Complex query
Next, we compared the same three configurations with a complex query.
Execution Time (seconds)
Amazon Redshift
Redshift Spectrum CSV
Redshift Spectrum Parquet
Run #1
329.80
84.20
42.40
Run #2
167.60
65.30
35.10
Run #3
165.20
62.20
23.90
Run #4
273.90
74.90
55.90
Run #5
167.70
69.00
58.40
Average
220.84
71.12
43.14
This time, Redshift Spectrum using Parquet cut the average query time by 80% compared to traditional Amazon Redshift!
Bottom line: For complex queries, Redshift Spectrum provided a 67% performance gain over Amazon Redshift. Using the Parquet data format, Redshift Spectrum delivered an 80% performance improvement over Amazon Redshift. For us, this was substantial.
Optimizing the data structure for different workloads
Because the cost of S3 is relatively inexpensive and we pay only for the data scanned by each query, we believe that it makes sense to keep our data in different formats for different workloads and different analytics engines. It is important to note that we can have any number of tables pointing to the same data on S3. It all depends on how we partition the data and update the table partitions.
Data permutations
For example, we have a process that runs every minute and generates statistics for the last minute of data collected. With Amazon Redshift, this would be done by running the query on the table with something as follows:
SELECT
user,
COUNT(*)
FROM
events_table
WHERE
ts BETWEEN ‘2017-08-01 14:00:00’ AND ‘2017-08-01 14:00:59’
GROUP BY
user;
(Assuming ‘ts’ is your column storing the time stamp for each event.)
With Redshift Spectrum, we pay for the data scanned in each query. If the data is partitioned by the minute instead of the hour, a query looking at one minute would be 1/60th the cost. If we use a temporary table that points only to the data of the last minute, we save that unnecessary cost.
Creating Parquet data efficiently
On the average, we have 800 instances that process our traffic. Each instance sends events that are eventually loaded into Amazon Redshift. When we started three years ago, we would offload data from each server to S3 and then perform a periodic copy command from S3 to Amazon Redshift.
Recently, Amazon Kinesis Firehose added the capability to offload data directly to Amazon Redshift. While this is now a viable option, we kept the same collection process that worked flawlessly and efficiently for three years.
This changed, however, when we incorporated Redshift Spectrum. With Redshift Spectrum, we needed to find a way to:
Collect the event data from the instances.
Save the data in Parquet format.
Partition the data effectively.
To accomplish this, we save the data as CSV and then transform it to Parquet. The most effective method to generate the Parquet files is to:
Send the data in one-minute intervals from the instances to Kinesis Firehose with an S3 temporary bucket as the destination.
Aggregate hourly data and convert it to Parquet using AWS Lambda and AWS Glue.
Add the Parquet data to S3 by updating the table partitions.
With this new process, we had to give more attention to validating the data before we sent it to Kinesis Firehose, because a single corrupted record in a partition fails queries on that partition.
Data validation
To store our click data in a table, we considered the following SQL create table command:
create external TABLE spectrum.blog_clicks (
user_id varchar(50),
campaign_id varchar(50),
os varchar(50),
ua varchar(255),
ts bigint,
billing float
)
partitioned by (date date, hour smallint)
stored as parquet
location 's3://nuviad-temp/blog/clicks/';
The above statement defines a new external table (all Redshift Spectrum tables are external tables) with a few attributes. We stored ‘ts’ as a Unix time stamp and not as Timestamp, and billing data is stored as float and not decimal (more on that later). We also said that the data is partitioned by date and hour, and then stored as Parquet on S3.
First, we need to get the table definitions. This can be achieved by running the following query:
SELECT
*
FROM
svv_external_columns
WHERE
tablename = 'blog_clicks';
This query lists all the columns in the table with their respective definitions:
schemaname
tablename
columnname
external_type
columnnum
part_key
spectrum
blog_clicks
user_id
varchar(50)
1
0
spectrum
blog_clicks
campaign_id
varchar(50)
2
0
spectrum
blog_clicks
os
varchar(50)
3
0
spectrum
blog_clicks
ua
varchar(255)
4
0
spectrum
blog_clicks
ts
bigint
5
0
spectrum
blog_clicks
billing
double
6
0
spectrum
blog_clicks
date
date
7
1
spectrum
blog_clicks
hour
smallint
8
2
Now we can use this data to create a validation schema for our data:
Next, we create a function that uses this schema to validate data:
function valueIsValid(value, item_schema) {
if (schema.type == 'string') {
return (typeof value == 'string' && value.length <= schema.max_length);
}
else if (schema.type == 'integer') {
return (typeof value == 'number' && value >= schema.min_value && value <= schema.max_value);
}
else if (schema.type == 'float' || schema.type == 'double') {
return (typeof value == 'number' && value >= schema.min_value && value <= schema.max_value);
}
else if (schema.type == 'boolean') {
return typeof value == 'boolean';
}
else if (schema.type == 'timestamp') {
return (new Date(value)).getTime() > 0;
}
else {
return true;
}
}
Near real-time data loading with Kinesis Firehose
On Kinesis Firehose, we created a new delivery stream to handle the events as follows:
Delivery stream name: events
Source: Direct PUT
S3 bucket: nuviad-events
S3 prefix: rtb/
IAM role: firehose_delivery_role_1
Data transformation: Disabled
Source record backup: Disabled
S3 buffer size (MB): 100
S3 buffer interval (sec): 60
S3 Compression: GZIP
S3 Encryption: No Encryption
Status: ACTIVE
Error logging: Enabled
This delivery stream aggregates event data every minute, or up to 100 MB, and writes the data to an S3 bucket as a CSV/GZIP compressed file. Next, after we have the data validated, we can safely send it to our Kinesis Firehose API:
if (validated) {
let itemString = item.join('|')+'\n'; //Sending csv delimited by pipe and adding new line
let params = {
DeliveryStreamName: 'events',
Record: {
Data: itemString
}
};
firehose.putRecord(params, function(err, data) {
if (err) {
console.error(err, err.stack);
}
else {
// Continue to your next step
}
});
}
Now, we have a single CSV file representing one minute of event data stored in S3. The files are named automatically by Kinesis Firehose by adding a UTC time prefix in the format YYYY/MM/DD/HH before writing objects to S3. Because we use the date and hour as partitions, we need to change the file naming and location to fit our Redshift Spectrum schema.
Automating data distribution using AWS Lambda
We created a simple Lambda function triggered by an S3 put event that copies the file to a different location (or locations), while renaming it to fit our data structure and processing flow. As mentioned before, the files generated by Kinesis Firehose are structured in a pre-defined hierarchy, such as:
All we need to do is parse the object name and restructure it as we see fit. In our case, we did the following (the event is an object received in the Lambda function with all the data about the object written to S3):
/*
object key structure in the event object:
your-prefix/2017/08/01/20/event-4-2017-08-01-20-06-06-536f5c40-6893-4ee4-907d-81e4d3b09455.gz
*/
let key_parts = event.Records[0].s3.object.key.split('/');
let event_type = key_parts[0];
let date = key_parts[1] + '-' + key_parts[2] + '-' + key_parts[3];
let hour = key_parts[4];
if (hour.indexOf('0') == 0) {
hour = parseInt(hour, 10) + '';
}
let parts1 = key_parts[5].split('-');
let minute = parts1[7];
if (minute.indexOf('0') == 0) {
minute = parseInt(minute, 10) + '';
}
Now, we can redistribute the file to the two destinations we need—one for the minute processing task and the other for hourly aggregation:
Kinesis Firehose stores the data in a temporary folder. We copy the object to another folder that holds the data for the last processed minute. This folder is connected to a small Redshift Spectrum table where the data is being processed without needing to scan a much larger dataset. We also copy the data to a folder that holds the data for the entire hour, to be later aggregated and converted to Parquet.
Because we partition the data by date and hour, we created a new partition on the Redshift Spectrum table if the processed minute is the first minute in the hour (that is, minute 0). We ran the following:
ALTER TABLE
spectrum.events
ADD partition
(date='2017-08-01', hour=0)
LOCATION 's3://nuviad-temp/events/2017-08-01/0/';
After the data is processed and added to the table, we delete the processed data from the temporary Kinesis Firehose storage and from the minute storage folder.
Migrating CSV to Parquet using AWS Glue and Amazon EMR
The simplest way we found to run an hourly job converting our CSV data to Parquet is using Lambda and AWS Glue (and thanks to the awesome AWS Big Data team for their help with this).
Creating AWS Glue jobs
What this simple AWS Glue script does:
Gets parameters for the job, date, and hour to be processed
Creates a Spark EMR context allowing us to run Spark code
Reads CSV data into a DataFrame
Writes the data as Parquet to the destination S3 bucket
Adds or modifies the Redshift Spectrum / Amazon Athena table partition for the table
Note: Because Redshift Spectrum and Athena both use the AWS Glue Data Catalog, we could use the Athena client to add the partition to the table.
Here are a few words about float, decimal, and double. Using decimal proved to be more challenging than we expected, as it seems that Redshift Spectrum and Spark use them differently. Whenever we used decimal in Redshift Spectrum and in Spark, we kept getting errors, such as:
S3 Query Exception (Fetch). Task failed due to an internal error. File 'https://s3-external-1.amazonaws.com/nuviad-temp/events/2017-08-01/hour=2/part-00017-48ae5b6b-906e-4875-8cde-bc36c0c6d0ca.c000.snappy.parquet has an incompatible Parquet schema for column 's3://nuviad-events/events.lat'. Column type: DECIMAL(18, 8), Parquet schema:\noptional float lat [i:4 d:1 r:0]\n (https://s3-external-1.amazonaws.com/nuviad-temp/events/2017-08-01/hour=2/part-00017-48ae5b6b-906e-4875-8cde-bc36c0c6d0ca.c000.snappy.parq
We had to experiment with a few floating-point formats until we found that the only combination that worked was to define the column as double in the Spark code and float in Spectrum. This is the reason you see billing defined as float in Spectrum and double in the Spark code.
Creating a Lambda function to trigger conversion
Next, we created a simple Lambda function to trigger the AWS Glue script hourly using a simple Python code:
Using Amazon CloudWatch Events, we trigger this function hourly. This function triggers an AWS Glue job named ‘convertEventsParquetHourly’ and runs it for the previous hour, passing job names and values of the partitions to process to AWS Glue.
Redshift Spectrum and Node.js
Our development stack is based on Node.js, which is well-suited for high-speed, light servers that need to process a huge number of transactions. However, a few limitations of the Node.js environment required us to create workarounds and use other tools to complete the process.
Node.js and Parquet
The lack of Parquet modules for Node.js required us to implement an AWS Glue/Amazon EMR process to effectively migrate data from CSV to Parquet. We would rather save directly to Parquet, but we couldn’t find an effective way to do it.
One interesting project in the works is the development of a Parquet NPM by Marc Vertes called node-parquet (https://www.npmjs.com/package/node-parquet). It is not in a production state yet, but we think it would be well worth following the progress of this package.
Timestamp data type
According to the Parquet documentation, Timestamp data are stored in Parquet as 64-bit integers. However, JavaScript does not support 64-bit integers, because the native number type is a 64-bit double, giving only 53 bits of integer range.
The result is that you cannot store Timestamp correctly in Parquet using Node.js. The solution is to store Timestamp as string and cast the type to Timestamp in the query. Using this method, we did not witness any performance degradation whatsoever.
Lessons learned
You can benefit from our trial-and-error experience.
Lesson #1: Data validation is critical
As mentioned earlier, a single corrupt entry in a partition can fail queries running against this partition, especially when using Parquet, which is harder to edit than a simple CSV file. Make sure that you validate your data before scanning it with Redshift Spectrum.
Lesson #2: Structure and partition data effectively
One of the biggest benefits of using Redshift Spectrum (or Athena for that matter) is that you don’t need to keep nodes up and running all the time. You pay only for the queries you perform and only for the data scanned per query.
Keeping different permutations of your data for different queries makes a lot of sense in this case. For example, you can partition your data by date and hour to run time-based queries, and also have another set partitioned by user_id and date to run user-based queries. This results in faster and more efficient performance of your data warehouse.
Storing data in the right format
Use Parquet whenever you can. The benefits of Parquet are substantial. Faster performance, less data to scan, and much more efficient columnar format. However, it is not supported out-of-the-box by Kinesis Firehose, so you need to implement your own ETL. AWS Glue is a great option.
Creating small tables for frequent tasks
When we started using Redshift Spectrum, we saw our Amazon Redshift costs jump by hundreds of dollars per day. Then we realized that we were unnecessarily scanning a full day’s worth of data every minute. Take advantage of the ability to define multiple tables on the same S3 bucket or folder, and create temporary and small tables for frequent queries.
Lesson #3: Combine Athena and Redshift Spectrum for optimal performance
Moving to Redshift Spectrum also allowed us to take advantage of Athena as both use the AWS Glue Data Catalog. Run fast and simple queries using Athena while taking advantage of the advanced Amazon Redshift query engine for complex queries using Redshift Spectrum.
Redshift Spectrum excels when running complex queries. It can push many compute-intensive tasks, such as predicate filtering and aggregation, down to the Redshift Spectrum layer, so that queries use much less of your cluster’s processing capacity.
Lesson #4: Sort your Parquet data within the partition
We achieved another performance improvement by sorting data within the partition using sortWithinPartitions(sort_field). For example:
We were extremely pleased with using Amazon Redshift as our core data warehouse for over three years. But as our client base and volume of data grew substantially, we extended Amazon Redshift to take advantage of scalability, performance, and cost with Redshift Spectrum.
Redshift Spectrum lets us scale to virtually unlimited storage, scale compute transparently, and deliver super-fast results for our users. With Redshift Spectrum, we store data where we want at the cost we want, and have the data available for analytics when our users need it with the performance they expect.
About the Author
With 7 years of experience in the AdTech industry and 15 years in leading technology companies, Rafi Ton is the founder and CEO of NUVIAD. He enjoys exploring new technologies and putting them to use in cutting edge products and services, in the real world generating real money. Being an experienced entrepreneur, Rafi believes in practical-programming and fast adaptation of new technologies to achieve a significant market advantage.
Step 3: Take EBS snapshots using EBS Snapshot Scheduler
In this section, I show you how to use EBS Snapshot Scheduler to take snapshots of your instances at specific intervals. To do this, I will show you how to:
Determine the schedule for EBS Snapshot Scheduler by providing you with best practices.
Tag your EC2 instances so that EBS Snapshot Scheduler backs up your instances when you want them backed up.
In addition to making sure your EC2 instances have all the available operating system patches applied on a regular schedule, you should take snapshots of the EBS storage volumes attached to your EC2 instances. Taking regular snapshots allows you to restore your data to a previous state quickly and cost effectively. With Amazon EBS snapshots, you pay only for the actual data you store, and snapshots save only the data that has changed since the previous snapshot, which minimizes your cost. You will use EBS Snapshot Scheduler to make regular snapshots of your EC2 instance. EBS Snapshot Scheduler takes advantage of other AWS services including CloudFormation, Amazon DynamoDB, and AWS Lambda to make backing up your EBS volumes simple.
Determine the schedule
As a best practice, you should back up your data frequently during the hours when your data changes the most. This reduces the amount of data you lose if you have to restore from a snapshot. For the purposes of this blog post, the data for my instances changes the most between the business hours of 9:00 A.M. to 5:00 P.M. Pacific Time. During these hours, I will make snapshots hourly to minimize data loss.
In addition to backing up frequently, another best practice is to establish a strategy for retention. This will vary based on how you need to use the snapshots. If you have compliance requirements to be able to restore for auditing, your needs may be different than if you are able to detect data corruption within three hours and simply need to restore to something that limits data loss to five hours. EBS Snapshot Scheduler enables you to specify the retention period for your snapshots. For this post, I only need to keep snapshots for recent business days. To account for weekends, I will set my retention period to three days, which is down from the default of 15 days when deploying EBS Snapshot Scheduler.
Deploy EBS Snapshot Scheduler
In Step 1 of Part 1 of this post, I showed how to configure an EC2 for Windows Server 2012 R2 instance with an EBS volume. You will use EBS Snapshot Scheduler to take eight snapshots each weekday of your EC2 instance’s EBS volumes:
Navigate to the EBS Snapshot Scheduler deployment page and choose Launch Solution. This takes you to the CloudFormation console in your account. The Specify an Amazon S3 template URL option is already selected and prefilled. Choose Next on the Select Template page.
On the Specify Details page, retain all default parameters except for AutoSnapshotDeletion. Set AutoSnapshotDeletion to Yes to ensure that old snapshots are periodically deleted. The default retention period is 15 days (you will specify a shorter value on your instance in the next subsection).
Choose Next twice to move to the Review step, and start deployment by choosing the I acknowledge that AWS CloudFormation might create IAM resources check box and then choosing Create.
Tag your EC2 instances
EBS Snapshot Scheduler takes a few minutes to deploy. While waiting for its deployment, you can start to tag your instance to define its schedule. EBS Snapshot Scheduler reads tag values and looks for four possible custom parameters in the following order:
<snapshot time> – Time in 24-hour format with no colon.
<retention days> – The number of days (a positive integer) to retain the snapshot before deletion, if set to automatically delete snapshots.
<time zone> – The time zone of the times specified in <snapshot time>.
<active day(s)> – all, weekdays, or mon, tue, wed, thu, fri, sat, and/or sun.
Because you want hourly backups on weekdays between 9:00 A.M. and 5:00 P.M. Pacific Time, you need to configure eight tags—one for each hour of the day. You will add the eight tags shown in the following table to your EC2 instance.
Tag
Value
scheduler:ebs-snapshot:0900
0900;3;utc;weekdays
scheduler:ebs-snapshot:1000
1000;3;utc;weekdays
scheduler:ebs-snapshot:1100
1100;3;utc;weekdays
scheduler:ebs-snapshot:1200
1200;3;utc;weekdays
scheduler:ebs-snapshot:1300
1300;3;utc;weekdays
scheduler:ebs-snapshot:1400
1400;3;utc;weekdays
scheduler:ebs-snapshot:1500
1500;3;utc;weekdays
scheduler:ebs-snapshot:1600
1600;3;utc;weekdays
Next, you will add these tags to your instance. If you want to tag multiple instances at once, you can use Tag Editor instead. To add the tags in the preceding table to your EC2 instance:
Navigate to your EC2 instance in the EC2 console and choose Tags in the navigation pane.
Choose Add/Edit Tags and then choose Create Tag to add all the tags specified in the preceding table.
Confirm you have added the tags by choosing Save. After adding these tags, navigate to your EC2 instance in the EC2 console. Your EC2 instance should look similar to the following screenshot.
After waiting a couple of hours, you can see snapshots beginning to populate on the Snapshots page of the EC2 console.
To check if EBS Snapshot Scheduler is active, you can check the CloudWatch rule that runs the Lambda function. If the clock icon shown in the following screenshot is green, the scheduler is active. If the clock icon is gray, the rule is disabled and does not run. You can enable or disable the rule by selecting it, choosing Actions, and choosing Enable or Disable. This also allows you to temporarily disable EBS Snapshot Scheduler.
You can also monitor when EBS Snapshot Scheduler has run by choosing the name of the CloudWatch rule as shown in the previous screenshot and choosing Show metrics for the rule.
In this section, I show you how to you use Amazon Inspector to scan your EC2 instance for common vulnerabilities and exposures (CVEs) and set up Amazon SNS notifications. To do this I will show you how to:
Install the Amazon Inspector agent by using EC2 Run Command.
Set up notifications using Amazon SNS to notify you of any findings.
Define an Amazon Inspector target and template to define what assessment to perform on your EC2 instance.
Schedule Amazon Inspector assessment runs to assess your EC2 instance on a regular interval.
Amazon Inspector can help you scan your EC2 instance using prebuilt rules packages, which are built and maintained by AWS. These prebuilt rules packages tell Amazon Inspector what to scan for on the EC2 instances you select. Amazon Inspector provides the following prebuilt packages for Microsoft Windows Server 2012 R2:
Common Vulnerabilities and Exposures
Center for Internet Security Benchmarks
Runtime Behavior Analysis
In this post, I’m focused on how to make sure you keep your EC2 instances patched, backed up, and inspected for common vulnerabilities and exposures (CVEs). As a result, I will focus on how to use the CVE rules package and use your instance tags to identify the instances on which to run the CVE rules. If your EC2 instance is fully patched using Systems Manager, as described earlier, you should not have any findings with the CVE rules package. Regardless, as a best practice I recommend that you use Amazon Inspector as an additional layer for identifying any unexpected failures. This involves using Amazon CloudWatch to set up weekly Amazon Inspector scans, and configuring Amazon Inspector to notify you of any findings through SNS topics. By acting on the notifications you receive, you can respond quickly to any CVEs on any of your EC2 instances to help ensure that malware using known CVEs does not affect your EC2 instances. In a previous blog post, Eric Fitzgerald showed how to remediate Amazon Inspector security findings automatically.
Install the Amazon Inspector agent
To install the Amazon Inspector agent, you will use EC2 Run Command, which allows you to run any command on any of your EC2 instances that have the Systems Manager agent with an attached IAM role that allows access to Systems Manager.
Choose Run Command under Systems Manager Services in the navigation pane of the EC2 console. Then choose Run a command.
To install the Amazon Inspector agent, you will use an AWS managed and provided command document that downloads and installs the agent for you on the selected EC2 instance. Choose AmazonInspector-ManageAWSAgent. To choose the target EC2 instance where this command will be run, use the tag you previously assigned to your EC2 instance, Patch Group, with a value of Windows Servers. For this example, set the concurrent installations to 1 and tell Systems Manager to stop after 5 errors.
Retain the default values for all other settings on the Run a command page and choose Run. Back on the Run Command page, you can see if the command that installed the Amazon Inspector agent executed successfully on all selected EC2 instances.
Set up notifications using Amazon SNS
Now that you have installed the Amazon Inspector agent, you will set up an SNS topic that will notify you of any findings after an Amazon Inspector run.
To set up an SNS topic:
In the AWS Management Console, choose Simple Notification Service under Messaging in the Services menu.
Choose Create topic, name your topic (only alphanumeric characters, hyphens, and underscores are allowed) and give it a display name to ensure you know what this topic does (I’ve named mine Inspector). Choose Create topic.
To allow Amazon Inspector to publish messages to your new topic, choose Other topic actions and choose Edit topic policy.
For Allow these users to publish messages to this topic and Allow these users to subscribe to this topic, choose Only these AWS users. Type the following ARN for the US East (N. Virginia) Region in which you are deploying the solution in this post: arn:aws:iam::316112463485:root. This is the ARN of Amazon Inspector itself. For the ARNs of Amazon Inspector in other AWS Regions, see Setting Up an SNS Topic for Amazon Inspector Notifications (Console). Amazon Resource Names (ARNs) uniquely identify AWS resources across all of AWS.
To receive notifications from Amazon Inspector, subscribe to your new topic by choosing Create subscription and adding your email address. After confirming your subscription by clicking the link in the email, the topic should display your email address as a subscriber. Later, you will configure the Amazon Inspector template to publish to this topic.
Define an Amazon Inspector target and template
Now that you have set up the notification topic by which Amazon Inspector can notify you of findings, you can create an Amazon Inspector target and template. A target defines which EC2 instances are in scope for Amazon Inspector. A template defines which packages to run, for how long, and on which target.
To create an Amazon Inspector target:
Navigate to the Amazon Inspector console and choose Get started. At the time of writing this blog post, Amazon Inspector is available in the US East (N. Virginia), US West (N. California), US West (Oregon), EU (Ireland), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Sydney), and Asia Pacific (Tokyo) Regions.
For Amazon Inspector to be able to collect the necessary data from your EC2 instance, you must create an IAM service role for Amazon Inspector. Amazon Inspector can create this role for you if you choose Choose or create role and confirm the role creation by choosing Allow.
Amazon Inspector also asks you to tag your EC2 instance and install the Amazon Inspector agent. You already performed these steps in Part 1 of this post, so you can proceed by choosing Next. To define the Amazon Inspector target, choose the previously used Patch Group tag with a Value of Windows Servers. This is the same tag that you used to define the targets for patching. Then choose Next.
Now, define your Amazon Inspector template, and choose a name and the package you want to run. For this post, use the Common Vulnerabilities and Exposures package and choose the default duration of 1 hour. As you can see, the package has a version number, so always select the latest version of the rules package if multiple versions are available.
Configure Amazon Inspector to publish to your SNS topic when findings are reported. You can also choose to receive a notification of a started run, a finished run, or changes in the state of a run. For this blog post, you want to receive notifications if there are any findings. To start, choose Assessment Templates from the Amazon Inspector console and choose your newly created Amazon Inspector assessment template. Choose the icon below SNS topics (see the following screenshot).
A pop-up appears in which you can choose the previously created topic and the events about which you want SNS to notify you (choose Finding reported).
Schedule Amazon Inspector assessment runs
The last step in using Amazon Inspector to assess for CVEs is to schedule the Amazon Inspector template to run using Amazon CloudWatch Events. This will make sure that Amazon Inspector assesses your EC2 instance on a regular basis. To do this, you need the Amazon Inspector template ARN, which you can find under Assessment templates in the Amazon Inspector console. CloudWatch Events can run your Amazon Inspector assessment at an interval you define using a Cron-based schedule. Cron is a well-known scheduling agent that is widely used on UNIX-like operating systems and uses the following syntax for CloudWatch Events.
All scheduled events use a UTC time zone, and the minimum precision for schedules is one minute. For more information about scheduling CloudWatch Events, see Schedule Expressions for Rules.
To create the CloudWatch Events rule:
Navigate to the CloudWatch console, choose Events, and choose Create rule.
On the next page, specify if you want to invoke your rule based on an event pattern or a schedule. For this blog post, you will select a schedule based on a Cron expression.
You can schedule the Amazon Inspector assessment any time you want using the Cron expression, or you can use the Cron expression I used in the following screenshot, which will run the Amazon Inspector assessment every Sunday at 10:00 P.M. GMT.
Choose Add target and choose Inspector assessment template from the drop-down menu. Paste the ARN of the Amazon Inspector template you previously created in the Amazon Inspector console in the Assessment template box and choose Create a new role for this specific resource. This new role is necessary so that CloudWatch Events has the necessary permissions to start the Amazon Inspector assessment. CloudWatch Events will automatically create the new role and grant the minimum set of permissions needed to run the Amazon Inspector assessment. To proceed, choose Configure details.
Next, give your rule a name and a description. I suggest using a name that describes what the rule does, as shown in the following screenshot.
Finish the wizard by choosing Create rule. The rule should appear in the Events – Rules section of the CloudWatch console.
To confirm your CloudWatch Events rule works, wait for the next time your CloudWatch Events rule is scheduled to run. For testing purposes, you can choose your CloudWatch Events rule and choose Edit to change the schedule to run it sooner than scheduled.
Now navigate to the Amazon Inspector console to confirm the launch of your first assessment run. The Start time column shows you the time each assessment started and the Status column the status of your assessment. In the following screenshot, you can see Amazon Inspector is busy Collecting data from the selected assessment targets.
You have concluded the last step of this blog post by setting up a regular scan of your EC2 instance with Amazon Inspector and a notification that will let you know if your EC2 instance is vulnerable to any known CVEs. In a previous Security Blog post, Eric Fitzgerald explained How to Remediate Amazon Inspector Security Findings Automatically. Although that blog post is for Linux-based EC2 instances, the post shows that you can learn about Amazon Inspector findings in other ways than email alerts.
Conclusion
In this two-part blog post, I showed how to make sure you keep your EC2 instances up to date with patching, how to back up your instances with snapshots, and how to monitor your instances for CVEs. Collectively these measures help to protect your instances against common attack vectors that attempt to exploit known vulnerabilities. In Part 1, I showed how to configure your EC2 instances to make it easy to use Systems Manager, EBS Snapshot Scheduler, and Amazon Inspector. I also showed how to use Systems Manager to schedule automatic patches to keep your instances current in a timely fashion. In Part 2, I showed you how to take regular snapshots of your data by using EBS Snapshot Scheduler and how to use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any common vulnerabilities and exposures (CVEs).
Contributed by Otavio Ferreira, Manager, Software Development, AWS Messaging
Like other developers around the world, you may be tackling increasingly complex business problems. A key success factor, in that case, is the ability to break down a large project scope into smaller, more manageable components. A service-oriented architecture guides you toward designing systems as a collection of loosely coupled, independently scaled, and highly reusable services. Microservices take this even further. To improve performance and scalability, they promote fine-grained interfaces and lightweight protocols.
However, the communication among isolated microservices can be challenging. Services are often deployed onto independent servers and don’t share any compute or storage resources. Also, you should avoid hard dependencies among microservices, to preserve maintainability and reusability.
If you apply the pub/sub design pattern, you can effortlessly decouple and independently scale out your microservices and serverless architectures. A pub/sub messaging service, such as Amazon SNS, promotes event-driven computing that statically decouples event publishers from subscribers, while dynamically allowing for the exchange of messages between them. An event-driven architecture also introduces the responsiveness needed to deal with complex problems, which are often unpredictable and asynchronous.
What is event-driven computing?
Given the context of microservices, event-driven computing is a model in which subscriber services automatically perform work in response to events triggered by publisher services. This paradigm can be applied to automate workflows while decoupling the services that collectively and independently work to fulfil these workflows. Amazon SNS is an event-driven computing hub, in the AWS Cloud, that has native integration with several AWS publisher and subscriber services.
Which AWS services publish events to SNS natively?
Several AWS services have been integrated as SNS publishers and, therefore, can natively trigger event-driven computing for a variety of use cases. In this post, I specifically cover AWS compute, storage, database, and networking services, as depicted below.
Compute services
Auto Scaling: Helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You can configure Auto Scaling lifecycle hooks to trigger events, as Auto Scaling resizes your EC2 cluster.As an example, you may want to warm up the local cache store on newly launched EC2 instances, and also download log files from other EC2 instances that are about to be terminated. To make this happen, set an SNS topic as your Auto Scaling group’s notification target, then subscribe two Lambda functions to this SNS topic. The first function is responsible for handling scale-out events (to warm up cache upon provisioning), whereas the second is in charge of handling scale-in events (to download logs upon termination).
AWS Elastic Beanstalk: An easy-to-use service for deploying and scaling web applications and web services developed in a number of programming languages. You can configure event notifications for your Elastic Beanstalk environment so that notable events can be automatically published to an SNS topic, then pushed to topic subscribers.As an example, you may use this event-driven architecture to coordinate your continuous integration pipeline (such as Jenkins CI). That way, whenever an environment is created, Elastic Beanstalk publishes this event to an SNS topic, which triggers a subscribing Lambda function, which then kicks off a CI job against your newly created Elastic Beanstalk environment.
Elastic Load Balancing:Automatically distributes incoming application traffic across Amazon EC2 instances, containers, or other resources identified by IP addresses.You can configure CloudWatch alarms on Elastic Load Balancing metrics, to automate the handling of events derived from Classic Load Balancers. As an example, you may leverage this event-driven design to automate latency profiling in an Amazon ECS cluster behind a Classic Load Balancer. In this example, whenever your ECS cluster breaches your load balancer latency threshold, an event is posted by CloudWatch to an SNS topic, which then triggers a subscribing Lambda function. This function runs a task on your ECS cluster to trigger a latency profiling tool, hosted on the cluster itself. This can enhance your latency troubleshooting exercise by making it timely.
Amazon S3:Object storage built to store and retrieve any amount of data.You can enable S3 event notifications, and automatically get them posted to SNS topics, to automate a variety of workflows. For instance, imagine that you have an S3 bucket to store incoming resumes from candidates, and a fleet of EC2 instances to encode these resumes from their original format (such as Word or text) into a portable format (such as PDF).In this example, whenever new files are uploaded to your input bucket, S3 publishes these events to an SNS topic, which in turn pushes these messages into subscribing SQS queues. Then, encoding workers running on EC2 instances poll these messages from the SQS queues; retrieve the original files from the input S3 bucket; encode them into PDF; and finally store them in an output S3 bucket.
Amazon EFS: Provides simple and scalable file storage, for use with Amazon EC2 instances, in the AWS Cloud.You can configure CloudWatch alarms on EFS metrics, to automate the management of your EFS systems. For example, consider a highly parallelized genomics analysis application that runs against an EFS system. By default, this file system is instantiated on the “General Purpose” performance mode. Although this performance mode allows for lower latency, it might eventually impose a scaling bottleneck. Therefore, you may leverage an event-driven design to handle it automatically.Basically, as soon as the EFS metric “Percent I/O Limit” breaches 95%, CloudWatch could post this event to an SNS topic, which in turn would push this message into a subscribing Lambda function. This function automatically creates a new file system, this time on the “Max I/O” performance mode, then switches the genomics analysis application to this new file system. As a result, your application starts experiencing higher I/O throughput rates.
Amazon Glacier: A secure, durable, and low-cost cloud storage service for data archiving and long-term backup.You can set a notification configuration on an Amazon Glacier vault so that when a job completes, a message is published to an SNS topic. Retrieving an archive from Amazon Glacier is a two-step asynchronous operation, in which you first initiate a job, and then download the output after the job completes. Therefore, SNS helps you eliminate polling your Amazon Glacier vault to check whether your job has been completed, or not. As usual, you may subscribe SQS queues, Lambda functions, and HTTP endpoints to your SNS topic, to be notified when your Amazon Glacier job is done.
AWS Snowball: A petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data.You can leverage Snowball notifications to automate workflows related to importing data into and exporting data from AWS. More specifically, whenever your Snowball job status changes, Snowball can publish this event to an SNS topic, which in turn can broadcast the event to all its subscribers.As an example, imagine a Geographic Information System (GIS) that distributes high-resolution satellite images to users via Web browser. In this example, the GIS vendor could capture up to 80 TB of satellite images; create a Snowball job to import these files from an on-premises system to an S3 bucket; and provide an SNS topic ARN to be notified upon job status changes in Snowball. After Snowball changes the job status from “Importing” to “Completed”, Snowball publishes this event to the specified SNS topic, which delivers this message to a subscribing Lambda function, which finally creates a CloudFront web distribution for the target S3 bucket, to serve the images to end users.
Amazon RDS: Makes it easy to set up, operate, and scale a relational database in the cloud.RDS leverages SNS to broadcast notifications when RDS events occur. As usual, these notifications can be delivered via any protocol supported by SNS, including SQS queues, Lambda functions, and HTTP endpoints.As an example, imagine that you own a social network website that has experienced organic growth, and needs to scale its compute and database resources on demand. In this case, you could provide an SNS topic to listen to RDS DB instance events. When the “Low Storage” event is published to the topic, SNS pushes this event to a subscribing Lambda function, which in turn leverages the RDS API to increase the storage capacity allocated to your DB instance. The provisioning itself takes place within the specified DB maintenance window.
Amazon ElastiCache: A web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud.ElastiCache can publish messages using Amazon SNS when significant events happen on your cache cluster. This feature can be used to refresh the list of servers on client machines connected to individual cache node endpoints of a cache cluster. For instance, an ecommerce website fetches product details from a cache cluster, with the goal of offloading a relational database and speeding up page load times. Ideally, you want to make sure that each web server always has an updated list of cache servers to which to connect.To automate this node discovery process, you can get your ElastiCache cluster to publish events to an SNS topic. Thus, when ElastiCache event “AddCacheNodeComplete” is published, your topic then pushes this event to all subscribing HTTP endpoints that serve your ecommerce website, so that these HTTP servers can update their list of cache nodes.
Amazon Redshift: A fully managed data warehouse that makes it simple to analyze data using standard SQL and BI (Business Intelligence) tools.Amazon Redshift uses SNS to broadcast relevant events so that data warehouse workflows can be automated. As an example, imagine a news website that sends clickstream data to a Kinesis Firehose stream, which then loads the data into Amazon Redshift, so that popular news and reading preferences might be surfaced on a BI tool. At some point though, this Amazon Redshift cluster might need to be resized, and the cluster enters a ready-only mode. Hence, this Amazon Redshift event is published to an SNS topic, which delivers this event to a subscribing Lambda function, which finally deletes the corresponding Kinesis Firehose delivery stream, so that clickstream data uploads can be put on hold.At a later point, after Amazon Redshift publishes the event that the maintenance window has been closed, SNS notifies a subscribing Lambda function accordingly, so that this function can re-create the Kinesis Firehose delivery stream, and resume clickstream data uploads to Amazon Redshift.
AWS DMS: Helps you migrate databases to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database.DMS also uses SNS to provide notifications when DMS events occur, which can automate database migration workflows. As an example, you might create data replication tasks to migrate an on-premises MS SQL database, composed of multiple tables, to MySQL. Thus, if replication tasks fail due to incompatible data encoding in the source tables, these events can be published to an SNS topic, which can push these messages into a subscribing SQS queue. Then, encoders running on EC2 can poll these messages from the SQS queue, encode the source tables into a compatible character set, and restart the corresponding replication tasks in DMS. This is an event-driven approach to a self-healing database migration process.
Amazon Route 53: A highly available and scalable cloud-based DNS (Domain Name System). Route 53 health checks monitor the health and performance of your web applications, web servers, and other resources.You can set CloudWatch alarms and get automated Amazon SNS notifications when the status of your Route 53 health check changes. As an example, imagine an online payment gateway that reports the health of its platform to merchants worldwide, via a status page. This page is hosted on EC2 and fetches platform health data from DynamoDB. In this case, you could configure a CloudWatch alarm for your Route 53 health check, so that when the alarm threshold is breached, and the payment gateway is no longer considered healthy, then CloudWatch publishes this event to an SNS topic, which pushes this message to a subscribing Lambda function, which finally updates the DynamoDB table that populates the status page. This event-driven approach avoids any kind of manual update to the status page visited by merchants.
AWS Direct Connect (AWS DX): Makes it easy to establish a dedicated network connection from your premises to AWS, which can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.You can monitor physical DX connections using CloudWatch alarms, and send SNS messages when alarms change their status. As an example, when a DX connection state shifts to 0 (zero), indicating that the connection is down, this event can be published to an SNS topic, which can fan out this message to impacted servers through HTTP endpoints, so that they might reroute their traffic through a different connection instead. This is an event-driven approach to connectivity resilience.
In addition to SNS, event-driven computing is also addressed by Amazon CloudWatch Events, which delivers a near real-time stream of system events that describe changes in AWS resources. With CloudWatch Events, you can route each event type to one or more targets, including:
Many AWS services publish events to CloudWatch. As an example, you can get CloudWatch Events to capture events on your ETL (Extract, Transform, Load) jobs running on AWS Glue and push failed ones to an SQS queue, so that you can retry them later.
Conclusion
Amazon SNS is a pub/sub messaging service that can be used as an event-driven computing hub to AWS customers worldwide. By capturing events natively triggered by AWS services, such as EC2, S3 and RDS, you can automate and optimize all kinds of workflows, namely scaling, testing, encoding, profiling, broadcasting, discovery, failover, and much more. Business use cases presented in this post ranged from recruiting websites, to scientific research, geographic systems, social networks, retail websites, and news portals.
Now that you can reserve seating in AWS re:Invent 2017 breakout sessions, workshops, chalk talks, and other events, the time is right to review the list of introductory, advanced, and expert content being offered this year. To learn more about breakout content types and levels, see Breakout Content.
SID201 – IAM for Enterprises: How Vanguard Strikes the Balance Between Agility, Governance, and Security For Vanguard, managing the creation of AWS Identity and Access Management (IAM) objects is key to balancing developer velocity and compliance. In this session, you learn how Vanguard designs IAM roles to control the blast radius of AWS resources and maintain simplicity for developers. Vanguard will also share best practices to help you manage governance and improve your visibility across your AWS resources.
SID202 – Deep dive about how Capital One automates the delivery of directory services across AWS accounts Traditional solutions for using Microsoft Active Directory across on-premises and AWS Cloud Windows workloads can require complex networking or syncing identities across multiple systems. AWS Directory Service for Microsoft Active Directory, also known as AWS Managed AD, offers you actual Microsoft Active Directory in the AWS Cloud as a managed service. In this session, you will learn how Capital One uses AWS Managed AD to provide highly available authentication and authorization services for its Windows workloads, such as Amazon RDS for SQL Server.
SID205 – Building the Largest Repo for Serverless Compliance-as-Code When you use the cloud to enable speed and agility, how do you know if you’ve done it correctly? We are on a mission to help builders follow industry best practices within security guardrails by creating the largest compliance-as-code repository, available to all. Compliance-as-code is the idea to translate best practices, guardrails, policies, and standards into codified unit testing. Apply this to your AWS environment to provide insights about what can or must be improved. Learn why compliance-as-code matters to gain speed (by getting developers, architects, and security pros on the same page), how it is currently used (demo), and how to start using it or being part of building it.
SID206 – Best Practices for Managing Security Operation on AWS To help prevent unexpected access to your AWS resources, it is critical to maintain strong identity and access policies and track, detect, and react to changes. In this session, you will learn how to use AWS Identity and Access Management (IAM) to control access to AWS resources and integrate your existing authentication system with IAM. We will cover how to deploy and control AWS infrastructure using code templates, including change management policies with AWS CloudFormation.
SID207 – Feedback Security in the Cloud Like many security teams, Riot has been challenged by new paradigms that came with the move to the cloud. We discuss how our security team has developed a security culture based on feedback and self-service to best thrive in the cloud. We detail how the team assessed the security gaps and challenges in our move to AWS, and then describe how the team works within Riot’s unique feedback culture.
SID208 – Less (Privilege) Is More: Getting Least Privilege Right in AWS AWS services are designed to enable control through AWS Identity and Access Management (IAM) and Amazon Virtual Private Cloud (VPC). Join us in this chalk talk to learn how to apply these toward the security principal of least privilege for applications and data and how to practically integrate them in your security operations.
SID209 – Designing and Deploying an AWS Account Factory AWS customers start off with one AWS account, but quickly realize the benefits of having multiple AWS accounts. A common learning curve for customers is how to securely baseline and set up new accounts at scale. This talk helps you understand how to use AWS Organizations, AWS Identity and Access Management (IAM), AWS CloudFormation, and other tools to baseline new accounts, set them up for federation, and make a secure and repeatable account factory to create new AWS accounts. Walk away with demos and tools to use in your own environment.
SID210 – A CISO’s Journey at Vonage: Achieving Unified Security at Scale Making sense of the risks of IT deployments that sit in hybrid environments and span multiple countries is a major challenge. When you add in multiple toolsets and global compliance requirements, including GDPR, it can get overwhelming. Listen to Vonage’s Chief Information Security Officer, Johan Hybinette, share his experiences tackling these challenges.
SID212 – Maximizing Your Move to AWS – Five Key Lessons from Vanguard and Cloud Technology Partners CTP’s Robert Christiansen and Mike Kavis describe how to maximize the value of your AWS initiative. From building a Minimum Viable Cloud to establishing a cloud robust security and compliance posture, we walk through key client success stories and lessons learned. We also explore how CTP has helped Vanguard, the leading provider of investor communications and technology, take advantage of AWS to delight customers, drive new revenue streams, and transform their business.
SID213 – Managing Regulator Expectations – Lessons Learned on Positioning AWS Services from an Audit Perspective Cloud migration in highly regulated industries can stall without a solid understanding of how (and when) to address regulatory expectations. This session provides a guide to explaining the aspects of AWS services that are most frequently the subject of an internal or regulatory audit. Because regulatory agencies and internal auditors might not share a common understanding of the cloud, this session is designed to help you to help them, regardless of their level of technical fluency.
SID214 – Best Security Practices in the Intelligence Community Executives from the Intelligence community discuss cloud security best practices in a field where security is imperative to operations. CIA security cloud chief John Nicely and NGA security cloud chief Scot Kaplan share success stories of migrating mass data to the cloud from a security perspective. Hear how they migrated their IT portfolios while managing their organizations’ unique blend of constraints, budget issues, politics, culture, and security pressures. Learn how these institutions overcame barriers to migration, and ask these panelists what actions you can take to better prepare yourself for the journey of mass migration to the cloud.
SID216 – Defending Diverse Applications Against Common Threats In this session, you learn how to adapt application defenses and operational responses based on your unique requirements. You also hear directly from customers about how they architected their applications on AWS to protect their applications. There are many ways to build secure, high-availability applications in the cloud. Services such as Amazon API Gateway, Amazon VPC, ALB, ELB, and Amazon EC2 are the basic building blocks that enable you to address a wide range of use cases. Best practices for defending your applications against Distributed Denial of Service (DDoS) attacks, exploitation attempts, and bad bots can vary with your choices in architecture.
Advanced level
SID301 – Using AWS Lambda as a Security Team Operating a security practice on AWS brings many new challenges that haven’t been faced in data center environments. The dynamic nature of infrastructure, the relationship between development team members and their applications, and the architecture paradigms have all changed as a result of building software on top of AWS. In this session, learn how your security team can leverage AWS Lambda as a tool to monitor, audit, and enforce your security policies within an AWS environment.
SID302 – Force Multiply Your Security Team with Automation and Alexa Adversaries automate. Who says the good guys can’t as well? By combining AWS offerings like AWS CloudTrail, Amazon Cloudwatch, AWS Config, and AWS Lambda with the power of Amazon Alexa, you can do more security tasks faster, with fewer resources. Force multiplying your security team is all about automation! Last year, we showed off penetration testing at the push of an (AWS IoT) button, and surprise-previewed how to ask Alexa to run Inspector as-needed. Want to see other ways to ask Alexa to be your cloud security sidekick? We have crazy new demos at the ready to show security geeks how to sling security automation solutions for their AWS environments (and impress and help your boss, too).
SID303 – How You Can Use AWS’s Identity Services to be Successful on Your AWS Cloud Journey Every journey to the AWS Cloud is unique. Some customers are migrating existing applications, while others are building new applications using cloud-native services. Along each of these journeys, identity and access management helps customers protect their applications and resources. In this session, you will learn how AWS’s identity services provide you a secure, flexible, and easy solution for managing identities and access on the AWS Cloud. With AWS’’s Identity Services, you do not have to adapt to AWS. Instead, you have a choice of services designed to meet you anywhere along your journey to the AWS Cloud. Every journey to the AWS Cloud is unique. Some customers are migrating existing applications, while others are building new applications using cloud-native services.
SID304 – SecOps 2021 Today: Using AWS Services to Deliver SecOps This talk dives deep on how to build end-to-end security capabilities using AWS. Our goal is orchestrating AWS Security services with other AWS building blocks to deliver enhanced security. We cover working with AWS CloudWatch Events as a queueing mechanism for processing security events, using Amazon DynamoDB to provide a stateful layer to provide tailored response to events and other ancillary functions, using DynamoDB as an attack signature engine, and the use of analytics to derive tailored signatures for detection with AWS Lambda.
SID305 – How CrowdStrike Built a Real-time Security Monitoring Service on AWS The CrowdStrike motto is “We Stop Breaches.” To do that, it needed to build a real-time security monitoring service to detect threats. Join this session to learn how Crowdstrike uses Amazon EC2 and Amazon EBS to help its customers identify vulnerabilities before they become large-scale problems.
SID306 – How Chick-fil-A Embraces DevSecOps on AWS As Chick-fil-A became a cloud-first organization, their security team didn’t want to become the bottleneck for agility. But the security team also wanted to raise the bar for their security posture on AWS. Robert Davis, security architect at Chick-fil-A, provides an overview about how he and his team recognized that writing code was the best way for their security policies to scale across the many AWS accounts that Chick-fil-A operates.
SID307 – Serverless for Security Officers: Paradigm Walkthrough and Comprehensive Security Best Practices For security practitioners, serverless represents a context switch from the familiar servers and networks to a decentralized set of code snippets and AWS platform constructs. This new ecosystem also represents new operational teams, data flows, security tooling, and faster-then-ever change velocity. In this talk, we perform live demos and provide code samples for a wide array of security best practices aligned to industry standards such as NIST 800-53 and ISO 27001.
SID308 – Multi-Account Strategies We will explore a multi-account architecture and how to approach the design/thought process around it. This chalk talk will allow attendees to dive deep into the topic and discuss the nuances of the architecture as well as provide feedback around the approach.
SID309 – Credentials, Credentials, Credentials, Oh My! For new and experienced customers alike, understanding the various credential forms and exchange mechanisms within AWS can be a daunting exercise. In this chalk talk, we clear up the confusion by performing a cartography exercise. We visually depict the right source credentials (for example, enterprise user name and password, IAM keys, AWS STS tokens, and so on) and transformation mechanisms (for example, AssumeRole and so on) to use depending on what you’re trying to do and where you’re coming from.
SID310 – Moving from the Shadows to the Throne What do you do when leadership embraces what was called “shadow IT” as the new path forward? How do you onboard new accounts while simultaneously pushing policy to secure all existing accounts? This session walks through Cisco’s journey consolidating over 700 existing accounts in the Cisco organization, while building and applying Cisco’s new cloud policies.
SID311 – Designing Security and Governance Across a Multi-Account Strategy When organizations plan their journey to cloud adoption at scale, they quickly encounter questions such as: How many accounts do we need? How do we share resources? How do we integrate with existing identity solutions? In this workshop, we present best practices and give you the hands-on opportunity to test and develop best practices. You will work in teams to set up and create an AWS environment that is enterprise-ready for application deployment and integration into existing operations, security, and procurement processes. You will get hands-on experience with cross-account roles, consolidated logging, account governance and other challenges to solve.
SID312 – DevSecOps Capture the Flag In this Capture the Flag workshop, we divide groups into teams and work on AWS CloudFormation DevSecOps. The AWS Red Team supplies an AWS DevSecOps Policy that needs to be enforced via CloudFormation static analysis. Participant Blue Teams are provided with an AWS Lambda-based reference architecture to be used to inspect CloudFormation templates against that policy. Interesting items need to be logged, and made visible via ChatOps. Dangerous items need to be logged, and recorded accurately as a template fail. The secondary challenge is building a CloudFormation template to thwart the controls being created by the other Blue teams.
SID313 – Continuous Compliance on AWS at Scale In cloud migrations, the cloud’s elastic nature is often touted as a critical capability in delivering on key business initiatives. However, you must account for it in your security and compliance plans or face some real challenges. Always counting on a virtual host to be running, for example, causes issues when that host is rebooted or retired. Managing security and compliance in the cloud is continuous, requiring forethought and automation. Learn how a leading, next generation managed cloud provider uses automation and cloud expertise to manage security and compliance at scale in an ever-changing environment.
SID314 – IAM Policy Ninja Are you interested in learning how to control access to your AWS resources? Have you wondered how to best scope permissions to achieve least-privilege permissions access control? If your answer is “yes,” this session is for you. We look at the AWS Identity and Access Management (IAM) policy language, starting with the basics of the policy language and how to create and attach policies to IAM users, groups, and roles. We explore policy variables, conditions, and tools to help you author least privilege policies. We cover common use cases, such as granting a user secure access to an Amazon S3 bucket or to launch an Amazon EC2 instance of a specific type.
SID315 – Security and DevOps: Agility and Teamwork In this session, you learn pragmatic steps to integrate security controls into DevOps processes in your AWS environment at scale. Cybersecurity expert and founder of Alert Logic Misha Govshteyn shares insights from high performing teams who are embracing the reality that an agile security program can enable faster and more secure workload deployments. Joining Misha is Joey Peloquin, Director of Cloud Security Operations at Citrix, who discusses Citrix’s DevOps experiences and how they manage their cybersecurity posture within the AWS Cloud. Session sponsored by Alert Logic.
SID316 – Using Access Advisor to Strike the Balance Between Security and Usability AWS provides a killer feature for security operations teams: Access Advisor. In this session, we discuss how Access Advisor shows the services to which an IAM policy grants access and provides a timestamp for the last time that the role authenticated against that service. At Netflix, we use this valuable data to automatically remove permissions that are no longer used. By continually removing excess permissions, we can achieve a balance of empowering developers and maintaining a best-practice, secure environment.
SID317 – Automating Security and Compliance Testing of Infrastructure-as-Code for DevSecOps Infrastructure-as-Code (IaC) has emerged as an essential element of organizational DevOps practices. Tools such as AWS CloudFormation and Terraform allow software-defined infrastructure to be deployed quickly and repeatably to AWS. But the agility of CI/CD pipelines also creates new challenges in infrastructure security hardening. This session provides a foundation for how to bring proven software hardening practices into the world of infrastructure deployment. We discuss how to build security and compliance tests for infrastructure analogous to unit tests for application code, and showcase how security, compliance and governance testing fit in a modern CI/CD pipeline.
SID318 – From Obstacle to Advantage: The Changing Role of Security & Compliance in Your Organization A surprising trend is starting to emerge among organizations who are progressing through the cloud maturity lifecycle: major improvements in revenue growth, customer satisfaction, and mission success are being directly attributed to improvements in security and compliance. At one time thought of as speed bumps in the path to deployment, security and compliance are now seen as critical ingredients that help organizations differentiate their offerings in the market, win more deals, and achieve mission-critical goals faster. This session explores how organizations like Jive Software and the National Geospatial Agency use the Evident Security Platform, AWS, and AWS Quick Starts to automate security and compliance processes in their organization to accomplish more, do it faster, and deliver better results.
SID319 – Incident Response in the Cloud In this session, we walk you through a hypothetical incident response managed on AWS. Learn how to apply existing best practices as well as how to leverage the unique security visibility, control, and automation that AWS provides. We cover how to set up your AWS environment to prevent a security event and how to build a cloud-specific incident response plan so that your organization is prepared before a security event occurs. This session also covers specific environment recovery steps available on AWS.
SID320 – Fraud Prevention, Detection, Lessons Learned, and Best Practices Fighting fraud means countering human actors that quickly adapt to whatever you do to stop them. In this presentation, we discuss the key components of a fraud prevention program in the cloud. Additionally, we provide techniques for detecting known and unknown fraud activity and explore different strategies for effectively preventing detected patterns. Finally, we discuss lessons learned from our own prevention activities as well as the best practices that you can apply to manage risk.
SID321 – How Capital One Applies AWS Organizations Best Practices to Manage Multiple AWS Accounts In this session, we review best practices for managing multiple AWS accounts using AWS Organizations. We cover how to think about the master account and your account strategy, as well as how to roll out changes. You learn how Capital One applies these best practices to manage its AWS accounts, which number over 160, and PCI workloads.
SID322 – The AWS Philosophy of Security AWS distinguished engineer Eric Brandwine speaks with hundreds of customers each year, and noticed one question coming up more than any other, “How does AWS operationalize its own security?” In this session, Eric details both strategic and tactical considerations, along with an insider’s look at AWS tooling and processes.
SID324 – Automating DDoS Response in the Cloud If left unmitigated, Distributed Denial of Service (DDoS) attacks have the potential to harm application availability or impair application performance. DDoS attacks can also act as a smoke screen for intrusion attempts or as a harbinger for attacks against non-cloud infrastructure. Accordingly, it’s crucial that developers architect for DDoS resiliency and maintain robust operational capabilities that allow for rapid detection and engagement during high-severity events. In this session, you learn how to build a DDoS-resilient application and how to use services like AWS Shield and Amazon CloudWatch to defend against DDoS attacks and automate response to attacks in progress.
SID325 – Amazon Macie: Data Visibility Powered by Machine Learning for Security and Compliance Workloads In this session, Edmunds discusses how they create workflows to manage their regulated workloads with Amazon Macie, a newly-released security and compliance management service that leverages machine learning to classify your sensitive data and business-critical information. Amazon Macie uses recurrent neural networks (RNN) to identify and alert potential misuse of intellectual property. They do a deep dive into machine learning within the security ecosystem.
SID326 – AWS Security State of the Union Steve Schmidt, chief information security officer at AWS, addresses the current state of security in the cloud, with a particular focus on feature updates, the AWS internal “secret sauce,” and what’s on horizon in terms of security, identity, and compliance tooling.
SID327 – How Zocdoc Achieved Security and Compliance at Scale With Infrastructure as Code In less than 12 months, Zocdoc became a cloud-first organization, diversifying their tech stack and liberating data to help drive rapid product innovation. Brian Lozada, CISO at Zocdoc, and Zhen Wang, Director of Engineering, provide an overview on how their teams recognized that infrastructure as code was the most effective approach for their security policies to scale across their AWS infrastructure. They leveraged tools such as AWS CloudFormation, hardened AMIs, and hardened containers. The use of DevSecOps within Zocdoc has enhanced data protection with the use of AWS services such as AWS KMS and AWS CloudHSM and auditing capabilities, and event-based policy enforcement with Amazon Elasticsearch Service and Amazon CloudWatch, all built on top of AWS.
SID328 – Cloud Adoption in Regulated Financial Services Macquarie, a global provider of financial services, identified early on that it would require strong partnership between its business, technology and risk teams to enable the rapid adoption of AWS cloud technologies. As a result, Macquarie built a Cloud Governance Platform to enable its risk functions to move as quickly as its development teams. This platform has been the backbone of Macquarie’s adoption of AWS over the past two years and has enabled Macquarie to accelerate its use of cloud technologies for the benefit of clients across multiple global markets. This talk will outline the strategy that Macquarie embarked on, describe the platform they built, and provide examples of other organizations who are on a similar journey.
SID329 – A Deep Dive into AWS Encryption Services AWS Encryption Services provide an easy and cost-effective way to protect your data in AWS. In this session, you learn about leveraging the latest encryption management features to minimize risk for your data.
SID330 – Best Practices for Implementing Your Encryption Strategy Using AWS Key Management Service AWS Key Management Service (KMS) is a managed service that makes it easy for you to create and manage the encryption keys used to encrypt your data. In this session, we will dive deep into best practices learned by implementing AWS KMS at AWS’s largest enterprise clients. We will review the different capabilities described in the AWS Cloud Adoption Framework (CAF) Security Perspective and how to implement these recommendations using AWS KMS. In addition to sharing recommendations, we will also provide examples that will help you protect sensitive information on the AWS Cloud.
SID331 – Architecting Security and Governance Across a Multi-Account Strategy Whether it is per business unit or per application, many AWS customers use multiple accounts to meet their infrastructure isolation, separation of duties, and billing requirements. In this session, we discuss considerations, limitations, and security patterns when building out a multi-account strategy. We explore topics such as identity federation, cross-account roles, consolidated logging, and account governance. Thomson Reuters shared their journey and their approach to a multi-account strategy. At the end of the session, we present an enterprise-ready, multi-account architecture that you can start leveraging today.
SID332 – Identity Management for Your Users and Apps: A Deep Dive on Amazon Cognito Learn how to set up an end-user directory, secure sign-up and sign-in, manage user profiles, authenticate and authorize your APIs, federate from enterprise and social identity providers, and use OAuth to integrate with your app—all without any server setup or code. With clear blueprints, we show you how to leverage Amazon Cognito to administer and secure your end users and enable identity for the applied patterns of mobile, web, and enterprise apps.
SID334 – How Amazon Business Uses Amazon Cloud Directory as the Data Store for Its Account Management Platform Join the Amazon Business team to learn how it uses Amazon Cloud Directory as the data store for its account management platform. You will learn how Amazon Business uses Amazon DynamoDB with Cloud Directory to manage user authorization and business process workflows. You also will learn how Cloud Directory helps to manage hierarchical datasets and how to get started modeling these datasets in Cloud Directory.
SID335 – Implementing Security and Governance across a Multi-Account Strategy As existing or new organizations expand their AWS footprint, managing multiple accounts while maintaining security quickly becomes a challenge. In this chalk talk, we will demonstrate how AWS Organizations, IAM roles, identity federation, and cross-account manager can be combined to build a scalable multi-account management platform. By the end of this session, attendees will have the understanding and deployment patterns to bring a secure, flexible and automated multi-account management platform to their organizations.
SID336 – Use AWS to Effectively Manage GDPR Compliance The General Data Protection Regulation (GDPR) is considered to be the most stringent privacy regulation ever enacted. Complying with GDPR could be a challenge for organizations, and AWS services can help get you ahead of the May 2018 enforcement deadline. In this chalk talk, the Legal and Compliance GDPR leadership at AWS discusses what enforcement of GDPR might mean for you and your customer’s compliance programs.
SID337 – Best Practices for Managing Access to AWS Resources Using IAM Roles In this chalk talk, we discuss why using temporary security credentials to manage access to your AWS resources is an AWS Identity and Access Management (IAM) best practice. IAM roles help you follow this best practice by delivering and rotating temporary credentials automatically. We discuss the different types of IAM roles, the assume role functionality, and how to author fine-grained trust and access policies that limit the scope of IAM roles. We then show you how to attach IAM roles to your AWS resources, such as Amazon EC2 instances and AWS Lambda functions. We also discuss migrating applications that use long-term AWS access keys to temporary credentials managed by IAM roles.
SID338 – [email protected] Once a customer achieves success with using AWS in a few pilot projects, most look to rapidly adopt an “all-in” enterprise migration strategy. Along this journey, several new challenges emerge that quickly become blockers and slow down migrations if they are not addressed properly. At this scale, customers will deal with the governance of hundreds of accounts, as well as thousands of IT resources residing within those accounts. Humans and traditional IT management processes cannot scale at the same pace and inevitably challenging questions emerge. In this session, we discuss those questions about governance at scale.
SID339 – Deep Dive on AWS CloudHSM Organizations building applications that handle confidential or sensitive data are subject to many types of regulatory requirements and often rely on hardware security modules (HSMs) to provide validated control of encryption keys and cryptographic operations. AWS CloudHSM is a cloud-based hardware security module (HSM) that enables you to easily generate and use your own encryption keys on the AWS Cloud using FIPS 140-2 Level 3 validated HSMs. This chalk talk will provide you a deep-dive on CloudHSM, and demonstrate how you can quickly and easily use CloudHSM to help secure your data and meet your compliance requirements.
SID340 – Using Infrastructure as Code to Inject Security Best Practices as Part of the Software Deployment Lifecycle A proactive approach to security is key to securing your applications as part of software deployment. In this chalk talk, T. Rowe Price, a financial asset management institution, outlines how they built their security automation process in enabling their numerous developer teams to rapidly and securely build and deploy applications at scale on AWS. Learn how they use services like AWS Identity and Access Management (IAM), HashiCorp tools, Terraform for automation, and Vault for secrets management, and incorporate certificate management and monitoring as part of the deployment process. T. Rowe Price discusses lessons learned and best practices to move from a tightly controlled legacy environment to an agile, automated software development process on AWS.
SID341 – Using AWS CloudTrail Logs for Scalable, Automated Anomaly Detection This workshop gives you an opportunity to develop a solution that can continuously monitor for and detect a realistic threat by analyzing AWS CloudTrail log data. Participants are provided with a CloudTrail data source and some clues to get started. Then you have to design a system that can process the logs, detect the threat, and trigger an alarm. You can make use of any AWS services that can assist in this endeavor, such as AWS Lambda for serverless detection logic, Amazon CloudWatch or Amazon SNS for alarming and notification, Amazon S3 for data and configuration storage, and more.
SID342 – Protect Your Web Applications from Common Attack Vectors Using AWS WAF As attacks and attempts to exploit vulnerabilities in web applications become more sophisticated, having an effective web request filtering solution becomes key to keeping your users’ data safe. In this workshop, discover how the OWASP Top 10 list of application security risks can help you secure your web applications. Learn how to use AWS services, such as AWS WAF, to mitigate vulnerabilities. This session includes hands-on labs to help you build a solution. Key learning goals include understanding the breadth and complexity of vulnerabilities customers need to protect from, understanding the AWS tools and capabilities that can help mitigate vulnerabilities, and learning how to configure effective HTTP request filtering rules using AWS WAF.
SID343 – User Management and App Authentication with Amazon Cognito Are you curious about how to authenticate and authorize your applications on AWS? Have you thought about how to integrate AWS Identity and Access Management (IAM) with your app authentication? Have you tried to integrate third-party SAML providers with your app authentication? Look no further. This workshop walks you through step by step to configure and create Amazon Cognito user pools and identity pools. This workshop presents you with the framework to build an application using Java, .NET, and serverless. You choose the stack and build the app with local users. See the service being used not only with mobile applications but with other stacks that normally don’t include Amazon Cognito.
SID344 – Soup to Nuts: Identity Federation for AWS AWS offers customers multiple solutions for federating identities on the AWS Cloud. In this session, we will embark on a tour of these solutions and the use cases they support. Along the way, we will dive deep with demonstrations and best practices to help you be successful managing identities on the AWS Cloud. We will cover how and when to use Security Assertion Markup Language 2.0 (SAML), OpenID Connect (OIDC), and other AWS native federation mechanisms. You will learn how these solutions enable federated access to the AWS Management Console, APIs, and CLI, AWS Infrastructure and Managed Services, your web and mobile applications running on the AWS Cloud, and much more.
SID345 – AWS Encryption SDK: The Busy Engineer’s Guide to Client-Side Encryption You know you want client-side encryption for your service but you don’t know exactly where to start. Join us for a hands-on workshop where we review some of your client-side encryption options and explore implementing client-side encryption using the AWS Encryption SDK. In this session, we cover the basics of client-side encryption, perform encrypt and decrypt operations using AWS KMS and the AWS Encryption SDK, and discuss security and performance considerations when implementing client-side encryption in your service.
Expert level
SID401 – Let’s Dive Deep Together: Advancing Web Application Security Beginning with a recap of best practices in CloudFront, AWS WAF, Route 53, and Amazon VPC security, we break into small teams to work together on improving the security of a typical web application. How can we creatively use the services? What additional features would help us? This technically advanced chalk talk requires certification at the solutions architect associate level or greater.
SID402 – An AWS Security Odyssey: Implementing Security Controls in the World of Internet, Big Data, IoT and E-Commerce Platforms This workshop will give participants the opportunity to take a security-focused journey across various AWS services and implement automated controls along the way. You will learn how to apply AWS security controls to services such as Amazon EC2, Amazon S3, AWS Lambda, and Amazon VPC. In short, you will learn how to use the cloud to protect the cloud. We will talk about how to: Adopt a workload-centric approach to your security strategy, Address security issues in a cost-effective manner Automate your security responses to promote maturity and auditability. In order to complete this workshop, attendees will need a laptop with wireless access, an AWS account and an IAM user that has full administrative privileges within their account. AWS credits will be provided as attendees depart the session to cover the cost of running the workshop in their own account.
SID404 – Amazon Inspector – Automating the “Sec” in DevSecOps Adopting DevSecOps can be challenging using traditional security tools that are designed for on-premises infrastructure. Amazon Inspector is an automated security assessment service that helps you adopt DevSecOps by integrating security assessments directly into the development process of applications running on Amazon EC2. We dive deep on how to use Inspector to automate host security assessments. We show you how to integrate Inspector with other AWS Cloud services to provide automated security assessments throughout your development process. We demo installing the AWS agent, setting up assessment targets and templates, and running assessments. We review the findings and discuss how you can automate the management and remediation of those findings with your available AWS services.
SID405 – Five New Security Automation Improvements You Can Make by Using Amazon CloudWatch Events and AWS Config Rules This presentation will include a deep dive into the code behind multiple security automation and remediation functions. This session will consider potential use cases, as well as feature a demonstration of a proposed script, and then walk through the code set to explain the various challenges and solutions of the intended script. All examples of code will be previously unreleased and will feature integration with services such as Trusted Advisor and Macie. All code will be released as OSS after re:Invent.
In August 2017 at the AWS Summit New York, AWS launched a new security and compliance service called Amazon Macie. Macie uses machine learning to automatically discover, classify, and protect sensitive data in AWS. In this blog post, I demonstrate how you can use Macie to help enable compliance with applicable regulations, starting with data retention.
How to query retained PII with Macie
Data retention and mandatory data deletion are common topics across compliance frameworks, so knowing what is stored and how long it has been or needs to be stored is of critical importance. For example, you can use Macie for Payment Card Industry Data Security Standard (PCI DSS) 3.2, requirement 3, “Protect stored cardholder data,” which mandates a “quarterly process for identifying and securely deleting stored cardholder data that exceeds defined retention.” You also can use Macie for ISO 27017 requirement 12.3.1, which calls for “retention periods for backup data.” In each of these cases, you can use Macie’s built-in queries to identify the age of data in your Amazon S3 buckets and to help meet your compliance needs.
To get started with Macie and run your first queries of personally identifiable information (PII) and sensitive data, follow the initial setup as described in the launch post on the AWS Blog. After you have set up Macie, walk through the following steps to start running queries. Start by focusing on the S3 buckets that you want to inventory and capture important compliance related activity and data.
To start running Macie queries:
In the AWS Management Console, launch the Macie console (you can type macie to find the console).
Click Dashboard in the navigation pane. This shows you an overview of the risk level and data classification type of all inventoried S3 buckets, categorized by date and type.
Choose S3 objects by PII priority. This dashboard lets you sort by PII priority and PII types.
In this case, I want to find information about credit card numbers. I choose the magnifying glass for the type cc_number (note that PII types can be used for custom queries). This view shows the events where PII classified data has been uploaded to S3. When I scroll down, I see the individual files that have been identified.
Before looking at the files, I want to continue to build the query by only showing items with high priority. To do so, I choose the row called Object PII Priority and then the magnifying glass icon next to High.
To view the results matching these queries, I scroll down and choose any file listed. This shows vital information such as creation date, location, and object access control list (ACL).
The piece I am most interested in this case is the Object PII details line to understand more about what was found in the file. In this case, I see name and credit card information, which is what caused the high priority. Scrolling up again, I also see that the query fields have updated as I interacted with the UI.
Let’s say that I want to get an alert every time Macie finds new data matching this query. This alert can be used to automate response actions by using AWS Lambda and Amazon CloudWatch Events.
I choose the left green icon called Save query as alert.
I can customize the alert and change things like category or severity to fit my needs based on the alert data.
Another way to find the information I am looking for is to run custom queries. To start using custom queries, I choose Research in the navigation pane.
To learn more about custom Macie queries and what you can do on the Research tab, see Using the Macie Research Tab.
I change the type of query I want to run from CloudTrail data to S3 objects in the drop-down list menu.
Because I want PII data, I start typing in the query box, which has an autocomplete feature. I choose the pii_types: query. I can now type the data I want to look for. In this case, I want to see all files matching the credit card filter so I type cc_number and press Enter. The query box now says, pii_types:cc_number. I press Enter again to enable autocomplete, and then I type AND pii_types:email to require both a credit card number and email address in a single object.
I choose the magnifying glass to search and Macie shows me all S3 objects that are tagged as PII of type Credit Cards. I can further specify that I only want to see PII of type Credit Card that are classified as High priority by adding AND and pii_impact:high to the query. As before, I can save this new query as an alert by clicking Save query as alert, which will be triggered by data matching the query going forward.
Advanced tip
Try the following advanced queries using Lucene query syntax and save the queries as alerts in Macie.
Use a regular-expression based query to search for a minimum of 10 credit card numbers and 10 email addresses in a single object:
pii_explain.cc_number:/([1-9][0-9]|[0-9]{3,}) distinct Credit Card Numbers.*/ AND pii_explain.email:/([1-9][0-9]|[0-9]{3,}) distinct Email Addresses.*/
Search for objects containing at least one credit card, name, and email address that have an object policy enabling global access (searching for S3 AllUsers or AuthenticatedUsers permissions):
(object_acl.Grants.Grantee.URI:”http\://acs.amazonaws.com/groups/global/AllUsers” OR object_acl.Grants.Grantee.URI:”http\://acs.amazonaws.com/groups/global/AllUsers”) AND (pii_types.cc_number AND pii_types.email AND pii_types.name)
These are two ways to identify and be alerted about PII by using Macie. In a similar way, you can create custom alerts for various AWS CloudTrail events by choosing a different data set on which to run the queries again. In the examples in this post, I identified credit cards stored in plain text (all data in this post is example data only), determined how long they had been stored in S3 by viewing the result details, and set up alerts to notify or trigger actions on new sensitive data being stored. With queries like these, you can build a reliable data validation program.
If you have comments about this post, submit them in the “Comments” section below. If you have questions about how to use Macie, start a new thread on the Macie forum or contact AWS Support.
Business continuity is important for building mission-critical workloads on AWS. As an AWS customer, you might define recovery point objectives (RPO) and recovery time objectives (RTO) for different tier applications in your business. After the RPO and RTO requirements are defined, it is up to your architects to determine how to meet those requirements.
You probably store persistent data in Amazon EBS volumes, which live within a single Availability Zone. And, following best practices, you take snapshots of your EBS volumes to back up the data on Amazon S3, which provides 11 9’s of durability. If you are following these best practices, then you’ve probably recognized the need to manage the number of snapshots you keep for a particular EBS volume and delete older, unneeded snapshots. Doing this cleanup helps save on storage costs.
Some customers also have policies stating that backups need to be stored a certain number of miles away as part of a disaster recovery (DR) plan. To meet these requirements, customers copy their EBS snapshots to the DR region. Then, the same snapshot management and cleanup has to also be done in the DR region.
All of this snapshot management logic consists of different components. You would first tag your snapshots so you could manage them. Then, determine how many snapshots you currently have for a particular EBS volume and assess that value against a retention rule. If the number of snapshots was greater than your retention value, then you would clean up old snapshots. And finally, you might copy the latest snapshot to your DR region. All these steps are just an example of a simple snapshot management workflow. But how do you automate something like this in AWS? How do you do it without servers?
One of the most powerful AWS services released in 2016 was Amazon CloudWatch Events. It enables you to build event-driven IT automation, based on events happening within your AWS infrastructure. CloudWatch Events integrates with AWS Lambda to let you execute your custom code when one of those events occurs. However, the actions to take based on those events aren’t always composed of a single Lambda function. Instead, your business logic may consist of multiple steps (like in the case of the example snapshot management flow described earlier). And you may want to run those steps in sequence or in parallel. You may also want to have retry logic or exception handling for each step.
AWS Step Functions serves just this purpose―to help you coordinate your functions and microservices. Step Functions enables you to simplify your effort and pull the error handling, retry logic, and workflow logic out of your Lambda code. Step Functions integrates with Lambda to provide a mechanism for building complex serverless applications. Now, you can kick off a Step Functions state machine based on a CloudWatch event.
In this post, I discuss how you can target Step Functions in a CloudWatch Events rule. This allows you to have event-driven snapshot management based on snapshot completion events firing in CloudWatch Event rules.
As an example of what you could do with Step Functions and CloudWatch Events, we’ve developed a reference architecture that performs management of your EBS snapshots.
Automating EBS Snapshot Management with Step Functions
The state machine then tags the snapshot, cleans up the oldest snapshots if the number of snapshots is greater than the defined number to retain, and copies the snapshot to a DR region.
When the DR region snapshot copy is completed, another state machine kicks off in the DR region. The new state machine has a similar flow and uses some of the same Lambda code to clean up the oldest snapshots that are greater than the defined number to retain.
Also, both state machines demonstrate how you can use Step Functions to handle errors within your workflow. Any errors that are caught during execution result in the execution of a Lambda function that writes a message to an SNS topic. Therefore, if any errors occur, you can subscribe to the SNS topic and get notified.
The following is an architecture diagram of the reference architecture:
Creating the Lambda functions and Step Functions state machines
First, pull the code from GitHub and use the AWS CLI to create S3 buckets for the Lambda code in the primary and DR regions. For this example, assume that the primary region is us-west-2 and the DR region is us-east-2. Run the following commands, replacing the italicized text in <> with your own unique bucket names.
git clone https://github.com/awslabs/aws-step-functions-ebs-snapshot-mgmt.git
cd aws-step-functions-ebs-snapshot-mgmt/
aws s3 mb s3://<primary region bucket name> – region us-west-2
aws s3 mb s3://<DR region bucket name> – region us-east-2
Next, use the Serverless Application Model (SAM), which uses AWS CloudFormation to deploy the Lambda functions and Step Functions state machines in the primary and DR regions. Replace the italicized text in <> with the S3 bucket names that you created earlier.
The CloudFormation templates deploy the following resources:
The Lambda functions that are coordinated by Step Functions
The Step Functions state machine
The SNS topic
The CloudWatch Events rules that trigger the state machine execution
So, all of the CloudWatch event rules have been created for you by performing the preceding commands. The next section demonstrates how you could create the CloudWatch event rule manually. To jump straight to testing the workflow, see the “Testing in your Account” section. Otherwise, you begin by setting up the CloudWatch event rule in the primary region for the createSnapshot event and also the CloudWatch event rule in the DR region for the copySnapshot command.
First, open the CloudWatch console in the primary region.
Choose Create Rule and create a rule for the createSnapshot command, with your newly created Step Function state machine as the target.
For Event Source, choose Event Pattern and specify the following values:
Service Name: EC2
Event Type: EBS Snapshot Notification
Specific Event: createSnapshot
For Target, choose Step Functions state machine, then choose the state machine created by the CloudFormation commands. Choose Create a new role for this specific resource. Your completed rule should look like the following:
Choose Configure Details and give the rule a name and description.
Choose Create Rule. You now have a CloudWatch Events rule that triggers a Step Functions state machine execution when the EBS snapshot creation is complete.
Now, set up the CloudWatch Events rule in the DR region as well. This looks almost same, but is based off the copySnapshot event instead of createSnapshot.
In the upper right corner in the console, switch to your DR region. Choose CloudWatch, Create Rule.
For Event Source, choose Event Pattern and specify the following values:
Service Name: EC2
Event Type: EBS Snapshot Notification
Specific Event: copySnapshot
For Target, choose Step Functions state machine, then select the state machine created by the CloudFormation commands. Choose Create a new role for this specific resource. Your completed rule should look like in the following:
As in the primary region, choose Configure Details and then give this rule a name and description. Complete the creation of the rule.
Testing in your account
To test this setup, open the EC2 console and choose Volumes. Select a volume to snapshot. Choose Actions, Create Snapshot, and then create a snapshot.
This results in a new execution of your state machine in the primary and DR regions. You can view these executions by going to the Step Functions console and selecting your state machine.
From there, you can see the execution of the state machine.
Primary region state machine:
DR region state machine:
I’ve also provided CloudFormation templates that perform all the earlier setup without using git clone and running the CloudFormation commands. Choose the Launch Stack buttons below to launch the primary and DR region stacks in Dublin and Ohio, respectively. From there, you can pick up at the Testing in Your Account section above to finish the example. All of the code for this example architecture is located in the aws-step-functions-ebs-snapshot-mgmt AWSLabs repo.
Primary Region eu-west-1 (Ireland)
DR Region us-east-2 (Ohio)
Summary
This reference architecture is just an example of how you can use Step Functions and CloudWatch Events to build event-driven IT automation. The possibilities are endless:
Use this pattern to perform other common cleanup type jobs such as managing Amazon RDS snapshots, old versions of Lambda functions, or old Amazon ECR images—all triggered by scheduled events.
Use Trusted Advisor events to identify unused EC2 instances or EBS volumes, then coordinate actions on them, such as alerting owners, stopping, or snapshotting.
Happy coding and please let me know what useful state machines you build!
Are you interested in reducing the operational overhead of your AWS Cloud infrastructure? One way to achieve this is to automate the response to operational events for resources in your AWS account.
Amazon CloudWatch Events provides a near real-time stream of system events that describe the changes and notifications for your AWS resources. From this stream, you can create rules to route specific events to AWS Step Functions, AWS Lambda, and other AWS services for further processing and automated actions.
In this post, learn how you can use Step Functions to orchestrate serverless IT automation workflows in response to CloudWatch events sourced from AWS Health, a service that monitors and generates events for your AWS resources. As a real-world example, I show automating the response to a scenario where an IAM user access key has been exposed.
Serverless workflows with Step Functions and Lambda
Step Functions makes it easy to develop and orchestrate components of operational response automation using visual workflows. Building automation workflows from individual Lambda functions that perform discrete tasks lets you develop, test, and modify the components of your workflow quickly and seamlessly. As serverless services, Step Functions and Lambda also provide the benefits of more productive development, reduced operational overhead, and no costs incurred outside of when the workflows are actively executing.
Example workflow
As an example, this post focuses on automating the response to an event generated by AWS Health when an IAM access key has been publicly exposed on GitHub. This is a diagram of the automation workflow:
AWS proactively monitors popular code repository sites for IAM access keys that have been publicly exposed. Upon detection of an exposed IAM access key, AWS Health generates an AWS_RISK_CREDENTIALS_EXPOSED event in the AWS account related to the exposed key. A configured CloudWatch Events rule detects this event and invokes a Step Functions state machine. The state machine then orchestrates the automated workflow that deletes the exposed IAM access key, summarizes the recent API activity for the exposed key, and sends the summary message to an Amazon SNS topic to notify the subscribers―in that order.
The corresponding Step Functions state machine diagram of this automation workflow can be seen below:
While this particular example focuses on IT automation workflows in response to the AWS_RISK_CREDENTIALS_EXPOSEDevent sourced from AWS Health, it can be generalized to integrate with other events from these services, other event-generating AWS services, and even run on a time-based schedule.
Walkthrough
To follow along, use the code and resources found in the aws-health-tools GitHub repo. The code and resources include an AWS CloudFormation template, in addition to instructions on how to use it.
The Step Functions state machine execution starts with the exposed keys event details in JSON, a sanitized example of which is provided below:
After it’s invoked, the state machine execution proceeds as follows.
Step 1: Delete the exposed IAM access key pair
The first thing you want to do when you determine that an IAM access key has been exposed is to delete the key pair so that it can no longer be used to make API calls. This Step Functions task state deletes the exposed access key pair detailed in the incoming event, and retrieves the IAM user associated with the key to look up API activity for the user in the next step. The user name, access key, and other details about the event are passed to the next step as JSON.
This state contains a powerful error-handling feature offered by Step Functions task states called a catch configuration. Catch configurations allow you to reroute and continue state machine invocation at new states depending on potential errors that occur in your task function. In this case, the catch configuration skips to Step 3. It immediately notifies your security team that errors were raised in the task function of this step (Step 1), when attempting to look up the corresponding IAM user for a key or delete the user’s access key.
Note: Step Functions also offers a retry configuration for when you would rather retry a task function that failed due to error, with the option to specify an increasing time interval between attempts and a maximum number of attempts.
Step 2: Summarize recent API activity for key
After you have deleted the access key pair, you’ll want to have some immediate insight into whether it was used for malicious activity in your account. Another task state, this step uses AWS CloudTrail to look up and summarize the most recent API activity for the IAM user associated with the exposed key. The summary is in the form of counts for each API call made and resource type and name affected. This summary information is then passed to the next step as JSON. This step requires information that you obtained in Step 1. Step Functions ensures the successful completion of Step 1 before moving to Step 2.
Step 3: Notify security
The summary information gathered in the last step can provide immediate insight into any malicious activity on your account made by the exposed key. To determine this and further secure your account if necessary, you must notify your security team with the gathered summary information.
This final task state generates an email message providing in-depth detail about the event using the API activity summary, and publishes the message to an SNS topic subscribed to by the members of your security team.
If the catch configuration of the task state in Step 1 was triggered, then the security notification email instead directs your security team to log in to the console and navigate to the Personal Health Dashboard to view more details on the incident.
Lessons learned
When implementing this use case with Step Functions and Lambda, consider the following:
One of the most important parts of implementing automation in response to operational events is to ensure visibility into the response and resolution actions is retained. Step Functions and Lambda enable you to orchestrate your granular response and resolution actions that provides direct visibility into the state of the automation workflow.
This basic workflow currently executes these steps serially with a catch configuration for error handling. More sophisticated workflows can leverage the parallel execution, branching logic, and time delay functionality provided by Step Functions.
Catch and retry configurations for task states allow for orchestrating reliable workflows while maintaining the granularity of each Lambda function. Without leveraging a catch configuration in Step 1, you would have had to duplicate code from the function in Step 3 to ensure that your security team was notified on failure to delete the access key.
Step Functions and Lambda are serverless services, so there is no cost for these services when they are not running. Because this IT automation workflow only runs when an IAM access key is exposed for this account (which is hopefully rare!), the total monthly cost for this workflow is essentially $0.
Conclusion
Automating the response to operational events for resources in your AWS account can free up the valuable time of your engineers. Step Functions and Lambda enable granular IT automation workflows to achieve this result while gaining direct visibility into the orchestration and state of the automation.
For more examples of how to use Step Functions to automate the operations of your AWS resources, or if you’d like to see how Step Functions can be used to build and orchestrate serverless applications, visit Getting Started on the Step Functions website.
The collective thoughts of the interwebz
By continuing to use the site, you agree to the use of cookies. more information
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.