Tag Archives: Amazon CloudWatch

New – Amazon CloudWatch Anomaly Detection

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-cloudwatch-anomaly-detection/

Amazon CloudWatch launched in early 2009 as part of our desire to (as I said at the time) “make it even easier for you to build sophisticated, scalable, and robust web applications using AWS.” We have continued to expand CloudWatch over the years, and our customers now use it to monitor their infrastructure, systems, applications, and even business metrics. They build custom dashboards, set alarms, and count on CloudWatch to alert them to issues that affect the performance or reliability of their applications.

If you have used CloudWatch Alarms, you know that there’s a bit of an art to setting your alarm thresholds. You want to make sure to catch trouble early, but you don’t want to trigger false alarms. You need to deal with growth and with scale, and you also need to make sure that you adjust and recalibrate your thresholds to deal with cyclic and seasonal behavior.

Anomaly Detection
Today we are enhancing CloudWatch with a new feature that will help you to make more effective use of CloudWatch Alarms. Powered by machine learning and building on over a decade of experience, CloudWatch Anomaly Detection has its roots in over 12,000 internal models. It will help you to avoid manual configuration and experimentation, and can be used in conjunction with any standard or custom CloudWatch metric that has a discernible trend or pattern.

Anomaly Detection analyzes the historical values for the chosen metric, and looks for predictable patterns that repeat hourly, daily, or weekly. It then creates a best-fit model that will help you to better predict the future, and to more cleanly differentiate normal and problematic behavior. You can adjust and fine-tune the model as desired, and you can even use multiple models for the same CloudWatch metric.

Using Anomaly Detection
I can create my own models in a matter of seconds! I have an EC2 instance that generates a spike in CPU Utilization every 24 hours:

I select the metric, and click the “wave” icon to enable anomaly detection for this metric and statistic:

This creates a model with default settings. If I select the model and zoom in to see one of the utilization spikes, I can see that the spike is reflected in the prediction bands:

I can use this model as-is to drive alarms on the metric, or I can select the model and click Edit model to customize it:

I can exclude specific time ranges (past or future) from the data that is used to train the model; this is a good idea if the data reflects a one-time event that will not happen again. I can also specify the timezone of the data; this lets me handle metrics that are sensitive to changes in daylight saving time:

After I have set this up, the anomaly detection model goes into effect and I can use it to create alarms as usual. I choose Anomaly detection as my Threshold type, and use the Anomaly detection threshold to control the thickness of the band. I can raise the alarm when the metric is outside of, greater than, or lower than the band:

The remaining steps are identical to the ones that you already use to create other types of alarms.
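If you prefer working from the command line, an alarm on an anomaly detection band can be sketched with the AWS CLI. This is a minimal, hedged example rather than the exact console output; the instance ID and the band width of 2 standard deviations are placeholder assumptions:

aws cloudwatch put-metric-alarm \
    --alarm-name cpu-anomaly-alarm \
    --comparison-operator GreaterThanUpperThreshold \
    --evaluation-periods 3 \
    --threshold-metric-id ad1 \
    --metrics '[
      {"Id": "m1", "ReturnData": true,
       "MetricStat": {"Metric": {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization",
                                 "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}]},
                      "Period": 300, "Stat": "Average"}},
      {"Id": "ad1", "Expression": "ANOMALY_DETECTION_BAND(m1, 2)",
       "Label": "CPUUtilization (expected)", "ReturnData": true}
    ]'

Here m1 is the metric being watched and ad1 is the anomaly detection band expression; the second parameter of ANOMALY_DETECTION_BAND controls the thickness of the band.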

Things to Know
Here are a few interesting things to keep in mind when you are getting ready to use this new CloudWatch feature:

Suitable Metrics – Anomaly Detection works best when the metrics have a discernible pattern or trend, and when there is a minimal number of missing data points.

Updates – Once the model has been created, it will be updated every five minutes with any new metric data.

One-Time Events – The model cannot predict one-time events such as Black Friday or the holiday shopping season.

API / CLI / CloudFormation – You can create and manage anomaly models from the Console, the CloudWatch API (PutAnomalyDetector) and the CloudWatch CLI. You can also create AWS::CloudWatch::AnomalyDetector resources in your AWS CloudFormation templates.
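For example, a model for an EC2 CPUUtilization metric could be created from the CLI along these lines (a sketch with a placeholder instance ID, not a full walkthrough):

# Train an anomaly detection model for a single metric and statistic
aws cloudwatch put-anomaly-detector \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
    --stat Average

# List the anomaly detection models in the current Region
aws cloudwatch describe-anomaly-detectors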

Now Available
You can start creating and using CloudWatch Anomaly Detection today in all commercial AWS regions. To learn more, read about CloudWatch Anomaly Detection in the CloudWatch Documentation.

Jeff;

 

Automated Disaster Recovery using CloudEndure

Post Syndicated from Ryan Jaeger original https://aws.amazon.com/blogs/architecture/automated-disaster-recovery-using-cloudendure/

There are any number of events that cause IT outages and impact business continuity. These could include unexpected infrastructure or application outages caused by flooding, earthquakes, fires, hardware failures, or even malicious attacks. Cloud computing opens a new door to support disaster recovery strategies, with benefits such as elasticity, agility, speed to innovate, and cost savings, all of which aid new disaster recovery solutions.

With AWS, organizations can acquire IT resources on-demand, and pay only for the resources they use. Automating disaster recovery (DR) has always been challenging. This blog post shows how you can use automation to orchestrate recovery and eliminate manual processes. CloudEndure Disaster Recovery, an AWS Company, Amazon Route 53, and AWS Lambda are the building blocks to deliver a cost-effective automated DR solution. The example in this post demonstrates how you can recover a production web application with sub-second Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs) in minutes.

As part of a DR strategy, knowing RPOs and RTOs will determine what kind of solution architecture you need. The RPO represents the point in time of the last recoverable data point (for example, the “last backup”). Any disaster after that point would result in data loss.

The time from the outage to restoration is the RTO. Minimizing RTO and RPO is a cost tradeoff. Restoring from backups and recreating infrastructure after the event is the lowest cost but highest RTO. Conversely, the highest cost and lowest RTO is a solution running a duplicate auto-failover environment.

Solution Overview

CloudEndure is an automated IT resilience solution that lets you recover your environment from unexpected infrastructure or application outages, data corruption, ransomware, or other malicious attacks. It utilizes block-level Continuous Data Protection (CDP), which ensures that target machines are spun up in their most current state during a disaster or drill, so that you can achieve sub-second RPOs. In the event of a disaster, CloudEndure triggers a highly automated machine conversion process and a scalable orchestration engine that can spin up machines in the target AWS Region within minutes. This process enables you to achieve RTOs in minutes. The CloudEndure solution uses a software agent that installs on physical or virtual servers. It connects to a self-service, web-based user console, which then issues an API call to the selected AWS target Region to create a Staging Area in the customer’s AWS account designated to receive the source machine’s replicated data.

Architecture

In the above example, a webserver and database server have the CloudEndure Agent installed, and the disk volumes on each server are replicated to a staging environment in the customer’s AWS account. The CloudEndure Replication Server receives the encrypted data replication traffic and writes to the appropriate corresponding EBS volumes. It’s also possible to configure data replication traffic to use VPN or AWS Direct Connect.

With this current setup, if an infrastructure or application outage occurs, a failover to AWS is executed by manually starting the process from the CloudEndure Console. When this happens, CloudEndure creates EC2 instances from the synchronized target EBS volumes. After the failover completes, additional manual steps are needed to change the website’s DNS entry to point to the IP address of the failed over webserver.

Could the CloudEndure failover and DNS update be automated? Yes.

Amazon Route 53 is a highly available and scalable Domain Name System (DNS) web service with three main functions: domain registration, DNS routing, and health checking. A configured Route 53 health check monitors the endpoint of a webserver. If the health check fails over a specified period, an alarm is raised to execute an AWS Lambda function that starts the CloudEndure failover process. In addition to health checks, Route 53 DNS Failover allows the DNS record for the webserver to be automatically updated based on a healthy endpoint. Now the previously manual process of updating the DNS record to point to the restored web server is automated. You can also build Route 53 DNS Failover decision trees to handle complex configurations.

To illustrate this, the following builds on the example by having a primary, secondary, and tertiary DNS Failover choice for the web application:

How Health Checks Work in Complex Amazon Route 53 Configurations

When the CloudEndure failover action executes, it takes several minutes until the target EC2 is launched and configured by CloudEndure. An S3 static web page can be returned to the end-user to improve communication while the failover is happening.

To support this example, the Amazon Route 53 DNS failover decision tree can be configured with a primary, secondary, and tertiary failover. The decision tree logic for this scenario is as follows (an example CLI sketch of the record sets appears after the list):

  1. If the primary health check passes, return the primary webserver.
  2. Else, if the secondary health check passes, return the failover webserver.
  3. Else, return the S3 static site.
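As a rough illustration of the primary and secondary tiers, the failover record sets could be created with a change batch similar to the one below. This is a hedged sketch: the hosted zone ID, health check IDs, domain name, and IP addresses are placeholders, and the tertiary S3 static site would be added as a second level of alias failover records as described in the linked documentation.

aws route53 change-resource-record-sets \
    --hosted-zone-id Z1234567890ABC \
    --change-batch '{
      "Changes": [
        {"Action": "UPSERT", "ResourceRecordSet": {
          "Name": "www.example.com", "Type": "A", "TTL": 60,
          "SetIdentifier": "primary", "Failover": "PRIMARY",
          "HealthCheckId": "11111111-1111-1111-1111-111111111111",
          "ResourceRecords": [{"Value": "203.0.113.10"}]}},
        {"Action": "UPSERT", "ResourceRecordSet": {
          "Name": "www.example.com", "Type": "A", "TTL": 60,
          "SetIdentifier": "secondary", "Failover": "SECONDARY",
          "HealthCheckId": "22222222-2222-2222-2222-222222222222",
          "ResourceRecords": [{"Value": "203.0.113.20"}]}}
      ]
    }'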

When the Route 53 health check that monitors the primary webserver endpoint fails, a CloudWatch alarm is configured to enter the ALARM state after a set time. This CloudWatch alarm then executes a Lambda function that calls the CloudEndure API to begin the failover.
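One way to wire this up, sketched here with placeholder IDs and ARNs, is to alarm on the Route 53 HealthCheckStatus metric and publish to an SNS topic that the Lambda function subscribes to (Route 53 health check metrics are published in the US East (N. Virginia) Region):

aws cloudwatch put-metric-alarm \
    --region us-east-1 \
    --alarm-name primary-endpoint-unhealthy \
    --namespace AWS/Route53 \
    --metric-name HealthCheckStatus \
    --dimensions Name=HealthCheckId,Value=11111111-1111-1111-1111-111111111111 \
    --statistic Minimum \
    --period 60 \
    --evaluation-periods 3 \
    --threshold 1 \
    --comparison-operator LessThanThreshold \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:dr-failover-topic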

In the screenshot below, both health checks are reporting “Unhealthy” while the primary health check is in a state of ALARM. At that point, the DNS failover logic should return the path to the static S3 site, and the Lambda function should execute to start the CloudEndure failover.

The following architecture illustrates the completed scenario:

Conclusion

Having a disaster recovery strategy is critical for business continuity. The benefits of AWS combined with CloudEndure Disaster Recovery create a non-disruptive DR solution that provides minimal RTO and RPO while reducing total cost of ownership for customers. CloudWatch alarms combined with AWS Lambda for serverless computing are building blocks for a variety of automation scenarios.

 

 

Tips for building a cloud security operating model in the financial services industry

Post Syndicated from Stephen Quigg original https://aws.amazon.com/blogs/security/tips-for-building-a-cloud-security-operating-model-in-the-financial-services-industry/

My team helps financial services customers understand how AWS services operate so that you can incorporate AWS into your existing processes and security operations centers (SOCs). As soon as you create your first AWS account for your organization, you’re live in the cloud. So, from day one, you should be equipped with certain information: you should understand some basics about how our products and services work, you should know how to spot when something bad could happen, and you should understand how to recover from that situation. Below is some of the advice I frequently offer to financial services customers who are just getting started.

How to think about cloud security

Security is security – the principles don’t change. Many of the on-premises security processes that you have now can extend directly to an AWS deployment. For example, your processes for vulnerability management, security monitoring, and security logging can all be transitioned over.

That said, AWS is more than just infrastructure. I sometimes talk to customers who are only thinking about the security of their AWS Virtual Private Clouds (VPCs), and about the Amazon Elastic Compute Cloud (EC2) instances running in those VPCs. And that’s good; it’s traditional network security that remains quite standard. But I also ask my customers questions that focus on other services they may be using. For example:

  • How are you thinking about who has Database Administrator (DBA) rights for Amazon Aurora Serverless? Aurora Serverless is a managed database service that lets AWS do the heavy lifting for many DBA tasks.
  • Do you understand how to configure (and monitor the configuration of) your Amazon Athena service? Athena lets you query large amounts of information that you’ve stored in Amazon Simple Storage Service (S3).
  • How will you secure and monitor your AWS Lambda deployments? Lambda is a serverless platform that has no infrastructure for you to manage.

Understanding AWS security services

As a customer, it’s important to understand the information that’s available to you about the state of your cloud infrastructure. Typically, AWS delivers much of that information via the Amazon CloudWatch service. So, I encourage my customers to get comfortable with CloudWatch, alongside our AWS security services. The key services that any security team needs to understand include:

  • Amazon GuardDuty, which is a threat detection system for the cloud.
  • AWS CloudTrail, which is the log of AWS API calls.
  • VPC Flow Logs, which enables you to capture information about the IP traffic going to and from network interfaces in your VPC.
  • AWS Config, which records all the configuration changes that your teams have made to AWS resources, allowing you to assess those changes.
  • AWS Security Hub, which offers a “single pane of glass” that helps you assess AWS resources and collect information from across your security services. It gives you a unified view of resources per Region, so that you can more easily manage your security and compliance workflow.

These tools make it much quicker for you to get up to speed on your cloud security status and establish a position of safety.
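Several of these services can be switched on with a single call per Region once you decide to adopt them. The commands below are a minimal sketch, not a complete deployment:

# Turn on GuardDuty threat detection in the current Region
aws guardduty create-detector --enable

# Turn on Security Hub to aggregate findings
aws securityhub enable-security-hub

# Confirm that CloudTrail is already recording API activity
aws cloudtrail describe-trails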

Getting started with automation in the cloud

You don’t have to be a software developer to use AWS. You don’t have to write any code; the basics are straightforward. But to optimize your use of AWS and to get faster at automating, there is a real advantage if you have coding skills. Automation is the core of the operating model. We have a number of tutorials that can help you get up to speed.

Self-service cloud security resources for financial services customers

There are people like me who can come and talk to you. But to keep you from having to wait for us, we also offer a lot of self-service cloud security resources on our website.

We offer a free digital training course on AWS security fundamentals, plus webinars on financial services topics. We also offer an AWS security certification, which lets you show that your security knowledge has been validated by a third-party.

There are also a number of really good videos you can watch. For example, we had our inaugural security conference, re:Inforce, in Boston this past June. The videos and slides from the conference are now on YouTube, so you can sit and watch at your own pace. If you’re not sure where to start, try this list of popular sessions.

Finding additional help

You can work with a number of technology partners to help extend your security tools and processes to the cloud.

  • Our AWS Professional Services team can come and help you on site. In addition, we can simulate security incidents with you to help you get comfortable with security and cloud technology and how to respond to incidents.
  • AWS security consulting partners can also help you develop processes or write the code that you might need.
  • The AWS Marketplace is a wonderful self-service location where you can get all sorts of great security solutions, including finding a consulting partner.

And if you’re interested in speaking directly to AWS, you can always get in touch. There are forms on our website, or you can reach out to your AWS account manager and they can help you find the resources that are necessary for your business.

Conclusion

Financial services customers face some tough security challenges. You handle large amounts of data, and it’s really important that this data is stored securely and that its privacy is respected. We know that our customers do lots of due diligence of AWS before adopting our services, and they have many different regulatory environments within which they have to work. In turn, we want to help customers understand how they can build a cloud security operating model that meets their needs while using our services.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Stephen Quigg

Stephen Quigg is a Principal Securities Solutions Architect within AWS Financial Services. Quigg started his AWS career in Sydney, Australia, but returned home to Scotland three years ago having missed the wind and rain too much. He manages to fit some work in between being a husband and father to two angelic children and making music.

Automating notifications when AMI permissions change

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/automating-notifications-when-ami-permissions-change/

This post is courtesy of Ernes Taljic, Solutions Architect and Sudhanshu Malhotra, Solutions Architect

This post demonstrates how to automate alert notifications when users modify the permissions of an Amazon Machine Image (AMI). You can use it as a blueprint for a wide variety of alert notifications by making simple modifications to the events that you want to receive alerts about. For example, updating the specific operation in Amazon CloudWatch allows you to receive alerts on any activity that AWS CloudTrail captures.

This post walks you through how to configure an event rule in CloudWatch that triggers an AWS Lambda function. The Lambda function uses Amazon SNS to send an email when an AMI changes to public, private, shared, or unshared with one or more AWS accounts.

Solution overview

The following diagram describes the solution at a high level:

  1. A user changes an attribute of an AMI.
  2. CloudTrail logs the change as a ModifyImageAttribute API event.
  3. A CloudWatch Events rule captures this event.
  4. The CloudWatch Events rule triggers a Lambda function.
  5. The Lambda function publishes a message to the defined SNS topic.
  6. SNS sends an email alert to the topic’s subscribers.

Deployment walkthrough

To implement this solution, you must create:

  • An SNS topic
  • An IAM role
  • A Lambda function
  • A CloudWatch Events rule

Step 1: Creating an SNS topic

To create an SNS topic, complete the following steps:

  1. Open the SNS console.
  2. Under Create topic, for Topic name, enter a name and choose Create topic. You can now see the MySNSTopic page. The Details section displays the topic’s Name, ARN, Display name (optional), and the AWS account ID of the Topic owner.
  3. In the Details section, copy the topic ARN to the clipboard, for example:
    arn:aws:sns:us-east-1:123456789012:MySNSTopic
  4. On the left navigation pane, choose Subscriptions, Create subscription.
  5. On the Create subscription page, do the following:
    1. Enter the topic ARN of the topic you created earlier:
      arn:aws:sns:us-east-1:123456789012:MySNSTopic
    2. For Protocol, select Email.
    3. For Endpoint, enter an email address that can receive notifications.
    4. Choose Create subscription.
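If you prefer the AWS CLI, the equivalent of the console steps above looks roughly like this (the topic name and email address are placeholders):

aws sns create-topic --name MySNSTopic

aws sns subscribe \
    --topic-arn arn:aws:sns:us-east-1:123456789012:MySNSTopic \
    --protocol email \
    --notification-endpoint you@example.com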

Step 2: Creating an IAM role

To create an IAM role, complete the following steps. For more information, see Creating an IAM Role.

  1. In the IAM console, choose Policies, Create Policy.
  2. On the JSON tab, enter the following IAM policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:*:*:*"
            ],
            "Effect": "Allow",
            "Sid": "LogStreamAccess"
        },
        {
            "Action": [
                "sns:Publish"
            ],
            "Resource": [
                "arn:aws:sns:*:*:*"
            ],
            "Effect": "Allow",
            "Sid": "SNSPublishAllow"
        },
        {
            "Action": [
                "iam:ListAccountAliases"
            ],
            "Resource": "*",
            "Effect": "Allow",
            "Sid": "ListAccountAlias"
        }
    ]
}

3. Choose Review policy.

4. Enter a name (MyCloudWatchRole) for this policy and choose Create policy. Note the name of this policy for later steps.

5. In the left navigation pane, choose Roles, Create role.

6. On the Select role type page, choose Lambda and the Lambda use case.

7. Choose Next: Permissions.

8. Filter policies by the policy name that you just created, and select the check box.

9. Choose Next: Tags, and give it an appropriate tag.

10. Choose Next: Review. Give this IAM role an appropriate name, and note it for future use.

11. Choose Create role.
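For reference, a hedged CLI version of the same policy and role setup might look like the following; the role name, file name, and account ID are placeholder assumptions:

# Create the policy from the JSON shown above, saved locally as policy.json
aws iam create-policy \
    --policy-name MyCloudWatchRole \
    --policy-document file://policy.json

# Create a role that Lambda can assume
aws iam create-role \
    --role-name AmiChangeAlertRole \
    --assume-role-policy-document '{
      "Version": "2012-10-17",
      "Statement": [{"Effect": "Allow",
                     "Principal": {"Service": "lambda.amazonaws.com"},
                     "Action": "sts:AssumeRole"}]
    }'

# Attach the policy to the role
aws iam attach-role-policy \
    --role-name AmiChangeAlertRole \
    --policy-arn arn:aws:iam::123456789012:policy/MyCloudWatchRole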

Step 3: Creating a Lambda function

To create a Lambda function, complete the following steps. For more information, see Create a Lambda Function with the Console.

  1. In the Lambda console, choose Author from scratch.
  2. For Function Name, enter the name of your function.
  3. For Runtime, choose Python 3.7.
  4. For Execution role, select Use an existing role, then select the IAM role created in the previous step.
  5. Choose Create Function, remove the default function, and copy the following code into the Function Code window:
# Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
#     http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
# either express or implied. See the License for the specific language governing permissions
# and limitations under the License.
#
# Description: This Lambda function sends an SNS notification to a given AWS SNS topic when an API event of \"Modify Image Attribute\" is detected.
#              The SNS subject is- "API call-<insert call event> by < insert user name> detected in Account-<insert account alias>, see message for further details". 
#              The JSON message body of the SNS notification contains the full event details.
# 
#
# Author: Sudhanshu Malhotra


import json
import boto3
import logging
import os
import botocore.session
from botocore.exceptions import ClientError
session = botocore.session.get_session()

logging.basicConfig(level=logging.DEBUG)
logger=logging.getLogger(__name__)

import ipaddress
import traceback

def lambda_handler(event, context):
	logger.setLevel(logging.DEBUG)
	eventname = event['detail']['eventName']
	snsARN = os.environ['snsARN']          #Getting the SNS Topic ARN passed in by the environment variables.
	user = event['detail']['userIdentity']['type']
	srcIP = event['detail']['sourceIPAddress']
	imageId = event['detail']['requestParameters']['imageId']
	launchPermission = event['detail']['requestParameters']['launchPermission']
	imageAction = list(launchPermission.keys())[0]
	accnt_num =[]
	
	if imageAction == "add":
		if "userId" in launchPermission['add']['items'][0].keys():
			accnt_num = [li['userId'] for li in launchPermission['add']['items']]      # Get the AWS account numbers that the image was shared with
			imageAction = "Image shared with AWS account: " + str(accnt_num)[1:-1]
		else:
			imageAction = "Image made Public"
	
	elif imageAction == "remove":
		if "userId" in launchPermission['remove']['items'][0].keys():
			accnt_num = [li['userId'] for li in launchPermission['remove']['items']]
			imageAction = "Image Unshared with AWS account: " + str(accnt_num)[1:-1]    # Get the AWS account numbers that the image was unshared with
		else:
			imageAction = "Image made Private"
	
	
	logger.debug("Event is --- %s" %event)
	logger.debug("Event Name is--- %s" %eventname)
	logger.debug("SNSARN is-- %s" %snsARN)
	logger.debug("User Name is -- %s" %user)
	logger.debug("Source IP Address is -- %s" %srcIP)
	
	client = boto3.client('iam')
	snsclient = boto3.client('sns')
	response = client.list_account_aliases()
	logger.debug("List Account Alias response --- %s" %response)
	
	# Check if the source IP is a valid IP or AWS service DNS name.
	# If DNS name then we ignore the API activity as this is internal AWS operation
	# For more information check - https://aws.amazon.com/premiumsupport/knowledge-center/cloudtrail-root-action-logs/
	try:
	    validIP = ipaddress.ip_address(srcIP)
	    logger.debug("IP addr is-- %s" %validIP)
	except Exception as e:
	    logger.error("Catching the traceback error: %s" %traceback.format_exc())
	    logger.debug("Seems like the root API activity was caused by an internal operation. IP address is internal service DNS name")
	    return
	try:
		if not response['AccountAliases']:
			accntAliase = (boto3.client('sts').get_caller_identity()['Account'])
			logger.info("Account Aliase is not defined. Account ID is %s" %accntAliase)
		else:
			accntAliase = response['AccountAliases'][0]
			logger.info("Account Aliase is : %s" %accntAliase)
	
	except ClientError as e:
		logger.error("Client Error occured")
	
	try: 
		publish_message = ""
		publish_message += "\nImage Attribute change summary" + "\n\n"
		publish_message += "##########################################################\n"
		publish_message += "# Event Name- " +str(eventname) + "\n" 
		publish_message += "# Account- " +str(accntAliase) +  "\n"
		publish_message += "# AMI ID- " +str(imageId) +  "\n"
		publish_message += "# Image Action- " +str(imageAction) +  "\n"
		publish_message += "# Source IP- " +str(srcIP) +   "\n"
		publish_message += "##########################################################\n"
		publish_message += "\n\n\nThe full event is as below:- \n\n" +str(event) +"\n"
		logger.debug("MESSAGE- %s" %publish_message)
		
		#Sending the notification...
		snspublish = snsclient.publish(
						TargetArn= snsARN,
						Subject=(("Image Attribute change API call-\"%s\" detected in Account-\"%s\"" %(eventname,accntAliase))[:100]),
						Message=publish_message
						)
	except ClientError as e:
		logger.error("An error occured: %s" %e)

6. In the Environment variables section, enter the following key-value pair:

  • Key= snsARN
  • Value= the ARN of the MySNSTopic created earlier

7. Choose Save.
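Alternatively, the function can be deployed from the CLI. This is a sketch under the assumption that the code above is saved as lambda_function.py; the function name, role ARN, and topic ARN are placeholders:

# Package the handler code
zip function.zip lambda_function.py

aws lambda create-function \
    --function-name ami-permission-change-alert \
    --runtime python3.7 \
    --handler lambda_function.lambda_handler \
    --role arn:aws:iam::123456789012:role/AmiChangeAlertRole \
    --zip-file fileb://function.zip \
    --environment "Variables={snsARN=arn:aws:sns:us-east-1:123456789012:MySNSTopic}"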

Step 4: Creating a CloudWatch Events rule

To create a CloudWatch Events rule, complete the following steps. This rule catches when a user performs a ModifyImageAttribute API event and triggers the Lambda function (set as a target).

1. In the CloudWatch console, choose Rules, Create rule.

  • On the Step 1: Create rule page, under Event Source, select Event Pattern.
  • Copy the following event into the preview pane:
{
  "source": [
    "aws.ec2"
  ],
  "detail-type": [
    "AWS API Call via CloudTrail"
  ],
  "detail": {
    "eventSource": [
      "ec2.amazonaws.com"
    ],
    "eventName": [
      "ModifyImageAttribute"
    ]
  }
}
  • For Targets, select Lambda function, and select the Lambda function created in Step 3.

  • Choose Configure details.

2. On the Step 2: Configure rule details page, enter a name and description for the rule.

3. For State, select Enabled.

4. Choose Create rule.
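A hedged CLI equivalent of this rule, with placeholder names and account ID, would be:

aws events put-rule \
    --name ami-permission-change \
    --event-pattern '{
      "source": ["aws.ec2"],
      "detail-type": ["AWS API Call via CloudTrail"],
      "detail": {"eventSource": ["ec2.amazonaws.com"],
                 "eventName": ["ModifyImageAttribute"]}
    }'

aws events put-targets \
    --rule ami-permission-change \
    --targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:123456789012:function:ami-permission-change-alert"

# Unlike the console, the CLI does not add the invoke permission for you
aws lambda add-permission \
    --function-name ami-permission-change-alert \
    --statement-id cloudwatch-events-invoke \
    --action lambda:InvokeFunction \
    --principal events.amazonaws.com \
    --source-arn arn:aws:events:us-east-1:123456789012:rule/ami-permission-change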

Solution validation

Confirm that the solution works by changing an AMI:

  1. Open the Amazon EC2 console. From the menu, select AMIs under the Images heading.
  2. Select one of the AMIs, and choose Actions, Modify Image Permissions. To create an AMI, see How do I create an AMI that is based on my EBS-backed EC2 instance?
  3. Choose Private.
  4. For AWS Account Number, choose an account with which to share the AMI.
  5. Choose Add Permission, Save.
  6. Check your inbox to verify that you received an email from SNS. The email contains a summary message with the following information, followed by the full event:
  • Event Name
  • Account
  • AMI ID
  • Image Action
  • Source IP

For Image Action, the email lists one of the following events:

  • Image made Public
  • Image made Private
  • Image shared with AWS account
  • Image Unshared with AWS account

The message also includes the account ID of any shared or unshared accounts.
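You can also trigger the same ModifyImageAttribute event from the CLI instead of the console; the AMI ID and account ID below are placeholders:

# Share the AMI with another account, then undo the change
aws ec2 modify-image-attribute \
    --image-id ami-0123456789abcdef0 \
    --launch-permission "Add=[{UserId=123456789012}]"

aws ec2 modify-image-attribute \
    --image-id ami-0123456789abcdef0 \
    --launch-permission "Remove=[{UserId=123456789012}]"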

Creating other alert notifications

This post shows you how to automate notifications when AMI permissions change, but the solution is a blueprint for a wide variety of use cases that require alert notifications. You can use the CloudTrail event history to find the API associated with the event that you want to receive notifications about, and create a new CloudWatch Events rule for that event.

  1. Open the CloudTrail console and choose Event history.
  2. Explore the CloudTrail event history and locate the event name associated with the actions that you performed in your AWS account.
  3. Make a note of the event name and modify the eventName parameter in Step 4, 1.b to configure alerts for that particular event.

Automating Notifications - CloudTrail console

Conclusion

This post demonstrated how you can use CloudWatch to create automated notifications when AMI permissions change. Additionally, you can use the CloudTrail event history to find the API for other events, and use the preceding walkthrough to create other event alerts.

For further reading, see the following posts:

 

 

Learn From Your VPC Flow Logs With Additional Meta-Data

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/learn-from-your-vpc-flow-logs-with-additional-meta-data/

Flow Logs for Amazon Virtual Private Cloud enables you to capture information about the IP traffic going to and from network interfaces in your VPC. Flow Logs data can be published to Amazon CloudWatch Logs or Amazon Simple Storage Service (S3).

Since we launched VPC Flow Logs in 2015, you have been using it for a variety of use cases, like troubleshooting connectivity issues across your VPCs, intrusion detection, anomaly detection, or archival for compliance purposes. Until today, VPC Flow Logs provided information that included source IP, source port, destination IP, destination port, action (accept, reject), and status. Once enabled, a VPC Flow Log entry looks like the one below.

While this information was sufficient to understand most flows, it required additional computation and lookup to match IP addresses to instance IDs or to guess the directionality of the flow to come to meaningful conclusions.

Today we are announcing the availability of additional metadata that you can include in your Flow Logs records to better understand network flows. The enriched Flow Logs allow you to simplify your scripts or remove the need for post-processing altogether, by reducing the number of computations or lookups required to extract meaningful information from the log data.

When you create a new VPC Flow Log, in addition to existing fields, you can now choose to add the following meta-data:

  • vpc-id : the ID of the VPC containing the source Elastic Network Interface (ENI).
  • subnet-id : the ID of the subnet containing the source ENI.
  • instance-id : the Amazon Elastic Compute Cloud (EC2) instance ID of the instance associated with the source interface. When the ENI is placed by AWS services (for example, AWS PrivateLink, NAT Gateway, Network Load Balancer, etc.), this field will be “-”.
  • tcp-flags : the bitmask for TCP Flags observed within the aggregation period. For example, FIN is 0x01 (1), SYN is 0x02 (2), ACK is 0x10 (16), SYN + ACK is 0x12 (18), etc. (the bits are specified in “Control Bits” section of RFC793 “Transmission Control Protocol Specification”).
    This allows you to understand who initiated or terminated the connection. TCP uses a three-way handshake to establish a connection. The connecting machine sends a SYN packet to the destination, the destination replies with a SYN + ACK and, finally, the connecting machine sends an ACK. In the Flow Logs, the handshake is shown as two lines, with tcp-flags values of 2 (SYN) and 18 (SYN + ACK). ACK is reported only when it is accompanied by SYN (otherwise it would be too much noise for you to filter out).
  • type : the type of traffic: IPv4, IPv6, or Elastic Fabric Adapter.
  • pkt-srcaddr : the packet-level IP address of the source. You typically use this field in conjunction with srcaddr to distinguish between the IP address of an intermediate layer through which traffic flows, such as a NAT gateway.
  • pkt-dstaddr : the packet-level destination IP address, similar to the previous one, but for destination IP addresses.

To create a VPC Flow Log, you can use the AWS Management Console, the AWS Command Line Interface (CLI), or the CreateFlowLogs API, and select which additional fields to include and the order in which you want them to appear, for example:

Or using the AWS Command Line Interface (CLI) as below:

$ aws ec2 create-flow-logs --resource-type VPC \
                            --region eu-west-1 \
                            --resource-ids vpc-12345678 \
                            --traffic-type ALL  \
                            --log-destination-type s3 \
                            --log-destination arn:aws:s3:::sst-vpc-demo \
                            --log-format '${version} ${vpc-id} ${subnet-id} ${instance-id} ${interface-id} ${account-id} ${type} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${pkt-srcaddr} ${pkt-dstaddr} ${protocol} ${bytes} ${packets} ${start} ${end} ${action} ${tcp-flags} ${log-status}'

# be sure to replace the bucket name and VPC ID !

{
    "ClientToken": "1A....HoP=",
    "FlowLogIds": [
        "fl-12345678123456789"
    ],
    "Unsuccessful": [] 
}

Enriched VPC Flow Logs are delivered to S3. We will automatically add the required S3 bucket policy to authorize VPC Flow Logs to write to your S3 bucket. VPC Flow Logs does not capture real-time log streams for your network interface; it might take several minutes to begin collecting and publishing data to the chosen destinations. Your logs will eventually be available on S3 at s3://<bucket name>/AWSLogs/<account id>/vpcflowlogs/<region>/<year>/<month>/<day>/

An SSH connection from my laptop with IP address 90.90.0.200 to an EC2 instance would appear like this:

3 vpc-exxxxxx2 subnet-8xxxxf3 i-0bfxxxxxxaf eni-08xxxxxxa5 48xxxxxx93 IPv4 172.31.22.145 90.90.0.200 22 62897 172.31.22.145 90.90.0.200 6 5225 24 1566328660 1566328672 ACCEPT 18 OK
3 vpc-exxxxxx2 subnet-8xxxxf3 i-0bfxxxxxxaf eni-08xxxxxxa5 48xxxxxx93 IPv4 90.90.0.200 172.31.22.145 62897 22 90.90.0.200 172.31.22.145 6 4877 29 1566328660 1566328672 ACCEPT 2 OK

172.31.22.145 is the private IP address of the EC2 instance, the one you see when you type ifconfig on the instance. All flags are “OR”ed during the aggregation period. When the connection is short, both SYN and FIN (3), as well as SYN + ACK and FIN (19), will probably be set for the same lines.

Once a Flow Log is created, you cannot add additional fields or modify the structure of the log; this ensures you will not accidentally break scripts consuming this data. Any modification requires you to delete and recreate the VPC Flow Log. There is no additional cost to capture the extra information in VPC Flow Logs; normal VPC Flow Logs pricing applies. Remember that enriched VPC Flow Logs records might consume more storage when you select all fields. We recommend selecting only the fields relevant to your use cases.

Enriched VPC Flow Logs are available in all Regions where VPC Flow Logs is available, and you can start using them today.

— seb

PS: I heard from the team they are working on adding additional meta-data to the logs, stay tuned for updates.

Operational Insights for Containers and Containerized Applications

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/operational-insights-for-containers-and-containerized-applications/

The increasing adoption of containerized applications and microservices also brings an increased burden for monitoring and management. Builders have an expectation of, and requirement for, the same level of monitoring as would be used with longer-lived infrastructure such as Amazon Elastic Compute Cloud (EC2) instances. By contrast, containers are relatively short-lived, and usually subject to continuous deployment. This can make it difficult to reliably collect monitoring data and to analyze performance or other issues, which in turn affects remediation time. In addition, builders have to resort to a disparate collection of tools to perform this analysis and inspection, manually correlating context across a set of infrastructure and application metrics, logs, and other traces.

Announcing general availability of Amazon CloudWatch Container Insights
At the AWS Summit in New York this past July, Amazon CloudWatch Container Insights support for Amazon ECS and AWS Fargate was announced as an open preview for new clusters. Starting today, Container Insights is generally available, with the added ability to also monitor existing clusters. Immediate insights into compute utilization and failures for both new and existing cluster infrastructure and containerized applications can be easily obtained from container management services including Kubernetes, Amazon Elastic Container Service for Kubernetes, Amazon ECS, and AWS Fargate.

Once enabled, Amazon CloudWatch discovers all of the running containers in a cluster and collects performance and operational data at every layer in the container stack. It also continuously monitors and updates as changes occur in the environment, reducing the number of tools required to collect, monitor, act on, and analyze container metrics and logs, and giving complete end-to-end visibility.

Being able to easily access this data means customers can shift focus to increased developer productivity and away from building mechanisms to curate and build dashboards.

Getting started with Amazon CloudWatch Container Insights
I can enable Container Insights by following the instructions in the documentation. Once enabled and new clusters launched, when I visit the CloudWatch console for my region I see a new option for Container Insights in the list of dashboards available to me.
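For ECS clusters, enabling Container Insights amounts to a cluster setting. A minimal sketch, assuming a cluster named my-cluster:

# Enable Container Insights on an existing cluster
aws ecs update-cluster-settings \
    --cluster my-cluster \
    --settings name=containerInsights,value=enabled

# Optionally make it the account default for clusters created from now on
aws ecs put-account-setting --name containerInsights --value enabled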

Clicking this takes me to the relevant dashboard where I can select the container management service that hosts the clusters that I want to observe.

In the below image I have selected to view metrics for my ECS clusters that are hosting a sample application I have deployed in AWS Fargate. I can examine the metrics for standard time periods such as 1 hour, 3 hours, etc., but can also specify custom time periods. Here I am looking at the metrics for a custom time period of the past 15 minutes.

You can see that I can quickly gain operational oversight of the overall performance of the cluster. Clicking the cluster name takes me deeper to view the metrics for the tasks inside the cluster.

Selecting a container allows me to then dive into either AWS X-Ray traces or performance logs.

Selecting performance logs takes me to the Amazon CloudWatch Logs Insights page where I can run queries against the performance events collected for my container ecosystem (e.g., Container, Task/Pod, Cluster, etc.) that I can then use to troubleshoot and dive deeper.
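As a rough example of the kind of query you can run there, the following CLI call summarizes task-level CPU and memory over the last 15 minutes. It is a sketch: the log group name follows the Container Insights naming convention for ECS, and the field names (Type, CpuUtilized, MemoryUtilized) are assumptions based on task performance events.

aws logs start-query \
    --log-group-name /aws/ecs/containerinsights/my-cluster/performance \
    --start-time $(date -d '15 minutes ago' +%s) \
    --end-time $(date +%s) \
    --query-string 'filter Type = "Task" | stats avg(CpuUtilized) as avg_cpu, avg(MemoryUtilized) as avg_mem by bin(5m)'

The call returns a query ID that you can pass to aws logs get-query-results to fetch the results.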

Container Insights makes it easy for me to get started monitoring my containers and enables me to quickly drill down into performance metrics and log analytics without the need to build custom dashboards to curate data from multiple tools. Beyond monitoring and troubleshooting I can also use the data and dashboards Container Insights provides me to support other use cases such as capacity requirements and planning, by helping me understand compute utilization by Pod/Task, Container, and Service for example.

Availability
Amazon CloudWatch Container Insights is generally available today to customers in all public AWS regions where Amazon Elastic Container Service for Kubernetes, Kubernetes, Amazon ECS, and AWS Fargate are present.

— Steve

Scaling Kubernetes deployments with Amazon CloudWatch metrics

Post Syndicated from Ignacio Riesgo original https://aws.amazon.com/blogs/compute/scaling-kubernetes-deployments-with-amazon-cloudwatch-metrics/

This post is contributed by Kwunhok Chan | Solutions Architect, AWS

 

In an earlier post, AWS introduced Horizontal Pod Autoscaler and Kubernetes Metrics Server support for Amazon Elastic Kubernetes Service. These tools make it easy to scale your Kubernetes workloads managed by EKS in response to built-in metrics like CPU and memory.

However, one common use case for applications running on EKS is the integration with AWS services. For example, you administer an application that processes messages published to an Amazon SQS queue. You want the application to scale according to the number of messages in that queue. The Amazon CloudWatch Metrics Adapter for Kubernetes (k8s-cloudwatch-adapter) helps.

 

Amazon CloudWatch Metrics Adapter for Kubernetes

The k8s-cloudwatch-adapter is an implementation of the Kubernetes Custom Metrics API and External Metrics API with integration for CloudWatch metrics. It allows you to scale your Kubernetes deployment using the Horizontal Pod Autoscaler (HPA) with CloudWatch metrics.

 

Prerequisites

Before starting, you need the following:

 

Getting started

Before using the k8s-cloudwatch-adapter, set up a way to manage IAM credentials for Kubernetes pods. The CloudWatch Metrics Adapter requires the following permissions to access metric data from CloudWatch:

cloudwatch:GetMetricData

Create an IAM policy with the following template:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData"
            ],
            "Resource": "*"
        }
    ]
}

For demo purposes, I’m granting admin permissions to my Kubernetes worker nodes. Don’t do this in your production environment. To associate IAM roles to your Kubernetes pods, you may want to look at kube2iam or kiam.

If you’re using an EKS cluster, you most likely provisioned it with AWS CloudFormation. The following command uses AWS CloudFormation stacks to update the proper instance policy with the correct permissions:

aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/AdministratorAccess \
--role-name $(aws cloudformation describe-stacks --stack-name ${STACK_NAME} --query 'Stacks[0].Parameters[?ParameterKey==`NodeInstanceRoleName`].ParameterValue' | jq -r ".[0]")

 

Make sure to replace ${STACK_NAME} with the nodegroup stack name from the AWS CloudFormation console.

 

You can now deploy the k8s-cloudwatch-adapter to your Kubernetes cluster.

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/deploy/adapter.yaml

 

This deployment creates a new namespace—custom-metrics—and deploys the necessary ClusterRole, Service Account, and Role Binding values, along with the deployment of the adapter. Use the created custom resource definition (CRD) to define the configuration for the external metrics to retrieve from CloudWatch. The adapter reads the configuration defined in ExternalMetric CRDs and loads its external metrics. That allows you to use HPA to autoscale your Kubernetes pods.

 

Verifying the deployment

Next, query the metrics APIs to see if the adapter is deployed correctly. Run the following command:

$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
  ]
}

There are no resources from the response because you haven’t registered any metric resources yet.

 

Deploying an Amazon SQS application

Next, deploy a sample SQS application to test out k8s-cloudwatch-adapter. The SQS producer and consumer are provided, together with the YAML files for deploying the consumer, metric configuration, and HPA.

Both the producer and consumer use an SQS queue named helloworld. If it doesn’t exist already, the producer creates this queue.

Deploy the consumer with the following command:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/consumer-deployment.yaml

 

You can verify that the consumer is running with the following command:

$ kubectl get deploy sqs-consumer
NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
sqs-consumer   1         1         1            0           5s

 

Set up Amazon CloudWatch metric and HPA

Next, create an ExternalMetric resource for the CloudWatch metric. Take note of the Kind value for this resource. This CRD resource tells the adapter how to retrieve metric data from CloudWatch.

You define the query parameters used to retrieve the ApproximateNumberOfMessagesVisible for an SQS queue named helloworld. For details about how metric data queries work, see CloudWatch GetMetricData API.

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: hello-queue-length
spec:
  name: hello-queue-length
  resource:
    resource: "deployment"
  queries:
    - id: sqs_helloworld
      metricStat:
        metric:
          namespace: "AWS/SQS"
          metricName: "ApproximateNumberOfMessagesVisible"
          dimensions:
            - name: QueueName
              value: "helloworld"
        period: 300
        stat: Average
        unit: Count
      returnData: true

 

Create the ExternalMetric resource:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/externalmetric.yaml

 

Then, set up the HPA for your consumer. Here is the configuration to use:

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: sqs-consumer-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: sqs-consumer
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: hello-queue-length
      targetValue: 30

 

This HPA rule starts scaling out when the number of messages visible in your SQS queue exceeds 30, and scales in when there are fewer than 30 messages in the queue.

Create the HPA resource:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/hpa.yaml

 

Generate load using a producer

Finally, you can start generating messages to the queue:

$ kubectl apply -f https://raw.githubusercontent.com/awslabs/k8s-cloudwatch-adapter/master/samples/sqs/deploy/producer-deployment.yaml

On a separate terminal, you can now watch your HPA retrieving the queue length and start scaling the replicas. SQS metrics generate at five-minute intervals, so give the process a few minutes:

$ kubectl get hpa sqs-consumer-scaler -w

 

Clean up

After you complete this experiment, you can delete the Kubernetes deployment and respective resources.

Run the following commands to remove the consumer, external metric, HPA, and SQS queue:

$ kubectl delete deploy sqs-producer
$ kubectl delete hpa sqs-consumer-scaler
$ kubectl delete externalmetric hello-queue-length
$ kubectl delete deploy sqs-consumer

$ aws sqs delete-queue --queue-url $(aws sqs get-queue-url --queue-name helloworld --query QueueUrl --output text)

 

Other CloudWatch integrations

AWS recently announced the preview for Amazon CloudWatch Container Insights, which monitors, isolates, and diagnoses containerized applications running on EKS and Kubernetes clusters. To get started, see Using Container Insights.

 

Get involved

This project is currently under development. AWS welcomes issues and pull requests, and would love to hear your feedback.

How could this adapter be best implemented to work in your environment? Visit the Amazon CloudWatch Metrics Adapter for Kubernetes project on GitHub and let AWS know what you think.

FICO: Fraud Detection and Anti-Money Laundering with AWS Lambda and AWS Step Functions

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/fico-fraud-detection-and-anti-money-laundering-with-aws-lambda-and-aws-step-functions/

In this episode of This is My Architecture, filmed in 2018 on the last day of re:Invent (a learning conference hosted by Amazon Web Services for the global cloud computing community), FICO lead Software Engineer Sven Ahlfeld talks to AWS Solutions Architect Tom Jones about how the company uses a combination of AWS Lambda and AWS Step Functions to architect an on-demand solution for fraud detection and anti-money laundering.

When you think of FICO, you probably think credit score. And that’s true: founded in 1956, FICO introduced analytic solutions, such as credit scoring, that have made credit more widely available in the US and around the world. As well, the FICO score is the standard measure of consumer risk in the US.

In the video, Sven explains that FICO is making software to meet regulatory compliance goals and requirements, in this case to tackle money laundering. FICO ingests massive amounts of customer data in the form of financial documents into S3, and then uses S3 event triggers to analyze each document for a number of different fraud and money laundering characteristics.

Key architecture components are designed to be immutable, assuring that the EC2 instances doing the analysis work can’t themselves be compromised or tampered with. At the same time, an immutable instance can scale very fast and allows for the ability to ingest a large number of documents. It can also scale back when there is less demand. The immutable images also support regulatory requirements for the various needs and regulations of localities around the world.

 

Check out more This Is My Architecture videos on YouTube.

About the author

Annik Stahl is a Senior Program Manager in AWS, specializing in blog and magazine content as well as customer ratings and satisfaction. Having been the face of Microsoft Office for 10 years as the Crabby Office Lady columnist, she loves getting to know her customers and wants to hear from you.

Optimize Amazon EMR costs with idle checks and automatic resource termination using advanced Amazon CloudWatch metrics and AWS Lambda

Post Syndicated from Praveen Krishnamoorthy Ravikumar original https://aws.amazon.com/blogs/big-data/optimize-amazon-emr-costs-with-idle-checks-and-automatic-resource-termination-using-advanced-amazon-cloudwatch-metrics-and-aws-lambda/

Many customers use Amazon EMR to run big data workloads, such as Apache Spark and Apache Hive queries, in their development environment. Data analysts and data scientists frequently use these types of clusters, known as analytics EMR clusters. Users often forget to terminate the clusters after their work is done. This leads to idle clusters and, in turn, unnecessary costs.

To avoid this overhead, you must track the idleness of the EMR cluster and terminate it if it is running idle for long hours. There is the Amazon EMR native IsIdle Amazon CloudWatch metric, which determines the idleness of the cluster by checking whether there’s a YARN job running. However, you should consider additional metrics, such as SSH users connected or Presto jobs running, to determine whether the cluster is idle. Also, when you execute any Spark jobs in Apache Zeppelin, the IsIdle metric remains active (1) for long hours, even after the job is finished executing. In such cases, the IsIdle metric is not ideal in deciding the inactivity of a cluster.

In this blog post, we propose a solution to cut down this overhead cost. We implemented a bash script to be installed in the master node of the EMR cluster, and the script is scheduled to run every 5 minutes. The script monitors the clusters and sends a CUSTOM metric EMR-INUSE (0=inactive; 1=active) to CloudWatch every 5 minutes. If CloudWatch receives 0 (inactive) for some predefined set of data points, it triggers an alarm, which in turn executes an AWS Lambda function that terminates the cluster.

Prerequisites

You must have the following before you can create and deploy this framework:

Note: This solution is designed as an additional feature. It can be applied to any existing EMR clusters by executing the scheduler script (explained later in the post) as an EMR step. If you want to implement this solution as a mandatory feature for your future clusters, you can include the EMR step as part of your cluster deployment. You can also apply this solution to EMR clusters that are spun up through AWS CloudFormation, the AWS CLI, and even the AWS Management Console.

Key components

The following are the key components of the solution.

Analytics EMR cluster

Amazon EMR provides a managed Apache Hadoop framework that lets you easily process large amounts of data across dynamically scalable Amazon EC2 instances. Data scientists use analytics EMR clusters for data analysis, machine learning using notebook applications (such as Apache Zeppelin or JupyterHub), and running big data workloads based on Apache Spark, Presto, etc.

Scheduler script

The schedule_script.sh is the shell script to be executed as an Amazon EMR step. When executed, it copies the monitoring script from the Amazon S3 artifacts folder and schedules the monitoring script to run every 5 minutes. The S3 location of the monitoring script should be passed as an argument.

Monitoring script

The pushShutDownMetric.sh script is a monitoring script that is implemented using shell commands. It should be installed in the master node of the EMR cluster as an Amazon EMR step. The script is scheduled to run every 5 minutes and sends the cluster activity status to CloudWatch.

JupyterHub API token script

The jupyterhub_addAdminToken.sh script is a shell script to be executed as an Amazon EMR step if JupyterHub is enabled on the cluster. In our design, the monitoring script uses REST APIs provided by JupyterHub to check whether the application is in use.

To send the request to JupyterHub, you must pass an API token along with the request. By default, the application does not generate API tokens. This script generates the API token and assigns it to the admin user, which is then picked up by the jupyterhub module in the monitoring script to track the activity of the application.

Custom CloudWatch metric

All Amazon EMR clusters send data for several metrics to CloudWatch. Metrics are updated every 5 minutes, automatically collected, and pushed to CloudWatch. For this use case, we created the custom metric EMR-INUSE. This metric represents the active status of the cluster based on the module checks implemented in the monitoring script. The metric is set to 0 when the cluster is inactive and 1 when active.
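Publishing such a metric from a shell script is a single CLI call. The namespace, dimension, and variable names below are illustrative assumptions rather than the exact values used by the script:

# ACTIVE is 0 or 1 depending on the outcome of the activity checks
aws cloudwatch put-metric-data \
    --namespace "EMRShutdown" \
    --metric-name "EMR-INUSE" \
    --dimensions "JobFlowId=${CLUSTER_ID}" \
    --value "${ACTIVE}"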

Amazon CloudWatch

CloudWatch is a monitoring service that you can use to set high-resolution alarms to take automated actions. In this case, CloudWatch triggers an alarm if it receives 0 continuously for the configured number of hours.

AWS Lambda

Lambda is a serverless technology that lets you run code without provisioning or managing servers. With Lambda, you can run code for virtually any type of application or backend service—all with zero administration. You can set up your code to automatically trigger from other AWS services. In this case, the triggered CloudWatch alarm mentioned earlier signals Lambda to terminate the cluster.

Architectural diagram

The following diagram illustrates the sequence of events when the solution is enabled, showing what happens to the EMR cluster that is spun up via AWS CloudFormation.

 

The diagram shows the following steps:

  1. The AWS CloudFormation stack is launched to spin up an EMR cluster.
  2. The Amazon EMR step is executed (installs the pushShutDownMetric.sh and then schedules it as a cron job to run every 5 minutes).
  3. If the EMR cluster is active (executing jobs), the master node sets the EMR-INUSE metric to 1 and sends it to CloudWatch.
  4. If the EMR cluster is inactive, the master node sets the EMR-INUSE metric to 0 and sends it to CloudWatch.
  5. On receiving 0 for a predefined number of data points, CloudWatch triggers a CloudWatch alarm.
  6. The CloudWatch alarm sends notification to AWS Lambda to terminate the cluster.
  7. AWS Lambda executes the Lambda function.
  8. The Lambda function then deletes all the stack resources associated with the cluster.
  9. Finally, the EMR cluster is terminated, and the Stack ID is removed from AWS CloudFormation.

Modules in the monitoring script

Following are the different activity checks that are implemented in the monitoring script (pushShutDownMetric.sh). The script is designed in a modular fashion so that you can easily include new modules without modifying the core functionality.

ActiveSSHCheck

The ActiveSSHCheck module checks whether there are any active SSH connections to the master node. If there is an active SSH connection, and it’s idle for less than 10 minutes, the function sets the EMR-INUSE metric to 1 and pushes it to CloudWatch.

YARNCheck

Apache Hadoop YARN is the resource manager of the EMR Hadoop ecosystem. All Spark submits and Hive queries reach YARN first; it then schedules and processes these jobs. The YARNCheck module checks whether there are any running jobs in YARN or any jobs completed within the last 5 minutes. If it finds any, the function sets the EMR-INUSE metric to 1 and pushes it to CloudWatch. The checks are performed by calling REST APIs exposed by YARN.

The API to fetch the running jobs is http://localhost:8088/ws/v1/cluster/apps?state=RUNNING.

The API to fetch the completed jobs is

http://localhost:8088/ws/v1/cluster/apps?state=FINISHED.
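To make the module's logic concrete, here is a hedged Python sketch of the same checks. The post's script uses shell commands, the requests library is an added dependency, and the apps/app/finishedTime fields follow the YARN ResourceManager REST API.

import time
import requests

YARN_API = 'http://localhost:8088/ws/v1/cluster/apps'

def yarn_is_active(window_seconds=300):
    # Any RUNNING application means the cluster is in use
    running = requests.get(YARN_API, params={'state': 'RUNNING'}).json()
    if (running.get('apps') or {}).get('app'):
        return True
    # Otherwise, look for applications that finished within the last window
    finished = requests.get(YARN_API, params={'state': 'FINISHED'}).json()
    now_ms = int(time.time() * 1000)
    for app in (finished.get('apps') or {}).get('app', []):
        if now_ms - app['finishedTime'] <= window_seconds * 1000:
            return True
    return False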

PRESTOCheck

Presto is an open-source distributed query engine for running interactive analytic queries. It is included in EMR release version 5.0.0 and later. The PRESTOCheck module checks whether there are any running Presto queries or any queries completed within the last 5 minutes. If it finds any, the function sets the EMR-INUSE metric to 1 and pushes it to CloudWatch. These checks are performed by calling REST APIs exposed by the Presto server.

The API to fetch the Presto jobs is http://localhost:8889/v1/query.

ZeppelinCheck

Amazon EMR users use Apache Zeppelin as a notebook for interactive data exploration. The ZeppelinCheck module checks whether there are any jobs running or if any have been completed within the last 5 minutes. If so, the function sets the EMR-INUSE metric to 1 and pushes it to CloudWatch. These checks are performed by calling the REST APIs exposed by Zeppelin.

The API to fetch the list of notebook IDs is http://localhost:8890/api/notebook.

The API to fetch the status of each cell inside each notebook ID is http://localhost:8890/api/notebook/job/$notebookID.

JupyterHubCheck

Jupyter Notebook is an open-source web application that you can use to create and share documents that contain live code, equations, visualizations, and narrative text. JupyterHub lets you host multiple instances of a single-user Jupyter notebook server. The JupyterHubCheck module checks whether any Jupyter notebook is currently in use.

The function uses REST APIs exposed by JupyterHub to fetch the list of Jupyter notebook users and gathers the data about individual notebook servers. From the response, it extracts the last activity time of the servers and checks whether any server was used in the last 5 minutes. If so, the function sets the EMR-INUSE metric to 1 and pushes it to CloudWatch. The jupyterhub_addAdminToken.sh script needs to be executed as an EMR step before enabling the scheduler script.

The API to fetch the list of notebook users is https://localhost:9443/hub/api/users -H "Authorization: token $admin_token".

The API to fetch individual server information is https://localhost:9443/hub/api/users/$user -H "Authorization: token $admin_token".
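A Python sketch of this check might look like the following. It is illustrative only: the post's module is implemented in shell, the last_activity field comes from the JupyterHub REST API, and the token placeholder stands in for the value generated by jupyterhub_addAdminToken.sh.

import datetime
import requests

HUB_API = 'https://localhost:9443/hub/api'
ADMIN_TOKEN = 'REPLACE_WITH_GENERATED_TOKEN'  # placeholder for the token created by jupyterhub_addAdminToken.sh
HEADERS = {'Authorization': 'token ' + ADMIN_TOKEN}

def jupyterhub_is_active(window_seconds=300):
    # verify=False because the JupyterHub endpoint on the master node typically uses a self-signed certificate
    users = requests.get(HUB_API + '/users', headers=HEADERS, verify=False).json()
    now = datetime.datetime.utcnow()
    for user in users:
        last = user.get('last_activity')
        if not last:
            continue
        # last_activity is an ISO 8601 timestamp, for example 2019-06-01T12:34:56.789Z
        last_dt = datetime.datetime.strptime(last[:19], '%Y-%m-%dT%H:%M:%S')
        if (now - last_dt).total_seconds() <= window_seconds:
            return True
    return False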

If none of these checks detects activity, the cluster is considered to be inactive, and the monitoring script sets the EMR-INUSE metric to 0 and pushes it to CloudWatch.

Note:

The scheduler script schedules the monitoring script (pushShutDownMetric.sh) to run every 5 minutes. Internal cron jobs that run for only a few minutes are not taken into account when calculating the EMR-INUSE metric.

Deploying each component

Follow the steps in this section to deploy each component of the proposed design.

Step 1. Create the Lambda function and SNS subscription

The Lambda function and the SNS subscription are the core components of the design. You must set up these components initially, and they are common for every cluster. The following are the AWS resources to be created for these components:

  • Execution role for the Lambda function
  • Terminate Idle EMR Lambda function
  • SNS topic and Lambda subscription

For one-step deployment, use this AWS CloudFormation template to launch and configure the resources in a single go.

The following parameters are available in the template.

  • s3Bucket (default: emr-shutdown-blogartifacts): The name of the S3 bucket that contains the Lambda file.
  • s3Key (default: EMRTerminate.zip): The Amazon S3 key of the Lambda file.

For manual deployment, follow these steps on the AWS Management Console.

Execution role for the Lambda function

  1. Open the AWS Identity and Access Management (IAM) console and choose Policies, Create policy.
  2. Choose the JSON tab, paste the following policy text, and then choose Review policy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:HeadBucket",
                "s3:ListObjects",
                "s3:GetObject",
                "cloudformation:ListStacks",
                "cloudformation:DeleteStack",
                "cloudformation:DescribeStacks",
                "cloudformation:ListStackResources",
                "elasticmapreduce:TerminateJobFlows"
            ],
            "Resource": "*",
            "Effect": "Allow",
            "Sid": "GenericAccess"
        },
        {
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*",
            "Effect": "Allow",
            "Sid": "LogAccess"
        }
    ]
}
  3. For Name, enter TerminateEMRPolicy and choose Create policy.
  4. Choose Roles, Create role.
  5. Under Choose the service that will use this role, choose Lambda, and then choose Next: Permissions.
  6. For Attach permissions policies, choose the arrow next to Filter policies and choose Customer managed in the drop-down list.
  7. Attach the TerminateEMRPolicy policy that you just created, and choose Review.
  8. For Role name, enter TerminateEMRLambdaRole and then choose Create role.

Terminate idle EMR – Lambda function

I created a deployment package to use with this function.

  1. Open the Lambda console and choose Create function.
  2. Choose Author from scratch, and provide the details as shown in the following screenshot:
  • Name: lambdaTerminateEMR
  • Runtime: Python 2.7
  • Role: Choose an existing role
  • Existing role: TerminateEMRLambdaRole

  3. Choose Create function.
  4. In the Function code section, for Code entry type, choose Upload a file from Amazon S3, and for Runtime, choose Python 2.7.

The Lambda function S3 link URL is s3://emr-shutdown-blogartifacts/EMRTerminate.zip.

Link to the function: https://s3.amazonaws.com/emr-shutdown-blogartifacts/EMRTerminate.zip

This Lambda function is triggered by a CloudWatch alarm. It parses the input event, retrieves the JobFlowId, and deletes the AWS CloudFormation stack of the corresponding JobFlowId.
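The published deployment package linked above contains the actual code; the following is only a simplified sketch of that logic. The dimension name JobFlowId and the stack lookup shown here are assumptions for illustration.

import json
import boto3

cfn = boto3.client('cloudformation')

def lambda_handler(event, context):
    # The CloudWatch alarm notification arrives through SNS; the alarm JSON is in the message body
    alarm = json.loads(event['Records'][0]['Sns']['Message'])
    dimensions = alarm['Trigger']['Dimensions']
    job_flow_id = next(d['value'] for d in dimensions if d['name'] == 'JobFlowId')

    # Find the CloudFormation stack that owns this cluster and delete it
    for stack in cfn.describe_stacks()['Stacks']:
        resources = cfn.list_stack_resources(StackName=stack['StackName'])['StackResourceSummaries']
        if any(r['PhysicalResourceId'] == job_flow_id for r in resources):
            cfn.delete_stack(StackName=stack['StackName'])
            return 'Deleted stack %s for cluster %s' % (stack['StackName'], job_flow_id)
    return 'No stack found for cluster %s' % job_flow_id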

SNS topic and Lambda subscription

To set up the CloudWatch alarm in a later step, you must create an Amazon SNS topic that notifies the preceding Lambda function to execute. Follow these steps to create an SNS topic and configure the Lambda endpoint.

  1. Navigate to the Amazon SNS console, and choose Create topic.
  2. Enter the Topic name and Display name, and choose Create topic.

  3. The topic is created and displayed in the Topics list.
  4. Select the topic and choose Actions, Subscribe to topic.

  5. In the Create subscription dialog box, for Protocol, choose AWS Lambda. For Endpoint, choose the lambdaTerminateEMR function, and then choose Create subscription.

Step 2. Execute the JupyterHub API token script as an EMR step

This step is required only when JupyterHub is enabled in the cluster.

Navigate to the EMR cluster to be monitored, and execute the JupyterHub API token script as an EMR step.

Command: s3://emr-shutdown-blogartifacts/jupyterhub_addAdminToken.sh

This script generates an API token and assigns it to the admin user. It is then picked up by the jupyterhub module in the monitoring script to track the activity of the application.

Step 3. Execute the scheduler script as an EMR step

Navigate to the EMR cluster to be monitored and execute the scheduler script as an EMR step.

Note:

Ensure that termination protection is disabled in the cluster. The termination protection flag causes the Lambda function to fail.

Command: s3://emr-shutdown-blogartifacts/schedule_script.sh

Parameter: s3://emr-shutdown-blogartifacts/pushShutDownMetrin.sh

This step copies the pushShutDownMetric.sh script to the master node and schedules it to run every 5 minutes.

The schedule_script.sh is at https://s3.amazonaws.com/emr-shutdown-blogartifacts/schedule_script.sh.

The pushShutDownMetrin.sh is at https://s3.amazonaws.com/emr-shutdown-blogartifacts/pushShutDownMetrin.sh.
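If you prefer to add this step programmatically instead of through the console, a boto3 call along the following lines could be used. This is a sketch, not part of the original post: the region and the script-runner JAR path are assumptions you would adjust for your environment.

import boto3

emr = boto3.client('emr', region_name='us-east-1')

emr.add_job_flow_steps(
    JobFlowId='j-XXXXXXXXXXXXX',  # the cluster to monitor
    Steps=[{
        'Name': 'Schedule idle-cluster monitoring',
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            # script-runner downloads and executes the shell script on the master node
            'Jar': 's3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar',
            'Args': [
                's3://emr-shutdown-blogartifacts/schedule_script.sh',
                's3://emr-shutdown-blogartifacts/pushShutDownMetrin.sh'
            ]
        }
    }]
)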

Step 4. Create a CloudWatch alarm

For single-step deployment, use this AWS CloudFormation template to launch and configure the resources in a single go.

The following parameters are available in the template.

  • AlarmName (default: TerminateIDLE-EMRAlarm): The name for the alarm.
  • EMRJobFlowID (requires input): The JobFlowId of the cluster.
  • EvaluationPeriod (requires input): The idle timeout value, expressed in data points (1 data point equals 5 minutes). For example, to terminate the cluster if it is idle for 20 minutes, the input should be 4.
  • SNSSubscribeTopic (requires input): The Amazon Resource Name (ARN) of the SNS topic to be triggered on the alarm.

 

The AWS CloudFormation CLI command is as follows:

aws cloudformation create-stack --stack-name EMRAlarmStack \
      --template-url https://s3.amazonaws.com/emr-shutdown-blogartifacts/Cloudformation/alarm.json \
      --parameters ParameterKey=AlarmName,ParameterValue=TerminateIDLE-EMRAlarm \
                   ParameterKey=EMRJobFlowID,ParameterValue=<Input> \
                   ParameterKey=EvaluationPeriod,ParameterValue=4 \
                   ParameterKey=SNSSubscribeTopic,ParameterValue=<Input>

For manual deployment, follow these steps to create the alarm.

  1. Open the Amazon CloudWatch console and choose Alarms.
  2. Choose Create Alarm.
  3. On the Select Metric page, under Custom Metrics, choose EMRShutdown/Cluster-Metric.

  4. Choose the isEMRUsed metric of the EMR JobFlowId, and then choose Next.

  5. Define the alarm as required. In this case, the alarm is set to send a notification to the SNS topic shutDownEMRTest when CloudWatch receives the isEMRUsed metric as 0 for every data point in the last 2 hours.

  6. Choose Create Alarm.

Summary

In this post, we focused on building a framework to cut the additional cost that you might incur when an EMR cluster sits idle. The modular shell script tracks the execution status of Spark jobs and Hive/Presto queries through lightweight REST API calls, which makes this approach an efficient solution.

If you have questions or suggestions, please comment below.

 


About the Author

Praveen Krishnamoorthy Ravikumar is an associate big data consultant with Amazon Web Services.

 

 

 

 

Build and automate a serverless data lake using an AWS Glue trigger for the Data Catalog and ETL jobs

Post Syndicated from Saurabh Shrivastava original https://aws.amazon.com/blogs/big-data/build-and-automate-a-serverless-data-lake-using-an-aws-glue-trigger-for-the-data-catalog-and-etl-jobs/

Today, data is flowing from everywhere, whether it is unstructured data from resources like IoT sensors, application logs, and clickstreams, or structured data from transaction applications, relational databases, and spreadsheets. Data has become a crucial part of every business. This has resulted in a need to maintain a single source of truth and automate the entire pipeline—from data ingestion to transformation and analytics— to extract value from the data quickly.

There is a growing concern over the complexity of data analysis as the data volume, velocity, and variety increases. The concern stems from the number and complexity of steps it takes to get data to a state that is usable by business users. Often data engineering teams spend most of their time on building and optimizing extract, transform, and load (ETL) pipelines. Automating the entire process can reduce the time to value and cost of operations. In this post, we describe how to create a fully automated data cataloging and ETL pipeline to transform your data.

Architecture

In this post, you learn how to build and automate the following architecture.

You build your serverless data lake with Amazon Simple Storage Service (Amazon S3) as the primary data store. Given the scalability and high availability of Amazon S3, it is best suited as the single source of truth for your data.

You can use various techniques to ingest and store data in Amazon S3. For example, you can use Amazon Kinesis Data Firehose to ingest streaming data. You can use AWS Database Migration Service (AWS DMS) to ingest relational data from existing databases. And you can use AWS DataSync to ingest files from an on-premises Network File System (NFS).

Ingested data lands in an Amazon S3 bucket that we refer to as the raw zone. To make that data available, you have to catalog its schema in the AWS Glue Data Catalog. You can do this using an AWS Lambda function invoked by an Amazon S3 trigger to start an AWS Glue crawler that catalogs the data. When the crawler is finished creating the table definition, you invoke a second Lambda function using an Amazon CloudWatch Events rule. This step starts an AWS Glue ETL job to process and output the data into another Amazon S3 bucket that we refer to as the processed zone.
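The crawler-starting Lambda function can be very small. Here is a minimal sketch of what such a function might look like; the environment variable name is an assumption, and the function deployed by the CloudFormation template may differ in its details.

import os
import boto3

glue = boto3.client('glue')

def lambda_handler(event, context):
    # Triggered by the S3 event on the raw bucket; start the crawler that catalogs the new data
    crawler_name = os.environ['CRAWLER_NAME']  # assumed to be set by the CloudFormation template
    glue.start_crawler(Name=crawler_name)
    return 'Started crawler %s' % crawler_name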

The AWS Glue ETL job converts the data to Apache Parquet format and stores it in the processed S3 bucket. You can modify the ETL job to achieve other objectives, like more granular partitioning, compression, or enriching of the data. Monitoring and notification is an integral part of the automation process. So as soon as the ETL job finishes, another CloudWatch rule sends you an email notification using an Amazon Simple Notification Service (Amazon SNS) topic. This notification indicates that your data was successfully processed.

In summary, this pipeline classifies and transforms your data, sending you an email notification upon completion.

Deploy the automated data pipeline using AWS CloudFormation

First, you use AWS CloudFormation templates to create all of the necessary resources. This removes opportunities for manual error, increases efficiency, and ensures consistent configurations over time.

Launch the AWS CloudFormation template with the following Launch stack button.

Be sure to choose the US East (N. Virginia) Region (us-east-1). Then enter the appropriate stack name, email address, and AWS Glue crawler name to create the Data Catalog. Add the AWS Glue database name to save the metadata tables. Acknowledge the IAM resource creation as shown in the following screenshot, and choose Create.

Note: It is important to enter your valid email address so that you get a notification when the ETL job is finished.

This AWS CloudFormation template creates the following resources in your AWS account:

  • Two Amazon S3 buckets to store both the raw data and processed Parquet data.
  • Two AWS Lambda functions: one to create the AWS Glue Data Catalog and another function to publish topics to Amazon SNS.
  • An Amazon Simple Queue Service (Amazon SQS) queue for maintaining the retry logic.
  • An Amazon SNS topic to inform you that your data has been successfully processed.
  • Two CloudWatch Events rules: one rule on the AWS Glue crawler and another on the AWS Glue ETL job.
  • AWS Identity and Access Management (IAM) roles for accessing AWS Glue, Amazon SNS, Amazon SQS, and Amazon S3.

When the AWS CloudFormation stack is ready, check your email and confirm the SNS subscription. Choose the Resources tab and find the details.

Follow these steps to verify your email subscription so that you receive an email alert as soon as your ETL job finishes.

  1. On the Amazon SNS console, in the navigation pane, choose Topics. An SNS topic named SNSProcessedEvent appears in the display.

  2. Choose the ARN. The topic details page appears, listing the email subscription as Pending confirmation. Be sure to confirm the subscription for your email address as provided in the Endpoint column.

If you don’t see an email address, or the link is showing as not valid in the email, choose the corresponding subscription endpoint. Then choose Request confirmation to confirm your subscription. Be sure to check your email junk folder for the request confirmation link.

Configure an Amazon S3 bucket event trigger

In this section, you configure a trigger on a raw S3 bucket. So when new data lands in the bucket, you trigger GlueTriggerLambda, which was created in the AWS CloudFormation deployment.

To configure notifications:

  1. Open the Amazon S3 console.
  2. Choose the source bucket. In this case, the bucket name contains raws3bucket, for example, <stackname>-raws3bucket-1k331rduk5aph.
  3. Go to the Properties tab, and under Advanced settings, choose Events.

  4. Choose Add notification and configure a notification with the following settings:
  • Name: Enter a name of your choice. In this example, it is crawlerlambdaTrigger.
  • Events: Select the All object create events check box to create the AWS Glue Data Catalog when you upload the file.
  • Send to: Choose Lambda function.
  • Lambda: Choose the Lambda function that was created in the deployment section. Your Lambda function should contain the string GlueTriggerLambda.

See the following screenshot for all the settings. When you’re finished, choose Save.

For more details on configuring events, see How Do I Enable and Configure Event Notifications for an S3 Bucket? in the Amazon S3 Console User Guide.

Download the dataset

For this post, you use a publicly available New York green taxi dataset in CSV format. You upload monthly data to your raw zone and perform automated data cataloging using an AWS Glue crawler. After cataloging, an automated AWS Glue ETL job triggers to transform the monthly green taxi data to Parquet format and store it in the processed zone.

You can download the raw dataset from the NYC Taxi & Limousine Commission trip record data site. Download the monthly green taxi dataset and upload only one month of data. For example, first upload only the green taxi January 2018 data to the raw S3 bucket.

Automate the Data Catalog with an AWS Glue crawler

One of the important aspects of a modern data lake is to catalog the available data so that it’s easily discoverable. To run ETL jobs or ad hoc queries against your data lake, you must first determine the schema of the data along with other metadata information like location, format, and size. An AWS Glue crawler makes this process easy.

After you upload the data into the raw zone, the Amazon S3 trigger that you created earlier in the post invokes the GlueTriggerLambda function. This function starts the AWS Glue crawler, which populates the AWS Glue Data Catalog with the metadata information inferred from the crawled data.

Open the AWS Glue console. You should see the database, table, and crawler that were created using the AWS CloudFormation template. Your AWS Glue crawler should appear as follows.

Browse to the table using the left navigation, and you will see the table in the database that you created earlier.

Choose the table name, and further explore the metadata discovered by the crawler, as shown following.

You can also view the columns, data types, and other details. In the following screenshot, the AWS Glue crawler has created a schema from the files available in Amazon S3 by determining the column names and their respective data types. You can use this schema to create an external table.

Author ETL jobs with AWS Glue

AWS Glue provides a managed Apache Spark environment to run your ETL jobs, with a pay-as-you-go model and no infrastructure to maintain.

Open the AWS Glue console and choose Jobs under the ETL section to start authoring an AWS Glue ETL job. Give the job a name of your choice, and note the name because you’ll need it later. Choose the already created IAM role with the name containing <stackname>-GlueLabRole, as shown following. Keep the other default options.

AWS Glue generates the required Python or Scala code, which you can customize as per your data transformation needs. In the Advanced properties section, choose Enable in the Job bookmark list to avoid reprocessing old data.

On the next page, choose your raw Amazon S3 bucket as the data source, and choose Next. On the Data target page, choose the processed Amazon S3 bucket as the data target path, and choose Parquet as the Format.

On the next page, you can make schema changes as required, such as changing column names, dropping ones that you’re less interested in, or even changing data types. AWS Glue generates the ETL code accordingly.

Lastly, review your job parameters, and choose Save Job and Edit Script, as shown following.

On the next page, you can modify the script further as per your data transformation requirements. For this post, you can leave the script as is. In the next section, you automate the execution of this ETL job.

Automate ETL job execution

As the frequency of data ingestion increases, you will want to automate the ETL job to transform the data. Automating this process helps reduce operational overhead and free your data engineering team to focus on more critical tasks.

AWS Glue is optimized for processing data in batches. You can configure it to process data in batches on a set time interval. How often you run a job is determined by how recent the end user expects the data to be and the cost of processing. For information about the different methods, see Triggering Jobs in AWS Glue in the AWS Glue Developer Guide.

First, you need to make one-time changes and configure your ETL job name in the Lambda function and the CloudWatch Events rule. On the console, open the ETLJobLambda Lambda function, which was created using the AWS CloudFormation stack.

Choose the Lambda function link that appears, and explore the code. Change the JobName value to the ETL job name that you created in the previous step, and then choose Save.

As shown in the following screenshot, you see an Amazon CloudWatch Events rule named CrawlerEventRule that is associated with an AWS Lambda function. When the CloudWatch Events rule receives a success status, it triggers the ETLJobLambda Lambda function.
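Conceptually, ETLJobLambda boils down to a call like the one in this sketch; the job name shown is a placeholder for the name you configured above, and the deployed function may differ in its details.

import boto3

glue = boto3.client('glue')

def lambda_handler(event, context):
    # Invoked by CrawlerEventRule when the crawler succeeds; start the ETL job
    response = glue.start_job_run(JobName='Your-ETL-jobName')  # replace with the job name you created
    return response['JobRunId']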

Now you are all set to trigger your AWS Glue ETL job as soon as you upload a file in the raw S3 bucket. Before testing your data pipeline, set up the monitoring and alerts.

Monitoring and notification with Amazon CloudWatch Events

Suppose that you want to receive a notification over email when your AWS Glue ETL job is completed. To achieve that, the CloudWatch Events rule OpsEventRule was deployed from the AWS CloudFormation template in the data pipeline deployment section. This CloudWatch Events rule monitors the status of the AWS Glue ETL job and sends an email notification using an SNS topic upon successful completion of the job.

As the following image shows, you configure your AWS Glue job name in the Event pattern section in CloudWatch. The event triggers an SNS topic configured as a target when the AWS Glue job state changes to SUCCEEDED. This SNS topic then sends an email notification to the email address that you provided in the deployment section.

Let’s make one-time configuration changes in the CloudWatch Events rule OpsEventRule to capture the status of the AWS Glue ETL job.

  1. Open the CloudWatch console.
  2. In the navigation pane, under Events, choose Rules. Choose the rule name that contains OpsEventRule, as shown following.

  3. In the upper-right corner, choose Actions, Edit.

  4. Replace Your-ETL-jobName with the ETL job name that you created in the previous step.

  5. Scroll down and choose Configure details. Then choose Update rule.

Now that you have set up an entire data pipeline in an automated way with the appropriate notifications and alerts, it’s time to test your pipeline. If you upload new monthly data to the raw Amazon S3 bucket (for example, upload the NY green taxi February 2018 CSV), it triggers the GlueTriggerLambda AWS Lambda function. You can navigate to the AWS Glue console, where you can see that the AWS Glue crawler is running.

Upon completion of the crawler, the CloudWatch Events rule CrawlerEventRule triggers your ETLJobLambda Lambda function. You can notice now that the AWS Glue ETL job is running.

When the ETL job is successful, the CloudWatch Events rule OpsEventRule sends you an email notification using an Amazon SNS topic, as shown following, completing the automation cycle.

Be sure to check your processed Amazon S3 bucket, where you will find transformed data processed by your automated ETL pipeline. Now that the processed data is ready in Amazon S3, you need to run the AWS Glue crawler on this Amazon S3 location. The crawler creates a metadata table with the relevant schema in the AWS Glue Data Catalog.

After the Data Catalog table is created, you can execute standard SQL queries using Amazon Athena and visualize the data using Amazon QuickSight. To learn more, see the blog post Harmonize, Query, and Visualize Data from Various Providers using AWS Glue, Amazon Athena, and Amazon QuickSight.

Conclusion

Having an automated serverless data lake architecture lessens the burden of managing data from its source to destination—including discovery, audit, monitoring, and data quality. With an automated data pipeline across organizations, you can identify relevant datasets and extract value much faster than before. The advantage of reducing the time to analysis is that businesses can analyze the data as it becomes available in real time. From the BI tools, queries return results much faster for a single dataset than for multiple databases.

Business analysts can now get their job done faster, and data engineering teams can free themselves from repetitive tasks. You can extend it further by loading your data into a data warehouse like Amazon Redshift or making it available for machine learning via Amazon SageMaker.

Additional resources

See the following resources for more information:

 


About the Author

Saurabh Shrivastava is a partner solutions architect and big data specialist working with global systems integrators. He works with AWS partners and customers to provide them with architectural guidance for building scalable architecture in hybrid and AWS environments. He enjoys spending time with his family outdoors and traveling to new destinations to discover new cultures.

 

 

 

Luis Lopez Soria is a partner solutions architect and serverless specialist working with global systems integrators. He works with AWS partners and customers to help them with adoption of the cloud operating model at a large scale. He enjoys doing sports in addition to traveling around the world exploring new foods and cultures.

 

 

 

Chirag Oswal is a partner solutions architect and AR/VR specialist working with global systems integrators. He works with AWS partners and customers to help them with adoption of the cloud operating model at a large scale. He enjoys video games and travel.

Using AWS Lambda and Amazon SNS to Get File Change Notifications from AWS CodeCommit

Post Syndicated from Eason Cao original https://aws.amazon.com/blogs/devops/using-aws-lambda-and-amazon-sns-to-get-file-change-notifications-from-aws-codecommit/

Notifications are an important part of DevOps workflows. Although you can set them up from any stage in the CI or CD pipelines, in this blog post, I will show you how to integrate AWS Lambda and Amazon SNS to extend AWS CodeCommit. Specifically, the solution described in this post makes it possible for you to receive detailed notifications from Amazon SNS about file changes and commit messages when an update is pushed to AWS CodeCommit.

Amazon SNS is a flexible, fully managed notifications service. It coordinates the delivery of messages to receivers. With Amazon SNS, you can fan out messages to a large number of subscribers, including distributed systems and services, and mobile devices. It is easy to set up, operate, and reliably send notifications to all your endpoints – at any scale.

AWS Lambda is our popular serverless service that lets you run code without provisioning or managing servers. In the example used in this post, I use a Lambda function to publish a topic through Amazon SNS to get an update notification.

Amazon CloudWatch is a monitoring and management service. It can collect operational data of AWS resources in the form of events. You can set up simple rules in Amazon CloudWatch to detect changes to your AWS resources. After CloudWatch captures the update event from your AWS resources, it can trigger specific targets to perform other actions (for example, to invoke a Lambda function).

To help you quickly deploy the solution, I have created an AWS CloudFormation template. AWS CloudFormation is a management tool that provides a common language to describe and provision all of the infrastructure resources in AWS.

 

Overview

The following diagram shows how to use AWS services to receive the CodeCommit file change event and details.

AWS CodeCommit supports several useful CloudWatch events, which can notify you of changes to AWS resources. By setting up simple rules, you can detect branch or repository changes. In this example, I create a CloudWatch event rule for an AWS CodeCommit repository so that any designated event invokes a Lambda function. When a change is made to the CodeCommit repository, CloudWatch detects the event and invokes the customized Lambda function.

When this Lambda function is triggered, the following steps are executed:

  1. Use the GetCommit operation in the CodeCommit API to get the latest commit. I want to compare the parent commit IDs with the last commit.
  2. For each commit, use the GetDifferences operation to get a list of each file that was added, modified, or deleted.
  3. Group the modification information from the comparison result and publish the message template to an SNS topic defined in the Lambda environment variable.
  4. Allow reviewers to subscribe to the SNS topic. Any update message from CodeCommit is published to subscribers.

I’ve used Python and Boto 3 to implement this function. The full source code has been published on GitHub. You can find the example in aws-codecommit-file-change-publisher repository.
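The full function is in that repository; the condensed sketch below shows the shape of those calls. The event fields follow the CodeCommit repository state change event, and the SNS topic ARN environment variable is an assumption for illustration.

import os
import boto3

codecommit = boto3.client('codecommit')
sns = boto3.client('sns')

def lambda_handler(event, context):
    # CloudWatch Events delivers the repository name and commit ID of the push
    repo = event['detail']['repositoryName']
    commit_id = event['detail']['commitId']

    commit = codecommit.get_commit(repositoryName=repo, commitId=commit_id)['commit']
    lines = ['Commit ID: %s' % commit_id, 'message: %s' % commit['message']]

    # Compare against each parent commit to list added, modified, and deleted files
    for parent in commit['parents']:
        diff = codecommit.get_differences(
            repositoryName=repo,
            beforeCommitSpecifier=parent,
            afterCommitSpecifier=commit_id)
        for d in diff['differences']:
            blob = d.get('afterBlob') or d.get('beforeBlob')
            lines.append('File: %s %s - Blob ID: %s' % (blob['path'], d['changeType'], blob['blobId']))

    sns.publish(TopicArn=os.environ['SNS_TOPIC_ARN'],  # assumed environment variable
                Subject='CodeCommit update in %s' % repo,
                Message='\n'.join(lines))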

 

Getting started

There is an AWS CloudFormation template, codecommit-sns-publisher.yml, in the source code. This template uses the AWS Serverless Application Model to define required components of the CodeCommit notification serverless application in simple and clean syntax.

The template is translated to an AWS CloudFormation stack and deploys an SNS topic, CloudWatch event rule, and Lambda function. The Lambda function code already demonstrates a simple notification use case. You can use the sample code to define your own logic and extend the function by using other APIs provided in the AWS SDK for Python (Boto3).

Prerequisites

Before you deploy this example, you need an AWS CodeCommit repository. In this example, I have created an empty repository, sample-repo, in the Ohio (us-east-2) Region to demonstrate a scenario in which your repository has a file change or other update on a CodeCommit branch. If you already have a CodeCommit repository, follow these steps to deploy the template and Lambda function.

To deploy the AWS CloudFormation template and Lambda function

1. Download the source code from the aws-codecommit-file-change-publisher repository.

2. Sign in to the AWS Management Console and choose the AWS Region where your CodeCommit repository is located. Create an S3 bucket. For information, see How Do I Create an S3 Bucket? in the Amazon S3 Console User Guide.

3. Upload the AWS Lambda deployment package, codecommit-sns-publisher.zip, to the S3 bucket.

In this example, I created an S3 bucket named codecommit-sns-publisher in the Ohio (us-east-2) Region and uploaded the deployment package from the Amazon S3 console.

4. In the AWS Management Console, choose CloudFormation. You can also open the AWS CloudFormation console directly at https://console.aws.amazon.com/cloudformation.

5. Choose Create Stack.

6. On the Select Template page, choose Upload a template to Amazon S3, and then choose the codecommit-sns-publisher.yml template.

7. Specify the following parameters:

  • Stack Name: codecommit-sns-publisher (You can use your own stack name, if you prefer.)
  • CodeS3BucketLocation: codecommit-sns-publisher (This is the S3 bucket name where you put the sample code.)
  • CodeS3KeyLocation: codecommit-sns-publisher.zip (This is the key name of the sample code S3 object. The object should be a zip file.)
  • CodeCommitRepo: sample-repo (The name of your CodeCommit repository.)
  • MainBranchName: master (Specify the branch name you would like to use as a trigger for publishing an SNS topic.)
  • NotificationEmailAddress: [email protected] (This is the email address you would like to use to subscribe to the SNS topic. The CloudFormation template creates an SNS topic to publish notifications to subscribers.)

8. Choose Next.

9. On the Review page, under Capabilities, choose the following options:

  • I acknowledge that AWS CloudFormation might create IAM resources.
  • I acknowledge that AWS CloudFormation might create IAM resources with custom names.

10. Under Transforms, choose Create Change Set. AWS CloudFormation starts to perform the template transformation and then creates a change set.

11. After the transformation, choose Execute to create the AWS CloudFormation stack.

After the stack has been created, you should receive an SNS subscription confirmation in your email account:

After you subscribe to the SNS topic, you can go to the AWS CloudFormation console and check the created AWS resources. If you would like to monitor the Lambda function, choose Resource to open the SNSPublisherFunction Lambda function.

Now, you can try to push a commit to the remote AWS CodeCommit repository.

1. Clone the CodeCommit repository to your local computer. For information, see Connect to an AWS CodeCommit Repository in the AWS CodeCommit User Guide. The following example shows how to clone a repository named sample-repo in the US East (Ohio) Region:

git clone ssh://git-codecommit.us-east-2.amazonaws.com/v1/repos/sample-repo

2. Enter the folder and create a plain text file:

cd sample-repo/
echo 'This is a sample file' > newfile

3. Add and commit this file change:

git add newfile
git commit -m 'Create initial file'

Look for this output:

[master (root-commit) 810d192] Create initial file
1 file changed, 1 insertion(+)
create mode 100644 newfile

4. Push the commit to the remote CodeCommit repository:

git push -u origin master:master

Look for this output:

Counting objects: 100% (3/3), done.
Writing objects: 100% (3/3), 235 bytes | 235.00 KiB/s, done.
…
* [new branch]      master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.

After the local commit has been pushed to the remote CodeCommit repository, the CloudWatch event detects this update. You should see the following notification message in your email account:

Commit ID: <Commit ID>
author: [YourName] ([email protected]) - <Timestamp> +0000
message: Create initial file

File: newfile Addition - Blob ID: <Blob ID>

Summary

In this blog post, I showed you how to use an AWS CloudFormation template to quickly build a sample solution that can help your operations team or development team track updates to a CodeCommit repository.

The example CloudFormation template and Lambda function can be found in the aws-codecommit-file-change-publisher GitHub repository. Using the sample code, you can customize the email content with HTML or add other information to your email message.

If you have questions or other feedback about this example, please open an issue or submit a pull request.

Stream Amazon CloudWatch Logs to a Centralized Account for Audit and Analysis

Post Syndicated from David Bailey original https://aws.amazon.com/blogs/architecture/stream-amazon-cloudwatch-logs-to-a-centralized-account-for-audit-and-analysis/

A key component of enterprise multi-account environments is logging. Centralized logging provides a single point of access to all salient logs generated across accounts and regions, and is critical for auditing, security and compliance. While some customers use the built-in ability to push Amazon CloudWatch Logs directly into Amazon Elasticsearch Service for analysis, others would prefer to move all logs into a centralized Amazon Simple Storage Service (Amazon S3) bucket location for access by several custom and third-party tools. In this blog post, I will show you how to forward existing and any new CloudWatch Logs log groups created in the future to a cross-account centralized logging Amazon S3 bucket.

The streaming architecture I use in the destination logging account is a streamlined version of the architecture and AWS CloudFormation templates from the Central logging in Multi-Account Environments blog post by Mahmoud Matouk. This blog post assumes some knowledge of CloudFormation, Python3 and the boto3 AWS SDK. You will need to have or configure an AWS working account and logging account, an IAM access and secret key for those accounts, and a working environment containing Python and the boto3 SDK. (For assistance, see the Getting Started Resource Center and Start Building with SDKs and Tools.) All CloudFormation templates and Python code used in this article can be found in this GitHub Repository.

Setting Up the Solution

You need to create or use an existing S3 bucket for storing CloudFormation templates and Python code for an AWS Lambda function. This S3 bucket is referred to throughout the blog post as the <S3 infrastructure-bucket>. Ensure that the bucket does not block new bucket policies or cross-account access by checking the bucket’s Permissions tab and the Public access settings button.

You also need a bucket policy that allows each account that needs to stream logs to access the bucket when we create the AWS Lambda function below. To do so, modify the following template by adding each new account you create and the <S3 infrastructure-bucket> ARN from the top of the Bucket policy editor page:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                  "03XXXXXXXX85",
                  "29XXXXXXXX02",
                  "13XXXXXXXX96",
                  "37XXXXXXXX30",
                  "86XXXXXXXX95"
                ]
            },
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
                "arn:aws:s3:::<S3 infrastructure-bucket>",
                "arn:aws:s3:::<S3 infrastructure-bucket>/*"
            ]
        }
    ]
}

Clone a local copy of the CloudFormation templates and Python code from the GitHub repository. Compress the CentralLogging.py and lambda.py into a .zip file for the lambda function we create below and name it AddSubscriptionFilter.zip. Load these local files into the <S3 infrastructure-bucket>. I recommend using folders called /python for the .py files, /lambdas for the AddSubscriptionFilter.zip file and /cfn for the CloudFormation templates.

Multi-Account Configuration and the Central Logging Account

One form of multi-account configuration is the Landing Zone offering, which provides a core logging account for storing all logs for auditing. I use this account configuration as an example in this blog post. Initially, the Landing Zone setup creates several stack sets and resources, including roles, security groups, alarms, lambda functions, a cloud trail stream and an S3 bucket.

If you are not using a Landing Zone, create an appropriately named S3 bucket in the account you have chosen as a logging account. This S3 bucket will be referred to later as the <LoggingS3Bucket>. To mimic what the Landing Zone calls its logging bucket, you can use the format aws-landing-zone-logs-<Account Number><Region>, or simply pick an appropriate name for the centralized logging location. In a production environment, remember that it is critical to lock down the access to logging resources and the permissions allowed within the account to prevent deletion or tampering with the logs.

Figure 1 – Initial Landing Zone logging account resources

The S3 bucket – aws-landing-zone-logs-<Account Number><Region> is the most important resource created by the stack-sets for logging purposes. It contains all of the logs streamed to it from all of the accounts. Initially, the Landing Zone only sends the AWS CloudTrail and AWS Config logs to this S3 bucket.

In order to send all of the other CloudWatch Logs that are necessary for auditing, we need to add a destination and streaming mechanism to the logging account.

Logging Account Infrastructure

The additional infrastructure required in the central logging account provides a destination for the log group subscription filters and a stream for log events that are sent from all accounts and appropriate regions to load them into the <LoggingS3Bucket> repository. The selection of these particular AWS resources is important, because Kinesis Data Streams is the only resource currently supported as a destination for cross-account CloudWatch Logs subscription filters.

The centralLogging.yml CloudFormation template automates the creation of the entire required infrastructure in the core logging account. Make sure to run it in each of the regions in which you need to centralize logs. The log group subscription filter and destination regions must match in order to successfully stream the logs.

Installation Instructions:

  1. Modify the centralLogging.yml template to add your account numbers for all of the accounts you want to stream logs from into the DestinationPolicy where you see the <AccountNumberHere> placeholders. Remove any unused placeholders.
  2. In the same DestinationPolicy, modify the final arn statement, replacing <region> with the region it will be run in (e.g., us-east-1), and the <logging account number> with the account number of the logging account where this template is to be run.
  3. Log in to the core logging account and access the AWS management console using administrator credentials.
  4. Navigate to CloudFormation and click the Create Stack button.
  5. Select Specify an Amazon S3 template URL and enter the Link for the centralLogging.yml template found in the <S3 infrastructure-bucket>.
  6. Enter a stack name, such as CentralizedLogging, and the one parameter called LoggingS3Bucket. Enter the ARN of the logging bucket: arn:aws:s3:::<LoggingS3Bucket>. You can obtain the ARN by opening the S3 console, choosing the icon next to this bucket, and then choosing the Copy Bucket ARN button.
  7. Skip the next page, acknowledge the creation of IAM resources, and Create the stack.
  8. When the stack completes, select the stack name to go to stack details and open the Outputs. Copy the value of the DestinationArnExport, which will be needed as a parameter for the script in the next section.

Upon successful creation of this CloudFormation stack, the following new resources will be created:

  • Amazon CloudWatch Logs Destination
  • Amazon Kinesis Stream
  • Amazon Kinesis Firehose Stream
  • Two AWS Identity and Access Management (IAM) Roles
Figure 2 – New infrastructure required in the centralized logging account

Because the Landing Zone is a multi-account offering, the Log Destination is required to be the destination for all subscription filters. The key feature of the destination is its DestinationPolicy. Whenever a new account is added to the environment, its account number needs to be added to this DestinationPolicy in order for logs to be sent to it from the new account. Add the new account number in the centralLogging.yml CloudFormation template, and run an update in CloudFormation to complete the addition. A sample Destination Policy looks like this:

{
  "Version" : "2012-10-17",
  "Statement" : [
    {
      "Effect" : "Allow",
      "Principal" : {
        "AWS" : [
          "03XXXXXXXX85",
          "29XXXXXXXX02",
          "13XXXXXXXX96",
          "37XXXXXXXX30",
          "86XXXXXXXX95"
        ]
      },
      "Action" : "logs:PutSubscriptionFilter",
      "Resource" : "arn:aws:logs:<Region>:<LoggingAccountNumber>:destination:CentralLogDestination"
    }
  ]
}

The Kinesis Stream gets records from the Logs Destination and holds them for 48 hours. Kinesis Streams scale by adding shards. The CloudFormation template starts the stream with two shards. Because all CloudWatch log objects flow through this stream, you need to monitor it as instances and applications are deployed into the accounts, and scale it up when needed. To scale, change the number of shards (ShardCount) in the Kinesis Stream resource (KinesisLoggingStream) to the required number. See the Amazon Kinesis Data Streams FAQ documentation to confirm the capacity and throughput of each shard.

Kinesis Firehose provides a simple and efficient mechanism to retrieve the records from the Kinesis Stream and load them into the <LoggingS3Bucket> repository. It uses the CloudFormation template parameter to know where to load the logs. All of the CloudWatch logs loaded by Firehose will be under the prefix /CentralizedAccountsLog. The buffering hints for Firehose suggest that the logs be loaded every 5 minutes or 50 MB. Leave the CompressionFormat UNCOMPRESSED, since the logs are already compressed.

There are two AWS Identity and Access Management (IAM) roles created for this infrastructure. The first, CWLtoKinesisRole, is used by the destination to allow CloudWatch Logs from all regions to put the log object records into the Kinesis Stream, as well as to pass the role. The second, FirehoseDeliveryRole, allows Firehose to get the log object records from the Kinesis Stream and then load them into the S3 logging bucket.

Once you have successfully created this infrastructure, the next step is to add the subscription filters to existing log groups.

Adding Subscription Filters to Existing Log Groups

The next step in the process is to add subscription filters for the Log Destination in the core logging account to all existing log groups. Several log groups are created by the Landing Zone, or you may have created them by using various AWS services or by logging application events. For every new AWS account, you will need to run the init_account_central_logging.py Python script to add the subscription filters to all the existing log groups.

The init_account_central_logging.py script takes one parameter, which is the Log Destination ARN. Use the Destination ARN you copied from the stack details output in the previous section as the parameter to the script.

The init_account_central_logging.py script first adds this Destination ARN to the AWS Systems Manager Parameter Store so that the core logic that creates the subscription filter can use it. The script then gets a list of all existing log groups, iterates over them, deletes any existing subscription filters (because there can only be one subscription filter per log group and attempting to create another would cause an error), and then adds the new subscription filter to the centralized logging account to the Log Destination.

Figure 3 – Run script to add subscription filters to existing log groups
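The full init_account_central_logging.py script is in the GitHub repository; its core loop reduces to roughly the following sketch. The filter name matches the one mentioned in the validation step below, and error handling and pagination details are simplified.

import boto3

logs = boto3.client('logs')

def add_filters_to_existing_log_groups(destination_arn):
    paginator = logs.get_paginator('describe_log_groups')
    for page in paginator.paginate():
        for group in page['logGroups']:
            name = group['logGroupName']
            # Only one subscription filter is allowed per log group, so remove any existing filter first
            existing = logs.describe_subscription_filters(logGroupName=name)['subscriptionFilters']
            for f in existing:
                logs.delete_subscription_filter(logGroupName=name, filterName=f['filterName'])
            # An empty filter pattern forwards every log event to the destination
            logs.put_subscription_filter(
                logGroupName=name,
                filterName='Logs (CentralLogDestination)',
                filterPattern='',
                destinationArn=destination_arn)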

Installation Instructions:

  1. Make sure that Python and boto3 are installed and accessible in the client computer – consider loading into a virtual environment to keep dependencies separate.
  2. Set the AWS_PROFILE environment variable to the appropriate AWS account profile.
  3. Log in to the proper account, and obtain administrator or other credentials with appropriate permissions, and add the account access key and secret key to the AWS credentials file.
  4. Set the region and output in the AWS config file.
  5. Download and place two python files into a working directory: init_account_central_logging.py and CentralLogging.py.
  6. Run the script using the command python3 ./init_account_central_logging.py -d <LogDestinationArn>.

Use the AWS Management Console to validate the results. Navigate to CloudWatch Logs and view all of the log groups. Each one should now have a subscription filter named “Logs (CentralLogDestination).”

Automatically Adding Subscription Filters to New Log Groups

The final step to set up the centralized log streaming capability is to run a CloudFormation script to create resources that automatically add subscription filters to new log groups. New log groups are created in accounts by resources (e.g., Lambda functions) and by applications. A subscription filter must be added to every new log group in order to deliver its log events to the logging account.

The AddSubscriptionFilter.yml CloudFormation template contains resources to automatically add subscription filters.

First, it creates a role that allows it to access the lambda code that is stored in a centralized location – the <S3 infrastructure-bucket>. (Remember that its S3 bucket policy must contain this account number in order to access the lambda code.)

Second, the template creates the AddSubscriptionLambda, which reuses the core logic shared by the script in the last section. It retrieves the proper destination from the Parameter Store, deletes any existing subscription filter from the log group, and adds the new subscription filter to the newly created log group. This lambda function is triggered by a CloudWatch event rule.

Third, the CloudFormation creates a Lambda Permission, which allows the event trigger to invoke this particular lambda.

Finally, the CloudFormation template creates an Amazon CloudWatch Events Rule that acts as a trigger for the lambda. This rule looks for an event coming from CloudTrail that signals the creation of a new log group. For each create log group event found, it invokes the AddSubscriptionLambda.
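As a rough sketch, the handler invoked by that rule could look like the following; the actual lambda.py in the repository reuses the shared CentralLogging logic, and only the LogDestination parameter name is taken from the installation note below.

import boto3

logs = boto3.client('logs')
ssm = boto3.client('ssm')

def lambda_handler(event, context):
    # The CloudWatch Events rule matches the CloudTrail CreateLogGroup call,
    # so the new log group name is in the request parameters
    log_group = event['detail']['requestParameters']['logGroupName']
    destination_arn = ssm.get_parameter(Name='LogDestination')['Parameter']['Value']
    logs.put_subscription_filter(
        logGroupName=log_group,
        filterName='Logs (CentralLogDestination)',
        filterPattern='',
        destinationArn=destination_arn)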

Figure 4 – Infrastructure to automatically add a subscription filter to a new log group and the log flow to the centralized account

Installation Instructions:

(Important note: This functionality requires that the LogDestination parameter be properly set to the LogDestinationArn in the Parameter Store before the Lambda will run successfully. The script in the previous step sets this parameter, or it can be done manually. Make certain that the destination specified is in this same region.)

  1. Ensure that the <S3 infrastructure-bucket> has the AddSubscriptionFilter.zip file containing the Python code files lambda.py and CentralLogging.py.
  2. Log in to the appropriate account, and access using administrator credentials. Make sure that the region is set properly.
  3. Navigate to Cloudformation and click the Create Stack button.
  4. Select Specify an Amazon S3 template URL and enter the link for the AddSubscriptionFilter.yml template found in the <S3 infrastructure-bucket>.
  5. Enter a stack name, such as AddSubscription.
  6. Enter the two parameters: the <S3 infrastructure-bucket> name (not the ARN) and the folder and file name (e.g., lambdas/AddSubscriptionFilter.zip).
  7. Skip the next page, acknowledge the creation of IAM resources, and Create the stack.

In order to test that the automated addition of subscription filters is working properly, use the AWS Management Console to navigate to CloudWatch Logs and click the Actions button. Select Create New Log Group and enter a random log group name, such as “testLogGroup.” When first created, the log group will not have a subscription filter. After a few minutes, refresh the display and you should see the new subscription filter on the log group. At this point, you can delete the test log group.

New Account Setup

As a reminder, when you add new accounts that you want to have stream log events to the central logging account, you will need to configure the new accounts in two places in order for this functionality to work properly.

First, add the account number to the LoggingDestination property DestinationPolicy in the centralLogging.yml template. Then, update the CloudFormation stack.

Second, modify the bucket policy for the <S3 infrastructure-bucket>. Select the Permissions tab, then the Bucket Policy button. Add the new account to allow cross-account access to the lambda code by adding the line “arn:aws:iam::<new account number>:root” to the Principal.AWS list.

Conclusion

Centralized logging is a key component in enterprise multi-account architectures. In this blog post, I have built on the central logging in multi-account environments streaming architecture to automatically subscribe all CloudWatch Logs log groups to send all log events to an S3 bucket in a designated logging account. The solution uses a script to add subscription filters to existing log groups, and a lambda function to automatically place a subscription filter on all new log groups created within the account. This can be used to forward application logs, security logs, VPC flow logs, or any other important logs that are required for audit, security, or compliance purposes.

About the author

David Bailey is a Cloud Infrastructure Architect with AWS Professional Services specializing in serverless application architecture, IoT, and artificial intelligence. He has spent decades architecting and developing complex custom software applications, as well as teaching internationally on object-oriented design, expert systems, and neural networks.

 

 

New – How to better monitor your custom application metrics using Amazon CloudWatch Agent

Post Syndicated from Helen Lin original https://aws.amazon.com/blogs/devops/new-how-to-better-monitor-your-custom-application-metrics-using-amazon-cloudwatch-agent/

This blog was contributed by Zhou Fang, Sr. Software Development Engineer for Amazon CloudWatch and Helen Lin, Sr. Product Manager for Amazon CloudWatch

Amazon CloudWatch collects monitoring and operational data from both your AWS resources and on-premises servers, providing you with a unified view of your infrastructure and application health. By default, CloudWatch automatically collects and stores many of your AWS services’ metrics and enables you to monitor and alert on metrics such as high CPU utilization of your Amazon EC2 instances. With the CloudWatch Agent that launched last year, you can also deploy the agent to collect system metrics and application logs from both your Windows and Linux environments. Using this data collected by CloudWatch, you can build operational dashboards to monitor your service and application health, set high-resolution alarms to alert and take automated actions, and troubleshoot issues using Amazon CloudWatch Logs.

We recently introduced CloudWatch Agent support for collecting custom metrics using StatsD and collectd. It’s important to collect system metrics like available memory, and you might also want to monitor custom application metrics. You can use these custom application metrics, such as request count to understand the traffic going through your application or understand latency so you can be alerted when requests take too long to process. StatsD and collectd are popular, open-source solutions that gather system statistics for a wide variety of applications. By combining the system metrics the agent already collects, with the StatsD protocol for instrumenting your own metrics and collectd’s numerous plugins, you can better monitor, analyze, alert, and troubleshoot the performance of your systems and applications.

Let’s dive into an example that demonstrates how to monitor your applications using the CloudWatch Agent.  I am operating a RESTful service that performs simple text encoding. I want to use CloudWatch to help monitor a few key metrics:

  • How many requests are coming into my service?
  • How many of these requests are unique?
  • What is the typical size of a request?
  • How long does it take to process a job?

These metrics help me understand my application performance and throughput, in addition to setting alarms on critical metrics that could indicate service degradation, such as request latency.

Step 1. Collecting StatsD metrics

My service is running on an EC2 instance, using Amazon Linux AMI 2018.03.0. Make sure to attach the CloudWatchAgentServerPolicy AWS managed policy so that the CloudWatch agent can collect and publish metrics from this instance:
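
If you’d rather attach the policy from a script than from the console, a rough boto3 equivalent looks like the following (the role name is a placeholder for whatever role is in your instance profile):

import boto3

iam = boto3.client("iam")

# Attach the AWS managed policy that lets the CloudWatch agent publish metrics and logs.
iam.attach_role_policy(
    RoleName="my-ec2-instance-role",  # placeholder: the role attached to the EC2 instance
    PolicyArn="arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
)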

Here is the service structure:

 

The “/encode” handler simply returns the base64 encoded string of an input text.  To monitor key metrics, such as total and unique request count as well as request size and method response time, I used StatsD to define these custom metrics.

@RestController

public class EncodeController {

    @RequestMapping("/encode")
    public String encode(@RequestParam(value = "text") String text) {
        long startTime = System.currentTimeMillis();
        statsd.incrementCounter("totalRequest.count", new String[]{"path:/encode"});
        statsd.recordSetValue("uniqueRequest.count", text, new String[]{"path:/encode"});
        statsd.recordHistogramValue("request.size", text.length(), new String[]{"path:/encode"});
        String encodedString = Base64.getEncoder().encodeToString(text.getBytes());
        statsd.recordExecutionTime("latency", System.currentTimeMillis() - startTime, new String[]{"path:/encode"});
        return encodedString;
    }
}

Note that I need to first choose a StatsD client from here.

The “/status” handler responds with a health check ping.  Here I am monitoring my available JVM memory:

@RestController
public class StatusController {

    @RequestMapping("/status")
    public int status() {
        statsd.recordGaugeValue("memory.free", Runtime.getRuntime().freeMemory(), new String[]{"path:/status"});
        return 0;
    }
}

 

Step 2. Emit custom metrics using collectd (optional)

collectd is another popular, open-source daemon for collecting application metrics. If I want to use the hundreds of available collectd plugins to gather application metrics, I can also use the CloudWatch Agent to publish collectd metrics to CloudWatch, which retains them for up to 15 months. In practice, I might choose either StatsD or collectd to collect custom metrics, or use both; all of these use cases are supported by the CloudWatch agent.

Using the same demo RESTful service, I’ll show you how to monitor my service health using the collectd cURL plugin, which passes the collectd metrics to CloudWatch Agent via the network plugin.

For my RESTful service, the “/status” handler returns HTTP code 200 to signify that it’s up and running. This makes it possible to monitor the health of my service and to trigger an alert when the application does not respond with an HTTP 200 success code. Additionally, I want to monitor the elapsed time for each health check request.

To collect these metrics using collectd, I have a collectd daemon installed on the EC2 instance, running version 5.8.0. Here is my collectd config:

LoadPlugin logfile
LoadPlugin curl
LoadPlugin network

<Plugin logfile>
  LogLevel "debug"
  File "/var/log/collectd.log"
  Timestamp true
</Plugin>

<Plugin curl>
    <Page "status">
        URL "http://localhost:8080/status";
        MeasureResponseTime true
        MeasureResponseCode true
    </Page>
</Plugin>

<Plugin network>
    <Server "127.0.0.1" "25826">
        SecurityLevel Encrypt
        Username "user"
        Password "secret"
    </Server>
</Plugin>

 

For the cURL plugin, I configured it to measure response time (latency) and response code (HTTP status code) from the RESTful service.

Note that for the network plugin, I used Encrypt mode which requires an authentication file for the CloudWatch Agent to authenticate incoming collectd requests.  Click here for full details on the collectd installation script.
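
For reference, the collectd network plugin authentication file is just a list of "username: password" pairs. A minimal example that matches the config above might look like the following; the path shown is an assumption based on the CloudWatch agent’s default collectd_auth_file setting, and the credentials are placeholders:

# /etc/collectd/auth_file
user: secret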

 

Step 3. Configure the CloudWatch agent

So far, I have shown you how to:

A.  Use StatsD to emit custom metrics to monitor my service health
B.  Optionally use collectd to collect metrics using plugins

Next, I will install and configure the CloudWatch agent to accept metrics from both the StatsD client and collectd plugins.

I installed the CloudWatch Agent following the instructions in the user guide, but here are the detailed steps:

Install CloudWatch Agent:

wget https://s3.amazonaws.com/amazoncloudwatch-agent/linux/amd64/latest/AmazonCloudWatchAgent.zip -O AmazonCloudWatchAgent.zip && unzip -o AmazonCloudWatchAgent.zip && sudo ./install.sh

Configure CloudWatch Agent to receive metrics from StatsD and collectd:

{
  "metrics": {
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "collectd": {},
      "statsd": {}
    }
  }
}

Pass the above config (config.json) to the CloudWatch Agent:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:config.json -s

In case you want to skip these steps and just execute my sample agent install script, you can find it here.

 

Step 4. Generate and monitor application traffic in CloudWatch

Now that I have the CloudWatch agent installed and configured to receive StatsD and collectd metrics, I’m going to generate traffic through the service:

echo "send 100 requests"
for i in {1..100}
do
   curl "localhost:8080/encode?text=TextToEncode_${i}[email protected]#%"
   echo ""
   sleep 1
done
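
Before heading to the console, I can also do a quick check from a script that the agent published my custom metrics. Here is a hedged sketch that assumes the agent’s default CWAgent namespace (the namespace is configurable in the agent config):

import boto3

cloudwatch = boto3.client("cloudwatch")

# List the custom metrics the CloudWatch agent published from StatsD and collectd.
paginator = cloudwatch.get_paginator("list_metrics")
for page in paginator.paginate(Namespace="CWAgent"):
    for metric in page["Metrics"]:
        dims = {d["Name"]: d["Value"] for d in metric["Dimensions"]}
        print(metric["MetricName"], dims)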

 

Next, I log in to the CloudWatch console and check that the service is up and running. Here’s a graph of the StatsD metrics:

 

Here is a graph of the collectd metrics:

 

Conclusion

With StatsD and collectd support, you can now use the CloudWatch Agent to collect and monitor your custom applications in addition to the system metrics and application logs it already collects. Furthermore, you can create operational dashboards with these metrics, set alarms to take automated actions when free memory is low, and troubleshoot issues by diving into the application logs.  Note that StatsD supports both Windows and Linux operating systems while collectd is Linux only.  For Windows, you can also continue to use Windows Performance Counters to collect custom metrics instead.

The CloudWatch Agent with custom metrics support (version 1.203420.0 or later) is available in all public AWS Regions and AWS GovCloud (US), with AWS China (Beijing) and AWS China (Ningxia) coming soon.

The agent is free to use; you pay the usual CloudWatch prices for logs and custom metrics.

For more details, head over to the CloudWatch user guide for StatsD and collectd.

Building an Amazon CloudWatch Dashboard Outside of the AWS Management Console

Post Syndicated from Stephen McCurry original https://aws.amazon.com/blogs/devops/building-an-amazon-cloudwatch-dashboard-outside-of-the-aws-management-console/

Steve McCurry is a Senior Product Manager for CloudWatch

This is the second in a series of two blog posts that demonstrate how to use the new CloudWatch
snapshot graphs feature. You can find the first post here.

A key challenge for any DevOps team is to provide sufficient monitoring visibility on service
health. Although CloudWatch dashboards are a powerful tool for monitoring your systems and
applications, the dashboards are accessible only to users with permissions to the AWS
Management Console. You can now use a new CloudWatch feature, snapshot graphs, to create
dashboards that contain CloudWatch graphs and are available outside of the AWS Management
Console. You can display CloudWatch snapshot graphs on your internal wiki pages or TV-based
dashboards. You can integrate them with chat applications and ticketing and bug tracking tools.

This blog post shows you how to embed CloudWatch snapshot graphs into your websites using a
lightweight, embeddable widget written in JavaScript.

Snapshot graphs overview

CloudWatch snapshot graphs are images of CloudWatch charts that are useful for building
custom dashboards or integrating with tools outside of AWS. Although the images are static,
they can be refreshed frequently to create a live dashboard experience.

CloudWatch dashboards and charts provide flexible, interactive visualizations that can be used to
create unified operational views across your AWS resources and metrics. However, maybe you
want to display CloudWatch charts on a TV screen for team-level visibility, take snapshots of
charts for auditing in ticketing systems and bug tracking tools, or share snapshots in chat
applications to collaborate on an issue. For these use cases and more, snapshot graphs are an
ideal tool for integrating CloudWatch charts with your webpages and third-party applications.

Snapshot graphs are available through the CloudWatch API, which you can use through the
AWS SDKs or CLI. The charts you request through the API are represented as JSON. To copy
the JSON definition of the graph and use it in the API request, open the Amazon CloudWatch
console. You’ll find the JSON on the Source tab of the Metrics page, as shown here.

All of the features of the CloudWatch line and stacked graphs are available in snapshot graphs,
including vertical and horizontal annotations.

Embedding a snapshot graph in your webpage

In this demonstration, we will set up monitoring for an EC2 instance and embed a CloudWatch
snapshot graph for CPUUtilization in a website outside of the AWS Management Console. The
embeddable widget can be configured to support any CloudWatch line or stacked chart. This
demonstration involves these steps:

  1. Create an EC2 instance to monitor.
  2. Create the Lambda function that calls CloudWatch GetMetricWidgetImage.
  3. Create an API Gateway endpoint that proxies requests to the Lambda function.
  4. Embed the widget into a website and configure it for the API Gateway request.

The code for this solution is available from the SnapshotWidgetDemo GitHub repo.

The embeddable JavaScript widget will communicate with CloudWatch through a gateway in
Amazon API Gateway and an AWS Lambda backend. The advantage of using API Gateway is
the additional flexibility you have to secure the endpoint and create fine-grained access control.
For example, you can block access to the endpoint from outside of your corporate network.
Amazon Route 53 could be an alternative solution.

The end goal is to have a webpage running on your local machine that displays a CloudWatch
snapshot graph displaying live metric data from a sample EC2 instance. The sample code
includes a basic webpage containing the embed code.

The JavaScript widget requests a snapshot graph from an API Gateway endpoint. API Gateway
proxies the request to a Lambda function that calls the new CloudWatch API service,
GetMetricWidgetImage. The retrieved snapshot graph is returned in binary and displayed on the
website in an IMG HTML tag.
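
The Lambda function in the GitHub repo is written in Node.js; purely to illustrate the call it makes, here is a rough Python sketch of a handler with the same shape. The metric widget below is a placeholder, and returning the image through an API Gateway proxy integration assumes binary support is configured on the API:

import base64
import json
import boto3

cloudwatch = boto3.client("cloudwatch")

def handler(event, context):
    # A minimal widget definition; in the real solution the widget JSON comes from the client.
    widget = {
        "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0"]],  # placeholder
        "start": "-PT3H",
        "period": 300,
    }
    image = cloudwatch.get_metric_widget_image(MetricWidget=json.dumps(widget))

    # Return the PNG bytes base64-encoded so API Gateway can pass them back as binary.
    return {
        "statusCode": 200,
        "isBase64Encoded": True,
        "headers": {"Content-Type": "image/png"},
        "body": base64.b64encode(image["MetricWidgetImage"]).decode("utf-8"),
    }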

Here is what the end-to-end solution looks like:

Server setup

  1. Download the repository.
  2. Navigate to ./server and run npm install
  3. From the server folder, run zip -r snapshotwidgetdemo.zip ./*
  4. Upload snapshotwidgetdemo.zip to any S3 bucket.
  5. Upload ./server/apigateway-lambda.json to any S3 bucket.
  6. Navigate to the AWS CloudFormation console and choose Create Stack.
    • Point the new stack to the S3 location in step 5.
    • During setup, you will be asked for the Lambda S3 bucket name from step 4.

The AWS CloudFormation script will create all the required server-side components described in
the previous section.

Client setup

  1. Navigate to ./client and run npm install
  2. Edit ./demo/index.html to replace the following placeholders with your values.
    a. <YOUR_INSTANCE_ID> You can find the instance ID in the AWS
    CloudFormation stack output.
    b. <YOUR_API_GATEWAY_URL> You can find the full URL in the AWS
    CloudFormation stack output.
    c. <YOUR_API_KEY> The API gateway requires a key. The key reference (but not
    the key itself) appears in the AWS CloudFormation stack output. To retrieve the
    key value, go to the Keys tab of the Amazon API Gateway console.
  3. Build the component using WebPack: ./node_modules/.bin/webpack --config webpack.config.js
  4. Serve the demo webpage on localhost: ./node_modules/.bin/webpack-dev-server --open

The browser should open at index.html automatically. The page contains one embedded snapshot
graph with the CPU utilization of your EC2 instance.

Troubleshooting

If you don’t see anything on the webpage, use the browser console tools to check for console
error messages.

If you still can’t debug the problem, go to the Amazon API Gateway console. On the Logs tab,
make sure that Enable CloudWatch Logs is selected, as shown here:

To check the Lambda logs, in the CloudWatch console, choose the Logs tab, and then search for
the name of your Lambda function.

Summary

This blog post provided a solution for embedding CloudWatch snapshot graphs into webpages
and wikis outside of the AWS Management Console. To read the other blog post in this series
about CloudWatch snapshot graphs, see Reduce Time to Resolution with Amazon CloudWatch
Snapshot Graphs and Alerts.

For more information, see the snapshot graphs API documentation or visit our home page to
learn more about how Amazon CloudWatch achieves monitoring visibility for your cloud
resources and applications.

It would be great to hear your feedback.

Investigating spikes in AWS Lambda function concurrency

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/investigating-spikes-in-aws-lambda-function-concurrency/

This post is courtesy of Ian Carlson, Principal Solutions Architect – AWS

As mentioned in an earlier post, a key benefit of serverless applications is the ease with which they can scale to meet traffic demands or requests. AWS Lambda is at the core of this platform.

Although this flexibility is hugely beneficial for our customers, sometimes an errant bit of code or upstream scaling can lead to spikes in concurrency. Unwanted usage can increase costs, pressure downstream systems, and throttle other functions in the account. Administrators new to serverless technologies need to leverage different metrics to manage their environment. In this blog, I walk through a sample scenario where I’m getting API errors. I trace those errors up through Amazon CloudWatch logs and leverage CloudWatch Metrics to understand what is happening in my environment. Finally, I show you how to set up alerts to reduce throttling surprises.

In my environment, I have a few Lambda functions configured. The first function is from Chris Munns’ concurrency blog, called concurrencyblog. I set up that function to execute behind an API hosted on Amazon API Gateway. In the background, I’m simulating activity with another function. This exercise uses the services in the following image.

To start, I make an API Gateway call to invoke the concurrencyblog function.

curl -i -XGET https://XXXXXXXX.execute-api.us-east-2.amazonaws.com/prod/concurrencyblog

I get the following output.

HTTP/2 502
content-type: application/json
content-length: 36
date: Wed, 01 Aug 2018 14:46:03 GMT
x-amzn-requestid: 9d5eca92-9599-11e8-bb13-dddafe0dbaa3
x-amz-apigw-id: K8ocOG_iiYcFa_Q=
x-cache: Error from cloudfront
via: 1.1 cb9e55028a8e7365209ebc8f2737b69b.cloudfront.net (CloudFront)
x-amz-cf-id: fk-gvFwSan8hzBtrC1hC_V5idaSDAKL9EwKDq205iN2RgQjnmIURYg==
 
{"message": "Internal server error"}

Hmmm, a 502 error. That shouldn’t happen. I don’t know the cause, but I configured logging for my API, so I can search for the requestid in CloudWatch logs. I navigate to the logs, select Search Log Group, and enter the x-amzn-requestid, enclosed in double quotes.

My API is invoking a Lambda function, and it’s getting an error from Lambda called ConcurrentInvocationLimitExceeded. This means my Lambda function was throttled. If I navigate to the function in the Lambda console, I get a similar message at the top.

If I scroll down, I observe that I don’t have throttling configured, so this must be coming from a different function or functions.

Using CloudWatch forensics

Lambda functions report lots of metrics in CloudWatch to tell you how they’re doing. Three of the metrics that I investigate here are Invocations, Duration, and ConcurrentExecutions. Invocations is incremented any time a Lambda function executes and is recorded for all functions and by individual functions. Duration is recorded to tell you how long Lambda functions take to execute. ConcurrentExecutions reports how many Lambda functions are executing at the same time and is emitted for the entire account and for functions that have a concurrency reservation set. Lambda emits CloudWatch metrics whenever there is Lambda activity in the account.

Lambda reports concurrency metrics for my account under AWS/Lambda/ConcurrentExecutions. To begin, I navigate to the Metrics pane of the CloudWatch console and choose Lambda on the All metrics tab.

Next, I choose Across All Functions.

Then I choose ConcurrentExecutions.

I choose the Graphed metrics tab and change Statistic from Average to Maximum, which shows me the peak concurrent executions in my account. For Period, I recommend reviewing 1-minute data over the previous 2 weeks. After 2 weeks, the data is aggregated into 5-minute periods, which is a long time for Lambda!
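
The same data is available through the CloudWatch API; here is a rough boto3 equivalent of that console query (the 24-hour window is a placeholder you can widen to the full 2 weeks):

from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch")

# Peak account-level Lambda concurrency per minute over the last 24 hours.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Lambda",
    MetricName="ConcurrentExecutions",
    StartTime=datetime.utcnow() - timedelta(hours=24),
    EndTime=datetime.utcnow(),
    Period=60,
    Statistics=["Maximum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"])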

In my test account, I find a concurrency spike at 14:46 UTC with 1000 concurrent executions.

Next, I want to find the culprit for this spike. I go back to the All metrics tab, but this time I choose By Function Name and enter Invocations in the search field. Then I select all of the functions listed.

The following image shows that BadLambdaConcurrency is the culprit.

It seems odd that there are only 331 invocations during that sample in the graph, so let’s dig in. Using the same method as before, I add the Duration metric for BadLambdaConcurrency. On average, this function is taking 30 seconds to complete, as shown in the following image.

Because there were 669 invocations in the previous minute and the function takes, on average, 30 seconds to complete, the next minute’s 331 invocations drive the concurrency up to 1000. Lambda functions can execute very quickly, so exact precision can be challenging, even over a 1-minute time period. However, this gives you a reasonable indication of the troublesome function in the account.

Automating this process

Investigating via the Lambda and CloudWatch console works fine if you have a few functions, but when you have tens or hundreds it can be pretty time consuming. Fortunately CloudWatch metrics are also available via API. To speed up this process I’ve written a script in Python that will go back over the last 7 days of metrics, find the minute with the highest concurrency, and output the Invocations and Average Duration for all functions for six minutes prior to that spike. You can download the script here. To execute, make sure you have rights to CloudWatch metrics, or are running from an EC2 instance that has those rights. Then you can execute:

sudo yum install python3
pip install boto3 --user
curl https://raw.githubusercontent.com/aws-samples/aws-lambda-concurrency-hunt/master/lambda-con-hunt.py -o lambda-con-hunt.py
python3 lambda-con-hunt.py

or, to output it to a file:

python3 lambda-con-hunt.py > output.csv

You should get output similar to the following image.

You can import this data into a spreadsheet program and sort it, or you can confirm visually that BadLambdaConcurrency is driving the concurrency.

Getting to the root cause

Now I want to understand what is driving that spike in Invocations for BadLambdaConcurrency, so I go to the Lambda console. It shows that API Gateway is triggering this Lambda function.

I choose API Gateway and scroll down to discover which API is triggering. Choosing the name (ConcurrencyTest) takes me to that API.

It’s the same API that I set up for concurrencyblog, but a different method. Because I already set up logging for this API, I can search the log group to check for interesting behavior. Perusing the logs, I check the method request headers for any insights as to who is calling this API. In real life I wouldn’t leave an API open without authentication, so I’ll have to do some guessing.

(a915ba7f-9591-11e8-8f19-a7737a1fb2d7) Method request headers: {CloudFront-Viewer-Country=US, CloudFront-Forwarded-Proto=https, CloudFront-Is-Tablet-Viewer=false, CloudFront-Is-Mobile-Viewer=false, User-Agent=hey/0.0.1, X-Forwarded-Proto=https, CloudFront-Is-SmartTV-Viewer=false, Host=xxxxxxxxx.execute-api.us-east-2.amazonaws.com, Accept-Encoding=gzip, X-Forwarded-Port=443, X-Amzn-Trace-Id=Root=1-5b61ba53-1958c6e2022ef9df9aac7bdb, Via=1.1 a0286f15cb377e35ea96015406919392.cloudfront.net (CloudFront), X-Amz-Cf-Id=O0GQ_V_eWRe5KydZNc46-aPSz7dfI19bmyhWCsbTBMoety73q0AtZA==, X-Forwarded-For=f.f.f.f, a.a.a.a, CloudFront-Is-Desktop-Viewer=true, Content-Type=text/html}

The method request headers have a user agent called hey. Hey, that’s a load testing utility! I bet that someone is load-testing this API, but it shouldn’t be allowed to consume all of my resources.

Applying rate and concurrency limiting

To keep this from happening, I place a throttle on the API method. In the API Gateway console, in the APIs navigation pane, I choose Stages, choose prod, choose the Settings tab, and select the Enable throttling check box. Then I set a rate of 20 requests per second. It doesn’t sound like much, but with an average function duration of 30 seconds, 20 requests per second can drive up to 600 concurrent Lambda executions (20 requests/second × 30 seconds).

I can also set a concurrency reservation on the function itself, as Chris pointed out in his blog.

If this is a bad function running amok or an emergency, I can throttle it directly, sometimes referred to as flipping a kill switch. I can do that quickly by choosing Throttle on the Lambda console.

I recommend throttling to zero only in emergency situations.

Investigating the duration

The other and larger problem is this function is taking 30 seconds to execute. That is a long time for an API, and the API Gateway integration timeout is 29 seconds. I wonder what is making it time out, so I check the traces in AWS X-Ray.

It initializes quickly enough, and I don’t find any downstream processes called. This function is a simple one, and the code is available from the Lambda console window. There I find my timeout culprit, a 30-second sleep call.

Not sure how that got through testing!

Setting up ongoing monitoring and alerting

To ensure that I’m not surprised again, let’s create a CloudWatch alert. In the CloudWatch console’s navigation pane, I choose Alarms and then choose Create Alarm.

When prompted, I choose Lambda and the ConcurrentExecutions metric across all functions, as shown in the following image.

Under Alarm Threshold, I give the alarm a name and description and enter 800 as the threshold, as shown in the following image. I treat missing data as good because Lambda won’t publish a metric if there is no activity. I make sure that my period is 1 minute and use Maximum as the statistic. I want to be alerted only if this happens for any 2 minutes out of a 5-minute period. Finally, I can set up an Amazon SNS notification to alert me via email or text if this threshold is reached. This enables me to troubleshoot or request a limit increase for my account. Individual functions should be able to handle a throttling event through client-side retry and exponential backoff, but it’s still something that I want to know about.
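
If you prefer to create the same alarm from code, here is a hedged boto3 sketch that mirrors those console settings (the SNS topic ARN is a placeholder):

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="LambdaAccountConcurrencyHigh",
    AlarmDescription="Account-level Lambda concurrency is approaching the limit",
    Namespace="AWS/Lambda",
    MetricName="ConcurrentExecutions",
    Statistic="Maximum",
    Period=60,                        # 1-minute period
    EvaluationPeriods=5,              # look at a 5-minute window...
    DatapointsToAlarm=2,              # ...and alarm if any 2 minutes breach
    Threshold=800,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",  # no activity means no concurrency problem
    AlarmActions=["arn:aws:sns:us-east-2:123456789012:lambda-concurrency-alerts"],  # placeholder
)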

Conclusion

In this blog, I walked through a method to investigate concurrency issues with Lambda, remediate those issues, and set up alerting. Managing concurrency is going to be new for a lot of people. As you deploy more applications, it’s especially important to segment them, monitor them, and understand how they are reporting their health. I hope you enjoyed this blog and start monitoring your functions today!

Reduce Time to Resolution with Amazon CloudWatch Snapshot Graphs and Alerts

Post Syndicated from Stephen McCurry original https://aws.amazon.com/blogs/devops/reduce-time-to-resolution-with-amazon-cloudwatch-snapshot-graphs-and-alerts/

Steve McCurry is a Senior Product Manager for Amazon CloudWatch.

This is the first in a series of two blog posts that show how to use the new Amazon CloudWatch
snapshot graphs feature.

Although automated alerts are an important feature of any monitoring solution, including
Amazon CloudWatch, it can be challenging to identify the issues that matter in the noise of
monitoring alerts. Reducing the time to resolution depends on being able to make quick
decisions around the importance of alerts.

When you receive an alert, you ask, “Is this issue urgent?” Unfortunately, a page or email alert
that contains a text description about a symptom doesn’t provide enough context to answer the
question. You need to correlate the alert with metrics to understand what was happening around
the time of the alert. This digging for more information slows down the process of resolving the
issue.

This blog post shows you how to add richer context to an email alert by attaching a CloudWatch
snapshot graph. The snapshot graph shows the behavior of the underlying metric for the three
hours leading up to the alert.

Snapshot graphs overview

You can use snapshot graphs to integrate and display CloudWatch charts outside of the AWS
Management Console to improve monitoring visibility or reduce time to resolution. This feature
makes it possible for you to display CloudWatch charts on your webpage or integrate charts with
third-party tools, such as ticketing, chat applications, and bug tracking.

CloudWatch snapshot graphs are images of CloudWatch charts that are useful for building
custom dashboards. Although the images are static, they can be refreshed frequently to create a
live dashboard experience.

Snapshot graphs are available through the CloudWatch API, which you can use through the
AWS SDKs or AWS CLI. The charts you request through the API are represented as JSON. To
copy the JSON definition of the chart and use it in the API request, open the Amazon
CloudWatch console. You’ll find the JSON on the Source tab of the Metrics page, as shown
here.

All of the features of the CloudWatch line and stacked charts are available in snapshot graphs,
including vertical and horizontal annotations. The example in this post uses horizontal annotations.

Adding context to an EC2 monitoring alert

In this post, we will set up monitoring for an EC2 instance and generate a monitoring alert. The
alert contains details and a chart that displays the trend of the underlying metric (CPUUtilization)
for three hours leading up to the alert. To follow along, use the code in the SnapshotAlarmDemo
GitHub repo.

These are the steps required for the monitoring workflow:

  1. Create an EC2 instance to monitor.
  2. Create a Lambda function that creates an email alert with a snapshot graph attachment.
  3. Create an Amazon Simple Notification Service (Amazon SNS) topic with a Lambda
    subscription.
  4. Configure Amazon Simple Email Service (Amazon SES) to send email to your
    account(s).
  5. Create an alarm on a metric in CloudWatch whose target is the SNS topic.

After these components are set up and configured, the alarm will be triggered when the metric
breaches the alarm threshold value. The alarm will trigger the SNS topic and the Lambda
function will be executed. The Lambda function will interrogate the SNS message to extract the
details of the underlying metric and will call the Snapshot Graphs API to create the
corresponding chart. The Lambda function will also create an email and add the chart as an
attachment before using SES to send it. The operator will receive the email alert and can view
the chart immediately, without signing in to AWS.
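
The Lambda function in the SnapshotAlarmDemo repo is written in Node.js; to make the workflow concrete, here is a simplified Python sketch of the same idea. The field names read from the SNS alarm payload and the email addresses are assumptions to adapt to your setup:

import json
import boto3
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

cloudwatch = boto3.client("cloudwatch")
ses = boto3.client("ses")

def handler(event, context):
    # The SNS message body is the CloudWatch alarm notification as a JSON string.
    alarm = json.loads(event["Records"][0]["Sns"]["Message"])
    trigger = alarm["Trigger"]

    # Build a widget for the alarm's metric covering the three hours before the alert.
    widget = {
        "metrics": [[trigger["Namespace"], trigger["MetricName"]]
                    + [v for d in trigger["Dimensions"] for v in (d["name"], d["value"])]],
        "start": "-PT3H",
        "period": 60,
    }
    image = cloudwatch.get_metric_widget_image(MetricWidget=json.dumps(widget))

    # Compose the alert email and attach the snapshot graph.
    msg = MIMEMultipart()
    msg["Subject"] = "ALARM: " + alarm["AlarmName"]
    msg["From"] = "alerts@example.com"   # placeholder sender (must be SES-verified)
    msg["To"] = "oncall@example.com"     # placeholder recipient
    msg.attach(MIMEText(alarm.get("NewStateReason", ""), "plain"))
    attachment = MIMEApplication(image["MetricWidgetImage"])
    attachment.add_header("Content-Disposition", "attachment", filename="snapshot.png")
    msg.attach(attachment)

    ses.send_raw_email(RawMessage={"Data": msg.as_string()})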

Here is what the end-to-end solution looks like:

Create the EC2 instance to monitor

  1. Open the Amazon EC2 console.
  2. From the console dashboard, choose Launch Instance.
  3. On the Choose an Instance Type page, choose any instance type. For size, choose nano.
  4. Choose Review and Launch to let the wizard complete the other configuration settings
    for you.
  5. On the Review Instance Launch page, choose Launch.
  6. When prompted for a key pair, select Choose an existing key pair, or create one.
  7. Make a note of the instance ID that is displayed on the Launch Status page. You need
    this later, when you create your dashboard.

Create the Lambda function

First, create an IAM role for your Lambda function. Your Lambda function needs a role with the
permissions required to execute Lambda and call the CloudWatch API.

Step 1: Create the IAM role

Navigate to the IAM console.

In the navigation pane, choose Roles, and then choose Create role.

Add the following policies to the new role. These policies are more permissive than the
minimum permissions required for this example, so review and adjust according to your
requirements:

  • CloudWatchReadOnlyAccess
  • AmazonSESFullAccess
  • AmazonSNSReadOnlyAccess

Step 2: Create the Lambda function

Next, navigate to the AWS Lambda console and choose Create a function. If you are using the
demo code provided with this post, choose Node.js 6.10 (or later) for the runtime. Attach the
IAM role you created in step 1.

Step 3: Upload the code

Download the code from the SnapshotAlarmDemo GitHub repo. You will need to run npm install locally and then ZIP the project to upload to the Lambda function.

For Handler, enter emailer.myHandler.

Set the function timeout to 30 seconds, and then choose Save. Requests to the Snapshot Graphs
service will take longer based on the number of metrics and time interval requested. To optimize
request time, keep requests to under three hours of the most recent data.

Step 4: Set the environment variables

The email address and region used by the Lambda function are configured in the environment
variables.

EMAIL_TO_ADDRESS is the email address where the alert email will be sent.
EMAIL_FROM_ADDRESS is the sender email address.

Create the SNS topic

Navigate to the Amazon SNS console and choose Create Topic.

Create a subscription on the topic. For Endpoint, enter your Lambda function.

Configure SES

Navigate to the Amazon SES console. In the left navigation pane, choose Email Addresses.
Choose Verify a New Email Address, and then enter the email address.

SES will send a verification email to the selected address. To verify the account, you must click
the link in the verification email.

Create an alarm in CloudWatch

Navigate to the Amazon CloudWatch console. In the left navigation pane, choose Alarms, and
then choose Create Alarm.

For the Select Metric step, select an instance and metric (CPUUtilization, as shown here). For
the Define Alarm step, enter a unique name and threshold value. Use your SNS topic as the
target action. Change the period to 1 minute. Create the alarm.

Testing the workflow

To simulate a problem, let’s manually change the threshold of the metric to a very low value.

Save the alarm and then wait for it to go into the alarm state. (This can take a couple of
minutes.) As soon as your alarm is in the alarm state, you should receive the alert email almost
immediately.

The email contains information about the alarm. The chart of the last three hours of the
associated metric is embedded in the email:

Troubleshooting

If you didn’t receive an email, make sure that the alarm is in the alarm state. If it is, check the
AWS Lambda logs, which you’ll find on the Logs tab of the CloudWatch console.

Conclusion

As you have seen in this demo, you can create CloudWatch alarm workflows that provide more
context than a basic text alert.

In my next post in this series, I will show you other ways to use CloudWatch snapshot graphs to
improve monitoring visibility.

For more information, see the snapshot graphs API documentation.

Taking it further

Most popular ticketing systems allow you to send emails that autogenerate tickets. You can use
the code provided in this post to email your ticketing system to create a ticket for the alarm and
automatically attach the snapshot graph in the body of the ticket. Try it!

You can also use the code provided in this post as a starting point to programmatically email an
entire CloudWatch dashboard rather than an individual chart. Simply retrieve the dashboard
definition using the GetDashboard API and make multiple calls to GetMetricWidgetImage.
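
Here is a hedged sketch of that idea in Python: retrieve the dashboard body, then render each metric widget to an image. The dashboard name is a placeholder, and only widgets of type "metric" can be rendered this way:

import json
import boto3

cloudwatch = boto3.client("cloudwatch")

dashboard = cloudwatch.get_dashboard(DashboardName="MyServiceDashboard")  # placeholder name
body = json.loads(dashboard["DashboardBody"])

images = []
for widget in body.get("widgets", []):
    if widget.get("type") != "metric":
        continue  # text widgets and others have no graph to snapshot
    properties = dict(widget["properties"], start="-PT3H")
    result = cloudwatch.get_metric_widget_image(MetricWidget=json.dumps(properties))
    images.append(result["MetricWidgetImage"])

# "images" now holds one PNG per metric widget, ready to attach to an email.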

I look forward to seeing what you build!

Developing .NET Core AWS Lambda functions

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/developing-net-core-aws-lambda-functions/

This post is courtesy of Mark Easton, Senior Solutions Architect – AWS

One of the biggest benefits of Lambda functions is that they isolate you from the underlying infrastructure. While that makes it easy to deploy and manage your code, it’s critical to have a clearly defined approach for testing, debugging, and diagnosing problems.

There’s a variety of best practices and AWS services to help you out. When developing Lambda functions in .NET, you can follow a four-pronged approach:

This post demonstrates the approach by creating a simple Lambda function that can be called from a gateway created by Amazon API Gateway and which returns the current UTC time. The post shows you how to design your code to allow for easy debugging, logging and tracing.

If you haven’t created Lambda functions with .NET Core before, then the following posts can help you get started:

Unit testing Lambda functions

One of the easiest ways to create a .NET Core Lambda function is to use the .NET Core CLI and create a solution using the Lambda Empty Serverless template.

If you haven’t already installed the Lambda templates, run the following command:

dotnet new -i Amazon.Lambda.Templates::*

You can now use the template to create a serverless project and unit test project, and then add them to a .NET Core solution by running the following commands:

dotnet new serverless.EmptyServerless -n DebuggingExample
cd DebuggingExample
dotnet new sln -n DebuggingExample
dotnet sln DebuggingExample.sln add */*/*.csproj

Although you haven’t added any code yet, you can validate that everything’s working by executing the unit tests. Run the following commands:

cd test/DebuggingExample.Tests/
dotnet test

One of the key principles to effective unit testing is ensuring that units of functionality can be tested in isolation. It’s good practice to de-couple the Lambda function’s actual business logic from the plumbing code that handles the actual Lambda requests.

Using your favorite editor, create a new file, ITimeProcessor.cs, in the src/DebuggingExample folder, and create the following basic interface:

using System;

namespace DebuggingExample
{
    public interface ITimeProcessor
    {
        DateTime CurrentTimeUTC();
    }
}

Then, create a new TimeProcessor.cs file in the src/DebuggingExample folder. The file contains a concrete class implementing the interface.

using System;

namespace DebuggingExample
{
    public class TimeProcessor : ITimeProcessor
    {
        public DateTime CurrentTimeUTC()
        {
            return DateTime.UtcNow;
        }
    }
} 

Now add a TimeProcessorTest.cs file to the src/DebuggingExample.Tests folder. The file should contain the following code:

using System;
using Xunit;

namespace DebuggingExample.Tests
{
    public class TimeProcessorTest
    {
        [Fact]
        public void TestCurrentTimeUTC()
        {
            // Arrange
            var processor = new TimeProcessor();
            var preTestTimeUtc = DateTime.UtcNow;

            // Act
            var result = processor.CurrentTimeUTC();

            // Assert time moves forwards 
            var postTestTimeUtc = DateTime.UtcNow;
            Assert.True(result >= preTestTimeUtc);
            Assert.True(result <= postTestTimeUtc);
        }
    }
}

You can then execute all the tests. From the test/DebuggingExample.Tests folder, run the following command:

dotnet test

Surfacing business logic in a Lambda function

Now that you have your business logic written and tested, you can surface it as a Lambda function. Edit the src/DebuggingExample/Function.cs file so that it calls the CurrentTimeUTC method:

using System;
using System.Collections.Generic;
using System.Net;
using Amazon.Lambda.Core;
using Amazon.Lambda.APIGatewayEvents;
using Newtonsoft.Json;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializer(
typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))] 

namespace DebuggingExample
{
    public class Functions
    {
        ITimeProcessor processor = new TimeProcessor();

        public APIGatewayProxyResponse Get(
APIGatewayProxyRequest request, ILambdaContext context)
        {
            var result = processor.CurrentTimeUTC();

            return CreateResponse(result);
        }

APIGatewayProxyResponse CreateResponse(DateTime? result)
{
    int statusCode = (result != null) ? 
        (int)HttpStatusCode.OK : 
        (int)HttpStatusCode.InternalServerError;

    string body = (result != null) ? 
        JsonConvert.SerializeObject(result) : string.Empty;

    var response = new APIGatewayProxyResponse
    {
        StatusCode = statusCode,
        Body = body,
        Headers = new Dictionary<string, string>
        { 
            { "Content-Type", "application/json" }, 
            { "Access-Control-Allow-Origin", "*" } 
        }
    };
    
    return response;
}
    }
}

First, an instance of the TimeProcessor class is instantiated, and a Get() method is then defined to act as the entry point to the Lambda function.

By default, .NET Core Lambda function handlers expect their input in a Stream. This can be overridden by declaring a custom serializer, and then defining the handler’s method signature using a custom request and response type.

Because the project was created using the serverless.EmptyServerless template, it already overrides the default behavior. It does this by including a using reference to Amazon.Lambda.APIGatewayEvents and then declaring a custom serializer. For more information about using custom serializers in .NET, see the AWS Lambda for .NET Core repository on GitHub.

Get() takes a couple of parameters:

  • The APIGatewayProxyRequest parameter contains the request from the API Gateway fronting the Lambda function
  • The optional ILambdaContext parameter contains details of the execution context.

The Get() method calls CurrentTimeUTC() to retrieve the time from the business logic.

Finally, the result from CurrentTimeUTC() is passed to the CreateResponse() method, which converts the result into an APIGatewayResponse object to be returned to the caller.

Because the updated Lambda function no longer passes the unit tests, update the TestGetMethod in test/DebuggingExample.Tests/FunctionTest.cs file. Update the test by removing the following line:

Assert.Equal("Hello AWS Serverless", response.Body);

This leaves your FunctionTest.cs file as follows:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Xunit;
using Amazon.Lambda.Core;
using Amazon.Lambda.TestUtilities;
using Amazon.Lambda.APIGatewayEvents;
using DebuggingExample;

namespace DebuggingExample.Tests
{
    public class FunctionTest
    {
        public FunctionTest()
        {
        }

        [Fact]
        public void TestGetMethod()
        {
            TestLambdaContext context;
            APIGatewayProxyRequest request;
            APIGatewayProxyResponse response;

            Functions functions = new Functions();

            request = new APIGatewayProxyRequest();
            context = new TestLambdaContext();
            response = functions.Get(request, context);
            Assert.Equal(200, response.StatusCode);
        }
    }
}

Again, you can check that everything is still working. From the test/DebuggingExample.Tests folder, run the following command:

dotnet test

Local integration testing with the AWS SAM CLI

Unit testing is a great start for testing thin slices of functionality. But to test that your API Gateway and Lambda function integrate with each other, you can test locally by using the AWS SAM CLI, installed as described in the AWS Lambda Developer Guide.

Unlike unit testing, which allows you to test functions in isolation outside of their runtime environment, the AWS SAM CLI executes your code in a locally hosted Docker container. It can also simulate a locally hosted API gateway proxy, allowing you to run component integration tests.

After you’ve installed the AWS SAM CLI, you can start using it by creating a template that describes your Lambda function by saving a file named template.yaml in the DebuggingExample directory with the following contents:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Sample SAM Template for DebuggingExample

# More info about Globals: https://github.com/awslabs/serverless-application-model/blob/master/docs/globals.rst
Globals:
    Function:
        Timeout: 10

Resources:

    DebuggingExampleFunction:
        Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction
        Properties:
            FunctionName: DebuggingExample
            CodeUri: src/DebuggingExample/bin/Release/netcoreapp2.1/publish
            Handler: DebuggingExample::DebuggingExample.Functions::Get
            Runtime: dotnetcore2.1
            Environment: # More info about Env Vars: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#environment-object
                Variables:
                    PARAM1: VALUE
            Events:
                DebuggingExample:
                    Type: Api # More info about API Event Source: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#api
                    Properties:
                        Path: /
                        Method: get

Outputs:

    DebuggingExampleApi:
      Description: "API Gateway endpoint URL for Prod stage for Debugging Example function"
      Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/DebuggingExample/"

    DebuggingExampleFunction:
      Description: "Debugging Example Lambda Function ARN"
      Value: !GetAtt DebuggingExampleFunction.Arn

    DebuggingExampleFunctionIamRole:
      Description: "Implicit IAM Role created for Debugging Example function"
      Value: !GetAtt DebuggingExampleFunctionRole.Arn

Now that you have an AWS SAM CLI template, you can test your code locally. Because the Lambda function expects a request from API Gateway, create a sample API Gateway request. Run the following command:

sam local generate-event api > testApiRequest.json

You can now publish your DebuggingExample code locally and invoke it by passing in the sample request as follows:

dotnet publish -c Release
sam local invoke "DebuggingExampleFunction" --event testApiRequest.json

The first time that you run it, it might take some time to pull down the container image in which to host the Lambda function. After you’ve invoked it one time, the container image is cached locally, and execution speeds up.

Finally, rather than testing your function by sending it a sample request, test it with a real API gateway request by running API Gateway locally:

sam local start-api

If you now navigate to http://127.0.0.1:3000/ in your browser, you can get the API gateway to send a request to your locally hosted Lambda function. See the results in your browser.

Logging events with CloudWatch

Having a test strategy allows you to execute, test, and debug Lambda functions. After you’ve deployed your functions to AWS, you must still log what the functions are doing so that you can monitor their behavior.

The easiest way to add logging to your Lambda functions is to add code that writes events to CloudWatch. To do this, add a new method, LogMessage(), to the src/DebuggingExample/Function.cs file.

void LogMessage(ILambdaContext ctx, string msg)
{
    ctx.Logger.LogLine(
        string.Format("{0}:{1} - {2}", 
            ctx.AwsRequestId, 
            ctx.FunctionName,
            msg));
}

This takes in the context object from the Lambda function’s Get() method, and sends a message to CloudWatch by calling the context object’s Logger.Logline() method.

You can now add calls to LogMessage in the Get() method to log events in CloudWatch. It’s also a good idea to add a Try… Catch… block to ensure that exceptions are logged as well.

        public APIGatewayProxyResponse Get(APIGatewayProxyRequest request, ILambdaContext context)
        {
            LogMessage(context, "Processing request started");

            APIGatewayProxyResponse response;
            try
            {
                var result = processor.CurrentTimeUTC();
                response = CreateResponse(result);

                LogMessage(context, "Processing request succeeded.");
            }
            catch (Exception ex)
            {
                LogMessage(context, string.Format("Processing request failed - {0}", ex.Message));
                response = CreateResponse(null);
            }

            return response;
        }

To validate that the changes haven’t broken anything, you can now execute the unit tests again. Run the following commands:

cd test/DebuggingExample.Tests/
dotnet test

Tracing execution with X-Ray

Your code now logs events in CloudWatch, which provides a solid mechanism to help monitor and diagnose problems.

However, it can also be useful to trace your Lambda function’s execution to help diagnose performance or connectivity issues, especially if it’s called by or calling other services. X-Ray provides a variety of features to help analyze and trace code execution.

To enable active tracing on your function you need to modify the SAM template we created earlier to add a new attribute to the function resource definition. With SAM this is as easy as adding the Tracing attribute and specifying it as Active below the Timeout attribute in the Globals section of the template.yaml file:

Globals:
    Function:
        Timeout: 10
        Tracing: Active

To call X-Ray from within your .NET Core code, you must add the AWSXRayRecorder NuGet package to your solution by running the following command in the src/DebuggingExample folder:

dotnet add package AWSXRayRecorder --version 2.2.1-beta

Then, add the following using statement at the top of the src/DebuggingExample/Function.cs file:

using Amazon.XRay.Recorder.Core;

Add a new method to the Function class, which takes a function and name and then records an X-Ray subsegment to trace the execution of the function.

        private T TraceFunction<T>(Func<T> func, string subSegmentName)
        {
            AWSXRayRecorder.Instance.BeginSubsegment(subSegmentName);
            T result = func();
            AWSXRayRecorder.Instance.EndSubsegment();

            return result;
        } 

You can now update the Get() method by replacing the following line:

var result = processor.CurrentTimeUTC();

Replace it with this line:

var result = TraceFunction(processor.CurrentTimeUTC, "GetTime");

The final version of Function.cs, in all its glory, is now:

using System;
using System.Collections.Generic;
using System.Net;
using Amazon.Lambda.Core;
using Amazon.Lambda.APIGatewayEvents;
using Newtonsoft.Json;
using Amazon.XRay.Recorder.Core;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializer(
typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]

namespace DebuggingExample
{
    public class Functions
    {
        ITimeProcessor processor = new TimeProcessor();

        public APIGatewayProxyResponse Get(APIGatewayProxyRequest request, ILambdaContext context)
        {
            LogMessage(context, "Processing request started");

            APIGatewayProxyResponse response;
            try
            {
                var result = TraceFunction(processor.CurrentTimeUTC, "GetTime");
                response = CreateResponse(result);

                LogMessage(context, "Processing request succeeded.");
            }
            catch (Exception ex)
            {
                LogMessage(context, string.Format("Processing request failed - {0}", ex.Message));
                response = CreateResponse(null);
            }

            return response;
        }

        APIGatewayProxyResponse CreateResponse(DateTime? result)
        {
            int statusCode = (result != null) ?
                (int)HttpStatusCode.OK :
                (int)HttpStatusCode.InternalServerError;

            string body = (result != null) ?
                JsonConvert.SerializeObject(result) : string.Empty;

            var response = new APIGatewayProxyResponse
            {
                StatusCode = statusCode,
                Body = body,
                Headers = new Dictionary<string, string>
        {
            { "Content-Type", "application/json" },
            { "Access-Control-Allow-Origin", "*" }
        }
            };

            return response;
        }

        private void LogMessage(ILambdaContext context, string message)
        {
            context.Logger.LogLine(string.Format("{0}:{1} - {2}", context.AwsRequestId, context.FunctionName, message));
        }

        private T TraceFunction<T>(Func<T> func, string actionName)
        {
            AWSXRayRecorder.Instance.BeginSubsegment(actionName);
            T result = func();
            AWSXRayRecorder.Instance.EndSubsegment();

            return result;
        }
    }
}

Since AWS X-Ray requires an agent to collect trace information, if you want to test the code locally you should now install the AWS X-Ray agent. Once it’s installed, confirm the changes haven’t broken anything by running the unit tests again:

cd test/DebuggingExample.Tests/
dotnet test

For more information about using X-Ray from .NET Core, see the AWS X-Ray Developer Guide. For information about adding support for X-Ray in Visual Studio, see the New AWS X-Ray .NET Core Support post.

Deploying and testing the Lambda function remotely

Having created your Lambda function and tested it locally, you’re now ready to package and deploy your code.

First of all you need an Amazon S3 bucket to deploy the code into. If you don’t already have one, create a suitable S3 bucket.

You can now package the .NET Lambda Function and copy it to Amazon S3.

sam package \
  --template-file template.yaml \
  --output-template debugging-example.yaml \
  --s3-bucket debugging-example-deploy

Finally, deploy the Lambda function by running the following command:

sam deploy \
   --template-file debugging-example.yaml \
   --stack-name DebuggingExample \
   --capabilities CAPABILITY_IAM \
   --region eu-west-1

After your code has deployed successfully, test it from your local machine by running the following command:

dotnet lambda invoke-function DebuggingExample --region eu-west-1

Diagnosing the Lambda function

Having run the Lambda function, you can now monitor its behavior by logging in to the AWS Management Console and then navigating to CloudWatch Logs.

You can now click on the /aws/lambda/DebuggingExample log group to view all the recorded log streams for your Lambda function.

If you open one of the log streams, you see the various messages recorded for the Lambda function, including the two events explicitly logged from within the Get() method.

To review the logs locally, you can also use the AWS SAM CLI to retrieve CloudWatch logs and then display them in your terminal.

sam logs -n DebuggingExample --region eu-west-1

As a final alternative, you can also execute the Lambda function by choosing Test on the Lambda console. The execution results are displayed in the Log output section.

In the X-Ray console, the Service Map page shows a map of the Lambda function’s connections.

Your Lambda function is essentially standalone. However, the Service Map page can be critical in helping to understand performance issues when a Lambda function is connected with a number of other services.

If you open the Traces screen, you see the trace list showing all the trace results that X-Ray has recorded. Open one of the traces to see a breakdown of the Lambda function performance.

Conclusion

In this post, I showed you how to develop Lambda functions in .NET Core, how unit tests can be used, how to use the AWS SAM CLI for local integration tests, how CloudWatch can be used for logging and monitoring events, and finally how to use X-Ray to trace Lambda function execution.

Put together, these techniques provide a solid foundation to help you debug and diagnose your Lambda functions effectively. Explore each of the services further, because when it comes to production workloads, great diagnosis is key to providing a great and uninterrupted customer experience.

Protecting your API using Amazon API Gateway and AWS WAF — Part 2

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/protecting-your-api-using-amazon-api-gateway-and-aws-waf-part-2/

This post courtesy of Heitor Lessa, AWS Specialist Solutions Architect – Serverless

In Part 1 of this blog, we described how to protect your API provided by Amazon API Gateway using AWS WAF. In this blog, we show how to use API keys between an Amazon CloudFront distribution and API Gateway to secure access to your API in API Gateway in addition to your preferred authorization (AuthZ) mechanism already set up in API Gateway. For more information about AuthZ mechanisms in API Gateway, see Secure API Access with Amazon Cognito Federated Identities, Amazon Cognito User Pools, and Amazon API Gateway.

We also extend the AWS CloudFormation stack previously used to automate the creation of the following necessary resources of this solution:

The following are alternative solutions to using an API key, depending on your security requirements:

  • Using a randomly generated HTTP secret header in CloudFront and verifying it with API Gateway request validation
  • Signing incoming requests with Lambda@Edge and verifying them with API Gateway Lambda authorizers

Requirements

To follow along, you need full permissions to create, update, and delete API Gateway, CloudFront, Lambda, and CloudWatch Events through AWS CloudFormation.

Extending the existing AWS CloudFormation stack

First, click here to download the full template. Then follow these steps to update the existing AWS CloudFormation stack:

  1. Go to the AWS Management Console and open the AWS CloudFormation console.
  2. Select the stack that you created in Part 1, right-click it, and select Update Stack.
  3. For option 2, choose Choose file and select the template that you downloaded.
  4. Fill in the required parameters as shown in the following image.

Here’s more information about these parameters:

  • API Gateway to send traffic to – We use the same API Gateway URL as in Part 1 except without the URL scheme (https://): cxm45444t9a.execute-api.us-east-2.amazonaws.com/prod
  • Rotating API Keys – We define Daily and use 2018-04-03 as the timestamp value to append to the API key name

Continue with the AWS CloudFormation console to complete the operation. It might take a couple of minutes to update the stack because CloudFront takes time to propagate changes across all of its points of presence.

Enabling API Keys in the example Pet Store API

While the stack completes in the background, let’s enable the use of API Keys in the API that CloudFront will send traffic to.

  1. Go to the AWS Management Console and open the API Gateway console.
  2. Select the API that you created in Part 1 and choose Resources.
  3. Under /pets, choose GET and then choose Method Request.
  4. For API Key Required, choose the dropdown menu and choose true.
  5. To save this change, select the highlighted check mark as shown in the following image.

Next, we need to deploy these changes so that requests sent to /pets fail if an API key isn’t present.

  1. Choose Actions and select Deploy API.
  2. Choose the Deployment stage dropdown menu and select the stage you created in Part 1.
  3. Add a deployment description such as “Requires API Keys under /pets” and choose Deploy.
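
These console steps can also be scripted. The following is a minimal boto3 sketch, assuming the example API ID (cxm45444t9a) and prod stage used in this post; adjust both to match your own API:

import boto3

apigw = boto3.client("apigateway")

rest_api_id = "cxm45444t9a"  # example API ID from this post
stage_name = "prod"

# Find the /pets resource
resources = apigw.get_resources(restApiId=rest_api_id)["items"]
pets = next(r for r in resources if r.get("path") == "/pets")

# Require an API key on GET /pets
apigw.update_method(
    restApiId=rest_api_id,
    resourceId=pets["id"],
    httpMethod="GET",
    patchOperations=[{"op": "replace", "path": "/apiKeyRequired", "value": "true"}],
)

# Deploy the change to the existing stage
apigw.create_deployment(
    restApiId=rest_api_id,
    stageName=stage_name,
    description="Requires API Keys under /pets",
)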

When the deployment succeeds, you’re redirected to the API Gateway Stage page. There you can use the Invoke URL to verify that a request sent without an API key now fails.

This failure is expected and proves that our deployed changes are working. Next, let’s try to access the same API but this time through our CloudFront distribution.

  1. From the AWS Management Console, open the AWS CloudFormation console.
  2. Select the stack that you created in Part 1 and choose Outputs at the bottom left.
  3. On the CFDistribution line, copy the URL. Before you paste it in a new browser tab or window, append ‘/pets’ to it.

As opposed to our first attempt without an API key, we receive a JSON response from the PetStore API. This is because CloudFront is injecting an API key before it forwards the request to the PetStore API. The following image demonstrates both of these tests:

  1. Successful request when accessing the API through CloudFront
  2. Unsuccessful request when accessing the API directly through its Invoke URL
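
To reproduce both tests outside the browser, a short Python sketch like the following works. It assumes the Invoke URL from this post and a placeholder CloudFront domain; replace the latter with the CFDistribution output from your stack:

import urllib.error
import urllib.request

# Direct Invoke URL (should now return 403 Forbidden without an API key)
invoke_url = "https://cxm45444t9a.execute-api.us-east-2.amazonaws.com/prod/pets"
# Placeholder CloudFront domain (should return the PetStore JSON response)
cloudfront_url = "https://d1234example.cloudfront.net/pets"

for url in (invoke_url, cloudfront_url):
    try:
        with urllib.request.urlopen(url) as resp:
            print(url, "->", resp.status)
    except urllib.error.HTTPError as err:
        print(url, "->", err.code, err.reason)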

This API key works as a shared secret between CloudFront and API Gateway; it could be any agreed-upon random value that can be rotated, just like an API key. However, it’s important to know that API keys are a feature for tracking or metering API consumers’ usage. They are not a secure authorization mechanism and therefore should be used only in conjunction with an API Gateway authorizer.

Rotating API keys

API keys are automatically rotated based on the schedule (e.g., daily or monthly) that you chose when updating the AWS CloudFormation stack. This requires no maintenance or intervention on your part. In this section, we explain how this process works under the hood and what you can do if you want to manually trigger an API key rotation.

The AWS CloudFormation template that we downloaded and used to update our stack does the following in addition to Part 1.

Introduce a Timestamp parameter that is appended to the API key name

Parameters:
  Timestamp:
    Type: String
    Description: Fill in this format <Year>-<Month>-<Day>
    Default: 2018-04-02

Create an API Gateway API key and usage plan, associate the new key with the API given as a parameter, and configure the CloudFront distribution to send a custom header when forwarding traffic to API Gateway

CFDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Logging:
        IncludeCookies: 'false'
        Bucket: !Sub ${S3BucketAccessLogs}.s3.amazonaws.com
        Prefix: cloudfront-logs
      Enabled: 'true'
      Comment: API Gateway Regional Endpoint Blog post
      Origins:
        -
          Id: APIGWRegional
          DomainName: !Select [0, !Split ['/', !Ref ApiURL]]
          CustomOriginConfig:
            HTTPPort: 443
            OriginProtocolPolicy: https-only
          OriginCustomHeaders:
            - 
              HeaderName: x-api-key
              HeaderValue: !Ref ApiKey
              ...

ApiUsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  Properties:
    Description: CloudFront usage only
    UsagePlanName: CloudFront_only
    ApiStages:
      - 
        ApiId: !Select [0, !Split ['.', !Ref ApiURL]]
        Stage: !Select [1, !Split ['/', !Ref ApiURL]]

ApiKey: 
  Type: "AWS::ApiGateway::ApiKey"
  Properties: 
    Name: !Sub "CloudFront-${Timestamp}"
    Description: !Sub "CloudFormation API Key ${Timestamp}"
    Enabled: true

ApiKeyUsagePlan:
  Type: "AWS::ApiGateway::UsagePlanKey"
  Properties:
    KeyId: !Ref ApiKey
    KeyType: API_KEY
    UsagePlanId: !Ref ApiUsagePlan

As shown in the ApiKey resource, we append the given Timestamp to Name as well as use it in the API Gateway usage plan key resource. This means that whenever the Timestamp parameter changes, AWS CloudFormation triggers a resource replacement and updates every resource that depends on that API key. In this case, that includes the AWS CloudFront configuration and API Gateway usage plan.

But what does the rotation schedule that you chose at the beginning of this blog mean in this example?

Create a scheduled activity to trigger a Lambda function on a given schedule

Parameters:
...
  ApiKeyRotationSchedule: 
    Description: Schedule to rotate API Keys e.g. Daily, Monthly, Bimonthly basis
    Type: String
    Default: Daily
    AllowedValues:
      - Daily
      - Fortnightly
      - Monthly
      - Bimonthly
      - Quarterly
    ConstraintDescription: Must be any of the available options

Mappings: 

  ScheduleMap: 
    CloudwatchEvents: 
      Daily: "rate(1 day)"
      Fortnightly: "rate(14 days)"
      Monthly: "rate(30 days)"
      Bimonthly: "rate(60 days)"
      Quarterly: "rate(90 days)"

Resources:
...
  RotateApiKeysScheduledJob: 
    Type: "AWS::Events::Rule"
    Properties: 
      Description: "ScheduledRule"
      ScheduleExpression: !FindInMap [ScheduleMap, CloudwatchEvents, !Ref ApiKeyRotationSchedule]
      State: "ENABLED"
      Targets: 
        - 
          Arn: !GetAtt RotateApiKeysFunction.Arn
          Id: "RotateApiKeys"

The resource RotateApiKeysScheduledJob shows that the schedule that you selected through a dropdown menu when updating the AWS CloudFormation stack is actually converted to a CloudWatch Events rule. This in turn triggers a Lambda function that is defined in the same template.

RotateApiKeysFunction:
      Type: "AWS::Lambda::Function"
      Properties:
        Handler: "index.lambda_handler"
        Role: !GetAtt RotateApiKeysFunctionRole.Arn
        Runtime: python3.6
        Environment:
          Variables:
            StackName: !Ref "AWS::StackName"
        Code:
          ZipFile: !Sub |
            import datetime
            import os

            import boto3
            from botocore.exceptions import ClientError

            session = boto3.Session()
            cfn = session.client('cloudformation')
            
            timestamp = datetime.date.today()            
            params = {
                'StackName': os.getenv('StackName'),
                'UsePreviousTemplate': True,
                'Capabilities': ["CAPABILITY_IAM"],
                'Parameters': [
                    {
                      'ParameterKey': 'ApiURL',
                      'UsePreviousValue': True
                    },
                    {
                      'ParameterKey': 'ApiKeyRotationSchedule',
                      'UsePreviousValue': True
                    },
                    {
                      'ParameterKey': 'Timestamp',
                      'ParameterValue': str(timestamp)
                    },
                ],                
            }

            def lambda_handler(event, context):
              """ Updates CloudFormation Stack with a new timestamp and returns CloudFormation response"""
              try:
                  response = cfn.update_stack(**params)
              except ClientError as err:
                  if "No updates are to be performed" in err.response['Error']['Message']:
                      return {"message": err.response['Error']['Message']}
                  else:
                      raise Exception("An error happened while updating the stack: {}".format(err))          
  
              return response

All this Lambda function does is trigger an AWS CloudFormation stack update through the API (exactly what you did in the console, but programmatically), updating the Timestamp parameter. As a result, the API key is rotated and the CloudFront distribution configuration is updated.

This gives you enough flexibility to change the API key rotation schedule at any time without maintaining or writing any code. You can also rotate the keys manually at any time by updating the stack’s Timestamp parameter yourself.

Next Steps

We hope you found the information in this blog helpful. You can use it to understand how to create a mechanism to allow traffic only from CloudFront to API Gateway and avoid bypassing the AWS WAF rules that Part 1 set up.

Keep the following important notes in mind about this solution:

  • It assumes that you already have a strong AuthZ mechanism, managed by API Gateway, to control access to your API.
  • The API Gateway usage plan and other resources created in this solution work only for APIs created in the same account (the ApiUrl parameter).
  • If you already use API keys for tracking API usage, consider using either of the following solutions as a replacement:
    • Use a random HTTP header value in the CloudFront origin configuration and use API Gateway request validation to verify it instead of API keys alone.
    • Combine Lambda@Edge and an API Gateway custom authorizer to sign and verify incoming requests using a shared secret known only to the two. This is a more advanced technique; a minimal sketch follows this list.
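
The blog doesn’t include code for that second option, but here is a minimal sketch of what it could look like. The header name (x-origin-signature), the HMAC-SHA256 scheme, and the placeholder secret are illustrative assumptions, not part of the original solution; also note that Lambda@Edge functions don’t support environment variables, so in practice the secret would be injected at deployment time or fetched from a secure store.

import hashlib
import hmac

# Placeholder shared secret; Lambda@Edge can't read environment variables,
# so this value would be injected at deployment time or fetched securely.
SHARED_SECRET = b"replace-with-a-random-secret"

def sign(path):
    return hmac.new(SHARED_SECRET, path.encode(), hashlib.sha256).hexdigest()

# Lambda@Edge origin-request handler: CloudFront adds a signature header
# before forwarding the request to API Gateway.
def edge_handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    request["headers"]["x-origin-signature"] = [
        {"key": "x-origin-signature", "value": sign(request["uri"])}
    ]
    return request

# API Gateway Lambda (REQUEST) authorizer: verifies the signature header.
# The path signed here must match the path signed at the edge.
def authorizer_handler(event, context):
    supplied = event.get("headers", {}).get("x-origin-signature", "")
    expected = sign(event.get("path", ""))
    effect = "Allow" if hmac.compare_digest(supplied, expected) else "Deny"
    return {
        "principalId": "cloudfront",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": event["methodArn"],
                }
            ],
        },
    }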

Amazon Neptune Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-neptune-generally-available/

Amazon Neptune is now Generally Available in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. At the core of Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latencies. Neptune supports two popular graph models, Property Graph and RDF, through Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune can be used to power everything from recommendation engines and knowledge graphs to drug discovery and network security. Neptune is fully-managed, with automatic minor version upgrades, backups, encryption, and failover. I wrote about Neptune in detail for AWS re:Invent last year. Since then, customers have been using the preview and providing great feedback, which the team has used to prepare the service for GA.

Now that Amazon Neptune is generally available, there are a few changes from the preview.

Launching an Amazon Neptune Cluster

Launching a Neptune cluster is as easy as navigating to the AWS Management Console and choosing Create cluster. Of course, you can also launch with CloudFormation, the CLI, or the SDKs.
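
For example, a minimal boto3 sketch for launching a cluster with a single instance might look like this; the identifiers and the db.r4.large instance class are placeholder choices, and network and security settings are left at their defaults:

import boto3

neptune = boto3.client("neptune")

# Placeholder identifiers; networking and security settings use account defaults
neptune.create_db_cluster(
    DBClusterIdentifier="my-neptune-cluster",
    Engine="neptune",
)

neptune.create_db_instance(
    DBInstanceIdentifier="my-neptune-instance",
    DBInstanceClass="db.r4.large",
    Engine="neptune",
    DBClusterIdentifier="my-neptune-cluster",
)

# Look up the cluster endpoints once provisioning completes
cluster = neptune.describe_db_clusters(
    DBClusterIdentifier="my-neptune-cluster"
)["DBClusters"][0]
print(cluster["Endpoint"], cluster.get("ReaderEndpoint"))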

You can monitor your cluster health and the health of individual instances through Amazon CloudWatch and the console.
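
Here is a small sketch of pulling one such metric with boto3, assuming the AWS/Neptune namespace and the CPUUtilization metric for the placeholder cluster above:

import datetime

import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.datetime.utcnow()

# Average CPU over the last hour, in five-minute periods
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Neptune",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBClusterIdentifier", "Value": "my-neptune-cluster"}],
    StartTime=now - datetime.timedelta(hours=1),
    EndTime=now,
    Period=300,
    Statistics=["Average"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 2))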

Additional Resources

We’ve created two repos with some additional tools and examples, and you can expect continuous development on them as we add more.

  • Amazon Neptune Tools Repo
    This repo has a useful tool for converting GraphML files into Neptune compatible CSVs for bulk loading from S3.
  • Amazon Neptune Samples Repo
    This repo has a really cool example of building a collaborative filtering recommendation engine for video game preferences.

Purpose Built Databases

There’s an industry trend toward purpose-built databases. Developers and businesses want to access their data in the format that makes the most sense for their applications. As cloud resources make transforming large datasets easier with tools like AWS Glue, we have a lot more options than we used to for accessing our data. With tools like Amazon Redshift, Amazon Athena, Amazon Aurora, Amazon DynamoDB, and more, we get to choose the best database for the job, or even enable entirely new use cases. Amazon Neptune is perfect for workloads where the data is highly connected across data-rich edges.

I’m really excited about graph databases and I see a huge number of applications. Looking for ideas of cool things to build? I’d love to build a web crawler in AWS Lambda that uses Neptune as the backing store. You could further enrich it by running Amazon Comprehend or Amazon Rekognition on the text and images found and creating a search engine on top of Neptune.

As always, feel free to reach out in the comments or on Twitter to provide any feedback!

Randall

Monitoring your Amazon SNS message filtering activity with Amazon CloudWatch

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/monitoring-your-amazon-sns-message-filtering-activity-with-amazon-cloudwatch/

This post is courtesy of Otavio Ferreira, Manager, Amazon SNS, AWS Messaging.

Amazon SNS message filtering provides a set of string and numeric matching operators that allow each subscription to receive only the messages of interest. Hence, SNS message filtering can simplify your pub/sub messaging architecture by offloading the message filtering logic from your subscriber systems, as well as the message routing logic from your publisher systems.

After you set the subscription attribute that defines a filter policy, the subscribing endpoint receives only the messages that carry attributes matching this filter policy. Other messages published to the topic are filtered out for this subscription. Meanwhile, the native integration between SNS and Amazon CloudWatch gives you visibility into the number of messages delivered, as well as the number of messages filtered out.
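
As a quick illustration, here is a boto3 sketch that attaches a filter policy to a subscription and then publishes one matching and one non-matching message; the topic, subscription ARN, and event_type attribute are hypothetical:

import json

import boto3

sns = boto3.client("sns")

# Hypothetical ARNs
topic_arn = "arn:aws:sns:us-east-1:123456789012:orders"
subscription_arn = "arn:aws:sns:us-east-1:123456789012:orders:0000aaaa-0000-0000-0000-000000000000"

# The subscription now receives only messages about placed orders
sns.set_subscription_attributes(
    SubscriptionArn=subscription_arn,
    AttributeName="FilterPolicy",
    AttributeValue=json.dumps({"event_type": ["order_placed"]}),
)

# Matches the policy, so it is delivered
sns.publish(
    TopicArn=topic_arn,
    Message="New order received",
    MessageAttributes={"event_type": {"DataType": "String", "StringValue": "order_placed"}},
)

# Doesn't match, so it is filtered out for this subscription
sns.publish(
    TopicArn=topic_arn,
    Message="Order shipped",
    MessageAttributes={"event_type": {"DataType": "String", "StringValue": "order_shipped"}},
)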

CloudWatch metrics are captured automatically for you. To get started with SNS message filtering, see Filtering Messages with Amazon SNS.

Message Filtering Metrics

The following six CloudWatch metrics are relevant to understanding your SNS message filtering activity:

  • NumberOfMessagesPublished – Inbound traffic to SNS. This metric tracks all the messages that have been published to the topic.
  • NumberOfNotificationsDelivered – Outbound traffic from SNS. This metric tracks all the messages that have been successfully delivered to endpoints subscribed to the topic. A delivery takes place either when the incoming message attributes match a subscription filter policy, or when the subscription has no filter policy at all, which results in a catch-all behavior.
  • NumberOfNotificationsFilteredOut – This metric tracks all the messages that were filtered out because they carried attributes that didn’t match the subscription filter policy.
  • NumberOfNotificationsFilteredOut-NoMessageAttributes – This metric tracks all the messages that were filtered out because they didn’t carry any attributes at all and, consequently, didn’t match the subscription filter policy.
  • NumberOfNotificationsFilteredOut-InvalidAttributes – This metric keeps track of messages that were filtered out because they carried invalid or malformed attributes and, thus, didn’t match the subscription filter policy.
  • NumberOfNotificationsFailed – This last metric tracks all the messages that failed to be delivered to subscribing endpoints, regardless of whether a filter policy had been set for the endpoint. This metric is emitted after the message delivery retry policy is exhausted, and SNS stops attempting to deliver the message. At that moment, the subscribing endpoint is likely no longer reachable. For example, the subscribing SQS queue or Lambda function has been deleted by its owner. You may want to closely monitor this metric to address message delivery issues quickly.
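
If you’d rather pull these numbers programmatically than read them in the console, a short boto3 sketch like the following sums the three headline metrics over the last day for a hypothetical topic named orders:

import datetime

import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.datetime.utcnow()

metrics = (
    "NumberOfMessagesPublished",
    "NumberOfNotificationsDelivered",
    "NumberOfNotificationsFilteredOut",
)

for metric in metrics:
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/SNS",
        MetricName=metric,
        Dimensions=[{"Name": "TopicName", "Value": "orders"}],  # hypothetical topic
        StartTime=now - datetime.timedelta(days=1),
        EndTime=now,
        Period=86400,
        Statistics=["Sum"],
    )
    total = sum(point["Sum"] for point in stats["Datapoints"])
    print(metric, int(total))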

Message filtering graphs

Through the AWS Management Console, you can compose graphs to display your SNS message filtering activity. The graph shows the number of messages published, delivered, and filtered out within the timeframe you specify (1h, 3h, 12h, 1d, 3d, 1w, or custom).

SNS message filtering for CloudWatch Metrics

To compose an SNS message filtering graph with CloudWatch:

  1. Open the CloudWatch console.
  2. Choose Metrics, SNS, All Metrics, and Topic Metrics.
  3. Select all metrics to add to the graph, such as:
    • NumberOfMessagesPublished
    • NumberOfNotificationsDelivered
    • NumberOfNotificationsFilteredOut
  4. Choose Graphed metrics.
  5. In the Statistic column, switch from Average to Sum.
  6. Title your graph with a descriptive name, such as “SNS Message Filtering”.

After you have your graph set up, you may want to copy the graph link for bookmarking, emailing, or sharing with co-workers. You may also want to add your graph to a CloudWatch dashboard for easy access in the future. Both actions are available to you on the Actions menu, which is found above the graph.
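
If you want to script the dashboard step as well, here is a sketch using boto3 that creates a dashboard with a single widget roughly equivalent to the graph above; the topic name, Region, and dashboard name are placeholder values:

import json

import boto3

cloudwatch = boto3.client("cloudwatch")

widget = {
    "type": "metric",
    "x": 0,
    "y": 0,
    "width": 12,
    "height": 6,
    "properties": {
        "title": "SNS Message Filtering",
        "stat": "Sum",
        "region": "us-east-1",
        "metrics": [
            ["AWS/SNS", "NumberOfMessagesPublished", "TopicName", "orders"],
            ["AWS/SNS", "NumberOfNotificationsDelivered", "TopicName", "orders"],
            ["AWS/SNS", "NumberOfNotificationsFilteredOut", "TopicName", "orders"],
        ],
    },
}

cloudwatch.put_dashboard(
    DashboardName="SNSMessageFiltering",
    DashboardBody=json.dumps({"widgets": [widget]}),
)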

Summary

SNS message filtering defines how SNS topics behave in terms of message delivery. By using CloudWatch metrics, you gain visibility into the number of messages published, delivered, and filtered out. This enables you to validate the operation of filter policies and more easily troubleshoot during development phases.

SNS message filtering can be implemented easily with existing AWS SDKs by applying message and subscription attributes across all SNS-supported protocols (Amazon SQS, AWS Lambda, HTTP, SMS, email, and mobile push). CloudWatch metrics for SNS message filtering are available now in all AWS Regions.

For information about pricing, see the CloudWatch pricing page.

For more information, see: