Tag Archives: serverless architecture

Event-Driven Computing with Amazon SNS and AWS Compute, Storage, Database, and Networking Services

Post Syndicated from Christie Gifrin original https://aws.amazon.com/blogs/compute/event-driven-computing-with-amazon-sns-compute-storage-database-and-networking-services/

Contributed by Otavio Ferreira, Manager, Software Development, AWS Messaging

Like other developers around the world, you may be tackling increasingly complex business problems. A key success factor, in that case, is the ability to break down a large project scope into smaller, more manageable components. A service-oriented architecture guides you toward designing systems as a collection of loosely coupled, independently scaled, and highly reusable services. Microservices take this even further. To improve performance and scalability, they promote fine-grained interfaces and lightweight protocols.

However, the communication among isolated microservices can be challenging. Services are often deployed onto independent servers and don’t share any compute or storage resources. Also, you should avoid hard dependencies among microservices, to preserve maintainability and reusability.

If you apply the pub/sub design pattern, you can effortlessly decouple and independently scale out your microservices and serverless architectures. A pub/sub messaging service, such as Amazon SNS, promotes event-driven computing that statically decouples event publishers from subscribers, while dynamically allowing for the exchange of messages between them. An event-driven architecture also introduces the responsiveness needed to deal with complex problems, which are often unpredictable and asynchronous.

What is event-driven computing?

Given the context of microservices, event-driven computing is a model in which subscriber services automatically perform work in response to events triggered by publisher services. This paradigm can be applied to automate workflows while decoupling the services that collectively and independently work to fulfil these workflows. Amazon SNS is an event-driven computing hub, in the AWS Cloud, that has native integration with several AWS publisher and subscriber services.

Which AWS services publish events to SNS natively?

Several AWS services have been integrated as SNS publishers and, therefore, can natively trigger event-driven computing for a variety of use cases. In this post, I specifically cover AWS compute, storage, database, and networking services, as depicted below.

Compute services

  • Auto Scaling: Helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You can configure Auto Scaling lifecycle hooks to trigger events, as Auto Scaling resizes your EC2 cluster.As an example, you may want to warm up the local cache store on newly launched EC2 instances, and also download log files from other EC2 instances that are about to be terminated. To make this happen, set an SNS topic as your Auto Scaling group’s notification target, then subscribe two Lambda functions to this SNS topic. The first function is responsible for handling scale-out events (to warm up cache upon provisioning), whereas the second is in charge of handling scale-in events (to download logs upon termination).

  • AWS Elastic Beanstalk: An easy-to-use service for deploying and scaling web applications and web services developed in a number of programming languages. You can configure event notifications for your Elastic Beanstalk environment so that notable events can be automatically published to an SNS topic, then pushed to topic subscribers.As an example, you may use this event-driven architecture to coordinate your continuous integration pipeline (such as Jenkins CI). That way, whenever an environment is created, Elastic Beanstalk publishes this event to an SNS topic, which triggers a subscribing Lambda function, which then kicks off a CI job against your newly created Elastic Beanstalk environment.

  • Elastic Load Balancing: Automatically distributes incoming application traffic across Amazon EC2 instances, containers, or other resources identified by IP addresses.You can configure CloudWatch alarms on Elastic Load Balancing metrics, to automate the handling of events derived from Classic Load Balancers. As an example, you may leverage this event-driven design to automate latency profiling in an Amazon ECS cluster behind a Classic Load Balancer. In this example, whenever your ECS cluster breaches your load balancer latency threshold, an event is posted by CloudWatch to an SNS topic, which then triggers a subscribing Lambda function. This function runs a task on your ECS cluster to trigger a latency profiling tool, hosted on the cluster itself. This can enhance your latency troubleshooting exercise by making it timely.

Storage services

  • Amazon S3: Object storage built to store and retrieve any amount of data.You can enable S3 event notifications, and automatically get them posted to SNS topics, to automate a variety of workflows. For instance, imagine that you have an S3 bucket to store incoming resumes from candidates, and a fleet of EC2 instances to encode these resumes from their original format (such as Word or text) into a portable format (such as PDF).In this example, whenever new files are uploaded to your input bucket, S3 publishes these events to an SNS topic, which in turn pushes these messages into subscribing SQS queues. Then, encoding workers running on EC2 instances poll these messages from the SQS queues; retrieve the original files from the input S3 bucket; encode them into PDF; and finally store them in an output S3 bucket.

  • Amazon EFS: Provides simple and scalable file storage, for use with Amazon EC2 instances, in the AWS Cloud.You can configure CloudWatch alarms on EFS metrics, to automate the management of your EFS systems. For example, consider a highly parallelized genomics analysis application that runs against an EFS system. By default, this file system is instantiated on the “General Purpose” performance mode. Although this performance mode allows for lower latency, it might eventually impose a scaling bottleneck. Therefore, you may leverage an event-driven design to handle it automatically.Basically, as soon as the EFS metric “Percent I/O Limit” breaches 95%, CloudWatch could post this event to an SNS topic, which in turn would push this message into a subscribing Lambda function. This function automatically creates a new file system, this time on the “Max I/O” performance mode, then switches the genomics analysis application to this new file system. As a result, your application starts experiencing higher I/O throughput rates.

  • Amazon Glacier: A secure, durable, and low-cost cloud storage service for data archiving and long-term backup.You can set a notification configuration on an Amazon Glacier vault so that when a job completes, a message is published to an SNS topic. Retrieving an archive from Amazon Glacier is a two-step asynchronous operation, in which you first initiate a job, and then download the output after the job completes. Therefore, SNS helps you eliminate polling your Amazon Glacier vault to check whether your job has been completed, or not. As usual, you may subscribe SQS queues, Lambda functions, and HTTP endpoints to your SNS topic, to be notified when your Amazon Glacier job is done.

  • AWS Snowball: A petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data.You can leverage Snowball notifications to automate workflows related to importing data into and exporting data from AWS. More specifically, whenever your Snowball job status changes, Snowball can publish this event to an SNS topic, which in turn can broadcast the event to all its subscribers.As an example, imagine a Geographic Information System (GIS) that distributes high-resolution satellite images to users via Web browser. In this example, the GIS vendor could capture up to 80 TB of satellite images; create a Snowball job to import these files from an on-premises system to an S3 bucket; and provide an SNS topic ARN to be notified upon job status changes in Snowball. After Snowball changes the job status from “Importing” to “Completed”, Snowball publishes this event to the specified SNS topic, which delivers this message to a subscribing Lambda function, which finally creates a CloudFront web distribution for the target S3 bucket, to serve the images to end users.

Database services

  • Amazon RDS: Makes it easy to set up, operate, and scale a relational database in the cloud.RDS leverages SNS to broadcast notifications when RDS events occur. As usual, these notifications can be delivered via any protocol supported by SNS, including SQS queues, Lambda functions, and HTTP endpoints.As an example, imagine that you own a social network website that has experienced organic growth, and needs to scale its compute and database resources on demand. In this case, you could provide an SNS topic to listen to RDS DB instance events. When the “Low Storage” event is published to the topic, SNS pushes this event to a subscribing Lambda function, which in turn leverages the RDS API to increase the storage capacity allocated to your DB instance. The provisioning itself takes place within the specified DB maintenance window.

  • Amazon ElastiCache: A web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud.ElastiCache can publish messages using Amazon SNS when significant events happen on your cache cluster. This feature can be used to refresh the list of servers on client machines connected to individual cache node endpoints of a cache cluster. For instance, an ecommerce website fetches product details from a cache cluster, with the goal of offloading a relational database and speeding up page load times. Ideally, you want to make sure that each web server always has an updated list of cache servers to which to connect.To automate this node discovery process, you can get your ElastiCache cluster to publish events to an SNS topic. Thus, when ElastiCache event “AddCacheNodeComplete” is published, your topic then pushes this event to all subscribing HTTP endpoints that serve your ecommerce website, so that these HTTP servers can update their list of cache nodes.

  • Amazon Redshift: A fully managed data warehouse that makes it simple to analyze data using standard SQL and BI (Business Intelligence) tools.Amazon Redshift uses SNS to broadcast relevant events so that data warehouse workflows can be automated. As an example, imagine a news website that sends clickstream data to a Kinesis Firehose stream, which then loads the data into Amazon Redshift, so that popular news and reading preferences might be surfaced on a BI tool. At some point though, this Amazon Redshift cluster might need to be resized, and the cluster enters a ready-only mode. Hence, this Amazon Redshift event is published to an SNS topic, which delivers this event to a subscribing Lambda function, which finally deletes the corresponding Kinesis Firehose delivery stream, so that clickstream data uploads can be put on hold.At a later point, after Amazon Redshift publishes the event that the maintenance window has been closed, SNS notifies a subscribing Lambda function accordingly, so that this function can re-create the Kinesis Firehose delivery stream, and resume clickstream data uploads to Amazon Redshift.

  • AWS DMS: Helps you migrate databases to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database.DMS also uses SNS to provide notifications when DMS events occur, which can automate database migration workflows. As an example, you might create data replication tasks to migrate an on-premises MS SQL database, composed of multiple tables, to MySQL. Thus, if replication tasks fail due to incompatible data encoding in the source tables, these events can be published to an SNS topic, which can push these messages into a subscribing SQS queue. Then, encoders running on EC2 can poll these messages from the SQS queue, encode the source tables into a compatible character set, and restart the corresponding replication tasks in DMS. This is an event-driven approach to a self-healing database migration process.

Networking services

  • Amazon Route 53: A highly available and scalable cloud-based DNS (Domain Name System). Route 53 health checks monitor the health and performance of your web applications, web servers, and other resources.You can set CloudWatch alarms and get automated Amazon SNS notifications when the status of your Route 53 health check changes. As an example, imagine an online payment gateway that reports the health of its platform to merchants worldwide, via a status page. This page is hosted on EC2 and fetches platform health data from DynamoDB. In this case, you could configure a CloudWatch alarm for your Route 53 health check, so that when the alarm threshold is breached, and the payment gateway is no longer considered healthy, then CloudWatch publishes this event to an SNS topic, which pushes this message to a subscribing Lambda function, which finally updates the DynamoDB table that populates the status page. This event-driven approach avoids any kind of manual update to the status page visited by merchants.

  • AWS Direct Connect (AWS DX): Makes it easy to establish a dedicated network connection from your premises to AWS, which can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.You can monitor physical DX connections using CloudWatch alarms, and send SNS messages when alarms change their status. As an example, when a DX connection state shifts to 0 (zero), indicating that the connection is down, this event can be published to an SNS topic, which can fan out this message to impacted servers through HTTP endpoints, so that they might reroute their traffic through a different connection instead. This is an event-driven approach to connectivity resilience.

More event-driven computing on AWS

In addition to SNS, event-driven computing is also addressed by Amazon CloudWatch Events, which delivers a near real-time stream of system events that describe changes in AWS resources. With CloudWatch Events, you can route each event type to one or more targets, including:

Many AWS services publish events to CloudWatch. As an example, you can get CloudWatch Events to capture events on your ETL (Extract, Transform, Load) jobs running on AWS Glue and push failed ones to an SQS queue, so that you can retry them later.

Conclusion

Amazon SNS is a pub/sub messaging service that can be used as an event-driven computing hub to AWS customers worldwide. By capturing events natively triggered by AWS services, such as EC2, S3 and RDS, you can automate and optimize all kinds of workflows, namely scaling, testing, encoding, profiling, broadcasting, discovery, failover, and much more. Business use cases presented in this post ranged from recruiting websites, to scientific research, geographic systems, social networks, retail websites, and news portals.

Start now by visiting Amazon SNS in the AWS Management Console, or by trying the AWS 10-Minute Tutorial, Send Fan-out Event Notifications with Amazon SNS and Amazon SQS.

 

Capturing Custom, High-Resolution Metrics from Containers Using AWS Step Functions and AWS Lambda

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/capturing-custom-high-resolution-metrics-from-containers-using-aws-step-functions-and-aws-lambda/

Contributed by Trevor Sullivan, AWS Solutions Architect

When you deploy containers with Amazon ECS, are you gathering all of the key metrics so that you can correctly monitor the overall health of your ECS cluster?

By default, ECS writes metrics to Amazon CloudWatch in 5-minute increments. For complex or large services, this may not be sufficient to make scaling decisions quickly. You may want to respond immediately to changes in workload or to identify application performance problems. Last July, CloudWatch announced support for high-resolution metrics, up to a per-second basis.

These high-resolution metrics can be used to give you a clearer picture of the load and performance for your applications, containers, clusters, and hosts. In this post, I discuss how you can use AWS Step Functions, along with AWS Lambda, to cost effectively record high-resolution metrics into CloudWatch. You implement this solution using a serverless architecture, which keeps your costs low and makes it easier to troubleshoot the solution.

To show how this works, you retrieve some useful metric data from an ECS cluster running in the same AWS account and region (Oregon, us-west-2) as the Step Functions state machine and Lambda function. However, you can use this architecture to retrieve any custom application metrics from any resource in any AWS account and region.

Why Step Functions?

Step Functions enables you to orchestrate multi-step tasks in the AWS Cloud that run for any period of time, up to a year. Effectively, you’re building a blueprint for an end-to-end process. After it’s built, you can execute the process as many times as you want.

For this architecture, you gather metrics from an ECS cluster, every five seconds, and then write the metric data to CloudWatch. After your ECS cluster metrics are stored in CloudWatch, you can create CloudWatch alarms to notify you. An alarm can also trigger an automated remediation activity such as scaling ECS services, when a metric exceeds a threshold defined by you.

When you build a Step Functions state machine, you define the different states inside it as JSON objects. The bulk of the work in Step Functions is handled by the common task state, which invokes Lambda functions or Step Functions activities. There is also a built-in library of other useful states that allow you to control the execution flow of your program.

One of the most useful state types in Step Functions is the parallel state. Each parallel state in your state machine can have one or more branches, each of which is executed in parallel. Another useful state type is the wait state, which waits for a period of time before moving to the next state.

In this walkthrough, you combine these three states (parallel, wait, and task) to create a state machine that triggers a Lambda function, which then gathers metrics from your ECS cluster.

Step Functions pricing

This state machine is executed every minute, resulting in 60 executions per hour, and 1,440 executions per day. Step Functions is billed per state transition, including the Start and End state transitions, and giving you approximately 37,440 state transitions per day. To reach this number, I’m using this estimated math:

26 state transitions per-execution x 60 minutes x 24 hours

Based on current pricing, at $0.000025 per state transition, the daily cost of this metric gathering state machine would be $0.936.

Step Functions offers an indefinite 4,000 free state transitions every month. This benefit is available to all customers, not just customers who are still under the 12-month AWS Free Tier. For more information and cost example scenarios, see Step Functions pricing.

Why Lambda?

The goal is to capture metrics from an ECS cluster, and write the metric data to CloudWatch. This is a straightforward, short-running process that makes Lambda the perfect place to run your code. Lambda is one of the key services that makes up “Serverless” application architectures. It enables you to consume compute capacity only when your code is actually executing.

The process of gathering metric data from ECS and writing it to CloudWatch takes a short period of time. In fact, my average Lambda function execution time, while developing this post, is only about 250 milliseconds on average. For every five-second interval that occurs, I’m only using 1/20th of the compute time that I’d otherwise be paying for.

Lambda pricing

For billing purposes, Lambda execution time is rounded up to the nearest 100-ms interval. In general, based on the metrics that I observed during development, a 250-ms runtime would be billed at 300 ms. Here, I calculate the cost of this Lambda function executing on a daily basis.

Assuming 31 days in each month, there would be 535,680 five-second intervals (31 days x 24 hours x 60 minutes x 12 five-second intervals = 535,680). The Lambda function is invoked every five-second interval, by the Step Functions state machine, and runs for a 300-ms period. At current Lambda pricing, for a 128-MB function, you would be paying approximately the following:

Total compute

Total executions = 535,680
Total compute = total executions x (3 x $0.000000208 per 100 ms) = $0.334 per day

Total requests

Total requests = (535,680 / 1000000) * $0.20 per million requests = $0.11 per day

Total Lambda Cost

$0.11 requests + $0.334 compute time = $0.444 per day

Similar to Step Functions, Lambda offers an indefinite free tier. For more information, see Lambda Pricing.

Walkthrough

In the following sections, I step through the process of configuring the solution just discussed. If you follow along, at a high level, you will:

  • Configure an IAM role and policy
  • Create a Step Functions state machine to control metric gathering execution
  • Create a metric-gathering Lambda function
  • Configure a CloudWatch Events rule to trigger the state machine
  • Validate the solution

Prerequisites

You should already have an AWS account with a running ECS cluster. If you don’t have one running, you can easily deploy a Docker container on an ECS cluster using the AWS Management Console. In the example produced for this post, I use an ECS cluster running Windows Server (currently in beta), but either a Linux or Windows Server cluster works.

Create an IAM role and policy

First, create an IAM role and policy that enables Step Functions, Lambda, and CloudWatch to communicate with each other.

  • The CloudWatch Events rule needs permissions to trigger the Step Functions state machine.
  • The Step Functions state machine needs permissions to trigger the Lambda function.
  • The Lambda function needs permissions to query ECS and then write to CloudWatch Logs and metrics.

When you create the state machine, Lambda function, and CloudWatch Events rule, you assign this role to each of those resources. Upon execution, each of these resources assumes the specified role and executes using the role’s permissions.

  1. Open the IAM console.
  2. Choose Roles, create New Role.
  3. For Role Name, enter WriteMetricFromStepFunction.
  4. Choose Save.

Create the IAM role trust relationship
The trust relationship (also known as the assume role policy document) for your IAM role looks like the following JSON document. As you can see from the document, your IAM role needs to trust the Lambda, CloudWatch Events, and Step Functions services. By configuring your role to trust these services, they can assume this role and inherit the role permissions.

  1. Open the IAM console.
  2. Choose Roles and select the IAM role previously created.
  3. Choose Trust RelationshipsEdit Trust Relationships.
  4. Enter the following trust policy text and choose Save.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "events.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "states.us-west-2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Create an IAM policy

After you’ve finished configuring your role’s trust relationship, grant the role access to the other AWS resources that make up the solution.

The IAM policy is what gives your IAM role permissions to access various resources. You must whitelist explicitly the specific resources to which your role has access, because the default IAM behavior is to deny access to any AWS resources.

I’ve tried to keep this policy document as generic as possible, without allowing permissions to be too open. If the name of your ECS cluster is different than the one in the example policy below, make sure that you update the policy document before attaching it to your IAM role. You can attach this policy as an inline policy, instead of creating the policy separately first. However, either approach is valid.

  1. Open the IAM console.
  2. Select the IAM role, and choose Permissions.
  3. Choose Add in-line policy.
  4. Choose Custom Policy and then enter the following policy. The inline policy name does not matter.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [ "logs:*" ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [ "cloudwatch:PutMetricData" ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [ "states:StartExecution" ],
            "Resource": [
                "arn:aws:states:*:*:stateMachine:WriteMetricFromStepFunction"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [ "lambda:InvokeFunction" ],
            "Resource": "arn:aws:lambda:*:*:function:WriteMetricFromStepFunction"
        },
        {
            "Effect": "Allow",
            "Action": [ "ecs:Describe*" ],
            "Resource": "arn:aws:ecs:*:*:cluster/ECSEsgaroth"
        }
    ]
}

Create a Step Functions state machine

In this section, you create a Step Functions state machine that invokes the metric-gathering Lambda function every five (5) seconds, for a one-minute period. If you divide a minute (60) seconds into equal parts of five-second intervals, you get 12. Based on this math, you create 12 branches, in a single parallel state, in the state machine. Each branch triggers the metric-gathering Lambda function at a different five-second marker, throughout the one-minute period. After all of the parallel branches finish executing, the Step Functions execution completes and another begins.

Follow these steps to create your Step Functions state machine:

  1. Open the Step Functions console.
  2. Choose DashboardCreate State Machine.
  3. For State Machine Name, enter WriteMetricFromStepFunction.
  4. Enter the state machine code below into the editor. Make sure that you insert your own AWS account ID for every instance of “676655494xxx”
  5. Choose Create State Machine.
  6. Select the WriteMetricFromStepFunction IAM role that you previously created.
{
    "Comment": "Writes ECS metrics to CloudWatch every five seconds, for a one-minute period.",
    "StartAt": "ParallelMetric",
    "States": {
      "ParallelMetric": {
        "Type": "Parallel",
        "Branches": [
          {
            "StartAt": "WriteMetricLambda",
            "States": {
             	"WriteMetricLambda": {
                  "Type": "Task",
				  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitFive",
            "States": {
            	"WaitFive": {
            		"Type": "Wait",
            		"Seconds": 5,
            		"Next": "WriteMetricLambdaFive"
          		},
             	"WriteMetricLambdaFive": {
                  "Type": "Task",
				  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitTen",
            "States": {
            	"WaitTen": {
            		"Type": "Wait",
            		"Seconds": 10,
            		"Next": "WriteMetricLambda10"
          		},
             	"WriteMetricLambda10": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitFifteen",
            "States": {
            	"WaitFifteen": {
            		"Type": "Wait",
            		"Seconds": 15,
            		"Next": "WriteMetricLambda15"
          		},
             	"WriteMetricLambda15": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait20",
            "States": {
            	"Wait20": {
            		"Type": "Wait",
            		"Seconds": 20,
            		"Next": "WriteMetricLambda20"
          		},
             	"WriteMetricLambda20": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait25",
            "States": {
            	"Wait25": {
            		"Type": "Wait",
            		"Seconds": 25,
            		"Next": "WriteMetricLambda25"
          		},
             	"WriteMetricLambda25": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait30",
            "States": {
            	"Wait30": {
            		"Type": "Wait",
            		"Seconds": 30,
            		"Next": "WriteMetricLambda30"
          		},
             	"WriteMetricLambda30": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait35",
            "States": {
            	"Wait35": {
            		"Type": "Wait",
            		"Seconds": 35,
            		"Next": "WriteMetricLambda35"
          		},
             	"WriteMetricLambda35": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait40",
            "States": {
            	"Wait40": {
            		"Type": "Wait",
            		"Seconds": 40,
            		"Next": "WriteMetricLambda40"
          		},
             	"WriteMetricLambda40": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait45",
            "States": {
            	"Wait45": {
            		"Type": "Wait",
            		"Seconds": 45,
            		"Next": "WriteMetricLambda45"
          		},
             	"WriteMetricLambda45": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait50",
            "States": {
            	"Wait50": {
            		"Type": "Wait",
            		"Seconds": 50,
            		"Next": "WriteMetricLambda50"
          		},
             	"WriteMetricLambda50": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait55",
            "States": {
            	"Wait55": {
            		"Type": "Wait",
            		"Seconds": 55,
            		"Next": "WriteMetricLambda55"
          		},
             	"WriteMetricLambda55": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          }
        ],
        "End": true
      }
  }
}

Now you’ve got a shiny new Step Functions state machine! However, you might ask yourself, “After the state machine has been created, how does it get executed?” Before I answer that question, create the Lambda function that writes the custom metric, and then you get the end-to-end process moving.

Create a Lambda function

The meaty part of the solution is a Lambda function, written to consume the Python 3.6 runtime, that retrieves metric values from ECS, and then writes them to CloudWatch. This Lambda function is what the Step Functions state machine is triggering every five seconds, via the Task states. Key points to remember:

The Lambda function needs permission to:

  • Write CloudWatch metrics (PutMetricData API).
  • Retrieve metrics from ECS clusters (DescribeCluster API).
  • Write StdOut to CloudWatch Logs.

Boto3, the AWS SDK for Python, is included in the Lambda execution environment for Python 2.x and 3.x.

Because Lambda includes the AWS SDK, you don’t have to worry about packaging it up and uploading it to Lambda. You can focus on writing code and automatically take a dependency on boto3.

As for permissions, you’ve already created the IAM role and attached a policy to it that enables your Lambda function to access the necessary API actions. When you create your Lambda function, make sure that you select the correct IAM role, to ensure it is invoked with the correct permissions.

The following Lambda function code is generic. So how does the Lambda function know which ECS cluster to gather metrics for? Your Step Functions state machine automatically passes in its state to the Lambda function. When you create your CloudWatch Events rule, you specify a simple JSON object that passes the desired ECS cluster name into your Step Functions state machine, which then passes it to the Lambda function.

Use the following property values as you create your Lambda function:

Function Name: WriteMetricFromStepFunction
Description: This Lambda function retrieves metric values from an ECS cluster and writes them to Amazon CloudWatch.
Runtime: Python3.6
Memory: 128 MB
IAM Role: WriteMetricFromStepFunction

import boto3

def handler(event, context):
    cw = boto3.client('cloudwatch')
    ecs = boto3.client('ecs')
    print('Got boto3 client objects')
    
    Dimension = {
        'Name': 'ClusterName',
        'Value': event['ECSClusterName']
    }

    cluster = get_ecs_cluster(ecs, Dimension['Value'])
    
    cw_args = {
       'Namespace': 'ECS',
       'MetricData': [
           {
               'MetricName': 'RunningTask',
               'Dimensions': [ Dimension ],
               'Value': cluster['runningTasksCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'PendingTask',
               'Dimensions': [ Dimension ],
               'Value': cluster['pendingTasksCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'ActiveServices',
               'Dimensions': [ Dimension ],
               'Value': cluster['activeServicesCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'RegisteredContainerInstances',
               'Dimensions': [ Dimension ],
               'Value': cluster['registeredContainerInstancesCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           }
        ]
    }
    cw.put_metric_data(**cw_args)
    print('Finished writing metric data')
    
def get_ecs_cluster(client, cluster_name):
    cluster = client.describe_clusters(clusters = [ cluster_name ])
    print('Retrieved cluster details from ECS')
    return cluster['clusters'][0]

Create the CloudWatch Events rule

Now you’ve created an IAM role and policy, Step Functions state machine, and Lambda function. How do these components actually start communicating with each other? The final step in this process is to set up a CloudWatch Events rule that triggers your metric-gathering Step Functions state machine every minute. You have two choices for your CloudWatch Events rule expression: rate or cron. In this example, use the cron expression.

A couple key learning points from creating the CloudWatch Events rule:

  • You can specify one or more targets, of different types (for example, Lambda function, Step Functions state machine, SNS topic, and so on).
  • You’re required to specify an IAM role with permissions to trigger your target.
    NOTE: This applies only to certain types of targets, including Step Functions state machines.
  • Each target that supports IAM roles can be triggered using a different IAM role, in the same CloudWatch Events rule.
  • Optional: You can provide custom JSON that is passed to your target Step Functions state machine as input.

Follow these steps to create the CloudWatch Events rule:

  1. Open the CloudWatch console.
  2. Choose Events, RulesCreate Rule.
  3. Select Schedule, Cron Expression, and then enter the following rule:
    0/1 * * * ? *
  4. Choose Add Target, Step Functions State MachineWriteMetricFromStepFunction.
  5. For Configure Input, select Constant (JSON Text).
  6. Enter the following JSON input, which is passed to Step Functions, while changing the cluster name accordingly:
    { "ECSClusterName": "ECSEsgaroth" }
  7. Choose Use Existing Role, WriteMetricFromStepFunction (the IAM role that you previously created).

After you’ve completed with these steps, your screen should look similar to this:

Validate the solution

Now that you have finished implementing the solution to gather high-resolution metrics from ECS, validate that it’s working properly.

  1. Open the CloudWatch console.
  2. Choose Metrics.
  3. Choose custom and select the ECS namespace.
  4. Choose the ClusterName metric dimension.

You should see your metrics listed below.

Troubleshoot configuration issues

If you aren’t receiving the expected ECS cluster metrics in CloudWatch, check for the following common configuration issues. Review the earlier procedures to make sure that the resources were properly configured.

  • The IAM role’s trust relationship is incorrectly configured.
    Make sure that the IAM role trusts Lambda, CloudWatch Events, and Step Functions in the correct region.
  • The IAM role does not have the correct policies attached to it.
    Make sure that you have copied the IAM policy correctly as an inline policy on the IAM role.
  • The CloudWatch Events rule is not triggering new Step Functions executions.
    Make sure that the target configuration on the rule has the correct Step Functions state machine and IAM role selected.
  • The Step Functions state machine is being executed, but failing part way through.
    Examine the detailed error message on the failed state within the failed Step Functions execution. It’s possible that the
  • IAM role does not have permissions to trigger the target Lambda function, that the target Lambda function may not exist, or that the Lambda function failed to complete successfully due to invalid permissions.
    Although the above list covers several different potential configuration issues, it is not comprehensive. Make sure that you understand how each service is connected to each other, how permissions are granted through IAM policies, and how IAM trust relationships work.

Conclusion

In this post, you implemented a Serverless solution to gather and record high-resolution application metrics from containers running on Amazon ECS into CloudWatch. The solution consists of a Step Functions state machine, Lambda function, CloudWatch Events rule, and an IAM role and policy. The data that you gather from this solution helps you rapidly identify issues with an ECS cluster.

To gather high-resolution metrics from any service, modify your Lambda function to gather the correct metrics from your target. If you prefer not to use Python, you can implement a Lambda function using one of the other supported runtimes, including Node.js, Java, or .NET Core. However, this post should give you the fundamental basics about capturing high-resolution metrics in CloudWatch.

If you found this post useful, or have questions, please comment below.

AWS Developer Tools Expands Integration to Include GitHub

Post Syndicated from Balaji Iyer original https://aws.amazon.com/blogs/devops/aws-developer-tools-expands-integration-to-include-github/

AWS Developer Tools is a set of services that include AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, and AWS CodeDeploy. Together, these services help you securely store and maintain version control of your application’s source code and automatically build, test, and deploy your application to AWS or your on-premises environment. These services are designed to enable developers and IT professionals to rapidly and safely deliver software.

As part of our continued commitment to extend the AWS Developer Tools ecosystem to third-party tools and services, we’re pleased to announce AWS CodeStar and AWS CodeBuild now integrate with GitHub. This will make it easier for GitHub users to set up a continuous integration and continuous delivery toolchain as part of their release process using AWS Developer Tools.

In this post, I will walk through the following:

Prerequisites:

You’ll need an AWS account, a GitHub account, an Amazon EC2 key pair, and administrator-level permissions for AWS Identity and Access Management (IAM), AWS CodeStar, AWS CodeBuild, AWS CodePipeline, Amazon EC2, Amazon S3.

 

Integrating GitHub with AWS CodeStar

AWS CodeStar enables you to quickly develop, build, and deploy applications on AWS. Its unified user interface helps you easily manage your software development activities in one place. With AWS CodeStar, you can set up your entire continuous delivery toolchain in minutes, so you can start releasing code faster.

When AWS CodeStar launched in April of this year, it used AWS CodeCommit as the hosted source repository. You can now choose between AWS CodeCommit or GitHub as the source control service for your CodeStar projects. In addition, your CodeStar project dashboard lets you centrally track GitHub activities, including commits, issues, and pull requests. This makes it easy to manage project activity across the components of your CI/CD toolchain. Adding the GitHub dashboard view will simplify development of your AWS applications.

In this section, I will show you how to use GitHub as the source provider for your CodeStar projects. I’ll also show you how to work with recent commits, issues, and pull requests in the CodeStar dashboard.

Sign in to the AWS Management Console and from the Services menu, choose CodeStar. In the CodeStar console, choose Create a new project. You should see the Choose a project template page.

CodeStar Project

Choose an option by programming language, application category, or AWS service. I am going to choose the Ruby on Rails web application that will be running on Amazon EC2.

On the Project details page, you’ll now see the GitHub option. Type a name for your project, and then choose Connect to GitHub.

Project details

You’ll see a message requesting authorization to connect to your GitHub repository. When prompted, choose Authorize, and then type your GitHub account password.

Authorize

This connects your GitHub identity to AWS CodeStar through OAuth. You can always review your settings by navigating to your GitHub application settings.

Installed GitHub Apps

You’ll see AWS CodeStar is now connected to GitHub:

Create project

You can choose a public or private repository. GitHub offers free accounts for users and organizations working on public and open source projects and paid accounts that offer unlimited private repositories and optional user management and security features.

In this example, I am going to choose the public repository option. Edit the repository description, if you like, and then choose Next.

Review your CodeStar project details, and then choose Create Project. On Choose an Amazon EC2 Key Pair, choose Create Project.

Key Pair

On the Review project details page, you’ll see Edit Amazon EC2 configuration. Choose this link to configure instance type, VPC, and subnet options. AWS CodeStar requires a service role to create and manage AWS resources and IAM permissions. This role will be created for you when you select the AWS CodeStar would like permission to administer AWS resources on your behalf check box.

Choose Create Project. It might take a few minutes to create your project and resources.

Review project details

When you create a CodeStar project, you’re added to the project team as an owner. If this is the first time you’ve used AWS CodeStar, you’ll be asked to provide the following information, which will be shown to others:

  • Your display name.
  • Your email address.

This information is used in your AWS CodeStar user profile. User profiles are not project-specific, but they are limited to a single AWS region. If you are a team member in projects in more than one region, you’ll have to create a user profile in each region.

User settings

User settings

Choose Next. AWS CodeStar will create a GitHub repository with your configuration settings (for example, https://github.com/biyer/ruby-on-rails-service).

When you integrate your integrated development environment (IDE) with AWS CodeStar, you can continue to write and develop code in your preferred environment. The changes you make will be included in the AWS CodeStar project each time you commit and push your code.

IDE

After setting up your IDE, choose Next to go to the CodeStar dashboard. Take a few minutes to familiarize yourself with the dashboard. You can easily track progress across your entire software development process, from your backlog of work items to recent code deployments.

Dashboard

After the application deployment is complete, choose the endpoint that will display the application.

Pipeline

This is what you’ll see when you open the application endpoint:

The Commit history section of the dashboard lists the commits made to the Git repository. If you choose the commit ID or the Open in GitHub option, you can use a hotlink to your GitHub repository.

Commit history

Your AWS CodeStar project dashboard is where you and your team view the status of your project resources, including the latest commits to your project, the state of your continuous delivery pipeline, and the performance of your instances. This information is displayed on tiles that are dedicated to a particular resource. To see more information about any of these resources, choose the details link on the tile. The console for that AWS service will open on the details page for that resource.

Issues

You can also filter issues based on their status and the assigned user.

Filter

AWS CodeBuild Now Supports Building GitHub Pull Requests

CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers. CodeBuild scales continuously and processes multiple builds concurrently, so your builds are not left waiting in a queue. You can use prepackaged build environments to get started quickly or you can create custom build environments that use your own build tools.

We recently announced support for GitHub pull requests in AWS CodeBuild. This functionality makes it easier to collaborate across your team while editing and building your application code with CodeBuild. You can use the AWS CodeBuild or AWS CodePipeline consoles to run AWS CodeBuild. You can also automate the running of AWS CodeBuild by using the AWS Command Line Interface (AWS CLI), the AWS SDKs, or the AWS CodeBuild Plugin for Jenkins.

AWS CodeBuild

In this section, I will show you how to trigger a build in AWS CodeBuild with a pull request from GitHub through webhooks.

Open the AWS CodeBuild console at https://console.aws.amazon.com/codebuild/. Choose Create project. If you already have a CodeBuild project, you can choose Edit project, and then follow along. CodeBuild can connect to AWS CodeCommit, S3, BitBucket, and GitHub to pull source code for builds. For Source provider, choose GitHub, and then choose Connect to GitHub.

Configure

After you’ve successfully linked GitHub and your CodeBuild project, you can choose a repository in your GitHub account. CodeBuild also supports connections to any public repository. You can review your settings by navigating to your GitHub application settings.

GitHub Apps

On Source: What to Build, for Webhook, select the Rebuild every time a code change is pushed to this repository check box.

Note: You can select this option only if, under Repository, you chose Use a repository in my account.

Source

In Environment: How to build, for Environment image, select Use an image managed by AWS CodeBuild. For Operating system, choose Ubuntu. For Runtime, choose Base. For Version, choose the latest available version. For Build specification, you can provide a collection of build commands and related settings, in YAML format (buildspec.yml) or you can override the build spec by inserting build commands directly in the console. AWS CodeBuild uses these commands to run a build. In this example, the output is the string “hello.”

Environment

On Artifacts: Where to put the artifacts from this build project, for Type, choose No artifacts. (This is also the type to choose if you are just running tests or pushing a Docker image to Amazon ECR.) You also need an AWS CodeBuild service role so that AWS CodeBuild can interact with dependent AWS services on your behalf. Unless you already have a role, choose Create a role, and for Role name, type a name for your role.

Artifacts

In this example, leave the advanced settings at their defaults.

If you expand Show advanced settings, you’ll see options for customizing your build, including:

  • A build timeout.
  • A KMS key to encrypt all the artifacts that the builds for this project will use.
  • Options for building a Docker image.
  • Elevated permissions during your build action (for example, accessing Docker inside your build container to build a Dockerfile).
  • Resource options for the build compute type.
  • Environment variables (built-in or custom). For more information, see Create a Build Project in the AWS CodeBuild User Guide.

Advanced settings

You can use the AWS CodeBuild console to create a parameter in Amazon EC2 Systems Manager. Choose Create a parameter, and then follow the instructions in the dialog box. (In that dialog box, for KMS key, you can optionally specify the ARN of an AWS KMS key in your account. Amazon EC2 Systems Manager uses this key to encrypt the parameter’s value during storage and decrypt during retrieval.)

Create parameter

Choose Continue. On the Review page, either choose Save and build or choose Save to run the build later.

Choose Start build. When the build is complete, the Build logs section should display detailed information about the build.

Logs

To demonstrate a pull request, I will fork the repository as a different GitHub user, make commits to the forked repo, check in the changes to a newly created branch, and then open a pull request.

Pull request

As soon as the pull request is submitted, you’ll see CodeBuild start executing the build.

Build

GitHub sends an HTTP POST payload to the webhook’s configured URL (highlighted here), which CodeBuild uses to download the latest source code and execute the build phases.

Build project

If you expand the Show all checks option for the GitHub pull request, you’ll see that CodeBuild has completed the build, all checks have passed, and a deep link is provided in Details, which opens the build history in the CodeBuild console.

Pull request

Summary:

In this post, I showed you how to use GitHub as the source provider for your CodeStar projects and how to work with recent commits, issues, and pull requests in the CodeStar dashboard. I also showed you how you can use GitHub pull requests to automatically trigger a build in AWS CodeBuild — specifically, how this functionality makes it easier to collaborate across your team while editing and building your application code with CodeBuild.


About the author:

Balaji Iyer is an Enterprise Consultant for the Professional Services Team at Amazon Web Services. In this role, he has helped several customers successfully navigate their journey to AWS. His specialties include architecting and implementing highly scalable distributed systems, serverless architectures, large scale migrations, operational security, and leading strategic AWS initiatives. Before he joined Amazon, Balaji spent more than a decade building operating systems, big data analytics solutions, mobile services, and web applications. In his spare time, he enjoys experiencing the great outdoors and spending time with his family.

 

Take the Journey: Build Your First Serverless Web Application

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/build-your-first-serverless-application/

I realized at a young age that I really liked writing those special statements that would control the computer and make it work in the manner in which I desired. This technique of controlling the computer and building things on the machine, I learned from my teachers was called writing code, and it fascinated me. Even now, what seems like centuries later, I still get the thrill of writing code, building cool solutions, and tackling all the associated challenges of this craft. It is no wonder then, that I am a huge fan of serverless computing and serverless architectures.

Serverless Computing allows me to do what I enjoy, which is write code, without having to provision and/or configure servers. Using the AWS Serverless Platform means that all the heavy lifting of server management is handled by AWS, allowing you to focus on building your application.

If you enjoy coding like I do and have yet to dive into building serverless applications, boy do I have some sensational news for you. You can build your own serverless web application with our new Serverless Web Application Guide, which provides step-by-step instructions for you to create and deploy your serverless web application on AWS.

 

The Serverless Web Application Guide is a hands-on tutorial that will assist you in building a fully scalable, serverless web application using the following AWS Services:

  • AWS Lambda: a managed service for serverless compute that allows you to run code without provisioning or managing servers
  • Amazon S3: a managed service that provides simple, durable, scalable object storage
  • Amazon Cognito: a managed service that allows you to add user sign-up, and data synchronization to your application
  • Amazon API Gateway: a managed service which you can create, publish, and maintain secure APIs
  • Amazon DynamoDB: a fast and flexible NoSQL managed cloud database with support for various document and key-value storage models

The application you will build is a simple web application designed for a fictional transportation service. The application will enable users to register and login into the website to request rides from a very unique transportation fleet. You will accomplish this by using the aforementioned AWS services with the serverless application architecture shown in the diagram below.

 
The guide breaks up the each step to build your serverless web application into five separate modules.

 

  1. Static Web Hosting: Amazon S3 hosts static web resources including HTML, CSS, JavaScript, and image files that are loaded in the user’s browser.
  2. User Management: Amazon Cognito provides user management and authentication functions to secure the backend API.
  3. Serverless Backend: Amazon DynamoDB provides a persistence layer where data can be stored by the API’s Lambda function.
  4. RESTful APIs: JavaScript executed in the browser sends and receives data from a public backend API built using AWS Lambda and API Gateway.
  5. Resource Cleanup: All the resources created throughout the tutorial will be terminated.

To be successful in building the application, you must remember to complete each module in sequential order, as the modules are dependent on resources created in the previous one. Some of the guide’s modules provide CloudFormation templates to aid you in generating the necessary resources to build the application if you do not wish to create them manually.

 

Summary

Now that you know all about this fantastic new guide for building a serverless web application, you are ready to journey into the world of AWS serverless computing and have some fun writing the code to build the application. The guide is great for beginners and yet still has cool features that even seasoned serverless computing developers will enjoy building. And to top it off, you don’t have to worry about the cost. Each service used is eligible for the AWS Free Tier and is only estimated to cost less than $0.25 if you are outside of Free Tier usage limits.

Take the plunge today and dive into building serverless applications on the AWS serverless platform with this new and exciting Serverless Web Application Guide.

 

Tara

Analysis of Top-N DynamoDB Objects using Amazon Athena and Amazon QuickSight

Post Syndicated from Rendy Oka original https://aws.amazon.com/blogs/big-data/analysis-of-top-n-dynamodb-objects-using-amazon-athena-and-amazon-quicksight/

If you run an operation that continuously generates a large amount of data, you may want to know what kind of data is being inserted by your application. The ability to analyze data intake quickly can be very valuable for business units, such as operations and marketing. For many operations, it’s important to see what is driving the business at any particular moment. For retail companies, for example, understanding which products are currently popular can aid in planning for future growth. Similarly, for PR companies, understanding the impact of an advertising campaign can help them market their products more effectively.

This post covers an architecture that helps you analyze your streaming data. You’ll build a solution using Amazon DynamoDB Streams, AWS Lambda, Amazon Kinesis Firehose, and Amazon Athena to analyze data intake at a frequency that you choose. And because this is a serverless architecture, you can use all of the services here without the need to provision or manage servers.

The data source

You’ll collect a random sampling of tweets via Twitter’s API and store a variety of attributes in your DynamoDB table, such as: Twitter handle, tweet ID, hashtags, location, and Time-To-Live (TTL) value.

In DynamoDB, the primary key is used as an input to an internal hash function. The output from this function determines the partition in which the data will be stored. When using a combination of primary key and sort key as a DynamoDB schema, you need to make sure that no single partition key contains many more objects than the other partition keys because this can cause partition level throttling. For the demonstration in this blog, the Twitter handle will be the primary key and the tweet ID will be the sort key. This allows you to group and sort tweets from each user.

To help you get started, I have written a script that pulls a live Twitter stream that you can use to generate your data. All you need to do is provide your own Twitter Apps credentials, and it should generate the data immediately. Alternatively, I have also provided a script that you can use to generate random Tweets with little effort.

You can find both scripts in the Github repository:

https://github.com/awslabs/aws-blog-dynamodb-analysis

There are some modules that you may need to install to run these scripts. You can find them in Python’s module repository:

To get your own Twitter credentials, go to https://www.twitter.com/ and sign up for a free account, if you don’t already have one. After your account is set up, go to https://apps.twitter.com/. On the main landing page, choose the Create New App button. After the application is created, go to Keys and Access Tokens to get your credentials to use the Twitter API. You’ll need to generate Customer Tokens/Secret and Access Token/Secret. All four keys will be used to authenticate your request.

Architecture overview

Before we begin, let’s take a look at the overall flow of information will look like, from data ingestion into DynamoDB to visualization of results in Amazon QuickSight.

As illustrated in the architecture diagram above, any changes made to the items in DynamoDB will be captured and processed using DynamoDB Streams. Next, a Lambda function will be invoked by a trigger that is configured to respond to events in DynamoDB Streams. The Lambda function processes the data prior to pushing to Amazon Kinesis Firehose, which will output to Amazon S3. Finally, you use Amazon Athena to analyze the streaming data landing in Amazon S3. The result can be explored and visualized in Amazon QuickSight for your company’s business analytics.

You’ll need to implement your custom Lambda function to help transform the raw <key, value> data stored in DynamoDB to a JSON format for Athena to digest, but I can help you with a sample code that you are free to modify.

Implementation

In the following sections, I’ll walk through how you can set up the architecture discussed earlier.

Create your DynamoDB table

First, let’s create a DynamoDB table and enable DynamoDB Streams. This will enable data to be copied out of this table. From the console, use the user_id as the partition key and tweet_id as the sort key:

After the table is ready, you can enable DynamoDB Streams. This process operates asynchronously, so there is no performance impact on the table when you enable this feature. The easiest way to manage DynamoDB Streams is also through the DynamoDB console.

In the Overview tab of your newly created table, click Manage Stream. In the window, choose the information that will be written to the stream whenever data in the table is added or modified. In this example, you can choose either New image or New and old images.

For more details on this process, check out our documentation:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html

Configure Kinesis Firehose

Before creating the Lambda function, you need to configure Kinesis Firehose delivery stream so that it’s ready to accept data from Lambda. Open the Firehose console and choose Create Firehose Delivery Stream. From here, choose S3 as the destination and use the following to information to configure the resource. Note the Delivery stream name because you will use it in the next step.

For more details on this process, check out our documentation:

http://docs.aws.amazon.com/firehose/latest/dev/basic-create.html#console-to-s3

Create your Lambda function

Now that Kinesis Firehose is ready to accept data, you can create your Lambda function.

From the AWS Lambda console, choose the Create a Lambda function button and use the Blank Function. Enter a name and description, and choose Python 2.7 as the Runtime. Note your Lambda function name because you’ll need it in the next step.

In the Lambda function code field, you can paste the script that I have written for this purpose. All this function needs is the name of your Firehose stream name set as an environment variable.

import boto3
import json
import os

# Initiate Firehose client
firehose_client = boto3.client('firehose')

def lambda_handler(event, context):
    records = []
    batch   = []
    try :
        for record in event['Records']:
            tweet = {}
            t_stats = '{ "table_name":"%s", "user_id":"%s", "tweet_id":"%s", "approx_post_time":"%d" }\n' \
                      % ( record['eventSourceARN'].split('/')[1], \
                          record['dynamodb']['Keys']['user_id']['S'], \
                          record['dynamodb']['Keys']['tweet_id']['N'], \
                          int(record['dynamodb']['ApproximateCreationDateTime']) )
            tweet["Data"] = t_stats
            records.append(tweet)
        batch.append(records)
        res = firehose_client.put_record_batch(
            DeliveryStreamName = os.environ['firehose_stream_name'],
            Records = batch[0]
        )
        return 'Successfully processed {} records.'.format(len(event['Records']))
    except Exception :
        pass

The handler should be set to lambda_function.lambda_handler and you can use the existing lambda_dynamodb_streams role that’s been created by default.

Enable DynamoDB trigger and start collecting data

Everything is ready to go. Open your table using the DynamoDB console and go to the Triggers tab. Select the Create trigger drop down list and choose Existing Lambda function. In the pop-up window, select the function that you just created, and choose the Create button.

At this point, you can start collecting data with the Python script that I’ve provided. The first one will create a script that will pull public Twitter data and the other will generate fake tweets using Lorem Ipsum text.

Configure Amazon Athena to read the data

Next, you will configure Amazon Athena so that it can read the data Kinesis Firehose outputs to Amazon S3 and allow you to analyze the data as needed. You can connect to Athena directly from the Athena console, and you can establish a connection using JDBC or the Athena API. In this example, I’m going to demonstrate what this looks like on the Athena console.

First, create a new database and a new table. You can do this by running the following two queries. The first query creates a new database:

CREATE DATABASE IF NOT EXISTS ddbtablestats

And the second query creates a new table:

CREATE EXTERNAL TABLE IF NOT EXISTS ddbtablestats.twitterfeed (
    `table_name` string,
    `user_id` string,
    `tweet_id` bigint,
    `approx_post_time` timestamp 
) PARTITIONED BY (
    year string,
    month string,
    day string,
    hour string 
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1')
LOCATION 's3://myBucket/dynamodb/streams/transactions/'

Note that this table is created using partitions. Partitioning separates your data into logical parts based on certain criteria, such as date, location, language, etc. This allows Athena to selectively pull your data without needing to process the entire data set. This effectively minimizes the query execution time, and it also allows you to have greater control over the data that you want to query.

After the query has completed, you should be able to see the table in the left side pane of the Athena dashboard.

After the database and table have been created, execute the ALTER TABLE query to populate the partitions in your table. Replace the date with the current date when the script was executed.

ALTER TABLE ddbtablestats.TwitterFeed ADD IF NOT EXISTS
PARTITION (year='2017',month='05',day='17',hour='01') location 's3://myBucket/dynamodb/streams/transactions/2017/05/17/01/'

Using the Athena console, you’ll need to manually populate each partition for each additional partition that you’d like to analyze, however you can programmatically automate this process by using the JDBC driver or any AWS SDK of your choice.

For more information on partitioning in Athena, check out our documentation:

http://docs.aws.amazon.com/athena/latest/ug/partitions.html

Querying the data in Amazon Athena

This is it! Let’s run this query to see the top 10 most active Twitter users in the last 24 hours. You can do this from the Athena console:

SELECT user_id, COUNT(DISTINCT tweet_id) tweets FROM ddbTableStats.TwitterFeed
WHERE year='2017' AND month='05' AND day='17'
GROUP BY user_id
ORDER BY tweets DESC
LIMIT 10

The result should look similar to the following:

Linking Athena to Amazon QuickSight

Finally, to make this data available to a larger audience, let’s visualize this data in Amazon QuickSight. Amazon QuickSight provides native connectivity to AWS data sources such as Amazon Redshift, Amazon RDS, and Amazon Athena. Amazon QuickSight can also connect to on-premises databases, Excel, or CSV files, and it can connect to cloud data sources such as Salesforce.com. For this solution, we will connect Amazon QuickSight to the Athena table we just created.

Amazon QuickSight has a free tier that provides 1 user and 1GB of SPICE (Superfast Parallel In-memory Calculated Engine) capacity free. So you can sign up and use QuickSight free of charge.

When you are signing up for Amazon QuickSight, ensure that you grant permissions for QuickSight to connect to Athena and the S3 bucket where the data is stored.

After you’ve signed up, navigate to the new analysis button, and choose new data set, and then select the Athena data source option. Create a new name for your data source and proceed to the next prompt. At this point, you should see the Athena table you created earlier.

Choose the option to import the data to SPICE for a quicker analysis. SPICE is an in-memory optimized calculation engine that is designed for quick data visualization through parallel processing. SPICE also enables you to refresh your data sets at a regular interval or on-demand as you want.

In the dialog box, confirm this data set creation, and you’ll arrive on the landing page where you can start building your graph. The X-axis will represent the user_id and the Value will be used to represent the SUM total of the tweets from each user.

The Amazon QuickSight report looks like this:

Through this visualization, I can easily see that there are 3 users that tweeted over 20 times that day and that the majority of the users have fewer than 10 tweets that day. I can also set up a scheduled refresh of my SPICE dataset so that I have a dashboard that is regularly updated with the latest data.

Closing thoughts

Here are the benefits that you can gain from using this architecture:

  1. You can optimize the design of your DynamoDB schema that follows AWS best practice recommendations.
  1. You can run analysis and data intelligence in order to understand the current customer demands for your business.
  1. You can store incremental backup for future auditing.

The flexibility of our AWS services invites you to create and design the ideal workflow for your production at any scale, and, as always, if you ever need some guidance, don’t hesitate to reach out to us.I  hope this has been helpful to you! Please leave any questions and comments below.

 


Additional Reading

Learn how to analyze VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight.


About the Author

Rendy Oka is a Big Data Support Engineer for Amazon Web Services. He provides consultations and architectural designs and partners with the TAMs, Solution Architects, and AWS product teams to help develop solutions for our customers. He is also a team lead for the big data support team in Seattle. Rendy has traveled to dozens of countries around the world and takes every opportunity to experience the local culture wherever he goes

 

 

 

 

Introducing Our NEW AWS Community Heroes (Summer 2017 Edition)

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/introducing-our-new-aws-community-heroes-summer-2017-edition/

The AWS Community Heroes program seeks to recognize and honor the most engaged Amazon Web Services developers who have had a positive impact in the global community.  If you are interested in learning more about the AWS Community Heroes program or curious about ways to get involved with your local AWS community, please click the graphic below to see the AWS Heroes talk directly about the program.

Now that you know more about the AWS Community Hero program, I am elated to introduce to you all the latest AWS Heroes to join the fold:

These guys and gals impart their passion for AWS and cloud technologies with the technical community by sharing their time and knowledge across social media and via in-person events.

Ben Kehoe

Ben Kehoe works in the field of Cloud Robotics—using the internet to enable robots to do more and better things—an area of IoT involving computation in the cloud and at the edge, Big Data, and machine learning. Approaching cloud computing from this angle, Ben focuses on developing business value rapidly through serverless (and service full) applications.

At iRobot, Ben guided the transition to a serverless architecture on AWS based on AWS Lambda and AWS IoT to support iRobot’s connected robot fleet. This architecture enables iRobot to focus on its core mission of building amazing robots with a minimum of development and operations effort.

Ben seeks to amplify voices from dev, operations, and security to help the community shape the evolution of serverless and event-driven designs for IoT and cloud computing more broadly.

 

 

Marcia Villalba

Marcia is a Senior Full-stack Developer at Rovio, the creators of Angry Birds. She is originally from Uruguay but has been living in Finland for almost a decade.

She has been designing and developing software professionally for over 10 years. For more than four years she has been working with AWS, including the past year which she’s worked mostly with serverless technologies.

Marcia runs her own YouTube channel, in which she publishes at least one new video every week. In her channel, she focuses on teaching how to use AWS serverless technologies and managed services. In addition to her professional work, she is the Tech Lead in “Girls in Tech” Helsinki, helping to inspire more women to enter into technology and programming.

 

 

Joshua Levy

Joshua Levy is an entrepreneur, engineer, writer, and serial startup technologist and advisor in cloud, AI, search, and startup scaling.

He co-founded the Open Guide to AWS, which is one of the most popular AWS resources and communities on the web. The collaborative project welcomes new contributors or editors, and anyone who wishes to ask or answer questions.

Josh has years of experience in hands-on software engineering and leadership at fast-growing consumer and enterprise startups, including Viv Labs (acquired by Samsung) and BloomReach (where he led engineering and AWS infrastructure), and a background in AI and systems research at SRI and mathematics at Berkeley. He has a passion for improving how we share knowledge on complex engineering, product, or business topics. If you share any of these interests, reach out on Twitter or find his contact details on GitHub.

 

Michael Ezzell

Michael Ezzell is a frequent contributor of detailed, in-depth solutions to questions spanning a wide variety of AWS services on Stack Overflow and other sites on the Stack Exchange Network.

Michael is the resident DBA and systems administrator for Online Rewards, a leading provider of web-based employee recognition, channel incentive, and customer loyalty programs, where he was a key player in the company’s full transition to the AWS platform.

Based in Cincinnati, and known to coworkers and associates as “sqlbot,” he also provides design, development, and support services to freelance consulting clients for AWS services and MySQL, as well as, broadcast & cable television and telecommunications technologies.

 

 

 

Thanos Baskous

Thanos Baskous is a San Francisco-based software engineer and entrepreneur who is passionate about designing and building scalable and robust systems.

He co-founded the Open Guide to AWS, which is one of the most popular AWS resources and communities on the web.

At Twitter, he built infrastructure that allows engineers to seamlessly deploy and run their applications across private data centers and public cloud environments. He previously led a team at TellApart (acquired by Twitter) that built an internal platform-as-a-service (Docker, Apache Aurora, Mesos on AWS) in support of a migration from a monolithic application architecture to a microservice-based architecture. Before TellApart, he co-founded AWS-hosted AdStack (acquired by TellApart) in order to automatically personalize and improve the quality of content in marketing emails and email newsletters.

 

 

Rob Gruhl

Rob is a senior engineering manager located in Seattle, WA. He supports a team of talented engineers at Nordstrom Technology exploring and deploying a variety of serverless systems to production.

From the beginning of the serverless era, Rob has been exclusively using serverless architectures to allow a small team of engineers to deliver incredible solutions that scale effortlessly and wake them in the middle of the night rarely. In addition to a number of production services, together with his team Rob has created and released two major open source projects and accompanying open source workshops using a 100% serverless approach. He’d love to talk with you about serverless, event-sourcing, and/or occasionally-connected distributed data layers.

 

Feel free to follow these great AWS Heroes on Twitter and check out their blogs. It is exciting to have them all join the AWS Community Heroes program.

–  Tara

Visualize and Monitor Amazon EC2 Events with Amazon CloudWatch Events and Amazon Kinesis Firehose

Post Syndicated from Karan Desai original https://aws.amazon.com/blogs/big-data/visualize-and-monitor-amazon-ec2-events-with-amazon-cloudwatch-events-and-amazon-kinesis-firehose/

Monitoring your AWS environment is important for security, performance, and cost control purposes. For example, by monitoring and analyzing API calls made to your Amazon EC2 instances, you can trace security incidents and gain insights into administrative behaviors and access patterns. The kinds of events you might monitor include console logins, Amazon EBS snapshot creation/deletion/modification, VPC creation/deletion/modification, and instance reboots, etc.

In this post, I show you how to build a near real-time API monitoring solution for EC2 events using Amazon CloudWatch Events and Amazon Kinesis Firehose. Please be sure to have Amazon CloudTrail enabled in your account.

  • CloudWatch Events offers a near real-time stream of system events that describe changes in AWS resources. CloudWatch Events now supports Kinesis Firehose as a target.
  • Kinesis Firehose is a fully managed service for continuously capturing, transforming, and delivering data in minutes to storage and analytics destinations such as Amazon S3, Amazon Kinesis Analytics, Amazon Redshift, and Amazon Elasticsearch Service.

Walkthrough

For this walkthrough, you create a CloudWatch event rule that matches specific EC2 events such as:

  • Starting, stopping, and terminating an instance
  • Creating and deleting VPC route tables
  • Creating and deleting a security group
  • Creating, deleting, and modifying instance volumes and snapshots

Your CloudWatch event target is a Kinesis Firehose delivery stream that delivers this data to an Elasticsearch cluster, where you set up Kibana for visualization. Using this solution, you can easily load and visualize EC2 events in minutes without setting up complicated data pipelines.

Set up the Elasticsearch cluster

Create the Amazon ES domain in the Amazon ES console, or by using the create-elasticsearch-domain command in the AWS CLI.

This example uses the following configuration:

  • Domain Name: esLogSearch
  • Elasticsearch Version: 1
  • Instance Count: 2
  • Instance type:elasticsearch
  • Enable dedicated master: true
  • Enable zone awareness: true
  • Restrict Amazon ES to an IP-based access policy

Other settings are left as the defaults.

Create a Kinesis Firehose delivery stream

In the Kinesis Firehose console, create a new delivery stream with Amazon ES as the destination. For detailed steps, see Create a Kinesis Firehose Delivery Stream to Amazon Elasticsearch Service.

Set up CloudWatch Events

Create a rule, and configure the event source and target. You can choose to configure multiple event sources with several AWS resources, along with options to specify specific or multiple event types.

In the CloudWatch console, choose Events.

For Service Name, choose EC2.

In Event Pattern Preview, choose Edit and copy the pattern below. For this walkthrough, I selected events that are specific to the EC2 API, but you can modify it to include events for any of your AWS resources.

 

{
	"source": [
		"aws.ec2"
	],
	"detail-type": [
		"AWS API Call via CloudTrail"
	],
	"detail": {
		"eventSource": [
			"ec2.amazonaws.com"
		],
		"eventName": [
			"RunInstances",
			"StopInstances",
			"StartInstances",
			"CreateFlowLogs",
			"CreateImage",
			"CreateNatGateway",
			"CreateVpc",
			"DeleteKeyPair",
			"DeleteNatGateway",
			"DeleteRoute",
			"DeleteRouteTable",
"CreateSnapshot",
"DeleteSnapshot",
			"DeleteVpc",
			"DeleteVpcEndpoints",
			"DeleteSecurityGroup",
			"ModifyVolume",
			"ModifyVpcEndpoint",
			"TerminateInstances"
		]
	}
}

The following screenshot shows what your event looks like in the console.

Next, choose Add target and select the delivery stream that you just created.

Set up Kibana on the Elasticsearch cluster

Amazon ES provides a default installation of Kibana with every Amazon ES domain. You can find the Kibana endpoint on your domain dashboard in the Amazon ES console. You can restrict Amazon ES access to an IP-based access policy.

In the Kibana console, for Index name or pattern, type log. This is the name of the Elasticsearch index.

For Time-field name, choose @time.

To view the events, choose Discover.

The following chart demonstrates the API operations and the number of times that they have been triggered in the past 12 hours.

Summary

In this post, you created a continuous, near real-time solution to monitor various EC2 events such as starting and shutting down instances, creating VPCs, etc. Likewise, you can build a continuous monitoring solution for all the API operations that are relevant to your daily AWS operations and resources.

With Kinesis Firehose as a new target for CloudWatch Events, you can retrieve, transform, and load system events to the storage and analytics destination of your choice in minutes, without setting up complicated data pipelines.

If you have any questions or suggestions, please comment below.


Additional Reading

Learn how to build a serverless architecture to analyze Amazon CloudFront access logs using AWS Lambda, Amazon Athena, and Amazon Kinesis Analytics

 

 

 

AWS Online Tech Talks – June 2017

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-june-2017/

As the sixth month of the year, June is significant in that it is not only my birth month (very special), but it contains the summer solstice in the Northern Hemisphere, the day with the most daylight hours, and the winter solstice in the Southern Hemisphere, the day with the fewest daylight hours. In the United States, June is also the month in which we celebrate our dads with Father’s Day and have month-long celebrations of music, heritage, and the great outdoors.

Therefore, the month of June can be filled with lots of excitement. So why not add even more delight to the month, by enhancing your cloud computing skills. This month’s AWS Online Tech Talks features sessions on Artificial Intelligence (AI), Storage, Big Data, and Compute among other great topics.

June 2017 – Schedule

Noted below are the upcoming scheduled live, online technical sessions being held during the month of June. Make sure to register ahead of time so you won’t miss out on these free talks conducted by AWS subject matter experts. All schedule times for the online tech talks are shown in the Pacific Time (PDT) time zone.

Webinars featured this month are:

Thursday, June 1

Storage

9:00 AM – 10:00 AM: Deep Dive on Amazon Elastic File System

Big Data

10:30 AM – 11:30 AM: Migrating Big Data Workloads to Amazon EMR

Serverless

12:00 Noon – 1:00 PM: Building AWS Lambda Applications with the AWS Serverless Application Model (AWS SAM)

 

Monday, June 5

Artificial Intelligence

9:00 AM – 9:40 AM: Exploring the Business Use Cases for Amazon Lex

 

Tuesday, June 6

Management Tools

9:00 AM – 9:40 AM: Automated Compliance and Governance with AWS Config and AWS CloudTrail

 

Wednesday, June 7

Storage

9:00 AM – 9:40 AM: Backing up Amazon EC2 with Amazon EBS Snapshots

Big Data

10:30 AM – 11:10 AM: Intro to Amazon Redshift Spectrum: Quickly Query Exabytes of Data in S3

DevOps

12:00 Noon – 12:40 PM: Introduction to AWS CodeStar: Quickly Develop, Build, and Deploy Applications on AWS

 

Thursday, June 8

Artificial Intelligence

9:00 AM – 9:40 AM: Exploring the Business Use Cases for Amazon Polly

10:30 AM – 11:10 AM: Exploring the Business Use Cases for Amazon Rekognition

 

Monday, June 12

Artificial Intelligence

9:00 AM – 9:40 AM: Exploring the Business Use Cases for Amazon Machine Learning

 

Tuesday, June 13

Compute

9:00 AM – 9:40 AM: DevOps with Visual Studio, .NET and AWS

IoT

10:30 AM – 11:10 AM: Create, with Intel, an IoT Gateway and Establish a Data Pipeline to AWS IoT

Big Data

12:00 Noon – 12:40 PM: Real-Time Log Analytics using Amazon Kinesis and Amazon Elasticsearch Service

 

Wednesday, June 14

Containers

9:00 AM – 9:40 AM: Batch Processing with Containers on AWS

Security & Identity

12:00 Noon – 12:40 PM: Using Microsoft Active Directory across On-premises and Cloud Workloads

 

Thursday, June 15

Big Data

12:00 Noon – 1:00 PM: Building Big Data Applications with Serverless Architectures

 

Monday, June 19

Artificial Intelligence

9:00 AM – 9:40 AM: Deep Learning for Data Scientists: Using Apache MxNet and R on AWS

 

Tuesday, June 20

Storage

9:00 AM – 9:40 AM: Cloud Backup & Recovery Options with AWS Partner Solutions

Artificial Intelligence

10:30 AM – 11:10 AM: An Overview of AI on the AWS Platform

 

The AWS Online Tech Talks series covers a broad range of topics at varying technical levels. These sessions feature live demonstrations & customer examples led by AWS engineers and Solution Architects. Check out the AWS YouTube channel for more on-demand webinars on AWS technologies.

Tara

Build a Serverless Architecture to Analyze Amazon CloudFront Access Logs Using AWS Lambda, Amazon Athena, and Amazon Kinesis Analytics

Post Syndicated from Rajeev Srinivasan original https://aws.amazon.com/blogs/big-data/build-a-serverless-architecture-to-analyze-amazon-cloudfront-access-logs-using-aws-lambda-amazon-athena-and-amazon-kinesis-analytics/

Nowadays, it’s common for a web server to be fronted by a global content delivery service, like Amazon CloudFront. This type of front end accelerates delivery of websites, APIs, media content, and other web assets to provide a better experience to users across the globe.

The insights gained by analysis of Amazon CloudFront access logs helps improve website availability through bot detection and mitigation, optimizing web content based on the devices and browser used to view your webpages, reducing perceived latency by caching of popular object closer to its viewer, and so on. This results in a significant improvement in the overall perceived experience for the user.

This blog post provides a way to build a serverless architecture to generate some of these insights. To do so, we analyze Amazon CloudFront access logs both at rest and in transit through the stream. This serverless architecture uses Amazon Athena to analyze large volumes of CloudFront access logs (on the scale of terabytes per day), and Amazon Kinesis Analytics for streaming analysis.

The analytic queries in this blog post focus on three common use cases:

  1. Detection of common bots using the user agent string
  2. Calculation of current bandwidth usage per Amazon CloudFront distribution per edge location
  3. Determination of the current top 50 viewers

However, you can easily extend the architecture described to power dashboards for monitoring, reporting, and trigger alarms based on deeper insights gained by processing and analyzing the logs. Some examples are dashboards for cache performance, usage and viewer patterns, and so on.

Following we show a diagram of this architecture.

Prerequisites

Before you set up this architecture, install the AWS Command Line Interface (AWS CLI) tool on your local machine, if you don’t have it already.

Setup summary

The following steps are involved in setting up the serverless architecture on the AWS platform:

  1. Create an Amazon S3 bucket for your Amazon CloudFront access logs to be delivered to and stored in.
  2. Create a second Amazon S3 bucket to receive processed logs and store the partitioned data for interactive analysis.
  3. Create an Amazon Kinesis Firehose delivery stream to batch, compress, and deliver the preprocessed logs for analysis.
  4. Create an AWS Lambda function to preprocess the logs for analysis.
  5. Configure Amazon S3 event notification on the CloudFront access logs bucket, which contains the raw logs, to trigger the Lambda preprocessing function.
  6. Create an Amazon DynamoDB table to look up partition details, such as partition specification and partition location.
  7. Create an Amazon Athena table for interactive analysis.
  8. Create a second AWS Lambda function to add new partitions to the Athena table based on the log delivered to the processed logs bucket.
  9. Configure Amazon S3 event notification on the processed logs bucket to trigger the Lambda partitioning function.
  10. Configure Amazon Kinesis Analytics application for analysis of the logs directly from the stream.

ETL and preprocessing

In this section, we parse the CloudFront access logs as they are delivered, which occurs multiple times in an hour. We filter out commented records and use the user agent string to decipher the browser name, the name of the operating system, and whether the request has been made by a bot. For more details on how to decipher the preceding information based on the user agent string, see user-agents 1.1.0 in the Python documentation.

We use the Lambda preprocessing function to perform these tasks on individual rows of the access log. On successful completion, the rows are pushed to an Amazon Kinesis Firehose delivery stream to be persistently stored in an Amazon S3 bucket, the processed logs bucket.

To create a Firehose delivery stream with a new or existing S3 bucket as the destination, follow the steps described in Create a Firehose Delivery Stream to Amazon S3 in the S3 documentation. Keep most of the default settings, but select an AWS Identity and Access Management (IAM) role that has write access to your S3 bucket and specify GZIP compression. Name the delivery stream CloudFrontLogsToS3.

Another pre-requisite for this setup is to create an IAM role that provides the necessary permissions our AWS Lambda function to get the data from S3, process it, and deliver it to the CloudFrontLogsToS3 delivery stream.

Let’s use the AWS CLI to create the IAM role using the following the steps:

  1. Create the IAM policy (lambda-exec-policy) for the Lambda execution role to use.
  2. Create the Lambda execution role (lambda-cflogs-exec-role) and assign the service to use this role.
  3. Attach the policy created in step 1 to the Lambda execution role.

To download the policy document to your local machine, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/preprocessiong-lambda/lambda-exec-policy.json  <path_on_your_local_machine>

To download the assume policy document to your local machine, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/preprocessiong-lambda/assume-lambda-policy.json  <path_on_your_local_machine>

Following is the lambda-exec-policy.json file, which is the IAM policy used by the Lambda execution role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CloudWatchAccess",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Sid": "S3Access",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::*"
            ]
        },
        {
            "Sid": "FirehoseAccess",
            "Effect": "Allow",
            "Action": [
                "firehose:ListDeliveryStreams",
                "firehose:PutRecord",
                "firehose:PutRecordBatch"
            ],
            "Resource": [
                "arn:aws:firehose:*:*:deliverystream/CloudFrontLogsToS3"
            ]
        }
    ]
}

To create the IAM policy used by Lambda execution role, type the following command.

aws iam create-policy --policy-name lambda-exec-policy --policy-document file://<path>/lambda-exec-policy.json

To create the AWS Lambda execution role and assign the service to use this role, type the following command.

aws iam create-role --role-name lambda-cflogs-exec-role --assume-role-policy-document file://<path>/assume-lambda-policy.json

Following is the assume-lambda-policy.json file, to grant Lambda permission to assume a role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

To attach the policy (lambda-exec-policy) created to the AWS Lambda execution role (lambda-cflogs-exec-role), type the following command.

aws iam attach-role-policy --role-name lambda-cflogs-exec-role --policy-arn arn:aws:iam::<your-account-id>:policy/lambda-exec-policy

Now that we have created the CloudFrontLogsToS3 Firehose delivery stream and the lambda-cflogs-exec-role IAM role for Lambda, the next step is to create a Lambda preprocessing function.

This Lambda preprocessing function parses the CloudFront access logs delivered into the S3 bucket and performs a few transformation and mapping operations on the data. The Lambda function adds descriptive information, such as the browser and the operating system that were used to make this request based on the user agent string found in the logs. The Lambda function also adds information about the web distribution to support scenarios where CloudFront access logs are delivered to a centralized S3 bucket from multiple distributions. With the solution in this blog post, you can get insights across distributions and their edge locations.

Use the Lambda Management Console to create a new Lambda function with a Python 2.7 runtime and the s3-get-object-python blueprint. Open the console, and on the Configure triggers page, choose the name of the S3 bucket where the CloudFront access logs are delivered. Choose Put for Event type. For Prefix, type the name of the prefix, if any, for the folder where CloudFront access logs are delivered, for example cloudfront-logs/. To invoke Lambda to retrieve the logs from the S3 bucket as they are delivered, select Enable trigger.

Choose Next and provide a function name to identify this Lambda preprocessing function.

For Code entry type, choose Upload a file from Amazon S3. For S3 link URL, type https.amazonaws.com//preprocessing-lambda/pre-data.zip. In the section, also create an environment variable with the key KINESIS_FIREHOSE_STREAM and a value with the name of the Firehose delivery stream as CloudFrontLogsToS3.

Choose lambda-cflogs-exec-role as the IAM role for the Lambda function, and type prep-data.lambda_handler for the value for Handler.

Choose Next, and then choose Create Lambda.

Table creation in Amazon Athena

In this step, we will build the Athena table. Use the Athena console in the same region and create the table using the query editor.

CREATE EXTERNAL TABLE IF NOT EXISTS cf_logs (
  logdate date,
  logtime string,
  location string,
  bytes bigint,
  requestip string,
  method string,
  host string,
  uri string,
  status bigint,
  referrer string,
  useragent string,
  uriquery string,
  cookie string,
  resulttype string,
  requestid string,
  header string,
  csprotocol string,
  csbytes string,
  timetaken bigint,
  forwardedfor string,
  sslprotocol string,
  sslcipher string,
  responseresulttype string,
  protocolversion string,
  browserfamily string,
  osfamily string,
  isbot string,
  filename string,
  distribution string
)
PARTITIONED BY(year string, month string, day string, hour string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://<pre-processing-log-bucket>/prefix/';

Creation of the Athena partition

A popular website with millions of requests each day routed using Amazon CloudFront can generate a large volume of logs, on the order of a few terabytes a day. We strongly recommend that you partition your data to effectively restrict the amount of data scanned by each query. Partitioning significantly improves query performance and substantially reduces cost. The Lambda partitioning function adds the partition information to the Athena table for the data delivered to the preprocessed logs bucket.

Before delivering the preprocessed Amazon CloudFront logs file into the preprocessed logs bucket, Amazon Kinesis Firehose adds a UTC time prefix in the format YYYY/MM/DD/HH. This approach supports multilevel partitioning of the data by year, month, date, and hour. You can invoke the Lambda partitioning function every time a new processed Amazon CloudFront log is delivered to the preprocessed logs bucket. To do so, configure the Lambda partitioning function to be triggered by an S3 Put event.

For a website with millions of requests, a large number of preprocessed logs can be delivered multiple times in an hour—for example, at the interval of one each second. To avoid querying the Athena table for partition information every time a preprocessed log file is delivered, you can create an Amazon DynamoDB table for fast lookup.

Based on the year, month, data and hour in the prefix of the delivered log, the Lambda partitioning function checks if the partition specification exists in the Amazon DynamoDB table. If it doesn’t, it’s added to the table using an atomic operation, and then the Athena table is updated.

Type the following command to create the Amazon DynamoDB table.

aws dynamodb create-table --table-name athenapartitiondetails \
--attribute-definitions AttributeName=PartitionSpec,AttributeType=S \
--key-schema AttributeName=PartitionSpec,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=100

Here the following is true:

  • PartitionSpec is the hash key and is a representation of the partition signature—for example, year=”2017”; month=”05”; day=”15”; hour=”10”.
  • Depending on the rate at which the processed log files are delivered to the processed log bucket, you might have to increase the ReadCapacityUnits and WriteCapacityUnits values, if these are throttled.

The other attributes besides PartitionSpec are the following:

  • PartitionPath – The S3 path associated with the partition.
  • PartitionType – The type of partition used (Hour, Month, Date, Year, or ALL). In this case, ALL is used.

Next step is to create the IAM role to provide permissions for the Lambda partitioning function. You require permissions to do the following:

  1. Look up and write partition information to DynamoDB.
  2. Alter the Athena table with new partition information.
  3. Perform Amazon CloudWatch logs operations.
  4. Perform Amazon S3 operations.

To download the policy document to your local machine, type following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/partitioning-lambda/lambda-partition-function-execution-policy.json  <path_on_your_local_machine>

To download the assume policy document to your local machine, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/partitioning-lambda/assume-lambda-policy.json <path_on_your_local_machine>

To create the Lambda execution role and assign the service to use this role, type the following command.

aws iam create-role --role-name lambda-cflogs-exec-role --assume-role-policy-document file://<path>/assume-lambda-policy.json

Let’s use the AWS CLI to create the IAM role using the following three steps:

  1. Create the IAM policy(lambda-partition-exec-policy) used by the Lambda execution role.
  2. Create the Lambda execution role (lambda-partition-execution-role)and assign the service to use this role.
  3. Attach the policy created in step 1 to the Lambda execution role.

To create the IAM policy used by Lambda execution role, type the following command.

aws iam create-policy --policy-name lambda-partition-exec-policy --policy-document file://<path>/lambda-partition-function-execution-policy.json

To create the Lambda execution role and assign the service to use this role, type the following command.

aws iam create-role --role-name lambda-partition-execution-role --assume-role-policy-document file://<path>/assume-lambda-policy.json

To attach the policy (lambda-partition-exec-policy) created to the AWS Lambda execution role (lambda-partition-execution-role), type the following command.

aws iam attach-role-policy --role-name lambda-partition-execution-role --policy-arn arn:aws:iam::<your-account-id>:policy/lambda-partition-exec-policy

Following is the lambda-partition-function-execution-policy.json file, which is the IAM policy used by the Lambda execution role.

{
    "Version": "2012-10-17",
    "Statement": [
      	{
            	"Sid": "DDBTableAccess",
            	"Effect": "Allow",
            	"Action": "dynamodb:PutItem"
            	"Resource": "arn:aws:dynamodb*:*:table/athenapartitiondetails"
        	},
        	{
            	"Sid": "S3Access",
            	"Effect": "Allow",
            	"Action": [
                		"s3:GetBucketLocation",
                		"s3:GetObject",
                		"s3:ListBucket",
                		"s3:ListBucketMultipartUploads",
                		"s3:ListMultipartUploadParts",
                		"s3:AbortMultipartUpload",
                		"s3:PutObject"
            	],
          		"Resource":"arn:aws:s3:::*"
		},
	              {
		      "Sid": "AthenaAccess",
      		"Effect": "Allow",
      		"Action": [ "athena:*" ],
      		"Resource": [ "*" ]
	      },
        	{
            	"Sid": "CloudWatchLogsAccess",
            	"Effect": "Allow",
            	"Action": [
                		"logs:CreateLogGroup",
                		"logs:CreateLogStream",
             	   	"logs:PutLogEvents"
            	],
            	"Resource": "arn:aws:logs:*:*:*"
        	}
    ]
}

Download the .jar file containing the Java deployment package to your local machine.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/partitioning-lambda/aws-lambda-athena-1.0.0.jar <path_on_your_local_machine>

From the AWS Management Console, create a new Lambda function with Java8 as the runtime. Select the Blank Function blueprint.

On the Configure triggers page, choose the name of the S3 bucket where the preprocessed logs are delivered. Choose Put for the Event Type. For Prefix, type the name of the prefix folder, if any, where preprocessed logs are delivered by Firehose—for example, out/. For Suffix, type the name of the compression format that the Firehose stream (CloudFrontLogToS3) delivers the preprocessed logs —for example, gz. To invoke Lambda to retrieve the logs from the S3 bucket as they are delivered, select Enable Trigger.

Choose Next and provide a function name to identify this Lambda partitioning function.

Choose Java8 for Runtime for the AWS Lambda function. Choose Upload a .ZIP or .JAR file for the Code entry type, and choose Upload to upload the downloaded aws-lambda-athena-1.0.0.jar file.

Next, create the following environment variables for the Lambda function:

  • TABLE_NAME – The name of the Athena table (for example, cf_logs).
  • PARTITION_TYPE – The partition to be created based on the Athena table for the logs delivered to the sub folders in S3 bucket based on Year, Month, Date, Hour, or Set this to ALL to use Year, Month, Date, and Hour.
  • DDB_TABLE_NAME – The name of the DynamoDB table holding partition information (for example, athenapartitiondetails).
  • ATHENA_REGION – The current AWS Region for the Athena table to construct the JDBC connection string.
  • S3_STAGING_DIR – The Amazon S3 location where your query output is written. The JDBC driver asks Athena to read the results and provide rows of data back to the user (for example, s3://<bucketname>/<folder>/).

To configure the function handler and IAM, for Handler copy and paste the name of the handler: com.amazonaws.services.lambda.CreateAthenaPartitionsBasedOnS3EventWithDDB::handleRequest. Choose the existing IAM role, lambda-partition-execution-role.

Choose Next and then Create Lambda.

Interactive analysis using Amazon Athena

In this section, we analyze the historical data that’s been collected since we added the partitions to the Amazon Athena table for data delivered to the preprocessing logs bucket.

Scenario 1 is robot traffic by edge location.

SELECT COUNT(*) AS ct, requestip, location FROM cf_logs
WHERE isbot='True'
GROUP BY requestip, location
ORDER BY ct DESC;

Scenario 2 is total bytes transferred per distribution for each edge location for your website.

SELECT distribution, location, SUM(bytes) as totalBytes
FROM cf_logs
GROUP BY location, distribution;

Scenario 3 is the top 50 viewers of your website.

SELECT requestip, COUNT(*) AS ct  FROM cf_logs
GROUP BY requestip
ORDER BY ct DESC;

Streaming analysis using Amazon Kinesis Analytics

In this section, you deploy a stream processing application using Amazon Kinesis Analytics to analyze the preprocessed Amazon CloudFront log streams. This application analyzes directly from the Amazon Kinesis Stream as it is delivered to the preprocessing logs bucket. The stream queries in section are focused on gaining the following insights:

  • The IP address of the bot, identified by its Amazon CloudFront edge location, that is currently sending requests to your website. The query also includes the total bytes transferred as part of the response.
  • The total bytes served per distribution per population for your website.
  • The top 10 viewers of your website.

To download the firehose-access-policy.json file, type the following.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/kinesisanalytics/firehose-access-policy.json  <path_on_your_local_machine>

To download the kinesisanalytics-policy.json file, type the following.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/kinesisanalytics/assume-kinesisanalytics-policy.json <path_on_your_local_machine>

Before we create the Amazon Kinesis Analytics application, we need to create the IAM role to provide permission for the analytics application to access Amazon Kinesis Firehose stream.

Let’s use the AWS CLI to create the IAM role using the following three steps:

  1. Create the IAM policy(firehose-access-policy) for the Lambda execution role to use.
  2. Create the Lambda execution role (ka-execution-role) and assign the service to use this role.
  3. Attach the policy created in step 1 to the Lambda execution role.

Following is the firehose-access-policy.json file, which is the IAM policy used by Kinesis Analytics to read Firehose delivery stream.

{
    "Version": "2012-10-17",
    "Statement": [
      	{
    	"Sid": "AmazonFirehoseAccess",
    	"Effect": "Allow",
    	"Action": [
       	"firehose:DescribeDeliveryStream",
        	"firehose:Get*"
    	],
    	"Resource": [
              "arn:aws:firehose:*:*:deliverystream/CloudFrontLogsToS3”
       ]
     }
}

Following is the assume-kinesisanalytics-policy.json file, to grant Amazon Kinesis Analytics permissions to assume a role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "kinesisanalytics.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

To create the IAM policy used by Analytics access role, type the following command.

aws iam create-policy --policy-name firehose-access-policy --policy-document file://<path>/firehose-access-policy.json

To create the Analytics execution role and assign the service to use this role, type the following command.

aws iam attach-role-policy --role-name ka-execution-role --policy-arn arn:aws:iam::<your-account-id>:policy/firehose-access-policy

To attach the policy (irehose-access-policy) created to the Analytics execution role (ka-execution-role), type the following command.

aws iam attach-role-policy --role-name ka-execution-role --policy-arn arn:aws:iam::<your-account-id>:policy/firehose-access-policy

To deploy the Analytics application, first download the configuration file and then modify ResourceARN and RoleARN for the Amazon Kinesis Firehose input configuration.

"KinesisFirehoseInput": { 
    "ResourceARN": "arn:aws:firehose:<region>:<account-id>:deliverystream/CloudFrontLogsToS3", 
    "RoleARN": "arn:aws:iam:<account-id>:role/ka-execution-role"
}

To download the Analytics application configuration file, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis//kinesisanalytics/kinesis-analytics-app-configuration.json <path_on_your_local_machine>

To deploy the application, type the following command.

aws kinesisanalytics create-application --application-name "cf-log-analysis" --cli-input-json file://<path>/kinesis-analytics-app-configuration.json

To start the application, type the following command.

aws kinesisanalytics start-application --application-name "cf-log-analysis" --input-configuration Id="1.1",InputStartingPositionConfiguration={InputStartingPosition="NOW"}

SQL queries using Amazon Kinesis Analytics

Scenario 1 is a query for detecting bots for sending request to your website detection for your website.

-- Create output stream, which can be used to send to a destination
CREATE OR REPLACE STREAM "BOT_DETECTION" (requesttime TIME, destribution VARCHAR(16), requestip VARCHAR(64), edgelocation VARCHAR(64), totalBytes BIGINT);
-- Create pump to insert into output 
CREATE OR REPLACE PUMP "BOT_DETECTION_PUMP" AS INSERT INTO "BOT_DETECTION"
--
SELECT STREAM 
    STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND) as requesttime,
    "distribution_name" as distribution,
    "request_ip" as requestip, 
    "edge_location" as edgelocation, 
    SUM("bytes") as totalBytes
FROM "CF_LOG_STREAM_001"
WHERE "is_bot" = true
GROUP BY "request_ip", "edge_location", "distribution_name",
STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND),
STEP("CF_LOG_STREAM_001".ROWTIME BY INTERVAL '1' SECOND);

Scenario 2 is a query for total bytes transferred per distribution for each edge location for your website.

-- Create output stream, which can be used to send to a destination
CREATE OR REPLACE STREAM "BYTES_TRANSFFERED" (requesttime TIME, destribution VARCHAR(16), edgelocation VARCHAR(64), totalBytes BIGINT);
-- Create pump to insert into output 
CREATE OR REPLACE PUMP "BYTES_TRANSFFERED_PUMP" AS INSERT INTO "BYTES_TRANSFFERED"
-- Bytes Transffered per second per web destribution by edge location
SELECT STREAM 
    STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND) as requesttime,
    "distribution_name" as distribution,
    "edge_location" as edgelocation, 
    SUM("bytes") as totalBytes
FROM "CF_LOG_STREAM_001"
GROUP BY "distribution_name", "edge_location", "request_date",
STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND),
STEP("CF_LOG_STREAM_001".ROWTIME BY INTERVAL '1' SECOND);

Scenario 3 is a query for the top 50 viewers for your website.

-- Create output stream, which can be used to send to a destination
CREATE OR REPLACE STREAM "TOP_TALKERS" (requestip VARCHAR(64), requestcount DOUBLE);
-- Create pump to insert into output 
CREATE OR REPLACE PUMP "TOP_TALKERS_PUMP" AS INSERT INTO "TOP_TALKERS"
-- Top Ten Talker
SELECT STREAM ITEM as requestip, ITEM_COUNT as requestcount FROM TABLE(TOP_K_ITEMS_TUMBLING(
  CURSOR(SELECT STREAM * FROM "CF_LOG_STREAM_001"),
  'request_ip', -- name of column in single quotes
  50, -- number of top items
  60 -- tumbling window size in seconds
  )
);

Conclusion

Following the steps in this blog post, you just built an end-to-end serverless architecture to analyze Amazon CloudFront access logs. You analyzed these both in interactive and streaming mode, using Amazon Athena and Amazon Kinesis Analytics respectively.

By creating a partition in Athena for the logs delivered to a centralized bucket, this architecture is optimized for performance and cost when analyzing large volumes of logs for popular websites that receive millions of requests. Here, we have focused on just three common use cases for analysis, sharing the analytic queries as part of the post. However, you can extend this architecture to gain deeper insights and generate usage reports to reduce latency and increase availability. This way, you can provide a better experience on your websites fronted with Amazon CloudFront.

In this blog post, we focused on building serverless architecture to analyze Amazon CloudFront access logs. Our plan is to extend the solution to provide rich visualization as part of our next blog post.


About the Authors

Rajeev Srinivasan is a Senior Solution Architect for AWS. He works very close with our customers to provide big data and NoSQL solution leveraging the AWS platform and enjoys coding . In his spare time he enjoys riding his motorcycle and reading books.

 

Sai Sriparasa is a consultant with AWS Professional Services. He works with our customers to provide strategic and tactical big data solutions with an emphasis on automation, operations & security on AWS. In his spare time, he follows sports and current affairs.

 

 


Related

Analyzing VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight

ServerlessConf and More!

Post Syndicated from Bryan Liston original https://aws.amazon.com/blogs/compute/serverless-conference-and-more/

ServerlessConf Austin

ServerlessConf Austin is just around the corner! April 26-28th come join us in Austin at the Zach Topfer Theater. Our very own Tim Wagner, Chris Munns and Randall Hunt will be giving some great talks.

Serverlessconf is a community led conference focused on sharing experiences building applications using serverless architectures. Serverless architectures enable developers to express their creativity and focus on user needs instead of spending time managing infrastructure and servers.

Tim Wagner, GM Serverless Applications, will be giving a keynote on Friday the 28th, do not miss this!!!
Chris Munns, Sr. Developer Advocate, will be giving an excellent talk on CI/CD for Serverless Applications.

Check out the full agenda here!

AWS Serverless Updates and More!

Incase you’ve missed out lately on some of our new content such as our new YouTube series “Coding with Sam”, or our new Serverless Special AWS Podcast Series, check them out!

Meet SAM!

We’ve recently come out with a new branding for AWS SAM (Serverless Application Model), so please join me in welcoming SAM the Squirrel!

The goal of AWS SAM is to define a standard application model for serverless applications.

Once again, don’t hesitate to reach out if you have questions, comments, or general feedback.

Thanks,
@listonb

AWS Big Data Blog Month in Review: March 2017

Post Syndicated from Derek Young original https://aws.amazon.com/blogs/big-data/aws-big-data-blog-month-in-review-march-2017/

Another month of big data solutions on the Big Data Blog. Please take a look at our summaries below and learn, comment, and share. Thank you for reading!

Analyze Security, Compliance, and Operational Activity Using AWS CloudTrail and Amazon Athena
In this blog post, walk through how to set up and use the recently released Amazon Athena CloudTrail SerDe to query CloudTrail log files for EC2 security group modifications, console sign-in activity, and operational account activity.  

Big Updates to the Big Data on AWS Training Course!
AWS offers a range of training resources to help you advance your knowledge with practical skills so you can get more out of the cloud. We’ve updated Big Data on AWS, a three-day, instructor-led training course to keep pace with the latest AWS big data innovations. This course allows you to hear big data best practices from an expert, get answers to your questions in person, and get hands-on practice using AWS big data services. 

Analyzing VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight
In this blog post, build a serverless architecture using Amazon Kinesis Firehose, AWS Lambda, Amazon S3, Amazon Athena, and Amazon QuickSight to collect, store, query, and visualize flow logs. In building this solution, you also learn how to implement Athena best practices with regard to compressing and partitioning data so as to reduce query latencies and drive down query costs. 

Amazon Redshift Monitoring Now Supports End User Queries and Canaries
The serverless Amazon Redshift Monitoring utility lets you gather important performance metrics from your Redshift cluster’s system tables and persists the results in Amazon CloudWatch. You can now create your own diagnostic queries and plug-in “canaries” that monitor the runtime of your most vital end user queries. These user-defined metrics can be used to create dashboards and trigger Alarms and should improve visibility into workloads running on a Cluster.  

Running R on Amazon Athena
In this blog post, connect R/RStudio running on an Amazon EC2 instance with Athena. You’ll learn to build a simple interactive application with Athena and R. Athena can be used to store and query the underlying data for your big data applications using standard SQL, while R can be used to interactively query Athena and generate analytical insights using the powerful set of libraries that R provides. This post has been translated into Japanese. 

Top 10 Performance Tuning Tips for Amazon Athena
In this blog post, we review the top 10 tips that can improve query performance. We focus on aspects related to storing data in Amazon S3 and tuning specific to queries. Amazon Athena uses Presto to run SQL queries and hence some of the advice will work if you are running Presto on Amazon EMR. This post has been translated into Japanese. 

Big Data Resources on the AWS Knowledge Center
The AWS Knowledge Center answers the questions we receive most frequently from AWS customers. It is a resource for you that is distinct from AWS Documentation, the AWS Discussion Forums, and the AWS Support Center. It covers questions from across every AWS service. This post is an introduction to Big Data resources on the AWS Knowledge Center. 

Encrypt and Decrypt Amazon Kinesis Records Using AWS KMS
In this bog post, learn to build encryption and decryption into sample Kinesis producer and consumer applications using the Amazon Kinesis Producer Library (KPL), the Amazon Kinesis Consumer Library (KCL), AWS KMS, and the aws-encryption-sdk. The methods and the techniques used in this post to encrypt and decrypt Kinesis records can be easily replicated into your architecture.

Want to learn more about Big Data or Streaming Data? Check out our Big Data and Streaming data educational pages.

Leave a comment below to let us know what big data topics you’d like to see next on the AWS Big Data Blog.

A Serverless Authentication System by Jumia

Post Syndicated from Bryan Liston original https://aws.amazon.com/blogs/compute/a-serverless-authentication-system-by-jumia/

Jumia Group is an ecosystem of nine different companies operating in 22 different countries in Africa. Jumia employs 3000 people and serves 15 million users/month.

Want to secure and centralize millions of user accounts across Africa? Shut down your servers! Jumia Group unified and centralized customer authentication on nine digital services platforms, operating in 22 (and counting) countries in Africa, totaling over 120 customer and merchant facing applications. All were unified into a custom Jumia Central Authentication System (JCAS), built in a timely fashion and designed using a serverless architecture.

In this post, we give you our solution overview. For the full technical implementation, see the Jumia Central Authentication System post on the Jumia Tech blog.

The challenge

A group-level initiative was started to centralize authentication for all Jumia users in all countries for all companies. But it was impossible to unify the operational databases for the different companies. Each company had its own user database with its own technological implementation. Each company alone had yet to unify the logins for their own countries. The effects of deduplicating all user accounts were yet to be determined but were considered to be large. Finally, there was no team responsible for manage this new project, given that a new dedicated infrastructure would be needed.

With these factors in mind, we decided to design this project as a serverless architecture to eliminate the need for infrastructure and a dedicated team. AWS was immediately considered as the best option for its level of
service, intelligent pricing model, and excellent serverless services.

The goal was simple. For all user accounts on all Jumia Group websites:

  • Merge duplicates that might exist in different companies and countries
  • Create a unique login (username and password)
  • Enrich the user profile in this centralized system
  • Share the profile with all Jumia companies

Requirements

We had the following initial requirements while designing this solution on the AWS platform:

  • Secure by design
  • Highly available via multimaster replication
  • Single login for all platforms/countries
  • Minimal performance impact
  • No admin overhead

Chosen technologies

We chose the following AWS services and technologies to implement our solution.

Amazon API Gateway

Amazon API Gateway is a fully managed service, making it really simple to set up an API. It integrates directly with AWS Lambda, which was chosen as our endpoint. It can be easily replicated to other regions, using Swagger import/export.

AWS Lambda

AWS Lambda is the base of serverless computing, allowing you to run code without worrying about infrastructure. All our code runs on Lambda functions using Python; some functions are
called from the API Gateway, others are scheduled like cron jobs.

Amazon DynamoDB

Amazon DynamoDB
is a highly scalable, NoSQL database with a good API and a clean pricing model. It has great scalability as well as high availability, and fits the serverless model we aimed for with JCAS.

AWS KMS

AWS KMS was chosen as a key manager to perform envelope encryption. It’s a simple and secure key manager with multiple encryption functionalities.

Envelope encryption

Oversimplifying envelope encryption is when you encrypt a key rather than the data itself. That encrypted key, which was used to encrypt the data, may now be stored with the data itself on your persistence layer since it doesn’t decrypt the data if compromised. For more information, see How Envelope Encryption Works with Supported AWS Services.

Envelope encryption was chosen given that master keys have a 4 KB limit for data to be encrypted or decrypted.

Amazon SQS

Amazon SQS is an inexpensive queuing service with dynamic scaling, 14-day retention availability, and easy management. It’s the perfect choice for our needs, as we use queuing systems only as a
fallback when saving data to remote regions fails. All the features needed for those fallback cases were covered.

JWT

We also use JSON web tokens for encoding and signing communications between JCAS and company servers. It’s another layer of security for data in transit.

Data structure

Our database design was pretty straightforward. DynamoDB records can be accessed by the primary key and allow access using secondary indexes as well. We created a UUID for each user on JCAS, which is used as a primary key
on DynamoDB.

Most data must be encrypted so we use a single field for that data. On the other hand, there’s data that needs to be stored in separate fields as they need to be accessed from the code without decryption for lookups or basic checks. This indexed data was also stored, as a hash or plain, outside the main ciphered blob store.

We used a field for each searchable data piece:

  • Id (primary key)
  • main phone (indexed)
  • main email (indexed)
  • account status
  • document timestamp

To hold the encrypted data we use a data dictionary with two main dictionaries, info with the user’s encrypted data and keys with the key for each AWS KMS region to decrypt the info blob.
Passwords are stored in two dictionaries, old_hashes contains the legacy hashes from the origin systems and secret holds the user’s JCAS password.

Here’s an example:
data_at_rest

Security

Security is like an onion, it needs to be layered. That’s what we did when designing this solution. Our design makes all of our data unreadable at each layer of this solution, while easing our compliance needs.

A field called data stores all personal information from customers. It’s encrypted using AES256-CBC with a key generated and managed by AWS KMS. A new data key is used for each transaction. For communication between companies and the API, we use API Keys, TLS and JWT, in the body to ensure that the post is signed and verified.

Data flow

Our second requirement on JCAS was system availability, so we designed some data pipelines. These pipelines allow multi-region replication, which evades collision using idempotent operations on all endpoints. The only technology added to the stack was Amazon SQS. On SQS queues, we place all the items we aren’t able to replicate at the time of the client’s request.

JCAS may have inconsistencies between regions, caused by network or region availability; by using SQS, we have workers that synchronize all regions as soon as possible.

Example with two regions

We have three Lambda functions and two SQS queues, where:

1) A trigger Lambda function is called for the DynamoDB stream. Upon changes to the user’s table, it tries to write directly to another DynamoDB table in the second region, falling back to writing to two SQS queues.

fig1

2) A scheduled Lambda function (cron-style) checks a SQS queue for items and tries writing them to the DynamoDB table that potentially failed.

fig2

3) A cron-style Lambda function checks the SQS queue, calling KMS for any items, and fixes the issue.

fig3

The following diagram shows the full infrastrucure (for clarity, this diagram leaves out KMS recovery).

CAS Full Diagram

Results

Upon going live, we noticed a minor impact in our response times. Note the brown legend in the images below.

lat_1

This was a cold start, as the infrastructure started to get hot, response time started to converge. On 27 April, we were almost at the initial 500ms.
lat_2

It kept steady on values before JCAS went live (≈500ms).
lat_3

As of the writing of this post, our response time kept improving (dev changed the method name and we changed the subdomain name).
lat_4

Customer login used to take ≈500ms and it still takes ≈500ms with JCAS. Those times have improved as other components changed inside our code.

Lessons learned

  • Instead of the standard cross-region replication, creating your own DynamoDB cross-region replication might be a better fit for you, if your data and applications allow it.
  • Take some time to tweak the Lambda runtime memory. Remember that it’s billed per 100ms, so it saves you money if you have it run near a round number.
  • KMS takes away the problem of key management with great security by design. It really simplifies your life.
  • Always check the timestamp before operating on data. If it’s invalid, save the money by skipping further KMS, DynamoDB, and Lambda calls. You’ll love your systems even more.
  • Embrace the serverless paradigm, even if it looks complicated at first. It will save your life further down the road when your traffic bursts or you want to find an engineer who knows your whole system.

Next steps

We are going to leverage everything we’ve done so far to implement SSO in Jumia. For a future project, we are already testing OpenID connect with DynamoDB as a backend.

Conclusion

We did a mindset revolution in many ways.
Not only we went completely serverless as we started storing critical info on the cloud.
On top of this we also architectured a system where all user data is decoupled between local systems and our central auth silo.
Managing all of these critical systems became far more predictable and less cumbersome than we thought possible.
For us this is the proof that good and simple designs are the best features to look out for when sketching new systems.

If we were to do this again, we would do it in exactly the same way.

Daniel Loureiro (SecOps) & Tiago Caxias (DBA) – Jumia Group

AWS Week in Review – February 27, 2016

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-week-in-review-february-27-2016/

This edition includes all of our announcements, content from all of our blogs, and as much community-generated AWS content as I had time for. Going forward I hope to bring back the other sections, as soon as I get my tooling and automation into better shape.

Monday

February 27

Tuesday

February 28

Wednesday

March 1

Thursday

March 2

Friday

March 3

Saturday

March 4

Sunday

March 5

Jeff;

 

Resize Images on the Fly with Amazon S3, AWS Lambda, and Amazon API Gateway

Post Syndicated from Bryan Liston original https://aws.amazon.com/blogs/compute/resize-images-on-the-fly-with-amazon-s3-aws-lambda-and-amazon-api-gateway/


John Pignata, Solutions Architect

With the explosion of device types used to access the Internet with different capabilities, screen sizes, and resolutions, developers must often provide images in an array of sizes to ensure a great user experience. This can become complex to manage and drive up costs.

Images stored using Amazon S3 are often processed into multiple sizes to fit within the design constraints of a website or mobile application. It’s a common approach to use S3 event notifications and AWS Lambda for eager processing of images when a new object is created in a bucket.

In this post, I explore a different approach and outline a method of lazily generating images, in which a resized asset is only created if a user requests that specific size.

Resizing on the fly

Instead of processing and resizing images into all necessary sizes upon upload, the approach of processing images on the fly has several upsides:

  • Increased agility
  • Reduced storage costs
  • Resilience to failure

Increased agility

When you redesign your website or application, you can add new dimensions on the fly, rather than working to reprocess the entire archive of images that you have stored.

Running a batch process to resize all original images into new, resized dimensions can be time-consuming, costly, and error-prone. With the on-the-fly approach, a developer can instead specify a new set of dimensions and lazily generate new assets as customers use the new website or application.

Reduced storage costs

With eager image processing, the resized images must be stored indefinitely as the operation only happens one time. The approach of resizing on-demand means that developers do not need to store images that are not accessed by users.

As a user request initiates resizing, this also unlocks options for optimizing storage costs for resized image assets, such as S3 lifecycle rules to expire older images that can be tuned to an application’s specific access patterns. If a user attempts to access a resized image that has been removed by a lifecycle rule, the API resizes it on demand to fulfill the request.

Resilience to failure

A key best practice outlined in the Architecting for the Cloud: Best Practices whitepaper is

“Design for failure and nothing will fail.” When building distributed services, developers should be pessimistic and assume that failures will occur.

If image processing is designed to occur only one time upon object creation, an intermittent failure in that process―or any data loss to the processed images―could cause continual failures to future users. When resizing images on-demand, each request initiates processing if a resized image is not found, meaning that future requests could recover from a previous failure automatically.

Architecture overview

diagram

Here’s the process:

  1. A user requests a resized asset from an S3 bucket through its static website hosting endpoint. The bucket has a routing rule configured to redirect to the resize API any request for an object that cannot be found.
  2. Because the resized asset does not exist in the bucket, the request is temporarily redirected to the resize API method.
  3. The user’s browser follows the redirect and requests the resize operation via API Gateway.
  4. The API Gateway method is configured to trigger a Lambda function to serve the request.
  5. The Lambda function downloads the original image from the S3 bucket, resizes it, and uploads the resized image back into the bucket as the originally requested key.
  6. When the Lambda function completes, API Gateway permanently redirects the user to the file stored in S3.
  7. The user’s browser requests the now-available resized image from the S3 bucket. Subsequent requests from this and other users will be served directly from S3 and bypass the resize operation. If the resized image is deleted in the future, the above process repeats and the resized image is re-created and replaced into the S3 bucket.

Set up resources

A working example with code is open source and available in the serverless-image-resizing GitHub repo. You can create the required resources by following the README directions, which use an AWS Serverless Application Model (AWS SAM) template, or manually following the directions below.

To create and configure the S3 bucket

  1. In the S3 console, create a new S3 bucket.
  2. Choose Permissions, Add Bucket Policy. Add a bucket policy to allow anonymous access.
  3. Choose Static Website Hosting, Enable website hosting and, for Index Document, enter index.html.
  4. Choose Save.
  5. Note the name of the bucket that you’ve created and the hostname in the Endpoint field.

To create the Lambda function

  1. In the Lambda console, choose Create a Lambda function, Blank Function.
  2. To select an integration, choose the dotted square and choose API Gateway.
  3. To allow all users to invoke the API method, for Security, choose Open and then Next.
  4. For Name, enter resize. For Code entry type, choose Upload a .ZIP file.
  5. Choose Function package and upload the .ZIP file of the contents of the Lambda function.
  6. To configure your function, for Environment variables, add two variables:
    • For Key, enter BUCKET; for Value,enter the bucket name that you created above.
    • For Key, enter URL; for Value, enter the endpoint field that you noted above, prefixed with http://.
  7. To define the execution role permissions for the function, for Role, choose Create a custom role. Choose View Policy Document, Edit, Ok.
  8. Replace YOUR_BUCKET_NAME_HERE with the name of the bucket that you’ve created and copy the following code into the policy document. Note that any leading spaces in your policy may cause a validation error.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::__YOUR_BUCKET_NAME_HERE__/*"    
    }
  ]
}
  1. For Memory, choose 1536. For Timeout, enter 10 sec. Choose Next, Create function.
  2. Choose Triggers, and note the hostname in the URL of your function.

resizefly_2.png

To set up the S3 redirection rule

  1. In the S3 console, open the bucket that you created above.
  2. Expand Static Website Hosting, Edit Redirection Rules.
  3. Replace YOUR_API_HOSTNAME_HERE with the hostname that you noted above and copy the following into the redirection rules configuration:
<routingrules>
 <routingrule>
  <condition>
   <keyprefixequals>
    <httperrorcodereturnedequals>
     404
    </httperrorcodereturnedequals>
   </keyprefixequals>
  </condition>
  <redirect>
   <protocol>
    https
   </protocol>
   <hostname>
    __YOUR_API_HOSTNAME_HERE__
   </hostname>
   <replacekeyprefixwith>
    prod/resize?key=
   </replacekeyprefixwith>
   <httpredirectcode>
    307
   </httpredirectcode>
  </redirect>
 </routingrule>
</routingrules>

Test image resizing

Upload a test image into your bucket to for testing. The blue marble is a great sample image for testing because it is large and square. Once uploaded, try to retrieve resized versions of the image using your bucket’s static website hosting endpoint:

http://YOUR_BUCKET_WEBSITE_HOSTNAME_HERE/300×300/blue_marble.jpg

http://YOUR_BUCKET_WEBSITE_HOSTNAME_HERE/25×25/blue_marble.jpg

http://YOUR_BUCKET_WEBSITE_HOSTNAME_HERE/500×500/blue_marble.jpg

You should see a smaller version of the test photo. If not, choose Monitoring in your Lambda function and check CloudWatch Logs for troubleshooting. You can also refer to the serverless-image-resizing GitHub repo for a working example that you can deploy to your account.

Conclusion

The solution I’ve outlined is a simplified example of how to implement this functionality. For example, in a real-world implementation, there would likely be a list of permitted sizes to prevent a requestor from filling your bucket with randomly sized images. Further cost optimizations could be employed, such as using S3 lifecycle rules on a bucket dedicated to resized images to expire resized images after a given amount of time.

This approach allows you to lazily generate resized images while taking advantage of serverless architecture. This means you have no operating systems to manage, secure, or patch; no servers to right-size, monitor, or scale; no risk of over-spending by over-provisioning; and no risk of delivering a poor user experience due to poor performance by under-provisioning.

If you have any questions, feedback, or suggestions about this approach, please let us know in the comments!

AWS Lambda – A Look Back at 2016

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/aws-lambda-a-look-back-at-2016/

2016 was an exciting year for AWS Lambda, Amazon API Gateway and serverless compute technology, to say the least. But just in case you have been hiding away and haven’t heard of serverless computing with AWS Lambda and Amazon API Gateway, let me introduce these great services to you.  AWS Lambda lets you run code without provisioning or managing servers, making it a serverless compute service that is event-driven and allows developers to bring their functions to the cloud easily for virtually any type of application or backend.  Amazon API Gateway helps you quickly build highly scalable, secure, and robust APIs at scale and provides the ability to maintain and monitor created APIs.

With the momentum of serverless in 2016, of course, the year had to end with a bang as the AWS team launched some powerful service features at re:Invent to make it even easier to build serverless solutions.  These features include:

Since Jeff has already introduced most of the aforementioned new service features for building distributed applications and microservices like Step Functions, let’s walk-through the last four new features not yet discussed using a common serverless use case example: Real-time Stream Processing.  In our walk-through of the stream processing use case, we will implement a Dead Letter Queue for notifications of errors that may come from the Lambda function processing a stream of data, we will take an existing Lambda function written in Node.js to process the stream and rewrite it using the C# language.  We then will build an example of the monetization of a Lambda backed API using API Gateway’s integration with AWS Marketplace.  This will be exciting, so let’s get started.

During the AWS Developer Days in San Francisco and Austin, I presented an example of leveraging AWS Lambda for real-time stream processing by building a demo showcasing a streaming solution with Twitter Streaming APIs. I will build upon this example to demonstrate the power of Dead Letter Queues (DLQ), C# Support, API Gateway Monetization features, and the open source template for API Gateway Developer Portal.  In the demo, a console or web application streams tweets gathered from the Twitter Streaming API that has the keywords ‘awscloud’ and/or ‘serverless’.  Those tweets are sent real-time to Amazon Kinesis Streams where Lambda detects the new records and processes the stream batch by writing the tweets to the NoSQL database, Amazon DynamoDB.

Now that we understand the real-time streaming process demo’s workflow, let’s take a deeper look at the Lambda function that processes the batch records from Kinesis.  First, you will notice below that the Lambda function, DevDayStreamProcessor, has an event source or trigger that is a Kinesis stream named DevDay2016Stream with a Batch size of 100.  Our Lambda function will poll the stream periodically for new records and automatically read and process batches of records, in this case, the tweets detected on the stream.

Now we will examine our Lambda function code which is written in Node.js 4.3. The section of the Lambda function shown below loops through the batch of tweet records from our Kinesis stream, parses each record, and writes desired tweet information into an array of JSON data. The array of the JSON tweet items is passed to the function, ddbItemsWrite which is outside of our Lambda handler.

'use strict';
console.log('Loading function');

var timestamp;
var twitterID;
var tweetData;
var ddbParams;
var itemNum = 0;
var dataItemsBatch = [];
var dbBatch = [];
var AWS = require('aws-sdk');
var ddbTable = 'TwitterStream';
var dynamoDBClient = new AWS.DynamoDB.DocumentClient();

exports.handler = (event, context, callback) => {
    var counter = 0; 
    
    event.Records.forEach((record) => {
        // Kinesis data is base64 encoded so decode here
        console.log("Base 64 record: " + JSON.stringify(record, null, 2));
        const payload = new Buffer(record.kinesis.data, 'base64').toString('ascii');
        console.log('Decoded payload:', payload);
        
        var data = payload.replace(/[\u0000-\u0019]+/g," "); 
        try
        {  tweetData = JSON.parse(data);   }
        catch(err)
        {  callback(err, err.stack);   }
        
        timestamp = "" + new Date().getTime();
        twitterID = tweetData.id.toString();
        itemNum = itemNum+1;
               
         var ddbItem = {
                PutRequest: { 
                    Item: { 
                        TwitterID: twitterID,
                        TwitterUser: tweetData.username.toString(),
                        TwitterUserPic: tweetData.pic,
                        TwitterTime: new Date(tweetData.time.replace(/( \+)/, ' UTC$1')).toLocaleString(), 
                        Tweet: tweetData.text,
                        TweetTopic: tweetData.topic,
                        Tags: (tweetData.hashtags) ? tweetData.hashtags : " ",
                        Location: (tweetData.loc) ? tweetData.loc : " ",
                        Country: (tweetData.country) ? tweetData.country : " ",
                        TimeStamp: timestamp,
                        RecordNum: itemNum
                    }
                }
            };
            
            dataItemsBatch.push(ddbItem);
            counter++;
});
    
    var twitterItems = {}; 
    twitterItems[ddbTable] = dataItemsBatch; 
    ddbItemsWrite(twitterItems, 0, context, callback); 

};

The ddbItemsWrite function shown below will take the array of JSON tweet records processed from the Kinesis stream, and write the records multiple items at a time to our DynamoDB table using batch operations. This function leverages the DynamoDB best practice of retrying unprocessed items by implementing an exponential backoff algorithm to prevent write request failures due to throttling on the individual tables.

 function ddbItemsWrite(items, retries, ddbContext, ddbCallback) 
    { 
        dynamoDBClient.batchWrite({ RequestItems: items }, function(err, data) 
            { 
                if (err) 
                { 
                    console.log('DDB call failed: ' + err, err.stack); 
                    ddbCallback(err, err.stack); 
                } 
                else 
                { 
                    if(Object.keys(data.UnprocessedItems).length) 
                    { 
                        console.log('Unprocessed items remain, retrying.'); 
                        var delay = Math.min(Math.pow(2, retries) * 100, ddbContext.getRemainingTimeInMillis() - 200); 
                        setTimeout(function() {ddbItemsWrite(data.UnprocessedItems, retries + 1, ddbContext, ddbCallback)}, delay); 
                    } 
                    else 
                    { 
                         ddbCallback(null, "Success");
                         console.log("Completed Successfully");
                    } 
                } 
            } 
        );
    }

Currently, this Lambda function works as expected and will successfully process tweets captured in Kinesis from the Twitter Streaming API, however, this function has a flaw that will cause an error to occur when processing batch write requests to our DynamoDB table.  In the Lambda function, the current code does not take into account that the DynamoDB batchWrite function should be comprised of no more than 25 write (put) requests per single call to this function up to 16 MB of data. Therefore, without changing the code appropriately to have the ddbItemsWrite function to handle batches of 25 or have the handler function put items in the array in groups of 25 requests before sending to the ddbItemsWrite function; there will be a validation exception thrown when the batch of tweets items sent is greater than 25.  This is a great example of a bug that is not easily detected in small-scale testing scenarios yet will cause failures under production load.

 

Dead Letter Queues

Now that we are aware of an event that will cause the ddbItemsWrite Lambda function to throw an exception and/or an event that will fail while processing records, we have a first-rate scenario for leveraging Dead Letter Queues (DLQ).

Since AWS Lambda DLQ functionality is only available for asynchronous event sources like Amazon S3, Amazon SNS, AWS IoT or direct asynchronous invocations, and not for streaming event sources such as Amazon Kinesis or Amazon DynamoDB streams; our first step is to break this Lambda function into two functions.  The first Lambda function will handle the processing of the Kinesis stream, and the second Lambda function will take the data processed by the first function and write the tweet information to DynamoDB.  We will then setup our DLQ on the second Lambda function for the error that will occur on writing the batch of tweets to DynamoDB as noted above.

We have two options when setting up a target for our DLQ; Amazon SNS topic or an Amazon SQS queue.  In this walk-through, we will opt for using an Amazon SQS queue.  Therefore, my first step in using DLQ is to create a SQS Standard queue.  A Standard queue type is a queue which has high transactions throughput, a message will be delivered at least once, but another copy of the message may also be delivered, and it is possible that messages might be delivered in an order different from which they were sent.  You can learn more about creating SQS queues and queue type in the Amazon SQS documentation.

Once my queue, StreamDemoDLQ, is created, I will grab the ARN from the Details tab of this selected queue. If I am not using the console to designate the DLQ resource for this function, I will need the ARN for the queue for my Lambda function to identify this SQS queue as the DLQ target for error and event failure notifications. Additionally, I will use the ARN to add permissions to my Lambda execution role policy in order to access this SQS queue.

I will now return to my Lambda function and select the Configuration tab and expand the Advanced settings section. I will select SQS in the DLQ Resource field and select my StreamDemoDLQ queue in the SQS Queue field dropdown.

Remember, the execution role for the Lambda function must explicitly provide sqs:SendMessage access permissions to in order to successfully send messages to your SQS DLQ.  Therefore, I ensured that my Lambda role, lambda_kinesis_role, has the following IAM policy for SQS permissions.

 

We have now successfully configured a Dead Letter Queue for our Lambda function using Amazon SQS. To learn more about Dead Letter Queues in Lambda, read the Troubleshooting and Monitoring section of the AWS Lambda Developer Guide and check out the AWS Compute Blog post on Dead Letter Queues.

C# Support

As I mentioned earlier, another very exciting feature added to Lambda during AWS re:Invent was the support for the C# language via the open source .NET Core 1.0 platform.  Since the Lambda console does not offer editing for compiled languages yet, in order to author a C# Lambda function you can use tooling in Visual Studio with the AWS Toolkit, Yeoman, and/or the .NET CLI.  To deploy Lambda functions written in C#, you can use the Lambda plugin in the AWS ToolKit for Visual Studio or create a deployment package with the .NET Core command line.

A C# Lambda function handler should be defined as an instance or static method in a class. There are two handler function parameters; the first is the input type which is the event data and second is the Lambda context object of type ILambdaContext. The event data input object types for AWS Services include the following:

  • Amazon.Lambda.APIGatewayEvents
  • Amazon.Lambda.CognitoEvents
  • Amazon.Lambda.ConfigEvents
  • Amazon.Lambda.DynamoDBEvents
  • Amazon.Lambda.KinesisEvents
  • Amazon.Lambda.S3Events
  • Amazon.Lambda.SNSEvents

Now that we have discussed more detail around C# Support in Lambda, let’s rewrite our DevDayStreamProcessor lambda function with the C# language. For this example, I will use Visual Studio IDE to write the Lambda function, and additionally take advantage of the AWS Lambda Visual Studio plugin to deploy the function. Remember in order to use the AWS Toolkit for Visual Studio with Lambda, you will need to have Visual Studio 2015 Update 3 version and NET Core tools. You can read more about installing Visual Studio 2015 Update 3 and .NET Core here.

To create the C# function using Visual Studio, I start a New Project, select AWS Lambda Project (.NET Core) and name it ServerlessStreamProcessor.

What’s really cool about taking advantage of the AWS Toolkit for Visual Studio to author this function, is that inside of Visual Studio I can use Lambda blueprints to get started in a similar way that I would in using the Lambda console.  Therefore in order to replicate the DevDayStreamProcessor in C#, I will select the Simple Kinesis Function blueprint.

It should be noted that when writing Lambda functions in C#, there is no need to mark the class declaration nor the target handler function as a Lambda function. Additionally, when writing CloudWatch logs you can use the standard C# Console class WriteLine function or use the ILambdaContext LogLine function found as a part of the ILambdaContext interface. With the template for accessing the Kinesis stream in place, I finish writing the C# Lambda function, ServerlessStreamProcessor, utilizing the same variable names as in the Node.js code in DevDayStreamProcessor. Please note the C# Lambda handler function below.

using System.Collections.Generic;
using Amazon.Lambda.Core;
using Amazon.Lambda.KinesisEvents;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.DataModel;
using Newtonsoft.Json.Linq;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializerAttribute(typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]

namespace ServerlessStreamProcessor
{
    public class LambdaTwitterStream
    {
        string twitterID, timeStamp;
        int itemNum = 0;
        
        private static AmazonDynamoDBClient dynamoDBClient = new AmazonDynamoDBClient();
        List<TwitterItem> dataItemsBatch = new List<TwitterItem>();
        
        public void FunctionHandler(KinesisEvent kinesisEvent, ILambdaContext context)
        {
            DynamoDBContext dbContext = new DynamoDBContext(dynamoDBClient);
            context.Logger.LogLine($"Beginning to process {kinesisEvent.Records.Count} records...");
            
            foreach (var record in kinesisEvent.Records)
            {
                context.Logger.LogLine($"Event ID: {record.EventId}");
                context.Logger.LogLine($"Event Name: {record.EventName}");

                // Kinesis data is base64 encoded so decode here
                string tweetData = GetRecordContents(record.Kinesis);
                context.Logger.LogLine($"Decoded Payload: {tweetData}");
                tweetData = @"" + tweetData;
                JObject twitterObj = JObject.Parse(tweetData);
                
                twitterID = twitterObj["id"].ToString();
                timeStamp = DateTime.Now.Millisecond.ToString();
                itemNum++;
                context.Logger.LogLine(timeStamp);
                context.Logger.LogLine($"Twitter ID is: {twitterID}");
                context.Logger.LogLine(itemNum.ToString());

                TwitterItem ddbItem = new TwitterItem()
                { 
                    TwitterID = twitterID,
                    TwitterUser = twitterObj["username"].ToString(),
                    TwitterUserPic = twitterObj["pic"].ToString(),
                    TwitterTime = DateTime.Parse(twitterObj["time"].ToString()).ToUniversalTime().ToString(),
                    Tweet = twitterObj["text"].ToString(),
                    TweetTopic = twitterObj["topic"].ToString(),
                    Tags = twitterObj["hashtags"] != null ? twitterObj["hashtags"].ToString() : String.Empty,
                    Location = twitterObj["loc"] != null ? twitterObj["loc"].ToString() : String.Empty,
                    Country = twitterObj["country"] != null ? twitterObj["country"].ToString() : String.Empty,
                    TimeStamp =  timeStamp,
                    RecordNum = itemNum
                };
                
                dataItemsBatch.Add(ddbItem);
            }

            context.Logger.LogLine(JObject.FromObject(dataItemsBatch).ToString());
            ddbItemsWrite(dataItemsBatch, 0, dbContext, context);
            context.Logger.LogLine("Success - Completed Successfully");
            context.Logger.LogLine("Stream processing complete.");
        }

There are only a few differences that should be noted between our Kinesis stream processor written in C# and our original Node.js code.  Since the input parameter type supported by default in C# Lambda functions is the System.IO.Stream type, the Kinesis base64 string is decoded by using a StreamReader with ASCII encoding in a blueprint provided function, GetRecordContents.

 

private string GetRecordContents(KinesisEvent.Record streamRecord)
{
    using (var reader = new StreamReader(streamRecord.Data, Encoding.ASCII))
    {
        return reader.ReadToEnd();
    }
}

The other thing to note is that in order to write the tweet data to the DynamoDB Table, I added the AWS .NET SDK NuGet package for DynamoDB; AWSSDK.DynamoDBv2 to the Lambda function project via the NuGet package manager within Visual Studio.  I also created a .NET data object, TwitterItem, to map to the data being stored in the DynamoDB table. Using the AWS .NET SDK higher level programming interface, object persistence model for DynamoDB, I created a collection of TwitterItem objects to be written via the BatchWrite object class in our ddbItemsWrite C# function.

private async void ddbItemsWrite(List<TwitterItem> items, int retries, DynamoDBContext ddbContext, ILambdaContext context)
{
BatchWrite<TwitterItem> twitterStreamBatchWrite = ddbContext.CreateBatchWrite<TwitterItem>();
        
        try
        {
            twitterStreamBatchWrite.AddPutItems(items);   
            await twitterStreamBatchWrite.ExecuteAsync();
        }
        catch (Exception ex)
        {
            context.Logger.LogLine($"DDB call failed: {ex.Source} ");
            context.Logger.LogLine($"Exception: {ex.Message}");
            context.Logger.LogLine($"Exception Stacktrace: {ex.StackTrace}");
        }      
}

Another benefit of using AWS Toolkit for Visual Studio to author my C# Lambda function is that I can deploy my Lambda function directly to AWS with a single click.  Selecting my project name in the Solution Explorer and performing a right-click, I get a menu option, Publish to AWS Lambda, which brings up a menu for information to include about my Lambda function for deployment to AWS.

It is important to note that the handler function signature follows the nomenclature of Assembly :: Namespace :: ClassName :: Method, therefore, the signature of our C# Lambda function shown here is: ServerlessStreamProcessor :: ServerlessStreamProcessor.LambdaTwitterStream :: FunctionHandler.  We provide this information to the Upload to AWS Lambda dialog box and select Next to assign a role for the function.

Upon completion, you can test in the Lambda console or in Visual Studio with AWS toolkit provided plugin (shown below) using the sample data of the triggering event source for an iterative approach to developing the Lambda function.

You can learn more about authoring AWS Lambda functions using the C# Language in the AWS Lambda developer guide or by reading the post announcing C# Support on the Compute Blog.

API Gateway Monetization and Developer Portal

If you have been following the microservices momentum, you may be aware of an architectural pattern that calls for using smart endpoints and/or using an API gateway via REST APIs to manage access and exposure of individual services that make up a microservices solution.  Amazon API Gateway enables creation and management of RESTful APIs to expose AWS Lambda functions, external HTTP endpoints, as well as, other AWS services.  In addition, Amazon API Gateway allows clients and external developers to have access to a deployed APIs by via HTTP protocol or a platform/language targeted SDK.

With the introduction of SaaS Subscriptions on AWS Marketplace and the API Gateway integration with the AWS Marketplace, you can now monetize your APIs by allowing customers to directly consume the APIs you create with API Gateway in the AWS Marketplace.  AWS customers can subscribe and be billed for the APIs published on the marketplace with their existing AWS account.  With the integration of API Gateway with the AWS Marketplace, the process to get started is easy on the AWS Marketplace.

To get started, you must ensure that you have enabled the Usage Plan feature in Amazon API Gateway.

Once enabled the next step is to create a Usage Plan, enable throttling (if desired) with targeted rate and burst request thresholds, and finally enable quotas (if you choose) by providing targeted request quota per a set timeframe.

Next, we would choose our APIs and related stage(s) that we wish to be associated with the usage plan. Please note that this is an optional step as you can opt not associate a specific API with your usage plan.

All that is left to do is add or create an API key for the usage plan.  Again, it should be noted that this is also an optional step in creating your usage plan.

Now that we have our usage plan, StreamingPlan, we are ready for the next step in preparation for selling our API on the marketplace. You have the option to create multiple usage plans with varying APIs and limits, and sell these plans as differentiated API products on AWS Marketplace.

In order to enable customers to buy our new API product, however, the AWS Marketplace requires that each API product has an external developer portal to handle subscription requests, provide API information details and ability for the management of usage.

This customer need for an external developer portal for the marketplace birthed the new open source API Gateway developer portal serverless web application implementation.  The goal of the API Gateway developer portal project was to allow customers to follow a few easy steps to create a serverless web application that lists a catalog of your APIs built with API Gateway while allowing for developer signups.

The API Gateway developer portal was built upon AWS Serverless Express; an open source library published by AWS which aids you in utilizing AWS Lambda and Amazon API Gateway in building web applications/services with the Node.js Express framework.  Additionally, the API Gateway developer portal application uses an AWS SAM (Serverless Application Model) template to deploy its serverless resources.  AWS SAM is a simplified CloudFormation template and specification that allows easier management and deployment of serverless applications on AWS.

To build your developer portal using the API Gateway portal, you would start by cloning the aws-api-gateway-developer-portal project from GitHub.

Assuming you have the latest version of the AWS CLI and Node.js installed, you would setup the developer portal by running “npm run setup” on the command line for Mac and Linux OS users. For Windows users, you would run “npm run win-setup” on the command line setup the developer portal.

The result is a functional sample developer portal website running on S3 that you can customize in order to create your own developer portal for your APIs.

The frontend of the sample developer portal website is built with the React JavaScript library, and the backend is an AWS Lambda function running using the aws-serverless-express library. Additionally, a Lambda function with a SNS event source was created as a listener for notification when customers subscribe or unsubscribe to your API via the AWS Marketplace console.  You can learn more about the steps to build, customize, and deploy your API Gateway developer portal web application with this reference project by visiting the AWS Compute blog post which discusses the architecture and implementation in more detail.

 

The next key step in monetizing our API is establishing an account on the AWS Marketplace.  If an account is not already established, registering is simply verifying that you meet the requirement prerequisites provided in the AWS Marketplace Seller Guide and completing a seller registration form on the AWS Marketplace Management Portal.  You can see a snapshot of the start of the seller registration form below.

To list the API, you would fill a product load form describing the API, establish the pricing for the API, and provide t\he IDs of AWS Accounts that will test the API subscription process.  Completing this form would also require you to submit the URL for your API developer portal.

When your seller registration is complete, you will be supplied an AWS Marketplace product code.  You will need to associate your marketplace product code with your API usage plan.  In order to complete this step, you would simply log into the API Gateway console and go to your API usage plan. Go to the Marketplace tab and enter your product code. This tells API Gateway to send measurement data to AWS Marketplace when your API is used.

With your Amazon API Gateway managed API packaged into a usage plan, the accompanying API developer portal created, seller account registration completed, and product code associated with API usage plan; we are now ready to monetize our API on the AWS Marketplace.

Learn more about monetizing your APIs created with API Gateway by checking out the related blog post and reviewing the API Gateway developer guide documentation.

Summary

As you can see, the AWS teams were busy in 2016 working to make the customer experience easier for creating and deploying serverless architectures, as well as, providing mechanisms for customers to generate and monetize their API Gateway managed APIs.

Visit the product documentation for AWS Lambda and Amazon API Gateway to learn more about these services and all the newly released features.

Tara