Tag Archives: console

Flight Sim Company Threatens Reddit Mods Over “Libelous” DRM Posts

Post Syndicated from Andy original https://torrentfreak.com/flight-sim-company-threatens-reddit-mods-over-libellous-drm-posts-180604/

Earlier this year, in an effort to deal with piracy of their products, flight simulator company FlightSimLabs took drastic action by installing malware on customers’ machines.

The story began when a Reddit user reported something unusual in his download of FlightSimLabs’ A320X module. A file – test.exe – was being flagged up as a ‘Chrome Password Dump’ tool, something which rang alarm bells among flight sim fans.

As additional information was made available, the story became even more sensational. After first dodging the issue with carefully worded statements, FlightSimLabs admitted that it had installed a password dumper onto ALL users’ machines – whether they were pirates or not – in an effort to catch a particular software cracker and launch legal action.

It was an incredible story that no doubt did damage to FlightSimLabs’ reputation. But the now the company is at the center of a new storm, again centered around anti-piracy measures and again focused on Reddit.

Just before the weekend, Reddit user /u/walkday reported finding something unusual in his A320X module, the same module that caused the earlier controversy.

“The latest installer of FSLabs’ A320X puts two cmdhost.exe files under ‘system32\’ and ‘SysWOW64\’ of my Windows directory. Despite the name, they don’t open a command-line window,” he reported.

“They’re a part of the authentication because, if you remove them, the A320X won’t get loaded. Does someone here know more about cmdhost.exe? Why does FSLabs give them such a deceptive name and put them in the system folders? I hate them for polluting my system folder unless, of course, it is a dll used by different applications.”

Needless to say, the news that FSLabs were putting files into system folders named to make them look like system files was not well received.

“Hiding something named to resemble Window’s “Console Window Host” process in system folders is a huge red flag,” one user wrote.

“It’s a malware tactic used to deceive users into thinking the executable is a part of the OS, thus being trusted and not deleted. Really dodgy tactic, don’t trust it and don’t trust them,” opined another.

With a disenchanted Reddit userbase simmering away in the background, FSLabs took to Facebook with a statement to quieten down the masses.

“Over the past few hours we have become aware of rumors circulating on social media about the cmdhost file installed by the A320-X and wanted to clear up any confusion or misunderstanding,” the company wrote.

“cmdhost is part of our eSellerate infrastructure – which communicates between the eSellerate server and our product activation interface. It was designed to reduce the number of product activation issues people were having after the FSX release – which have since been resolved.”

The company noted that the file had been checked by all major anti-virus companies and everything had come back clean, which does indeed appear to be the case. Nevertheless, the critical Reddit thread remained, bemoaning the actions of a company which probably should have known better than to irritate fans after February’s debacle. In response, however, FSLabs did just that once again.

In private messages to the moderators of the /r/flightsim sub-Reddit, FSLabs’ Marketing and PR Manager Simon Kelsey suggested that the mods should do something about the thread in question or face possible legal action.

“Just a gentle reminder of Reddit’s obligations as a publisher in order to ensure that any libelous content is taken down as soon as you become aware of it,” Kelsey wrote.

Noting that FSLabs welcomes “robust fair comment and opinion”, Kelsey gave the following advice.

“The ‘cmdhost.exe’ file in question is an entirely above board part of our anti-piracy protection and has been submitted to numerous anti-virus providers in order to verify that it poses no threat. Therefore, ANY suggestion that current or future products pose any threat to users is absolutely false and libelous,” he wrote, adding:

“As we have already outlined in the past, ANY suggestion that any user’s data was compromised during the events of February is entirely false and therefore libelous.”

Noting that FSLabs would “hate for lawyers to have to get involved in this”, Kelsey advised the /r/flightsim mods to ensure that no such claims were allowed to remain on the sub-Reddit.

But after not receiving the response he would’ve liked, Kelsey wrote once again to the mods. He noted that “a number of unsubstantiated and highly defamatory comments” remained online and warned that if something wasn’t done to clean them up, he would have “no option” than to pass the matter to FSLabs’ legal team.

Like the first message, this second effort also failed to have the desired effect. In fact, the moderators’ response was to post an open letter to Kelsey and FSLabs instead.

“We sincerely disagree that you ‘welcome robust fair comment and opinion’, demonstrated by the censorship on your forums and the attempted censorship on our subreddit,” the mods wrote.

“While what you do on your forum is certainly your prerogative, your rules do not extend to Reddit nor the r/flightsim subreddit. Removing content you disagree with is simply not within our purview.”

The letter, which is worth reading in full, refutes Kelsey’s claims and also suggests that critics of FSLabs may have been subjected to Reddit vote manipulation and coordinated efforts to discredit them.

What will happen next is unclear but the matter has now been placed in the hands of Reddit’s administrators who have agreed to deal with Kelsey and FSLabs’ personally.

It’s a little early to say for sure but it seems unlikely that this will end in a net positive for FSLabs, no matter what decision Reddit’s admins take.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Amazon Neptune Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-neptune-generally-available/

Amazon Neptune is now Generally Available in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. At the core of Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latencies. Neptune supports two popular graph models, Property Graph and RDF, through Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune can be used to power everything from recommendation engines and knowledge graphs to drug discovery and network security. Neptune is fully-managed with automatic minor version upgrades, backups, encryption, and fail-over. I wrote about Neptune in detail for AWS re:Invent last year and customers have been using the preview and providing great feedback that the team has used to prepare the service for GA.

Now that Amazon Neptune is generally available there are a few changes from the preview:

Launching an Amazon Neptune Cluster

Launching a Neptune cluster is as easy as navigating to the AWS Management Console and clicking create cluster. Of course you can also launch with CloudFormation, the CLI, or the SDKs.

You can monitor your cluster health and the health of individual instances through Amazon CloudWatch and the console.

Additional Resources

We’ve created two repos with some additional tools and examples here. You can expect continuous development on these repos as we add additional tools and examples.

  • Amazon Neptune Tools Repo
    This repo has a useful tool for converting GraphML files into Neptune compatible CSVs for bulk loading from S3.
  • Amazon Neptune Samples Repo
    This repo has a really cool example of building a collaborative filtering recommendation engine for video game preferences.

Purpose Built Databases

There’s an industry trend where we’re moving more and more onto purpose-built databases. Developers and businesses want to access their data in the format that makes the most sense for their applications. As cloud resources make transforming large datasets easier with tools like AWS Glue, we have a lot more options than we used to for accessing our data. With tools like Amazon Redshift, Amazon Athena, Amazon Aurora, Amazon DynamoDB, and more we get to choose the best database for the job or even enable entirely new use-cases. Amazon Neptune is perfect for workloads where the data is highly connected across data rich edges.

I’m really excited about graph databases and I see a huge number of applications. Looking for ideas of cool things to build? I’d love to build a web crawler in AWS Lambda that uses Neptune as the backing store. You could further enrich it by running Amazon Comprehend or Amazon Rekognition on the text and images found and creating a search engine on top of Neptune.

As always, feel free to reach out in the comments or on twitter to provide any feedback!

Randall

Monitoring your Amazon SNS message filtering activity with Amazon CloudWatch

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/monitoring-your-amazon-sns-message-filtering-activity-with-amazon-cloudwatch/

This post is courtesy of Otavio Ferreira, Manager, Amazon SNS, AWS Messaging.

Amazon SNS message filtering provides a set of string and numeric matching operators that allow each subscription to receive only the messages of interest. Hence, SNS message filtering can simplify your pub/sub messaging architecture by offloading the message filtering logic from your subscriber systems, as well as the message routing logic from your publisher systems.

After you set the subscription attribute that defines a filter policy, the subscribing endpoint receives only the messages that carry attributes matching this filter policy. Other messages published to the topic are filtered out for this subscription. In this way, the native integration between SNS and Amazon CloudWatch provides visibility into the number of messages delivered, as well as the number of messages filtered out.

CloudWatch metrics are captured automatically for you. To get started with SNS message filtering, see Filtering Messages with Amazon SNS.

Message Filtering Metrics

The following six CloudWatch metrics are relevant to understanding your SNS message filtering activity:

  • NumberOfMessagesPublished – Inbound traffic to SNS. This metric tracks all the messages that have been published to the topic.
  • NumberOfNotificationsDelivered – Outbound traffic from SNS. This metric tracks all the messages that have been successfully delivered to endpoints subscribed to the topic. A delivery takes place either when the incoming message attributes match a subscription filter policy, or when the subscription has no filter policy at all, which results in a catch-all behavior.
  • NumberOfNotificationsFilteredOut – This metric tracks all the messages that were filtered out because they carried attributes that didn’t match the subscription filter policy.
  • NumberOfNotificationsFilteredOut-NoMessageAttributes – This metric tracks all the messages that were filtered out because they didn’t carry any attributes at all and, consequently, didn’t match the subscription filter policy.
  • NumberOfNotificationsFilteredOut-InvalidAttributes – This metric keeps track of messages that were filtered out because they carried invalid or malformed attributes and, thus, didn’t match the subscription filter policy.
  • NumberOfNotificationsFailed – This last metric tracks all the messages that failed to be delivered to subscribing endpoints, regardless of whether a filter policy had been set for the endpoint. This metric is emitted after the message delivery retry policy is exhausted, and SNS stops attempting to deliver the message. At that moment, the subscribing endpoint is likely no longer reachable. For example, the subscribing SQS queue or Lambda function has been deleted by its owner. You may want to closely monitor this metric to address message delivery issues quickly.

Message filtering graphs

Through the AWS Management Console, you can compose graphs to display your SNS message filtering activity. The graph shows the number of messages published, delivered, and filtered out within the timeframe you specify (1h, 3h, 12h, 1d, 3d, 1w, or custom).

SNS message filtering for CloudWatch Metrics

To compose an SNS message filtering graph with CloudWatch:

  1. Open the CloudWatch console.
  2. Choose Metrics, SNS, All Metrics, and Topic Metrics.
  3. Select all metrics to add to the graph, such as:
    • NumberOfMessagesPublished
    • NumberOfNotificationsDelivered
    • NumberOfNotificationsFilteredOut
  4. Choose Graphed metrics.
  5. In the Statistic column, switch from Average to Sum.
  6. Title your graph with a descriptive name, such as “SNS Message Filtering”

After you have your graph set up, you may want to copy the graph link for bookmarking, emailing, or sharing with co-workers. You may also want to add your graph to a CloudWatch dashboard for easy access in the future. Both actions are available to you on the Actions menu, which is found above the graph.

Summary

SNS message filtering defines how SNS topics behave in terms of message delivery. By using CloudWatch metrics, you gain visibility into the number of messages published, delivered, and filtered out. This enables you to validate the operation of filter policies and more easily troubleshoot during development phases.

SNS message filtering can be implemented easily with existing AWS SDKs by applying message and subscription attributes across all SNS supported protocols (Amazon SQS, AWS Lambda, HTTP, SMS, email, and mobile push). CloudWatch metrics for SNS message filtering is available now, in all AWS Regions.

For information about pricing, see the CloudWatch pricing page.

For more information, see:

Measuring the throughput for Amazon MQ using the JMS Benchmark

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/measuring-the-throughput-for-amazon-mq-using-the-jms-benchmark/

This post is courtesy of Alan Protasio, Software Development Engineer, Amazon Web Services

Just like compute and storage, messaging is a fundamental building block of enterprise applications. Message brokers (aka “message-oriented middleware”) enable different software systems, often written in different languages, on different platforms, running in different locations, to communicate and exchange information. Mission-critical applications, such as CRM and ERP, rely on message brokers to work.

A common performance consideration for customers deploying a message broker in a production environment is the throughput of the system, measured as messages per second. This is important to know so that application environments (hosts, threads, memory, etc.) can be configured correctly.

In this post, we demonstrate how to measure the throughput for Amazon MQ, a new managed message broker service for ActiveMQ, using JMS Benchmark. It should take between 15–20 minutes to set up the environment and an hour to run the benchmark. We also provide some tips on how to configure Amazon MQ for optimal throughput.

Benchmarking throughput for Amazon MQ

ActiveMQ can be used for a number of use cases. These use cases can range from simple fire and forget tasks (that is, asynchronous processing), low-latency request-reply patterns, to buffering requests before they are persisted to a database.

The throughput of Amazon MQ is largely dependent on the use case. For example, if you have non-critical workloads such as gathering click events for a non-business-critical portal, you can use ActiveMQ in a non-persistent mode and get extremely high throughput with Amazon MQ.

On the flip side, if you have a critical workload where durability is extremely important (meaning that you can’t lose a message), then you are bound by the I/O capacity of your underlying persistence store. We recommend using mq.m4.large for the best results. The mq.t2.micro instance type is intended for product evaluation. Performance is limited, due to the lower memory and burstable CPU performance.

Tip: To improve your throughput with Amazon MQ, make sure that you have consumers processing messaging as fast as (or faster than) your producers are pushing messages.

Because it’s impossible to talk about how the broker (ActiveMQ) behaves for each and every use case, we walk through how to set up your own benchmark for Amazon MQ using our favorite open-source benchmarking tool: JMS Benchmark. We are fans of the JMS Benchmark suite because it’s easy to set up and deploy, and comes with a built-in visualizer of the results.

Non-Persistent Scenarios – Queue latency as you scale producer throughput

JMS Benchmark nonpersistent scenarios

Getting started

At the time of publication, you can create an mq.m4.large single-instance broker for testing for $0.30 per hour (US pricing).

This walkthrough covers the following tasks:

  1.  Create and configure the broker.
  2. Create an EC2 instance to run your benchmark
  3. Configure the security groups
  4.  Run the benchmark.

Step 1 – Create and configure the broker
Create and configure the broker using Tutorial: Creating and Configuring an Amazon MQ Broker.

Step 2 – Create an EC2 instance to run your benchmark
Launch the EC2 instance using Step 1: Launch an Instance. We recommend choosing the m5.large instance type.

Step 3 – Configure the security groups
Make sure that all the security groups are correctly configured to let the traffic flow between the EC2 instance and your broker.

  1. Sign in to the Amazon MQ console.
  2. From the broker list, choose the name of your broker (for example, MyBroker)
  3. In the Details section, under Security and network, choose the name of your security group or choose the expand icon ( ).
  4. From the security group list, choose your security group.
  5. At the bottom of the page, choose Inbound, Edit.
  6. In the Edit inbound rules dialog box, add a role to allow traffic between your instance and the broker:
    • Choose Add Rule.
    • For Type, choose Custom TCP.
    • For Port Range, type the ActiveMQ SSL port (61617).
    • For Source, leave Custom selected and then type the security group of your EC2 instance.
    • Choose Save.

Your broker can now accept the connection from your EC2 instance.

Step 4 – Run the benchmark
Connect to your EC2 instance using SSH and run the following commands:

$ cd ~
$ curl -L https://github.com/alanprot/jms-benchmark/archive/master.zip -o master.zip
$ unzip master.zip
$ cd jms-benchmark-master
$ chmod a+x bin/*
$ env \
  SERVER_SETUP=false \
  SERVER_ADDRESS={activemq-endpoint} \
  ACTIVEMQ_TRANSPORT=ssl\
  ACTIVEMQ_PORT=61617 \
  ACTIVEMQ_USERNAME={activemq-user} \
  ACTIVEMQ_PASSWORD={activemq-password} \
  ./bin/benchmark-activemq

After the benchmark finishes, you can find the results in the ~/reports directory. As you may notice, the performance of ActiveMQ varies based on the number of consumers, producers, destinations, and message size.

Amazon MQ architecture

The last bit that’s important to know so that you can better understand the results of the benchmark is how Amazon MQ is architected.

Amazon MQ is architected to be highly available (HA) and durable. For HA, we recommend using the multi-AZ option. After a message is sent to Amazon MQ in persistent mode, the message is written to the highly durable message store that replicates the data across multiple nodes in multiple Availability Zones. Because of this replication, for some use cases you may see a reduction in throughput as you migrate to Amazon MQ. Customers have told us they appreciate the benefits of message replication as it helps protect durability even in the face of the loss of an Availability Zone.

Conclusion

We hope this gives you an idea of how Amazon MQ performs. We encourage you to run tests to simulate your own use cases.

To learn more, see the Amazon MQ website. You can try Amazon MQ for free with the AWS Free Tier, which includes up to 750 hours of a single-instance mq.t2.micro broker and up to 1 GB of storage per month for one year.

Protecting your API using Amazon API Gateway and AWS WAF — Part I

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/protecting-your-api-using-amazon-api-gateway-and-aws-waf-part-i/

This post courtesy of Thiago Morais, AWS Solutions Architect

When you build web applications or expose any data externally, you probably look for a platform where you can build highly scalable, secure, and robust REST APIs. As APIs are publicly exposed, there are a number of best practices for providing a secure mechanism to consumers using your API.

Amazon API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, authorization and access control, monitoring, and API version management.

In this post, I show you how to take advantage of the regional API endpoint feature in API Gateway, so that you can create your own Amazon CloudFront distribution and secure your API using AWS WAF.

AWS WAF is a web application firewall that helps protect your web applications from common web exploits that could affect application availability, compromise security, or consume excessive resources.

As you make your APIs publicly available, you are exposed to attackers trying to exploit your services in several ways. The AWS security team published a whitepaper solution using AWS WAF, How to Mitigate OWASP’s Top 10 Web Application Vulnerabilities.

Regional API endpoints

Edge-optimized APIs are endpoints that are accessed through a CloudFront distribution created and managed by API Gateway. Before the launch of regional API endpoints, this was the default option when creating APIs using API Gateway. It primarily helped to reduce latency for API consumers that were located in different geographical locations than your API.

When API requests predominantly originate from an Amazon EC2 instance or other services within the same AWS Region as the API is deployed, a regional API endpoint typically lowers the latency of connections. It is recommended for such scenarios.

For better control around caching strategies, customers can use their own CloudFront distribution for regional APIs. They also have the ability to use AWS WAF protection, as I describe in this post.

Edge-optimized API endpoint

The following diagram is an illustrated example of the edge-optimized API endpoint where your API clients access your API through a CloudFront distribution created and managed by API Gateway.

Regional API endpoint

For the regional API endpoint, your customers access your API from the same Region in which your REST API is deployed. This helps you to reduce request latency and particularly allows you to add your own content delivery network, as needed.

Walkthrough

In this section, you implement the following steps:

  • Create a regional API using the PetStore sample API.
  • Create a CloudFront distribution for the API.
  • Test the CloudFront distribution.
  • Set up AWS WAF and create a web ACL.
  • Attach the web ACL to the CloudFront distribution.
  • Test AWS WAF protection.

Create the regional API

For this walkthrough, use an existing PetStore API. All new APIs launch by default as the regional endpoint type. To change the endpoint type for your existing API, choose the cog icon on the top right corner:

After you have created the PetStore API on your account, deploy a stage called “prod” for the PetStore API.

On the API Gateway console, select the PetStore API and choose Actions, Deploy API.

For Stage name, type prod and add a stage description.

Choose Deploy and the new API stage is created.

Use the following AWS CLI command to update your API from edge-optimized to regional:

aws apigateway update-rest-api \
--rest-api-id {rest-api-id} \
--patch-operations op=replace,path=/endpointConfiguration/types/EDGE,value=REGIONAL

A successful response looks like the following:

{
    "description": "Your first API with Amazon API Gateway. This is a sample API that integrates via HTTP with your demo Pet Store endpoints", 
    "createdDate": 1511525626, 
    "endpointConfiguration": {
        "types": [
            "REGIONAL"
        ]
    }, 
    "id": "{api-id}", 
    "name": "PetStore"
}

After you change your API endpoint to regional, you can now assign your own CloudFront distribution to this API.

Create a CloudFront distribution

To make things easier, I have provided an AWS CloudFormation template to deploy a CloudFront distribution pointing to the API that you just created. Click the button to deploy the template in the us-east-1 Region.

For Stack name, enter RegionalAPI. For APIGWEndpoint, enter your API FQDN in the following format:

{api-id}.execute-api.us-east-1.amazonaws.com

After you fill out the parameters, choose Next to continue the stack deployment. It takes a couple of minutes to finish the deployment. After it finishes, the Output tab lists the following items:

  • A CloudFront domain URL
  • An S3 bucket for CloudFront access logs
Output from CloudFormation

Output from CloudFormation

Test the CloudFront distribution

To see if the CloudFront distribution was configured correctly, use a web browser and enter the URL from your distribution, with the following parameters:

https://{your-distribution-url}.cloudfront.net/{api-stage}/pets

You should get the following output:

[
  {
    "id": 1,
    "type": "dog",
    "price": 249.99
  },
  {
    "id": 2,
    "type": "cat",
    "price": 124.99
  },
  {
    "id": 3,
    "type": "fish",
    "price": 0.99
  }
]

Set up AWS WAF and create a web ACL

With the new CloudFront distribution in place, you can now start setting up AWS WAF to protect your API.

For this demo, you deploy the AWS WAF Security Automations solution, which provides fine-grained control over the requests attempting to access your API.

For more information about deployment, see Automated Deployment. If you prefer, you can launch the solution directly into your account using the following button.

For CloudFront Access Log Bucket Name, add the name of the bucket created during the deployment of the CloudFormation stack for your CloudFront distribution.

The solution allows you to adjust thresholds and also choose which automations to enable to protect your API. After you finish configuring these settings, choose Next.

To start the deployment process in your account, follow the creation wizard and choose Create. It takes a few minutes do finish the deployment. You can follow the creation process through the CloudFormation console.

After the deployment finishes, you can see the new web ACL deployed on the AWS WAF console, AWSWAFSecurityAutomations.

Attach the AWS WAF web ACL to the CloudFront distribution

With the solution deployed, you can now attach the AWS WAF web ACL to the CloudFront distribution that you created earlier.

To assign the newly created AWS WAF web ACL, go back to your CloudFront distribution. After you open your distribution for editing, choose General, Edit.

Select the new AWS WAF web ACL that you created earlier, AWSWAFSecurityAutomations.

Save the changes to your CloudFront distribution and wait for the deployment to finish.

Test AWS WAF protection

To validate the AWS WAF Web ACL setup, use Artillery to load test your API and see AWS WAF in action.

To install Artillery on your machine, run the following command:

$ npm install -g artillery

After the installation completes, you can check if Artillery installed successfully by running the following command:

$ artillery -V
$ 1.6.0-12

As the time of publication, Artillery is on version 1.6.0-12.

One of the WAF web ACL rules that you have set up is a rate-based rule. By default, it is set up to block any requesters that exceed 2000 requests under 5 minutes. Try this out.

First, use cURL to query your distribution and see the API output:

$ curl -s https://{distribution-name}.cloudfront.net/prod/pets
[
  {
    "id": 1,
    "type": "dog",
    "price": 249.99
  },
  {
    "id": 2,
    "type": "cat",
    "price": 124.99
  },
  {
    "id": 3,
    "type": "fish",
    "price": 0.99
  }
]

Based on the test above, the result looks good. But what if you max out the 2000 requests in under 5 minutes?

Run the following Artillery command:

artillery quick -n 2000 --count 10  https://{distribution-name}.cloudfront.net/prod/pets

What you are doing is firing 2000 requests to your API from 10 concurrent users. For brevity, I am not posting the Artillery output here.

After Artillery finishes its execution, try to run the cURL request again and see what happens:

 

$ curl -s https://{distribution-name}.cloudfront.net/prod/pets

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
Request blocked.
<BR clear="all">
<HR noshade size="1px">
<PRE>
Generated by cloudfront (CloudFront)
Request ID: [removed]
</PRE>
<ADDRESS>
</ADDRESS>
</BODY></HTML>

As you can see from the output above, the request was blocked by AWS WAF. Your IP address is removed from the blocked list after it falls below the request limit rate.

Conclusion

In this first part, you saw how to use the new API Gateway regional API endpoint together with Amazon CloudFront and AWS WAF to secure your API from a series of attacks.

In the second part, I will demonstrate some other techniques to protect your API using API keys and Amazon CloudFront custom headers.

Use Slack ChatOps to Deploy Your Code – How to Integrate Your Pipeline in AWS CodePipeline with Your Slack Channel

Post Syndicated from Rumi Olsen original https://aws.amazon.com/blogs/devops/use-slack-chatops-to-deploy-your-code-how-to-integrate-your-pipeline-in-aws-codepipeline-with-your-slack-channel/

Slack is widely used by DevOps and development teams to communicate status. Typically, when a build has been tested and is ready to be promoted to a staging environment, a QA engineer or DevOps engineer kicks off the deployment. Using Slack in a ChatOps collaboration model, the promotion can be done in a single click from a Slack channel. And because the promotion happens through a Slack channel, the whole development team knows what’s happening without checking email.

In this blog post, I will show you how to integrate AWS services with a Slack application. I use an interactive message button and incoming webhook to promote a stage with a single click.

To follow along with the steps in this post, you’ll need a pipeline in AWS CodePipeline. If you don’t have a pipeline, the fastest way to create one for this use case is to use AWS CodeStar. Go to the AWS CodeStar console and select the Static Website template (shown in the screenshot). AWS CodeStar will create a pipeline with an AWS CodeCommit repository and an AWS CodeDeploy deployment for you. After the pipeline is created, you will need to add a manual approval stage.

You’ll also need to build a Slack app with webhooks and interactive components, write two Lambda functions, and create an API Gateway API and a SNS topic.

As you’ll see in the following diagram, when I make a change and merge a new feature into the master branch in AWS CodeCommit, the check-in kicks off my CI/CD pipeline in AWS CodePipeline. When CodePipeline reaches the approval stage, it sends a notification to Amazon SNS, which triggers an AWS Lambda function (ApprovalRequester).

The Slack channel receives a prompt that looks like the following screenshot. When I click Yes to approve the build promotion, the approval result is sent to CodePipeline through API Gateway and Lambda (ApprovalHandler). The pipeline continues on to deploy the build to the next environment.

Create a Slack app

For App Name, type a name for your app. For Development Slack Workspace, choose the name of your workspace. You’ll see in the following screenshot that my workspace is AWS ChatOps.

After the Slack application has been created, you will see the Basic Information page, where you can create incoming webhooks and enable interactive components.

To add incoming webhooks:

  1. Under Add features and functionality, choose Incoming Webhooks. Turn the feature on by selecting Off, as shown in the following screenshot.
  2. Now that the feature is turned on, choose Add New Webhook to Workspace. In the process of creating the webhook, Slack lets you choose the channel where messages will be posted.
  3. After the webhook has been created, you’ll see its URL. You will use this URL when you create the Lambda function.

If you followed the steps in the post, the pipeline should look like the following.

Write the Lambda function for approval requests

This Lambda function is invoked by the SNS notification. It sends a request that consists of an interactive message button to the incoming webhook you created earlier.  The following sample code sends the request to the incoming webhook. WEBHOOK_URL and SLACK_CHANNEL are the environment variables that hold values of the webhook URL that you created and the Slack channel where you want the interactive message button to appear.

# This function is invoked via SNS when the CodePipeline manual approval action starts.
# It will take the details from this approval notification and sent an interactive message to Slack that allows users to approve or cancel the deployment.

import os
import json
import logging
import urllib.parse

from base64 import b64decode
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

# This is passed as a plain-text environment variable for ease of demonstration.
# Consider encrypting the value with KMS or use an encrypted parameter in Parameter Store for production deployments.
SLACK_WEBHOOK_URL = os.environ['SLACK_WEBHOOK_URL']
SLACK_CHANNEL = os.environ['SLACK_CHANNEL']

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))
    message = event["Records"][0]["Sns"]["Message"]
    
    data = json.loads(message) 
    token = data["approval"]["token"]
    codepipeline_name = data["approval"]["pipelineName"]
    
    slack_message = {
        "channel": SLACK_CHANNEL,
        "text": "Would you like to promote the build to production?",
        "attachments": [
            {
                "text": "Yes to deploy your build to production",
                "fallback": "You are unable to promote a build",
                "callback_id": "wopr_game",
                "color": "#3AA3E3",
                "attachment_type": "default",
                "actions": [
                    {
                        "name": "deployment",
                        "text": "Yes",
                        "style": "danger",
                        "type": "button",
                        "value": json.dumps({"approve": True, "codePipelineToken": token, "codePipelineName": codepipeline_name}),
                        "confirm": {
                            "title": "Are you sure?",
                            "text": "This will deploy the build to production",
                            "ok_text": "Yes",
                            "dismiss_text": "No"
                        }
                    },
                    {
                        "name": "deployment",
                        "text": "No",
                        "type": "button",
                        "value": json.dumps({"approve": False, "codePipelineToken": token, "codePipelineName": codepipeline_name})
                    }  
                ]
            }
        ]
    }

    req = Request(SLACK_WEBHOOK_URL, json.dumps(slack_message).encode('utf-8'))

    response = urlopen(req)
    response.read()
    
    return None

 

Create a SNS topic

Create a topic and then create a subscription that invokes the ApprovalRequester Lambda function. You can configure the manual approval action in the pipeline to send a message to this SNS topic when an approval action is required. When the pipeline reaches the approval stage, it sends a notification to this SNS topic. SNS publishes a notification to all of the subscribed endpoints. In this case, the Lambda function is the endpoint. Therefore, it invokes and executes the Lambda function. For information about how to create a SNS topic, see Create a Topic in the Amazon SNS Developer Guide.

Write the Lambda function for handling the interactive message button

This Lambda function is invoked by API Gateway. It receives the result of the interactive message button whether or not the build promotion was approved. If approved, an API call is made to CodePipeline to promote the build to the next environment. If not approved, the pipeline stops and does not move to the next stage.

The Lambda function code might look like the following. SLACK_VERIFICATION_TOKEN is the environment variable that contains your Slack verification token. You can find your verification token under Basic Information on Slack manage app page. When you scroll down, you will see App Credential. Verification token is found under the section.

# This function is triggered via API Gateway when a user acts on the Slack interactive message sent by approval_requester.py.

from urllib.parse import parse_qs
import json
import os
import boto3

SLACK_VERIFICATION_TOKEN = os.environ['SLACK_VERIFICATION_TOKEN']

#Triggered by API Gateway
#It kicks off a particular CodePipeline project
def lambda_handler(event, context):
	#print("Received event: " + json.dumps(event, indent=2))
	body = parse_qs(event['body'])
	payload = json.loads(body['payload'][0])

	# Validate Slack token
	if SLACK_VERIFICATION_TOKEN == payload['token']:
		send_slack_message(json.loads(payload['actions'][0]['value']))
		
		# This will replace the interactive message with a simple text response.
		# You can implement a more complex message update if you would like.
		return  {
			"isBase64Encoded": "false",
			"statusCode": 200,
			"body": "{\"text\": \"The approval has been processed\"}"
		}
	else:
		return  {
			"isBase64Encoded": "false",
			"statusCode": 403,
			"body": "{\"error\": \"This request does not include a vailid verification token.\"}"
		}


def send_slack_message(action_details):
	codepipeline_status = "Approved" if action_details["approve"] else "Rejected"
	codepipeline_name = action_details["codePipelineName"]
	token = action_details["codePipelineToken"] 

	client = boto3.client('codepipeline')
	response_approval = client.put_approval_result(
							pipelineName=codepipeline_name,
							stageName='Approval',
							actionName='ApprovalOrDeny',
							result={'summary':'','status':codepipeline_status},
							token=token)
	print(response_approval)

 

Create the API Gateway API

  1. In the Amazon API Gateway console, create a resource called InteractiveMessageHandler.
  2. Create a POST method.
    • For Integration type, choose Lambda Function.
    • Select Use Lambda Proxy integration.
    • From Lambda Region, choose a region.
    • In Lambda Function, type a name for your function.
  3.  Deploy to a stage.

For more information, see Getting Started with Amazon API Gateway in the Amazon API Developer Guide.

Now go back to your Slack application and enable interactive components.

To enable interactive components for the interactive message (Yes) button:

  1. Under Features, choose Interactive Components.
  2. Choose Enable Interactive Components.
  3. Type a request URL in the text box. Use the invoke URL in Amazon API Gateway that will be called when the approval button is clicked.

Now that all the pieces have been created, run the solution by checking in a code change to your CodeCommit repo. That will release the change through CodePipeline. When the CodePipeline comes to the approval stage, it will prompt to your Slack channel to see if you want to promote the build to your staging or production environment. Choose Yes and then see if your change was deployed to the environment.

Conclusion

That is it! You have now created a Slack ChatOps solution using AWS CodeCommit, AWS CodePipeline, AWS Lambda, Amazon API Gateway, and Amazon Simple Notification Service.

Now that you know how to do this Slack and CodePipeline integration, you can use the same method to interact with other AWS services using API Gateway and Lambda. You can also use Slack’s slash command to initiate an action from a Slack channel, rather than responding in the way demonstrated in this post.

AWS IoT 1-Click – Use Simple Devices to Trigger Lambda Functions

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-iot-1-click-use-simple-devices-to-trigger-lambda-functions/

We announced a preview of AWS IoT 1-Click at AWS re:Invent 2017 and have been refining it ever since, focusing on simplicity and a clean out-of-box experience. Designed to make IoT available and accessible to a broad audience, AWS IoT 1-Click is now generally available, along with new IoT buttons from AWS and AT&T.

I sat down with the dev team a month or two ago to learn about the service so that I could start thinking about my blog post. During the meeting they gave me a pair of IoT buttons and I started to think about some creative ways to put them to use. Here are a few that I came up with:

Help Request – Earlier this month I spent a very pleasant weekend at the HackTillDawn hackathon in Los Angeles. As the participants were hacking away, they occasionally had questions about AWS, machine learning, Amazon SageMaker, and AWS DeepLens. While we had plenty of AWS Solution Architects on hand (decked out in fashionable & distinctive AWS shirts for easy identification), I imagined an IoT button for each team. Pressing the button would alert the SA crew via SMS and direct them to the proper table.

Camera ControlTim Bray and I were in the AWS video studio, prepping for the first episode of Tim’s series on AWS Messaging. Minutes before we opened the Twitch stream I realized that we did not have a clean, unobtrusive way to ask the camera operator to switch to a closeup view. Again, I imagined that a couple of IoT buttons would allow us to make the request.

Remote Dog Treat Dispenser – My dog barks every time a stranger opens the gate in front of our house. While it is great to have confirmation that my Ring doorbell is working, I would like to be able to press a button and dispense a treat so that Luna stops barking!

Homes, offices, factories, schools, vehicles, and health care facilities can all benefit from IoT buttons and other simple IoT devices, all managed using AWS IoT 1-Click.

All About AWS IoT 1-Click
As I said earlier, we have been focusing on simplicity and a clean out-of-box experience. Here’s what that means:

Architects can dream up applications for inexpensive, low-powered devices.

Developers don’t need to write any device-level code. They can make use of pre-built actions, which send email or SMS messages, or write their own custom actions using AWS Lambda functions.

Installers don’t have to install certificates or configure cloud endpoints on newly acquired devices, and don’t have to worry about firmware updates.

Administrators can monitor the overall status and health of each device, and can arrange to receive alerts when a device nears the end of its useful life and needs to be replaced, using a single interface that spans device types and manufacturers.

I’ll show you how easy this is in just a moment. But first, let’s talk about the current set of devices that are supported by AWS IoT 1-Click.

Who’s Got the Button?
We’re launching with support for two types of buttons (both pictured above). Both types of buttons are pre-configured with X.509 certificates, communicate to the cloud over secure connections, and are ready to use.

The AWS IoT Enterprise Button communicates via Wi-Fi. It has a 2000-click lifetime, encrypts outbound data using TLS, and can be configured using BLE and our mobile app. It retails for $19.99 (shipping and handling not included) and can be used in the United States, Europe, and Japan.

The AT&T LTE-M Button communicates via the LTE-M cellular network. It has a 1500-click lifetime, and also encrypts outbound data using TLS. The device and the bundled data plan is available an an introductory price of $29.99 (shipping and handling not included), and can be used in the United States.

We are very interested in working with device manufacturers in order to make even more shapes, sizes, and types of devices (badge readers, asset trackers, motion detectors, and industrial sensors, to name a few) available to our customers. Our team will be happy to tell you about our provisioning tools and our facility for pushing OTA (over the air) updates to large fleets of devices; you can contact them at [email protected].

AWS IoT 1-Click Concepts
I’m eager to show you how to use AWS IoT 1-Click and the buttons, but need to introduce a few concepts first.

Device – A button or other item that can send messages. Each device is uniquely identified by a serial number.

Placement Template – Describes a like-minded collection of devices to be deployed. Specifies the action to be performed and lists the names of custom attributes for each device.

Placement – A device that has been deployed. Referring to placements instead of devices gives you the freedom to replace and upgrade devices with minimal disruption. Each placement can include values for custom attributes such as a location (“Building 8, 3rd Floor, Room 1337”) or a purpose (“Coffee Request Button”).

Action – The AWS Lambda function to invoke when the button is pressed. You can write a function from scratch, or you can make use of a pair of predefined functions that send an email or an SMS message. The actions have access to the attributes; you can, for example, send an SMS message with the text “Urgent need for coffee in Building 8, 3rd Floor, Room 1337.”

Getting Started with AWS IoT 1-Click
Let’s set up an IoT button using the AWS IoT 1-Click Console:

If I didn’t have any buttons I could click Buy devices to get some. But, I do have some, so I click Claim devices to move ahead. I enter the device ID or claim code for my AT&T button and click Claim (I can enter multiple claim codes or device IDs if I want):

The AWS buttons can be claimed using the console or the mobile app; the first step is to use the mobile app to configure the button to use my Wi-Fi:

Then I scan the barcode on the box and click the button to complete the process of claiming the device. Both of my buttons are now visible in the console:

I am now ready to put them to use. I click on Projects, and then Create a project:

I name and describe my project, and click Next to proceed:

Now I define a device template, along with names and default values for the placement attributes. Here’s how I set up a device template (projects can contain several, but I just need one):

The action has two mandatory parameters (phone number and SMS message) built in; I add three more (Building, Room, and Floor) and click Create project:

I’m almost ready to ask for some coffee! The next step is to associate my buttons with this project by creating a placement for each one. I click Create placements to proceed. I name each placement, select the device to associate with it, and then enter values for the attributes that I established for the project. I can also add additional attributes that are peculiar to this placement:

I can inspect my project and see that everything looks good:

I click on the buttons and the SMS messages appear:

I can monitor device activity in the AWS IoT 1-Click Console:

And also in the Lambda Console:

The Lambda function itself is also accessible, and can be used as-is or customized:

As you can see, this is the code that lets me use {{*}}include all of the placement attributes in the message and {{Building}} (for example) to include a specific placement attribute.

Now Available
I’ve barely scratched the surface of this cool new service and I encourage you to give it a try (or a click) yourself. Buy a button or two, build something cool, and let me know all about it!

Pricing is based on the number of enabled devices in your account, measured monthly and pro-rated for partial months. Devices can be enabled or disabled at any time. See the AWS IoT 1-Click Pricing page for more info.

To learn more, visit the AWS IoT 1-Click home page or read the AWS IoT 1-Click documentation.

Jeff;

 

Brutus 2: the gaming PC case of your dreams

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/brutus-2-gaming-pc-case/

Attention, case modders: take a look at the Brutus 2, an extremely snazzy computer case with a partly transparent, animated side panel that’s powered by a Pi. Daniel Otto and Carsten Lehman have a current crowdfunder for the case; their video is in German, but the looks of the build speak for themselves. There are some truly gorgeous effects here.

der BRUTUS 2 by 3nb Gaming

Vorbestellungen ab sofort auf https://www.startnext.com/brutus2 Weitere Infos zu uns auf: https://3nb.de https://www.facebook.com/3nb.de https://www.instagram.com/3nb.de Über 3nb: – GbR aus Leipzig, gegründet 2017 – wir kommen aus den Bereichen Elektronik und Informatik – erstes Produkt: der Brutus One ein Gaming PC mit transparentem Display in der Seite Kurzinfo Brutus 2: – Markencomputergehäuse für Gaming- /Casemoddingszene – Besonderheit: animiertes Seitenfenster angesteuert mit einem Raspberry Pi – Vorteile von unserem Case: o Case ist einzeln lieferbar und nicht nur als komplett-PC o kein Leistungsverbrauch der Grafikkarte dank integriertem Raspberry Pi o bessere Darstellung von Texten und Grafiken durch unscharfen Hintergrund

What’s case modding?

Case modding just means modifying your computer or gaming console’s case, and it’s very popular in the gaming community. Some mods are functional, while others improve the way the case looks. Lots of dedicated gamers don’t only want a powerful computer, they also want it to look amazing — at home, or at LAN parties and games tournaments.

The Brutus 2 case

The Brutus 2 case is made by Daniel and Carsten’s startup, 3nb electronics, and it’s a product that is officially Powered by Raspberry Pi. Its standout feature is the semi-transparent TFT screen, which lets you play any video clip you choose while keeping your gaming hardware on display. It looks incredibly cool. All the graphics for the case’s screen are handled by a Raspberry Pi, so it doesn’t use any of your main PC’s GPU power and your gaming won’t suffer.

Brutus 2 PC case powered by Raspberry Pi

The software

To use Brutus 2, you just need to run a small desktop application on your PC to choose what you want to display on the case. A number of neat animations are included, and you can upload your own if you want.

So far, the app only runs on Windows, but 3nb electronics are planning to make the code open-source, so you can modify it for other operating systems, or to display other file types. This is true to the spirit of the case modding and Raspberry Pi communities, who love adapting, retrofitting, and overhauling projects and code to fit their needs.

Brutus 2 PC case powered by Raspberry Pi

Daniel and Carsten say that one of their campaign’s stretch goals is to implement more functionality in the Brutus 2 app. So in the future, the case could also show things like CPU temperature, gaming stats, and in-game messages. Of course, there’s nothing stopping you from integrating features like that yourself.

If you have any questions about the case, you can post them directly to Daniel and Carsten here.

The crowdfunding campaign

The Brutus 2 campaign on Startnext is currently halfway to its first funding goal of €10000, with over three weeks to go until it closes. If you’re quick, you still be may be able to snatch one of the early-bird offers. And if your whole guild NEEDS this, that’s OK — there are discounts for bulk orders.

The post Brutus 2: the gaming PC case of your dreams appeared first on Raspberry Pi.

Solving Complex Ordering Challenges with Amazon SQS FIFO Queues

Post Syndicated from Christie Gifrin original https://aws.amazon.com/blogs/compute/solving-complex-ordering-challenges-with-amazon-sqs-fifo-queues/

Contributed by Shea Lutton, AWS Cloud Infrastructure Architect

Amazon Simple Queue Service (Amazon SQS) is a fully managed queuing service that helps decouple applications, distributed systems, and microservices to increase fault tolerance. SQS queues come in two distinct types:

  • Standard SQS queues are able to scale to enormous throughput with at-least-once delivery.
  • FIFO queues are designed to guarantee that messages are processed exactly once in the exact order that they are received and have a default rate of 300 transactions per second.

As customers explore SQS FIFO queues, they often have questions about how the behavior works when messages arrive and are consumed. This post walks through some common situations to identify the exact behavior that you can expect. It also covers the behavior of message groups in depth and explains why message groups are key to understanding how FIFO queues work.

The simple case

Suppose that you run a major auction platform where people buy and sell a wide range of products. Your platform requires that transactions from buyers and sellers get processed in exactly the order received. Here’s how a FIFO queue helps you keep all your transactions in one straight flow.

A seller currently is holding an auction for a laptop, and three different bids are received for the same price. Ties are awarded to the first bidder at that price so it is important to track which arrived first. Your auction platform receives the three bids and sends them to a FIFO queue before they are processed.

Now observe how messages leave the queue. When your consumer asks for a batch of up to 10 messages, SQS starts filling the batch with the oldest message (bid A1). It keeps filling until either the batch is full or the queue is empty. In this case, the batch contains the three messages and the queue is now empty. After a batch has left the queue, SQS considers that batch of messages to be “in-flight” until the consumer either deletes them or the batch’s visibility timer expires.

 

When you have a single consumer, this is easy to envision. The consumer gets a batch of messages (now in-flight), does its processing, and deletes the messages. That consumer is then ready to ask for the next batch of messages.

The critical thing to keep in mind is that SQS won’t release the next batch of messages until the first batch has been deleted. By adding more messages to the queue, you can see more interesting behaviors. Imagine that a burst of 11 bids is sent to your FIFO queue, with two bids for Auction A arriving last.

The FIFO queue now has at least two batches of messages in it. When your single consumer requests the first batch of 10 messages, it receives a batch starting with B1 and ending with A1. Later, after the first batch has been deleted, the consumer can get the second batch of messages containing the final A2 message from the queue.

Adding complexity with multiple message groups

A new challenge arises. Your auction platform is getting busier and your dev team added a number of new features. The combination of increased messages and extra processing time for the new features means that a single consumer is too slow. The solution is to scale to have more consumers and process messages in parallel.

To work in parallel, your team realized that only the messages related to a single auction must be kept in order. All transactions for Auction A need to be kept in order and so do all transactions for Auction B. But the two auctions are independent and it does not matter which auctions transactions are processed first.

FIFO can handle that case with a feature called message groups. Each transaction related to Auction A is placed by your producer into message group A, and so on. In the diagram below, Auction A and Auction B each received three bid transactions, with bid B1 arriving first. The FIFO queue always keeps transactions within a message group in the order in which they arrived.

How is this any different than earlier examples? The consumer now gets the messages ordered by message groups, all the B group messages followed by all the A group messages. Multiple message groups create the possibility of using multiple consumers, which I explain in a moment. If FIFO can’t fill up a batch of messages with a single message group, FIFO can place more than one message group in a batch of messages. But whenever possible, the queue gives you a full batch of messages from the same group.

The order of messages leaving a FIFO queue is governed by three rules:

  1. Return the oldest message where no other message in the same message group is currently in-flight.
  2. Return as many messages from the same message group as possible.
  3. If a message batch is still not full, go back to rule 1.

To see this behavior, add a second consumer and insert many more messages into the queue. For simplicity, the delete message action has been omitted in these diagrams but it is assumed that all messages in a batch are processed successfully by the consumer and the batch is properly deleted immediately after.

In this example, there are 11 Group A and 11 Group B transactions arriving in interleaved order and a second consumer has been added. Consumer 1 asks for a group of 10 messages and receives 10 Group A messages. Consumer 2 then asks for 10 messages but SQS knows that Group A is in flight, so it releases 10 Group B messages. The two consumers are now processing two batches of messages in parallel, speeding up throughput and then deleting their batches. When Consumer 1 requests the next batch of messages, it receives the remaining two messages, one from Group A and one from Group B.

Consider this nuanced detail from the example above. What would happen if Consumer 1 was on a faster server and processed its first batch of messages before Consumer 2 could mark its messages for deletion? See if you can predict the behavior before looking at the answer.

If Consumer 2 has not deleted its Group B messages yet when Consumer 1 asks for the next batch, then the FIFO queue considers Group B to still be in flight. It does not release any more Group B messages. Consumer 1 gets only the remaining Group A message. Later, after Consumer 2 has deleted its first batch, the remaining Group B message is released.

Conclusion

I hope this post answered your questions about how Amazon SQS FIFO queues work and why message groups are helpful. If you’re interested in exploring SQS FIFO queues further, here are a few ideas to get you started:

Analyze Apache Parquet optimized data using Amazon Kinesis Data Firehose, Amazon Athena, and Amazon Redshift

Post Syndicated from Roy Hasson original https://aws.amazon.com/blogs/big-data/analyzing-apache-parquet-optimized-data-using-amazon-kinesis-data-firehose-amazon-athena-and-amazon-redshift/

Amazon Kinesis Data Firehose is the easiest way to capture and stream data into a data lake built on Amazon S3. This data can be anything—from AWS service logs like AWS CloudTrail log files, Amazon VPC Flow Logs, Application Load Balancer logs, and others. It can also be IoT events, game events, and much more. To efficiently query this data, a time-consuming ETL (extract, transform, and load) process is required to massage and convert the data to an optimal file format, which increases the time to insight. This situation is less than ideal, especially for real-time data that loses its value over time.

To solve this common challenge, Kinesis Data Firehose can now save data to Amazon S3 in Apache Parquet or Apache ORC format. These are optimized columnar formats that are highly recommended for best performance and cost-savings when querying data in S3. This feature directly benefits you if you use Amazon Athena, Amazon Redshift, AWS Glue, Amazon EMR, or any other big data tools that are available from the AWS Partner Network and through the open-source community.

Amazon Connect is a simple-to-use, cloud-based contact center service that makes it easy for any business to provide a great customer experience at a lower cost than common alternatives. Its open platform design enables easy integration with other systems. One of those systems is Amazon Kinesis—in particular, Kinesis Data Streams and Kinesis Data Firehose.

What’s really exciting is that you can now save events from Amazon Connect to S3 in Apache Parquet format. You can then perform analytics using Amazon Athena and Amazon Redshift Spectrum in real time, taking advantage of this key performance and cost optimization. Of course, Amazon Connect is only one example. This new capability opens the door for a great deal of opportunity, especially as organizations continue to build their data lakes.

Amazon Connect includes an array of analytics views in the Administrator dashboard. But you might want to run other types of analysis. In this post, I describe how to set up a data stream from Amazon Connect through Kinesis Data Streams and Kinesis Data Firehose and out to S3, and then perform analytics using Athena and Amazon Redshift Spectrum. I focus primarily on the Kinesis Data Firehose support for Parquet and its integration with the AWS Glue Data Catalog, Amazon Athena, and Amazon Redshift.

Solution overview

Here is how the solution is laid out:

 

 

The following sections walk you through each of these steps to set up the pipeline.

1. Define the schema

When Kinesis Data Firehose processes incoming events and converts the data to Parquet, it needs to know which schema to apply. The reason is that many times, incoming events contain all or some of the expected fields based on which values the producers are advertising. A typical process is to normalize the schema during a batch ETL job so that you end up with a consistent schema that can easily be understood and queried. Doing this introduces latency due to the nature of the batch process. To overcome this issue, Kinesis Data Firehose requires the schema to be defined in advance.

To see the available columns and structures, see Amazon Connect Agent Event Streams. For the purpose of simplicity, I opted to make all the columns of type String rather than create the nested structures. But you can definitely do that if you want.

The simplest way to define the schema is to create a table in the Amazon Athena console. Open the Athena console, and paste the following create table statement, substituting your own S3 bucket and prefix for where your event data will be stored. A Data Catalog database is a logical container that holds the different tables that you can create. The default database name shown here should already exist. If it doesn’t, you can create it or use another database that you’ve already created.

CREATE EXTERNAL TABLE default.kfhconnectblog (
  awsaccountid string,
  agentarn string,
  currentagentsnapshot string,
  eventid string,
  eventtimestamp string,
  eventtype string,
  instancearn string,
  previousagentsnapshot string,
  version string
)
STORED AS parquet
LOCATION 's3://your_bucket/kfhconnectblog/'
TBLPROPERTIES ("parquet.compression"="SNAPPY")

That’s all you have to do to prepare the schema for Kinesis Data Firehose.

2. Define the data streams

Next, you need to define the Kinesis data streams that will be used to stream the Amazon Connect events.  Open the Kinesis Data Streams console and create two streams.  You can configure them with only one shard each because you don’t have a lot of data right now.

3. Define the Kinesis Data Firehose delivery stream for Parquet

Let’s configure the Data Firehose delivery stream using the data stream as the source and Amazon S3 as the output. Start by opening the Kinesis Data Firehose console and creating a new data delivery stream. Give it a name, and associate it with the Kinesis data stream that you created in Step 2.

As shown in the following screenshot, enable Record format conversion (1) and choose Apache Parquet (2). As you can see, Apache ORC is also supported. Scroll down and provide the AWS Glue Data Catalog database name (3) and table names (4) that you created in Step 1. Choose Next.

To make things easier, the output S3 bucket and prefix fields are automatically populated using the values that you defined in the LOCATION parameter of the create table statement from Step 1. Pretty cool. Additionally, you have the option to save the raw events into another location as defined in the Source record S3 backup section. Don’t forget to add a trailing forward slash “ / “ so that Data Firehose creates the date partitions inside that prefix.

On the next page, in the S3 buffer conditions section, there is a note about configuring a large buffer size. The Parquet file format is highly efficient in how it stores and compresses data. Increasing the buffer size allows you to pack more rows into each output file, which is preferred and gives you the most benefit from Parquet.

Compression using Snappy is automatically enabled for both Parquet and ORC. You can modify the compression algorithm by using the Kinesis Data Firehose API and update the OutputFormatConfiguration.

Be sure to also enable Amazon CloudWatch Logs so that you can debug any issues that you might run into.

Lastly, finalize the creation of the Firehose delivery stream, and continue on to the next section.

4. Set up the Amazon Connect contact center

After setting up the Kinesis pipeline, you now need to set up a simple contact center in Amazon Connect. The Getting Started page provides clear instructions on how to set up your environment, acquire a phone number, and create an agent to accept calls.

After setting up the contact center, in the Amazon Connect console, choose your Instance Alias, and then choose Data Streaming. Under Agent Event, choose the Kinesis data stream that you created in Step 2, and then choose Save.

At this point, your pipeline is complete.  Agent events from Amazon Connect are generated as agents go about their day. Events are sent via Kinesis Data Streams to Kinesis Data Firehose, which converts the event data from JSON to Parquet and stores it in S3. Athena and Amazon Redshift Spectrum can simply query the data without any additional work.

So let’s generate some data. Go back into the Administrator console for your Amazon Connect contact center, and create an agent to handle incoming calls. In this example, I creatively named mine Agent One. After it is created, Agent One can get to work and log into their console and set their availability to Available so that they are ready to receive calls.

To make the data a bit more interesting, I also created a second agent, Agent Two. I then made some incoming and outgoing calls and caused some failures to occur, so I now have enough data available to analyze.

5. Analyze the data with Athena

Let’s open the Athena console and run some queries. One thing you’ll notice is that when we created the schema for the dataset, we defined some of the fields as Strings even though in the documentation they were complex structures.  The reason for doing that was simply to show some of the flexibility of Athena to be able to parse JSON data. However, you can define nested structures in your table schema so that Kinesis Data Firehose applies the appropriate schema to the Parquet file.

Let’s run the first query to see which agents have logged into the system.

The query might look complex, but it’s fairly straightforward:

WITH dataset AS (
  SELECT 
    from_iso8601_timestamp(eventtimestamp) AS event_ts,
    eventtype,
    -- CURRENT STATE
    json_extract_scalar(
      currentagentsnapshot,
      '$.agentstatus.name') AS current_status,
    from_iso8601_timestamp(
      json_extract_scalar(
        currentagentsnapshot,
        '$.agentstatus.starttimestamp')) AS current_starttimestamp,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.firstname') AS current_firstname,
    json_extract_scalar(
      currentagentsnapshot,
      '$.configuration.lastname') AS current_lastname,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.username') AS current_username,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.routingprofile.defaultoutboundqueue.name') AS               current_outboundqueue,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.routingprofile.inboundqueues[0].name') as current_inboundqueue,
    -- PREVIOUS STATE
    json_extract_scalar(
      previousagentsnapshot, 
      '$.agentstatus.name') as prev_status,
    from_iso8601_timestamp(
      json_extract_scalar(
        previousagentsnapshot, 
       '$.agentstatus.starttimestamp')) as prev_starttimestamp,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.firstname') as prev_firstname,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.lastname') as prev_lastname,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.username') as prev_username,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.routingprofile.defaultoutboundqueue.name') as current_outboundqueue,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.routingprofile.inboundqueues[0].name') as prev_inboundqueue
  from kfhconnectblog
  where eventtype <> 'HEART_BEAT'
)
SELECT
  current_status as status,
  current_username as username,
  event_ts
FROM dataset
WHERE eventtype = 'LOGIN' AND current_username <> ''
ORDER BY event_ts DESC

The query output looks something like this:

Here is another query that shows the sessions each of the agents engaged with. It tells us where they were incoming or outgoing, if they were completed, and where there were missed or failed calls.

WITH src AS (
  SELECT
     eventid,
     json_extract_scalar(currentagentsnapshot, '$.configuration.username') as username,
     cast(json_extract(currentagentsnapshot, '$.contacts') AS ARRAY(JSON)) as c,
     cast(json_extract(previousagentsnapshot, '$.contacts') AS ARRAY(JSON)) as p
  from kfhconnectblog
),
src2 AS (
  SELECT *
  FROM src CROSS JOIN UNNEST (c, p) AS contacts(c_item, p_item)
),
dataset AS (
SELECT 
  eventid,
  username,
  json_extract_scalar(c_item, '$.contactid') as c_contactid,
  json_extract_scalar(c_item, '$.channel') as c_channel,
  json_extract_scalar(c_item, '$.initiationmethod') as c_direction,
  json_extract_scalar(c_item, '$.queue.name') as c_queue,
  json_extract_scalar(c_item, '$.state') as c_state,
  from_iso8601_timestamp(json_extract_scalar(c_item, '$.statestarttimestamp')) as c_ts,
  
  json_extract_scalar(p_item, '$.contactid') as p_contactid,
  json_extract_scalar(p_item, '$.channel') as p_channel,
  json_extract_scalar(p_item, '$.initiationmethod') as p_direction,
  json_extract_scalar(p_item, '$.queue.name') as p_queue,
  json_extract_scalar(p_item, '$.state') as p_state,
  from_iso8601_timestamp(json_extract_scalar(p_item, '$.statestarttimestamp')) as p_ts
FROM src2
)
SELECT 
  username,
  c_channel as channel,
  c_direction as direction,
  p_state as prev_state,
  c_state as current_state,
  c_ts as current_ts,
  c_contactid as id
FROM dataset
WHERE c_contactid = p_contactid
ORDER BY id DESC, current_ts ASC

The query output looks similar to the following:

6. Analyze the data with Amazon Redshift Spectrum

With Amazon Redshift Spectrum, you can query data directly in S3 using your existing Amazon Redshift data warehouse cluster. Because the data is already in Parquet format, Redshift Spectrum gets the same great benefits that Athena does.

Here is a simple query to show querying the same data from Amazon Redshift. Note that to do this, you need to first create an external schema in Amazon Redshift that points to the AWS Glue Data Catalog.

SELECT 
  eventtype,
  json_extract_path_text(currentagentsnapshot,'agentstatus','name') AS current_status,
  json_extract_path_text(currentagentsnapshot, 'configuration','firstname') AS current_firstname,
  json_extract_path_text(currentagentsnapshot, 'configuration','lastname') AS current_lastname,
  json_extract_path_text(
    currentagentsnapshot,
    'configuration','routingprofile','defaultoutboundqueue','name') AS current_outboundqueue,
FROM default_schema.kfhconnectblog

The following shows the query output:

Summary

In this post, I showed you how to use Kinesis Data Firehose to ingest and convert data to columnar file format, enabling real-time analysis using Athena and Amazon Redshift. This great feature enables a level of optimization in both cost and performance that you need when storing and analyzing large amounts of data. This feature is equally important if you are investing in building data lakes on AWS.

 


Additional Reading

If you found this post useful, be sure to check out Analyzing VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight and Work with partitioned data in AWS Glue.


About the Author

Roy Hasson is a Global Business Development Manager for AWS Analytics. He works with customers around the globe to design solutions to meet their data processing, analytics and business intelligence needs. Roy is big Manchester United fan cheering his team on and hanging out with his family.

 

 

 

Spring 2018 AWS SOC Reports are Now Available with 11 Services Added in Scope

Post Syndicated from Chris Gile original https://aws.amazon.com/blogs/security/spring-2018-aws-soc-reports-are-now-available-with-11-services-added-in-scope/

Since our last System and Organization Control (SOC) audit, our service and compliance teams have been working to increase the number of AWS Services in scope prioritized based on customer requests. Today, we’re happy to report 11 services are newly SOC compliant, which is a 21 percent increase in the last six months.

With the addition of the following 11 new services, you can now select from a total of 62 SOC-compliant services. To see the full list, go to our Services in Scope by Compliance Program page:

• Amazon Athena
• Amazon QuickSight
• Amazon WorkDocs
• AWS Batch
• AWS CodeBuild
• AWS Config
• AWS OpsWorks Stacks
• AWS Snowball
• AWS Snowball Edge
• AWS Snowmobile
• AWS X-Ray

Our latest SOC 1, 2, and 3 reports covering the period from October 1, 2017 to March 31, 2018 are now available. The SOC 1 and 2 reports are available on-demand through AWS Artifact by logging into the AWS Management Console. The SOC 3 report can be downloaded here.

Finally, prospective customers can read our SOC 1 and 2 reports by reaching out to AWS Compliance.

Want more AWS Security news? Follow us on Twitter.

Amazon Aurora Backtrack – Turn Back Time

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-aurora-backtrack-turn-back-time/

We’ve all been there! You need to make a quick, seemingly simple fix to an important production database. You compose the query, give it a once-over, and let it run. Seconds later you realize that you forgot the WHERE clause, dropped the wrong table, or made another serious mistake, and interrupt the query, but the damage has been done. You take a deep breath, whistle through your teeth, wish that reality came with an Undo option. Now what?

New Amazon Aurora Backtrack
Today I would like to tell you about the new backtrack feature for Amazon Aurora. This is as close as we can come, given present-day technology, to an Undo option for reality.

This feature can be enabled at launch time for all newly-launched Aurora database clusters. To enable it, you simply specify how far back in time you might want to rewind, and use the database as usual (this is on the Configure advanced settings page):

Aurora uses a distributed, log-structured storage system (read Design Considerations for High Throughput Cloud-Native Relational Databases to learn a lot more); each change to your database generates a new log record, identified by a Log Sequence Number (LSN). Enabling the backtrack feature provisions a FIFO buffer in the cluster for storage of LSNs. This allows for quick access and recovery times measured in seconds.

After that regrettable moment when all seems lost, you simply pause your application, open up the Aurora Console, select the cluster, and click Backtrack DB cluster:

Then you select Backtrack and choose the point in time just before your epic fail, and click Backtrack DB cluster:

Then you wait for the rewind to take place, unpause your application and proceed as if nothing had happened. When you initiate a backtrack, Aurora will pause the database, close any open connections, drop uncommitted writes, and wait for the backtrack to complete. Then it will resume normal operation and being to accept requests. The instance state will be backtracking while the rewind is underway:

The console will let you know when the backtrack is complete:

If it turns out that you went back a bit too far, you can backtrack to a later time. Other Aurora features such as cloning, backups, and restores continue to work on an instance that has been configured for backtrack.

I’m sure you can think of some creative and non-obvious use cases for this cool new feature. For example, you could use it to restore a test database after running a test that makes changes to the database. You can initiate the restoration from the API or the CLI, making it easy to integrate into your existing test framework.

Things to Know
This option applies to newly created MySQL-compatible Aurora database clusters and to MySQL-compatible clusters that have been restored from a backup. You must opt-in when you create or restore a cluster; you cannot enable it for a running cluster.

This feature is available now in all AWS Regions where Amazon Aurora runs, and you can start using it today.

Jeff;

Analyze data in Amazon DynamoDB using Amazon SageMaker for real-time prediction

Post Syndicated from YongSeong Lee original https://aws.amazon.com/blogs/big-data/analyze-data-in-amazon-dynamodb-using-amazon-sagemaker-for-real-time-prediction/

Many companies across the globe use Amazon DynamoDB to store and query historical user-interaction data. DynamoDB is a fast NoSQL database used by applications that need consistent, single-digit millisecond latency.

Often, customers want to turn their valuable data in DynamoDB into insights by analyzing a copy of their table stored in Amazon S3. Doing this separates their analytical queries from their low-latency critical paths. This data can be the primary source for understanding customers’ past behavior, predicting future behavior, and generating downstream business value. Customers often turn to DynamoDB because of its great scalability and high availability. After a successful launch, many customers want to use the data in DynamoDB to predict future behaviors or provide personalized recommendations.

DynamoDB is a good fit for low-latency reads and writes, but it’s not practical to scan all data in a DynamoDB database to train a model. In this post, I demonstrate how you can use DynamoDB table data copied to Amazon S3 by AWS Data Pipeline to predict customer behavior. I also demonstrate how you can use this data to provide personalized recommendations for customers using Amazon SageMaker. You can also run ad hoc queries using Amazon Athena against the data. DynamoDB recently released on-demand backups to create full table backups with no performance impact. However, it’s not suitable for our purposes in this post, so I chose AWS Data Pipeline instead to create managed backups are accessible from other services.

To do this, I describe how to read the DynamoDB backup file format in Data Pipeline. I also describe how to convert the objects in S3 to a CSV format that Amazon SageMaker can read. In addition, I show how to schedule regular exports and transformations using Data Pipeline. The sample data used in this post is from Bank Marketing Data Set of UCI.

The solution that I describe provides the following benefits:

  • Separates analytical queries from production traffic on your DynamoDB table, preserving your DynamoDB read capacity units (RCUs) for important production requests
  • Automatically updates your model to get real-time predictions
  • Optimizes for performance (so it doesn’t compete with DynamoDB RCUs after the export) and for cost (using data you already have)
  • Makes it easier for developers of all skill levels to use Amazon SageMaker

All code and data set in this post are available in this .zip file.

Solution architecture

The following diagram shows the overall architecture of the solution.

The steps that data follows through the architecture are as follows:

  1. Data Pipeline regularly copies the full contents of a DynamoDB table as JSON into an S3
  2. Exported JSON files are converted to comma-separated value (CSV) format to use as a data source for Amazon SageMaker.
  3. Amazon SageMaker renews the model artifact and update the endpoint.
  4. The converted CSV is available for ad hoc queries with Amazon Athena.
  5. Data Pipeline controls this flow and repeats the cycle based on the schedule defined by customer requirements.

Building the auto-updating model

This section discusses details about how to read the DynamoDB exported data in Data Pipeline and build automated workflows for real-time prediction with a regularly updated model.

Download sample scripts and data

Before you begin, take the following steps:

  1. Download sample scripts in this .zip file.
  2. Unzip the src.zip file.
  3. Find the automation_script.sh file and edit it for your environment. For example, you need to replace 's3://<your bucket>/<datasource path>/' with your own S3 path to the data source for Amazon ML. In the script, the text enclosed by angle brackets—< and >—should be replaced with your own path.
  4. Upload the json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar file to your S3 path so that the ADD jar command in Apache Hive can refer to it.

For this solution, the banking.csv  should be imported into a DynamoDB table.

Export a DynamoDB table

To export the DynamoDB table to S3, open the Data Pipeline console and choose the Export DynamoDB table to S3 template. In this template, Data Pipeline creates an Amazon EMR cluster and performs an export in the EMRActivity activity. Set proper intervals for backups according to your business requirements.

One core node(m3.xlarge) provides the default capacity for the EMR cluster and should be suitable for the solution in this post. Leave the option to resize the cluster before running enabled in the TableBackupActivity activity to let Data Pipeline scale the cluster to match the table size. The process of converting to CSV format and renewing models happens in this EMR cluster.

For a more in-depth look at how to export data from DynamoDB, see Export Data from DynamoDB in the Data Pipeline documentation.

Add the script to an existing pipeline

After you export your DynamoDB table, you add an additional EMR step to EMRActivity by following these steps:

  1. Open the Data Pipeline console and choose the ID for the pipeline that you want to add the script to.
  2. For Actions, choose Edit.
  3. In the editing console, choose the Activities category and add an EMR step using the custom script downloaded in the previous section, as shown below.

Paste the following command into the new step after the data ­­upload step:

s3://#{myDDBRegion}.elasticmapreduce/libs/script-runner/script-runner.jar,s3://<your bucket name>/automation_script.sh,#{output.directoryPath},#{myDDBRegion}

The element #{output.directoryPath} references the S3 path where the data pipeline exports DynamoDB data as JSON. The path should be passed to the script as an argument.

The bash script has two goals, converting data formats and renewing the Amazon SageMaker model. Subsequent sections discuss the contents of the automation script.

Automation script: Convert JSON data to CSV with Hive

We use Apache Hive to transform the data into a new format. The Hive QL script to create an external table and transform the data is included in the custom script that you added to the Data Pipeline definition.

When you run the Hive scripts, do so with the -e option. Also, define the Hive table with the 'org.openx.data.jsonserde.JsonSerDe' row format to parse and read JSON format. The SQL creates a Hive EXTERNAL table, and it reads the DynamoDB backup data on the S3 path passed to it by Data Pipeline.

Note: You should create the table with the “EXTERNAL” keyword to avoid the backup data being accidentally deleted from S3 if you drop the table.

The full automation script for converting follows. Add your own bucket name and data source path in the highlighted areas.

#!/bin/bash
hive -e "
ADD jar s3://<your bucket name>/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar ; 
DROP TABLE IF EXISTS blog_backup_data ;
CREATE EXTERNAL TABLE blog_backup_data (
 customer_id map<string,string>,
 age map<string,string>, job map<string,string>, 
 marital map<string,string>,education map<string,string>, 
 default map<string,string>, housing map<string,string>,
 loan map<string,string>, contact map<string,string>, 
 month map<string,string>, day_of_week map<string,string>, 
 duration map<string,string>, campaign map<string,string>,
 pdays map<string,string>, previous map<string,string>, 
 poutcome map<string,string>, emp_var_rate map<string,string>, 
 cons_price_idx map<string,string>, cons_conf_idx map<string,string>,
 euribor3m map<string,string>, nr_employed map<string,string>, 
 y map<string,string> ) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 
LOCATION '$1/';

INSERT OVERWRITE DIRECTORY 's3://<your bucket name>/<datasource path>/' 
SELECT concat( customer_id['s'],',', 
 age['n'],',', job['s'],',', 
 marital['s'],',', education['s'],',', default['s'],',', 
 housing['s'],',', loan['s'],',', contact['s'],',', 
 month['s'],',', day_of_week['s'],',', duration['n'],',', 
 campaign['n'],',',pdays['n'],',',previous['n'],',', 
 poutcome['s'],',', emp_var_rate['n'],',', cons_price_idx['n'],',',
 cons_conf_idx['n'],',', euribor3m['n'],',', nr_employed['n'],',', y['n'] ) 
FROM blog_backup_data
WHERE customer_id['s'] > 0 ; 

After creating an external table, you need to read data. You then use the INSERT OVERWRITE DIRECTORY ~ SELECT command to write CSV data to the S3 path that you designated as the data source for Amazon SageMaker.

Depending on your requirements, you can eliminate or process the columns in the SELECT clause in this step to optimize data analysis. For example, you might remove some columns that have unpredictable correlations with the target value because keeping the wrong columns might expose your model to “overfitting” during the training. In this post, customer_id  columns is removed. Overfitting can make your prediction weak. More information about overfitting can be found in the topic Model Fit: Underfitting vs. Overfitting in the Amazon ML documentation.

Automation script: Renew the Amazon SageMaker model

After the CSV data is replaced and ready to use, create a new model artifact for Amazon SageMaker with the updated dataset on S3.  For renewing model artifact, you must create a new training job.  Training jobs can be run using the AWS SDK ( for example, Amazon SageMaker boto3 ) or the Amazon SageMaker Python SDK that can be installed with “pip install sagemaker” command as well as the AWS CLI for Amazon SageMaker described in this post.

In addition, consider how to smoothly renew your existing model without service impact, because your model is called by applications in real time. To do this, you need to create a new endpoint configuration first and update a current endpoint with the endpoint configuration that is just created.

#!/bin/bash
## Define variable 
REGION=$2
DTTIME=`date +%Y-%m-%d-%H-%M-%S`
ROLE="<your AmazonSageMaker-ExecutionRole>" 


# Select containers image based on region.  
case "$REGION" in
"us-west-2" )
    IMAGE="174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest"
    ;;
"us-east-1" )
    IMAGE="382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:latest" 
    ;;
"us-east-2" )
    IMAGE="404615174143.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest" 
    ;;
"eu-west-1" )
    IMAGE="438346466558.dkr.ecr.eu-west-1.amazonaws.com/linear-learner:latest" 
    ;;
 *)
    echo "Invalid Region Name"
    exit 1 ;  
esac

# Start training job and creating model artifact 
TRAINING_JOB_NAME=TRAIN-${DTTIME} 
S3OUTPUT="s3://<your bucket name>/model/" 
INSTANCETYPE="ml.m4.xlarge"
INSTANCECOUNT=1
VOLUMESIZE=5 
aws sagemaker create-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --algorithm-specification TrainingImage=${IMAGE},TrainingInputMode=File --role-arn ${ROLE}  --input-data-config '[{ "ChannelName": "train", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://<your bucket name>/<datasource path>/", "S3DataDistributionType": "FullyReplicated" } }, "ContentType": "text/csv", "CompressionType": "None" , "RecordWrapperType": "None"  }]'  --output-data-config S3OutputPath=${S3OUTPUT} --resource-config  InstanceType=${INSTANCETYPE},InstanceCount=${INSTANCECOUNT},VolumeSizeInGB=${VOLUMESIZE} --stopping-condition MaxRuntimeInSeconds=120 --hyper-parameters feature_dim=20,predictor_type=binary_classifier  

# Wait until job completed 
aws sagemaker wait training-job-completed-or-stopped --training-job-name ${TRAINING_JOB_NAME}  --region ${REGION}

# Get newly created model artifact and create model
MODELARTIFACT=`aws sagemaker describe-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --query 'ModelArtifacts.S3ModelArtifacts' --output text `
MODELNAME=MODEL-${DTTIME}
aws sagemaker create-model --region ${REGION} --model-name ${MODELNAME}  --primary-container Image=${IMAGE},ModelDataUrl=${MODELARTIFACT}  --execution-role-arn ${ROLE}

# create a new endpoint configuration 
CONFIGNAME=CONFIG-${DTTIME}
aws sagemaker  create-endpoint-config --region ${REGION} --endpoint-config-name ${CONFIGNAME}  --production-variants  VariantName=Users,ModelName=${MODELNAME},InitialInstanceCount=1,InstanceType=ml.m4.xlarge

# create or update the endpoint
STATUS=`aws sagemaker describe-endpoint --endpoint-name  ServiceEndpoint --query 'EndpointStatus' --output text --region ${REGION} `
if [[ $STATUS -ne "InService" ]] ;
then
    aws sagemaker  create-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}    
else
    aws sagemaker  update-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}
fi

Grant permission

Before you execute the script, you must grant proper permission to Data Pipeline. Data Pipeline uses the DataPipelineDefaultResourceRole role by default. I added the following policy to DataPipelineDefaultResourceRole to allow Data Pipeline to create, delete, and update the Amazon SageMaker model and data source in the script.

{
 "Version": "2012-10-17",
 "Statement": [
 {
 "Effect": "Allow",
 "Action": [
 "sagemaker:CreateTrainingJob",
 "sagemaker:DescribeTrainingJob",
 "sagemaker:CreateModel",
 "sagemaker:CreateEndpointConfig",
 "sagemaker:DescribeEndpoint",
 "sagemaker:CreateEndpoint",
 "sagemaker:UpdateEndpoint",
 "iam:PassRole"
 ],
 "Resource": "*"
 }
 ]
}

Use real-time prediction

After you deploy a model into production using Amazon SageMaker hosting services, your client applications use this API to get inferences from the model hosted at the specified endpoint. This approach is useful for interactive web, mobile, or desktop applications.

Following, I provide a simple Python code example that queries against Amazon SageMaker endpoint URL with its name (“ServiceEndpoint”) and then uses them for real-time prediction.

=== Python sample for real-time prediction ===

#!/usr/bin/env python
import boto3
import json 

client = boto3.client('sagemaker-runtime', region_name ='<your region>' )
new_customer_info = '34,10,2,4,1,2,1,1,6,3,190,1,3,4,3,-1.7,94.055,-39.8,0.715,4991.6'
response = client.invoke_endpoint(
    EndpointName='ServiceEndpoint',
    Body=new_customer_info, 
    ContentType='text/csv'
)
result = json.loads(response['Body'].read().decode())
print(result)
--- output(response) ---
{u'predictions': [{u'score': 0.7528127431869507, u'predicted_label': 1.0}]}

Solution summary

The solution takes the following steps:

  1. Data Pipeline exports DynamoDB table data into S3. The original JSON data should be kept to recover the table in the rare event that this is needed. Data Pipeline then converts JSON to CSV so that Amazon SageMaker can read the data.Note: You should select only meaningful attributes when you convert CSV. For example, if you judge that the “campaign” attribute is not correlated, you can eliminate this attribute from the CSV.
  2. Train the Amazon SageMaker model with the new data source.
  3. When a new customer comes to your site, you can judge how likely it is for this customer to subscribe to your new product based on “predictedScores” provided by Amazon SageMaker.
  4. If the new user subscribes your new product, your application must update the attribute “y” to the value 1 (for yes). This updated data is provided for the next model renewal as a new data source. It serves to improve the accuracy of your prediction. With each new entry, your application can become smarter and deliver better predictions.

Running ad hoc queries using Amazon Athena

Amazon Athena is a serverless query service that makes it easy to analyze large amounts of data stored in Amazon S3 using standard SQL. Athena is useful for examining data and collecting statistics or informative summaries about data. You can also use the powerful analytic functions of Presto, as described in the topic Aggregate Functions of Presto in the Presto documentation.

With the Data Pipeline scheduled activity, recent CSV data is always located in S3 so that you can run ad hoc queries against the data using Amazon Athena. I show this with example SQL statements following. For an in-depth description of this process, see the post Interactive SQL Queries for Data in Amazon S3 on the AWS News Blog. 

Creating an Amazon Athena table and running it

Simply, you can create an EXTERNAL table for the CSV data on S3 in Amazon Athena Management Console.

=== Table Creation ===
CREATE EXTERNAL TABLE datasource (
 age int, 
 job string, 
 marital string , 
 education string, 
 default string, 
 housing string, 
 loan string, 
 contact string, 
 month string, 
 day_of_week string, 
 duration int, 
 campaign int, 
 pdays int , 
 previous int , 
 poutcome string, 
 emp_var_rate double, 
 cons_price_idx double,
 cons_conf_idx double, 
 euribor3m double, 
 nr_employed double, 
 y int 
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n' 
LOCATION 's3://<your bucket name>/<datasource path>/';

The following query calculates the correlation coefficient between the target attribute and other attributes using Amazon Athena.

=== Sample Query ===

SELECT corr(age,y) AS correlation_age_and_target, 
 corr(duration,y) AS correlation_duration_and_target, 
 corr(campaign,y) AS correlation_campaign_and_target,
 corr(contact,y) AS correlation_contact_and_target
FROM ( SELECT age , duration , campaign , y , 
 CASE WHEN contact = 'telephone' THEN 1 ELSE 0 END AS contact 
 FROM datasource 
 ) datasource ;

Conclusion

In this post, I introduce an example of how to analyze data in DynamoDB by using table data in Amazon S3 to optimize DynamoDB table read capacity. You can then use the analyzed data as a new data source to train an Amazon SageMaker model for accurate real-time prediction. In addition, you can run ad hoc queries against the data on S3 using Amazon Athena. I also present how to automate these procedures by using Data Pipeline.

You can adapt this example to your specific use case at hand, and hopefully this post helps you accelerate your development. You can find more examples and use cases for Amazon SageMaker in the video AWS 2017: Introducing Amazon SageMaker on the AWS website.

 


Additional Reading

If you found this post useful, be sure to check out Serving Real-Time Machine Learning Predictions on Amazon EMR and Analyzing Data in S3 using Amazon Athena.

 


About the Author

Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”