Tag Archives: dumps

Use Slack ChatOps to Deploy Your Code – How to Integrate Your Pipeline in AWS CodePipeline with Your Slack Channel

Post Syndicated from Rumi Olsen original https://aws.amazon.com/blogs/devops/use-slack-chatops-to-deploy-your-code-how-to-integrate-your-pipeline-in-aws-codepipeline-with-your-slack-channel/

Slack is widely used by DevOps and development teams to communicate status. Typically, when a build has been tested and is ready to be promoted to a staging environment, a QA engineer or DevOps engineer kicks off the deployment. Using Slack in a ChatOps collaboration model, the promotion can be done in a single click from a Slack channel. And because the promotion happens through a Slack channel, the whole development team knows what’s happening without checking email.

In this blog post, I will show you how to integrate AWS services with a Slack application. I use an interactive message button and incoming webhook to promote a stage with a single click.

To follow along with the steps in this post, you’ll need a pipeline in AWS CodePipeline. If you don’t have a pipeline, the fastest way to create one for this use case is to use AWS CodeStar. Go to the AWS CodeStar console and select the Static Website template (shown in the screenshot). AWS CodeStar will create a pipeline with an AWS CodeCommit repository and an AWS CodeDeploy deployment for you. After the pipeline is created, you will need to add a manual approval stage.

You’ll also need to build a Slack app with webhooks and interactive components, write two Lambda functions, and create an API Gateway API and a SNS topic.

As you’ll see in the following diagram, when I make a change and merge a new feature into the master branch in AWS CodeCommit, the check-in kicks off my CI/CD pipeline in AWS CodePipeline. When CodePipeline reaches the approval stage, it sends a notification to Amazon SNS, which triggers an AWS Lambda function (ApprovalRequester).

The Slack channel receives a prompt that looks like the following screenshot. When I click Yes to approve the build promotion, the approval result is sent to CodePipeline through API Gateway and Lambda (ApprovalHandler). The pipeline continues on to deploy the build to the next environment.

Create a Slack app

For App Name, type a name for your app. For Development Slack Workspace, choose the name of your workspace. You’ll see in the following screenshot that my workspace is AWS ChatOps.

After the Slack application has been created, you will see the Basic Information page, where you can create incoming webhooks and enable interactive components.

To add incoming webhooks:

  1. Under Add features and functionality, choose Incoming Webhooks. Turn the feature on by selecting Off, as shown in the following screenshot.
  2. Now that the feature is turned on, choose Add New Webhook to Workspace. In the process of creating the webhook, Slack lets you choose the channel where messages will be posted.
  3. After the webhook has been created, you’ll see its URL. You will use this URL when you create the Lambda function.

If you followed the steps in the post, the pipeline should look like the following.

Write the Lambda function for approval requests

This Lambda function is invoked by the SNS notification. It sends a request that consists of an interactive message button to the incoming webhook you created earlier.  The following sample code sends the request to the incoming webhook. WEBHOOK_URL and SLACK_CHANNEL are the environment variables that hold values of the webhook URL that you created and the Slack channel where you want the interactive message button to appear.

# This function is invoked via SNS when the CodePipeline manual approval action starts.
# It will take the details from this approval notification and sent an interactive message to Slack that allows users to approve or cancel the deployment.

import os
import json
import logging
import urllib.parse

from base64 import b64decode
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

# This is passed as a plain-text environment variable for ease of demonstration.
# Consider encrypting the value with KMS or use an encrypted parameter in Parameter Store for production deployments.
SLACK_WEBHOOK_URL = os.environ['SLACK_WEBHOOK_URL']
SLACK_CHANNEL = os.environ['SLACK_CHANNEL']

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))
    message = event["Records"][0]["Sns"]["Message"]
    
    data = json.loads(message) 
    token = data["approval"]["token"]
    codepipeline_name = data["approval"]["pipelineName"]
    
    slack_message = {
        "channel": SLACK_CHANNEL,
        "text": "Would you like to promote the build to production?",
        "attachments": [
            {
                "text": "Yes to deploy your build to production",
                "fallback": "You are unable to promote a build",
                "callback_id": "wopr_game",
                "color": "#3AA3E3",
                "attachment_type": "default",
                "actions": [
                    {
                        "name": "deployment",
                        "text": "Yes",
                        "style": "danger",
                        "type": "button",
                        "value": json.dumps({"approve": True, "codePipelineToken": token, "codePipelineName": codepipeline_name}),
                        "confirm": {
                            "title": "Are you sure?",
                            "text": "This will deploy the build to production",
                            "ok_text": "Yes",
                            "dismiss_text": "No"
                        }
                    },
                    {
                        "name": "deployment",
                        "text": "No",
                        "type": "button",
                        "value": json.dumps({"approve": False, "codePipelineToken": token, "codePipelineName": codepipeline_name})
                    }  
                ]
            }
        ]
    }

    req = Request(SLACK_WEBHOOK_URL, json.dumps(slack_message).encode('utf-8'))

    response = urlopen(req)
    response.read()
    
    return None

 

Create a SNS topic

Create a topic and then create a subscription that invokes the ApprovalRequester Lambda function. You can configure the manual approval action in the pipeline to send a message to this SNS topic when an approval action is required. When the pipeline reaches the approval stage, it sends a notification to this SNS topic. SNS publishes a notification to all of the subscribed endpoints. In this case, the Lambda function is the endpoint. Therefore, it invokes and executes the Lambda function. For information about how to create a SNS topic, see Create a Topic in the Amazon SNS Developer Guide.

Write the Lambda function for handling the interactive message button

This Lambda function is invoked by API Gateway. It receives the result of the interactive message button whether or not the build promotion was approved. If approved, an API call is made to CodePipeline to promote the build to the next environment. If not approved, the pipeline stops and does not move to the next stage.

The Lambda function code might look like the following. SLACK_VERIFICATION_TOKEN is the environment variable that contains your Slack verification token. You can find your verification token under Basic Information on Slack manage app page. When you scroll down, you will see App Credential. Verification token is found under the section.

# This function is triggered via API Gateway when a user acts on the Slack interactive message sent by approval_requester.py.

from urllib.parse import parse_qs
import json
import os
import boto3

SLACK_VERIFICATION_TOKEN = os.environ['SLACK_VERIFICATION_TOKEN']

#Triggered by API Gateway
#It kicks off a particular CodePipeline project
def lambda_handler(event, context):
	#print("Received event: " + json.dumps(event, indent=2))
	body = parse_qs(event['body'])
	payload = json.loads(body['payload'][0])

	# Validate Slack token
	if SLACK_VERIFICATION_TOKEN == payload['token']:
		send_slack_message(json.loads(payload['actions'][0]['value']))
		
		# This will replace the interactive message with a simple text response.
		# You can implement a more complex message update if you would like.
		return  {
			"isBase64Encoded": "false",
			"statusCode": 200,
			"body": "{\"text\": \"The approval has been processed\"}"
		}
	else:
		return  {
			"isBase64Encoded": "false",
			"statusCode": 403,
			"body": "{\"error\": \"This request does not include a vailid verification token.\"}"
		}


def send_slack_message(action_details):
	codepipeline_status = "Approved" if action_details["approve"] else "Rejected"
	codepipeline_name = action_details["codePipelineName"]
	token = action_details["codePipelineToken"] 

	client = boto3.client('codepipeline')
	response_approval = client.put_approval_result(
							pipelineName=codepipeline_name,
							stageName='Approval',
							actionName='ApprovalOrDeny',
							result={'summary':'','status':codepipeline_status},
							token=token)
	print(response_approval)

 

Create the API Gateway API

  1. In the Amazon API Gateway console, create a resource called InteractiveMessageHandler.
  2. Create a POST method.
    • For Integration type, choose Lambda Function.
    • Select Use Lambda Proxy integration.
    • From Lambda Region, choose a region.
    • In Lambda Function, type a name for your function.
  3.  Deploy to a stage.

For more information, see Getting Started with Amazon API Gateway in the Amazon API Developer Guide.

Now go back to your Slack application and enable interactive components.

To enable interactive components for the interactive message (Yes) button:

  1. Under Features, choose Interactive Components.
  2. Choose Enable Interactive Components.
  3. Type a request URL in the text box. Use the invoke URL in Amazon API Gateway that will be called when the approval button is clicked.

Now that all the pieces have been created, run the solution by checking in a code change to your CodeCommit repo. That will release the change through CodePipeline. When the CodePipeline comes to the approval stage, it will prompt to your Slack channel to see if you want to promote the build to your staging or production environment. Choose Yes and then see if your change was deployed to the environment.

Conclusion

That is it! You have now created a Slack ChatOps solution using AWS CodeCommit, AWS CodePipeline, AWS Lambda, Amazon API Gateway, and Amazon Simple Notification Service.

Now that you know how to do this Slack and CodePipeline integration, you can use the same method to interact with other AWS services using API Gateway and Lambda. You can also use Slack’s slash command to initiate an action from a Slack channel, rather than responding in the way demonstrated in this post.

snallygaster – Scan For Secret Files On HTTP Servers

Post Syndicated from Darknet original https://www.darknet.org.uk/2018/04/snallygaster-scan-for-secret-files-on-http-servers/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed

snallygaster – Scan For Secret Files On HTTP Servers

snallygaster is a Python-based tool that can help you to scan for secret files on HTTP servers, files that are accessible that shouldn’t be public and can pose a security risk.

Typical examples include publicly accessible git repositories, backup files potentially containing passwords or database dumps. In addition it contains a few checks for other security vulnerabilities.

snallygaster HTTP Secret File Scanner Features

This is an overview of the tests provided by snallygaster.

Read the rest of snallygaster – Scan For Secret Files On HTTP Servers now! Only available at Darknet.

AWS Secrets Manager: Store, Distribute, and Rotate Credentials Securely

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/aws-secrets-manager-store-distribute-and-rotate-credentials-securely/

Today we’re launching AWS Secrets Manager which makes it easy to store and retrieve your secrets via API or the AWS Command Line Interface (CLI) and rotate your credentials with built-in or custom AWS Lambda functions. Managing application secrets like database credentials, passwords, or API Keys is easy when you’re working locally with one machine and one application. As you grow and scale to many distributed microservices, it becomes a daunting task to securely store, distribute, rotate, and consume secrets. Previously, customers needed to provision and maintain additional infrastructure solely for secrets management which could incur costs and introduce unneeded complexity into systems.

AWS Secrets Manager

Imagine that I have an application that takes incoming tweets from Twitter and stores them in an Amazon Aurora database. Previously, I would have had to request a username and password from my database administrator and embed those credentials in environment variables or, in my race to production, even in the application itself. I would also need to have our social media manager create the Twitter API credentials and figure out how to store those. This is a fairly manual process, involving multiple people, that I have to restart every time I want to rotate these credentials. With Secrets Manager my database administrator can provide the credentials in secrets manager once and subsequently rely on a Secrets Manager provided Lambda function to automatically update and rotate those credentials. My social media manager can put the Twitter API keys in Secrets Manager which I can then access with a simple API call and I can even rotate these programmatically with a custom lambda function calling out to the Twitter API. My secrets are encrypted with the KMS key of my choice, and each of these administrators can explicitly grant access to these secrets with with granular IAM policies for individual roles or users.

Let’s take a look at how I would store a secret using the AWS Secrets Manager console. First, I’ll click Store a new secret to get to the new secrets wizard. For my RDS Aurora instance it’s straightforward to simply select the instance and provide the initial username and password to connect to the database.

Next, I’ll fill in a quick description and a name to access my secret by. You can use whatever naming scheme you want here.

Next, we’ll configure rotation to use the Secrets Manager-provided Lambda function to rotate our password every 10 days.

Finally, we’ll review all the details and check out our sample code for storing and retrieving our secret!

Finally I can review the secrets in the console.

Now, if I needed to access these secrets I’d simply call the API.

import json
import boto3
secrets = boto3.client("secretsmanager")
rds = json.dumps(secrets.get_secrets_value("prod/TwitterApp/Database")['SecretString'])
print(rds)

Which would give me the following values:


{'engine': 'mysql',
 'host': 'twitterapp2.abcdefg.us-east-1.rds.amazonaws.com',
 'password': '-)Kw>THISISAFAKEPASSWORD:lg{&sad+Canr',
 'port': 3306,
 'username': 'ranman'}

More than passwords

AWS Secrets Manager works for more than just passwords. I can store OAuth credentials, binary data, and more. Let’s look at storing my Twitter OAuth application keys.

Now, I can define the rotation for these third-party OAuth credentials with a custom AWS Lambda function that can call out to Twitter whenever we need to rotate our credentials.

Custom Rotation

One of the niftiest features of AWS Secrets Manager is custom AWS Lambda functions for credential rotation. This allows you to define completely custom workflows for credentials. Secrets Manager will call your lambda with a payload that includes a Step which specifies which step of the rotation you’re in, a SecretId which specifies which secret the rotation is for, and importantly a ClientRequestToken which is used to ensure idempotency in any changes to the underlying secret.

When you’re rotating secrets you go through a few different steps:

  1. createSecret
  2. setSecret
  3. testSecret
  4. finishSecret

The advantage of these steps is that you can add any kind of approval steps you want for each phase of the rotation. For more details on custom rotation check out the documentation.

Available Now
AWS Secrets Manager is available today in US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), EU (Frankfurt), EU (Ireland), EU (London), and South America (São Paulo). Secrets are priced at $0.40 per month per secret and $0.05 per 10,000 API calls. I’m looking forward to seeing more users adopt rotating credentials to secure their applications!

Randall

Performing Unit Testing in an AWS CodeStar Project

Post Syndicated from Jerry Mathen Jacob original https://aws.amazon.com/blogs/devops/performing-unit-testing-in-an-aws-codestar-project/

In this blog post, I will show how you can perform unit testing as a part of your AWS CodeStar project. AWS CodeStar helps you quickly develop, build, and deploy applications on AWS. With AWS CodeStar, you can set up your continuous delivery (CD) toolchain and manage your software development from one place.

Because unit testing tests individual units of application code, it is helpful for quickly identifying and isolating issues. As a part of an automated CI/CD process, it can also be used to prevent bad code from being deployed into production.

Many of the AWS CodeStar project templates come preconfigured with a unit testing framework so that you can start deploying your code with more confidence. The unit testing is configured to run in the provided build stage so that, if the unit tests do not pass, the code is not deployed. For a list of AWS CodeStar project templates that include unit testing, see AWS CodeStar Project Templates in the AWS CodeStar User Guide.

The scenario

As a big fan of superhero movies, I decided to list my favorites and ask my friends to vote on theirs by using a WebService endpoint I created. The example I use is a Python web service running on AWS Lambda with AWS CodeCommit as the code repository. CodeCommit is a fully managed source control system that hosts Git repositories and works with all Git-based tools.

Here’s how you can create the WebService endpoint:

Sign in to the AWS CodeStar console. Choose Start a project, which will take you to the list of project templates.

create project

For code edits I will choose AWS Cloud9, which is a cloud-based integrated development environment (IDE) that you use to write, run, and debug code.

choose cloud9

Here are the other tasks required by my scenario:

  • Create a database table where the votes can be stored and retrieved as needed.
  • Update the logic in the Lambda function that was created for posting and getting the votes.
  • Update the unit tests (of course!) to verify that the logic works as expected.

For a database table, I’ve chosen Amazon DynamoDB, which offers a fast and flexible NoSQL database.

Getting set up on AWS Cloud9

From the AWS CodeStar console, go to the AWS Cloud9 console, which should take you to your project code. I will open up a terminal at the top-level folder under which I will set up my environment and required libraries.

Use the following command to set the PYTHONPATH environment variable on the terminal.

export PYTHONPATH=/home/ec2-user/environment/vote-your-movie

You should now be able to use the following command to execute the unit tests in your project.

python -m unittest discover vote-your-movie/tests

cloud9 setup

Start coding

Now that you have set up your local environment and have a copy of your code, add a DynamoDB table to the project by defining it through a template file. Open template.yml, which is the Serverless Application Model (SAM) template file. This template extends AWS CloudFormation to provide a simplified way of defining the Amazon API Gateway APIs, AWS Lambda functions, and Amazon DynamoDB tables required by your serverless application.

AWSTemplateFormatVersion: 2010-09-09
Transform:
- AWS::Serverless-2016-10-31
- AWS::CodeStar

Parameters:
  ProjectId:
    Type: String
    Description: CodeStar projectId used to associate new resources to team members

Resources:
  # The DB table to store the votes.
  MovieVoteTable:
    Type: AWS::Serverless::SimpleTable
    Properties:
      PrimaryKey:
        # Name of the "Candidate" is the partition key of the table.
        Name: Candidate
        Type: String
  # Creating a new lambda function for retrieving and storing votes.
  MovieVoteLambda:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: python3.6
      Environment:
        # Setting environment variables for your lambda function.
        Variables:
          TABLE_NAME: !Ref "MovieVoteTable"
          TABLE_REGION: !Ref "AWS::Region"
      Role:
        Fn::ImportValue:
          !Join ['-', [!Ref 'ProjectId', !Ref 'AWS::Region', 'LambdaTrustRole']]
      Events:
        GetEvent:
          Type: Api
          Properties:
            Path: /
            Method: get
        PostEvent:
          Type: Api
          Properties:
            Path: /
            Method: post

We’ll use Python’s boto3 library to connect to AWS services. And we’ll use Python’s mock library to mock AWS service calls for our unit tests.
Use the following command to install these libraries:

pip install --upgrade boto3 mock -t .

install dependencies

Add these libraries to the buildspec.yml, which is the YAML file that is required for CodeBuild to execute.

version: 0.2

phases:
  install:
    commands:

      # Upgrade AWS CLI to the latest version
      - pip install --upgrade awscli boto3 mock

  pre_build:
    commands:

      # Discover and run unit tests in the 'tests' directory. For more information, see <https://docs.python.org/3/library/unittest.html#test-discovery>
      - python -m unittest discover tests

  build:
    commands:

      # Use AWS SAM to package the application by using AWS CloudFormation
      - aws cloudformation package --template template.yml --s3-bucket $S3_BUCKET --output-template template-export.yml

artifacts:
  type: zip
  files:
    - template-export.yml

Open the index.py where we can write the simple voting logic for our Lambda function.

import json
import datetime
import boto3
import os

table_name = os.environ['TABLE_NAME']
table_region = os.environ['TABLE_REGION']

VOTES_TABLE = boto3.resource('dynamodb', region_name=table_region).Table(table_name)
CANDIDATES = {"A": "Black Panther", "B": "Captain America: Civil War", "C": "Guardians of the Galaxy", "D": "Thor: Ragnarok"}

def handler(event, context):
    if event['httpMethod'] == 'GET':
        resp = VOTES_TABLE.scan()
        return {'statusCode': 200,
                'body': json.dumps({item['Candidate']: int(item['Votes']) for item in resp['Items']}),
                'headers': {'Content-Type': 'application/json'}}

    elif event['httpMethod'] == 'POST':
        try:
            body = json.loads(event['body'])
        except:
            return {'statusCode': 400,
                    'body': 'Invalid input! Expecting a JSON.',
                    'headers': {'Content-Type': 'application/json'}}
        if 'candidate' not in body:
            return {'statusCode': 400,
                    'body': 'Missing "candidate" in request.',
                    'headers': {'Content-Type': 'application/json'}}
        if body['candidate'] not in CANDIDATES.keys():
            return {'statusCode': 400,
                    'body': 'You must vote for one of the following candidates - {}.'.format(get_allowed_candidates()),
                    'headers': {'Content-Type': 'application/json'}}

        resp = VOTES_TABLE.update_item(
            Key={'Candidate': CANDIDATES.get(body['candidate'])},
            UpdateExpression='ADD Votes :incr',
            ExpressionAttributeValues={':incr': 1},
            ReturnValues='ALL_NEW'
        )
        return {'statusCode': 200,
                'body': "{} now has {} votes".format(CANDIDATES.get(body['candidate']), resp['Attributes']['Votes']),
                'headers': {'Content-Type': 'application/json'}}

def get_allowed_candidates():
    l = []
    for key in CANDIDATES:
        l.append("'{}' for '{}'".format(key, CANDIDATES.get(key)))
    return ", ".join(l)

What our code basically does is take in the HTTPS request call as an event. If it is an HTTP GET request, it gets the votes result from the table. If it is an HTTP POST request, it sets a vote for the candidate of choice. We also validate the inputs in the POST request to filter out requests that seem malicious. That way, only valid calls are stored in the table.

In the example code provided, we use a CANDIDATES variable to store our candidates, but you can store the candidates in a JSON file and use Python’s json library instead.

Let’s update the tests now. Under the tests folder, open the test_handler.py and modify it to verify the logic.

import os
# Some mock environment variables that would be used by the mock for DynamoDB
os.environ['TABLE_NAME'] = "MockHelloWorldTable"
os.environ['TABLE_REGION'] = "us-east-1"

# The library containing our logic.
import index

# Boto3's core library
import botocore
# For handling JSON.
import json
# Unit test library
import unittest
## Getting StringIO based on your setup.
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
## Python mock library
from mock import patch, call
from decimal import Decimal

@patch('botocore.client.BaseClient._make_api_call')
class TestCandidateVotes(unittest.TestCase):

    ## Test the HTTP GET request flow. 
    ## We expect to get back a successful response with results of votes from the table (mocked).
    def test_get_votes(self, boto_mock):
        # Input event to our method to test.
        expected_event = {'httpMethod': 'GET'}
        # The mocked values in our DynamoDB table.
        items_in_db = [{'Candidate': 'Black Panther', 'Votes': Decimal('3')},
                        {'Candidate': 'Captain America: Civil War', 'Votes': Decimal('8')},
                        {'Candidate': 'Guardians of the Galaxy', 'Votes': Decimal('8')},
                        {'Candidate': "Thor: Ragnarok", 'Votes': Decimal('1')}
                    ]
        # The mocked DynamoDB response.
        expected_ddb_response = {'Items': items_in_db}
        # The mocked response we expect back by calling DynamoDB through boto.
        response_body = botocore.response.StreamingBody(StringIO(str(expected_ddb_response)),
                                                        len(str(expected_ddb_response)))
        # Setting the expected value in the mock.
        boto_mock.side_effect = [expected_ddb_response]
        # Expecting that there would be a call to DynamoDB Scan function during execution with these parameters.
        expected_calls = [call('Scan', {'TableName': os.environ['TABLE_NAME']})]

        # Call the function to test.
        result = index.handler(expected_event, {})

        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 200

        result_body = json.loads(result.get('body'))
        # Verifying that the results match to that from the table.
        assert len(result_body) == len(items_in_db)
        for i in range(len(result_body)):
            assert result_body.get(items_in_db[i].get("Candidate")) == int(items_in_db[i].get("Votes"))

        assert boto_mock.call_count == 1
        boto_mock.assert_has_calls(expected_calls)

    ## Test the HTTP POST request flow that places a vote for a selected candidate.
    ## We expect to get back a successful response with a confirmation message.
    def test_place_valid_candidate_vote(self, boto_mock):
        # Input event to our method to test.
        expected_event = {'httpMethod': 'POST', 'body': "{\"candidate\": \"D\"}"}
        # The mocked response in our DynamoDB table.
        expected_ddb_response = {'Attributes': {'Candidate': "Thor: Ragnarok", 'Votes': Decimal('2')}}
        # The mocked response we expect back by calling DynamoDB through boto.
        response_body = botocore.response.StreamingBody(StringIO(str(expected_ddb_response)),
                                                        len(str(expected_ddb_response)))
        # Setting the expected value in the mock.
        boto_mock.side_effect = [expected_ddb_response]
        # Expecting that there would be a call to DynamoDB UpdateItem function during execution with these parameters.
        expected_calls = [call('UpdateItem', {
                                                'TableName': os.environ['TABLE_NAME'], 
                                                'Key': {'Candidate': 'Thor: Ragnarok'},
                                                'UpdateExpression': 'ADD Votes :incr',
                                                'ExpressionAttributeValues': {':incr': 1},
                                                'ReturnValues': 'ALL_NEW'
                                            })]
        # Call the function to test.
        result = index.handler(expected_event, {})
        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 200

        assert result.get('body') == "{} now has {} votes".format(
            expected_ddb_response['Attributes']['Candidate'], 
            expected_ddb_response['Attributes']['Votes'])

        assert boto_mock.call_count == 1
        boto_mock.assert_has_calls(expected_calls)

    ## Test the HTTP POST request flow that places a vote for an non-existant candidate.
    ## We expect to get back a successful response with a confirmation message.
    def test_place_invalid_candidate_vote(self, boto_mock):
        # Input event to our method to test.
        # The valid IDs for the candidates are A, B, C, and D
        expected_event = {'httpMethod': 'POST', 'body': "{\"candidate\": \"E\"}"}
        # Call the function to test.
        result = index.handler(expected_event, {})
        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 400
        assert result.get('body') == 'You must vote for one of the following candidates - {}.'.format(index.get_allowed_candidates())

    ## Test the HTTP POST request flow that places a vote for a selected candidate but associated with an invalid key in the POST body.
    ## We expect to get back a failed (400) response with an appropriate error message.
    def test_place_invalid_data_vote(self, boto_mock):
        # Input event to our method to test.
        # "name" is not the expected input key.
        expected_event = {'httpMethod': 'POST', 'body': "{\"name\": \"D\"}"}
        # Call the function to test.
        result = index.handler(expected_event, {})
        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 400
        assert result.get('body') == 'Missing "candidate" in request.'

    ## Test the HTTP POST request flow that places a vote for a selected candidate but not as a JSON string which the body of the request expects.
    ## We expect to get back a failed (400) response with an appropriate error message.
    def test_place_malformed_json_vote(self, boto_mock):
        # Input event to our method to test.
        # "body" receives a string rather than a JSON string.
        expected_event = {'httpMethod': 'POST', 'body': "Thor: Ragnarok"}
        # Call the function to test.
        result = index.handler(expected_event, {})
        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 400
        assert result.get('body') == 'Invalid input! Expecting a JSON.'

if __name__ == '__main__':
    unittest.main()

I am keeping the code samples well commented so that it’s clear what each unit test accomplishes. It tests the success conditions and the failure paths that are handled in the logic.

In my unit tests I use the patch decorator (@patch) in the mock library. @patch helps mock the function you want to call (in this case, the botocore library’s _make_api_call function in the BaseClient class).
Before we commit our changes, let’s run the tests locally. On the terminal, run the tests again. If all the unit tests pass, you should expect to see a result like this:

You:~/environment $ python -m unittest discover vote-your-movie/tests
.....
----------------------------------------------------------------------
Ran 5 tests in 0.003s

OK
You:~/environment $

Upload to AWS

Now that the tests have passed, it’s time to commit and push the code to source repository!

Add your changes

From the terminal, go to the project’s folder and use the following command to verify the changes you are about to push.

git status

To add the modified files only, use the following command:

git add -u

Commit your changes

To commit the changes (with a message), use the following command:

git commit -m "Logic and tests for the voting webservice."

Push your changes to AWS CodeCommit

To push your committed changes to CodeCommit, use the following command:

git push

In the AWS CodeStar console, you can see your changes flowing through the pipeline and being deployed. There are also links in the AWS CodeStar console that take you to this project’s build runs so you can see your tests running on AWS CodeBuild. The latest link under the Build Runs table takes you to the logs.

unit tests at codebuild

After the deployment is complete, AWS CodeStar should now display the AWS Lambda function and DynamoDB table created and synced with this project. The Project link in the AWS CodeStar project’s navigation bar displays the AWS resources linked to this project.

codestar resources

Because this is a new database table, there should be no data in it. So, let’s put in some votes. You can download Postman to test your application endpoint for POST and GET calls. The endpoint you want to test is the URL displayed under Application endpoints in the AWS CodeStar console.

Now let’s open Postman and look at the results. Let’s create some votes through POST requests. Based on this example, a valid vote has a value of A, B, C, or D.
Here’s what a successful POST request looks like:

POST success

Here’s what it looks like if I use some value other than A, B, C, or D:

 

POST Fail

Now I am going to use a GET request to fetch the results of the votes from the database.

GET success

And that’s it! You have now created a simple voting web service using AWS Lambda, Amazon API Gateway, and DynamoDB and used unit tests to verify your logic so that you ship good code.
Happy coding!

Real-Time Hotspot Detection in Amazon Kinesis Analytics

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/real-time-hotspot-detection-in-amazon-kinesis-analytics/

Today we’re releasing a new machine learning feature in Amazon Kinesis Data Analytics for detecting “hotspots” in your streaming data. We launched Kinesis Data Analytics in August of 2016 and we’ve continued to add features since. As you may already know, Kinesis Data Analytics is a fully managed real-time processing engine for streaming data that lets you write SQL queries to derive meaning from your data and output the results to Kinesis Data Firehose, Kinesis Data Streams, or even an AWS Lambda function. The new HOTSPOT function adds to the existing machine learning capabilities in Kinesis that allow customers to leverage unsupervised streaming based machine learning algorithms. Customers don’t need to be experts in data science or machine learning to take advantage of these capabilities.

Hotspots

The HOTSPOTS function is a new Kinesis Data Analytics SQL function you can use to idenitfy relatively dense regions in your data without having to explicity build and train complicated machine learning models. You can identify subsections of your data that need immediate attention and take action programatically by streaming the hotspots out to a Kinesis Data stream, to a Firehose delivery stream, or by invoking a AWS Lambda function.

There are a ton of really cool scenarios where this could make your operations easier. Imagine a ride-share program or autonomous vehicle fleet communicating spatiotemporal data about traffic jams and congestion, or a datacenter where a number of servers start to overheat indicating an HVAC issue. HOTSPOTS is not limited to spatiotemporal data and you could apply it across many problem domains.

The function follows some simple syntax and accepts the DOUBLE, INTEGER, FLOAT, TINYINT, SMALLINT, REAL, and BIGINT data types.

The HOTSPOT function takes a cursor as input and returns a JSON string describing the hotspot. This will be easier to understand with an example.

Using Kinesis Data Analytics to Detect Hotspots

Let’s take a simple data set from NY Taxi and Limousine Commission that tracks yellow cab pickup and dropoff locations. Most of this data is already on S3 and publicly accessible at s3://nyc-tlc/. We will create a small python script to load our Kinesis Data Stream with Taxi records which will feed our Kinesis Data Analytics. Finally we’ll output all of this to a Kinesis Data Firehose connected to an Amazon Elasticsearch Service cluster for visualization with Kibana. I know from living in New York for 5 years that we’ll probably find a hotspot or two in this data.

First, we’ll create an input Kinesis stream and start sending our NYC Taxi Ride data into it. I just wrote a quick python script to read from one of the CSV files and used boto3 to push the records into Kinesis. You can put the record in whatever way works for you.

 

import csv
import json
import boto3
def chunkit(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

kinesis = boto3.client("kinesis")
with open("taxidata2.csv") as f:
    reader = csv.DictReader(f)
    records = chunkit([{"PartitionKey": "taxis", "Data": json.dumps(row)} for row in reader], 500)
    for chunk in records:
        kinesis.put_records(StreamName="TaxiData", Records=chunk)

Next, we’ll create the Kinesis Data Analytics application and add our input stream with our taxi data as the source.

Next we’ll automatically detect the schema.

Now we’ll create a quick SQL Script to detect our hotspots and add that to the Real Time Analytics section of our application.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "pickup_longitude" DOUBLE,
    "pickup_latitude" DOUBLE,
    HOTSPOTS_RESULT VARCHAR(10000)
); 
CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM" 
    SELECT "pickup_longitude", "pickup_latitude", "HOTSPOTS_RESULT" FROM
        TABLE(HOTSPOTS(
            CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"),
            1000,
            0.013,
            20
        )
    );


Our HOTSPOTS function takes an input stream, a window size, scan radius, and a minimum number of points to count as a hotspot. The values for these are application dependent but you can tinker with them in the console easily until you get the results you want. There are more details about the parameters themselves in the documentation. The HOTSPOTS_RESULT returns some useful JSON that would let us plot bounding boxes around our hotspots:

{
  "hotspots": [
    {
      "density": "elided",
      "minValues": [40.7915039, -74.0077401],
      "maxValues": [40.7915041, -74.0078001]
    }
  ]
}

 

When we have our desired results we can save the script and connect our application to our Amazon Elastic Search Service Firehose Delivery Stream. We can run an intermediate lambda function in the firehose to transform our record into a format more suitable for geographic work. Then we can update our mapping in Elasticsearch to index the hotspot objects as Geo-Shapes.

Finally, we can connect to Kibana and visualize the results.

Looks like Manhattan is pretty busy!

Available Now
This feature is available now in all existing regions with Kinesis Data Analytics. I think this is a really interesting new feature of Kinesis Data Analytics that can bring immediate value to many applications. Let us know what you build with it on Twitter or in the comments!

Randall

Flight Sim Company Embeds Malware to Steal Pirates’ Passwords

Post Syndicated from Andy original https://torrentfreak.com/flight-sim-company-embeds-malware-to-steal-pirates-passwords-180219/

Anti-piracy systems and DRM come in all shapes and sizes, none of them particularly popular, but one deployed by flight sim company FlightSimLabs is likely to go down in history as one of the most outrageous.

It all started yesterday on Reddit when Flight Sim user ‘crankyrecursion’ reported a little extra something in his download of FlightSimLabs’ A320X module.

“Using file ‘FSLabs_A320X_P3D_v2.0.1.231.exe’ there seems to be a file called ‘test.exe’ included,” crankyrecursion wrote.

“This .exe file is from http://securityxploded.com and is touted as a ‘Chrome Password Dump’ tool, which seems to work – particularly as the installer would typically run with Administrative rights (UAC prompts) on Windows Vista and above. Can anyone shed light on why this tool is included in a supposedly trusted installer?”

The existence of a Chrome password dumping tool is certainly cause for alarm, especially if the software had been obtained from a less-than-official source, such as a torrent or similar site, given the potential for third-party pollution.

However, with the possibility of a nefarious third-party dumping something nasty in a pirate release still lurking on the horizon, things took an unexpected turn. FlightSimLabs chief Lefteris Kalamaras made a statement basically admitting that his company was behind the malware installation.

“We were made aware there is a Reddit thread started tonight regarding our latest installer and how a tool is included in it, that indiscriminately dumps Chrome passwords. That is not correct information – in fact, the Reddit thread was posted by a person who is not our customer and has somehow obtained our installer without purchasing,” Kalamaras wrote.

“[T]here are no tools used to reveal any sensitive information of any customer who has legitimately purchased our products. We all realize that you put a lot of trust in our products and this would be contrary to what we believe.

“There is a specific method used against specific serial numbers that have been identified as pirate copies and have been making the rounds on ThePirateBay, RuTracker and other such malicious sites,” he added.

In a nutshell, FlightSimLabs installed a password dumper onto ALL users’ machines, whether they were pirates or not, but then only activated the password-stealing module when it determined that specific ‘pirate’ serial numbers had been used which matched those on FlightSimLabs’ servers.

“Test.exe is part of the DRM and is only targeted against specific pirate copies of copyrighted software obtained illegally. That program is only extracted temporarily and is never under any circumstances used in legitimate copies of the product,” Kalamaras added.

That didn’t impress Luke Gorman, who published an analysis slamming the flight sim company for knowingly installing password-stealing malware on users machines, even those who purchased the title legitimately.

Password stealer in action (credit: Luke Gorman)

Making matters even worse, the FlightSimLabs chief went on to say that information being obtained from pirates’ machines in this manner is likely to be used in court or other legal processes.

“This method has already successfully provided information that we’re going to use in our ongoing legal battles against such criminals,” Kalamaras revealed.

While the use of the extracted passwords and usernames elsewhere will remain to be seen, it appears that FlightSimLabs has had a change of heart. With immediate effect, the company is pointing customers to a new installer that doesn’t include code for stealing their most sensitive data.

“I want to reiterate and reaffirm that we as a company and as flight simmers would never do anything to knowingly violate the trust that you have placed in us by not only buying our products but supporting them and FlightSimLabs,” Kalamaras said in an update.

“While the majority of our customers understand that the fight against piracy is a difficult and ongoing battle that sometimes requires drastic measures, we realize that a few of you were uncomfortable with this particular method which might be considered to be a bit heavy handed on our part. It is for this reason we have uploaded an updated installer that does not include the DRM check file in question.”

To be continued………

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Sublist3r – Fast Python Subdomain Enumeration Tool

Post Syndicated from Darknet original https://www.darknet.org.uk/2017/12/sublist3r-fast-python-subdomain-enumeration-tool/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed

Sublist3r – Fast Python Subdomain Enumeration Tool

Sublist3r is a Python-based tool designed to enumerate subdomains of websites using OSINT. It helps penetration testers and bug hunters collect and gather subdomains for the domain they are targeting.

It also integrates with subbrute for subdomain brute-forcing with word lists.

Features of Sublist3r Subdomain Enumeration Tool

It enumerates subdomains using many search engines such as:

  • Google
  • Yahoo
  • Bing
  • Baidu
  • Ask

The tool also enumerates subdomains using:

  • Netcraft
  • Virustotal
  • ThreatCrowd
  • DNSdumpster
  • ReverseDNS

Requirements of Sublist3r Subdomain Search

It currently supports Python 2 and Python 3.

Read the rest of Sublist3r – Fast Python Subdomain Enumeration Tool now! Only available at Darknet.

Resume AWS Step Functions from Any State

Post Syndicated from Andy Katz original https://aws.amazon.com/blogs/compute/resume-aws-step-functions-from-any-state/


Yash Pant, Solutions Architect, AWS


Aaron Friedman, Partner Solutions Architect, AWS

When we discuss how to build applications with customers, we often align to the Well Architected Framework pillars of security, reliability, performance efficiency, cost optimization, and operational excellence. Designing for failure is an essential component to developing well architected applications that are resilient to spurious errors that may occur.

There are many ways you can use AWS services to achieve high availability and resiliency of your applications. For example, you can couple Elastic Load Balancing with Auto Scaling and Amazon EC2 instances to build highly available applications. Or use Amazon API Gateway and AWS Lambda to rapidly scale out a microservices-based architecture. Many AWS services have built in solutions to help with the appropriate error handling, such as Dead Letter Queues (DLQ) for Amazon SQS or retries in AWS Batch.

AWS Step Functions is an AWS service that makes it easy for you to coordinate the components of distributed applications and microservices. Step Functions allows you to easily design for failure, by incorporating features such as error retries and custom error handling from AWS Lambda exceptions. These features allow you to programmatically handle many common error modes and build robust, reliable applications.

In some rare cases, however, your application may fail in an unexpected manner. In these situations, you might not want to duplicate in a repeat execution those portions of your state machine that have already run. This is especially true when orchestrating long-running jobs or executing a complex state machine as part of a microservice. Here, you need to know the last successful state in your state machine from which to resume, so that you don’t duplicate previous work. In this post, we present a solution to enable you to resume from any given state in your state machine in the case of an unexpected failure.

Resuming from a given state

To resume a failed state machine execution from the state at which it failed, you first run a script that dynamically creates a new state machine. When the new state machine is executed, it resumes the failed execution from the point of failure. The script contains the following two primary steps:

  1. Parse the execution history of the failed execution to find the name of the state at which it failed, as well as the JSON input to that state.
  2. Create a new state machine, which adds an additional state to failed state machine, called "GoToState". "GoToState" is a choice state at the beginning of the state machine that branches execution directly to the failed state, allowing you to skip states that had succeeded in the previous execution.

The full script along with a CloudFormation template that creates a demo of this is available in the aws-sfn-resume-from-any-state GitHub repo.

Diving into the script

In this section, we walk you through the script and highlight the core components of its functionality. The script contains a main function, which adds a command line parameter for the failedExecutionArn so that you can easily call the script from the command line:

python gotostate.py --failedExecutionArn '<Failed_Execution_Arn>'

Identifying the failed state in your execution

First, the script extracts the name of the failed state along with the input to that state. It does so by using the failed state machine execution history, which is identified by the Amazon Resource Name (ARN) of the execution. The failed state is marked in the execution history, along with the input to that state (which is also the output of the preceding successful state). The script is able to parse these values from the log.

The script loops through the execution history of the failed state machine, and traces it backwards until it finds the failed state. If the state machine failed in a parallel state, then it must restart from the beginning of the parallel state. The script is able to capture the name of the parallel state that failed, rather than any substate within the parallel state that may have caused the failure. The following code is the Python function that does this.


def parseFailureHistory(failedExecutionArn):

    '''
    Parses the execution history of a failed state machine to get the name of failed state and the input to the failed state:
    Input failedExecutionArn = A string containing the execution ARN of a failed state machine y
    Output = A list with two elements: [name of failed state, input to failed state]
    '''
    failedAtParallelState = False
    try:
        #Get the execution history
        response = client.get\_execution\_history(
            executionArn=failedExecutionArn,
            reverseOrder=True
        )
        failedEvents = response['events']
    except Exception as ex:
        raise ex
    #Confirm that the execution actually failed, raise exception if it didn't fail.
    try:
        failedEvents[0]['executionFailedEventDetails']
    except:
        raise('Execution did not fail')
        
    '''
    If you have a 'States.Runtime' error (for example, if a task state in your state machine attempts to execute a Lambda function in a different region than the state machine), get the ID of the failed state, and use it to determine the failed state name and input.
    '''
    
    if failedEvents[0]['executionFailedEventDetails']['error'] == 'States.Runtime':
        failedId = int(filter(str.isdigit, str(failedEvents[0]['executionFailedEventDetails']['cause'].split()[13])))
        failedState = failedEvents[-1 \* failedId]['stateEnteredEventDetails']['name']
        failedInput = failedEvents[-1 \* failedId]['stateEnteredEventDetails']['input']
        return (failedState, failedInput)
        
    '''
    You need to loop through the execution history, tracing back the executed steps.
    The first state you encounter is the failed state. If you failed on a parallel state, you need the name of the parallel state rather than the name of a state within a parallel state that it failed on. This is because you can only attach goToState to the parallel state, but not a substate within the parallel state.
    This loop starts with the ID of the latest event and uses the previous event IDs to trace back the execution to the beginning (id 0). However, it returns as soon it finds the name of the failed state.
    '''

    currentEventId = failedEvents[0]['id']
    while currentEventId != 0:
        #multiply event ID by -1 for indexing because you're looking at the reversed history
        currentEvent = failedEvents[-1 \* currentEventId]
        
        '''
        You can determine if the failed state was a parallel state because it and an event with 'type'='ParallelStateFailed' appears in the execution history before the name of the failed state
        '''

        if currentEvent['type'] == 'ParallelStateFailed':
            failedAtParallelState = True

        '''
        If the failed state is not a parallel state, then the name of failed state to return is the name of the state in the first 'TaskStateEntered' event type you run into when tracing back the execution history
        '''

        if currentEvent['type'] == 'TaskStateEntered' and failedAtParallelState == False:
            failedState = currentEvent['stateEnteredEventDetails']['name']
            failedInput = currentEvent['stateEnteredEventDetails']['input']
            return (failedState, failedInput)

        '''
        If the failed state was a parallel state, then you need to trace execution back to the first event with 'type'='ParallelStateEntered', and return the name of the state
        '''

        if currentEvent['type'] == 'ParallelStateEntered' and failedAtParallelState:
            failedState = failedState = currentEvent['stateEnteredEventDetails']['name']
            failedInput = currentEvent['stateEnteredEventDetails']['input']
            return (failedState, failedInput)
        #Update the ID for the next execution of the loop
        currentEventId = currentEvent['previousEventId']
        

Create the new state machine

The script uses the name of the failed state to create the new state machine, with "GoToState" branching execution directly to the failed state.

To do this, the script requires the Amazon States Language (ASL) definition of the failed state machine. It modifies the definition to append "GoToState", and create a new state machine from it.

The script gets the ARN of the failed state machine from the execution ARN of the failed state machine. This ARN allows it to get the ASL definition of the failed state machine by calling the DesribeStateMachine API action. It creates a new state machine with "GoToState".

When the script creates the new state machine, it also adds an additional input variable called "resuming". When you execute this new state machine, you specify this resuming variable as true in the input JSON. This tells "GoToState" to branch execution to the state that had previously failed. Here’s the function that does this:

def attachGoToState(failedStateName, stateMachineArn):

    '''
    Given a state machine ARN and the name of a state in that state machine, create a new state machine that starts at a new choice state called 'GoToState'. "GoToState" branches to the named state, and sends the input of the state machine to that state, when a variable called "resuming" is set to True.
    Input failedStateName = A string with the name of the failed state
          stateMachineArn = A string with the ARN of the state machine
    Output response from the create_state_machine call, which is the API call that creates a new state machine
    '''

    try:
        response = client.describe\_state\_machine(
            stateMachineArn=stateMachineArn
        )
    except:
        raise('Could not get ASL definition of state machine')
    roleArn = response['roleArn']
    stateMachine = json.loads(response['definition'])
    #Create a name for the new state machine
    newName = response['name'] + '-with-GoToState'
    #Get the StartAt state for the original state machine, because you point the 'GoToState' to this state
    originalStartAt = stateMachine['StartAt']

    '''
    Create the GoToState with the variable $.resuming.
    If new state machine is executed with $.resuming = True, then the state machine skips to the failed state.
    Otherwise, it executes the state machine from the original start state.
    '''

    goToState = {'Type':'Choice', 'Choices':[{'Variable':'$.resuming', 'BooleanEquals':False, 'Next':originalStartAt}], 'Default':failedStateName}
    #Add GoToState to the set of states in the new state machine
    stateMachine['States']['GoToState'] = goToState
    #Add StartAt
    stateMachine['StartAt'] = 'GoToState'
    #Create new state machine
    try:
        response = client.create_state_machine(
            name=newName,
            definition=json.dumps(stateMachine),
            roleArn=roleArn
        )
    except:
        raise('Failed to create new state machine with GoToState')
    return response

Testing the script

Now that you understand how the script works, you can test it out.

The following screenshot shows an example state machine that has failed, called "TestMachine". This state machine successfully completed "FirstState" and "ChoiceState", but when it branched to "FirstMatchState", it failed.

Use the script to create a new state machine that allows you to rerun this state machine, but skip the "FirstState" and the "ChoiceState" steps that already succeeded. You can do this by calling the script as follows:

python gotostate.py --failedExecutionArn 'arn:aws:states:us-west-2:<AWS_ACCOUNT_ID>:execution:TestMachine-with-GoToState:b2578403-f41d-a2c7-e70c-7500045288595

This creates a new state machine called "TestMachine-with-GoToState", and returns its ARN, along with the input that had been sent to "FirstMatchState". You can then inspect the input to determine what caused the error. In this case, you notice that the input to "FirstMachState" was the following:

{
"foo": 1,
"Message": true
}

However, this state machine expects the "Message" field of the JSON to be a string rather than a Boolean. Execute the new "TestMachine-with-GoToState" state machine, change the input to be a string, and add the "resuming" variable that "GoToState" requires:

{
"foo": 1,
"Message": "Hello!",
"resuming":true
}

When you execute the new state machine, it skips "FirstState" and "ChoiceState", and goes directly to "FirstMatchState", which was the state that failed:

Look at what happens when you have a state machine with multiple parallel steps. This example is included in the GitHub repository associated with this post. The repo contains a CloudFormation template that sets up this state machine and provides instructions to replicate this solution.

The following state machine, "ParallelStateMachine", takes an input through two subsequent parallel states before doing some final processing and exiting, along with the JSON with the ASL definition of the state machine.

{
  "Comment": "An example of the Amazon States Language using a parallel state to execute two branches at the same time.",
  "StartAt": "Parallel",
  "States": {
    "Parallel": {
      "Type": "Parallel",
      "ResultPath":"$.output",
      "Next": "Parallel 2",
      "Branches": [
        {
          "StartAt": "Parallel Step 1, Process 1",
          "States": {
            "Parallel Step 1, Process 1": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaA",
              "End": true
            }
          }
        },
        {
          "StartAt": "Parallel Step 1, Process 2",
          "States": {
            "Parallel Step 1, Process 2": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaA",
              "End": true
            }
          }
        }
      ]
    },
    "Parallel 2": {
      "Type": "Parallel",
      "Next": "Final Processing",
      "Branches": [
        {
          "StartAt": "Parallel Step 2, Process 1",
          "States": {
            "Parallel Step 2, Process 1": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXXX:function:LambdaB",
              "End": true
            }
          }
        },
        {
          "StartAt": "Parallel Step 2, Process 2",
          "States": {
            "Parallel Step 2, Process 2": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaB",
              "End": true
            }
          }
        }
      ]
    },
    "Final Processing": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaC",
      "End": true
    }
  }
}

First, use an input that initially fails:

{
  "Message": "Hello!"
}

This fails because the state machine expects you to have a variable in the input JSON called "foo" in the second parallel state to run "Parallel Step 2, Process 1" and "Parallel Step 2, Process 2". Instead, the original input gets processed by the first parallel state and produces the following output to pass to the second parallel state:

{
"output": [
    {
      "Message": "Hello!"
    },
    {
      "Message": "Hello!"
    }
  ],
}

Run the script on the failed state machine to create a new state machine that allows it to resume directly at the second parallel state instead of having to redo the first parallel state. This creates a new state machine called "ParallelStateMachine-with-GoToState". The following JSON was created by the script to define the new state machine in ASL. It contains the "GoToState" value that was attached by the script.

{
   "Comment":"An example of the Amazon States Language using a parallel state to execute two branches at the same time.",
   "States":{
      "Final Processing":{
         "Resource":"arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaC",
         "End":true,
         "Type":"Task"
      },
      "GoToState":{
         "Default":"Parallel 2",
         "Type":"Choice",
         "Choices":[
            {
               "Variable":"$.resuming",
               "BooleanEquals":false,
               "Next":"Parallel"
            }
         ]
      },
      "Parallel":{
         "Branches":[
            {
               "States":{
                  "Parallel Step 1, Process 1":{
                     "Resource":"arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaA",
                     "End":true,
                     "Type":"Task"
                  }
               },
               "StartAt":"Parallel Step 1, Process 1"
            },
            {
               "States":{
                  "Parallel Step 1, Process 2":{
                     "Resource":"arn:aws:lambda:us-west-2:XXXXXXXXXXXX:LambdaA",
                     "End":true,
                     "Type":"Task"
                  }
               },
               "StartAt":"Parallel Step 1, Process 2"
            }
         ],
         "ResultPath":"$.output",
         "Type":"Parallel",
         "Next":"Parallel 2"
      },
      "Parallel 2":{
         "Branches":[
            {
               "States":{
                  "Parallel Step 2, Process 1":{
                     "Resource":"arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaB",
                     "End":true,
                     "Type":"Task"
                  }
               },
               "StartAt":"Parallel Step 2, Process 1"
            },
            {
               "States":{
                  "Parallel Step 2, Process 2":{
                     "Resource":"arn:aws:lambda:us-west-2:XXXXXXXXXXXX:function:LambdaB",
                     "End":true,
                     "Type":"Task"
                  }
               },
               "StartAt":"Parallel Step 2, Process 2"
            }
         ],
         "Type":"Parallel",
         "Next":"Final Processing"
      }
   },
   "StartAt":"GoToState"
}

You can then execute this state machine with the correct input by adding the "foo" and "resuming" variables:

{
  "foo": 1,
  "output": [
    {
      "Message": "Hello!"
    },
    {
      "Message": "Hello!"
    }
  ],
  "resuming": true
}

This yields the following result. Notice that this time, the state machine executed successfully to completion, and skipped the steps that had previously failed.


Conclusion

When you’re building out complex workflows, it’s important to be prepared for failure. You can do this by taking advantage of features such as automatic error retries in Step Functions and custom error handling of Lambda exceptions.

Nevertheless, state machines still have the possibility of failing. With the methodology and script presented in this post, you can resume a failed state machine from its point of failure. This allows you to skip the execution of steps in the workflow that had already succeeded, and recover the process from the point of failure.

To see more examples, please visit the Step Functions Getting Started page.

If you have questions or suggestions, please comment below.

Visualize AWS Cloudtrail Logs using AWS Glue and Amazon Quicksight

Post Syndicated from Luis Caro Perez original https://aws.amazon.com/blogs/big-data/streamline-aws-cloudtrail-log-visualization-using-aws-glue-and-amazon-quicksight/

Being able to easily visualize AWS CloudTrail logs gives you a better understanding of how your AWS infrastructure is being used. It can also help you audit and review AWS API calls and detect security anomalies inside your AWS account. To do this, you must be able to perform analytics based on your CloudTrail logs.

In this post, I walk through using AWS Glue and AWS Lambda to convert AWS CloudTrail logs from JSON to a query-optimized format dataset in Amazon S3. I then use Amazon Athena and Amazon QuickSight to query and visualize the data.

Solution overview

To process CloudTrail logs, you must implement the following architecture:

CloudTrail delivers log files in an Amazon S3 bucket folder. To correctly crawl these logs, you modify the file contents and folder structure using an Amazon S3-triggered Lambda function that stores the transformed files in an S3 bucket single folder. When the files are in a single folder, AWS Glue scans the data, converts it into Apache Parquet format, and catalogs it to allow for querying and visualization using Amazon Athena and Amazon QuickSight.

Walkthrough

Let’s look at the steps that are required to build the solution.

Set up CloudTrail logs

First, you need to set up a trail that delivers log files to an S3 bucket. To create a trail in CloudTrail, follow the instructions in Creating a Trail.

When you finish, the trail settings page should look like the following screenshot:

In this example, I set up log files to be delivered to the cloudtraillfcaro bucket.

Consolidate CloudTrail reports into a single folder using Lambda

AWS CloudTrail delivers log files using the following folder structure inside the configured Amazon S3 bucket:

AWSLogs/ACCOUNTID/CloudTrail/REGION/YEAR/MONTH/HOUR/filename.json.gz

Additionally, log files have the following structure:

{
    "Records": [{
        "eventVersion": "1.01",
        "userIdentity": {
            "type": "IAMUser",
            "principalId": "AIDAJDPLRKLG7UEXAMPLE",
            "arn": "arn:aws:iam::123456789012:user/Alice",
            "accountId": "123456789012",
            "accessKeyId": "AKIAIOSFODNN7EXAMPLE",
            "userName": "Alice",
            "sessionContext": {
                "attributes": {
                    "mfaAuthenticated": "false",
                    "creationDate": "2014-03-18T14:29:23Z"
                }
            }
        },
        "eventTime": "2014-03-18T14:30:07Z",
        "eventSource": "cloudtrail.amazonaws.com",
        "eventName": "StartLogging",
        "awsRegion": "us-west-2",
        "sourceIPAddress": "72.21.198.64",
        "userAgent": "signin.amazonaws.com",
        "requestParameters": {
            "name": "Default"
        },
        "responseElements": null,
        "requestID": "cdc73f9d-aea9-11e3-9d5a-835b769c0d9c",
        "eventID": "3074414d-c626-42aa-984b-68ff152d6ab7"
    },
    ... additional entries ...
    ]

If AWS Glue crawlers are used to catalog these files as they are written, the following obstacles arise:

  1. AWS Glue identifies different tables per different folders because they don’t follow a traditional partition format.
  2. Based on the structure of the file content, AWS Glue identifies the tables as having a single column of type array.
  3. CloudTrail logs have JSON attributes that use uppercase letters. According to the Best Practices When Using Athena with AWS Glue, it is recommended that you convert these to lowercase.

To have AWS Glue catalog all log files in a single table with all the columns describing each event, implement the following Lambda function:

from __future__ import print_function
import json
import urllib
import boto3
import gzip

s3 = boto3.resource('s3')
client = boto3.client('s3')

def convertColumntoLowwerCaps(obj):
    for key in obj.keys():
        new_key = key.lower()
        if new_key != key:
            obj[new_key] = obj[key]
            del obj[key]
    return obj


def lambda_handler(event, context):

    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    print(bucket)
    print(key)
    try:
        newKey = 'flatfiles/' + key.replace("/", "")
        client.download_file(bucket, key, '/tmp/file.json.gz')
        with gzip.open('/tmp/out.json.gz', 'w') as output, gzip.open('/tmp/file.json.gz', 'rb') as file:
            i = 0
            for line in file: 
                for record in json.loads(line,object_hook=convertColumntoLowwerCaps)['records']:
            		if i != 0:
            		    output.write("\n")
            		output.write(json.dumps(record))
            		i += 1
        client.upload_file('/tmp/out.json.gz', bucket,newKey)
        return "success"
    except Exception as e:
        print(e)
        print('Error processing object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e

The function goes over each element of the records array, changes uppercase letters to lowercase in column names, and inserts each element of the array as a single line of a new file. The new file is saved inside a flatfiles folder created by the function without any subfolders in the S3 bucket.

The function should have a role containing a policy with at least the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::cloudtraillfcaro/*",
                "arn:aws:s3:::cloudtraillfcaro"
            ],
            "Effect": "Allow"
        }
    ]
}

In this example, CloudTrail delivers logs to the cloudtraillfcaro bucket. Make sure that you replace this name with your bucket name in the policy. For more information about how to work with inline policies, see Working with Inline Policies.

After the Lambda function is created, you can set up the following trigger using the Triggers tab on the AWS Lambda console.

Choose Add trigger, and choose S3 as a source of the trigger.

After choosing the source, configure the following settings:

In the trigger, any file that is written to the path for the log files—which in this case is AWSLogs/119582755581/CloudTrail/—is processed. Make sure that the Enable trigger check box is selected and that the bucket and prefix parameters match your use case.

After you set up the function and receive log files, the bucket (in this case cloudtraillfcaro) should contain the processed files inside the flatfiles folder.

Catalog source data

Once the files are processed by the Lambda function, set up a crawler named cloudtrail to catalog them.

The crawler must point to the flatfiles folder.

All the crawlers and AWS Glue jobs created for this solution must have a role with the AWSGlueServiceRole managed policy and an inline policy with permissions to modify the S3 buckets used on the Lambda function. For more information, see Working with Managed Policies.

The role should look like the following:

In this example, the inline policy named s3perms contains the permissions to modify the S3 buckets.

After you choose the role, you can schedule the crawler to run on demand.

A new database is created, and the crawler is set to use it. In this case, the cloudtrail database is used for all the tables.

After the crawler runs, a single table should be created in the catalog with the following structure:

The table should contain the following columns:

Create and run the AWS Glue job

To convert all the CloudTrail logs to a columnar store in Parquet, set up an AWS Glue job by following these steps.

Upload the following script into a bucket in Amazon S3:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import boto3
import time

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "cloudtrail", table_name = "flatfiles", transformation_ctx = "datasource0")
resolvechoice1 = ResolveChoice.apply(frame = datasource0, choice = "make_struct", transformation_ctx = "resolvechoice1")
relationalized1 = resolvechoice1.relationalize("trail", args["TempDir"]).select("trail")
datasink = glueContext.write_dynamic_frame.from_options(frame = relationalized1, connection_type = "s3", connection_options = {"path": "s3://cloudtraillfcaro/parquettrails"}, format = "parquet", transformation_ctx = "datasink4")
job.commit()

In the example, you load the script as a file named cloudtrailtoparquet.py. Make sure that you modify the script and update the “{"path": "s3://cloudtraillfcaro/parquettrails"}” with the destination in which you want to store your results.

After uploading the script, add a new AWS Glue job. Choose a name and role for the job, and choose the option of running the job from An existing script that you provide.

To avoid processing the same data twice, enable the Job bookmark setting in the Advanced properties section of the job properties.

Choose Next twice, and then choose Finish.

If logs are already in the flatfiles folder, you can run the job on demand to generate the first set of results.

Once the job starts running, wait for it to complete.

When the job is finished, its Run status should be Succeeded. After that, you can verify that the Parquet files are written to the Amazon S3 location.

Catalog results

To be able to process results from Athena, you can use an AWS Glue crawler to catalog the results of the AWS Glue job.

In this example, the crawler is set to use the same database as the source named cloudtrail.

You can run the crawler using the console. When the crawler finishes running and has processed the Parquet results, a new table should be created in the AWS Glue Data Catalog. In this example, it’s named parquettrails.

The table should have the classification set to parquet.

It should have the same columns as the flatfiles table, with the exception of the struct type columns, which should be relationalized into several columns:

In this example, notice how the requestparameters column, which was a struct in the original table (flatfiles), was transformed to several columns—one for each key value inside it. This is done using a transformation native to AWS Glue called relationalize.

Query results with Athena

After crawling the results, you can query them using Athena. For example, to query what events took place in the time frame between 2017-10-23t12:00:00 and 2017-10-23t13:00, use the following select statement:

select *
from cloudtrail.parquettrails
where eventtime > '2017-10-23T12:00:00Z' AND eventtime < '2017-10-23T13:00:00Z'
order by eventtime asc;

Be sure to replace cloudtrail.parquettrails with the names of your database and table that references the Parquet results. Replace the datetimes with an hour when your account had activity and was processed by the AWS Glue job.

Visualize results using Amazon QuickSight

Once you can query the data using Athena, you can visualize it using Amazon QuickSight. Before connecting Amazon QuickSight to Athena, be sure to grant QuickSight access to Athena and the associated S3 buckets in your account. For more information, see Managing Amazon QuickSight Permissions to AWS Resources. You can then create a new data set in Amazon QuickSight based on the Athena table that you created.

After setting up permissions, you can create a new analysis in Amazon QuickSight by choosing New analysis.

Then add a new data set.

Choose Athena as the source.

Give the data source a name (in this case, I named it cloudtrail).

Choose the name of the database and the table referencing the Parquet results.

Then choose Visualize.

After that, you should see the following screen:

Now you can create some visualizations. First, search for the sourceipaddress column, and drag it to the AutoGraph section.

You can see a list of the IP addresses that you have used to interact with AWS. To review whether these IP addresses have been used from IAM users, internal AWS services, or roles, use the type value that is inside the useridentity field of the original log files. Thanks to the relationalize transformation, this value is available as the useridentity.type column. After the column is added into the Group/Color box, the visualization should look like the following:

You can now see and distinguish the most used IPs and whether they are used from roles, AWS services, or IAM users.

After following all these steps, you can use Amazon QuickSight to add different columns from CloudTrail and perform different types of visualizations. You can build operational dashboards that continuously monitor AWS infrastructure usage and access. You can share those dashboards with others in your organization who might need to see this data.

Summary

In this post, you saw how you can use a simple Lambda function and an AWS Glue script to convert text files into Parquet to improve Athena query performance and data compression. The post also demonstrated how to use AWS Lambda to preprocess files in Amazon S3 and transform them into a format that is recognizable by AWS Glue crawlers.

This example, used AWS CloudTrail logs, but you can apply the proposed solution to any set of files that after preprocessing, can be cataloged by AWS Glue.


Additional Reading

Learn how to Harmonize, Query, and Visualize Data from Various Providers using AWS Glue, Amazon Athena, and Amazon QuickSight.


About the Authors

Luis Caro is a Big Data Consultant for AWS Professional Services. He works with our customers to provide guidance and technical assistance on big data projects, helping them improving the value of their solutions when using AWS.

 

 

 

PS4 Piracy Now Exists – If Gamers Want to Jump Through Hoops

Post Syndicated from Andy original https://torrentfreak.com/ps4-piracy-now-exists-if-gamers-want-to-jump-through-hoops-170930/

During the reign of the first few generations of consoles, gamers became accustomed to their machines being compromised by hacking groups and enthusiasts, to enable the execution of third-party software.

Often carried out under the banner of running “homebrew” code, so-called jailbroken consoles also brought with them the prospect of running pirate copies of officially produced games. Once the floodgates were opened, not much could hold things back.

With the advent of mass online gaming, however, things became more complex. Regular firmware updates mean that security holes could be fixed remotely whenever a user went online, rendering the jailbreaking process a cat-and-mouse game with continually moving targets.

This, coupled with massively improved overall security, has meant that the current generation of consoles has remained largely piracy free, at least on a do-it-at-home basis. Now, however, that position is set to change after the first decrypted PS4 game dumps began to hit the web this week.

Thanks to release group KOTF (Knights of the Fallen), Grand Theft Auto V, Far Cry 4, and Assassins Creed IV are all available for download from the usual places. As expected they are pretty meaty downloads, with GTAV weighing in via 90 x 500MB files, Far Cry4 via 54 of the same size, and ACIV sporting 84 x 250MB.

Partial NFO file for PS4 GTA V

While undoubtedly large, it’s not the filesize that will prove most prohibitive when it comes to getting these beasts to run on a PlayStation 4. Indeed, a potential pirate will need to jump through a number of hoops to enjoy any of these titles or others that may appear in the near future.

KOTF explains as much in the NFO (information) files it includes with its releases. The list of requirements is long.

First up, a gamer needs to possess a PS4 with an extremely old firmware version – v1.76 – which was released way back in August 2014. The fact this firmware is required doesn’t come as a surprise since it was successfully jailbroken back in December 2015.

The age of the firmware raises several issues, not least where people can obtain a PS4 that’s so old it still has this firmware intact. Also, newer games require later firmware, so most games released during the past two to three years won’t be compatible with v1.76. That limits the pool of games considerably.

Finally, forget going online with such an old software version. Sony will be all over it like a cheap suit, plotting to do something unpleasant to that cheeky antique code, given half a chance. And, for anyone wondering, downgrading a higher firmware version to v1.76 isn’t possible – yet.

But for gamers who want a little bit of recent PS4 nostalgia on the cheap, ‘all’ they have to do is gather the necessary tools together and follow the instructions below.

Easy – when you know how

While this is a landmark moment for PS4 piracy (which to date has mainly centered around much hocus pocus), the limitations listed above mean that it isn’t going to hit the mainstream just yet.

That being said, all things are possible when given the right people, determination, and enough time. Whether that will be anytime soon is anyone’s guess but there are rumors that firmware v4.55 has already been exploited, so you never know.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Analyzing AWS Cost and Usage Reports with Looker and Amazon Athena

Post Syndicated from Dillon Morrison original https://aws.amazon.com/blogs/big-data/analyzing-aws-cost-and-usage-reports-with-looker-and-amazon-athena/

This is a guest post by Dillon Morrison at Looker. Looker is, in their own words, “a new kind of analytics platform–letting everyone in your business make better decisions by getting reliable answers from a tool they can use.” 

As the breadth of AWS products and services continues to grow, customers are able to more easily move their technology stack and core infrastructure to AWS. One of the attractive benefits of AWS is the cost savings. Rather than paying upfront capital expenses for large on-premises systems, customers can instead pay variables expenses for on-demand services. To further reduce expenses AWS users can reserve resources for specific periods of time, and automatically scale resources as needed.

The AWS Cost Explorer is great for aggregated reporting. However, conducting analysis on the raw data using the flexibility and power of SQL allows for much richer detail and insight, and can be the better choice for the long term. Thankfully, with the introduction of Amazon Athena, monitoring and managing these costs is now easier than ever.

In the post, I walk through setting up the data pipeline for cost and usage reports, Amazon S3, and Athena, and discuss some of the most common levers for cost savings. I surface tables through Looker, which comes with a host of pre-built data models and dashboards to make analysis of your cost and usage data simple and intuitive.

Analysis with Athena

With Athena, there’s no need to create hundreds of Excel reports, move data around, or deploy clusters to house and process data. Athena uses Apache Hive’s DDL to create tables, and the Presto querying engine to process queries. Analysis can be performed directly on raw data in S3. Conveniently, AWS exports raw cost and usage data directly into a user-specified S3 bucket, making it simple to start querying with Athena quickly. This makes continuous monitoring of costs virtually seamless, since there is no infrastructure to manage. Instead, users can leverage the power of the Athena SQL engine to easily perform ad-hoc analysis and data discovery without needing to set up a data warehouse.

After the data pipeline is established, cost and usage data (the recommended billing data, per AWS documentation) provides a plethora of comprehensive information around usage of AWS services and the associated costs. Whether you need the report segmented by product type, user identity, or region, this report can be cut-and-sliced any number of ways to properly allocate costs for any of your business needs. You can then drill into any specific line item to see even further detail, such as the selected operating system, tenancy, purchase option (on-demand, spot, or reserved), and so on.

Walkthrough

By default, the Cost and Usage report exports CSV files, which you can compress using gzip (recommended for performance). There are some additional configuration options for tuning performance further, which are discussed below.

Prerequisites

If you want to follow along, you need the following resources:

Enable the cost and usage reports

First, enable the Cost and Usage report. For Time unit, select Hourly. For Include, select Resource IDs. All options are prompted in the report-creation window.

The Cost and Usage report dumps CSV files into the specified S3 bucket. Please note that it can take up to 24 hours for the first file to be delivered after enabling the report.

Configure the S3 bucket and files for Athena querying

In addition to the CSV file, AWS also creates a JSON manifest file for each cost and usage report. Athena requires that all of the files in the S3 bucket are in the same format, so we need to get rid of all these manifest files. If you’re looking to get started with Athena quickly, you can simply go into your S3 bucket and delete the manifest file manually, skip the automation described below, and move on to the next section.

To automate the process of removing the manifest file each time a new report is dumped into S3, which I recommend as you scale, there are a few additional steps. The folks at Concurrency labs wrote a great overview and set of scripts for this, which you can find in their GitHub repo.

These scripts take the data from an input bucket, remove anything unnecessary, and dump it into a new output bucket. We can utilize AWS Lambda to trigger this process whenever new data is dropped into S3, or on a nightly basis, or whatever makes most sense for your use-case, depending on how often you’re querying the data. Please note that enabling the “hourly” report means that data is reported at the hour-level of granularity, not that a new file is generated every hour.

Following these scripts, you’ll notice that we’re adding a date partition field, which isn’t necessary but improves query performance. In addition, converting data from CSV to a columnar format like ORC or Parquet also improves performance. We can automate this process using Lambda whenever new data is dropped in our S3 bucket. Amazon Web Services discusses columnar conversion at length, and provides walkthrough examples, in their documentation.

As a long-term solution, best practice is to use compression, partitioning, and conversion. However, for purposes of this walkthrough, we’re not going to worry about them so we can get up-and-running quicker.

Set up the Athena query engine

In your AWS console, navigate to the Athena service, and click “Get Started”. Follow the tutorial and set up a new database (we’ve called ours “AWS Optimizer” in this example). Don’t worry about configuring your initial table, per the tutorial instructions. We’ll be creating a new table for cost and usage analysis. Once you walked through the tutorial steps, you’ll be able to access the Athena interface, and can begin running Hive DDL statements to create new tables.

One thing that’s important to note, is that the Cost and Usage CSVs also contain the column headers in their first row, meaning that the column headers would be included in the dataset and any queries. For testing and quick set-up, you can remove this line manually from your first few CSV files. Long-term, you’ll want to use a script to programmatically remove this row each time a new file is dropped in S3 (every few hours typically). We’ve drafted up a sample script for ease of reference, which we run on Lambda. We utilize Lambda’s native ability to invoke the script whenever a new object is dropped in S3.

For cost and usage, we recommend using the DDL statement below. Since our data is in CSV format, we don’t need to use a SerDe, we can simply specify the “separatorChar, quoteChar, and escapeChar”, and the structure of the files (“TEXTFILE”). Note that AWS does have an OpenCSV SerDe as well, if you prefer to use that.

 

CREATE EXTERNAL TABLE IF NOT EXISTS cost_and_usage	 (
identity_LineItemId String,
identity_TimeInterval String,
bill_InvoiceId String,
bill_BillingEntity String,
bill_BillType String,
bill_PayerAccountId String,
bill_BillingPeriodStartDate String,
bill_BillingPeriodEndDate String,
lineItem_UsageAccountId String,
lineItem_LineItemType String,
lineItem_UsageStartDate String,
lineItem_UsageEndDate String,
lineItem_ProductCode String,
lineItem_UsageType String,
lineItem_Operation String,
lineItem_AvailabilityZone String,
lineItem_ResourceId String,
lineItem_UsageAmount String,
lineItem_NormalizationFactor String,
lineItem_NormalizedUsageAmount String,
lineItem_CurrencyCode String,
lineItem_UnblendedRate String,
lineItem_UnblendedCost String,
lineItem_BlendedRate String,
lineItem_BlendedCost String,
lineItem_LineItemDescription String,
lineItem_TaxType String,
product_ProductName String,
product_accountAssistance String,
product_architecturalReview String,
product_architectureSupport String,
product_availability String,
product_bestPractices String,
product_cacheEngine String,
product_caseSeverityresponseTimes String,
product_clockSpeed String,
product_currentGeneration String,
product_customerServiceAndCommunities String,
product_databaseEdition String,
product_databaseEngine String,
product_dedicatedEbsThroughput String,
product_deploymentOption String,
product_description String,
product_durability String,
product_ebsOptimized String,
product_ecu String,
product_endpointType String,
product_engineCode String,
product_enhancedNetworkingSupported String,
product_executionFrequency String,
product_executionLocation String,
product_feeCode String,
product_feeDescription String,
product_freeQueryTypes String,
product_freeTrial String,
product_frequencyMode String,
product_fromLocation String,
product_fromLocationType String,
product_group String,
product_groupDescription String,
product_includedServices String,
product_instanceFamily String,
product_instanceType String,
product_io String,
product_launchSupport String,
product_licenseModel String,
product_location String,
product_locationType String,
product_maxIopsBurstPerformance String,
product_maxIopsvolume String,
product_maxThroughputvolume String,
product_maxVolumeSize String,
product_maximumStorageVolume String,
product_memory String,
product_messageDeliveryFrequency String,
product_messageDeliveryOrder String,
product_minVolumeSize String,
product_minimumStorageVolume String,
product_networkPerformance String,
product_operatingSystem String,
product_operation String,
product_operationsSupport String,
product_physicalProcessor String,
product_preInstalledSw String,
product_proactiveGuidance String,
product_processorArchitecture String,
product_processorFeatures String,
product_productFamily String,
product_programmaticCaseManagement String,
product_provisioned String,
product_queueType String,
product_requestDescription String,
product_requestType String,
product_routingTarget String,
product_routingType String,
product_servicecode String,
product_sku String,
product_softwareType String,
product_storage String,
product_storageClass String,
product_storageMedia String,
product_technicalSupport String,
product_tenancy String,
product_thirdpartySoftwareSupport String,
product_toLocation String,
product_toLocationType String,
product_training String,
product_transferType String,
product_usageFamily String,
product_usagetype String,
product_vcpu String,
product_version String,
product_volumeType String,
product_whoCanOpenCases String,
pricing_LeaseContractLength String,
pricing_OfferingClass String,
pricing_PurchaseOption String,
pricing_publicOnDemandCost String,
pricing_publicOnDemandRate String,
pricing_term String,
pricing_unit String,
reservation_AvailabilityZone String,
reservation_NormalizedUnitsPerReservation String,
reservation_NumberOfReservations String,
reservation_ReservationARN String,
reservation_TotalReservedNormalizedUnits String,
reservation_TotalReservedUnits String,
reservation_UnitsPerReservation String,
resourceTags_userName String,
resourceTags_usercostcategory String  


)
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      ESCAPED BY '\\'
      LINES TERMINATED BY '\n'

STORED AS TEXTFILE
    LOCATION 's3://<<your bucket name>>';

Once you’ve successfully executed the command, you should see a new table named “cost_and_usage” with the below properties. Now we’re ready to start executing queries and running analysis!

Start with Looker and connect to Athena

Setting up Looker is a quick process, and you can try it out for free here (or download from Amazon Marketplace). It takes just a few seconds to connect Looker to your Athena database, and Looker comes with a host of pre-built data models and dashboards to make analysis of your cost and usage data simple and intuitive. After you’re connected, you can use the Looker UI to run whatever analysis you’d like. Looker translates this UI to optimized SQL, so any user can execute and visualize queries for true self-service analytics.

Major cost saving levers

Now that the data pipeline is configured, you can dive into the most popular use cases for cost savings. In this post, I focus on:

  • Purchasing Reserved Instances vs. On-Demand Instances
  • Data transfer costs
  • Allocating costs over users or other Attributes (denoted with resource tags)

On-Demand, Spot, and Reserved Instances

Purchasing Reserved Instances vs On-Demand Instances is arguably going to be the biggest cost lever for heavy AWS users (Reserved Instances run up to 75% cheaper!). AWS offers three options for purchasing instances:

  • On-Demand—Pay as you use.
  • Spot (variable cost)—Bid on spare Amazon EC2 computing capacity.
  • Reserved Instances—Pay for an instance for a specific, allotted period of time.

When purchasing a Reserved Instance, you can also choose to pay all-upfront, partial-upfront, or monthly. The more you pay upfront, the greater the discount.

If your company has been using AWS for some time now, you should have a good sense of your overall instance usage on a per-month or per-day basis. Rather than paying for these instances On-Demand, you should try to forecast the number of instances you’ll need, and reserve them with upfront payments.

The total amount of usage with Reserved Instances versus overall usage with all instances is called your coverage ratio. It’s important not to confuse your coverage ratio with your Reserved Instance utilization. Utilization represents the amount of reserved hours that were actually used. Don’t worry about exceeding capacity, you can still set up Auto Scaling preferences so that more instances get added whenever your coverage or utilization crosses a certain threshold (we often see a target of 80% for both coverage and utilization among savvy customers).

Calculating the reserved costs and coverage can be a bit tricky with the level of granularity provided by the cost and usage report. The following query shows your total cost over the last 6 months, broken out by Reserved Instance vs other instance usage. You can substitute the cost field for usage if you’d prefer. Please note that you should only have data for the time period after the cost and usage report has been enabled (though you can opt for up to 3 months of historical data by contacting your AWS Account Executive). If you’re just getting started, this query will only show a few days.

 

SELECT 
	DATE_FORMAT(from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate),'%Y-%m') AS "cost_and_usage.usage_start_month",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0) AS "cost_and_usage.total_unblended_cost",
	COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_reserved_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_on_ris",
	COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_non_reserved_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_on_non_ris"
FROM aws_optimizer.cost_and_usage  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1
ORDER BY 2 DESC
LIMIT 500

The resulting table should look something like the image below (I’m surfacing tables through Looker, though the same table would result from querying via command line or any other interface).

With a BI tool, you can create dashboards for easy reference and monitoring. New data is dumped into S3 every few hours, so your dashboards can update several times per day.

It’s an iterative process to understand the appropriate number of Reserved Instances needed to meet your business needs. After you’ve properly integrated Reserved Instances into your purchasing patterns, the savings can be significant. If your coverage is consistently below 70%, you should seriously consider adjusting your purchase types and opting for more Reserved instances.

Data transfer costs

One of the great things about AWS data storage is that it’s incredibly cheap. Most charges often come from moving and processing that data. There are several different prices for transferring data, broken out largely by transfers between regions and availability zones. Transfers between regions are the most costly, followed by transfers between Availability Zones. Transfers within the same region and same availability zone are free unless using elastic or public IP addresses, in which case there is a cost. You can find more detailed information in the AWS Pricing Docs. With this in mind, there are several simple strategies for helping reduce costs.

First, since costs increase when transferring data between regions, it’s wise to ensure that as many services as possible reside within the same region. The more you can localize services to one specific region, the lower your costs will be.

Second, you should maximize the data you’re routing directly within AWS services and IP addresses. Transfers out to the open internet are the most costly and least performant mechanisms of data transfers, so it’s best to keep transfers within AWS services.

Lastly, data transfers between private IP addresses are cheaper than between elastic or public IP addresses, so utilizing private IP addresses as much as possible is the most cost-effective strategy.

The following query provides a table depicting the total costs for each AWS product, broken out transfer cost type. Substitute the “lineitem_productcode” field in the query to segment the costs by any other attribute. If you notice any unusually high spikes in cost, you’ll need to dig deeper to understand what’s driving that spike: location, volume, and so on. Drill down into specific costs by including “product_usagetype” and “product_transfertype” in your query to identify the types of transfer costs that are driving up your bill.

SELECT 
	cost_and_usage.lineitem_productcode  AS "cost_and_usage.product_code",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost), 0) AS "cost_and_usage.total_unblended_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_data_transfer_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer-In')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_inbound_data_transfer_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer-Out')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_outbound_data_transfer_cost"
FROM aws_optimizer.cost_and_usage  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1
ORDER BY 2 DESC
LIMIT 500

When moving between regions or over the open web, many data transfer costs also include the origin and destination location of the data movement. Using a BI tool with mapping capabilities, you can get a nice visual of data flows. The point at the center of the map is used to represent external data flows over the open internet.

Analysis by tags

AWS provides the option to apply custom tags to individual resources, so you can allocate costs over whatever customized segment makes the most sense for your business. For a SaaS company that hosts software for customers on AWS, maybe you’d want to tag the size of each customer. The following query uses custom tags to display the reserved, data transfer, and total cost for each AWS service, broken out by tag categories, over the last 6 months. You’ll want to substitute the cost_and_usage.resourcetags_customersegment and cost_and_usage.customer_segment with the name of your customer field.

 

SELECT * FROM (
SELECT *, DENSE_RANK() OVER (ORDER BY z___min_rank) as z___pivot_row_rank, RANK() OVER (PARTITION BY z__pivot_col_rank ORDER BY z___min_rank) as z__pivot_col_ordering FROM (
SELECT *, MIN(z___rank) OVER (PARTITION BY "cost_and_usage.product_code") as z___min_rank FROM (
SELECT *, RANK() OVER (ORDER BY CASE WHEN z__pivot_col_rank=1 THEN (CASE WHEN "cost_and_usage.total_unblended_cost" IS NOT NULL THEN 0 ELSE 1 END) ELSE 2 END, CASE WHEN z__pivot_col_rank=1 THEN "cost_and_usage.total_unblended_cost" ELSE NULL END DESC, "cost_and_usage.total_unblended_cost" DESC, z__pivot_col_rank, "cost_and_usage.product_code") AS z___rank FROM (
SELECT *, DENSE_RANK() OVER (ORDER BY CASE WHEN "cost_and_usage.customer_segment" IS NULL THEN 1 ELSE 0 END, "cost_and_usage.customer_segment") AS z__pivot_col_rank FROM (
SELECT 
	cost_and_usage.lineitem_productcode  AS "cost_and_usage.product_code",
	cost_and_usage.resourcetags_customersegment  AS "cost_and_usage.customer_segment",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0) AS "cost_and_usage.total_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_data_transfers_unblended",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.unblended_percent_spend_on_ris"
FROM aws_optimizer.cost_and_usage_raw  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1,2) ww
) bb WHERE z__pivot_col_rank <= 16384
) aa
) xx
) zz
 WHERE z___pivot_row_rank <= 500 OR z__pivot_col_ordering = 1 ORDER BY z___pivot_row_rank

The resulting table in this example looks like the results below. In this example, you can tell that we’re making poor use of Reserved Instances because they represent such a small portion of our overall costs.

Again, using a BI tool to visualize these costs and trends over time makes the analysis much easier to consume and take action on.

Summary

Saving costs on your AWS spend is always an iterative, ongoing process. Hopefully with these queries alone, you can start to understand your spending patterns and identify opportunities for savings. However, this is just a peek into the many opportunities available through analysis of the Cost and Usage report. Each company is different, with unique needs and usage patterns. To achieve maximum cost savings, we encourage you to set up an analytics environment that enables your team to explore all potential cuts and slices of your usage data, whenever it’s necessary. Exploring different trends and spikes across regions, services, user types, etc. helps you gain comprehensive understanding of your major cost levers and consistently implement new cost reduction strategies.

Note that all of the queries and analysis provided in this post were generated using the Looker data platform. If you’re already a Looker customer, you can get all of this analysis, additional pre-configured dashboards, and much more using Looker Blocks for AWS.


About the Author

Dillon Morrison leads the Platform Ecosystem at Looker. He enjoys exploring new technologies and architecting the most efficient data solutions for the business needs of his company and their customers. In his spare time, you’ll find Dillon rock climbing in the Bay Area or nose deep in the docs of the latest AWS product release at his favorite cafe (“Arlequin in SF is unbeatable!”).

 

 

 

New – AWS SAM Local (Beta) – Build and Test Serverless Applications Locally

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-aws-sam-local-beta-build-and-test-serverless-applications-locally/

Today we’re releasing a beta of a new tool, SAM Local, that makes it easy to build and test your serverless applications locally. In this post we’ll use SAM local to build, debug, and deploy a quick application that allows us to vote on tabs or spaces by curling an endpoint. AWS introduced Serverless Application Model (SAM) last year to make it easier for developers to deploy serverless applications. If you’re not already familiar with SAM my colleague Orr wrote a great post on how to use SAM that you can read in about 5 minutes. At it’s core, SAM is a powerful open source specification built on AWS CloudFormation that makes it easy to keep your serverless infrastructure as code – and they have the cutest mascot.

SAM Local takes all the good parts of SAM and brings them to your local machine.

There are a couple of ways to install SAM Local but the easiest is through NPM. A quick npm install -g aws-sam-local should get us going but if you want the latest version you can always install straight from the source: go get github.com/awslabs/aws-sam-local (this will create a binary named aws-sam-local, not sam).

I like to vote on things so let’s write a quick SAM application to vote on Spaces versus Tabs. We’ll use a very simple, but powerful, architecture of API Gateway fronting a Lambda function and we’ll store our results in DynamoDB. In the end a user should be able to curl our API curl https://SOMEURL/ -d '{"vote": "spaces"}' and get back the number of votes.

Let’s start by writing a simple SAM template.yaml:

AWSTemplateFormatVersion : '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  VotesTable:
    Type: "AWS::Serverless::SimpleTable"
  VoteSpacesTabs:
    Type: "AWS::Serverless::Function"
    Properties:
      Runtime: python3.6
      Handler: lambda_function.lambda_handler
      Policies: AmazonDynamoDBFullAccess
      Environment:
        Variables:
          TABLE_NAME: !Ref VotesTable
      Events:
        Vote:
          Type: Api
          Properties:
            Path: /
            Method: post

So we create a [dynamo_i] table that we expose to our Lambda function through an environment variable called TABLE_NAME.

To test that this template is valid I’ll go ahead and call sam validate to make sure I haven’t fat-fingered anything. It returns Valid! so let’s go ahead and get to work on our Lambda function.

import os
import os
import json
import boto3
votes_table = boto3.resource('dynamodb').Table(os.getenv('TABLE_NAME'))

def lambda_handler(event, context):
    print(event)
    if event['httpMethod'] == 'GET':
        resp = votes_table.scan()
        return {'body': json.dumps({item['id']: int(item['votes']) for item in resp['Items']})}
    elif event['httpMethod'] == 'POST':
        try:
            body = json.loads(event['body'])
        except:
            return {'statusCode': 400, 'body': 'malformed json input'}
        if 'vote' not in body:
            return {'statusCode': 400, 'body': 'missing vote in request body'}
        if body['vote'] not in ['spaces', 'tabs']:
            return {'statusCode': 400, 'body': 'vote value must be "spaces" or "tabs"'}

        resp = votes_table.update_item(
            Key={'id': body['vote']},
            UpdateExpression='ADD votes :incr',
            ExpressionAttributeValues={':incr': 1},
            ReturnValues='ALL_NEW'
        )
        return {'body': "{} now has {} votes".format(body['vote'], resp['Attributes']['votes'])}

So let’s test this locally. I’ll need to create a real DynamoDB database to talk to and I’ll need to provide the name of that database through the enviornment variable TABLE_NAME. I could do that with an env.json file or I can just pass it on the command line. First, I can call:
$ echo '{"httpMethod": "POST", "body": "{\"vote\": \"spaces\"}"}' |\
TABLE_NAME="vote-spaces-tabs" sam local invoke "VoteSpacesTabs"

to test the Lambda – it returns the number of votes for spaces so theoritically everything is working. Typing all of that out is a pain so I could generate a sample event with sam local generate-event api and pass that in to the local invocation. Far easier than all of that is just running our API locally. Let’s do that: sam local start-api. Now I can curl my local endpoints to test everything out.
I’ll run the command: $ curl -d '{"vote": "tabs"}' http://127.0.0.1:3000/ and it returns: “tabs now has 12 votes”. Now, of course I did not write this function perfectly on my first try. I edited and saved several times. One of the benefits of hot-reloading is that as I change the function I don’t have to do any additional work to test the new function. This makes iterative development vastly easier.

Let’s say we don’t want to deal with accessing a real DynamoDB database over the network though. What are our options? Well we can download DynamoDB Local and launch it with java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb. Then we can have our Lambda function use the AWS_SAM_LOCAL environment variable to make some decisions about how to behave. Let’s modify our function a bit:

import os
import json
import boto3
if os.getenv("AWS_SAM_LOCAL"):
    votes_table = boto3.resource(
        'dynamodb',
        endpoint_url="http://docker.for.mac.localhost:8000/"
    ).Table("spaces-tabs-votes")
else:
    votes_table = boto3.resource('dynamodb').Table(os.getenv('TABLE_NAME'))

Now we’re using a local endpoint to connect to our local database which makes working without wifi a little easier.

SAM local even supports interactive debugging! In Java and Node.js I can just pass the -d flag and a port to immediately enable the debugger. For Python I could use a library like import epdb; epdb.serve() and connect that way. Then we can call sam local invoke -d 8080 "VoteSpacesTabs" and our function will pause execution waiting for you to step through with the debugger.

Alright, I think we’ve got everything working so let’s deploy this!

First I’ll call the sam package command which is just an alias for aws cloudformation package and then I’ll use the result of that command to sam deploy.

$ sam package --template-file template.yaml --s3-bucket MYAWESOMEBUCKET --output-template-file package.yaml
Uploading to 144e47a4a08f8338faae894afe7563c3  90570 / 90570.0  (100.00%)
Successfully packaged artifacts and wrote output template to file package.yaml.
Execute the following command to deploy the packaged template
aws cloudformation deploy --template-file package.yaml --stack-name 
$ sam deploy --template-file package.yaml --stack-name VoteForSpaces --capabilities CAPABILITY_IAM
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - VoteForSpaces

Which brings us to our API:
.

I’m going to hop over into the production stage and add some rate limiting in case you guys start voting a lot – but otherwise we’ve taken our local work and deployed it to the cloud without much effort at all. I always enjoy it when things work on the first deploy!

You can vote now and watch the results live! http://spaces-or-tabs.s3-website-us-east-1.amazonaws.com/

We hope that SAM Local makes it easier for you to test, debug, and deploy your serverless apps. We have a CONTRIBUTING.md guide and we welcome pull requests. Please tweet at us to let us know what cool things you build. You can see our What’s New post here and the documentation is live here.

Randall

CyberChef – Cyber Swiss Army Knife

Post Syndicated from Darknet original http://feedproxy.google.com/~r/darknethackers/~3/SOhld_nebGs/

CyberChef is a simple, intuitive web app for carrying out all manner of “cyber” operations within a web browser. These operations include simple encoding like XOR or Base64, more complex encryption like AES, DES and Blowfish, creating binary and hexdumps, compression and decompression of data, calculating hashes and checksums, IPv6 and X.509…

Read the full post at darknet.org.uk

Perform Near Real-time Analytics on Streaming Data with Amazon Kinesis and Amazon Elasticsearch Service

Post Syndicated from Tristan Li original https://aws.amazon.com/blogs/big-data/perform-near-real-time-analytics-on-streaming-data-with-amazon-kinesis-and-amazon-elasticsearch-service/

Nowadays, streaming data is seen and used everywhere—from social networks, to mobile and web applications, IoT devices, instrumentation in data centers, and many other sources. As the speed and volume of this type of data increases, the need to perform data analysis in real time with machine learning algorithms and extract a deeper understanding from the data becomes ever more important. For example, you might want a continuous monitoring system to detect sentiment changes in a social media feed so that you can react to the sentiment in near real time.

In this post, we use Amazon Kinesis Streams to collect and store streaming data. We then use Amazon Kinesis Analytics to process and analyze the streaming data continuously. Specifically, we use the Kinesis Analytics built-in RANDOM_CUT_FOREST function, a machine learning algorithm, to detect anomalies in the streaming data. Finally, we use Amazon Kinesis Firehose to export the anomalies data to Amazon Elasticsearch Service (Amazon ES). We then build a simple dashboard in the open source tool Kibana to visualize the result.

Solution overview

The following diagram depicts a high-level overview of this solution.

Amazon Kinesis Streams

You can use Amazon Kinesis Streams to build your own streaming application. This application can process and analyze streaming data by continuously capturing and storing terabytes of data per hour from hundreds of thousands of sources.

Amazon Kinesis Analytics

Kinesis Analytics provides an easy and familiar standard SQL language to analyze streaming data in real time. One of its most powerful features is that there are no new languages, processing frameworks, or complex machine learning algorithms that you need to learn.

Amazon Kinesis Firehose

Kinesis Firehose is the easiest way to load streaming data into AWS. It can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service.

Amazon Elasticsearch Service

Amazon ES is a fully managed service that makes it easy to deploy, operate, and scale Elasticsearch for log analytics, full text search, application monitoring, and more.

Solution summary

The following is a quick walkthrough of the solution that’s presented in the diagram:

  1. IoT sensors send streaming data into Kinesis Streams. In this post, you use a Python script to simulate an IoT temperature sensor device that sends the streaming data.
  2. By using the built-in RANDOM_CUT_FOREST function in Kinesis Analytics, you can detect anomalies in real time with the sensor data that is stored in Kinesis Streams. RANDOM_CUT_FOREST is also an appropriate algorithm for many other kinds of anomaly-detection use cases—for example, the media sentiment example mentioned earlier in this post.
  3. The processed anomaly data is then loaded into the Kinesis Firehose delivery stream.
  4. By using the built-in integration that Kinesis Firehose has with Amazon ES, you can easily export the processed anomaly data into the service and visualize it with Kibana.

Implementation steps

The following sections walk through the implementation steps in detail.

Creating the delivery stream

  1. Open the Amazon Kinesis Streams console.
  2. Create a new Kinesis stream. Give it a name that indicates it’s for raw incoming stream data—for example, RawStreamData. For Number of shards, type 1.
  3. The Python code provided below simulates a streaming application, such as an IoT device, and generates random data and anomalies into a Kinesis stream. The code generates two temperature ranges, where the first range is the hypothetical sensor’s normal operating temperature range (10–20), and the second is the anomaly temperature range (100–120).Make sure to change the stream name on line 16 and 20 and the Region on line 6 to match your configuration. Alternatively, you can download the Amazon Kinesis Data Generator from this repository and use it to generate the data.
    import json
    import datetime
    import random
    import testdata
    from boto import kinesis
    
    kinesis = kinesis.connect_to_region("us-east-1")
    
    def getData(iotName, lowVal, highVal):
       data = {}
       data["iotName"] = iotName
       data["iotValue"] = random.randint(lowVal, highVal) 
       return data
    
    while 1:
       rnd = random.random()
       if (rnd < 0.01):
          data = json.dumps(getData("DemoSensor", 100, 120))  
          kinesis.put_record("RawStreamData", data, "DemoSensor")
          print '***************************** anomaly ************************* ' + data
       else:
          data = json.dumps(getData("DemoSensor", 10, 20))  
          kinesis.put_record("RawStreamData", data, "DemoSensor")
          print data

  4. Open the Amazon Elasticsearch Service console and create a new domain.
    1. Give the domain a unique name. In the Configure cluster screen, use the default settings.
    2. In the Set up access policy screen, in the Set the domain access policy list, choose Allow access to the domain from specific IP(s).
    3. Enter the public IP address of your computer.
      Note: If you’re working behind a proxy or firewall, see the “Use a proxy to simplify request signing” section in this AWS Database blog post to learn how to work with a proxy. For additional information about securing access to your Amazon ES domain, see How to Control Access to Your Amazon Elasticsearch Domain in the AWS Security Blog.
  5. After the Amazon ES domain is up and running, you can set up and configure Kinesis Firehose to export results to Amazon ES:
    1. Open the Amazon Kinesis Firehose console and choose Create Delivery Stream.
    2. In the Destination dropdown list, choose Amazon Elasticsearch Service.
    3. Type a stream name, and choose the Amazon ES domain that you created in Step 4.
    4. Provide an index name and ES type. In the S3 bucket dropdown list, choose Create New S3 bucket. Choose Next.
    5. In the configuration, change the Elasticsearch Buffer size to 1 MB and the Buffer interval to 60s. Use the default settings for all other fields. This shortens the time for the data to reach the ES cluster.
    6. Under IAM Role, choose Create/Update existing IAM role.
      The best practice is to create a new role every time. Otherwise, the console keeps adding policy documents to the same role. Eventually the size of the attached policies causes IAM to reject the role, but it does it in a non-obvious way, where the console basically quits functioning.
    7. Choose Next to move to the Review page.
  6. Review the configuration, and then choose Create Delivery Stream.
  7. Run the Python file for 1–2 minutes, and then press Ctrl+C to stop the execution. This loads some data into the stream for you to visualize in the next step.

Analyzing the data

Now it’s time to analyze the IoT streaming data using Amazon Kinesis Analytics.

  1. Open the Amazon Kinesis Analytics console and create a new application. Give the application a name, and then choose Create Application.
  2. On the next screen, choose Connect to a source. Choose the raw incoming data stream that you created earlier. (Note the stream name Source_SQL_STREAM_001 because you will need it later.)
  3. Use the default settings for everything else. When the schema discovery process is complete, it displays a success message with the formatted stream sample in a table as shown in the following screenshot. Review the data, and then choose Save and continue.
  4. Next, choose Go to SQL editor. When prompted, choose Yes, start application.
  5. Copy the following SQL code and paste it into the SQL editor window.
    CREATE OR REPLACE STREAM "TEMP_STREAM" (
       "iotName"        varchar (40),
       "iotValue"   integer,
       "ANOMALY_SCORE"  DOUBLE);
    -- Creates an output stream and defines a schema
    CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
       "iotName"       varchar(40),
       "iotValue"       integer,
       "ANOMALY_SCORE"  DOUBLE,
       "created" TimeStamp);
     
    -- Compute an anomaly score for each record in the source stream
    -- using Random Cut Forest
    CREATE OR REPLACE PUMP "STREAM_PUMP_1" AS INSERT INTO "TEMP_STREAM"
    SELECT STREAM "iotName", "iotValue", ANOMALY_SCORE FROM
      TABLE(RANDOM_CUT_FOREST(
        CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001")
      )
    );
    
    -- Sort records by descending anomaly score, insert into output stream
    CREATE OR REPLACE PUMP "OUTPUT_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM "iotName", "iotValue", ANOMALY_SCORE, ROWTIME FROM "TEMP_STREAM"
    ORDER BY FLOOR("TEMP_STREAM".ROWTIME TO SECOND), ANOMALY_SCORE DESC;

 

  1. Choose Save and run SQL.
    As the application is running, it displays the results as stream data arrives. If you don’t see any data coming in, run the Python script again to generate some fresh data. When there is data, it appears in a grid as shown in the following screenshot.Note that you are selecting data from the source stream name Source_SQL_STREAM_001 that you created previously. Also note the ANOMALY_SCORE column. This is the value that the Random_Cut_Forest function calculates based on the temperature ranges provided by the Python script. Higher (anomaly) temperature ranges have a higher score.Looking at the SQL code, note that the first two blocks of code create two new streams to store temporary data and the final result. The third block of code analyzes the raw source data (Stream_Pump_1) using the Random_Cut_Forest function. It calculates an anomaly score (ANOMALY_SCORE) and inserts it into the TEMP_STREAM stream. The final code block loads the result stored in the TEMP_STREAM into DESTINATION_SQL_STREAM.
  2. Choose Exit (done editing) next to the Save and run SQL button to return to the application configuration page.

Load processed data into the Kinesis Firehose delivery stream

Now, you can export the result from DESTINATION_SQL_STREAM into the Amazon Kinesis Firehose stream that you created previously.

  1. On the application configuration page, choose Connect to a destination.
  2. Choose the stream name that you created earlier, and use the default settings for everything else. Then choose Save and Continue.
  3. On the application configuration page, choose Exit to Kinesis Analytics applications to return to the Amazon Kinesis Analytics console.
  4. Run the Python script again for 4–5 minutes to generate enough data to flow through Amazon Kinesis Streams, Kinesis Analytics, Kinesis Firehose, and finally into the Amazon ES domain.
  5. Open the Kinesis Firehose console, choose the stream, and then choose the Monitoring
  6. As the processed data flows into Kinesis Firehose and Amazon ES, the metrics appear on the Delivery Stream metrics page. Keep in mind that the metrics page takes a few minutes to refresh with the latest data.
  7. Open the Amazon Elasticsearch Service dashboard in the AWS Management Console. The count in the Searchable documents column increases as shown in the following screenshot. In addition, the domain shows a cluster health of Yellow. This is because, by default, it needs two instances to deploy redundant copies of the index. To fix this, you can deploy two instances instead of one.

Visualize the data using Kibana

Now it’s time to launch Kibana and visualize the data.

  1. Use the ES domain link to go to the cluster detail page, and then choose the Kibana link as shown in the following screenshot.

    If you’re working behind a proxy or firewall, see the “Use a proxy to simplify request signing” section in this blog post to learn how to work with a proxy.
  2. In the Kibana dashboard, choose the Discover tab to perform a query.
  3. You can also visualize the data using the different types of charts offered by Kibana. For example, by going to the Visualize tab, you can quickly create a split bar chart that aggregates by ANOMALY_SCORE per minute.


Conclusion

In this post, you learned how to use Amazon Kinesis to collect, process, and analyze real-time streaming data, and then export the results to Amazon ES for analysis and visualization with Kibana. If you have comments about this post, add them to the “Comments” section below. If you have questions or issues with implementing this solution, please open a new thread on the Amazon Kinesis or Amazon ES discussion forums.


Next Steps

Take your skills to the next level. Learn real-time clickstream anomaly detection with Amazon Kinesis Analytics.

 


About the Author

Tristan Li is a Solutions Architect with Amazon Web Services. He works with enterprise customers in the US, helping them adopt cloud technology to build scalable and secure solutions on AWS.

 

 

 

 

New – API & CloudFormation Support for Amazon CloudWatch Dashboards

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-api-cloudformation-support-for-amazon-cloudwatch-dashboards/

We launched CloudWatch Dashboards a couple of years ago. In the post that I wrote for the launch, I showed you how to interactively create a dashboard that displayed chosen CloudWatch metrics in graphical form. After the launch, we added additional features including a full screen mode, a dark theme, control over the range of the Y axis, simplified renaming, persistent storage, and new visualization options.

New API & CLI
While console support is wonderful for interactive use, many customers have asked us to support programmatic creation and manipulation of dashboards and the widgets within. They would like to dynamically build and maintain dashboards, adding and removing widgets as the corresponding AWS resources are created and destroyed. Other customers are interested in setting up and maintaining a consistent set of dashboards across two or more AWS accounts.

I am happy to announce that API, CLI, and AWS CloudFormation support for CloudWatch Dashboards is available now and that you can start using it today!

There are four new API functions (and equivalent CLI commands):

ListDashboards / aws cloudwatch list-dashboards – Fetch a list of all dashboards within an account, or a subset that share a common prefix.

GetDashboard / aws cloudwatch get-dashboard – Fetch details for a single dashboard.

PutDashboard / aws cloudwatch put-dashboard – Create a new dashboard or update an existing one.

DeleteDashboards / aws cloudwatch delete-dashboards – Delete one or more dashboards.

Dashboard Concepts
I want to show you how to use these functions and commands. Before I dive in, I should review a couple of important dashboard concepts and attributes.

Global – Dashboards are part of an AWS account, and are not associated with a specific AWS Region. Each account can have up to 500 dashboards.

Named – Each dashboard has a name that is unique within the AWS account. Names can be up to 255 characters long.

Grid Model – Each dashboard is composed of a grid of cells. The grid is 24 cells across and as tall as necessary. Each widget on the dashboard is positioned at a particular set of grid coordinates, and has a size that spans an integral number of grid cells.

Widgets (Visualizations) – Each widget can display text or a set of CloudWatch metrics. Text is specified using Markdown; metrics can be displayed as single values, line charts, or stacked area charts. Each dashboard can have up to 100 widgets. Widgets that display metrics can also be associated with a CloudWatch Alarm.

Dashboards have a JSON representation that you can now see and edit from within the console. Simply click on the Action menu and choose View/edit source:

Here’s the source for my dashboard:

You can use this JSON as a starting point for your own applications. As you can see, there’s an entry in the widgets array for each widget on the dashboard; each entry describes one widget, starting with its type, position, and size.

Creating a Dashboard Using the API
Let’s say I want to create a dashboard that has a widget for each of my EC2 instances in a particular region. I’ll use Python and the AWS SDK for Python, and start as follows (excuse the amateur nature of my code):

import boto3
import json

cw  = boto3.client("cloudwatch")
ec2 = boto3.client("ec2")

x, y          = [0, 0]
width, height = [3, 3]
max_width     = 12
widgets       = []

Then I simply iterate over the instances, creating a widget dictionary for each one, and appending it to the widgets array:

instances = ec2.describe_instances()
for r in instances['Reservations']:
    for i in r['Instances']:

        widget = {'type'      : 'metric',
                  'x'         : x,
                  'y'         : y,
                  'height'    : height,
                  'width'     : width,
                  'properties': {'view'    : 'timeSeries',
                                 'stacked' : False,
                                 'metrics' : [['AWS/EC2', 'NetworkIn', 'InstanceId', i['InstanceId']],
                                              ['.',       'NetworkOut', '.',         '.']
                                             ],
                                 'period'  : 300,
                                 'stat'    : 'Average',
                                 'region'  : 'us-east-1',
                                 'title'   : i['InstanceId']
                                }
                 }

        widgets.append(widget)

I update the position (x and y) within the loop, and form a grid (if I don’t specify positions, the widgets will be laid out left to right, top to bottom):

        x += width
        if (x + width > max_width):
            x = 0
            y += height

After I have processed all of the instances, I create a JSON version of the widget array:

body   = {'widgets' : widgets}
body_j = json.dumps(body)

And I create or update my dashboard:

cw.put_dashboard(DashboardName = "EC2_Networking",
                 DashboardBody = body_j)

I run the code, and get the following dashboard:

The CloudWatch team recommends that dashboards created programmatically include a text widget indicating that the dashboard was generated automatically, along with a link to the source code or CloudFormation template that did the work. This will discourage users from making manual, out-of-band changers to the dashboards.

As I mentioned earlier, each metric widget can also be associated with a CloudWatch Alarm. You can create the alarms programmatically or by using a CloudFormation template such as the Sample CPU Utilization Alarm. If you decide to do this, the alarm threshold will be displayed in the widget. To learn more about this, read Tara Walker’s recent post, Amazon CloudWatch Launches Alarms on Dashboards.

Going one step further, I could use CloudWatch Events and a Lamba Function to track the creation and deletion of certain resources and update a dashboard in concert with the changes. To learn how to do this, read Keeping CloudWatch Dashboards up to Date Using AWS Lambda.

Accessing a Dashboard Using the CLI
I can also access and manipulate my dashboards from the command line. For example, I can generate a simple list:

$ aws cloudwatch list-dashboards --output table
----------------------------------------------
|               ListDashboards               |
+--------------------------------------------+
||             DashboardEntries             ||
|+-----------------+----------------+-------+|
||  DashboardName  | LastModified   | Size  ||
|+-----------------+----------------+-------+|
||  Disk-Metrics   |  1496405221.0  |  316  ||
||  EC2_Networking |  1498090434.0  |  2830 ||
||  Main-Metrics   |  1498085173.0  |  234  ||
|+-----------------+----------------+-------+|

And I can get rid of the Disk-Metrics dashboard:

$ aws cloudwatch delete-dashboards --dashboard-names Disk-Metrics

I can also retrieve the JSON that defines a dashboard:

Creating a Dashboard Using CloudFormation
Dashboards can also be specified in CloudFormation templates. Here’s a simple template in YAML (the DashboardBody is still specified in JSON):

Resources:
  MyDashboard:
    Type: "AWS::CloudWatch::Dashboard"
    Properties:
      DashboardName: SampleDashboard
      DashboardBody: '{"widgets":[{"type":"text","x":0,"y":0,"width":6,"height":6,"properties":{"markdown":"Hi there from CloudFormation"}}]}'

I place the template in a file and then create a stack using the console or the CLI:

$ aws cloudformation create-stack --stack-name MyDashboard --template-body file://dash.yaml
{
    "StackId": "arn:aws:cloudformation:us-east-1:xxxxxxxxxxxx:stack/MyDashboard/a2a3fb20-5708-11e7-8ffd-500c21311262"
}

Here’s the dashboard:

Available Now
This feature is available now and you can start using it today. You can create 3 dashboards with up to 50 metrics per dashboard at no charge; additional dashboards are priced at $3 per month, as listed on the CloudWatch Pricing page. You can make up to 1 million calls to the new API functions each month at no charge; beyond that you pay $.01 for every 1,000 calls.

Jeff;