Tag Archives: web services

Using AWS CodePipeline, AWS CodeBuild, and AWS Lambda for Serverless Automated UI Testing

Post Syndicated from Prakash Palanisamy original https://aws.amazon.com/blogs/devops/using-aws-codepipeline-aws-codebuild-and-aws-lambda-for-serverless-automated-ui-testing/

Testing the user interface of a web application is an important part of the development lifecycle. In this post, I’ll explain how to automate UI testing using serverless technologies, including AWS CodePipeline, AWS CodeBuild, and AWS Lambda.

I built a website for UI testing that is hosted in S3. I used Selenium to perform cross-browser UI testing on Chrome, Firefox, and PhantomJS, a headless WebKit browser with Ghost Driver, an implementation of the WebDriver Wire Protocol. I used Python to create test cases for ChromeDriver, FirefoxDriver, or PhatomJSDriver based the browser against which the test is being executed.

Resources referred to in this post, including the AWS CloudFormation template, test and status websites hosted in S3, AWS CodeBuild build specification files, AWS Lambda function, and the Python script that performs the test are available in the serverless-automated-ui-testing GitHub repository.

S3 Hosted Test Website:

AWS CodeBuild supports custom containers so we can use the Selenium/standalone-Firefox and Selenium/standalone-Chrome containers, which include prebuild Firefox and Chrome browsers, respectively. Xvfb performs the graphical operation in virtual memory without any display hardware. It will be installed in the CodeBuild containers during the install phase.

Build Spec for Chrome and Firefox

The build specification for Chrome and Firefox testing includes multiple phases:

  • The environment variables section contains a set of default variables that are overridden while creating the build project or triggering the build.
  • As part of install phase, required packages like Xvfb and Selenium are installed using yum.
  • During the pre_build phase, the test bed is prepared for test execution.
  • During the build phase, the appropriate DISPLAY is set and the tests are executed.
version: 0.2

env:
  variables:
    BROWSER: "chrome"
    WebURL: "https://sampletestweb.s3-eu-west-1.amazonaws.com/website/index.html"
    ArtifactBucket: "codebuild-demo-artifact-repository"
    MODULES: "mod1"
    ModuleTable: "test-modules"
    StatusTable: "blog-test-status"

phases:
  install:
    commands:
      - apt-get update
      - apt-get -y upgrade
      - apt-get install xvfb python python-pip build-essential -y
      - pip install --upgrade pip
      - pip install selenium
      - pip install awscli
      - pip install requests
      - pip install boto3
      - cp xvfb.init /etc/init.d/xvfb
      - chmod +x /etc/init.d/xvfb
      - update-rc.d xvfb defaults
      - service xvfb start
      - export PATH="$PATH:`pwd`/webdrivers"
  pre_build:
    commands:
      - python prepare_test.py
  build:
    commands:
      - export DISPLAY=:5
      - cd tests
      - echo "Executing simple test..."
      - python testsuite.py

Because Ghost Driver runs headless, it can be executed on AWS Lambda. In keeping with a fire-and-forget model, I used CodeBuild to create the PhantomJS Lambda function and trigger the test invocations on Lambda in parallel. This is powerful because many tests can be executed in parallel on Lambda.

Build Spec for PhantomJS

The build specification for PhantomJS testing also includes multiple phases. It is a little different from the preceding example because we are using AWS Lambda for the test execution.

  • The environment variables section contains a set of default variables that are overridden while creating the build project or triggering the build.
  • As part of install phase, the required packages like Selenium and the AWS CLI are installed using yum.
  • During the pre_build phase, the test bed is prepared for test execution.
  • During the build phase, a zip file that will be used to create the PhantomJS Lambda function is created and tests are executed on the Lambda function.
version: 0.2

env:
  variables:
    BROWSER: "phantomjs"
    WebURL: "https://sampletestweb.s3-eu-west-1.amazonaws.com/website/index.html"
    ArtifactBucket: "codebuild-demo-artifact-repository"
    MODULES: "mod1"
    ModuleTable: "test-modules"
    StatusTable: "blog-test-status"
    LambdaRole: "arn:aws:iam::account-id:role/role-name"

phases:
  install:
    commands:
      - apt-get update
      - apt-get -y upgrade
      - apt-get install python python-pip build-essential -y
      - apt-get install zip unzip -y
      - pip install --upgrade pip
      - pip install selenium
      - pip install awscli
      - pip install requests
      - pip install boto3
  pre_build:
    commands:
      - python prepare_test.py
  build:
    commands:
      - cd lambda_function
      - echo "Packaging Lambda Function..."
      - zip -r /tmp/lambda_function.zip ./*
      - func_name=`echo $CODEBUILD_BUILD_ID | awk -F ':' '{print $1}'`-phantomjs
      - echo "Creating Lambda Function..."
      - chmod 777 phantomjs
      - |
         func_list=`aws lambda list-functions | grep FunctionName | awk -F':' '{print $2}' | tr -d ', "'`
         if echo "$func_list" | grep -qw $func_name
         then
             echo "Lambda function already exists."
         else
             aws lambda create-function --function-name $func_name --runtime "python2.7" --role $LambdaRole --handler "testsuite.lambda_handler" --zip-file fileb:///tmp/lambda_function.zip --timeout 150 --memory-size 1024 --environment Variables="{WebURL=$WebURL, StatusTable=$StatusTable}" --tags Name=$func_name
         fi
      - export PhantomJSFunction=$func_name
      - cd ../tests/
      - python testsuite.py

The list of test cases and the test modules that belong to each case are stored in an Amazon DynamoDB table. Based on the list of modules passed as an argument to the CodeBuild project, CodeBuild gets the test cases from that table and executes them. The test execution status and results are stored in another Amazon DynamoDB table. It will read the test status from the status table in DynamoDB and display it.

AWS CodeBuild and AWS Lambda perform the test execution as individual tasks. AWS CodePipeline plays an important role here by enabling continuous delivery and parallel execution of tests for optimized testing.

Here’s how to do it:

In AWS CodePipeline, create a pipeline with four stages:

  • Source (AWS CodeCommit)
  • UI testing (AWS Lambda and AWS CodeBuild)
  • Approval (manual approval)
  • Production (AWS Lambda)

Pipeline stages, the actions in each stage, and transitions between stages are shown in the following diagram.

This design implemented in AWS CodePipeline looks like this:

CodePipeline automatically detects a change in the source repository and triggers the execution of the pipeline.

In the UITest stage, there are two parallel actions:

  • DeployTestWebsite invokes a Lambda function to deploy the test website in S3 as an S3 website.
  • DeployStatusPage invokes another Lambda function to deploy in parallel the status website in S3 as an S3 website.

Next, there are three parallel actions that trigger the CodeBuild project:

  • TestOnChrome launches a container to perform the Selenium tests on Chrome.
  • TestOnFirefox launches another container to perform the Selenium tests on Firefox.
  • TestOnPhantomJS creates a Lambda function and invokes individual Lambda functions per test case to execute the test cases in parallel.

You can monitor the status of the test execution on the status website, as shown here:

When the UI testing is completed successfully, the pipeline continues to an Approval stage in which a notification is sent to the configured SNS topic. The designated team member reviews the test status and approves or rejects the deployment. Upon approval, the pipeline continues to the Production stage, where it invokes a Lambda function and deploys the website to a production S3 bucket.

I used a CloudFormation template to set up my continuous delivery pipeline. The automated-ui-testing.yaml template, available from GitHub, sets up a full-featured pipeline.

When I use the template to create my pipeline, I specify the following:

  • AWS CodeCommit repository.
  • SNS topic to send approval notification.
  • S3 bucket name where the artifacts will be stored.

The stack name should follow the rules for S3 bucket naming because it will be part of the S3 bucket name.

When the stack is created successfully, the URLs for the test website and status website appear in the Outputs section, as shown here:

Conclusion

In this post, I showed how you can use AWS CodePipeline, AWS CodeBuild, AWS Lambda, and a manual approval process to create a continuous delivery pipeline for serverless automated UI testing. Websites running on Amazon EC2 instances or AWS Elastic Beanstalk can also be tested using similar approach.


About the author

Prakash Palanisamy is a Solutions Architect for Amazon Web Services. When he is not working on Serverless, DevOps or Alexa, he will be solving problems in Project Euler. He also enjoys watching educational documentaries.

Simplify Your Jenkins Builds with AWS CodeBuild

Post Syndicated from Paul Roberts original https://aws.amazon.com/blogs/devops/simplify-your-jenkins-builds-with-aws-codebuild/

Jeff Bezos famously said, “There’s a lot of undifferentiated heavy lifting that stands between your idea and that success.” He went on to say, “…70% of your time, energy, and dollars go into the undifferentiated heavy lifting and only 30% of your energy, time, and dollars gets to go into the core kernel of your idea.”

If you subscribe to this maxim, you should not be spending valuable time focusing on operational issues related to maintaining the Jenkins build infrastructure. Companies such as Riot Games have over 1.25 million builds per year and have written several lengthy blog posts about their experiences designing a complex, custom Docker-powered Jenkins build farm. Dealing with Jenkins slaves at scale is a job in itself and Riot has engineers focused on managing the build infrastructure.

Typical Jenkins Build Farm

 

As with all technology, the Jenkins build farm architectures have evolved. Today, instead of manually building your own container infrastructure, there are Jenkins Docker plugins available to help reduce the operational burden of maintaining these environments. There is also a community-contributed Amazon EC2 Container Service (Amazon ECS) plugin that helps remove some of the overhead, but you still need to configure and manage the overall Amazon ECS environment.

There are various ways to create and manage your Jenkins build farm, but there has to be a way that significantly reduces your operational overhead.

Introducing AWS CodeBuild

AWS CodeBuild is a fully managed build service that removes the undifferentiated heavy lifting of provisioning, managing, and scaling your own build servers. With CodeBuild, there is no software to install, patch, or update. CodeBuild scales up automatically to meet the needs of your development teams. In addition, CodeBuild is an on-demand service where you pay as you go. You are charged based only on the number of minutes it takes to complete your build.

One AWS customer, Recruiterbox, helps companies hire simply and predictably through their software platform. Two years ago, they began feeling the operational pain of maintaining their own Jenkins build farms. They briefly considered moving to Amazon ECS, but chose an even easier path forward instead. Recuiterbox transitioned to using Jenkins with CodeBuild and are very happy with the results. You can read more about their journey here.

Solution Overview: Jenkins and CodeBuild

To remove the heavy lifting from managing your Jenkins build farm, AWS has developed a Jenkins AWS CodeBuild plugin. After the plugin has been enabled, a developer can configure a Jenkins project to pick up new commits from their chosen source code repository and automatically run the associated builds. After the build is successful, it will create an artifact that is stored inside an S3 bucket that you have configured. If an error is detected somewhere, CodeBuild will capture the output and send it to Amazon CloudWatch logs. In addition to storing the logs on CloudWatch, Jenkins also captures the error so you do not have to go hunting for log files for your build.

 

AWS CodeBuild with Jenkins Plugin

 

The following example uses AWS CodeCommit (Git) as the source control management (SCM) and Amazon S3 for build artifact storage. Logs are stored in CloudWatch. A development pipeline that uses Jenkins with CodeBuild plugin architecture looks something like this:

 

AWS CodeBuild Diagram

Initial Solution Setup

To keep this blog post succinct, I assume that you are using the following components on AWS already and have applied the appropriate IAM policies:

·         AWS CodeCommit repo.

·         Amazon S3 bucket for CodeBuild artifacts.

·         SNS notification for text messaging of the Jenkins admin password.

·         IAM user’s key and secret.

·         A role that has a policy with these permissions. Be sure to edit the ARNs with your region, account, and resource name. Use this role in the AWS CloudFormation template referred to later in this post.

 

Jenkins Installation with CodeBuild Plugin Enabled

To make the integration with Jenkins as frictionless as possible, I have created an AWS CloudFormation template here: https://s3.amazonaws.com/proberts-public/jenkins.yaml. Download the template, sign in the AWS CloudFormation console, and then use the template to create a stack.

 

CloudFormation Inputs

Jenkins Project Configuration

After the stack is complete, log in to the Jenkins EC2 instance using the user name “admin” and the password sent to your mobile device. Now that you have logged in to Jenkins, you need to create your first project. Start with a Freestyle project and configure the parameters based on your CodeBuild and CodeCommit settings.

 

AWS CodeBuild Plugin Configuration in Jenkins

 

Additional Jenkins AWS CodeBuild Plugin Configuration

 

After you have configured the Jenkins project appropriately you should be able to check your build status on the Jenkins polling log under your project settings:

 

Jenkins Polling Log

 

Now that Jenkins is polling CodeCommit, you can check the CodeBuild dashboard under your Jenkins project to confirm your build was successful:

Jenkins AWS CodeBuild Dashboard

Wrapping Up

In a matter of minutes, you have been able to provision Jenkins with the AWS CodeBuild plugin. This will greatly simplify your build infrastructure management. Now kick back and relax while CodeBuild does all the heavy lifting!


About the Author

Paul Roberts is a Strategic Solutions Architect for Amazon Web Services. When he is not working on Serverless, DevOps, or Artificial Intelligence, he is often found in Lake Tahoe exploring the various mountain ranges with his family.

Manage Kubernetes Clusters on AWS Using CoreOS Tectonic

Post Syndicated from Arun Gupta original https://aws.amazon.com/blogs/compute/kubernetes-clusters-aws-coreos-tectonic/

There are multiple ways to run a Kubernetes cluster on Amazon Web Services (AWS). The first post in this series explained how to manage a Kubernetes cluster on AWS using kops. This second post explains how to manage a Kubernetes cluster on AWS using CoreOS Tectonic.

Tectonic overview

Tectonic delivers the most current upstream version of Kubernetes with additional features. It is a commercial offering from CoreOS and adds the following features over the upstream:

  • Installer
    Comes with a graphical installer that installs a highly available Kubernetes cluster. Alternatively, the cluster can be installed using AWS CloudFormation templates or Terraform scripts.
  • Operators
    An operator is an application-specific controller that extends the Kubernetes API to create, configure, and manage instances of complex stateful applications on behalf of a Kubernetes user. This release includes an etcd operator for rolling upgrades and a Prometheus operator for monitoring capabilities.
  • Console
    A web console provides a full view of applications running in the cluster. It also allows you to deploy applications to the cluster and start the rolling upgrade of the cluster.
  • Monitoring
    Node CPU and memory metrics are powered by the Prometheus operator. The graphs are available in the console. A large set of preconfigured Prometheus alerts are also available.
  • Security
    Tectonic ensures that cluster is always up to date with the most recent patches/fixes. Tectonic clusters also enable role-based access control (RBAC). Different roles can be mapped to an LDAP service.
  • Support
    CoreOS provides commercial support for clusters created using Tectonic.

Tectonic can be installed on AWS using a GUI installer or Terraform scripts. The installer prompts you for the information needed to boot the Kubernetes cluster, such as AWS access and secret key, number of master and worker nodes, and instance size for the master and worker nodes. The cluster can be created after all the options are specified. Alternatively, Terraform assets can be downloaded and the cluster can be created later. This post shows using the installer.

CoreOS License and Pull Secret

Even though Tectonic is a commercial offering, a cluster for up to 10 nodes can be created by creating a free account at Get Tectonic for Kubernetes. After signup, a CoreOS License and Pull Secret files are provided on your CoreOS account page. Download these files as they are needed by the installer to boot the cluster.

IAM user permission

The IAM user to create the Kubernetes cluster must have access to the following services and features:

  • Amazon Route 53
  • Amazon EC2
  • Elastic Load Balancing
  • Amazon S3
  • Amazon VPC
  • Security groups

Use the aws-policy policy to grant the required permissions for the IAM user.

DNS configuration

A subdomain is required to create the cluster, and it must be registered as a public Route 53 hosted zone. The zone is used to host and expose the console web application. It is also used as the static namespace for the Kubernetes API server. This allows kubectl to be able to talk directly with the master.

The domain may be registered using Route 53. Alternatively, a domain may be registered at a third-party registrar. This post uses a kubernetes-aws.io domain registered at a third-party registrar and a tectonic subdomain within it.

Generate a Route 53 hosted zone using the AWS CLI. Download jq to run this command:

ID=$(uuidgen) && \
aws route53 create-hosted-zone \
--name tectonic.kubernetes-aws.io \
--caller-reference $ID \
| jq .DelegationSet.NameServers

The command shows an output such as the following:

[
  "ns-1924.awsdns-48.co.uk",
  "ns-501.awsdns-62.com",
  "ns-1259.awsdns-29.org",
  "ns-749.awsdns-29.net"
]

Create NS records for the domain with your registrar. Make sure that the NS records can be resolved using a utility like dig web interface. A sample output would look like the following:

The bottom of the screenshot shows NS records configured for the subdomain.

Download and run the Tectonic installer

Download the Tectonic installer (version 1.7.1) and extract it. The latest installer can always be found at coreos.com/tectonic. Start the installer:

./tectonic/tectonic-installer/$PLATFORM/installer

Replace $PLATFORM with either darwin or linux. The installer opens your default browser and prompts you to select the cloud provider. Choose Amazon Web Services as the platform. Choose Next Step.

Specify the Access Key ID and Secret Access Key for the IAM role that you created earlier. This allows the installer to create resources required for the Kubernetes cluster. This also gives the installer full access to your AWS account. Alternatively, to protect the integrity of your main AWS credentials, use a temporary session token to generate temporary credentials.

You also need to choose a region in which to install the cluster. For the purpose of this post, I chose a region close to where I live, Northern California. Choose Next Step.

Give your cluster a name. This name is part of the static namespace for the master and the address of the console.

To enable in-place update to the Kubernetes cluster, select the checkbox next to Automated Updates. It also enables update to the etcd and Prometheus operators. This feature may become a default in future releases.

Choose Upload “tectonic-license.txt” and upload the previously downloaded license file.

Choose Upload “config.json” and upload the previously downloaded pull secret file. Choose Next Step.

Let the installer generate a CA certificate and key. In this case, the browser may not recognize this certificate, which I discuss later in the post. Alternatively, you can provide a CA certificate and a key in PEM format issued by an authorized certificate authority. Choose Next Step.

Use the SSH key for the region specified earlier. You also have an option to generate a new key. This allows you to later connect using SSH into the Amazon EC2 instances provisioned by the cluster. Here is the command that can be used to log in:

ssh –i <key> [email protected]<ec2-instance-ip>

Choose Next Step.

Define the number and instance type of master and worker nodes. In this case, create a 6 nodes cluster. Make sure that the worker nodes have enough processing power and memory to run the containers.

An etcd cluster is used as persistent storage for all of Kubernetes API objects. This cluster is required for the Kubernetes cluster to operate. There are three ways to use the etcd cluster as part of the Tectonic installer:

  • (Default) Provision the cluster using EC2 instances. Additional EC2 instances are used in this case.
  • Use an alpha support for cluster provisioning using the etcd operator. The etcd operator is used for automated operations of the etcd master nodes for the cluster itself, in addition to for etcd instances that are created for application usage. The etcd cluster is provisioned within the Tectonic installer.
  • Bring your own pre-provisioned etcd cluster.

Use the first option in this case.

For more information about choosing the appropriate instance type, see the etcd hardware recommendation. Choose Next Step.

Specify the networking options. The installer can create a new public VPC or use a pre-existing public or private VPC. Make sure that the VPC requirements are met for an existing VPC.

Give a DNS name for the cluster. Choose the domain for which the Route 53 hosted zone was configured earlier, such as tectonic.kubernetes-aws.io. Multiple clusters may be created under a single domain. The cluster name and the DNS name would typically match each other.

To select the CIDR range, choose Show Advanced Settings. You can also choose the Availability Zones for the master and worker nodes. By default, the master and worker nodes are spread across multiple Availability Zones in the chosen region. This makes the cluster highly available.

Leave the other values as default. Choose Next Step.

Specify an email address and password to be used as credentials to log in to the console. Choose Next Step.

At any point during the installation, you can choose Save progress. This allows you to save configurations specified in the installer. This configuration file can then be used to restore progress in the installer at a later point.

To start the cluster installation, choose Submit. At another time, you can download the Terraform assets by choosing Manually boot. This allows you to boot the cluster later.

The logs from the Terraform scripts are shown in the installer. When the installation is complete, the console shows that the Terraform scripts were successfully applied, the domain name was resolved successfully, and that the console has started. The domain works successfully if the DNS resolution worked earlier, and it’s the address where the console is accessible.

Choose Download assets to download assets related to your cluster. It contains your generated CA, kubectl configuration file, and the Terraform state. This download is an important step as it allows you to delete the cluster later.

Choose Next Step for the final installation screen. It allows you to access the Tectonic console, gives you instructions about how to configure kubectl to manage this cluster, and finally deploys an application using kubectl.

Choose Go to my Tectonic Console. In our case, it is also accessible at http://cluster.tectonic.kubernetes-aws.io/.

As I mentioned earlier, the browser does not recognize the self-generated CA certificate. Choose Advanced and connect to the console. Enter the login credentials specified earlier in the installer and choose Login.

The Kubernetes upstream and console version are shown under Software Details. Cluster health shows All systems go and it means that the API server and the backend API can be reached.

To view different Kubernetes resources in the cluster choose, the resource in the left navigation bar. For example, all deployments can be seen by choosing Deployments.

By default, resources in the all namespace are shown. Other namespaces may be chosen by clicking on a menu item on the top of the screen. Different administration tasks such as managing the namespaces, getting list of the nodes and RBAC can be configured as well.

Download and run Kubectl

Kubectl is required to manage the Kubernetes cluster. The latest version of kubectl can be downloaded using the following command:

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/darwin/amd64/kubectl

It can also be conveniently installed using the Homebrew package manager. To find and access a cluster, Kubectl needs a kubeconfig file. By default, this configuration file is at ~/.kube/config. This file is created when a Kubernetes cluster is created from your machine. However, in this case, download this file from the console.

In the console, choose admin, My Account, Download Configuration and follow the steps to download the kubectl configuration file. Move this file to ~/.kube/config. If kubectl has already been used on your machine before, then this file already exists. Make sure to take a backup of that file first.

Now you can run the commands to view the list of deployments:

~ $ kubectl get deployments --all-namespaces
NAMESPACE         NAME                                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system       etcd-operator                           1         1         1            1           43m
kube-system       heapster                                1         1         1            1           40m
kube-system       kube-controller-manager                 3         3         3            3           43m
kube-system       kube-dns                                1         1         1            1           43m
kube-system       kube-scheduler                          3         3         3            3           43m
tectonic-system   container-linux-update-operator         1         1         1            1           40m
tectonic-system   default-http-backend                    1         1         1            1           40m
tectonic-system   kube-state-metrics                      1         1         1            1           40m
tectonic-system   kube-version-operator                   1         1         1            1           40m
tectonic-system   prometheus-operator                     1         1         1            1           40m
tectonic-system   tectonic-channel-operator               1         1         1            1           40m
tectonic-system   tectonic-console                        2         2         2            2           40m
tectonic-system   tectonic-identity                       2         2         2            2           40m
tectonic-system   tectonic-ingress-controller             1         1         1            1           40m
tectonic-system   tectonic-monitoring-auth-alertmanager   1         1         1            1           40m
tectonic-system   tectonic-monitoring-auth-prometheus     1         1         1            1           40m
tectonic-system   tectonic-prometheus-operator            1         1         1            1           40m
tectonic-system   tectonic-stats-emitter                  1         1         1            1           40m

This output is similar to the one shown in the console earlier. Now, this kubectl can be used to manage your resources.

Upgrade the Kubernetes cluster

Tectonic allows the in-place upgrade of the cluster. This is an experimental feature as of this release. The clusters can be updated either automatically, or with manual approval.

To perform the update, choose Administration, Cluster Settings. If an earlier Tectonic installer, version 1.6.2 in this case, is used to install the cluster, then this screen would look like the following:

Choose Check for Updates. If any updates are available, choose Start Upgrade. After the upgrade is completed, the screen is refreshed.

This is an experimental feature in this release and so should only be used on clusters that can be easily replaced. This feature may become a fully supported in a future release. For more information about the upgrade process, see Upgrading Tectonic & Kubernetes.

Delete the Kubernetes cluster

Typically, the Kubernetes cluster is a long-running cluster to serve your applications. After its purpose is served, you may delete it. It is important to delete the cluster as this ensures that all resources created by the cluster are appropriately cleaned up.

The easiest way to delete the cluster is using the assets downloaded in the last step of the installer. Extract the downloaded zip file. This creates a directory like <cluster-name>_TIMESTAMP. In that directory, give the following command to delete the cluster:

TERRAFORM_CONFIG=$(pwd)/.terraformrc terraform destroy --force

This destroys the cluster and all associated resources.

You may have forgotten to download the assets. There is a copy of the assets in the directory tectonic/tectonic-installer/darwin/clusters. In this directory, another directory with the name <cluster-name>_TIMESTAMP contains your assets.

Conclusion

This post explained how to manage Kubernetes clusters using the CoreOS Tectonic graphical installer.  For more details, see Graphical Installer with AWS. If the installation does not succeed, see the helpful Troubleshooting tips. After the cluster is created, see the Tectonic tutorials to learn how to deploy, scale, version, and delete an application.

Future posts in this series will explain other ways of creating and running a Kubernetes cluster on AWS.

Arun

Parallel Processing in Python with AWS Lambda

Post Syndicated from Oz Akan original https://aws.amazon.com/blogs/compute/parallel-processing-in-python-with-aws-lambda/

If you develop an AWS Lambda function with Node.js, you can call multiple web services without waiting for a response due to its asynchronous nature.  All requests are initiated almost in parallel, so you can get results much faster than a series of sequential calls to each web service. Considering the maximum execution duration for Lambda, it is beneficial for I/O bound tasks to run in parallel.

If you develop a Lambda function with Python, parallelism doesn’t come by default. Lambda supports Python 2.7 and Python 3.6, both of which have multiprocessing and threading modules. The multiprocessing module supports multiple cores so it is a better choice, especially for CPU intensive workloads. With the threading module, all threads are going to run on a single core though performance difference is negligible for network-bound tasks.

In this post, I demonstrate how the Python multiprocessing module can be used within a Lambda function to run multiple I/O bound tasks in parallel.

Example use case

In this example, you call Amazon EC2 and Amazon EBS API operations to find the total EBS volume size for all your EC2 instances in a region.

This is a two-step process:

  • The Lambda function calls EC2 to list all EC2 instances
  • The function calls EBS for each instance to find attached EBS volumes

Sequential Execution

If you make these calls sequentially, during the second step, your code has to loop over all the instances and wait for each response before moving to the next request.

The class named VolumesSequential has the following methods:

  • __init__ creates an EC2 resource.
  • total_size returns all EC2 instances and passes these to the instance_volumes method.
  • instance_volumes finds the total size of EBS volumes for the instance.
  • total_size adds all sizes from all instances to find total size for the EBS volumes.

Source Code for Sequential Execution

import time
import boto3

class VolumesSequential(object):
    """Finds total volume size for all EC2 instances"""
    def __init__(self):
        self.ec2 = boto3.resource('ec2')

    def instance_volumes(self, instance):
        """
        Finds total size of the EBS volumes attached
        to an EC2 instance
        """
        instance_total = 0
        for volume in instance.volumes.all():
            instance_total += volume.size
        return instance_total

    def total_size(self):
        """
        Lists all EC2 instances in the default region
        and sums result of instance_volumes
        """
        print "Running sequentially"
        instances = self.ec2.instances.all()
        instances_total = 0
        for instance in instances:
            instances_total += self.instance_volumes(instance)
        return instances_total

def lambda_handler(event, context):
    volumes = VolumesSequential()
    _start = time.time()
    total = volumes.total_size()
    print "Total volume size: %s GB" % total
    print "Sequential execution time: %s seconds" % (time.time() - _start)

Parallel Execution

The multiprocessing module that comes with Python 2.7 lets you run multiple processes in parallel. Due to the Lambda execution environment not having /dev/shm (shared memory for processes) support, you can’t use multiprocessing.Queue or multiprocessing.Pool.

If you try to use multiprocessing.Queue, you get an error similar to the following:

[Errno 38] Function not implemented: OSError
…
    sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
OSError: [Errno 38] Function not implemented

On the other hand, you can use multiprocessing.Pipe instead of multiprocessing.Queue to accomplish what you need without getting any errors during the execution of the Lambda function.

The class named VolumeParallel has the following methods:

  • __init__ creates an EC2 resource
  • instance_volumes finds the total size of EBS volumes attached to an instance
  • total_size finds all instances and runs instance_volumes for each to find the total size of all EBS volumes attached to all EC2 instances.

Source Code for Parallel Execution

import time
from multiprocessing import Process, Pipe
import boto3

class VolumesParallel(object):
    """Finds total volume size for all EC2 instances"""
    def __init__(self):
        self.ec2 = boto3.resource('ec2')

    def instance_volumes(self, instance, conn):
        """
        Finds total size of the EBS volumes attached
        to an EC2 instance
        """
        instance_total = 0
        for volume in instance.volumes.all():
            instance_total += volume.size
        conn.send([instance_total])
        conn.close()

    def total_size(self):
        """
        Lists all EC2 instances in the default region
        and sums result of instance_volumes
        """
        print "Running in parallel"

        # get all EC2 instances
        instances = self.ec2.instances.all()
        
        # create a list to keep all processes
        processes = []

        # create a list to keep connections
        parent_connections = []
        
        # create a process per instance
        for instance in instances:            
            # create a pipe for communication
            parent_conn, child_conn = Pipe()
            parent_connections.append(parent_conn)

            # create the process, pass instance and connection
            process = Process(target=self.instance_volumes, args=(instance, child_conn,))
            processes.append(process)

        # start all processes
        for process in processes:
            process.start()

        # make sure that all processes have finished
        for process in processes:
            process.join()

        instances_total = 0
        for parent_connection in parent_connections:
            instances_total += parent_connection.recv()[0]

        return instances_total


def lambda_handler(event, context):
    volumes = VolumesParallel()
    _start = time.time()
    total = volumes.total_size()
    print "Total volume size: %s GB" % total
    print "Sequential execution time: %s seconds" % (time.time() - _start)

Performance

There are a few differences between two Lambda functions when it comes to the execution environment. The parallel function requires more memory than the sequential one. You may run the parallel Lambda function with a relatively large memory setting to see how much memory it uses. The amount of memory required by the Lambda function depends on what the function does and how many processes it runs in parallel. To restrict maximum memory usage, you may want to limit the number of parallel executions.

In this case, when you give 1024 MB for both Lambda functions, the parallel function runs about two times faster than the sequential function. I have a handful of EC2 instances and EBS volumes in my account so the test ran way under the maximum execution limit for Lambda. Remember that parallel execution doesn’t guarantee that the runtime for the Lambda function will be under the maximum allowed duration but does speed up the overall execution time.

Sequential Run Time Output

START RequestId: 4c370b12-f9d3-11e6-b46b-b5d41afd648e Version: $LATEST
Running sequentially
Total volume size: 589 GB
Sequential execution time: 3.80066084862 seconds
END RequestId: 4c370b12-f9d3-11e6-b46b-b5d41afd648e
REPORT RequestId: 4c370b12-f9d3-11e6-b46b-b5d41afd648e Duration: 4091.59 ms Billed Duration: 4100 ms  Memory Size: 1024 MB Max Memory Used: 46 MB

Parallel Run Time Output

START RequestId: 4f1328ed-f9d3-11e6-8cd1-c7381c5c078d Version: $LATEST
Running in parallel
Total volume size: 589 GB
Sequential execution time: 1.89170885086 seconds
END RequestId: 4f1328ed-f9d3-11e6-8cd1-c7381c5c078d
REPORT RequestId: 4f1328ed-f9d3-11e6-8cd1-c7381c5c078d Duration: 2069.33 ms Billed Duration: 2100 ms  Memory Size: 1024 MB Max Memory Used: 181 MB 

Summary

In this post, I demonstrated how to run multiple I/O bound tasks in parallel by developing a Lambda function with the Python multiprocessing module. With the help of this module, you freed the CPU from waiting for I/O and fired up several tasks to fit more I/O bound operations into a given time frame. This might be the trick to reduce the overall runtime of a Lambda function especially when you have to run so many and don’t want to split the work into smaller chunks.

Russia Blocks 4,000 Pirate Sites Plus 41,000 Innocent as Collateral Damage

Post Syndicated from Andy original https://torrentfreak.com/russia-blocks-4000-pirate-sites-plus-41000-innocent-as-collateral-damage-170905/

After years of criticism from both international and local rightsholders, in 2013 the Russian government decided to get tough on Internet piracy.

Under new legislation, sites engaged in Internet piracy could find themselves blocked by ISPs, rendering them inaccessible to local citizens and solving the piracy problem. Well, that was the theory, at least.

More than four years on, Russia is still grappling with a huge piracy problem that refuses to go away. It has been blocking thousands of sites at a steady rate, including RuTracker, the country’s largest torrent platform, but still the problem persists.

Now, a new report produced by Roskomsvoboda, the Center for the Protection of Digital Rights, and the Pirate Party of Russia, reveals a system that has not only failed to reach its stated aims but is also having a negative effect on the broader Internet.

“It’s already been four years since the creation of this ‘anti-piracy machine’ in Russia. The first amendments related to the fight against ‘piracy’ in the network came into force on August 1, 2013, and since then this mechanism has been twice revised,” Roskomsvoboda said in a statement.

“[These include] the emergence of additional responsibilities to restrict access to network resources and increase the number of subjects who are responsible for removing and blocking content. Since that time, several ‘purely Russian’ trends in ‘anti-piracy’ and trade in rights have also emerged.”

These revisions, which include the permanent blocking of persistently infringing sites and the planned blocking of mirror sites and anonymizers, have been widely documented. However, the researchers say that they want to shine a light on the effects of blocking procedures and subsequent actions that are causing significant issues for third-parties.

As part of the study, the authors collected data on the cases presented to the Moscow City Court by the most active plaintiffs in anti-piracy actions (mainly TV show distributors and music outfits including Sony Music Entertainment and Universal Music). They describe the court process and system overall as lacking.

“The court does not conduct a ‘triple test’ and ignores the position, rights and interests of respondents and third parties. It does not check the availability of illegal information on sites and appeals against decisions of the Moscow City Court do not bring any results,” the researchers write.

“Furthermore, the cancellation of the unlimited blocking of a site is simply impossible and in respect of hosting providers and security services, those web services are charged with all the legal costs of the case.”

The main reason behind this situation is that ‘pirate’ site operators rarely (if ever) turn up to defend themselves. If at some point they are found liable for infringement under the Criminal Code, they can be liable for up to six years in prison, hardly an incentive to enter into a copyright process voluntarily. As a result, hosts and other providers act as respondents.

This means that these third-party companies appear as defendants in the majority of cases, a position they find both “unfair and illogical.” They’re also said to be confused about how they are supposed to fulfill the blocking demands placed upon them by the Court.

“About 90% of court cases take place without the involvement of the site owner, since the requirements are imposed on the hosting provider, who is not responsible for the content of the site,” the report says.

Nevertheless, hosts and other providers have been ordered to block huge numbers of pirate sites.

According to the researchers, the total has now gone beyond 4,000 domains, but the knock on effect is much more expansive. Due to the legal requirement to block sites by both IP address and other means, third-party sites with shared IP addresses get caught up as collateral damage. The report states that more than 41,000 innocent sites have been blocked as the result of supposedly targeted court orders.

But with collateral damage mounting, the main issue as far as copyright holders are concerned is whether piracy is decreasing as a result. The report draws few conclusions on that front but notes that blocks are a blunt instrument. While they may succeed in stopping some people from accessing ‘pirate’ domains, the underlying infringement carries on regardless.

“Blocks create restrictions only for Internet users who are denied access to sites, but do not lead to the removal of illegal information or prevent intellectual property violations,” the researchers add.

With no sign of the system being overhauled to tackle the issues raised in the study (pdf, Russian), Russia is now set to introduce yet new anti-piracy measures.

As recently reported, new laws requiring search engines to remove listings for ‘pirate’ mirror sites comes into effect October 1. Exactly a month later on November 1, VPNs and anonymization tools will have to be removed too, if they fail to meet the standards required under state regulation.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

How Aussie ecommerce stores can compete with the retail giant Amazon

Post Syndicated from chris desantis original https://www.anchor.com.au/blog/2017/08/aussie-ecommerce-stores-vs-amazon/

The powerhouse Amazon retail store is set to launch in Australia toward the end of 2018 and Aussie ecommerce retailers need to ready themselves for the competition storm ahead.

2018 may seem a while away but getting your ecommerce site in tip top shape and ready to compete can take time. Check out these helpful hints from the Anchor crew.

Speed kills

If you’ve ever heard of the tale of the tortoise and the hare, the moral is that “slow and steady wins the race”. This is definitely not the place for that phrase, because if your site loads as slowly as a 1995 dial up connection, your ecommerce store will not, I repeat, will not win the race.

Site speed can be impacted by a number of factors and getting the balance right between a site that loads at lightning speed and delivering engaging content to your audience. There are many ways to check the performance of your site including Anchor’s free hosting check up or pingdom.

Taking action can boost the performance of your site:

Here’s an interesting blog from the WebCEO team about site speed’s impact on conversion rates on-page, or check out our previous blog on maximising site performance.

Show me the money

As an ecommerce store, getting credit card details as fast as possible is probably at the top of your list, but it’s important to remember that it’s an actual person that needs to hand over the details.

Consider the customer’s experience whilst checking out. Making people log in to their account before checkout, can lead to abandoned carts as customers try to remember the vital details. Similarly, making a customer enter all their details before displaying shipping costs is more of an annoyance than a benefit.

Built for growth

Before you blast out a promo email to your entire database or spend up big on PPC, consider what happens when this 5 fold increase in traffic, all jumps onto your site at around the same time.

Will your site come screeching to a sudden halt with a 504 or 408 error message, or ride high on the wave of increased traffic? If you have fixed infrastructure such as a dedicated server, or are utilising a VPS, then consider the maximum concurrent users that your site can handle.

Consider this. Amazon.com.au will be built on the scalable cloud infrastructure of Amazon Web Services and will utilise all the microservices and data mining technology to offer customers a seamless, personalised shopping experience. How will your business compete?

Search ready

Being found online is important for any business, but for ecommerce sites, it’s essential. Gaining results from SEO practices can take time so beware of ‘quick fix guarantees’ from outsourced agencies.

Search Engine Optimisation (SEO) practices can have lasting effects. Good practices can ensure your site is found via organic search without huge advertising budgets, on the other hand ‘black hat’ practices can push your ecommerce store into search oblivion.

SEO takes discipline and focus to get right. Here are some of our favourite hints for SEO greatness from those who live and breathe SEO:

  • Optimise your site for mobile
  • Use Meta Tags wisely
  • Leverage Descriptive alt tags and image file names
  • Create content for people, not bots (keyword stuffing is a no no!)

SEO best practices are continually evolving, but creating a site that is designed to give users a great experience and give them the content they expect to find.

Google My Business is a free service that EVERY business should take advantage of. It is a listing service where your business can provide details such as address, phone number, website, and trading hours. It’s easy to update and manage, you can add photos, a physical address (if applicable), and display shopper reviews.

Get your site ship shape

Overwhelmed by these starter tips? If you are ready to get your site into tip top shape–get in touch. We work with awesome partners like eWave who can help create a seamless online shopping experience.

 

The post How Aussie ecommerce stores can compete with the retail giant Amazon appeared first on AWS Managed Services by Anchor.

Analyzing Salesforce Data with Amazon QuickSight

Post Syndicated from David McAmis original https://aws.amazon.com/blogs/big-data/analyzing-salesforce-data-with-amazon-quicksight/

Salesforce Sales Cloud is a powerful platform for managing customer data. One of the key functions that the platform provides is the ability to track customer opportunities. Opportunities in Salesforce are used to track revenue, sales pipelines, and other activities from the very first contact with a potential customer to a closed sale.

Amazon QuickSight is a rich data visualization tool that provides the ability to connect to Salesforce data and use it as a data source for creating analyses, stories, and dashboards  and easily share them with others in the organization. This post focuses on how to connect to Salesforce as a data source and create a useful opportunity dashboard, incorporating Amazon QuickSight features like relative date filters, Key Performance Indicator (KPI) charts, and more.

Walkthrough

In this post, you walk through the following tasks:

  • Creating a new data set based on Salesforce data
  • Creating your analysis and adding visuals
  • Creating an Amazon QuickSight dashboard
  • Working with filters

Note: For this walkthrough, I am using my own Salesforce.com Developer Edition account. You can sign up for your own free developer account at https://developer.salesforce.com/.

Creating a new Amazon QuickSight data set based on Salesforce data

To start, you need to create a new Amazon QuickSight data set. Sign in to Amazon QuickSight at https://quicksight.aws using the link from the home page. Enter your Amazon QuickSight account name and choose Continue. Next, enter your Email address or user name and password, then choose Sign In.

On the Amazon QuickSight start page, choose Manage Data, which takes you to a list of your data sets. Choose New Data Set, and choose Salesforce as your data source. Enter a data source name—in this example, I called mine “SFDC Opportunity.” Choose Create Data Source to open the Salesforce authentication page, where you can enter your Salesforce user name and password.

After you are authenticated to Salesforce, you are presented with a drop-down list that lets you select data from Reports or Objects. For this tutorial, choose Object. Scroll down in the list to choose the Opportunity object, and then choose Select.

To finish creating your data set, choose Visualize to go to where you can create a new Amazon QuickSight analysis from this data.

Creating your analysis and adding visuals

Now that you have acquired your data, it’s time to start working with your analysis. In Amazon Quicksight, an analysis is a container for a set of related visual stories. When you chose Visualize, a new analysis was created for you. This is where you start to create the visuals (charts, graphs, etc.) that will be the building blocks for your dashboard.

In Amazon QuickSight, Salesforce objects look like database tables. In the analysis that you just created, you can see the columns in the Fields list for the Opportunity object.

The Opportunity object in Salesforce has a number of default fields. Salesforce administrators can extend this object by adding other custom fields as required—these custom fields are usually marked with a “_c” at the end.

In the Fields List, you can see that Amazon QuickSight has divided the fields into Dimensions and Measures.  You use these to create your visualizations and dashboard. For this particular dashboard, you create five different visuals to display the data in a few different ways.

Opportunity by Stage

For the first visualization, you create a horizontal bar chart showing “Opportunity by Stage”. In the Fields List, choose the StageName dimension and the ExpectedRevenue measure. By default, this should create a horizontal bar chart for you, as shown in the following image.

Notice that this chart includes the Closed Won category, which we aren’t interested in showing. Choose the bar for Closed Won, and in the pop-up menu, choose Exclude Closed Won. This filters the chart to show only opportunities that are in progress.

It’s important to note that for this dashboard, we only want to show the opportunities that are not Closed Won. So in the menu bar on the left side, choose Filter.

By default, the filter that you just created was only applied to a single visualization. To change this, choose the filter, and then choose All Visuals from the drop-down list. This applies the filter to all visuals in the analysis.

To finish, select the chart title and rename the chart to Opportunity by Stage.

Opportunity by Month

Next, you need to create a new visual to show “Opportunity by Month.” You use a vertical bar chart to display the data. On the Amazon QuickSight toolbar, choose Add, and then choose Add visual. For this visual, choose CloseDate from the dimensions and ExpectedRevenue from the measures.

Using the Visual Types menu, change the chart type to a Vertical Bar Chart. By default, the chart displays the revenue by year, but we want to break it down a bit further. Choose Field Wells, and using the CloseDate drop-down menu, change the Aggregate to Month.

With the change to a monthly aggregate, your chart should look something like the following:

Select the chart title and rename the chart to Opportunity by Month.

Expected Revenue

When working with Salesforce opportunities, there are two measures that are important to most sales managers—the first is the total amount associated with the opportunity, and the second is what the actual expected revenue will be. For the next visual, you use the KPI chart to display these measures.

Choose Add on the Amazon QuickSight toolbar, and then choose Add visual. From the measures, choose ExpectedRevenue, and then Amount. To change your visualization, go to the Visual Types menu and choose the Key Performance Indicator (KPI). Your visualization should change and be similar to the following:

Select the chart title and rename the chart to Expected Revenue.

Opportunity by Lead Source

Next, you need to look at where the opportunity actually came from. This helps your dashboard users understand where the leads are being generated from and their value to the business. For this visual, you use a Horizontal Bar Chart.

On the Amazon QuickSight toolbar, choose Add, and then choose Add visual. From the measures, choose Amount, and for the dimensions, choose LeadSource. To change your visualization, go to the Visual Types menu and choose the Horizontal Bar Chart. Your visualization should change and be similar to the following:

Note: If you can’t read the chart labels for the bars, grab the axis line and drag to resize.

Select the chart title and rename the chart to Opportunity by Lead Source.

Expected Revenue vs. Opportunity Amount

For the last visual, you look at the individual opportunities and how they contribute to the total pipeline. A tree map is a specialized chart type that lets your dashboard users see how each opportunity amount contributes to the whole.  Additionally, you can highlight if there is a difference between the Expected Revenue and the Amount by sizing the marks by the Amount and coloring them by the Expected Amount.

On the Amazon QuickSight toolbar, choose Add, and then choose Add visual. From the measures, choose ExpectedRevenue and Amount. From the dimensions, choose Name. To change your visualization, go to the Visual Types menu and choose the Tree Map. Your visualization should change and be similar to the following:

Select the chart title and rename the chart to Expected Revenue vs Opportunity Amount.

Creating an Amazon QuickSight dashboard

Now that your visuals are created, it’s time to do the fun part—actually putting your Amazon QuickSight dashboard together. To create a dashboard, resize and position your visuals on the page, using the following layout:

To resize a visual, grab the handle in the lower-right corner and drag it to the height and width that you want.

To move your visual, use the grab bar at the top of the visual, as shown here:

When you are done resizing your visuals, your canvas should look something like this:

To create a dashboard, choose Share in the Amazon QuickSight toolbar. Then choose Create Dashboard. For this dashboard, give it a name of SFDC Opportunity Dashboard, and choose Create Dashboard. You are prompted to enter the email address or user name of the users you want to share this dashboard with.

Because we are just concentrating on the design at the moment, you can choose Cancel and share your dashboard later using the Share button on the dashboard toolbar.

Working with filters

There is one more feature that you can use when viewing your dashboard to make it even more useful. Earlier, when you were working with the Analysis, you added a filter to remove any opportunities that were tagged as Closed Won. Now, as you are viewing the dashboard, you add a filter that you can use to filter on a relative date.

This feature in Amazon QuickSight allows you to choose a time period (years, quarters, months, weeks, etc.) and then select from a list of relative time periods. For example, if you choose Year, you could set the filter options to Previous Year, This Year, Year to Date, or Last N Years.

This is especially handy for a Salesforce Opportunity dashboard, as you might want to filter the data using the Close Date field to see when the opportunity is actually set to close.

To create a relative date filter, choose Filter on the toolbar. Choose the filter icon, and then choose CloseDate, as shown in the following image:

At the top of the Edit Filter pane, change the drop-down list to apply the filter to All Visuals. The default filter type is Time Range, so use the drop-down list to change the filter type to Relative Dates.  For the time period, choose Quarters. To view all the current opportunities in your dashboard, choose the option for This Quarter, and choose Apply.

With the date filter in place, you have the final component for your dashboard, which should look something like the following example:

It’s important to note that at this point, you have added the filter when viewing the dashboard. If you think this is something that other users might want to do, you can go back to your Amazon QuickSight Analysis and add the filter there—that way it will be available for all dashboard users.

Summary

In this post, you learned how to connect to Salesforce data and create a basic dashboard. You can apply the same techniques to create analyses and dashboards from all different types of Salesforce data and objects. Whether you want to analyze your Salesforce account demographics or where your leads are coming from, or evaluate any other data stored in Salesforce, Amazon QuickSight helps you quickly connect to and visualize your data with only a few clicks.

 


Additional Reading

Learn how to visualize Amazon S3 analytics data with Amazon QuickSight!


About the Author

David McAmis is a Big Data & Analytics Consultant with Amazon Web Services. He works with customers to develop scalable platforms to gather, process and analyze data on AWS.

 

 

 

 

Analyzing AWS Cost and Usage Reports with Looker and Amazon Athena

Post Syndicated from Dillon Morrison original https://aws.amazon.com/blogs/big-data/analyzing-aws-cost-and-usage-reports-with-looker-and-amazon-athena/

This is a guest post by Dillon Morrison at Looker. Looker is, in their own words, “a new kind of analytics platform–letting everyone in your business make better decisions by getting reliable answers from a tool they can use.” 

As the breadth of AWS products and services continues to grow, customers are able to more easily move their technology stack and core infrastructure to AWS. One of the attractive benefits of AWS is the cost savings. Rather than paying upfront capital expenses for large on-premises systems, customers can instead pay variables expenses for on-demand services. To further reduce expenses AWS users can reserve resources for specific periods of time, and automatically scale resources as needed.

The AWS Cost Explorer is great for aggregated reporting. However, conducting analysis on the raw data using the flexibility and power of SQL allows for much richer detail and insight, and can be the better choice for the long term. Thankfully, with the introduction of Amazon Athena, monitoring and managing these costs is now easier than ever.

In the post, I walk through setting up the data pipeline for cost and usage reports, Amazon S3, and Athena, and discuss some of the most common levers for cost savings. I surface tables through Looker, which comes with a host of pre-built data models and dashboards to make analysis of your cost and usage data simple and intuitive.

Analysis with Athena

With Athena, there’s no need to create hundreds of Excel reports, move data around, or deploy clusters to house and process data. Athena uses Apache Hive’s DDL to create tables, and the Presto querying engine to process queries. Analysis can be performed directly on raw data in S3. Conveniently, AWS exports raw cost and usage data directly into a user-specified S3 bucket, making it simple to start querying with Athena quickly. This makes continuous monitoring of costs virtually seamless, since there is no infrastructure to manage. Instead, users can leverage the power of the Athena SQL engine to easily perform ad-hoc analysis and data discovery without needing to set up a data warehouse.

After the data pipeline is established, cost and usage data (the recommended billing data, per AWS documentation) provides a plethora of comprehensive information around usage of AWS services and the associated costs. Whether you need the report segmented by product type, user identity, or region, this report can be cut-and-sliced any number of ways to properly allocate costs for any of your business needs. You can then drill into any specific line item to see even further detail, such as the selected operating system, tenancy, purchase option (on-demand, spot, or reserved), and so on.

Walkthrough

By default, the Cost and Usage report exports CSV files, which you can compress using gzip (recommended for performance). There are some additional configuration options for tuning performance further, which are discussed below.

Prerequisites

If you want to follow along, you need the following resources:

Enable the cost and usage reports

First, enable the Cost and Usage report. For Time unit, select Hourly. For Include, select Resource IDs. All options are prompted in the report-creation window.

The Cost and Usage report dumps CSV files into the specified S3 bucket. Please note that it can take up to 24 hours for the first file to be delivered after enabling the report.

Configure the S3 bucket and files for Athena querying

In addition to the CSV file, AWS also creates a JSON manifest file for each cost and usage report. Athena requires that all of the files in the S3 bucket are in the same format, so we need to get rid of all these manifest files. If you’re looking to get started with Athena quickly, you can simply go into your S3 bucket and delete the manifest file manually, skip the automation described below, and move on to the next section.

To automate the process of removing the manifest file each time a new report is dumped into S3, which I recommend as you scale, there are a few additional steps. The folks at Concurrency labs wrote a great overview and set of scripts for this, which you can find in their GitHub repo.

These scripts take the data from an input bucket, remove anything unnecessary, and dump it into a new output bucket. We can utilize AWS Lambda to trigger this process whenever new data is dropped into S3, or on a nightly basis, or whatever makes most sense for your use-case, depending on how often you’re querying the data. Please note that enabling the “hourly” report means that data is reported at the hour-level of granularity, not that a new file is generated every hour.

Following these scripts, you’ll notice that we’re adding a date partition field, which isn’t necessary but improves query performance. In addition, converting data from CSV to a columnar format like ORC or Parquet also improves performance. We can automate this process using Lambda whenever new data is dropped in our S3 bucket. Amazon Web Services discusses columnar conversion at length, and provides walkthrough examples, in their documentation.

As a long-term solution, best practice is to use compression, partitioning, and conversion. However, for purposes of this walkthrough, we’re not going to worry about them so we can get up-and-running quicker.

Set up the Athena query engine

In your AWS console, navigate to the Athena service, and click “Get Started”. Follow the tutorial and set up a new database (we’ve called ours “AWS Optimizer” in this example). Don’t worry about configuring your initial table, per the tutorial instructions. We’ll be creating a new table for cost and usage analysis. Once you walked through the tutorial steps, you’ll be able to access the Athena interface, and can begin running Hive DDL statements to create new tables.

One thing that’s important to note, is that the Cost and Usage CSVs also contain the column headers in their first row, meaning that the column headers would be included in the dataset and any queries. For testing and quick set-up, you can remove this line manually from your first few CSV files. Long-term, you’ll want to use a script to programmatically remove this row each time a new file is dropped in S3 (every few hours typically). We’ve drafted up a sample script for ease of reference, which we run on Lambda. We utilize Lambda’s native ability to invoke the script whenever a new object is dropped in S3.

For cost and usage, we recommend using the DDL statement below. Since our data is in CSV format, we don’t need to use a SerDe, we can simply specify the “separatorChar, quoteChar, and escapeChar”, and the structure of the files (“TEXTFILE”). Note that AWS does have an OpenCSV SerDe as well, if you prefer to use that.

 

CREATE EXTERNAL TABLE IF NOT EXISTS cost_and_usage	 (
identity_LineItemId String,
identity_TimeInterval String,
bill_InvoiceId String,
bill_BillingEntity String,
bill_BillType String,
bill_PayerAccountId String,
bill_BillingPeriodStartDate String,
bill_BillingPeriodEndDate String,
lineItem_UsageAccountId String,
lineItem_LineItemType String,
lineItem_UsageStartDate String,
lineItem_UsageEndDate String,
lineItem_ProductCode String,
lineItem_UsageType String,
lineItem_Operation String,
lineItem_AvailabilityZone String,
lineItem_ResourceId String,
lineItem_UsageAmount String,
lineItem_NormalizationFactor String,
lineItem_NormalizedUsageAmount String,
lineItem_CurrencyCode String,
lineItem_UnblendedRate String,
lineItem_UnblendedCost String,
lineItem_BlendedRate String,
lineItem_BlendedCost String,
lineItem_LineItemDescription String,
lineItem_TaxType String,
product_ProductName String,
product_accountAssistance String,
product_architecturalReview String,
product_architectureSupport String,
product_availability String,
product_bestPractices String,
product_cacheEngine String,
product_caseSeverityresponseTimes String,
product_clockSpeed String,
product_currentGeneration String,
product_customerServiceAndCommunities String,
product_databaseEdition String,
product_databaseEngine String,
product_dedicatedEbsThroughput String,
product_deploymentOption String,
product_description String,
product_durability String,
product_ebsOptimized String,
product_ecu String,
product_endpointType String,
product_engineCode String,
product_enhancedNetworkingSupported String,
product_executionFrequency String,
product_executionLocation String,
product_feeCode String,
product_feeDescription String,
product_freeQueryTypes String,
product_freeTrial String,
product_frequencyMode String,
product_fromLocation String,
product_fromLocationType String,
product_group String,
product_groupDescription String,
product_includedServices String,
product_instanceFamily String,
product_instanceType String,
product_io String,
product_launchSupport String,
product_licenseModel String,
product_location String,
product_locationType String,
product_maxIopsBurstPerformance String,
product_maxIopsvolume String,
product_maxThroughputvolume String,
product_maxVolumeSize String,
product_maximumStorageVolume String,
product_memory String,
product_messageDeliveryFrequency String,
product_messageDeliveryOrder String,
product_minVolumeSize String,
product_minimumStorageVolume String,
product_networkPerformance String,
product_operatingSystem String,
product_operation String,
product_operationsSupport String,
product_physicalProcessor String,
product_preInstalledSw String,
product_proactiveGuidance String,
product_processorArchitecture String,
product_processorFeatures String,
product_productFamily String,
product_programmaticCaseManagement String,
product_provisioned String,
product_queueType String,
product_requestDescription String,
product_requestType String,
product_routingTarget String,
product_routingType String,
product_servicecode String,
product_sku String,
product_softwareType String,
product_storage String,
product_storageClass String,
product_storageMedia String,
product_technicalSupport String,
product_tenancy String,
product_thirdpartySoftwareSupport String,
product_toLocation String,
product_toLocationType String,
product_training String,
product_transferType String,
product_usageFamily String,
product_usagetype String,
product_vcpu String,
product_version String,
product_volumeType String,
product_whoCanOpenCases String,
pricing_LeaseContractLength String,
pricing_OfferingClass String,
pricing_PurchaseOption String,
pricing_publicOnDemandCost String,
pricing_publicOnDemandRate String,
pricing_term String,
pricing_unit String,
reservation_AvailabilityZone String,
reservation_NormalizedUnitsPerReservation String,
reservation_NumberOfReservations String,
reservation_ReservationARN String,
reservation_TotalReservedNormalizedUnits String,
reservation_TotalReservedUnits String,
reservation_UnitsPerReservation String,
resourceTags_userName String,
resourceTags_usercostcategory String  


)
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      ESCAPED BY '\\'
      LINES TERMINATED BY '\n'

STORED AS TEXTFILE
    LOCATION 's3://<<your bucket name>>';

Once you’ve successfully executed the command, you should see a new table named “cost_and_usage” with the below properties. Now we’re ready to start executing queries and running analysis!

Start with Looker and connect to Athena

Setting up Looker is a quick process, and you can try it out for free here (or download from Amazon Marketplace). It takes just a few seconds to connect Looker to your Athena database, and Looker comes with a host of pre-built data models and dashboards to make analysis of your cost and usage data simple and intuitive. After you’re connected, you can use the Looker UI to run whatever analysis you’d like. Looker translates this UI to optimized SQL, so any user can execute and visualize queries for true self-service analytics.

Major cost saving levers

Now that the data pipeline is configured, you can dive into the most popular use cases for cost savings. In this post, I focus on:

  • Purchasing Reserved Instances vs. On-Demand Instances
  • Data transfer costs
  • Allocating costs over users or other Attributes (denoted with resource tags)

On-Demand, Spot, and Reserved Instances

Purchasing Reserved Instances vs On-Demand Instances is arguably going to be the biggest cost lever for heavy AWS users (Reserved Instances run up to 75% cheaper!). AWS offers three options for purchasing instances:

  • On-Demand—Pay as you use.
  • Spot (variable cost)—Bid on spare Amazon EC2 computing capacity.
  • Reserved Instances—Pay for an instance for a specific, allotted period of time.

When purchasing a Reserved Instance, you can also choose to pay all-upfront, partial-upfront, or monthly. The more you pay upfront, the greater the discount.

If your company has been using AWS for some time now, you should have a good sense of your overall instance usage on a per-month or per-day basis. Rather than paying for these instances On-Demand, you should try to forecast the number of instances you’ll need, and reserve them with upfront payments.

The total amount of usage with Reserved Instances versus overall usage with all instances is called your coverage ratio. It’s important not to confuse your coverage ratio with your Reserved Instance utilization. Utilization represents the amount of reserved hours that were actually used. Don’t worry about exceeding capacity, you can still set up Auto Scaling preferences so that more instances get added whenever your coverage or utilization crosses a certain threshold (we often see a target of 80% for both coverage and utilization among savvy customers).

Calculating the reserved costs and coverage can be a bit tricky with the level of granularity provided by the cost and usage report. The following query shows your total cost over the last 6 months, broken out by Reserved Instance vs other instance usage. You can substitute the cost field for usage if you’d prefer. Please note that you should only have data for the time period after the cost and usage report has been enabled (though you can opt for up to 3 months of historical data by contacting your AWS Account Executive). If you’re just getting started, this query will only show a few days.

 

SELECT 
	DATE_FORMAT(from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate),'%Y-%m') AS "cost_and_usage.usage_start_month",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0) AS "cost_and_usage.total_unblended_cost",
	COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_reserved_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_on_ris",
	COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_non_reserved_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_on_non_ris"
FROM aws_optimizer.cost_and_usage  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1
ORDER BY 2 DESC
LIMIT 500

The resulting table should look something like the image below (I’m surfacing tables through Looker, though the same table would result from querying via command line or any other interface).

With a BI tool, you can create dashboards for easy reference and monitoring. New data is dumped into S3 every few hours, so your dashboards can update several times per day.

It’s an iterative process to understand the appropriate number of Reserved Instances needed to meet your business needs. After you’ve properly integrated Reserved Instances into your purchasing patterns, the savings can be significant. If your coverage is consistently below 70%, you should seriously consider adjusting your purchase types and opting for more Reserved instances.

Data transfer costs

One of the great things about AWS data storage is that it’s incredibly cheap. Most charges often come from moving and processing that data. There are several different prices for transferring data, broken out largely by transfers between regions and availability zones. Transfers between regions are the most costly, followed by transfers between Availability Zones. Transfers within the same region and same availability zone are free unless using elastic or public IP addresses, in which case there is a cost. You can find more detailed information in the AWS Pricing Docs. With this in mind, there are several simple strategies for helping reduce costs.

First, since costs increase when transferring data between regions, it’s wise to ensure that as many services as possible reside within the same region. The more you can localize services to one specific region, the lower your costs will be.

Second, you should maximize the data you’re routing directly within AWS services and IP addresses. Transfers out to the open internet are the most costly and least performant mechanisms of data transfers, so it’s best to keep transfers within AWS services.

Lastly, data transfers between private IP addresses are cheaper than between elastic or public IP addresses, so utilizing private IP addresses as much as possible is the most cost-effective strategy.

The following query provides a table depicting the total costs for each AWS product, broken out transfer cost type. Substitute the “lineitem_productcode” field in the query to segment the costs by any other attribute. If you notice any unusually high spikes in cost, you’ll need to dig deeper to understand what’s driving that spike: location, volume, and so on. Drill down into specific costs by including “product_usagetype” and “product_transfertype” in your query to identify the types of transfer costs that are driving up your bill.

SELECT 
	cost_and_usage.lineitem_productcode  AS "cost_and_usage.product_code",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost), 0) AS "cost_and_usage.total_unblended_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_data_transfer_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer-In')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_inbound_data_transfer_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer-Out')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_outbound_data_transfer_cost"
FROM aws_optimizer.cost_and_usage  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1
ORDER BY 2 DESC
LIMIT 500

When moving between regions or over the open web, many data transfer costs also include the origin and destination location of the data movement. Using a BI tool with mapping capabilities, you can get a nice visual of data flows. The point at the center of the map is used to represent external data flows over the open internet.

Analysis by tags

AWS provides the option to apply custom tags to individual resources, so you can allocate costs over whatever customized segment makes the most sense for your business. For a SaaS company that hosts software for customers on AWS, maybe you’d want to tag the size of each customer. The following query uses custom tags to display the reserved, data transfer, and total cost for each AWS service, broken out by tag categories, over the last 6 months. You’ll want to substitute the cost_and_usage.resourcetags_customersegment and cost_and_usage.customer_segment with the name of your customer field.

 

SELECT * FROM (
SELECT *, DENSE_RANK() OVER (ORDER BY z___min_rank) as z___pivot_row_rank, RANK() OVER (PARTITION BY z__pivot_col_rank ORDER BY z___min_rank) as z__pivot_col_ordering FROM (
SELECT *, MIN(z___rank) OVER (PARTITION BY "cost_and_usage.product_code") as z___min_rank FROM (
SELECT *, RANK() OVER (ORDER BY CASE WHEN z__pivot_col_rank=1 THEN (CASE WHEN "cost_and_usage.total_unblended_cost" IS NOT NULL THEN 0 ELSE 1 END) ELSE 2 END, CASE WHEN z__pivot_col_rank=1 THEN "cost_and_usage.total_unblended_cost" ELSE NULL END DESC, "cost_and_usage.total_unblended_cost" DESC, z__pivot_col_rank, "cost_and_usage.product_code") AS z___rank FROM (
SELECT *, DENSE_RANK() OVER (ORDER BY CASE WHEN "cost_and_usage.customer_segment" IS NULL THEN 1 ELSE 0 END, "cost_and_usage.customer_segment") AS z__pivot_col_rank FROM (
SELECT 
	cost_and_usage.lineitem_productcode  AS "cost_and_usage.product_code",
	cost_and_usage.resourcetags_customersegment  AS "cost_and_usage.customer_segment",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0) AS "cost_and_usage.total_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_data_transfers_unblended",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.unblended_percent_spend_on_ris"
FROM aws_optimizer.cost_and_usage_raw  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1,2) ww
) bb WHERE z__pivot_col_rank <= 16384
) aa
) xx
) zz
 WHERE z___pivot_row_rank <= 500 OR z__pivot_col_ordering = 1 ORDER BY z___pivot_row_rank

The resulting table in this example looks like the results below. In this example, you can tell that we’re making poor use of Reserved Instances because they represent such a small portion of our overall costs.

Again, using a BI tool to visualize these costs and trends over time makes the analysis much easier to consume and take action on.

Summary

Saving costs on your AWS spend is always an iterative, ongoing process. Hopefully with these queries alone, you can start to understand your spending patterns and identify opportunities for savings. However, this is just a peek into the many opportunities available through analysis of the Cost and Usage report. Each company is different, with unique needs and usage patterns. To achieve maximum cost savings, we encourage you to set up an analytics environment that enables your team to explore all potential cuts and slices of your usage data, whenever it’s necessary. Exploring different trends and spikes across regions, services, user types, etc. helps you gain comprehensive understanding of your major cost levers and consistently implement new cost reduction strategies.

Note that all of the queries and analysis provided in this post were generated using the Looker data platform. If you’re already a Looker customer, you can get all of this analysis, additional pre-configured dashboards, and much more using Looker Blocks for AWS.


About the Author

Dillon Morrison leads the Platform Ecosystem at Looker. He enjoys exploring new technologies and architecting the most efficient data solutions for the business needs of his company and their customers. In his spare time, you’ll find Dillon rock climbing in the Bay Area or nose deep in the docs of the latest AWS product release at his favorite cafe (“Arlequin in SF is unbeatable!”).

 

 

 

Automating Blue/Green Deployments of Infrastructure and Application Code using AMIs, AWS Developer Tools, & Amazon EC2 Systems Manager

Post Syndicated from Ramesh Adabala original https://aws.amazon.com/blogs/devops/bluegreen-infrastructure-application-deployment-blog/

Previous DevOps blog posts have covered the following use cases for infrastructure and application deployment automation:

An AMI provides the information required to launch an instance, which is a virtual server in the cloud. You can use one AMI to launch as many instances as you need. It is security best practice to customize and harden your base AMI with required operating system updates and, if you are using AWS native services for continuous security monitoring and operations, you are strongly encouraged to bake into the base AMI agents such as those for Amazon EC2 Systems Manager (SSM), Amazon Inspector, CodeDeploy, and CloudWatch Logs. A customized and hardened AMI is often referred to as a “golden AMI.” The use of golden AMIs to create EC2 instances in your AWS environment allows for fast and stable application deployment and scaling, secure application stack upgrades, and versioning.

In this post, using the DevOps automation capabilities of Systems Manager, AWS developer tools (CodePipeLine, CodeDeploy, CodeCommit, CodeBuild), I will show you how to use AWS CodePipeline to orchestrate the end-to-end blue/green deployments of a golden AMI and application code. Systems Manager Automation is a powerful security feature for enterprises that want to mature their DevSecOps practices.

Here are the high-level phases and primary services covered in this use case:

 

You can access the source code for the sample used in this post here: https://github.com/awslabs/automating-governance-sample/tree/master/Bluegreen-AMI-Application-Deployment-blog.

This sample will create a pipeline in AWS CodePipeline with the building blocks to support the blue/green deployments of infrastructure and application. The sample includes a custom Lambda step in the pipeline to execute Systems Manager Automation to build a golden AMI and update the Auto Scaling group with the golden AMI ID for every rollout of new application code. This guarantees that every new application deployment is on a fully patched and customized AMI in a continuous integration and deployment model. This enables the automation of hardened AMI deployment with every new version of application deployment.

 

 

We will build and run this sample in three parts.

Part 1: Setting up the AWS developer tools and deploying a base web application

Part 1 of the AWS CloudFormation template creates the initial Java-based web application environment in a VPC. It also creates all the required components of Systems Manager Automation, CodeCommit, CodeBuild, and CodeDeploy to support the blue/green deployments of the infrastructure and application resulting from ongoing code releases.

Part 1 of the AWS CloudFormation stack creates these resources:

After Part 1 of the AWS CloudFormation stack creation is complete, go to the Outputs tab and click the Elastic Load Balancing link. You will see the following home page for the base web application:

Make sure you have all the outputs from the Part 1 stack handy. You need to supply them as parameters in Part 3 of the stack.

Part 2: Setting up your CodeCommit repository

In this part, you will commit and push your sample application code into the CodeCommit repository created in Part 1. To access the initial git commands to clone the empty repository to your local machine, click Connect to go to the AWS CodeCommit console. Make sure you have the IAM permissions required to access AWS CodeCommit from command line interface (CLI).

After you’ve cloned the repository locally, download the sample application files from the part2 folder of the Git repository and place the files directly into your local repository. Do not include the aws-codedeploy-sample-tomcat folder. Go to the local directory and type the following commands to commit and push the files to the CodeCommit repository:

git add .
git commit -a -m "add all files from the AWS Java Tomcat CodeDeploy application"
git push

After all the files are pushed successfully, the repository should look like this:

 

Part 3: Setting up CodePipeline to enable blue/green deployments     

Part 3 of the AWS CloudFormation template creates the pipeline in AWS CodePipeline and all the required components.

a) Source: The pipeline is triggered by any change to the CodeCommit repository.

b) BuildGoldenAMI: This Lambda step executes the Systems Manager Automation document to build the golden AMI. After the golden AMI is successfully created, a new launch configuration with the new AMI details will be updated into the Auto Scaling group of the application deployment group. You can watch the progress of the automation in the EC2 console from the Systems Manager –> Automations menu.

c) Build: This step uses the application build spec file to build the application build artifact. Here are the CodeBuild execution steps and their status:

d) Deploy: This step clones the Auto Scaling group, launches the new instances with the new AMI, deploys the application changes, reroutes the traffic from the elastic load balancer to the new instances and terminates the old Auto Scaling group. You can see the execution steps and their status in the CodeDeploy console.

After the CodePipeline execution is complete, you can access the application by clicking the Elastic Load Balancing link. You can find it in the output of Part 1 of the AWS CloudFormation template. Any consecutive commits to the application code in the CodeCommit repository trigger the pipelines and deploy the infrastructure and code with an updated AMI and code.

 

If you have feedback about this post, add it to the Comments section below. If you have questions about implementing the example used in this post, open a thread on the Developer Tools forum.


About the author

 

Ramesh Adabala is a Solutions Architect in Southeast Enterprise Solution Architecture team at Amazon Web Services.

Turbocharge your Apache Hive queries on Amazon EMR using LLAP

Post Syndicated from Jigar Mistry original https://aws.amazon.com/blogs/big-data/turbocharge-your-apache-hive-queries-on-amazon-emr-using-llap/

Apache Hive is one of the most popular tools for analyzing large datasets stored in a Hadoop cluster using SQL. Data analysts and scientists use Hive to query, summarize, explore, and analyze big data.

With the introduction of Hive LLAP (Low Latency Analytical Processing), the notion of Hive being just a batch processing tool has changed. LLAP uses long-lived daemons with intelligent in-memory caching to circumvent batch-oriented latency and provide sub-second query response times.

This post provides an overview of Hive LLAP, including its architecture and common use cases for boosting query performance. You will learn how to install and configure Hive LLAP on an Amazon EMR cluster and run queries on LLAP daemons.

What is Hive LLAP?

Hive LLAP was introduced in Apache Hive 2.0, which provides very fast processing of queries. It uses persistent daemons that are deployed on a Hadoop YARN cluster using Apache Slider. These daemons are long-running and provide functionality such as I/O with DataNode, in-memory caching, query processing, and fine-grained access control. And since the daemons are always running in the cluster, it saves substantial overhead of launching new YARN containers for every new Hive session, thereby avoiding long startup times.

When Hive is configured in hybrid execution mode, small and short queries execute directly on LLAP daemons. Heavy lifting (like large shuffles in the reduce stage) is performed in YARN containers that belong to the application. Resources (CPU, memory, etc.) are obtained in a traditional fashion using YARN. After the resources are obtained, the execution engine can decide which resources are to be allocated to LLAP, or it can launch Apache Tez processors in separate YARN containers. You can also configure Hive to run all the processing workloads on LLAP daemons for querying small datasets at lightning fast speeds.

LLAP daemons are launched under YARN management to ensure that the nodes don’t get overloaded with the compute resources of these daemons. You can use scheduling queues to make sure that there is enough compute capacity for other YARN applications to run.

Why use Hive LLAP?

With many options available in the market (Presto, Spark SQL, etc.) for doing interactive SQL  over data that is stored in Amazon S3 and HDFS, there are several reasons why using Hive and LLAP might be a good choice:

  • For those who are heavily invested in the Hive ecosystem and have external BI tools that connect to Hive over JDBC/ODBC connections, LLAP plugs in to their existing architecture without a steep learning curve.
  • It’s compatible with existing Hive SQL and other Hive tools, like HiveServer2, and JDBC drivers for Hive.
  • It has native support for security features with authentication and authorization (SQL standards-based authorization) using HiveServer2.
  • LLAP daemons are aware about of the columns and records that are being processed which enables you to enforce fine-grained access control.
  • It can use Hive’s vectorization capabilities to speed up queries, and Hive has better support for Parquet file format when vectorization is enabled.
  • It can take advantage of a number of Hive optimizations like merging multiple small files for query results, automatically determining the number of reducers for joins and groupbys, etc.
  • It’s optional and modular so it can be turned on or off depending on the compute and resource requirements of the cluster. This lets you to run other YARN applications concurrently without reserving a cluster specifically for LLAP.

How do you install Hive LLAP in Amazon EMR?

To install and configure LLAP on an EMR cluster, use the following bootstrap action (BA):

s3://aws-bigdata-blog/artifacts/Turbocharge_Apache_Hive_on_EMR/configure-Hive-LLAP.sh

This BA downloads and installs Apache Slider on the cluster and configures LLAP so that it works with EMR Hive. For LLAP to work, the EMR cluster must have Hive, Tez, and Apache Zookeeper installed.

You can pass the following arguments to the BA.

Argument Definition Default value
--instances Number of instances of LLAP daemon Number of core/task nodes of the cluster
--cache Cache size per instance 20% of physical memory of the node
--executors Number of executors per instance Number of CPU cores of the node
--iothreads Number of IO threads per instance Number of CPU cores of the node
--size Container size per instance 50% of physical memory of the node
--xmx Working memory size 50% of container size
--log-level Log levels for the LLAP instance INFO

LLAP example

This section describes how you can try the faster Hive queries with LLAP using the TPC-DS testbench for Hive on Amazon EMR.

Use the following AWS command line interface (AWS CLI) command to launch a 1+3 nodes m4.xlarge EMR 5.6.0 cluster with the bootstrap action to install LLAP:

aws emr create-cluster --release-label emr-5.6.0 \
--applications Name=Hadoop Name=Hive Name=Hue Name=ZooKeeper Name=Tez \
--bootstrap-actions '[{"Path":"s3://aws-bigdata-blog/artifacts/Turbocharge_Apache_Hive_on_EMR/configure-Hive-LLAP.sh","Name":"Custom action"}]' \ 
--ec2-attributes '{"KeyName":"<YOUR-KEY-PAIR>","InstanceProfile":"EMR_EC2_DefaultRole","SubnetId":"subnet-xxxxxxxx","EmrManagedSlaveSecurityGroup":"sg-xxxxxxxx","EmrManagedMasterSecurityGroup":"sg-xxxxxxxx"}' 
--service-role EMR_DefaultRole \
--enable-debugging \
--log-uri 's3n://<YOUR-BUCKET/' --name 'test-hive-llap' \
--instance-groups '[{"InstanceCount":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":1}],"EbsOptimized":true},"InstanceGroupType":"MASTER","InstanceType":"m4.xlarge","Name":"Master - 1"},{"InstanceCount":3,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":1}],"EbsOptimized":true},"InstanceGroupType":"CORE","InstanceType":"m4.xlarge","Name":"Core - 2"}]' 
--region us-east-1

After the cluster is launched, log in to the master node using SSH, and do the following:

  1. Open the hive-tpcds folder:
    cd /home/hadoop/hive-tpcds/
  2. Start Hive CLI using the testbench configuration, create the required tables, and run the sample query:

    hive –i testbench.settings
    hive> source create_tables.sql;
    hive> source query55.sql;

    This sample query runs on a 40 GB dataset that is stored on Amazon S3. The dataset is generated using the data generation tool in the TPC-DS testbench for Hive.It results in output like the following:
  3. This screenshot shows that the query finished in about 47 seconds for LLAP mode. Now, to compare this to the execution time without LLAP, you can run the same workload using only Tez containers:
    hive> set hive.llap.execution.mode=none;
    hive> source query55.sql;


    This query finished in about 80 seconds.

The difference in query execution time is almost 1.7 times when using just YARN containers in contrast to running the query on LLAP daemons. And with every rerun of the query, you notice that the execution time substantially decreases by the virtue of in-memory caching by LLAP daemons.

Conclusion

In this post, I introduced Hive LLAP as a way to boost Hive query performance. I discussed its architecture and described several use cases for the component. I showed how you can install and configure Hive LLAP on an Amazon EMR cluster and how you can run queries on LLAP daemons.

If you have questions about using Hive LLAP on Amazon EMR or would like to share your use cases, please leave a comment below.


Additional Reading

Learn how to to automatically partition Hive external tables with AWS.


About the Author

Jigar Mistry is a Hadoop Systems Engineer with Amazon Web Services. He works with customers to provide them architectural guidance and technical support for processing large datasets in the cloud using open-source applications. In his spare time, he enjoys going for camping and exploring different restaurants in the Seattle area.

 

 

 

 

Transparency in Cloud Storage Costs

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/transparency-in-cloud-storage-costs/

cloud storage cost calculator

Backblaze’s mission is to make cloud storage that’s affordable and astonishingly easy to use. Backblaze B2 embodies that mission for those looking for an object storage solution.

Another Backblaze core value is being transparent, from releasing our Storage Pod designs to detailing our cloud storage cost of goods sold. We are an open book in the Cloud Storage industry. So it makes sense that opaque pricing policies that require mind numbing calculations are a no-no for us. Our approach to pricing is to be transparent, straight-forward, and predictable.

For Backblaze B2, this means that no matter how much data you have, the cost for B2 is $0.005/GB per month for data storage and $0.02/GB to download data. There are no costs to upload. We also throw in 10GB of storage and 1GB of downloads for free every month.

Cloud Storage Price Comparison

The storage industry does not share our view of making pricing transparent, or affordable. In an effort to help everyone, we’ve made a Cloud Storage Pricing Calculator, where anyone can enter in their specific use case and get pricing back for B2, S3, Azure, and GCS. We’ve also included the calculator below for those interested in trying it out.

B2 Cost Calculator

Backblaze provides this calculator as an estimate.

Initial Upload:

GB

Data over time

Monthly Upload:

GB

Monthly Delete:

GB

Monthly Download:

GB


Period of Time:

Months

Storage Costs

Storage Cost for Initial Month:
x

Data Added Each Month:
x

Data Deleted Each Month:
x

Net Data:
x

Download Costs

Monthly Download Cost:
x

Total

Total Cost for x Months
x

Amazon S3
Microsoft Azure
Google Cloud

x
x
x
x
x
x
* Figures are not exact and do not include the following: Free first 10 GB of storage, free 1 GB of daily downloads, or $.004/10,000 class B Transactions and $.004/1,000 Class C Transactions.

Sample storage scenarios:

Scenario 1

You have data you wish to archive, and will be adding more each month, but you don’t expect that you will be downloading or deleting any data.

Initial upload: 10,000GB
Monthly upload: 1,000GB

For twelve months, your costs would be:

Backblaze B2 $990.00
Amazon S3 $4,158.00 +420%
Microsoft Azure $4,356.00 +440%
Google Cloud $5,148.00 +520%

 

Scenario 2

You wish to store data, and will be actively changing that data with uploads, downloads, and deletions.

Initial upload: 10,000GB
Monthly upload: 2,000GB
Monthly deletion: 1,000GB
Monthly download: 500GB

Your costs for 12 months would be:

Backblaze B2 $1,100.00
Amazon S3 $3.458/00 +402%
Microsoft Azure $4,656.00 +519%
Google Cloud $5,628.00 +507%

We invite you to compare our cost estimates against the competition. Here are the links to our competitors’ pricing calculators.

B2 Cloud Storage Pricing Summary

Provider
Storage
($/GB/Month)

Download
($/GB)
$0.005 $0.02
$0.021
+420%
$0.05+
+250%
$0.022+
+440%
$0.05+
+250%
$0.026
+520%
$0.08+
+400%

The Details


STORAGE
$0.005/GB/Month
How much data you have stored with Backblaze. This is calculated once a day based on the average storage of the previous 24 hours.
The first 10 GB of storage is free.

DOWNLOAD
$0.02/GB
Charged when you download files and charged when you create a Snapshot. Charged for any portion of a GB. The first 1 GB of data downloaded each day is free.

TRANSACTIONS
Class “A” transactions – Free
Class “B” transactions – $0.004 per 10,000 with 2,500 free per day.
Class “C” transactions – $0.004 per 1,000 with 2,500 free per day.
View Transactions by API Call

DATA BY MAIL
Mail us your data on a B2 Fireball – $550
Backblaze will mail your data to you by FedEx:
• USB Flash Drive – up to 110 GB – $89
• USB Hard Drive – up to 3.5TB of data – $189

PRODUCT SUPPORT
All B2 active account owners can contact Backblaze support at help.backblaze.com where they will also find a free-to- use knowledge base of B2 advice, guides, and more. In addition, a B2 user can pay to upgrade their support plan to include phone service, 24×7 support and more.

EVERYTHING ELSE
Free
Unlike other services, you won’t be nickeled and dimed with upload fees, file deletion charges, minimum files size requirements, and more. Everything you can possibly pay Backblaze is listed above.

 

Visit our B2 Cloud Storage Pricing web page for more details.


Amazon S3
Storage Costs
Initial upload cost:
x
Data added each month:
x

Data del. each month:
x

Net data:
x

Download Costs

Monthly Download Cost:
x

Total

Total Cost for x Months
x

Microsoft
Storage Costs
Initial upload cost:
x
Data added each month:
x

Data del. each month:
x

Net data:
x

Download Costs

Monthly Download Cost:
x

Total

Total Cost for x Months
x

Google
Storage Costs
Initial upload cost:
x
Data added each month:
x

Data del. each month:
x

Net data:
x

Download Costs

Monthly Download Cost:
x

Total

Total Cost for x Months
x

The post Transparency in Cloud Storage Costs appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Create Multiple Builds from the Same Source Using Different AWS CodeBuild Build Specification Files

Post Syndicated from Prakash Palanisamy original https://aws.amazon.com/blogs/devops/create-multiple-builds-from-the-same-source-using-different-aws-codebuild-build-specification-files/

In June 2017, AWS CodeBuild announced you can now specify an alternate build specification file name or location in an AWS CodeBuild project.

In this post, I’ll show you how to use different build specification files in the same repository to create different builds. You’ll find the source code for this post in our GitHub repo.

Requirements

The AWS CLI must be installed and configured.

Solution Overview

I have created a C program (cbsamplelib.c) that will be used to create a shared library and another utility program (cbsampleutil.c) to use that library. I’ll use a Makefile to compile these files.

I need to put this sample application in RPM and DEB packages so end users can easily deploy them. I have created a build specification file for RPM. It will use make to compile this code and the RPM specification file (cbsample.rpmspec) configured in the build specification to create the RPM package. Similarly, I have created a build specification file for DEB. It will create the DEB package based on the control specification file (cbsample.control) configured in this build specification.

RPM Build Project:

The following build specification file (buildspec-rpm.yml) uses build specification version 0.2. As described in the documentation, this version has different syntax for environment variables. This build specification includes multiple phases:

  • As part of the install phase, the required packages is installed using yum.
  • During the pre_build phase, the required directories are created and the required files, including the RPM build specification file, are copied to the appropriate location.
  • During the build phase, the code is compiled, and then the RPM package is created based on the RPM specification.

As defined in the artifact section, the RPM file will be uploaded as a build artifact.

version: 0.2

env:
  variables:
    build_version: "0.1"

phases:
  install:
    commands:
      - yum install rpm-build make gcc glibc -y
  pre_build:
    commands:
      - curr_working_dir=`pwd`
      - mkdir -p ./{RPMS,SRPMS,BUILD,SOURCES,SPECS,tmp}
      - filename="cbsample-$build_version"
      - echo $filename
      - mkdir -p $filename
      - cp ./*.c ./*.h Makefile $filename
      - tar -zcvf /root/$filename.tar.gz $filename
      - cp /root/$filename.tar.gz ./SOURCES/
      - cp cbsample.rpmspec ./SPECS/
  build:
    commands:
      - echo "Triggering RPM build"
      - rpmbuild --define "_topdir `pwd`" -ba SPECS/cbsample.rpmspec
      - cd $curr_working_dir

artifacts:
  files:
    - RPMS/x86_64/cbsample*.rpm
  discard-paths: yes

Using cb-centos-project.json as a reference, create the input JSON file for the CLI command. This project uses an AWS CodeCommit repository named codebuild-multispec and a file named buildspec-rpm.yml as the build specification file. To create the RPM package, we need to specify a custom image name. I’m using the latest CentOS 7 image available in the Docker Hub. I’m using a role named CodeBuildServiceRole. It contains permissions similar to those defined in CodeBuildServiceRole.json. (You need to change the resource fields in the policy, as appropriate.)

{
    "name": "rpm-build-project",
    "description": "Project which will build RPM from the source.",
    "source": {
        "type": "CODECOMMIT",
        "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec",
        "buildspec": "buildspec-rpm.yml"
    },
    "artifacts": {
        "type": "S3",
        "location": "codebuild-demo-artifact-repository"
    },
    "environment": {
        "type": "LINUX_CONTAINER",
        "image": "centos:7",
        "computeType": "BUILD_GENERAL1_SMALL"
    },
    "serviceRole": "arn:aws:iam::012345678912:role/service-role/CodeBuildServiceRole",
    "timeoutInMinutes": 15,
    "encryptionKey": "arn:aws:kms:eu-west-1:012345678912:alias/aws/s3",
    "tags": [
        {
            "key": "Name",
            "value": "RPM Demo Build"
        }
    ]
}

After the cli-input-json file is ready, execute the following command to create the build project.

$ aws codebuild create-project --name CodeBuild-RPM-Demo --cli-input-json file://cb-centos-project.json

{
    "project": {
        "name": "CodeBuild-RPM-Demo", 
        "serviceRole": "arn:aws:iam::012345678912:role/service-role/CodeBuildServiceRole", 
        "tags": [
            {
                "value": "RPM Demo Build", 
                "key": "Name"
            }
        ], 
        "artifacts": {
            "namespaceType": "NONE", 
            "packaging": "NONE", 
            "type": "S3", 
            "location": "codebuild-demo-artifact-repository", 
            "name": "CodeBuild-RPM-Demo"
        }, 
        "lastModified": 1500559811.13, 
        "timeoutInMinutes": 15, 
        "created": 1500559811.13, 
        "environment": {
            "computeType": "BUILD_GENERAL1_SMALL", 
            "privilegedMode": false, 
            "image": "centos:7", 
            "type": "LINUX_CONTAINER", 
            "environmentVariables": []
        }, 
        "source": {
            "buildspec": "buildspec-rpm.yml", 
            "type": "CODECOMMIT", 
            "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec"
        }, 
        "encryptionKey": "arn:aws:kms:eu-west-1:012345678912:alias/aws/s3", 
        "arn": "arn:aws:codebuild:eu-west-1:012345678912:project/CodeBuild-RPM-Demo", 
        "description": "Project which will build RPM from the source."
    }
}

When the project is created, run the following command to start the build. After the build has started, get the build ID. You can use the build ID to get the status of the build.

$ aws codebuild start-build --project-name CodeBuild-RPM-Demo
{
    "build": {
        "buildComplete": false, 
        "initiator": "prakash", 
        "artifacts": {
            "location": "arn:aws:s3:::codebuild-demo-artifact-repository/CodeBuild-RPM-Demo"
        }, 
        "projectName": "CodeBuild-RPM-Demo", 
        "timeoutInMinutes": 15, 
        "buildStatus": "IN_PROGRESS", 
        "environment": {
            "computeType": "BUILD_GENERAL1_SMALL", 
            "privilegedMode": false, 
            "image": "centos:7", 
            "type": "LINUX_CONTAINER", 
            "environmentVariables": []
        }, 
        "source": {
            "buildspec": "buildspec-rpm.yml", 
            "type": "CODECOMMIT", 
            "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec"
        }, 
        "currentPhase": "SUBMITTED", 
        "startTime": 1500560156.761, 
        "id": "CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc", 
        "arn": "arn:aws:codebuild:eu-west-1: 012345678912:build/CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc"
    }
}

$ aws codebuild list-builds-for-project --project-name CodeBuild-RPM-Demo
{
    "ids": [
        "CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc"
    ]
}

$ aws codebuild batch-get-builds --ids CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc
{
    "buildsNotFound": [], 
    "builds": [
        {
            "buildComplete": true, 
            "phases": [
                {
                    "phaseStatus": "SUCCEEDED", 
                    "endTime": 1500560157.164, 
                    "phaseType": "SUBMITTED", 
                    "durationInSeconds": 0, 
                    "startTime": 1500560156.761
                }, 
                {
                    "contexts": [], 
                    "phaseType": "PROVISIONING", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 24, 
                    "startTime": 1500560157.164, 
                    "endTime": 1500560182.066
                }, 
                {
                    "contexts": [], 
                    "phaseType": "DOWNLOAD_SOURCE", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 15, 
                    "startTime": 1500560182.066, 
                    "endTime": 1500560197.906
                }, 
                {
                    "contexts": [], 
                    "phaseType": "INSTALL", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 19, 
                    "startTime": 1500560197.906, 
                    "endTime": 1500560217.515
                }, 
                {
                    "contexts": [], 
                    "phaseType": "PRE_BUILD", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 0, 
                    "startTime": 1500560217.515, 
                    "endTime": 1500560217.662
                }, 
                {
                    "contexts": [], 
                    "phaseType": "BUILD", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 0, 
                    "startTime": 1500560217.662, 
                    "endTime": 1500560217.995
                }, 
                {
                    "contexts": [], 
                    "phaseType": "POST_BUILD", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 0, 
                    "startTime": 1500560217.995, 
                    "endTime": 1500560218.074
                }, 
                {
                    "contexts": [], 
                    "phaseType": "UPLOAD_ARTIFACTS", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 0, 
                    "startTime": 1500560218.074, 
                    "endTime": 1500560218.542
                }, 
                {
                    "contexts": [], 
                    "phaseType": "FINALIZING", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 4, 
                    "startTime": 1500560218.542, 
                    "endTime": 1500560223.128
                }, 
                {
                    "phaseType": "COMPLETED", 
                    "startTime": 1500560223.128
                }
            ], 
            "logs": {
                "groupName": "/aws/codebuild/CodeBuild-RPM-Demo", 
                "deepLink": "https://console.aws.amazon.com/cloudwatch/home?region=eu-west-1#logEvent:group=/aws/codebuild/CodeBuild-RPM-Demo;stream=57a36755-4d37-4b08-9c11-1468e1682abc", 
                "streamName": "57a36755-4d37-4b08-9c11-1468e1682abc"
            }, 
            "artifacts": {
                "location": "arn:aws:s3:::codebuild-demo-artifact-repository/CodeBuild-RPM-Demo"
            }, 
            "projectName": "CodeBuild-RPM-Demo", 
            "timeoutInMinutes": 15, 
            "initiator": "prakash", 
            "buildStatus": "SUCCEEDED", 
            "environment": {
                "computeType": "BUILD_GENERAL1_SMALL", 
                "privilegedMode": false, 
                "image": "centos:7", 
                "type": "LINUX_CONTAINER", 
                "environmentVariables": []
            }, 
            "source": {
                "buildspec": "buildspec-rpm.yml", 
                "type": "CODECOMMIT", 
                "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec"
            }, 
            "currentPhase": "COMPLETED", 
            "startTime": 1500560156.761, 
            "endTime": 1500560223.128, 
            "id": "CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc", 
            "arn": "arn:aws:codebuild:eu-west-1:012345678912:build/CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc"
        }
    ]
}

DEB Build Project:

In this project, we will use the build specification file named buildspec-deb.yml. Like the RPM build project, this specification includes multiple phases. Here I use a Debian control file to create the package in DEB format. After a successful build, the DEB package will be uploaded as build artifact.

version: 0.2

env:
  variables:
    build_version: "0.1"

phases:
  install:
    commands:
      - apt-get install gcc make -y
  pre_build:
    commands:
      - mkdir -p ./cbsample-$build_version/DEBIAN
      - mkdir -p ./cbsample-$build_version/usr/lib
      - mkdir -p ./cbsample-$build_version/usr/include
      - mkdir -p ./cbsample-$build_version/usr/bin
      - cp -f cbsample.control ./cbsample-$build_version/DEBIAN/control
  build:
    commands:
      - echo "Building the application"
      - make
      - cp libcbsamplelib.so ./cbsample-$build_version/usr/lib
      - cp cbsamplelib.h ./cbsample-$build_version/usr/include
      - cp cbsampleutil ./cbsample-$build_version/usr/bin
      - chmod +x ./cbsample-$build_version/usr/bin/cbsampleutil
      - dpkg-deb --build ./cbsample-$build_version

artifacts:
  files:
    - cbsample-*.deb

Here we use cb-ubuntu-project.json as a reference to create the CLI input JSON file. This project uses the same AWS CodeCommit repository (codebuild-multispec) but a different buildspec file in the same repository (buildspec-deb.yml). We use the default CodeBuild image to create the DEB package. We use the same IAM role (CodeBuildServiceRole).

{
    "name": "deb-build-project",
    "description": "Project which will build DEB from the source.",
    "source": {
        "type": "CODECOMMIT",
        "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec",
        "buildspec": "buildspec-deb.yml"
    },
    "artifacts": {
        "type": "S3",
        "location": "codebuild-demo-artifact-repository"
    },
    "environment": {
        "type": "LINUX_CONTAINER",
        "image": "aws/codebuild/ubuntu-base:14.04",
        "computeType": "BUILD_GENERAL1_SMALL"
    },
    "serviceRole": "arn:aws:iam::012345678912:role/service-role/CodeBuildServiceRole",
    "timeoutInMinutes": 15,
    "encryptionKey": "arn:aws:kms:eu-west-1:012345678912:alias/aws/s3",
    "tags": [
        {
            "key": "Name",
            "value": "Debian Demo Build"
        }
    ]
}

Using the CLI input JSON file, create the project, start the build, and check the status of the project.

$ aws codebuild create-project --name CodeBuild-DEB-Demo --cli-input-json file://cb-ubuntu-project.json

$ aws codebuild list-builds-for-project --project-name CodeBuild-DEB-Demo

$ aws codebuild batch-get-builds --ids CodeBuild-DEB-Demo:e535c4b0-7067-4fbe-8060-9bb9de203789

After successful completion of the RPM and DEB builds, check the S3 bucket configured in the artifacts section for the build packages. Build projects will create a directory in the name of the build project and copy the artifacts inside it.

$ aws s3 ls s3://codebuild-demo-artifact-repository/CodeBuild-RPM-Demo/
2017-07-20 16:16:59       8108 cbsample-0.1-1.el7.centos.x86_64.rpm

$ aws s3 ls s3://codebuild-demo-artifact-repository/CodeBuild-DEB-Demo/
2017-07-20 16:37:22       5420 cbsample-0.1.deb

Override Buildspec During Build Start:

It’s also possible to override the build specification file of an existing project when starting a build. If we want to create the libs RPM package instead of the whole RPM, we will use the build specification file named buildspec-libs-rpm.yml. This build specification file is similar to the earlier RPM build. The only difference is that it uses a different RPM specification file to create libs RPM.

version: 0.2

env:
  variables:
    build_version: "0.1"

phases:
  install:
    commands:
      - yum install rpm-build make gcc glibc -y
  pre_build:
    commands:
      - curr_working_dir=`pwd`
      - mkdir -p ./{RPMS,SRPMS,BUILD,SOURCES,SPECS,tmp}
      - filename="cbsample-libs-$build_version"
      - echo $filename
      - mkdir -p $filename
      - cp ./*.c ./*.h Makefile $filename
      - tar -zcvf /root/$filename.tar.gz $filename
      - cp /root/$filename.tar.gz ./SOURCES/
      - cp cbsample-libs.rpmspec ./SPECS/
  build:
    commands:
      - echo "Triggering RPM build"
      - rpmbuild --define "_topdir `pwd`" -ba SPECS/cbsample-libs.rpmspec
      - cd $curr_working_dir

artifacts:
  files:
    - RPMS/x86_64/cbsample-libs*.rpm
  discard-paths: yes

Using the same RPM build project that we created earlier, start a new build and set the value of the `–buildspec-override` parameter to buildspec-libs-rpm.yml .

$ aws codebuild start-build --project-name CodeBuild-RPM-Demo --buildspec-override buildspec-libs-rpm.yml
{
    "build": {
        "buildComplete": false, 
        "initiator": "prakash", 
        "artifacts": {
            "location": "arn:aws:s3:::codebuild-demo-artifact-repository/CodeBuild-RPM-Demo"
        }, 
        "projectName": "CodeBuild-RPM-Demo", 
        "timeoutInMinutes": 15, 
        "buildStatus": "IN_PROGRESS", 
        "environment": {
            "computeType": "BUILD_GENERAL1_SMALL", 
            "privilegedMode": false, 
            "image": "centos:7", 
            "type": "LINUX_CONTAINER", 
            "environmentVariables": []
        }, 
        "source": {
            "buildspec": "buildspec-libs-rpm.yml", 
            "type": "CODECOMMIT", 
            "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec"
        }, 
        "currentPhase": "SUBMITTED", 
        "startTime": 1500562366.239, 
        "id": "CodeBuild-RPM-Demo:82d05f8a-b161-401c-82f0-83cb41eba567", 
        "arn": "arn:aws:codebuild:eu-west-1:012345678912:build/CodeBuild-RPM-Demo:82d05f8a-b161-401c-82f0-83cb41eba567"
    }
}

After the build is completed successfully, check to see if the package appears in the artifact S3 bucket under the CodeBuild-RPM-Demo build project folder.

$ aws s3 ls s3://codebuild-demo-artifact-repository/CodeBuild-RPM-Demo/
2017-07-20 16:16:59       8108 cbsample-0.1-1.el7.centos.x86_64.rpm
2017-07-20 16:53:54       5320 cbsample-libs-0.1-1.el7.centos.x86_64.rpm

Conclusion

In this post, I have shown you how multiple buildspec files in the same source repository can be used to run multiple AWS CodeBuild build projects. I have also shown you how to provide a different buildspec file when starting the build.

For more information about AWS CodeBuild, see the AWS CodeBuild documentation. You can get started with AWS CodeBuild by using this step by step guide.


About the author

Prakash Palanisamy is a Solutions Architect for Amazon Web Services. When he is not working on Serverless, DevOps or Alexa, he will be solving problems in Project Euler. He also enjoys watching educational documentaries.

Open and Click Tracking Have Arrived

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/ses/open-and-click-tracking-have-arrived/

We’re pleased to announce the addition of open and click tracking metrics to Amazon SES. These metrics will help you measure the effectiveness of the email campaigns you send using Amazon SES.

We’re also adding the ability to publish email sending metrics to Amazon Simple Notification Service (Amazon SNS) using event publishing. This feature gives you greater control over the sending notifications you receive through Amazon SNS.

What’s new in this release?

When you send an email using Amazon SES, we now collect metrics related to opens and clicks. Opens, in this sense, refers to the number of users who successfully received your email and opened it in their email clients; clicks refers to the number of users who received an email and clicked one or more links in it.

Additionally, you can now use event publishing to push email sending notifications—including open and click notifications—using Amazon SNS. Previously, you could send account-level notifications through Amazon SNS. These notifications were pretty limited: you could only receive notifications about bounces, complaints, and deliveries, and you would receive notifications about all of these events across your entire Amazon SES account. Now you can use event publishing to send notifications about deliveries, opens, clicks, bounces, and complaints. Furthermore, you can set up event publishing so that you only receive notifications about emails sent using the configuration sets you specify in those emails.

Why should I use open and click tracking?

Whether you are sending marketing emails, transactional emails, or notifications, you need to know how effective your communications are. The email sending metrics feature of Amazon SES gives you data about entire email response funnel—the total number of emails that were sent, bounced, viewed, and clicked. You can then transform those insights into action.

For example, the open and click tracking feature can help you identify the customers who are most interested in receiving the messages you send. By narrowing down your list of recipients and focusing on your most engaged customers, you can save money (by sending fewer messages), improve the response rates of your marketing campaigns (by targeting only the customers who are most interested in what you have to say), and protect your sender reputation (by reducing the number of bounces and complaints against your sending domain).

How do I enable open and click tracking?

If you’ve set up Sending Metrics in the past, then you can easily add open and click tracking to your existing configuration sets. On the Configuration Sets page, choose the configuration set that contains your sending event destination; edit the event destination, check the boxes for Open and Click (as shown in the image below), and then choose Save.

How does open and click tracking work?

Amazon SES makes very minor changes to your emails in order to make open and click tracking work. At the bottom of each message, we insert a 1 pixel by 1 pixel transparent GIF image. Each email includes a unique link to this image file; when the image is opened, we can tell exactly which message was opened and by whom.

To track clicks, we set up a redirect for each link in the message. When a recipient clicks a link, they are sent to an Amazon SES server, and are immediately forwarded to the destination address. As with open tracking, each of these redirect links is unique, allowing us to easily determine which recipient clicked the link, when they clicked it, and the email from which they arrived at the link.

Can I disable click tracking?

You can disable click tracking by adding a special tag to the anchor tags in your HTML. For example, if you were linking to the AWS home page, a normal anchor link would look something like this:

<a href="https://aws.amazon.com/">Amazon Web Services</a>

To disable click tracking for that same link, you would modify to look like this:

<a ses:no-track href="https://aws.amazon.com/">Amazon Web Services</a>

Because the ses:no-track attribute is non-standard HTML, we automatically remove it from the version of the email that arrives in your recipients’ inboxes.

How do I use event publishing with Amazon SNS?

If you’ve set up event destinations in the past, then the process of setting up an Amazon SNS event destination will be very familiar. You can add an Amazon SNS destination to an existing configuration set, or create a new configuration set that uses Amazon SNS as its event destination. To learn more, see “Set Up an Amazon SNS Event Destination for Amazon SES Event Publishing” in our Developer Guide.

We’re excited about this release. Let us know what you think of these new features in the SES Forum, or in the comments for this post.