Tag Archives: Docker

Clean up Your Container Images with Amazon ECR Lifecycle Policies

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/clean-up-your-container-images-with-amazon-ecr-lifecycle-policies/

This post comes from the desk of Brent Langston.

Starting today, customers can keep their container image repositories tidy by automatically removing old or unused images using lifecycle policies, now available as part of Amazon EC2 Container Registry (Amazon ECR).

Amazon ECR is a fully managed Docker container registry that makes it easy to store, manage, and deploy Docker container images without worrying about the typical challenges of scaling a service to handle pulling hundreds of images at one time. This scale means that development teams actively using Amazon ECR often find that their repositories fill up with many container image versions. This makes it difficult to find the code changes that matter and incurs unnecessary storage costs. Previously, cleaning up your repository meant spending time manually deleting old images, or writing and executing scripts.

Now, lifecycle policies allow you to define a set of rules to remove old container images automatically. You can also preview rules to see exactly which container images are affected when the rule runs. This allows repositories to be better organized, makes it easier to find the code revisions that matter, and lowers storage costs.

Let’s look at how lifecycle policies work.

Ground Rules

One of the biggest benefits of deploying code in containers is the ability to quickly and easily roll back to a previous version. You can deploy with less risk because, if something goes wrong, it is easy to revert to the previous container version and know that your application will run as it did before the failed deployment. Most people probably never roll back past a few versions. If your situation is similar, one simple lifecycle rule might be to just keep the last 30 images.

Last 30 Images

In your ECR registry, choose Dry-Run Lifecycle Rules, Add.

  • For Image Status, select Untagged.
  • Under Match criteria, for Count Type, enter Image Count More Than.
  • For Count Number, enter 30.
  • For Rule action, choose expire.

Choose Save. To see which images would be cleaned up, choose Save and dry-run rules.
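
If you prefer to manage this rule as code rather than through the console, the same policy can be applied with the AWS SDK. Here is a minimal, illustrative boto3 sketch; the repository name my-app is a placeholder, and start_lifecycle_policy_preview is the API-level equivalent of the console dry run:

import json
import boto3

ecr = boto3.client('ecr')
repo = 'my-app'  # placeholder repository name

# Expire untagged images once more than 30 of them accumulate
policy = {
    "rules": [{
        "rulePriority": 1,
        "description": "Keep only the most recent 30 untagged images",
        "selection": {
            "tagStatus": "untagged",
            "countType": "imageCountMoreThan",
            "countNumber": 30
        },
        "action": {"type": "expire"}
    }]
}

# Preview which images the rule would expire (dry run), then apply it
ecr.start_lifecycle_policy_preview(repositoryName=repo,
                                   lifecyclePolicyText=json.dumps(policy))
ecr.put_lifecycle_policy(repositoryName=repo,
                         lifecyclePolicyText=json.dumps(policy))

You can then call get_lifecycle_policy_preview on the same repository to inspect the affected images before committing to the rule.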

Of course, there are teams who, for compliance reasons, might prefer to keep certain images for a period of time, rather than keeping by count. For that situation, you can choose to clean up images older than 90 days.

Last 90 Days

Select the rule that you just created and choose Edit. Change the parameters to keep only 90 days of untagged images:

  • Under Match criteria, for Count Type, enter Since Image Pushed.
  • For Count Number, enter 90.
  • For Count Unit, enter days.

Tags

Certainly 90 days is an arbitrary timeframe, and your team might have policies in place that require a longer retention period for certain kinds of images. If that’s the case, but you still want to continue with the spring cleaning, you can consider getting rid of images by tag prefix.

Here is the list of rules I came up with to groom untagged, development, staging, and production images:

  • Remove untagged images over 90 days old
  • Remove development tagged images over 90 days old
  • Remove staging tagged images over 180 days old
  • Remove production tagged images over 1 year old
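
As a rough sketch of how these four rules could be expressed as a single lifecycle policy and applied with boto3, here is one way to build them in a loop; the repository name and the dev/staging/prod tag prefixes are assumptions, so adjust them to match your own tagging scheme:

import json
import boto3

ecr = boto3.client('ecr')

# Placeholder retention scheme: (tag prefix or None for untagged, maximum age in days)
retention = [(None, 90), ('dev', 90), ('staging', 180), ('prod', 365)]

rules = []
for priority, (prefix, days) in enumerate(retention, start=1):
    selection = {"countType": "sinceImagePushed",
                 "countUnit": "days",
                 "countNumber": days}
    if prefix is None:
        selection["tagStatus"] = "untagged"
    else:
        selection["tagStatus"] = "tagged"
        selection["tagPrefixList"] = [prefix]
    rules.append({"rulePriority": priority,
                  "description": "Expire {} images after {} days".format(prefix or "untagged", days),
                  "selection": selection,
                  "action": {"type": "expire"}})

ecr.put_lifecycle_policy(repositoryName='my-app',  # placeholder repository name
                         lifecyclePolicyText=json.dumps({"rules": rules}))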

As you can see, the new Amazon ECR lifecycle policies are powerful, and help you easily keep the images you need, while cleaning out images you may never use again. This feature is available starting today, in all regions where Amazon ECR is available, at no extra charge. For more information, see Amazon ECR Lifecycle Policies in the AWS technical documentation.

— Brent
@brentContained

AWS Developer Tools Expands Integration to Include GitHub

Post Syndicated from Balaji Iyer original https://aws.amazon.com/blogs/devops/aws-developer-tools-expands-integration-to-include-github/

AWS Developer Tools is a set of services that include AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, and AWS CodeDeploy. Together, these services help you securely store and maintain version control of your application’s source code and automatically build, test, and deploy your application to AWS or your on-premises environment. These services are designed to enable developers and IT professionals to rapidly and safely deliver software.

As part of our continued commitment to extend the AWS Developer Tools ecosystem to third-party tools and services, we’re pleased to announce AWS CodeStar and AWS CodeBuild now integrate with GitHub. This will make it easier for GitHub users to set up a continuous integration and continuous delivery toolchain as part of their release process using AWS Developer Tools.

In this post, I will walk through the following:

  • Integrating GitHub with AWS CodeStar
  • Using GitHub pull requests to trigger builds in AWS CodeBuild

Prerequisites:

You’ll need an AWS account, a GitHub account, an Amazon EC2 key pair, and administrator-level permissions for AWS Identity and Access Management (IAM), AWS CodeStar, AWS CodeBuild, AWS CodePipeline, Amazon EC2, and Amazon S3.

 

Integrating GitHub with AWS CodeStar

AWS CodeStar enables you to quickly develop, build, and deploy applications on AWS. Its unified user interface helps you easily manage your software development activities in one place. With AWS CodeStar, you can set up your entire continuous delivery toolchain in minutes, so you can start releasing code faster.

When AWS CodeStar launched in April of this year, it used AWS CodeCommit as the hosted source repository. You can now choose between AWS CodeCommit or GitHub as the source control service for your CodeStar projects. In addition, your CodeStar project dashboard lets you centrally track GitHub activities, including commits, issues, and pull requests. This makes it easy to manage project activity across the components of your CI/CD toolchain. Adding the GitHub dashboard view will simplify development of your AWS applications.

In this section, I will show you how to use GitHub as the source provider for your CodeStar projects. I’ll also show you how to work with recent commits, issues, and pull requests in the CodeStar dashboard.

Sign in to the AWS Management Console and from the Services menu, choose CodeStar. In the CodeStar console, choose Create a new project. You should see the Choose a project template page.

CodeStar Project

Choose an option by programming language, application category, or AWS service. I am going to choose the Ruby on Rails web application that will be running on Amazon EC2.

On the Project details page, you’ll now see the GitHub option. Type a name for your project, and then choose Connect to GitHub.

Project details

You’ll see a message requesting authorization to connect to your GitHub repository. When prompted, choose Authorize, and then type your GitHub account password.

Authorize

This connects your GitHub identity to AWS CodeStar through OAuth. You can always review your settings by navigating to your GitHub application settings.

Installed GitHub Apps

You’ll see AWS CodeStar is now connected to GitHub:

Create project

You can choose a public or private repository. GitHub offers free accounts for users and organizations working on public and open source projects and paid accounts that offer unlimited private repositories and optional user management and security features.

In this example, I am going to choose the public repository option. Edit the repository description, if you like, and then choose Next.

Review your CodeStar project details, and then choose Create Project. On Choose an Amazon EC2 Key Pair, select your key pair, and then choose Create Project.

Key Pair

On the Review project details page, you’ll see Edit Amazon EC2 configuration. Choose this link to configure instance type, VPC, and subnet options. AWS CodeStar requires a service role to create and manage AWS resources and IAM permissions. This role will be created for you when you select the AWS CodeStar would like permission to administer AWS resources on your behalf check box.

Choose Create Project. It might take a few minutes to create your project and resources.

Review project details

When you create a CodeStar project, you’re added to the project team as an owner. If this is the first time you’ve used AWS CodeStar, you’ll be asked to provide the following information, which will be shown to others:

  • Your display name.
  • Your email address.

This information is used in your AWS CodeStar user profile. User profiles are not project-specific, but they are limited to a single AWS region. If you are a team member in projects in more than one region, you’ll have to create a user profile in each region.

User settings

User settings

Choose Next. AWS CodeStar will create a GitHub repository with your configuration settings (for example, https://github.com/biyer/ruby-on-rails-service).

When you integrate your integrated development environment (IDE) with AWS CodeStar, you can continue to write and develop code in your preferred environment. The changes you make will be included in the AWS CodeStar project each time you commit and push your code.

IDE

After setting up your IDE, choose Next to go to the CodeStar dashboard. Take a few minutes to familiarize yourself with the dashboard. You can easily track progress across your entire software development process, from your backlog of work items to recent code deployments.

Dashboard

After the application deployment is complete, choose the endpoint that will display the application.

Pipeline

This is what you’ll see when you open the application endpoint:

The Commit history section of the dashboard lists the commits made to the Git repository. If you choose the commit ID or the Open in GitHub option, the commit opens directly in your GitHub repository.

Commit history

Your AWS CodeStar project dashboard is where you and your team view the status of your project resources, including the latest commits to your project, the state of your continuous delivery pipeline, and the performance of your instances. This information is displayed on tiles that are dedicated to a particular resource. To see more information about any of these resources, choose the details link on the tile. The console for that AWS service will open on the details page for that resource.

Issues

The dashboard also surfaces the GitHub issues for your project repository. You can filter issues based on their status and the assigned user.

Filter

AWS CodeBuild Now Supports Building GitHub Pull Requests

CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers. CodeBuild scales continuously and processes multiple builds concurrently, so your builds are not left waiting in a queue. You can use prepackaged build environments to get started quickly or you can create custom build environments that use your own build tools.

We recently announced support for GitHub pull requests in AWS CodeBuild. This functionality makes it easier to collaborate across your team while editing and building your application code with CodeBuild. You can use the AWS CodeBuild or AWS CodePipeline consoles to run AWS CodeBuild. You can also automate the running of AWS CodeBuild by using the AWS Command Line Interface (AWS CLI), the AWS SDKs, or the AWS CodeBuild Plugin for Jenkins.
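
As a small illustration of the SDK route, here is a hedged boto3 sketch that kicks off a build; the project name and branch are placeholders, not values from this walkthrough:

import boto3

codebuild = boto3.client('codebuild')

# Placeholder project name; sourceVersion can be a branch name or commit ID
build = codebuild.start_build(projectName='my-github-project',
                              sourceVersion='master')['build']
print(build['id'], build['buildStatus'])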

AWS CodeBuild

In this section, I will show you how to trigger a build in AWS CodeBuild with a pull request from GitHub through webhooks.

Open the AWS CodeBuild console at https://console.aws.amazon.com/codebuild/. Choose Create project. If you already have a CodeBuild project, you can choose Edit project, and then follow along. CodeBuild can connect to AWS CodeCommit, Amazon S3, Bitbucket, and GitHub to pull source code for builds. For Source provider, choose GitHub, and then choose Connect to GitHub.

Configure

After you’ve successfully linked GitHub and your CodeBuild project, you can choose a repository in your GitHub account. CodeBuild also supports connections to any public repository. You can review your settings by navigating to your GitHub application settings.

GitHub Apps

On Source: What to Build, for Webhook, select the Rebuild every time a code change is pushed to this repository check box.

Note: You can select this option only if, under Repository, you chose Use a repository in my account.
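
Selecting that check box is what creates the webhook for the project. As a rough programmatic equivalent (the project name is a placeholder), boto3 exposes the same operation, and CodeBuild should register the webhook with your GitHub repository on your behalf:

import boto3

codebuild = boto3.client('codebuild')

# Placeholder project name; the project source must be a GitHub repository in your account
resp = codebuild.create_webhook(projectName='my-github-project')
print(resp['webhook'].get('url'))  # webhook metadata returned by CodeBuild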

Source

In Environment: How to build, for Environment image, select Use an image managed by AWS CodeBuild. For Operating system, choose Ubuntu. For Runtime, choose Base. For Version, choose the latest available version. For Build specification, you can provide a collection of build commands and related settings, in YAML format (buildspec.yml) or you can override the build spec by inserting build commands directly in the console. AWS CodeBuild uses these commands to run a build. In this example, the output is the string “hello.”

Environment

On Artifacts: Where to put the artifacts from this build project, for Type, choose No artifacts. (This is also the type to choose if you are just running tests or pushing a Docker image to Amazon ECR.) You also need an AWS CodeBuild service role so that AWS CodeBuild can interact with dependent AWS services on your behalf. Unless you already have a role, choose Create a role, and for Role name, type a name for your role.

Artifacts

In this example, leave the advanced settings at their defaults.

If you expand Show advanced settings, you’ll see options for customizing your build, including:

  • A build timeout.
  • A KMS key to encrypt all the artifacts that the builds for this project will use.
  • Options for building a Docker image.
  • Elevated permissions during your build action (for example, accessing Docker inside your build container to build a Dockerfile).
  • Resource options for the build compute type.
  • Environment variables (built-in or custom). For more information, see Create a Build Project in the AWS CodeBuild User Guide.

Advanced settings

You can use the AWS CodeBuild console to create a parameter in Amazon EC2 Systems Manager. Choose Create a parameter, and then follow the instructions in the dialog box. (In that dialog box, for KMS key, you can optionally specify the ARN of an AWS KMS key in your account. Amazon EC2 Systems Manager uses this key to encrypt the parameter’s value during storage and decrypt during retrieval.)

Create parameter
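
Outside the console, the same parameter can be created with a short boto3 call. This is only a sketch; the parameter name, value, and KMS key alias are assumptions:

import boto3

ssm = boto3.client('ssm')

# Placeholder name and value; SecureString values are encrypted with the specified KMS key
ssm.put_parameter(Name='/codebuild/demo/db-password',
                  Value='example-secret-value',
                  Type='SecureString',
                  KeyId='alias/aws/ssm')  # or the ARN of a customer-managed key

A CodeBuild environment variable of the Parameter Store type can then reference the parameter by this name.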

Choose Continue. On the Review page, either choose Save and build or choose Save to run the build later.

Choose Start build. When the build is complete, the Build logs section should display detailed information about the build.

Logs

To demonstrate a pull request, I will fork the repository as a different GitHub user, make commits to the forked repo, check in the changes to a newly created branch, and then open a pull request.

Pull request

As soon as the pull request is submitted, you’ll see CodeBuild start executing the build.

Build

GitHub sends an HTTP POST payload to the webhook’s configured URL (highlighted here), which CodeBuild uses to download the latest source code and execute the build phases.

Build project

If you expand the Show all checks option for the GitHub pull request, you’ll see that CodeBuild has completed the build, all checks have passed, and a deep link is provided in Details, which opens the build history in the CodeBuild console.

Pull request

Summary:

In this post, I showed you how to use GitHub as the source provider for your CodeStar projects and how to work with recent commits, issues, and pull requests in the CodeStar dashboard. I also showed you how you can use GitHub pull requests to automatically trigger a build in AWS CodeBuild — specifically, how this functionality makes it easier to collaborate across your team while editing and building your application code with CodeBuild.


About the author:

Balaji Iyer is an Enterprise Consultant for the Professional Services Team at Amazon Web Services. In this role, he has helped several customers successfully navigate their journey to AWS. His specialties include architecting and implementing highly scalable distributed systems, serverless architectures, large scale migrations, operational security, and leading strategic AWS initiatives. Before he joined Amazon, Balaji spent more than a decade building operating systems, big data analytics solutions, mobile services, and web applications. In his spare time, he enjoys experiencing the great outdoors and spending time with his family.

 

Security updates for Wednesday

Post Syndicated from ris original https://lwn.net/Articles/736063/rss

Security updates have been issued by Arch Linux (lame, salt, and xorg-server), Debian (ffmpeg, imagemagick, libxfont, wordpress, and xen), Fedora (ImageMagick, rubygem-rmagick, and tor), Oracle (kernel), SUSE (kernel, SLES 12 Docker image, SLES 12-SP1 Docker image, and SLES 12-SP2 Docker image), and Ubuntu (curl, glance, horizon, kernel, keystone, libxfont, libxfont1, libxfont2, libxml2, linux, linux-aws, linux-gke, linux-kvm, linux-raspi2, linux-snapdragon, linux, linux-raspi2, linux-gcp, linux-hwe, linux-lts-xenial, nova, openvswitch, swift, and thunderbird).

Now Available – Microsoft SQL Server 2017 for Amazon EC2

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-available-microsoft-sql-server-2017-for-amazon-ec2/

Microsoft SQL Server 2017 (launched just a few days ago) includes lots of powerful new features, including support for graph databases, automatic database tuning, and the ability to create clusterless Always On Availability Groups. It can also be run on Linux and in Docker containers.

Run on EC2
I’m happy to announce that you can now launch EC2 instances that run Windows Server 2016 and four editions (Web, Express, Standard, and Enterprise) of SQL Server 2017. The AMIs (Amazon Machine Images) are available today in all AWS Regions and run on a wide variety of EC2 instance types, including the new x1e.32xlarge with 128 vCPUs and almost 4 TB of memory.

You can launch these instances from the AWS Management Console or through AWS Marketplace. Here’s what they look like in the console:

And in AWS Marketplace:

Licensing Options Galore
You have lots of licensing options for SQL Server:

Pay As You Go – This option works well if you would prefer to avoid buying licenses, are already running an older version of SQL Server, and want to upgrade. You don’t have to deal with true-ups, software compliance audits, or Software Assurance and you don’t need to make a long-term purchase. If you are running the Standard Edition of SQL Server, you also benefit from our recent price reduction, with savings of up to 52%.

License Mobility – This option lets you use your active Software Assurance agreement to bring your existing licenses to EC2, and allows you to run SQL Server on Windows or Linux instances.

Bring Your Own Licenses – This option lets you take advantage of your existing license investment while minimizing upgrade costs. You can run SQL Server on EC2 Dedicated Instances or EC2 Dedicated Hosts, with the potential to reduce operating costs by licensing SQL Server on a per-core basis. This option allows you to run SQL Server 2017 on EC2 Linux instances (SUSE, RHEL, and Ubuntu are supported) and also supports Docker-based environments running on EC2 Windows and Linux instances. To learn more about these options, read the Installation Guidance for SQL Server on Linux and Run SQL Server 2017 Container Image with Docker.

Learn More
To learn more about SQL Server 2017 and to explore your licensing options in depth, take a look at the SQL Server on AWS page.

If you need advice and guidance as you plan your migration effort, check out the AWS Partners who have qualified for the Microsoft Workloads competency and focus on database solutions.

Amazon RDS support for SQL Server 2017 is planned for November. This will give you a fully managed option.

Plan to join the AWS team at the PASS Summit (November 1-3 in Seattle) and at AWS re:Invent (November 27th to December 1st in Las Vegas).

Jeff;

PS – Special thanks to my colleague Tom Staab (Partner Solutions Architect) for his help with this post!

timeShift(GrafanaBuzz, 1w) Issue 16

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/10/06/timeshiftgrafanabuzz-1w-issue-16/

Welcome to another issue of TimeShift. In addition to the roundup of articles and plugin updates, we had a big announcement this week – Early Bird tickets to GrafanaCon EU are now available! We’re also accepting CFPs through the end of October, so if you have a topic in mind, don’t wait until the last minute, please send it our way. Speakers who are selected will receive a comped ticket to the conference.


Early Bird Tickets Now Available

We’ve released a limited number of Early Bird tickets before General Admission tickets are available. Take advantage of this discount before they’re sold out!

Get Your Early Bird Ticket Now

Interested in speaking at GrafanaCon? We’re looking for technical and non-technical talks of all sizes. Submit a CFP Now.


From the Blogosphere

Get insights into your Azure Cosmos DB: partition heatmaps, OMS, and More: Microsoft recently announced the ability to access a subset of Azure Cosmos DB metrics via Azure Monitor API. Grafana Labs built an Azure Monitor Plugin for Grafana 4.5 to visualize the data.

How to monitor Docker for Mac/Windows: Brian was tired of guessing about the performance of his development machines and test environment. Here, he shows how to monitor Docker with Prometheus to get a better understanding of a dev environment in his quest to monitor all the things.

Prometheus and Grafana to Monitor 10,000 servers: This article covers enokido’s process of choosing a monitoring platform. He identifies three possible solutions, outlines the pros and cons of each, and discusses why he chose Prometheus.

GitLab Monitoring: It’s fascinating to see Grafana dashboards with production data from companies around the world. For instance, we’ve previously highlighted the huge number of dashboards Wikimedia publicly shares. This week, we found that GitLab also has public dashboards to explore.

Monitoring a Docker Swarm Cluster with cAdvisor, InfluxDB and Grafana | The Laboratory: It’s important to know the state of your applications in a scalable environment such as Docker Swarm. This video covers an overview of Docker, VMs vs. containers, orchestration, and how to monitor Docker Swarm.

Introducing Telemetry: Actionable Time Series Data from Counters: Learn how to use counters from multiple disparate sources, devices, operating systems, and applications to generate actionable time series data.

ofp_sniffer Branch 1.2 (docker/influxdb/grafana) Upcoming Features: This video demo shows off some of the upcoming features for OFP_Sniffer, an OpenFlow sniffer to help network troubleshooting in production networks.


Grafana Plugins

Plugin authors add new features and bugfixes all the time, so it’s important to always keep your plugins up to date. To update plugins from an on-prem Grafana instance, use the Grafana-cli tool; if you are using Hosted Grafana, you can update with one click! If you have questions or need help, hit up our community site, where the Grafana team and members of the community are happy to help.

UPDATED PLUGIN

PNP for Nagios Data Source – The latest release for the PNP data source has some fixes and adds a mathematical factor option.

Update

UPDATED PLUGIN

Google Calendar Data Source – This week, there was a small bug fix for the Google Calendar annotations data source.

Update

UPDATED PLUGIN

BT Plugins – Our friends at BT have been busy. All of the BT plugins in our catalog received an update this week. The plugins are the Status Dot Panel, the Peak Report Panel, the Trend Box Panel, and the Alarm Box Panel.

Changes include:

  • Custom dashboard links now work in Internet Explorer.
  • The Peak Report panel no longer supports click-to-sort.
  • The Status Dot panel tooltips now look like Grafana tooltips.


This week’s MVC (Most Valuable Contributor)

Each week we highlight some of the important contributions from our amazing open source community. This week, we’d like to recognize a contributor who did a lot of work to improve Prometheus support.

pdoan017
Thanks to Alin Sinpalean for his Prometheus PR, which aligns the step and interval parameters. Alin got a lot of feedback from the Prometheus community and spent a lot of time and energy explaining, debating, and iterating before the PR was ready.
Thank you!


Grafana Labs is Hiring!

We are passionate about open source software and thrive on tackling complex challenges to build the future. We ship code from every corner of the globe and love working with the community. If this sounds exciting, you’re in luck – WE’RE HIRING!

Check out our Open Positions


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove

Wow – Excited to be a part of exploring data to find out how Mexico City is evolving.

We Need Your Help!

Do you have a graph that you love because the data is beautiful or because the graph provides interesting information? Please get in touch. Tweet or send us an email with a screenshot, and we’ll tell you about this fun experiment.

Tell Me More


What do you think?

That’s a wrap! How are we doing? Submit a comment on this article below, or post something at our community forum. Help us make these weekly roundups better!

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

Creating a Cost-Efficient Amazon ECS Cluster for Scheduled Tasks

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/creating-a-cost-efficient-amazon-ecs-cluster-for-scheduled-tasks/

Madhuri Peri
Sr. DevOps Consultant

When you use Amazon Relational Database Service (Amazon RDS), depending on the logging levels on the RDS instances and the volume of transactions, you could generate a lot of log data. To ensure that everything is running smoothly, many customers search for log error patterns using different log aggregation and visualization systems, such as Amazon Elasticsearch Service, Splunk, or another tool of their choice. A module needs to periodically retrieve the RDS logs using the SDK, and then send them to Amazon S3. From there, you can stream them to your log aggregation tool.

One option is writing an AWS Lambda function to retrieve the log files. However, because of the time that this function needs to execute, depending on the volume of log files retrieved and transferred, it is possible that Lambda could time out in many cases. Another approach is launching an Amazon EC2 instance that runs this job periodically. However, this would require you to run an EC2 instance continuously, which is not an optimal use of time or money.

Using the new Amazon CloudWatch integration with Amazon EC2 Container Service (Amazon ECS), you can trigger this job to run in a container on an existing Amazon ECS cluster. Additionally, this approach allows you to reduce costs by running containers on a fleet of Spot Instances.

In this post, I will show you how to use the new scheduled tasks (cron) feature in Amazon ECS and launch tasks using CloudWatch events, while leveraging Spot Fleet to maximize availability and cost optimization for containerized workloads.

Architecture

The following diagram shows how the various components described schedule a task that retrieves log files from Amazon RDS database instances, and deposits the logs into an S3 bucket.

Amazon ECS cluster container instances are provisioned with Spot Fleet, which is a perfect match for a workload that needs to run whenever it can. This reduces cluster costs.

The task definition defines which Docker image to retrieve from the Amazon EC2 Container Registry (Amazon ECR) repository and run on the Amazon ECS cluster.

The container image has Python code functions to make AWS API calls using boto3. It iterates over the RDS database instances, retrieves the logs, and deposits them in the S3 bucket. Many customers choose these logs to be delivered to their centralized log-store. CloudWatch Events defines the schedule for when the container task has to be launched.
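
The actual code lives in the repository cloned later in this walkthrough, but as a rough sketch of the idea, the loop inside the container might look something like the following; the bucket name is a placeholder (the real one arrives via an environment variable) and error handling is omitted:

import boto3

rds = boto3.client('rds')
s3 = boto3.client('s3')
BUCKET = 'my-rds-log-bucket'  # placeholder; the template passes the real name as an environment variable

def ship_logs():
    # Iterate over every RDS instance, pull each log file in portions, and copy it to S3
    for db in rds.describe_db_instances()['DBInstances']:
        instance_id = db['DBInstanceIdentifier']
        log_files = rds.describe_db_log_files(DBInstanceIdentifier=instance_id)
        for log in log_files['DescribeDBLogFiles']:
            marker, chunks = '0', []
            while True:
                portion = rds.download_db_log_file_portion(
                    DBInstanceIdentifier=instance_id,
                    LogFileName=log['LogFileName'],
                    Marker=marker)
                chunks.append(portion.get('LogFileData') or '')
                if not portion.get('AdditionalDataPending'):
                    break
                marker = portion['Marker']
            s3.put_object(Bucket=BUCKET,
                          Key='{}/{}'.format(instance_id, log['LogFileName']),
                          Body=''.join(chunks).encode('utf-8'))

if __name__ == '__main__':
    ship_logs()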

Walkthrough

To provide the basic framework, we have built an AWS CloudFormation template that creates the following resources:

  • Amazon ECR repository for storing the Docker image to be used in the task definition
  • S3 bucket that holds the transferred logs
  • Task definition, with image name and S3 bucket as environment variables provided via input parameter
  • CloudWatch Events rule
  • Amazon ECS cluster
  • Amazon ECS container instances using Spot Fleet
  • IAM roles required for the container instance profiles

Before you begin

Ensure that Git, Docker, and the AWS CLI are installed on your computer.

In your AWS account, instantiate one Amazon Aurora instance using the console. For more information, see Creating an Amazon Aurora DB Cluster.

Implementation Steps

  1. Clone the code from GitHub that performs RDS API calls to retrieve the log files.
    git clone https://github.com/awslabs/aws-ecs-scheduled-tasks.git
  2. Build and tag the image.
    cd aws-ecs-scheduled-tasks/container-code/src && ls

    Dockerfile		rdslogsshipper.py	requirements.txt

    docker build -t rdslogsshipper .

    Sending build context to Docker daemon 9.728 kB
    Step 1 : FROM python:3
     ---> 41397f4f2887
    Step 2 : WORKDIR /usr/src/app
     ---> Using cache
     ---> 59299c020e7e
    Step 3 : COPY requirements.txt ./
     ---> 8c017e931c3b
    Removing intermediate container df09e1bed9f2
    Step 4 : COPY rdslogsshipper.py /usr/src/app
     ---> 099a49ca4325
    Removing intermediate container 1b1da24a6699
    Step 5 : RUN pip install --no-cache-dir -r requirements.txt
     ---> Running in 3ed98b30901d
    Collecting boto3 (from -r requirements.txt (line 1))
      Downloading boto3-1.4.6-py2.py3-none-any.whl (128kB)
    Collecting botocore (from -r requirements.txt (line 2))
      Downloading botocore-1.6.7-py2.py3-none-any.whl (3.6MB)
    Collecting s3transfer<0.2.0,>=0.1.10 (from boto3->-r requirements.txt (line 1))
      Downloading s3transfer-0.1.10-py2.py3-none-any.whl (54kB)
    Collecting jmespath<1.0.0,>=0.7.1 (from boto3->-r requirements.txt (line 1))
      Downloading jmespath-0.9.3-py2.py3-none-any.whl
    Collecting python-dateutil<3.0.0,>=2.1 (from botocore->-r requirements.txt (line 2))
      Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
    Collecting docutils>=0.10 (from botocore->-r requirements.txt (line 2))
      Downloading docutils-0.14-py3-none-any.whl (543kB)
    Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore->-r requirements.txt (line 2))
      Downloading six-1.10.0-py2.py3-none-any.whl
    Installing collected packages: six, python-dateutil, docutils, jmespath, botocore, s3transfer, boto3
    Successfully installed boto3-1.4.6 botocore-1.6.7 docutils-0.14 jmespath-0.9.3 python-dateutil-2.6.1 s3transfer-0.1.10 six-1.10.0
     ---> f892d3cb7383
    Removing intermediate container 3ed98b30901d
    Step 6 : COPY . .
     ---> ea7550c04fea
    Removing intermediate container b558b3ebd406
    Successfully built ea7550c04fea
  3. Run the CloudFormation stack and get the names for the Amazon ECR repo and S3 bucket. In the stack, choose Outputs.
  4. Open the ECS console and choose Repositories. The rdslogs repo has been created. Choose View Push Commands and follow the instructions to connect to the repository and push the image for the code that you built in Step 2. The screenshot shows the final result:
  5. Associate the CloudWatch scheduled task with the created Amazon ECS Task Definition, using a new CloudWatch event rule that is scheduled to run at intervals. The following rule is scheduled to run every 15 minutes:
    aws --profile default --region us-west-2 events put-rule --name demo-ecs-task-rule  --schedule-expression "rate(15 minutes)"

    {
        "RuleArn": "arn:aws:events:us-west-2:12345678901:rule/demo-ecs-task-rule"
    }
  6. CloudWatch requires IAM permissions to place a task on the Amazon ECS cluster when the CloudWatch event rule is executed, in addition to an IAM role that can be assumed by CloudWatch Events. This is done in three steps:
    1. Create the IAM role to be assumed by CloudWatch.
      aws --profile default --region us-west-2 iam create-role --role-name Test-Role --assume-role-policy-document file://event-role.json

      {
          "Role": {
              "AssumeRolePolicyDocument": {
                  "Version": "2012-10-17", 
                  "Statement": [
                      {
                          "Action": "sts:AssumeRole", 
                          "Effect": "Allow", 
                          "Principal": {
                              "Service": "events.amazonaws.com"
                          }
                      }
                  ]
              }, 
              "RoleId": "AROAIRYYLDCVZCUACT7FS", 
              "CreateDate": "2017-07-14T22:44:52.627Z", 
              "RoleName": "Test-Role", 
              "Path": "/", 
              "Arn": "arn:aws:iam::12345678901:role/Test-Role"
          }
      }

      The following is an example of the event-role.json file used earlier:

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {
                    "Service": "events.amazonaws.com"
                  },
                  "Action": "sts:AssumeRole"
              }
          ]
      }
    2. Create the IAM policy defining the ECS cluster and task definition. You need to get these values from the CloudFormation outputs and resources.
      aws --profile default --region us-west-2 iam create-policy --policy-name test-policy --policy-document file://event-policy.json

      {
          "Policy": {
              "PolicyName": "test-policy", 
              "CreateDate": "2017-07-14T22:51:20.293Z", 
              "AttachmentCount": 0, 
              "IsAttachable": true, 
              "PolicyId": "ANPAI7XDIQOLTBUMDWGJW", 
              "DefaultVersionId": "v1", 
              "Path": "/", 
              "Arn": "arn:aws:iam::123455678901:policy/test-policy", 
              "UpdateDate": "2017-07-14T22:51:20.293Z"
          }
      }

      The following is an example of the event-policy.json file used earlier:

      {
          "Version": "2012-10-17",
          "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ecs:RunTask"
                ],
                "Resource": [
                    "arn:aws:ecs:*::task-definition/"
                ],
                "Condition": {
                    "ArnLike": {
                        "ecs:cluster": "arn:aws:ecs:*::cluster/"
                    }
                }
            }
          ]
      }
    3. Attach the IAM policy to the role.
      aws --profile default --region us-west-2 iam attach-role-policy --role-name Test-Role --policy-arn arn:aws:iam::1234567890:policy/test-policy
  7. Associate the CloudWatch rule created earlier to place the task on the ECS cluster. The following command shows an example. Replace the AWS account ID and region with your settings.
    aws events put-targets --rule demo-ecs-task-rule --targets "Id"="1","Arn"="arn:aws:ecs:us-west-2:12345678901:cluster/test-cwe-blog-ecsCluster-15HJFWCH4SP67","EcsParameters"={"TaskDefinitionArn"="arn:aws:ecs:us-west-2:12345678901:task-definition/test-cwe-blog-taskdef:8"},"RoleArn"="arn:aws:iam::12345678901:role/Test-Role"

    {
        "FailedEntries": [], 
        "FailedEntryCount": 0
    }

That’s it. The log-shipping task now runs on the defined schedule.

To test this, open the Amazon ECS console, select the Amazon ECS cluster that you created, and then choose Tasks, Run New Task. Select the task definition created by the CloudFormation template, and the cluster should be selected automatically. As this runs, the S3 bucket should be populated with the RDS logs for the instance.

Conclusion

In this post, you’ve seen that the choices for workloads that need to run at a scheduled time include Lambda with CloudWatch Events or EC2 with cron. However, sometimes the job could exceed the Lambda execution time limit or not be cost-effective on a dedicated EC2 instance.

In such cases, you can schedule the tasks on an ECS cluster using CloudWatch rules. In addition, you can use a Spot Fleet cluster with Amazon ECS for cost-conscious workloads that do not have hard requirements on execution time or instance availability in the Spot Fleet. For more information, see Powering your Amazon ECS Cluster with Amazon EC2 Spot Instances and Scheduled Events.

If you have questions or suggestions, please comment below.

timeShift(GrafanaBuzz, 1w) Issue 14

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/09/22/timeshiftgrafanabuzz-1w-issue-14/

Summer is officially in the rear-view mirror, but we at Grafana Labs are excited. Next week, the team will gather in Stockholm, Sweden where we’ll be discussing Grafana 5.0, GrafanaCon EU and setting other goals. If you’re attending Percona Live Europe 2017 in Dublin, be sure and catch Grafana developer, Daniel Lee on Tuesday, September 26. He’ll be showing off the new MySQL data source and a sneak peek of Grafana 5.0.

And with that – we hope you enjoy this issue of TimeShift!


Latest Release

Grafana 4.5.2 is now available! Various fixes to the Graphite data source, HTTP API, and templating.

To see details on what’s been fixed in the newest version, please see the release notes.

Download Grafana 4.5.2 Now


From the Blogosphere

A Monitoring Solution for Docker Hosts, Containers and Containerized Services: Stefan was searching for an open source, self-hosted monitoring solution. With an ever-growing number of open source TSDBs, Stefan outlines why he chose Prometheus and provides a rundown of how he’s monitoring his Docker hosts, containers and services.

Real-time API Performance Monitoring with ES, Beats, Logstash and Grafana: As APIs become a centerpiece for businesses, monitoring API performance is extremely important. Hiren recently configured real time API response time monitoring for a project and shares his implementation plan and configurations.

Monitoring SSL Certificate Expiry in GCP and Kubernetes: This article discusses how to use Prometheus and Grafana to automatically monitor SSL certificates in use by load balancers across GCP projects.

Node.js Performance Monitoring with Prometheus: This is a good primer for monitoring in general. It discusses what monitoring is, important signals to know, instrumentation, and things to consider when selecting a monitoring tool.

DIY Dashboard with Grafana and MariaDB: Mark was interested in testing out the new beta MySQL support in Grafana, so he wrote a short article on how he is using Grafana with MariaDB.

Collecting Temperature Data with Raspberry Pi Computers: Many of us use monitoring for tracking mission-critical systems, but setting up environment monitoring can be a fun way to improve your programming skills as well.


GrafanaCon EU CFP is Open

Have a big idea to share? A shorter talk or a demo you’d like to show off? We’re looking for technical and non-technical talks of all sizes. The proposals are rolling in, but we are happy to save a speaking slot for you!

I’d Like to Speak at GrafanaCon


Grafana Plugins

There were a lot of plugin updates to highlight this week, many of which were due to changes in Grafana 4.5. It’s important to keep your plugins up to date, since bug fixes and new features are added frequently. We’ve made the process of installing and updating plugins simple. On an on-prem instance, use the Grafana-cli, or on Hosted Grafana, install and update with 1-click.

NEW PLUGIN

Linksmart HDS Data Source – The LinkSmart Historical Data Store is a new Grafana data source plugin. LinkSmart is an open source IoT platform for developing IoT applications. IoT applications need to deal with large amounts of data produced by a growing number of sensors and other devices. The Historical Datastore is for storing, querying, and aggregating (time-series) sensor data.

Install Now

UPDATED PLUGIN

Simple JSON Data Source – This plugin received a bug fix for the query editor.

Update Now

UPDATED PLUGIN

Stagemonitor Elasticsearch App – Numerous small updates, and the plugin version was bumped to match the Stagemonitor version number.

Update Now

UPDATED PLUGIN

Discrete Panel – Update to fix breaking change in Grafana 4.5.

Update Now

UPDATED PLUGIN

Status Dot Panel – Minor HTML Update in this version.

Update Now

UPDATED PLUGIN

Alarm Box Panel – This panel was updated to fix breaking changes in Grafana 4.5.

Update Now


This week’s MVC (Most Valuable Contributor)

Each week we highlight a contributor to Grafana or the surrounding ecosystem as a thank you for their participation in making open source software great.

Sven Klemm opened a PR for adding a new Postgres data source and has been very quick at implementing proposed changes. The Postgres data source is on our roadmap for Grafana 5.0 so this PR really helps. Thanks Sven!


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove

Glad you’re finding Grafana useful! Curious about that annotation just before midnight 🙂

We Need Your Help

Last week we announced an experiment we were conducting, and need your help! Do you have a graph that you love because the data is beautiful or because the graph provides interesting information? Please get in touch. Tweet or send us an email with a screenshot, and we’ll tell you about this fun experiment.

I Want to Help


Grafana Labs is Hiring!

We are passionate about open source software and thrive on tackling complex challenges to build the future. We ship code from every corner of the globe and love working with the community. If this sounds exciting, you’re in luck – WE’RE HIRING!

Check out our Open Positions


What do you think?

What would you like to see here? Submit a comment on this article below, or post something at our community forum. Help us make these weekly roundups better!

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

Simplify Your Jenkins Builds with AWS CodeBuild

Post Syndicated from Paul Roberts original https://aws.amazon.com/blogs/devops/simplify-your-jenkins-builds-with-aws-codebuild/

Jeff Bezos famously said, “There’s a lot of undifferentiated heavy lifting that stands between your idea and that success.” He went on to say, “…70% of your time, energy, and dollars go into the undifferentiated heavy lifting and only 30% of your energy, time, and dollars gets to go into the core kernel of your idea.”

If you subscribe to this maxim, you should not be spending valuable time focusing on operational issues related to maintaining the Jenkins build infrastructure. Companies such as Riot Games have over 1.25 million builds per year and have written several lengthy blog posts about their experiences designing a complex, custom Docker-powered Jenkins build farm. Dealing with Jenkins slaves at scale is a job in itself and Riot has engineers focused on managing the build infrastructure.

Typical Jenkins Build Farm

 

As with all technology, the Jenkins build farm architectures have evolved. Today, instead of manually building your own container infrastructure, there are Jenkins Docker plugins available to help reduce the operational burden of maintaining these environments. There is also a community-contributed Amazon EC2 Container Service (Amazon ECS) plugin that helps remove some of the overhead, but you still need to configure and manage the overall Amazon ECS environment.

There are various ways to create and manage your Jenkins build farm, but there has to be a way that significantly reduces your operational overhead.

Introducing AWS CodeBuild

AWS CodeBuild is a fully managed build service that removes the undifferentiated heavy lifting of provisioning, managing, and scaling your own build servers. With CodeBuild, there is no software to install, patch, or update. CodeBuild scales up automatically to meet the needs of your development teams. In addition, CodeBuild is an on-demand service where you pay as you go. You are charged based only on the number of minutes it takes to complete your build.

One AWS customer, Recruiterbox, helps companies hire simply and predictably through their software platform. Two years ago, they began feeling the operational pain of maintaining their own Jenkins build farms. They briefly considered moving to Amazon ECS, but chose an even easier path forward instead. Recruiterbox transitioned to using Jenkins with CodeBuild and are very happy with the results. You can read more about their journey here.

Solution Overview: Jenkins and CodeBuild

To remove the heavy lifting from managing your Jenkins build farm, AWS has developed a Jenkins AWS CodeBuild plugin. After the plugin has been enabled, a developer can configure a Jenkins project to pick up new commits from their chosen source code repository and automatically run the associated builds. After the build is successful, it will create an artifact that is stored inside an S3 bucket that you have configured. If an error is detected somewhere, CodeBuild will capture the output and send it to Amazon CloudWatch Logs. In addition to storing the logs in CloudWatch, Jenkins also captures the error so you do not have to go hunting for log files for your build.
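
Because the builds themselves run in CodeBuild, you can also look up build status and the CloudWatch Logs location directly through the CodeBuild API whenever that is convenient. A minimal sketch, assuming a hypothetical project name:

import boto3

codebuild = boto3.client('codebuild')
project = 'my-jenkins-codebuild-project'  # placeholder project name

# Look up the most recent build and where its logs landed in CloudWatch Logs
build_ids = codebuild.list_builds_for_project(projectName=project,
                                              sortOrder='DESCENDING')['ids']
if build_ids:
    build = codebuild.batch_get_builds(ids=build_ids[:1])['builds'][0]
    print(build['buildStatus'])                          # e.g. SUCCEEDED or FAILED
    print(build.get('logs', {}).get('groupName'),
          build.get('logs', {}).get('streamName'))       # CloudWatch Logs location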

 

AWS CodeBuild with Jenkins Plugin

 

The following example uses AWS CodeCommit (Git) as the source control management (SCM) and Amazon S3 for build artifact storage. Logs are stored in CloudWatch. A development pipeline that uses Jenkins with CodeBuild plugin architecture looks something like this:

 

AWS CodeBuild Diagram

Initial Solution Setup

To keep this blog post succinct, I assume that you are using the following components on AWS already and have applied the appropriate IAM policies:

  • AWS CodeCommit repo.
  • Amazon S3 bucket for CodeBuild artifacts.
  • SNS notification for text messaging of the Jenkins admin password.
  • IAM user’s key and secret.
  • A role that has a policy with these permissions. Be sure to edit the ARNs with your region, account, and resource name. Use this role in the AWS CloudFormation template referred to later in this post.

 

Jenkins Installation with CodeBuild Plugin Enabled

To make the integration with Jenkins as frictionless as possible, I have created an AWS CloudFormation template here: https://s3.amazonaws.com/proberts-public/jenkins.yaml. Download the template, sign in to the AWS CloudFormation console, and then use the template to create a stack.

 

CloudFormation Inputs

Jenkins Project Configuration

After the stack is complete, log in to the Jenkins EC2 instance using the user name “admin” and the password sent to your mobile device. Now that you have logged in to Jenkins, you need to create your first project. Start with a Freestyle project and configure the parameters based on your CodeBuild and CodeCommit settings.

 

AWS CodeBuild Plugin Configuration in Jenkins

 

Additional Jenkins AWS CodeBuild Plugin Configuration

 

After you have configured the Jenkins project appropriately, you should be able to check your build status on the Jenkins polling log under your project settings:

 

Jenkins Polling Log

 

Now that Jenkins is polling CodeCommit, you can check the CodeBuild dashboard under your Jenkins project to confirm your build was successful:

Jenkins AWS CodeBuild Dashboard

Wrapping Up

In a matter of minutes, you have been able to provision Jenkins with the AWS CodeBuild plugin. This will greatly simplify your build infrastructure management. Now kick back and relax while CodeBuild does all the heavy lifting!


About the Author

Paul Roberts is a Strategic Solutions Architect for Amazon Web Services. When he is not working on Serverless, DevOps, or Artificial Intelligence, he is often found in Lake Tahoe exploring the various mountain ranges with his family.

Security updates for Tuesday

Post Syndicated from ris original https://lwn.net/Articles/732948/rss

Security updates have been issued by Debian (asterisk and irssi), Fedora (glibc, libidn2, and xen), Gentoo (mcollective), openSUSE (pspp and wireshark), Red Hat (389-ds-base, docker-distribution, kernel-rt, and qemu-kvm-rhev), Scientific Linux (389-ds-base), SUSE (kernel, libzypp, zypper, and xen), and Ubuntu (fontforge and liblouis).

timeShift(GrafanaBuzz, 1w) Issue 10

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/08/25/timeshiftgrafanabuzz-1w-issue-10/

This week, in addition to the articles we collected from around the web and a number of new Plugins and updates, we have a special announcement. GrafanaCon EU has been announced! Join us in Amsterdam March 1-2, 2018. The call for papers is officially open! We’ll keep you up to date as we fill in the details.


Grafana <3 Prometheus

Last week we mentioned that our colleague Carl Bergquist spoke at PromCon 2017 in Munich. His presentation is now available online. We will post the video once it’s available.


From the Blogosphere

Grafana-based GUI for mgstat, a system monitoring tool for InterSystems Caché, Ensemble or HealthShare: This is the second article in a series about Making Prometheus Monitoring for InterSystems Caché. Mikhail goes into great detail about setting this up on Docker, configuring the first dashboard, and adding templating.

Installation and Integration of Grafana in Zabbix 3.x: Daniel put together an installation guide to get Grafana to display metrics from Zabbix, which utilizes the Zabbix Plugin developed by Grafana Labs Developer Alex Zobnin.

Visualize with RRDtool x Grafana: Atfujiwara wanted to update his MRTG graphs from RRDtool. This post talks about the components needed and how he connected RRDtool to Grafana.

Huawei OceanStor metrics in Grafana: Dennis is using Grafana to display metrics for his storage devices. In this post he walks you through the setup and provides a comprehensive dashboard for all the metrics.

Grafana on a Raspberry Pi2: Pete discusses how he uses Grafana with his garden sensors, and walks you through how to get it up and running on a Pi2.


Grafana Plugins

This week was pretty active on the plugin front. Today we’re announcing two brand new plugins and updates to three others. Installing plugins in Grafana is easy – if you have Hosted Grafana, simply use the one-click install; if you’re using an on-prem instance, you can use the Grafana-cli.

NEW PLUGIN

IBM APM Data Source – This plugin collects metrics from the IBM APM (Application Performance Management) products and allows you to visualize it on Grafana dashboards. The plugin supports:

  • IBM Tivoli Monitoring 6.x
  • IBM SmartCloud Application Performance Management 7.x
  • IBM Performance Management 8.x (only on-premises version)

Install Now

NEW PLUGIN

Skydive Data Source – This data source plugin collects metrics from Skydive, an open source real-time network topology and protocols analyzer. Using the Skydive Gremlin query language, you can fetch metrics for flows in your network.

Install now

UPDATED PLUGIN

Datatable Panel – Lots of changes in the latest update to the Datatable Panel. Here are some highlights from the changelog:

  • NEW: Export options for Clipboard/CSV/PDF/Excel/Print
  • NEW: Column Aliasing – modify the name of a column as sent by the datasource
  • NEW: Added option for a cell or row to link to another page
  • NEW: Supports Clickable links inside table
  • BUGFIX: CSS files now load when Grafana has a subpath
  • NEW: Added multi-column sorting – sort by any number of columns ascending/descending
  • NEW: Column width hints – suggest a width for a named column
  • BUGFIX: Columns from datasources other than JSON can now be aliased

Update Now

UPDATED PLUGIN

D3 Gauge Panel – The D3 Gauge Panel has a new feature – Tick Mapping. Ticks on the gauge can now be mapped to text.

Update Now

UPDATED PLUGIN

PNP4Nagios Data Source – The most recent update to the PNP data source adds support for template variables in queries, as well as support for querying warning and critical thresholds.

Update Now


This week’s MVC (Most Valuable Contributor)

Each week we highlight a contributor to Grafana or the surrounding ecosystem as a thank you for their participation in making open source software great.

Brian Gann
Brian is the maintainer of two Grafana plugins, and this week he submitted substantial updates to both of them (the Datatable and D3 Gauge panel plugins) – and he says there’s more to come! Thanks for all your hard work, Brian.


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove

The Dark Knight popping up in graphs seems to be a recurring theme!
This is the graph Jakub deserves, but not the one he needs right now.



What do you think?

That’s it for the 10th issue of timeShift. Let us know how we’re doing! Submit a comment on this article below, or post something at our community forum. Help us make this better!

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

New – AWS SAM Local (Beta) – Build and Test Serverless Applications Locally

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-aws-sam-local-beta-build-and-test-serverless-applications-locally/

Today we’re releasing a beta of a new tool, SAM Local, that makes it easy to build and test your serverless applications locally. In this post we’ll use SAM Local to build, debug, and deploy a quick application that allows us to vote on tabs or spaces by curling an endpoint. AWS introduced the Serverless Application Model (SAM) last year to make it easier for developers to deploy serverless applications. If you’re not already familiar with SAM, my colleague Orr wrote a great post on how to use SAM that you can read in about 5 minutes. At its core, SAM is a powerful open source specification built on AWS CloudFormation that makes it easy to keep your serverless infrastructure as code – and they have the cutest mascot.

SAM Local takes all the good parts of SAM and brings them to your local machine.

There are a couple of ways to install SAM Local but the easiest is through NPM. A quick npm install -g aws-sam-local should get us going but if you want the latest version you can always install straight from the source: go get github.com/awslabs/aws-sam-local (this will create a binary named aws-sam-local, not sam).

I like to vote on things so let’s write a quick SAM application to vote on Spaces versus Tabs. We’ll use a very simple, but powerful, architecture of API Gateway fronting a Lambda function, and we’ll store our results in DynamoDB. In the end, a user should be able to curl our API (curl https://SOMEURL/ -d '{"vote": "spaces"}') and get back the number of votes.

Let’s start by writing a simple SAM template.yaml:

AWSTemplateFormatVersion : '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  VotesTable:
    Type: "AWS::Serverless::SimpleTable"
  VoteSpacesTabs:
    Type: "AWS::Serverless::Function"
    Properties:
      Runtime: python3.6
      Handler: lambda_function.lambda_handler
      Policies: AmazonDynamoDBFullAccess
      Environment:
        Variables:
          TABLE_NAME: !Ref VotesTable
      Events:
        Vote:
          Type: Api
          Properties:
            Path: /
            Method: post

So we create a DynamoDB table (the AWS::Serverless::SimpleTable resource) that we expose to our Lambda function through an environment variable called TABLE_NAME.

To test that this template is valid I’ll go ahead and call sam validate to make sure I haven’t fat-fingered anything. It returns Valid! so let’s go ahead and get to work on our Lambda function.

import os
import json
import boto3
votes_table = boto3.resource('dynamodb').Table(os.getenv('TABLE_NAME'))

def lambda_handler(event, context):
    print(event)
    if event['httpMethod'] == 'GET':
        resp = votes_table.scan()
        return {'body': json.dumps({item['id']: int(item['votes']) for item in resp['Items']})}
    elif event['httpMethod'] == 'POST':
        try:
            body = json.loads(event['body'])
        except:
            return {'statusCode': 400, 'body': 'malformed json input'}
        if 'vote' not in body:
            return {'statusCode': 400, 'body': 'missing vote in request body'}
        if body['vote'] not in ['spaces', 'tabs']:
            return {'statusCode': 400, 'body': 'vote value must be "spaces" or "tabs"'}

        resp = votes_table.update_item(
            Key={'id': body['vote']},
            UpdateExpression='ADD votes :incr',
            ExpressionAttributeValues={':incr': 1},
            ReturnValues='ALL_NEW'
        )
        return {'body': "{} now has {} votes".format(body['vote'], resp['Attributes']['votes'])}
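
Before handing the function to SAM at all, a tiny harness like the following can exercise the handler directly. This is an illustrative addition rather than part of the post’s code, and it assumes TABLE_NAME is exported and names a table that is actually reachable:

# Minimal local smoke test for lambda_handler (illustrative only, not from the post).
# Assumes TABLE_NAME is set and the referenced DynamoDB table exists.
if __name__ == '__main__':
    sample_event = {'httpMethod': 'POST', 'body': '{"vote": "spaces"}'}
    print(lambda_handler(sample_event, None))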

So let’s test this locally. I’ll need to create a real DynamoDB table to talk to and I’ll need to provide the name of that table through the environment variable TABLE_NAME. I could do that with an env.json file or I can just pass it on the command line. First, I can call:
$ echo '{"httpMethod": "POST", "body": "{\"vote\": \"spaces\"}"}' |\
TABLE_NAME="vote-spaces-tabs" sam local invoke "VoteSpacesTabs"

to test the Lambda – it returns the number of votes for spaces so theoretically everything is working. Typing all of that out is a pain so I could generate a sample event with sam local generate-event api and pass that in to the local invocation. Far easier than all of that is just running our API locally. Let’s do that: sam local start-api. Now I can curl my local endpoints to test everything out.
I’ll run the command: $ curl -d '{"vote": "tabs"}' http://127.0.0.1:3000/ and it returns: “tabs now has 12 votes”. Now, of course I did not write this function perfectly on my first try. I edited and saved several times. One of the benefits of hot-reloading is that as I change the function I don’t have to do any additional work to test the new function. This makes iterative development vastly easier.

Let’s say we don’t want to deal with accessing a real DynamoDB database over the network though. What are our options? Well we can download DynamoDB Local and launch it with java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb. Then we can have our Lambda function use the AWS_SAM_LOCAL environment variable to make some decisions about how to behave. Let’s modify our function a bit:

import os
import json
import boto3
if os.getenv("AWS_SAM_LOCAL"):
    votes_table = boto3.resource(
        'dynamodb',
        endpoint_url="http://docker.for.mac.localhost:8000/"
    ).Table("spaces-tabs-votes")
else:
    votes_table = boto3.resource('dynamodb').Table(os.getenv('TABLE_NAME'))

Now we’re using a local endpoint to connect to our local database which makes working without wifi a little easier.
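
One detail worth calling out: DynamoDB Local starts out empty, so the spaces-tabs-votes table referenced above has to be created before the function can use it. Here is a hedged sketch of doing that with boto3; the endpoint, dummy credentials, and key schema are assumptions inferred from how the handler uses the id attribute, not something spelled out in the original post.

# Illustrative sketch: create the table expected by the AWS_SAM_LOCAL branch above.
# Endpoint, dummy credentials, and key schema are assumptions for local testing.
import boto3

local_dynamodb = boto3.resource(
    'dynamodb',
    endpoint_url='http://localhost:8000/',
    region_name='us-east-1',
    aws_access_key_id='local',
    aws_secret_access_key='local'
)
local_dynamodb.create_table(
    TableName='spaces-tabs-votes',
    KeySchema=[{'AttributeName': 'id', 'KeyType': 'HASH'}],
    AttributeDefinitions=[{'AttributeName': 'id', 'AttributeType': 'S'}],
    ProvisionedThroughput={'ReadCapacityUnits': 1, 'WriteCapacityUnits': 1}
)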

SAM local even supports interactive debugging! In Java and Node.js I can just pass the -d flag and a port to immediately enable the debugger. For Python I could use a library like import epdb; epdb.serve() and connect that way. Then we can call sam local invoke -d 8080 "VoteSpacesTabs" and our function will pause execution waiting for you to step through with the debugger.
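
For the Python case, the hook has to live inside the function itself. A minimal sketch, assuming the epdb package is installed in the environment SAM Local uses, might look like the following; the placement is illustrative and not part of the voting function above.

# Illustrative only: pause a locally invoked function until a debugger attaches.
# Assumes the epdb package is installed; epdb.serve() blocks until a client connects.
import epdb

def lambda_handler(event, context):
    epdb.serve()  # execution stops here until a remote debugger connects
    # ... the rest of the handler runs once the debugger resumes execution ...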

Alright, I think we’ve got everything working so let’s deploy this!

First I’ll call the sam package command which is just an alias for aws cloudformation package and then I’ll use the result of that command to sam deploy.

$ sam package --template-file template.yaml --s3-bucket MYAWESOMEBUCKET --output-template-file package.yaml
Uploading to 144e47a4a08f8338faae894afe7563c3  90570 / 90570.0  (100.00%)
Successfully packaged artifacts and wrote output template to file package.yaml.
Execute the following command to deploy the packaged template
aws cloudformation deploy --template-file package.yaml --stack-name 
$ sam deploy --template-file package.yaml --stack-name VoteForSpaces --capabilities CAPABILITY_IAM
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - VoteForSpaces

Which brings us to our API:

I’m going to hop over into the production stage and add some rate limiting in case you guys start voting a lot – but otherwise we’ve taken our local work and deployed it to the cloud without much effort at all. I always enjoy it when things work on the first deploy!

You can vote now and watch the results live! http://spaces-or-tabs.s3-website-us-east-1.amazonaws.com/

We hope that SAM Local makes it easier for you to test, debug, and deploy your serverless apps. We have a CONTRIBUTING.md guide and we welcome pull requests. Please tweet at us to let us know what cool things you build. You can see our What’s New post here and the documentation is live here.

Randall

Deploying an NGINX Reverse Proxy Sidecar Container on Amazon ECS

Post Syndicated from Nathan Peck original https://aws.amazon.com/blogs/compute/nginx-reverse-proxy-sidecar-container-on-amazon-ecs/

Reverse proxies are a powerful software architecture primitive for fetching resources from a server on behalf of a client. They serve a number of purposes, from protecting servers from unwanted traffic to offloading some of the heavy lifting of HTTP traffic processing.

This post explains the benefits of a reverse proxy, and shows how to use NGINX and Amazon EC2 Container Service (Amazon ECS) to easily implement and deploy a reverse proxy for your containerized application.

Components

NGINX is a high performance HTTP server that has achieved significant adoption because of its asynchronous event driven architecture. It can serve thousands of concurrent requests with a low memory footprint. This efficiency also makes it ideal as a reverse proxy.

Amazon ECS is a highly scalable, high performance container management service that supports Docker containers. It allows you to run applications easily on a managed cluster of Amazon EC2 instances. Amazon ECS helps you get your application components running on instances according to a specified configuration. It also helps scale out these components across an entire fleet of instances.

Sidecar containers are a common software pattern that has been embraced by engineering organizations. It’s a way to keep server side architecture easier to understand by building with smaller, modular containers that each serve a simple purpose. Just like an application can be powered by multiple microservices, each microservice can also be powered by multiple containers that work together. A sidecar container is simply a way to move part of the core responsibility of a service out into a containerized module that is deployed alongside a core application container.

The following diagram shows how an NGINX reverse proxy sidecar container operates alongside an application server container:

In this architecture, Amazon ECS has deployed two copies of an application stack that is made up of an NGINX reverse proxy sidecar container and an application container. Web traffic from the public goes to an Application Load Balancer, which then distributes the traffic to one of the NGINX reverse proxy sidecars. The NGINX reverse proxy then forwards the request to the application server and returns its response to the client via the load balancer.

Reverse proxy for security

Security is one reason for using a reverse proxy in front of an application container. Any web server that serves resources to the public can expect to receive lots of unwanted traffic every day. Some of this traffic is relatively benign scans by researchers and tools, such as Shodan or nmap:

[18/May/2017:15:10:10 +0000] "GET /YesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScann HTTP/1.1" 404 1389 - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
[18/May/2017:18:19:51 +0000] "GET /clientaccesspolicy.xml HTTP/1.1" 404 322 - Cloud mapping experiment. Contact [email protected]

But other traffic is much more malicious. For example, here is what a web server sees while being scanned by the hacking tool ZmEu, which scans web servers trying to find PHPMyAdmin installations to exploit:

[18/May/2017:16:27:39 +0000] "GET /mysqladmin/scripts/setup.php HTTP/1.1" 404 391 - ZmEu
[18/May/2017:16:27:39 +0000] "GET /web/phpMyAdmin/scripts/setup.php HTTP/1.1" 404 394 - ZmEu
[18/May/2017:16:27:39 +0000] "GET /xampp/phpmyadmin/scripts/setup.php HTTP/1.1" 404 396 - ZmEu
[18/May/2017:16:27:40 +0000] "GET /apache-default/phpmyadmin/scripts/setup.php HTTP/1.1" 404 405 - ZmEu
[18/May/2017:16:27:40 +0000] "GET /phpMyAdmin-2.10.0.0/scripts/setup.php HTTP/1.1" 404 397 - ZmEu
[18/May/2017:16:27:40 +0000] "GET /mysql/scripts/setup.php HTTP/1.1" 404 386 - ZmEu
[18/May/2017:16:27:41 +0000] "GET /admin/scripts/setup.php HTTP/1.1" 404 386 - ZmEu
[18/May/2017:16:27:41 +0000] "GET /forum/phpmyadmin/scripts/setup.php HTTP/1.1" 404 396 - ZmEu
[18/May/2017:16:27:41 +0000] "GET /typo3/phpmyadmin/scripts/setup.php HTTP/1.1" 404 396 - ZmEu
[18/May/2017:16:27:42 +0000] "GET /phpMyAdmin-2.10.0.1/scripts/setup.php HTTP/1.1" 404 399 - ZmEu
[18/May/2017:16:27:44 +0000] "GET /administrator/components/com_joommyadmin/phpmyadmin/scripts/setup.php HTTP/1.1" 404 418 - ZmEu
[18/May/2017:18:34:45 +0000] "GET /phpmyadmin/scripts/setup.php HTTP/1.1" 404 390 - ZmEu
[18/May/2017:16:27:45 +0000] "GET /w00tw00t.at.blackhats.romanian.anti-sec:) HTTP/1.1" 404 401 - ZmEu

In addition, servers can also end up receiving unwanted web traffic that is intended for another server. In a cloud environment, an application may end up reusing an IP address that was formerly connected to another service. It’s common for misconfigured or misbehaving DNS servers to send traffic intended for a different host to an IP address now connected to your server.

It’s the responsibility of anyone running a web server to handle and reject potentially malicious traffic or unwanted traffic. Ideally, the web server can reject this traffic as early as possible, before it actually reaches the core application code. A reverse proxy is one way to provide this layer of protection for an application server. It can be configured to reject these requests before they reach the application server.

Reverse proxy for performance

Another advantage of using a reverse proxy such as NGINX is that it can be configured to offload some heavy lifting from your application container. For example, every HTTP server should support gzip. Whenever a client requests gzip encoding, the server compresses the response before sending it back to the client. This compression saves network bandwidth, which also improves speed for clients who now don’t have to wait as long for a response to fully download.

NGINX can be configured to accept a plaintext response from your application container and gzip encode it before sending it down to the client. This allows your application container to focus 100% of its CPU allotment on running business logic, while NGINX handles the encoding with its efficient gzip implementation.

An application may have security concerns that require SSL termination at the instance level instead of at the load balancer. NGINX can also be configured to terminate SSL before proxying the request to a local application container. Again, this also removes some CPU load from the application container, allowing it to focus on running business logic. It also gives you a cleaner way to patch any SSL vulnerabilities or update SSL certificates by updating the NGINX container without needing to change the application container.

NGINX configuration

The following NGINX configuration handles both traffic filtering and gzip encoding:

http {
  # NGINX will handle gzip compression of responses from the app server
  gzip on;
  gzip_proxied any;
  gzip_types text/plain application/json;
  gzip_min_length 1000;
 
  server {
    listen 80;
 
    # NGINX will reject anything not matching /api
    location /api {
      # Reject requests with unsupported HTTP method
      if ($request_method !~ ^(GET|POST|HEAD|OPTIONS|PUT|DELETE)$) {
        return 405;
      }
 
      # Only requests matching the whitelist expectations will
      # get sent to the application server
      proxy_pass http://app:3000;
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection 'upgrade';
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_cache_bypass $http_upgrade;
    }
  }
}

The above configuration only accepts traffic that matches the expression /api and has a recognized HTTP method. If the traffic matches, it is forwarded to a local application container accessible at the local hostname app. If the client requested gzip encoding, the plaintext response from that application container is gzip-encoded.
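
To convince yourself the filter behaves as intended, you can probe the proxy from the container instance (or through the load balancer). The sketch below is purely illustrative; the hostname, port, and paths are assumptions about how the NGINX container is exposed.

# Illustrative probe of the reverse proxy; host, port, and paths are assumptions.
import urllib.error
import urllib.request

def status_of(path):
    try:
        with urllib.request.urlopen('http://localhost' + path, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

print(status_of('/api'))         # whitelisted: forwarded to the application container
print(status_of('/phpmyadmin'))  # rejected by NGINX before reaching the application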

Amazon ECS configuration

Configuring ECS to run this NGINX container as a sidecar is also simple. ECS uses a core primitive called the task definition. Each task definition can include one or more containers, which can be linked to each other:

 {
  "containerDefinitions": [
     {
       "name": "nginx",
       "image": "<NGINX reverse proxy image URL here>",
       "memory": "256",
       "cpu": "256",
       "essential": true,
       "portMappings": [
         {
           "containerPort": "80",
           "protocol": "tcp"
         }
       ],
       "links": [
         "app"
       ]
     },
     {
       "name": "app",
       "image": "<app image URL here>",
       "memory": "256",
       "cpu": "256",
       "essential": true
     }
   ],
   "networkMode": "bridge",
   "family": "application-stack"
}

This task definition causes ECS to start both an NGINX container and an application container on the same instance. Then, the NGINX container is linked to the application container. This allows the NGINX container to send traffic to the application container using the hostname app.

The NGINX container has a port mapping that exposes port 80 on a publicly accessible port, but the application container does not. This means that the application container is not directly addressable. The only way to send it traffic is to send traffic to the NGINX container, which filters that traffic down. It only forwards traffic to the application container if that traffic passes the whitelisted rules.
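
If you prefer to register the task definition and create the service programmatically rather than through the console, a hedged boto3 sketch is shown below; the file name, cluster name, service name, and desired count are assumptions rather than values from this post.

# Illustrative registration of the task definition above and creation of a service.
# File name, cluster name, service name, and desired count are assumptions.
import json
import boto3

ecs = boto3.client('ecs')

with open('task-definition.json') as f:
    task_definition = json.load(f)

ecs.register_task_definition(**task_definition)
ecs.create_service(
    cluster='default',
    serviceName='application-stack',
    taskDefinition='application-stack',
    desiredCount=2
)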

Conclusion

Running a sidecar container such as NGINX can bring significant benefits by making it easier to provide protection for application containers. Sidecar containers also improve performance by freeing your application container from various CPU intensive tasks. Amazon ECS makes it easy to run sidecar containers, and automate their deployment across your cluster.

To see the full code for this NGINX sidecar reference, or to try it out yourself, you can check out the open source NGINX reverse proxy reference architecture on GitHub.

– Nathan
 @nathankpeck

Create Multiple Builds from the Same Source Using Different AWS CodeBuild Build Specification Files

Post Syndicated from Prakash Palanisamy original https://aws.amazon.com/blogs/devops/create-multiple-builds-from-the-same-source-using-different-aws-codebuild-build-specification-files/

In June 2017, AWS CodeBuild announced you can now specify an alternate build specification file name or location in an AWS CodeBuild project.

In this post, I’ll show you how to use different build specification files in the same repository to create different builds. You’ll find the source code for this post in our GitHub repo.

Requirements

The AWS CLI must be installed and configured.

Solution Overview

I have created a C program (cbsamplelib.c) that will be used to create a shared library and another utility program (cbsampleutil.c) to use that library. I’ll use a Makefile to compile these files.

I need to put this sample application in RPM and DEB packages so end users can easily deploy them. I have created a build specification file for RPM. It will use make to compile this code and the RPM specification file (cbsample.rpmspec) configured in the build specification to create the RPM package. Similarly, I have created a build specification file for DEB. It will create the DEB package based on the control specification file (cbsample.control) configured in this build specification.

RPM Build Project:

The following build specification file (buildspec-rpm.yml) uses build specification version 0.2. As described in the documentation, this version has different syntax for environment variables. This build specification includes multiple phases:

  • As part of the install phase, the required packages are installed using yum.
  • During the pre_build phase, the required directories are created and the required files, including the RPM build specification file, are copied to the appropriate location.
  • During the build phase, the code is compiled, and then the RPM package is created based on the RPM specification.

As defined in the artifacts section, the RPM file will be uploaded as a build artifact.

version: 0.2

env:
  variables:
    build_version: "0.1"

phases:
  install:
    commands:
      - yum install rpm-build make gcc glibc -y
  pre_build:
    commands:
      - curr_working_dir=`pwd`
      - mkdir -p ./{RPMS,SRPMS,BUILD,SOURCES,SPECS,tmp}
      - filename="cbsample-$build_version"
      - echo $filename
      - mkdir -p $filename
      - cp ./*.c ./*.h Makefile $filename
      - tar -zcvf /root/$filename.tar.gz $filename
      - cp /root/$filename.tar.gz ./SOURCES/
      - cp cbsample.rpmspec ./SPECS/
  build:
    commands:
      - echo "Triggering RPM build"
      - rpmbuild --define "_topdir `pwd`" -ba SPECS/cbsample.rpmspec
      - cd $curr_working_dir

artifacts:
  files:
    - RPMS/x86_64/cbsample*.rpm
  discard-paths: yes

Using cb-centos-project.json as a reference, create the input JSON file for the CLI command. This project uses an AWS CodeCommit repository named codebuild-multispec and a file named buildspec-rpm.yml as the build specification file. To create the RPM package, we need to specify a custom image name. I’m using the latest CentOS 7 image available in the Docker Hub. I’m using a role named CodeBuildServiceRole. It contains permissions similar to those defined in CodeBuildServiceRole.json. (You need to change the resource fields in the policy, as appropriate.)

{
    "name": "rpm-build-project",
    "description": "Project which will build RPM from the source.",
    "source": {
        "type": "CODECOMMIT",
        "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec",
        "buildspec": "buildspec-rpm.yml"
    },
    "artifacts": {
        "type": "S3",
        "location": "codebuild-demo-artifact-repository"
    },
    "environment": {
        "type": "LINUX_CONTAINER",
        "image": "centos:7",
        "computeType": "BUILD_GENERAL1_SMALL"
    },
    "serviceRole": "arn:aws:iam::012345678912:role/service-role/CodeBuildServiceRole",
    "timeoutInMinutes": 15,
    "encryptionKey": "arn:aws:kms:eu-west-1:012345678912:alias/aws/s3",
    "tags": [
        {
            "key": "Name",
            "value": "RPM Demo Build"
        }
    ]
}

After the cli-input-json file is ready, execute the following command to create the build project.

$ aws codebuild create-project --name CodeBuild-RPM-Demo --cli-input-json file://cb-centos-project.json

{
    "project": {
        "name": "CodeBuild-RPM-Demo", 
        "serviceRole": "arn:aws:iam::012345678912:role/service-role/CodeBuildServiceRole", 
        "tags": [
            {
                "value": "RPM Demo Build", 
                "key": "Name"
            }
        ], 
        "artifacts": {
            "namespaceType": "NONE", 
            "packaging": "NONE", 
            "type": "S3", 
            "location": "codebuild-demo-artifact-repository", 
            "name": "CodeBuild-RPM-Demo"
        }, 
        "lastModified": 1500559811.13, 
        "timeoutInMinutes": 15, 
        "created": 1500559811.13, 
        "environment": {
            "computeType": "BUILD_GENERAL1_SMALL", 
            "privilegedMode": false, 
            "image": "centos:7", 
            "type": "LINUX_CONTAINER", 
            "environmentVariables": []
        }, 
        "source": {
            "buildspec": "buildspec-rpm.yml", 
            "type": "CODECOMMIT", 
            "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec"
        }, 
        "encryptionKey": "arn:aws:kms:eu-west-1:012345678912:alias/aws/s3", 
        "arn": "arn:aws:codebuild:eu-west-1:012345678912:project/CodeBuild-RPM-Demo", 
        "description": "Project which will build RPM from the source."
    }
}

When the project is created, run the following command to start the build. After the build has started, get the build ID. You can use the build ID to get the status of the build.

$ aws codebuild start-build --project-name CodeBuild-RPM-Demo
{
    "build": {
        "buildComplete": false, 
        "initiator": "prakash", 
        "artifacts": {
            "location": "arn:aws:s3:::codebuild-demo-artifact-repository/CodeBuild-RPM-Demo"
        }, 
        "projectName": "CodeBuild-RPM-Demo", 
        "timeoutInMinutes": 15, 
        "buildStatus": "IN_PROGRESS", 
        "environment": {
            "computeType": "BUILD_GENERAL1_SMALL", 
            "privilegedMode": false, 
            "image": "centos:7", 
            "type": "LINUX_CONTAINER", 
            "environmentVariables": []
        }, 
        "source": {
            "buildspec": "buildspec-rpm.yml", 
            "type": "CODECOMMIT", 
            "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec"
        }, 
        "currentPhase": "SUBMITTED", 
        "startTime": 1500560156.761, 
        "id": "CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc", 
        "arn": "arn:aws:codebuild:eu-west-1: 012345678912:build/CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc"
    }
}

$ aws codebuild list-builds-for-project --project-name CodeBuild-RPM-Demo
{
    "ids": [
        "CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc"
    ]
}

$ aws codebuild batch-get-builds --ids CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc
{
    "buildsNotFound": [], 
    "builds": [
        {
            "buildComplete": true, 
            "phases": [
                {
                    "phaseStatus": "SUCCEEDED", 
                    "endTime": 1500560157.164, 
                    "phaseType": "SUBMITTED", 
                    "durationInSeconds": 0, 
                    "startTime": 1500560156.761
                }, 
                {
                    "contexts": [], 
                    "phaseType": "PROVISIONING", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 24, 
                    "startTime": 1500560157.164, 
                    "endTime": 1500560182.066
                }, 
                {
                    "contexts": [], 
                    "phaseType": "DOWNLOAD_SOURCE", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 15, 
                    "startTime": 1500560182.066, 
                    "endTime": 1500560197.906
                }, 
                {
                    "contexts": [], 
                    "phaseType": "INSTALL", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 19, 
                    "startTime": 1500560197.906, 
                    "endTime": 1500560217.515
                }, 
                {
                    "contexts": [], 
                    "phaseType": "PRE_BUILD", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 0, 
                    "startTime": 1500560217.515, 
                    "endTime": 1500560217.662
                }, 
                {
                    "contexts": [], 
                    "phaseType": "BUILD", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 0, 
                    "startTime": 1500560217.662, 
                    "endTime": 1500560217.995
                }, 
                {
                    "contexts": [], 
                    "phaseType": "POST_BUILD", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 0, 
                    "startTime": 1500560217.995, 
                    "endTime": 1500560218.074
                }, 
                {
                    "contexts": [], 
                    "phaseType": "UPLOAD_ARTIFACTS", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 0, 
                    "startTime": 1500560218.074, 
                    "endTime": 1500560218.542
                }, 
                {
                    "contexts": [], 
                    "phaseType": "FINALIZING", 
                    "phaseStatus": "SUCCEEDED", 
                    "durationInSeconds": 4, 
                    "startTime": 1500560218.542, 
                    "endTime": 1500560223.128
                }, 
                {
                    "phaseType": "COMPLETED", 
                    "startTime": 1500560223.128
                }
            ], 
            "logs": {
                "groupName": "/aws/codebuild/CodeBuild-RPM-Demo", 
                "deepLink": "https://console.aws.amazon.com/cloudwatch/home?region=eu-west-1#logEvent:group=/aws/codebuild/CodeBuild-RPM-Demo;stream=57a36755-4d37-4b08-9c11-1468e1682abc", 
                "streamName": "57a36755-4d37-4b08-9c11-1468e1682abc"
            }, 
            "artifacts": {
                "location": "arn:aws:s3:::codebuild-demo-artifact-repository/CodeBuild-RPM-Demo"
            }, 
            "projectName": "CodeBuild-RPM-Demo", 
            "timeoutInMinutes": 15, 
            "initiator": "prakash", 
            "buildStatus": "SUCCEEDED", 
            "environment": {
                "computeType": "BUILD_GENERAL1_SMALL", 
                "privilegedMode": false, 
                "image": "centos:7", 
                "type": "LINUX_CONTAINER", 
                "environmentVariables": []
            }, 
            "source": {
                "buildspec": "buildspec-rpm.yml", 
                "type": "CODECOMMIT", 
                "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec"
            }, 
            "currentPhase": "COMPLETED", 
            "startTime": 1500560156.761, 
            "endTime": 1500560223.128, 
            "id": "CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc", 
            "arn": "arn:aws:codebuild:eu-west-1:012345678912:build/CodeBuild-RPM-Demo:57a36755-4d37-4b08-9c11-1468e1682abc"
        }
    ]
}

DEB Build Project:

In this project, we will use the build specification file named buildspec-deb.yml. Like the RPM build project, this specification includes multiple phases. Here I use a Debian control file to create the package in DEB format. After a successful build, the DEB package will be uploaded as build artifact.

version: 0.2

env:
  variables:
    build_version: "0.1"

phases:
  install:
    commands:
      - apt-get install gcc make -y
  pre_build:
    commands:
      - mkdir -p ./cbsample-$build_version/DEBIAN
      - mkdir -p ./cbsample-$build_version/usr/lib
      - mkdir -p ./cbsample-$build_version/usr/include
      - mkdir -p ./cbsample-$build_version/usr/bin
      - cp -f cbsample.control ./cbsample-$build_version/DEBIAN/control
  build:
    commands:
      - echo "Building the application"
      - make
      - cp libcbsamplelib.so ./cbsample-$build_version/usr/lib
      - cp cbsamplelib.h ./cbsample-$build_version/usr/include
      - cp cbsampleutil ./cbsample-$build_version/usr/bin
      - chmod +x ./cbsample-$build_version/usr/bin/cbsampleutil
      - dpkg-deb --build ./cbsample-$build_version

artifacts:
  files:
    - cbsample-*.deb

Here we use cb-ubuntu-project.json as a reference to create the CLI input JSON file. This project uses the same AWS CodeCommit repository (codebuild-multispec) but a different buildspec file in the same repository (buildspec-deb.yml). We use the default CodeBuild image to create the DEB package. We use the same IAM role (CodeBuildServiceRole).

{
    "name": "deb-build-project",
    "description": "Project which will build DEB from the source.",
    "source": {
        "type": "CODECOMMIT",
        "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec",
        "buildspec": "buildspec-deb.yml"
    },
    "artifacts": {
        "type": "S3",
        "location": "codebuild-demo-artifact-repository"
    },
    "environment": {
        "type": "LINUX_CONTAINER",
        "image": "aws/codebuild/ubuntu-base:14.04",
        "computeType": "BUILD_GENERAL1_SMALL"
    },
    "serviceRole": "arn:aws:iam::012345678912:role/service-role/CodeBuildServiceRole",
    "timeoutInMinutes": 15,
    "encryptionKey": "arn:aws:kms:eu-west-1:012345678912:alias/aws/s3",
    "tags": [
        {
            "key": "Name",
            "value": "Debian Demo Build"
        }
    ]
}

Using the CLI input JSON file, create the project, start the build, and check the status of the project.

$ aws codebuild create-project --name CodeBuild-DEB-Demo --cli-input-json file://cb-ubuntu-project.json

$ aws codebuild list-builds-for-project --project-name CodeBuild-DEB-Demo

$ aws codebuild batch-get-builds --ids CodeBuild-DEB-Demo:e535c4b0-7067-4fbe-8060-9bb9de203789

After successful completion of the RPM and DEB builds, check the S3 bucket configured in the artifacts section for the build packages. Each build project creates a directory with the name of the build project and copies the artifacts inside it.

$ aws s3 ls s3://codebuild-demo-artifact-repository/CodeBuild-RPM-Demo/
2017-07-20 16:16:59       8108 cbsample-0.1-1.el7.centos.x86_64.rpm

$ aws s3 ls s3://codebuild-demo-artifact-repository/CodeBuild-DEB-Demo/
2017-07-20 16:37:22       5420 cbsample-0.1.deb

Override Buildspec During Build Start:

It’s also possible to override the build specification file of an existing project when starting a build. If we want to create the libs RPM package instead of the whole RPM, we will use the build specification file named buildspec-libs-rpm.yml. This build specification file is similar to the earlier RPM build. The only difference is that it uses a different RPM specification file to create libs RPM.

version: 0.2

env:
  variables:
    build_version: "0.1"

phases:
  install:
    commands:
      - yum install rpm-build make gcc glibc -y
  pre_build:
    commands:
      - curr_working_dir=`pwd`
      - mkdir -p ./{RPMS,SRPMS,BUILD,SOURCES,SPECS,tmp}
      - filename="cbsample-libs-$build_version"
      - echo $filename
      - mkdir -p $filename
      - cp ./*.c ./*.h Makefile $filename
      - tar -zcvf /root/$filename.tar.gz $filename
      - cp /root/$filename.tar.gz ./SOURCES/
      - cp cbsample-libs.rpmspec ./SPECS/
  build:
    commands:
      - echo "Triggering RPM build"
      - rpmbuild --define "_topdir `pwd`" -ba SPECS/cbsample-libs.rpmspec
      - cd $curr_working_dir

artifacts:
  files:
    - RPMS/x86_64/cbsample-libs*.rpm
  discard-paths: yes

Using the same RPM build project that we created earlier, start a new build and set the value of the `--buildspec-override` parameter to buildspec-libs-rpm.yml.

$ aws codebuild start-build --project-name CodeBuild-RPM-Demo --buildspec-override buildspec-libs-rpm.yml
{
    "build": {
        "buildComplete": false, 
        "initiator": "prakash", 
        "artifacts": {
            "location": "arn:aws:s3:::codebuild-demo-artifact-repository/CodeBuild-RPM-Demo"
        }, 
        "projectName": "CodeBuild-RPM-Demo", 
        "timeoutInMinutes": 15, 
        "buildStatus": "IN_PROGRESS", 
        "environment": {
            "computeType": "BUILD_GENERAL1_SMALL", 
            "privilegedMode": false, 
            "image": "centos:7", 
            "type": "LINUX_CONTAINER", 
            "environmentVariables": []
        }, 
        "source": {
            "buildspec": "buildspec-libs-rpm.yml", 
            "type": "CODECOMMIT", 
            "location": "https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/codebuild-multispec"
        }, 
        "currentPhase": "SUBMITTED", 
        "startTime": 1500562366.239, 
        "id": "CodeBuild-RPM-Demo:82d05f8a-b161-401c-82f0-83cb41eba567", 
        "arn": "arn:aws:codebuild:eu-west-1:012345678912:build/CodeBuild-RPM-Demo:82d05f8a-b161-401c-82f0-83cb41eba567"
    }
}
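
The same override is available through the SDKs. As a hedged example, the boto3 equivalent of the CLI call above might look like the following; it assumes your region and credentials are already configured.

# Illustrative boto3 equivalent of the start-build call with a buildspec override.
import boto3

codebuild = boto3.client('codebuild', region_name='eu-west-1')
response = codebuild.start_build(
    projectName='CodeBuild-RPM-Demo',
    buildspecOverride='buildspec-libs-rpm.yml'
)
print(response['build']['id'], response['build']['buildStatus'])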

After the build is completed successfully, check to see if the package appears in the artifact S3 bucket under the CodeBuild-RPM-Demo build project folder.

$ aws s3 ls s3://codebuild-demo-artifact-repository/CodeBuild-RPM-Demo/
2017-07-20 16:16:59       8108 cbsample-0.1-1.el7.centos.x86_64.rpm
2017-07-20 16:53:54       5320 cbsample-libs-0.1-1.el7.centos.x86_64.rpm

Conclusion

In this post, I have shown you how multiple buildspec files in the same source repository can be used to run multiple AWS CodeBuild build projects. I have also shown you how to provide a different buildspec file when starting the build.

For more information about AWS CodeBuild, see the AWS CodeBuild documentation. You can get started with AWS CodeBuild by using this step by step guide.


About the author

Prakash Palanisamy is a Solutions Architect for Amazon Web Services. When he is not working on Serverless, DevOps or Alexa, he will be solving problems in Project Euler. He also enjoys watching educational documentaries.

Security updates for Thursday

Post Syndicated from jake original https://lwn.net/Articles/729041/rss

Security updates have been issued by Arch Linux (lib32-expat, webkit2gtk, and wireshark-cli), Debian (resiprocate), Fedora (java-1.8.0-openjdk, kernel, and open-vm-tools), openSUSE (containerd, docker, runc and gnu-efi, pesign, shim), Red Hat (tomcat), and Ubuntu (gdb, libiberty, and openjdk-8).

Running an elastic HiveMQ cluster with auto discovery on AWS

Post Syndicated from The HiveMQ Team original http://www.hivemq.com/blog/running-hivemq-cluster-aws-auto-discovery

HiveMQ is a cloud-first MQTT broker with elastic clustering capabilities and a resilient software design, which makes it a perfect fit for common cloud infrastructures. A previous blog post discussed the benefits an MQTT broker cluster offers. Today’s post aims to be more practical and talks about how to set up HiveMQ on one of the most popular cloud computing platforms: Amazon Web Services.

Running HiveMQ on cloud infrastructure

Running a HiveMQ cluster on cloud infrastructure like AWS not only offers the possibility of elastically scaling the infrastructure, it also assures that state-of-the-art security standards are in place on the infrastructure side. These platforms are typically highly available, and new virtual machines can be spawned in a snap if they are needed. HiveMQ’s unique ability to add (and remove) cluster nodes at runtime without any manual reconfiguration of the cluster allows it to scale linearly on IaaS providers. New cluster nodes can be started (manually or automatically) and the cluster size adapts automatically. For more detailed information about HiveMQ clustering and how to achieve true high availability and linear scalability with HiveMQ, we recommend reading the HiveMQ Clustering Paper.

As Amazon Web Services is amongst the best known and most used cloud platforms, we want to illustrate the setup of a HiveMQ cluster on AWS in this post. Note that the concepts shown in this step-by-step guide for running an elastic HiveMQ cluster on AWS also apply to other cloud platforms such as Microsoft Azure or Google Cloud Platform.

Setup and Configuration

Amazon Web Services prohibits the use of UDP multicast, which is the default HiveMQ cluster discovery mode. The use of Amazon Simple Storage Service (S3) buckets for auto-discovery is a perfect alternative if the brokers are running on AWS EC2 instances anyway. HiveMQ has a free off-the-shelf plugin available for AWS S3 Cluster Discovery.

The following provides a step-by-step guide on how to set up the brokers on AWS EC2 with automatic cluster member discovery via S3.

Setup Security Group

The first step is creating a security group that allows inbound traffic to the listeners we are going to configure for MQTT communication. It is also vital to have SSH access to the instances. After you create the security group, you need to edit it and add an additional rule for internal communication between the cluster nodes (meaning the source is the security group itself) on all TCP ports.

To create and edit security groups go to the EC2 console – NETWORK & SECURITY – Security Groups

Inbound traffic

Outbound traffic

The next step is to create an s3-bucket in the s3 console. Make sure to choose a region close to the region where you want to run your HiveMQ instances.

Option A: Create IAM role and assign to EC2 instance

Our recommendation is to configure your EC2 instances in a way that allows them to have access to the s3 bucket. This way you don’t need to create a specific user and don’t need to use the user’s credentials in the s3discovery.properties file.

Create IAM Role

EC2 Instance Role Type

Select S3 Full Access

Assign new Role to Instance

Option B: Create user and assign IAM policy

The next step is creating a user in the IAM console.

Choose name and set programmatic access

Assign s3 full access role

Review and create

Download credentials

It is important you store these credentials, as they will be needed later for configuring the S3 Cluster Discovery Plugin.

Start EC2 instances with HiveMQ

The next step is spawning 2 or more EC2 instances with HiveMQ. Follow the steps in the HiveMQ User Guide.

Install s3 discovery plugin

The final step is downloading, installing and configuring the S3 Cluster Discovery Plugin.
After you download the plugin, you need to configure the s3 access in the s3discovery.properties file according to which s3 access option you chose.

Option A:

############################################################
# AWS Credentials                                          #
############################################################

#
# Use environment variables to specify your AWS credentials
# the following variables need to be set:
# AWS_ACCESS_KEY_ID
# AWS_SECRET_ACCESS_KEY
#
#credentials-type:environment_variables

#
# Use Java system properties to specify your AWS credentials
# the following variables need to be set:
# aws.accessKeyId
# aws.secretKey
#
#credentials-type:java_system_properties

#
# Uses the credentials file which
# can be created by calling 'aws configure' (AWS CLI)
# usually this file is located at ~/.aws/credentials (platform dependent)
# The location of the file can be configured by setting the environment variable
# AWS_CREDENTIAL_PROFILE_FILE to the location of your file
#
#credentials-type:user_credentials_file

#
# Uses the IAM Profile assigned to the EC2 instance running HiveMQ to access S3
# Notice: This only works if HiveMQ is running on an EC2 instance !
#
credentials-type:instance_profile_credentials

#
# Tries to access S3 via the default mechanisms in the following order
# 1) Environment variables
# 2) Java system properties
# 3) User credentials file
# 4) IAM profiles assigned to EC2 instance
#
#credentials-type:default

#
# Uses the credentials specified in this file.
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
#
#credentials-type:access_key
#credentials-access-key-id:
#credentials-secret-access-key:

#
# Uses the credentials specified in this file to authenticate with a temporary session
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
# credentials-session-token
#
#credentials-type:temporary_session
#credentials-access-key-id:{access_key_id}
#credentials-secret-access-key:{secret_access_key}
#credentials-session-token:{session_token}


############################################################
# S3 Bucket                                                #
############################################################

#
# Region for the S3 bucket used by hivemq
# see http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region for a list of regions for S3
# example: us-west-2
#
s3-bucket-region:<your region here>

#
# Name of the bucket used by HiveMQ
#
s3-bucket-name:<your s3 bucket name here>

#
# Prefix for the filename of every node's file (optional)
#
file-prefix:hivemq/cluster/nodes/

#
# Expiration timeout (in minutes).
# Files with a timestamp older than (timestamp + expiration) will be automatically deleted
# Set to 0 if you do not want the plugin to handle expiration.
#
file-expiration:360

#
# Interval (in minutes) in which the own information in S3 is updated.
# Set to 0 if you do not want the plugin to update its own information.
# If you disable this you also might want to disable expiration.
#
update-interval:180

Option B:

############################################################
# AWS Credentials                                          #
############################################################

#
# Use environment variables to specify your AWS credentials
# the following variables need to be set:
# AWS_ACCESS_KEY_ID
# AWS_SECRET_ACCESS_KEY
#
#credentials-type:environment_variables

#
# Use Java system properties to specify your AWS credentials
# the following variables need to be set:
# aws.accessKeyId
# aws.secretKey
#
#credentials-type:java_system_properties

#
# Uses the credentials file which
# can be created by calling 'aws configure' (AWS CLI)
# usually this file is located at ~/.aws/credentials (platform dependent)
# The location of the file can be configured by setting the environment variable
# AWS_CREDENTIAL_PROFILE_FILE to the location of your file
#
#credentials-type:user_credentials_file

#
# Uses the IAM Profile assigned to the EC2 instance running HiveMQ to access S3
# Notice: This only works if HiveMQ is running on an EC2 instance !
#
#credentials-type:instance_profile_credentials

#
# Tries to access S3 via the default mechanisms in the following order
# 1) Environment variables
# 2) Java system properties
# 3) User credentials file
# 4) IAM profiles assigned to EC2 instance
#
#credentials-type:default

#
# Uses the credentials specified in this file.
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
#
credentials-type:access_key
credentials-access-key-id:<your access key id here>
credentials-secret-access-key:<your secret access key here>

#
# Uses the credentials specified in this file to authenticate with a temporary session
# The variables you must provide are:
# credentials-access-key-id
# credentials-secret-access-key
# credentials-session-token
#
#credentials-type:temporary_session
#credentials-access-key-id:{access_key_id}
#credentials-secret-access-key:{secret_access_key}
#credentials-session-token:{session_token}


############################################################
# S3 Bucket                                                #
############################################################

#
# Region for the S3 bucket used by hivemq
# see http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region for a list of regions for S3
# example: us-west-2
#
s3-bucket-region:<your region here>

#
# Name of the bucket used by HiveMQ
#
s3-bucket-name:<your s3 bucket name here>

#
# Prefix for the filename of every node's file (optional)
#
file-prefix:hivemq/cluster/nodes/

#
# Expiration timeout (in minutes).
# Files with a timestamp older than (timestamp + expiration) will be automatically deleted
# Set to 0 if you do not want the plugin to handle expiration.
#
file-expiration:360

#
# Interval (in minutes) in which the own information in S3 is updated.
# Set to 0 if you do not want the plugin to update its own information.
# If you disable this you also might want to disable expiration.
#
update-interval:180

This file has to be identical on all your cluster nodes.

That’s it. Starting HiveMQ on multiple EC2 instances will now result in them forming a cluster, taking advantage of the S3 bucket for discovery.
You know that your setup was successful when HiveMQ logs something similar to this.

Cluster size = 2, members : [0QMpE, jw8wu].
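
You can also verify the discovery mechanism from the outside by listing the node files the plugin writes to the bucket. The following is a hedged sketch; the bucket name is a placeholder and the prefix mirrors the file-prefix configured above.

# Illustrative check of the discovery files written by the S3 discovery plugin.
# The bucket name is a placeholder; the prefix mirrors file-prefix above.
import boto3

s3 = boto3.client('s3')
response = s3.list_objects_v2(
    Bucket='your-hivemq-discovery-bucket',
    Prefix='hivemq/cluster/nodes/'
)
for entry in response.get('Contents', []):
    print(entry['Key'], entry['LastModified'])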

Enjoy an elastic MQTT broker cluster

We are now able to take advantage of rapid elasticity. Scaling the HiveMQ cluster up or down by adding or removing EC2 instances without the need of administrative intervention is now possible.

For production environments it’s recommended to use automatic provisioning of the EC2 instances (e.g. by using Chef, Puppet, Ansible or similar tools) so you don’t need to configure each EC2 instance manually. Of course HiveMQ can also be used with Docker, which can also ease the provisioning of HiveMQ nodes.

Analyze OpenFDA Data in R with Amazon S3 and Amazon Athena

Post Syndicated from Ryan Hood original https://aws.amazon.com/blogs/big-data/analyze-openfda-data-in-r-with-amazon-s3-and-amazon-athena/

One of the great benefits of Amazon S3 is the ability to host, share, or consume public data sets. This provides transparency into data to which an external data scientist or developer might not normally have access. By exposing the data to the public, you can glean many insights that would have been difficult with a data silo.

The openFDA project creates easy access to the high value, high priority, and public access data of the Food and Drug Administration (FDA). The data has been formatted and documented in consumer-friendly standards. Critical data related to drugs, devices, and food has been harmonized and can easily be called by application developers and researchers via API calls. OpenFDA has published two whitepapers that drill into the technical underpinnings of the API infrastructure as well as how to properly analyze the data in R. In addition, FDA makes openFDA data available on S3 in raw format.

In this post, I show how to use S3, Amazon EMR, and Amazon Athena to analyze the drug adverse events dataset. A drug adverse event is an undesirable experience associated with the use of a drug, including serious drug side effects, product use errors, product quality problems, and therapeutic failures.

Data considerations

Keep in mind that this data does have limitations. In the United States, these adverse events are submitted to the FDA voluntarily by consumers, so there may not be reports for all events that occurred. There is no certainty that a reported event was actually caused by the product: the FDA does not require that a causal relationship between a product and an event be proven, and reports do not always contain enough detail to evaluate an event. Because of this, there is no way to identify the true number of events. The important takeaway from all this is that the information in this data has not been verified to establish cause-and-effect relationships. Despite this disclaimer, many interesting insights can still be derived from the data to accelerate drug safety research.

Data analysis using SQL

For application developers who want to perform targeted searching and lookups, the API endpoints provided by the openFDA project are “ready to go” for software integration using a standard API powered by Elasticsearch, NodeJS, and Docker. However, for data analysis purposes, it is often easier to work with the data using SQL and statistical packages that expect a SQL table structure. For large-scale analysis, APIs often have query limits, such as 5000 records per query. This can cause extra work for data scientists who want to analyze the full dataset instead of small subsets of data.

To address the concern of requiring all the data in a single dataset, the openFDA project released the full 100 GB of harmonized data files that back the openFDA project onto S3. Athena is an interactive query service that makes it easy to analyze data in S3 using standard SQL. It’s a quick and easy way to answer your questions about adverse events and aspirin that does not require you to spin up databases or servers.

While you could point tools directly at the openFDA S3 files, you will get greatly improved performance and a much easier time working with the data by following some of the preparation steps later in this post.

Architecture

This post explains how to use the following architecture to take the raw data provided by openFDA, leverage several AWS services, and derive meaning from the underlying data.

Steps:

  1. Load the openFDA /drug/event dataset into Spark and convert it to gzip to allow for streaming.
  2. Transform the data in Spark and save the results as a Parquet file in S3.
  3. Query the S3 Parquet file with Athena.
  4. Perform visualization and analysis of the data in R and Python on Amazon EC2.

Optimizing public data sets: A primer on data preparation

Those who want to jump right into preparing the files for Athena may want to skip ahead to the next section.

Transforming, or pre-processing, files is a common task when working with many public data sets. Before you jump into the specific steps for transforming the openFDA data files into a format optimized for Athena, I thought it would be worthwhile to provide a quick exploration of the problem.

Making a dataset in S3 efficiently accessible with minimal transformation for the end user has two key elements:

  1. Partitioning the data into objects that contain a complete part of the data (such as data created within a specific month).
  2. Using file formats that make it easy for applications to locate subsets of data (for example, gzip, Parquet, ORC, etc.).
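
As a purely hypothetical illustration of the first element, a dataset partitioned by year and month could be laid out in S3 like this (the bucket and prefix here are placeholders, not part of the openFDA data):

s3://{YOUR BUCKET}/drugevents/year=2015/month=10/part-00000.snappy.parquet
s3://{YOUR BUCKET}/drugevents/year=2015/month=11/part-00000.snappy.parquet
s3://{YOUR BUCKET}/drugevents/year=2015/month=12/part-00000.snappy.parquet

A query that only needs November 2015 then reads a single object instead of scanning the entire dataset.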

With these two key elements in mind, you can now apply transformations to the openFDA adverse event data to prepare it for Athena. You might find the data techniques employed in this post to be applicable to many of the questions you might want to ask of the public data sets stored in Amazon S3.

Before you get started, I encourage those who are interested in doing deeper healthcare analysis on AWS to make sure that you first read the AWS HIPAA Compliance whitepaper. This covers the information necessary for processing and storing protected health information (PHI).

Also, the adverse event analysis shown for aspirin is strictly for demonstration purposes and should not be used for any real decision or taken as anything other than a demonstration of AWS capabilities. However, there have been robust case studies published that have explored a causal relationship between aspirin and adverse reactions using OpenFDA data. If you are seeking research on aspirin or its risks, visit organizations such as the Centers for Disease Control and Prevention (CDC) or the Institute of Medicine (IOM).

Preparing data for Athena

For this walkthrough, you will start with the FDA adverse events dataset, which is stored as JSON files within zip archives on S3. You then convert it to Parquet for analysis. Why do you need to convert it? The original data download is stored in objects that are partitioned by quarter.

Here is a small sample of what you find in the adverse events (/drugs/event) section of the openFDA website.

If you were looking for events that happened in a specific quarter, this is not a bad solution. For most other scenarios, such as looking across the full history of aspirin events, it requires you to access a lot of data that you won’t need. The zip file format is not ideal for using data in place because zip readers must have random access to the file, which means the data can’t be streamed. Additionally, the zip files contain large JSON objects.

To read the data in these JSON files, you either need a streaming JSON decoder or a computer with a significant amount of RAM to decode the JSON. Opening up these files for public consumption is a great start. However, you still need to prepare the data with a few lines of Spark code so that the JSON can be streamed.

Step 1:  Convert the file types

Using Apache Spark on EMR, you can extract all of the zip files and pull out the events from the JSON files. To do this, use the Scala code below to deflate the zip file and create a text file. In addition, compress the JSON files with gzip to improve Spark’s performance and reduce your overall storage footprint. The Scala code can be run in either the Spark Shell or in an Apache Zeppelin notebook on your EMR cluster.

If you are unfamiliar with either Apache Zeppelin or the Spark Shell, there are several AWS Big Data Blog posts that serve as great references.

import scala.io.Source
import java.util.zip.ZipInputStream
import org.apache.spark.input.PortableDataStream
import org.apache.hadoop.io.compress.GzipCodec

// Input Directory
val inputFile = "s3://download.open.fda.gov/drug/event/2015q4/*.json.zip";

// Output Directory
val outputDir = "s3://{YOUR OUTPUT BUCKET HERE}/output/2015q4/";

// Read the zip archives from S3 as binary files
val zipFiles = sc.binaryFiles(inputFile);

// Process each zip file: pull the JSON out as text and save it
// gzip-compressed in the output directory
zipFiles.flatMap((file: (String, PortableDataStream)) => {
    val zipStream = new ZipInputStream(file._2.open)
    val entry = zipStream.getNextEntry // advance to the first entry in the archive
    val iter = Source.fromInputStream(zipStream).getLines
    iter
}).map(_.replaceAll("\\s+", "")).saveAsTextFile(outputDir, classOf[GzipCodec])

Step 2:  Transform JSON into Parquet

With just a few more lines of Scala code, you can use Spark’s abstractions to convert the JSON into a Spark DataFrame and then export the data back to S3 in Parquet format.

Spark requires the JSON to be in JSON Lines format to be parsed correctly into a DataFrame.

// Output Parquet directory
val outputDir = "s3://{YOUR OUTPUT BUCKET NAME}/output/drugevents"
// Input JSON files (the gzip-compressed output of the previous step)
val inputJson = "s3://{YOUR OUTPUT BUCKET NAME}/output/2015q4/*"
// Load a DataFrame from the multiline JSON files
val df = spark.read.json(sc.wholeTextFiles(inputJson).values)
// Extract results from dataframe
val results = df.select("results")
// Save it to Parquet
results.write.parquet(outputDir)

Step 3:  Create an Athena table

With the data cleanly prepared and stored in S3 using the Parquet format, you can now place an Athena table on top of it to get a better understanding of the underlying data.

Because the openFDA data structure incorporates several layers of nesting, it can be a complex process to try to manually derive the underlying schema in a Hive-compatible format. To shorten this process, you can load the top row of the DataFrame from the previous step into a Hive table within Zeppelin and then extract the “create table” statement from SparkSQL.

results.createOrReplaceTempView("data")

val top1 = spark.sql("select * from data tablesample(1 rows)")

top1.write.format("parquet").mode("overwrite").saveAsTable("drugevents")

val show_cmd = spark.sql("show create table drugevents").show(1, false)

This returns a “create table” statement that you can almost paste directly into the Athena console. Make some small modifications (adding the word “external” and replacing “using” with “stored as”), and then execute the code in the Athena query editor. The table is created.

For the openFDA data, the DDL returns all string fields, as the date format used in your dataset does not conform to the yyyy-mm-dd hh:mm:ss[.f…] format required by Hive. For your analysis, the string format works appropriately, but it would be possible to extend this code to use a Presto function to convert the strings into timestamps.

CREATE EXTERNAL TABLE  drugevents (
   companynumb  string, 
   safetyreportid  string, 
   safetyreportversion  string, 
   receiptdate  string, 
   patientagegroup  string, 
   patientdeathdate  string, 
   patientsex  string, 
   patientweight  string, 
   serious  string, 
   seriousnesscongenitalanomali  string, 
   seriousnessdeath  string, 
   seriousnessdisabling  string, 
   seriousnesshospitalization  string, 
   seriousnesslifethreatening  string, 
   seriousnessother  string, 
   actiondrug  string, 
   activesubstancename  string, 
   drugadditional  string, 
   drugadministrationroute  string, 
   drugcharacterization  string, 
   drugindication  string, 
   drugauthorizationnumb  string, 
   medicinalproduct  string, 
   drugdosageform  string, 
   drugdosagetext  string, 
   reactionoutcome  string, 
   reactionmeddrapt  string, 
   reactionmeddraversionpt  string)
STORED AS parquet
LOCATION
  's3://{YOUR TARGET BUCKET}/output/drugevents'
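
As noted above, every field in this table comes back as a string. If you do want real timestamps, a Presto expression along the following lines could be used in an Athena query. This is only a sketch: it assumes that receiptdate is stored as an eight-character yyyymmdd string, which is how it appears in the openFDA extracts.

SELECT receiptdate,
       date_parse(receiptdate, '%Y%m%d') AS receipt_ts,
       CAST(date_parse(receiptdate, '%Y%m%d') AS date) AS receipt_date
FROM drugevents
LIMIT 10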

With the Athena table in place, you can start to explore the data by running ad hoc queries within Athena or doing more advanced statistical analysis in R.
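
For example, an ad hoc query such as the following (written against the columns from the DDL above; the counts you see depend on which quarters you loaded) lists the most frequently reported reactions:

SELECT reactionmeddrapt,
       count(*) AS report_count
FROM drugevents
GROUP BY reactionmeddrapt
ORDER BY report_count DESC
LIMIT 10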

Using SQL and R to analyze adverse events

Using the openFDA data with Athena makes it very easy to translate your questions into SQL code and perform quick analysis on the data. After you have prepared the data for Athena, you can begin to explore the relationship between aspirin and adverse drug events, as an example. One of the most common metrics to measure adverse drug events is the Proportional Reporting Ratio (PRR). It is defined as:

PRR = (m/n)/( (M-m)/(N-n) )
Where
m = #reports with drug and event
n = #reports with drug
M = #reports with event in database
N = #reports in database
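
As a purely hypothetical worked example, if m = 100, n = 10,000, M = 5,000, and N = 1,000,000, then PRR = (100/10,000) / ((5,000-100)/(1,000,000-10,000)) = 0.01 / 0.00495 ≈ 2.02, meaning the event shows up in reports for the drug about twice as often as in reports for everything else.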

Gastrointestinal haemorrhage has the highest PRR of any reaction to aspirin when viewed in aggregate. One question you may want to ask is how the PRR has trended on a yearly basis for gastrointestinal haemorrhage since 2005.

Using the following query in Athena, you can see the PRR trend of “GASTROINTESTINAL HAEMORRHAGE” reactions with “ASPIRIN” since 2005:

with drug_and_event as 
(select rpad(receiptdate, 4, 'NA') as receipt_year
    , reactionmeddrapt
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_drug_and_event 
from fda.drugevents
where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and medicinalproduct = 'ASPIRIN'
     and reactionmeddrapt= 'GASTROINTESTINAL HAEMORRHAGE'
group by reactionmeddrapt, rpad(receiptdate, 4, 'NA') 
), reports_with_drug as 
(
select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_drug 
 from fda.drugevents 
 where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and medicinalproduct = 'ASPIRIN'
group by rpad(receiptdate, 4, 'NA') 
), reports_with_event as 
(
   select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_event 
   from fda.drugevents
   where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and reactionmeddrapt= 'GASTROINTESTINAL HAEMORRHAGE'
   group by rpad(receiptdate, 4, 'NA')
), total_reports as 
(
   select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as total_reports 
   from fda.drugevents
   where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
   group by rpad(receiptdate, 4, 'NA')
)
select  drug_and_event.receipt_year, 
(1.0 * drug_and_event.reports_with_drug_and_event/reports_with_drug.reports_with_drug)/ (1.0 * (reports_with_event.reports_with_event- drug_and_event.reports_with_drug_and_event)/(total_reports.total_reports-reports_with_drug.reports_with_drug)) as prr
, drug_and_event.reports_with_drug_and_event
, reports_with_drug.reports_with_drug
, reports_with_event.reports_with_event
, total_reports.total_reports
from drug_and_event
    inner join reports_with_drug on  drug_and_event.receipt_year = reports_with_drug.receipt_year   
    inner join reports_with_event on  drug_and_event.receipt_year = reports_with_event.receipt_year
    inner join total_reports on  drug_and_event.receipt_year = total_reports.receipt_year
order by  drug_and_event.receipt_year


One nice feature of Athena is that you can quickly connect to it via R or any other tool that can use a JDBC driver to visualize the data and understand it more clearly.

With this quick R script that can be run in R Studio either locally or on an EC2 instance, you can create a visualization of the PRR and Reporting Odds Ratio (RoR) for “GASTROINTESTINAL HAEMORRHAGE” reactions from “ASPIRIN” since 2005 to better understand these trends.

# Connect to Athena over JDBC. This assumes that the RJDBC package is loaded
# and that 'drv' holds the Athena JDBC driver object created earlier.
conn <- dbConnect(drv, '<Your JDBC URL>',
                  s3_staging_dir = "<Your S3 Location>",
                  user = Sys.getenv("USER_NAME"),
                  password = Sys.getenv("USER_PASSWORD"))

# Declare Adverse Event
adverseEvent <- "'GASTROINTESTINAL HAEMORRHAGE'"

# Build SQL Blocks
sqlFirst <- "SELECT rpad(receiptdate, 4, 'NA') as receipt_year, count(DISTINCT safetyreportid) as event_count FROM fda.drugevents WHERE rpad(receiptdate,4,'NA') between '2005' and '2015'"
sqlEnd <- "GROUP BY rpad(receiptdate, 4, 'NA') ORDER BY receipt_year"

# Extract Aspirin with adverse event counts
sql <- paste(sqlFirst,"AND medicinalproduct ='ASPIRIN' AND reactionmeddrapt=",adverseEvent, sqlEnd,sep=" ")
aspirinAdverseCount = dbGetQuery(conn,sql)

# Extract Aspirin counts
sql <- paste(sqlFirst,"AND medicinalproduct ='ASPIRIN'", sqlEnd,sep=" ")
aspirinCount = dbGetQuery(conn,sql)

# Extract adverse event counts
sql <- paste(sqlFirst,"AND reactionmeddrapt=",adverseEvent, sqlEnd,sep=" ")
adverseCount = dbGetQuery(conn,sql)

# All Drug Adverse event Counts
sql <- paste(sqlFirst, sqlEnd,sep=" ")
allDrugCount = dbGetQuery(conn,sql)

# Select correct rows
selAll =  allDrugCount$receipt_year == aspirinAdverseCount$receipt_year
selAspirin = aspirinCount$receipt_year == aspirinAdverseCount$receipt_year
selAdverse = adverseCount$receipt_year == aspirinAdverseCount$receipt_year

# Calculate Numbers
m <- c(aspirinAdverseCount$event_count)
n <- c(aspirinCount[selAspirin,2])
M <- c(adverseCount[selAdverse,2])
N <- c(allDrugCount[selAll,2])

# Calculate the proportional reporting ratio (PRR)
PRR = (m/n)/((M-m)/(N-n))

# Calculate reporting Odds Ratio
d = n-m
D = N-M
ROR = (m/d)/(M/D)

# Plot the PRR and ROR
g_range <- range(0, PRR, ROR)
g_range[2] <- g_range[2] + 3
yearLen <- length(aspirinAdverseCount$receipt_year)
ax <- aspirinAdverseCount$receipt_year  # year labels for the x-axis
plot(PRR, type="o", col="blue", ylim=g_range, axes=FALSE, ann=FALSE)
axis(1, 1:yearLen, lab=ax)
axis(2, las=1, at=1*0:g_range[2])
box()
lines(ROR, type="o", pch=22, lty=2, col="red")

As you can see, the PRR and RoR have both remained fairly steady over this time range. With the R script above, all you need to do is change the adverseEvent variable from GASTROINTESTINAL HAEMORRHAGE to another type of reaction to analyze and compare those trends.

Summary

In this walkthrough:

  • You used a Scala script on EMR to convert the openFDA zip files to gzip.
  • You then transformed the JSON blobs into flattened Parquet files using Spark on EMR.
  • You created an Athena DDL so that you could query these Parquet files residing in S3.
  • Finally, you pointed the R package at the Athena table to analyze the data without pulling it into a database or creating your own servers.

If you have questions or suggestions, please comment below.


Next Steps

Take your skills to the next level. Learn how to optimize Amazon S3 for an architecture commonly used to enable genomic data analysis. Also, be sure to read more about running R on Amazon Athena.

About the Authors

Ryan Hood is a Data Engineer for AWS. He works on big data projects leveraging the newest AWS offerings. In his spare time, he enjoys watching the Cubs win the World Series and attempting to Sous-vide anything he can find in his refrigerator.

Vikram Anand is a Data Engineer for AWS. He works on big data projects leveraging the newest AWS offerings. In his spare time, he enjoys playing soccer and watching the NFL & European Soccer leagues.

Dave Rocamora is a Solutions Architect at Amazon Web Services on the Open Data team. Dave is based in Seattle and when he is not opening data, he enjoys biking and drinking coffee outside.

DevOps Cafe Episode 73 – Patrick Chanezon

Post Syndicated from DevOpsCafeAdmin original http://devopscafe.org/show/2017/7/11/devops-cafe-episode-73-patrick-chanezon.html

Changing how developers make change.

Patrick Chanezon chats with John and Damon about trends that are changing how Developers work.

Direct download

Follow John Willis on Twitter: @botchagalupe
Follow Damon Edwards on Twitter: @damonedwards 
Follow Patrick Chanezon on Twitter: @chanezon

Notes:

Please tweet or leave comments or questions below and we’ll read them on the show!