Tag Archives: Kubernetes

Run your Kubernetes Workloads on Amazon EC2 Spot Instances with Amazon EKS

Post Syndicated from Roshni Pary original https://aws.amazon.com/blogs/compute/run-your-kubernetes-workloads-on-amazon-ec2-spot-instances-with-amazon-eks/

Contributed by Madhuri Peri, Sr. EC2 Spot Specialist SA, and Shawn OConnor, AWS Enterprise Solutions Architect

Many organizations today are using containers to package source code and dependencies into lightweight, immutable artifacts that can be deployed reliably to any environment.

Kubernetes (K8s) is an open-source framework for automated scheduling and management of containerized workloads. In addition to master nodes, a K8s cluster is made up of worker nodes where containers are scheduled and run.

Amazon Elastic Container Service for Kubernetes (Amazon EKS) is a managed service that removes the need to manage the installation, scaling, or administration of master nodes and the etcd distributed key-value store. It provides a highly available and secure K8s control plane.

This post demonstrates how to use Spot Instances as K8s worker nodes. It covers provisioning, automatic scaling, and handling interruptions (terminations) of K8s worker nodes across your cluster.

What this post does not cover

This post focuses primarily on EC2 instance scaling. This post also assumes a default interruption mode of terminate for EC2 instances, though there are other interruption types, stop and hibernate. For stateless K8s sessions, I recommend choosing the interruption mode of terminate.

Spot Instances

Amazon EC2 Spot Instances are spare EC2 capacity that offer discounts of 70-90% over On-Demand prices. The Spot price is determined by long-term trends in supply and demand and the amount of On-Demand capacity on a particular instance size, family, Availability Zone, and AWS Region.

If the available On-Demand capacity of a particular instance type is depleted, the Spot Instance is sent an interruption notice two minutes ahead to gracefully wrap up things. I recommend a diversified fleet of instances, with multiple instance types created by Spot Fleets or EC2 Fleets.

You can use Spot Instances for various fault-tolerant and flexible applications. In a workload that uses container orchestration and management platforms like EKS or Amazon Elastic Container Service (Amazon ECS), the schedulers have built-in mechanisms to identify any pods or containers on these interrupted EC2 instances. The interrupted pods or containers are then replaced on other EC2 instances in the cluster.

Solution architecture

There are three goals to accomplish with this solution:

  1. The cluster must scale automatically to match the demands of an application.
  2. The cluster must optimize for cost by using Spot Instances.
  3. The cluster must be resilient to Spot Instance interruptions.

These goals are accomplished with the following components:

Solution component | Role in solution | Code | Deployment
Cluster Autoscaler | Scales EC2 instances in or out | Open source | K8s pod DaemonSet on On-Demand Instances
Auto Scaling group | Provisions Spot or On-Demand Instances | AWS | Via CloudFormation
Spot Instance interrupt handler | Sets K8s nodes to drain state when the Spot Instance is interrupted | Open source | K8s pod DaemonSet on all K8s nodes with the label lifecycle=Ec2Spot

Here’s a diagram of the solution architecture.

There are a few important things to note in this architecture:

  • Cluster Autoscaler is being used to control all scaling activities, with changes to the MinSize and DesiredCapacity parameters of the Auto Scaling group. This separation of duties ensures that there are no race conditions.
  • The Auto Scaling groups are used purely to replace any lost instances automatically (for example, terminations or interruptions) and maintain the desired number of instances. There are no scaling policies attached to the groups.
  • Auto Scaling, at the time of this post, supports a single instance type. As noted by Jeff Barr’s post EC2 Fleet – Manage Thousands of On-Demand and Spot Instances with One Request, in H2 2018, Auto Scaling groups will support mixed instance types. At that point, multiple groups will not be required, and can collapse into a single group specifying all instance types.

Here’s a further breakdown on the components.

Cluster Autoscaler

Automatic scaling in K8s comes in two forms:

  • Horizontal Pod Autoscaler scales the pods in a deployment or replica set. It is implemented as a K8s API resource and a controller. The controller manager queries the resource utilization against the metrics specified in each HorizontalPodAutoscaler definition. It obtains the metrics from either the resource metrics API (for per-pod resource metrics), or the custom metrics API (for all other metrics).
  • Cluster Autoscaler scales the worker nodes available for pods to be placed. Cluster Autoscaler is the focus for this post.

Cluster Autoscaler is the default K8s component for scaling the nodes in a cluster. It automatically increases the size of an Auto Scaling group so that pods have a place to run, and it attempts to remove idle nodes, that is, nodes with no running pods.

When a pod cannot be scheduled due to lack of available resources, Cluster Autoscaler determines that the cluster must scale up. Expander interfaces allow you to apply different pod placement strategies. Currently, the following strategies are supported:

  • Random – Randomly select an available node group.
  • Most Pods – Selects the group that can schedule the largest number of pods. This can be used to balance the load across groups of nodes.
  • Least Waste – This is commonly referred to as ‘bin packing.’ It selects the node group that leaves the least idle CPU after scale-up, using unused memory to break ties. This helps to reduce the total node footprint, and is the strategy used in this post.
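The Least Waste idea can be illustrated with a small sketch. This is a simplified model and not Cluster Autoscaler's actual code: the NodeGroup class, the function names, and the way pending requests are summed are all assumptions made for illustration. The instance sizes (2 vCPU and 8 GiB for m4.large, 2 vCPU and 4 GiB for t2.medium) are the published figures for those types.

```python
import math
from dataclasses import dataclass


@dataclass
class NodeGroup:
    """Hypothetical description of one Auto Scaling group."""
    name: str
    cpu_per_node: float  # vCPUs per node
    mem_per_node: float  # GiB of memory per node


def nodes_needed(group, cpu_request, mem_request):
    """Smallest node count whose total CPU and memory cover the pending requests."""
    by_cpu = math.ceil(cpu_request / group.cpu_per_node)
    by_mem = math.ceil(mem_request / group.mem_per_node)
    return max(by_cpu, by_mem, 1)


def least_waste(groups, cpu_request, mem_request):
    """Pick the group with the smallest idle CPU fraction; ties go to idle memory."""
    def waste(group):
        n = nodes_needed(group, cpu_request, mem_request)
        idle_cpu = 1 - cpu_request / (n * group.cpu_per_node)
        idle_mem = 1 - mem_request / (n * group.mem_per_node)
        return (idle_cpu, idle_mem)  # tuples compare CPU first, then memory
    return min(groups, key=waste)


groups = [
    NodeGroup("spot-m4-large", cpu_per_node=2, mem_per_node=8),
    NodeGroup("spot-t2-medium", cpu_per_node=2, mem_per_node=4),
]
# Pending pods that together request 3 vCPUs and 6 GiB of memory.
choice = least_waste(groups, cpu_request=3, mem_request=6)
print(choice.name)
```

Both groups would need two nodes and leave the same share of CPU idle, so the memory tie-break decides in favor of the smaller t2.medium group.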

Although Cluster Autoscaler is the de facto standard for automatic scaling in K8s, it is not part of the main release. Deploy it like any other pod, in the kube-system namespace alongside the other management pods. By default, Cluster Autoscaler does not remove nodes that run kube-system pods, so those management pods would prevent the cluster from scaling down. Override this default behavior by passing in the --skip-nodes-with-system-pods=false flag.

But how do you reliably control scale-down operations so that you do not remove the pods that you need? This is accomplished using a pod disruption budget (PDB). A PDB limits the number of replicated pods that can be down at a given time. Create a PDB to ensure that you always have at least one Cluster Autoscaler pod running.
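The effect of such a budget can be shown with a short sketch. It assumes the budget is expressed as a minimum number of available pods (the minAvailable field of a PDB); the function name is hypothetical.

```python
def eviction_allowed(healthy_pods: int, min_available: int) -> bool:
    """A voluntary eviction is allowed only if enough pods remain afterwards."""
    return healthy_pods - 1 >= min_available


# One Cluster Autoscaler replica with a budget of one: it can never be evicted,
# so the node that hosts it is not a scale-down candidate.
print(eviction_allowed(healthy_pods=1, min_available=1))

# With two replicas, one of them may be moved at a time.
print(eviction_allowed(healthy_pods=2, min_available=1))
```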

In summary, Cluster Autoscaler does not remove nodes under the following scenarios:

  • Pods with a restrictive PDB.
  • Pods running in the kube-system namespace that are not run on the node by default, or that do not have a PDB.
  • Pods not backed by a controller object (not created by a deployment, replica set, job, stateful-set, and so on).
  • Pods running with local storage.
  • Pods running that cannot be moved elsewhere due to various constraints (lack of resources, non-matching node selectors or affinity, matching anti-affinity, and so on).
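The scenarios above can be read as a checklist that is evaluated for every pod on a candidate node. The following sketch encodes one reading of that checklist. The Pod fields and function names are hypothetical, and the real Cluster Autoscaler derives this information from the K8s API rather than from flags like these.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pod:
    """Hypothetical summary of the facts that matter for scale-down."""
    name: str
    namespace: str = "default"
    has_controller: bool = True        # created by a deployment, replica set, job, ...
    uses_local_storage: bool = False
    has_pdb: bool = False
    pdb_allows_eviction: bool = True
    runs_on_every_node: bool = False   # for example, a DaemonSet pod
    can_reschedule: bool = True        # fits elsewhere given resources and affinity


def scale_down_blocker(pod: Pod) -> Optional[str]:
    """Return the reason this pod pins its node, or None if it can be moved."""
    if pod.has_pdb and not pod.pdb_allows_eviction:
        return "restrictive PDB"
    if pod.namespace == "kube-system" and not pod.runs_on_every_node and not pod.has_pdb:
        return "kube-system pod without a PDB"
    if not pod.has_controller:
        return "not backed by a controller"
    if pod.uses_local_storage:
        return "uses local storage"
    if not pod.can_reschedule:
        return "cannot be moved elsewhere"
    return None


def node_removable(pods) -> bool:
    """A node is a scale-down candidate only if no pod on it is blocked."""
    return all(scale_down_blocker(p) is None for p in pods)


print(node_removable([Pod("greeter-1"), Pod("greeter-2")]))
print(scale_down_blocker(Pod("scratch", uses_local_storage=True)))
```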

Auto Scaling Group

With Spot Instances, each instance type in each Availability Zone is a pool with its own Spot price based on the available capacity. A recommended best practice when working with Spot Instances is to use a diversified fleet of instances with multiple instance types, as created by Spot Fleet or EC2 Fleet. These APIs aim to fulfill the specified TargetCapacity across the instance types to launch the number of Spot Instances and optionally, On-Demand Instances.

Unfortunately, Cluster Autoscaler does not support Spot Fleets at this time. You need a different strategy to provide diversification. Cluster Autoscaler for AWS provides integration with Auto Scaling groups. It enables users to choose from four different options of deployment:

  • One Auto Scaling group
  • Multiple Auto Scaling groups
  • Auto-Discovery
  • Master Node setup

For this post, you use the Multi-ASG deployment option. For Cluster Autoscaler and other cluster administration and management pods that run on EKS worker nodes, create a small Auto Scaling group using On-Demand Instances. This ensures that the health of the cluster is not impacted by Spot interruptions.

In K8s, label selectors are used to control where pods are placed. Use the K8s node label selector to place the appropriate pods on Spot or On-Demand Instances.

Interrupt handler

The last component to consider handles how the cluster responds to the interruption of a Spot Instance. The workflow can be summarized as:

  • Identify that a Spot Instance is being reclaimed.
  • Use the 2-minute notification window to gracefully prepare the node for termination.
  • Taint the node and cordon it off to prevent new pods from being placed.
  • Drain connections on the running pods.
  • To maintain desired capacity, replace the pods on remaining nodes.

Spot interruptions are reported in two ways: through Amazon CloudWatch Events and through the EC2 instance metadata service.

For this post, you use a K8s DaemonSet, which means running one pod per node. The pod periodically polls the EC2 metadata service for a Spot termination notice. If a termination notice is received (HTTP status 200), the handler drains the node so that its pods stop gracefully and restart on other nodes before the 2-minute grace period expires. This approach is based on an existing project at the kube-spot-termination-notice-handler GitHub repo.
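The polling loop can be sketched as follows. This is an illustration only and not the script shipped in the handler image. It assumes the instance metadata service answers unauthenticated requests at the spot/termination-time path, which returns 404 until the instance is marked for termination. The drain callback, the polling interval, and the function names are hypothetical.

```python
import time
import urllib.error
import urllib.request

# Instance metadata path that returns HTTP 200 once a termination notice exists.
TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"


def fetch_status(url=TERMINATION_URL, timeout=2):
    """Return the HTTP status code of the metadata endpoint, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 404 means no interruption is scheduled
    except OSError:
        return None      # metadata service unreachable or timed out


def watch_for_interruption(drain, fetch=fetch_status, interval=5,
                           sleep=time.sleep, max_polls=None):
    """Poll until a notice appears, then call drain() once and return True."""
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        if fetch() == 200:
            drain()      # for example, cordon and drain this node
            return True
        sleep(interval)
    return False
```

The fetch and sleep parameters are injectable so that the loop can be exercised away from EC2, for example with a stub that returns 404 twice and then 200.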

Walkthrough

Here’s the suggested workflow for this solution:

  1. Provision the worker nodes with EC2 instances using CloudFormation templates.
  2. Deploy the K8s Cluster Autoscaler pods as a DaemonSet, with a PDB.
  3. Deploy the Spot Instance interrupt handler pods as a DaemonSet.
  4. Deploy the sample application.

Prerequisites

You should have the following resources or configurations before starting this walkthrough:

  • An EKS cluster master endpoint
  • An EKS service role ARN
  • Subnet IDs and the control plane security group values
  • EKS master cluster certificates
  • Configuration of kubectl against the master EKS endpoint

For more information, see Amazon EKS – Now Generally Available and Deploy a Kubernetes Application with Amazon Elastic Container Service for Kubernetes.

When you describe the EKS cluster, you get a response like the following sample output:

{
    "cluster": {
        "name": "DemoSpotClusterScale",
        "arn": "arn:aws:eks:us-west-2:0123456789012:cluster/DemoSpotClusterScale",
        "createdAt": 1528317531.751,
        "version": "1.10",
        "endpoint": "https://B960845ED5E21A3439ABB5E12F09CE88.sk1.us-west-2.eks.amazonaws.com",
        "roleArn": "arn:aws:iam::0123456789012:role/eksServiceRoleGA",
        "resourcesVpcConfig": {
            "subnetIds": [
                "subnet-3326464a",
                "subnet-c2b93b89",
                "subnet-13225b49"
            ],
            "securityGroupIds": [
                "sg-7fd0b70e"
            ],
            "vpcId": "vpc-c7c8c4be"
        },
        "status": "ACTIVE",
        "certificateAuthority": {
            "data": "<Your ca data here>"
        }
    }
}
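Most of the prerequisites listed earlier can be read directly from this response. The following sketch assumes the JSON was produced by a describe call such as aws eks describe-cluster and captured as a string. The function name and the keys of the returned dictionary are hypothetical.

```python
import json


def worker_node_inputs(describe_output: str) -> dict:
    """Collect the values that the worker node setup needs from the cluster description."""
    cluster = json.loads(describe_output)["cluster"]
    vpc = cluster["resourcesVpcConfig"]
    return {
        "endpoint": cluster["endpoint"],
        "role_arn": cluster["roleArn"],
        "subnet_ids": vpc["subnetIds"],
        "security_group_ids": vpc["securityGroupIds"],
        "ca_data": cluster["certificateAuthority"]["data"],
    }


# Trimmed copy of the sample output, used here only to exercise the function.
sample = """
{
    "cluster": {
        "name": "DemoSpotClusterScale",
        "endpoint": "https://B960845ED5E21A3439ABB5E12F09CE88.sk1.us-west-2.eks.amazonaws.com",
        "roleArn": "arn:aws:iam::0123456789012:role/eksServiceRoleGA",
        "resourcesVpcConfig": {
            "subnetIds": ["subnet-3326464a", "subnet-c2b93b89", "subnet-13225b49"],
            "securityGroupIds": ["sg-7fd0b70e"],
            "vpcId": "vpc-c7c8c4be"
        },
        "status": "ACTIVE",
        "certificateAuthority": {"data": "<Your ca data here>"}
    }
}
"""
inputs = worker_node_inputs(sample)
print(inputs["subnet_ids"])
```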

I use the cluster name DemoSpotClusterScale throughout this post. Replace that with your cluster name in the following commands.

Get started

git clone https://github.com/awslabs/ec2-spot-labs.git

cd ec2-spot-labs/ec2-spot-eks-solution

Provision the worker nodes

Add worker nodes to your cluster so that you can deploy your applications. Worker nodes can be either Spot or On-Demand Instances. In this example, use Spot Instances for worker nodes.

You can use this customized AWS CloudFormation template to create the Auto Scaling groups described earlier. This template also labels the node with a lifecycle key value indicating whether it is an On-Demand or Spot Instance node.

The template deploys Auto Scaling groups dedicated to the following instance types:

  • Spot Instances, m4.large, across three Availability Zones.
  • Spot Instances, t2.medium, across three Availability Zones.
  • On-Demand Instances, across three Availability Zones.

Make sure that you apply the aws-auth-cm.yaml file with the appropriate NodeInstanceRole value, as provisioned by the CloudFormation template. Find this parameter on the Resources tab.

kubectl apply -f aws-auth-cm.yaml

If the kubectl get nodes command worked as documented, then you are ready to proceed to the next section.

Deploying Cluster Autoscaler and PDB

  1. Download the manifest file cluster-autoscaler-ds.yaml. There are six K8s resources that enable the cluster-autoscaler add-on to work in the EKS environment:
    • Service account
    • Cluster role
    • Role
    • Cluster role binding
    • Role binding
    • Two Auto Scaling groups created by the CloudFormation template for Spot and On-Demand Instances

    You also see the cluster-autoscaler command with configured parameters.

  2. Edit the cluster-autoscaler-ds.yaml file to replace the [OD-NodeGroup-Name], [Spot-NodeGroup1-Name], and [Spot-NodeGroup2-Name] sections in lines 141-143 with the resources created in your worker node CloudFormation template, as shown in the screenshot above. Then deploy the cluster-autoscaler-ds.yaml manifest:
    $ kubectl create -f cluster-autoscaler/cluster-autoscaler-ds.yaml

  3. Monitor the deployment:
    $ kubectl logs cluster-autoscaler-<podgeneratedID> --namespace=kube-system

  4. Download and deploy the Cluster Autoscaler PDB:
    $ kubectl create -f cluster-autoscaler/cluster-autoscaler-pdb.yaml

Deploy the Spot Instance interrupt handler

Each K8s EC2 node being launched must have the lifecycle=Ec2Spot value for --node-labels, as in the following example. This line is an excerpt from the CloudFormation template:

"sed -i s,MAX_PODS,", !Join [ "", [ "'", { "Fn::FindInMap": [ MaxPodsPerNode, { Ref: SpotNode2InstanceType }, MaxPods ] }, " --node-labels ", "lifecycle=Ec2Spot" , "'" ] ], ",g /etc/systemd/system/kubelet.service", "\n",

The Docker image contains the instance metadata poll script, as shown in entrypoint.sh. Publish this image to your repository. In the following screenshot, I used my ECR repository. A sample image is available on Docker Hub.

Deploy the Spot interrupt handler pod using spec. This sets up the DaemonSet only on the instances that have a K8s label of lifecycle=Ec2Spot.

kubectl apply -f spot-termination-handler/deploy-k8-pod/spot-interrupt-handler.yaml

When the Spot Instance is interrupted, this pod catches the interruption and vacates the pods.

Deploy the sample application and test out scaling up & down

Deploy a sample application with three replicas. Create a new manifest file named greeter-sample.yaml from the code below, or download it from here.

You are using node affinity to prefer deployment on Spot Instances. If the Ec2Spot label is unavailable, the manifest file allows the application to run elsewhere.
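The difference between preferring and requiring a label can be shown with a toy placement function. This is a simplification: the real K8s scheduler scores nodes with weights instead of taking the first match, and the node dictionaries and function name here are hypothetical.

```python
def pick_node(nodes, preferred_label=("lifecycle", "Ec2Spot")):
    """Prefer nodes that carry the label, but fall back to any node if none do."""
    key, value = preferred_label
    preferred = [n for n in nodes if n["labels"].get(key) == value]
    candidates = preferred or nodes  # soft preference, not a hard requirement
    return candidates[0]["name"] if candidates else None


mixed = [
    {"name": "od-node-1", "labels": {"lifecycle": "OnDemand"}},
    {"name": "spot-node-1", "labels": {"lifecycle": "Ec2Spot"}},
]
print(pick_node(mixed))      # a Spot node is available, so it is chosen
print(pick_node(mixed[:1]))  # no Spot node, so the pod still lands somewhere
```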

$ kubectl create -f sample/greeter-sample.yaml

Scale up, and watch Cluster Autoscaler manage the Auto Scaling groups. Verify that Cluster Autoscaler is working by scaling up the sample service beyond the current limits of the cluster.

$ kubectl scale --replicas=50 deployment/greeter-sample

Check the AWS Management Console to confirm that the Auto Scaling groups are scaling up to meet demand. This may take a few minutes. You can also follow along with the pod deployment from the command line. You should see the pods transition from pending to running as nodes are scaled up.

$ kubectl get pods -o wide --watch

Scale down, and watch Cluster Autoscaler manage the Auto Scaling groups:

$ kubectl scale --replicas=1 deployment/greeter-sample

Check the K8s logs to watch the terminations occur:

$ kubectl logs cluster-autoscaler-<podgeneratedID> --namespace=kube-system

Conclusion

In this post, I showed you how to use Spot Instances with K8s workloads, by provisioning, scaling, and managing terminations effectively in EKS clusters to leverage both cost and scale optimizations. Happy coding!

Running GPU-Accelerated Kubernetes Workloads on P3 and P2 EC2 Instances with Amazon EKS

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/running-gpu-accelerated-kubernetes-workloads-on-p3-and-p2-ec2-instances-with-amazon-eks/

This post contributed by Scott Malkie, AWS Solutions Architect

Amazon EC2 P3 and P2 instances, featuring NVIDIA GPUs, power some of the most computationally advanced workloads today, including machine learning (ML), high performance computing (HPC), financial analytics, and video transcoding. Now Amazon Elastic Container Service for Kubernetes (Amazon EKS) supports P3 and P2 instances, making it easy to deploy, manage, and scale GPU-based containerized applications.

This blog post walks through how to start up GPU-powered worker nodes and connect them to an existing Amazon EKS cluster. Then it demonstrates an example application to show how containers can take advantage of all that GPU power!

Prerequisites

You need an existing Amazon EKS cluster, kubectl, and the aws-iam-authenticator set up according to Getting Started with Amazon EKS.

Two steps are required to enable GPU workloads. First, join Amazon EC2 P3 or P2 GPU compute instances as worker nodes to the Kubernetes cluster. Second, configure pods to enable container-level access to the node’s GPUs.

Spinning up Amazon EC2 GPU instances and joining them to an existing Amazon EKS Cluster

To start the worker nodes, use the standard AWS CloudFormation template for Amazon EKS worker nodes, specifying the AMI ID of the new Amazon EKS-optimized AMI for GPU workloads. This AMI is available on AWS Marketplace.

Subscribe to the AMI and then launch it using the AWS CloudFormation template. The template takes care of networking, configuring kubelets, and placing your worker nodes into an Auto Scaling group, as shown in the following image.

This template creates an Auto Scaling group with up to two p3.8xlarge Amazon EC2 GPU instances. Powered by up to eight NVIDIA Tesla V100 GPUs, these instances deliver up to 1 petaflop of mixed-precision performance per instance to significantly accelerate ML and HPC applications. Amazon EC2 P3 instances have been proven to reduce ML training times from days to hours and to reduce time-to-results for HPC.

After the AWS CloudFormation template completes, the Outputs view contains the NodeInstanceRole parameter, as shown in the following image.

NodeInstanceRole needs to be passed in to the AWS Authenticator ConfigMap, as documented in the AWS EKS Getting Started Guide. To do so, edit the ConfigMap template and run the command kubectl apply -f aws-auth-cm.yaml in your terminal to apply the ConfigMap. You can then run kubectl get nodes --watch to watch the two Amazon EC2 GPU instances join the cluster, as shown in the following image.

Configuring Kubernetes pods to access GPU resources

First, use the following command to apply the NVIDIA Kubernetes device plugin as a daemon set on the cluster.

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.10/nvidia-device-plugin.yml

This command produces the following output:

Once the daemon set is running on the GPU-powered worker nodes, use the following command to verify that each node has allocatable GPUs.

kubectl get nodes \
"-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"

The following output shows that each node has four GPUs available:

Next, modify any Kubernetes pod manifests, such as the following one, to take advantage of these GPUs. In general, adding the resources configuration (resources: limits:) to pod manifests gives containers access to one GPU. A pod can have access to all of the GPUs available to the node that it’s running on.

apiVersion: v1
kind: Pod
metadata:
  name: pod-name
spec:
  containers:
  - name: container-name
    ...
    resources:
      limits:
        nvidia.com/gpu: 4

As a more specific example, the following sample manifest displays the results of the nvidia-smi binary, which shows diagnostic information about all GPUs visible to the container.

apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi
spec:
  restartPolicy: OnFailure
  containers:
  - name: nvidia-smi
    image: nvidia/cuda:latest
    args:
    - "nvidia-smi"
    resources:
      limits:
        nvidia.com/gpu: 4

Download this manifest as nvidia-smi-pod.yaml and launch it with kubectl apply -f nvidia-smi-pod.yaml.

To confirm successful nvidia-smi execution, use the following command to examine the log.

kubectl logs nvidia-smi

The above commands produce the following output:

Existing limitations

  • GPUs cannot be overprovisioned – containers and pods cannot share GPUs
  • The maximum number of GPUs that you can schedule to a pod is capped by the number of GPUs available to that pod’s node
  • Depending on your account, you might have Amazon EC2 service limits on how many and which type of Amazon EC2 GPU compute instances you can launch simultaneously
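The first two limitations can be illustrated with a toy placement function that hands out whole GPUs. It is only a sketch of the constraint, not the K8s scheduler: the function name, the node names, and the first-fit strategy are assumptions.

```python
def place_gpu_pods(allocatable, requests):
    """First-fit placement of whole-GPU requests.

    allocatable maps node name to free GPUs; requests maps pod name to GPUs wanted.
    Returns the placements and the pods that stay pending.
    """
    free = dict(allocatable)
    placed, pending = {}, []
    for pod, gpus in requests.items():
        node = next((name for name, count in free.items() if count >= gpus), None)
        if node is None:
            pending.append(pod)  # no single node has enough unshared GPUs
        else:
            free[node] -= gpus   # GPUs are dedicated, never shared between pods
            placed[pod] = node
    return placed, pending


# Two p3.8xlarge nodes with four allocatable GPUs each.
nodes = {"node-a": 4, "node-b": 4}
pods = {"train": 4, "infer": 2, "nvidia-smi": 4, "big": 8}
placed, pending = place_gpu_pods(nodes, pods)
print(placed)
print(pending)
```

In this run, nvidia-smi stays pending because only two GPUs remain free on node-b, and big can never be scheduled because it asks for more GPUs than any single node has.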

For more information about GPU support in Kubernetes, see the Kubernetes documentation. For more information about using Amazon EKS, see the Amazon EKS documentation. Guidance on setting up and running Amazon EKS can be found in the AWS Workshop for Kubernetes on GitHub.

Please leave any comments about this post and share what you’re working on. I can’t wait to see what you build with GPU-powered workloads on Amazon EKS!

AWS Online Tech Talks – June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-june-2018/

Join us this month to learn about AWS services and solutions. New this month, we have a fireside chat with the GM of Amazon WorkSpaces and our 2nd episode of the “How to re:Invent” series. We’ll also cover best practices, deep dives, use cases and more! Join us and register today!

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:

Analytics & Big Data

June 18, 2018 | 11:00 AM – 11:45 AM PT – Get Started with Real-Time Streaming Data in Under 5 Minutes – Learn how to use Amazon Kinesis to capture, store, and analyze streaming data in real-time including IoT device data, VPC flow logs, and clickstream data.

June 20, 2018 | 11:00 AM – 11:45 AM PT – Insights For Everyone – Deploying Data across your Organization – Learn how to deploy data at scale using AWS Analytics and QuickSight’s new reader role and usage based pricing.

AWS re:Invent

June 13, 2018 | 05:00 PM – 05:30 PM PT – Episode 2: AWS re:Invent Breakout Content Secret Sauce – Hear from one of our own AWS content experts as we dive deep into the re:Invent content strategy and how we maintain a high bar.

Compute

June 25, 2018 | 01:00 PM – 01:45 PM PT – Accelerating Containerized Workloads with Amazon EC2 Spot Instances – Learn how to efficiently deploy containerized workloads and easily manage clusters at any scale at a fraction of the cost with Spot Instances.

June 26, 2018 | 01:00 PM – 01:45 PM PT – Ensuring Your Windows Server Workloads Are Well-Architected – Get the benefits, best practices and tools on running your Microsoft Workloads on AWS leveraging a well-architected approach.

Containers

June 25, 2018 | 09:00 AM – 09:45 AM PT – Running Kubernetes on AWS – Learn about the basics of running Kubernetes on AWS including how to set up masters, networking, and security, and how to add auto-scaling to your cluster.

Databases

June 18, 2018 | 01:00 PM – 01:45 PM PT – Oracle to Amazon Aurora Migration, Step by Step – Learn how to migrate your Oracle database to Amazon Aurora.

DevOps

June 20, 2018 | 09:00 AM – 09:45 AM PT – Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tools – Learn how to set up a CI/CD pipeline for deploying containers using the AWS Developer Tools.

Enterprise & Hybrid

June 18, 2018 | 09:00 AM – 09:45 AM PT – De-risking Enterprise Migration with AWS Managed Services – Learn how enterprise customers are de-risking cloud adoption with AWS Managed Services.

June 19, 2018 | 11:00 AM – 11:45 AM PT – Launch AWS Faster using Automated Landing Zones – Learn how the AWS Landing Zone can automate the set up of best practice baselines when setting up new AWS Environments.

June 21, 2018 | 11:00 AM – 11:45 AM PT – Leading Your Team Through a Cloud Transformation – Learn how you can help lead your organization through a cloud transformation.

June 21, 2018 | 01:00 PM – 01:45 PM PT – Enabling New Retail Customer Experiences with Big Data – Learn how AWS can help retailers realize actual value from their big data and deliver on differentiated retail customer experiences.

June 28, 2018 | 01:00 PM – 01:45 PM PT – Fireside Chat: End User Collaboration on AWS – Learn how End User Compute services can help you deliver access to desktops and applications anywhere, anytime, using any device.

IoT

June 27, 2018 | 11:00 AM – 11:45 AM PT – AWS IoT in the Connected Home – Learn how to use AWS IoT to build innovative Connected Home products.

Machine Learning

June 19, 2018 | 09:00 AM – 09:45 AM PT – Integrating Amazon SageMaker into your Enterprise – Learn how to integrate Amazon SageMaker and other AWS Services within an Enterprise environment.

June 21, 2018 | 09:00 AM – 09:45 AM PT – Building Text Analytics Applications on AWS using Amazon Comprehend – Learn how you can unlock the value of your unstructured data with NLP-based text analytics.

Management Tools

June 20, 2018 | 01:00 PM – 01:45 PM PT – Optimizing Application Performance and Costs with Auto Scaling – Learn how selecting the right scaling option can help optimize application performance and costs.

Mobile

June 25, 2018 | 11:00 AM – 11:45 AM PT – Drive User Engagement with Amazon Pinpoint – Learn how Amazon Pinpoint simplifies and streamlines effective user engagement.

Security, Identity & Compliance

June 26, 2018 | 09:00 AM – 09:45 AM PT – Understanding AWS Secrets Manager – Learn how AWS Secrets Manager helps you rotate and manage access to secrets centrally.

June 28, 2018 | 09:00 AM – 09:45 AM PT – Using Amazon Inspector to Discover Potential Security Issues – See how Amazon Inspector can be used to discover security issues of your instances.

Serverless

June 19, 2018 | 01:00 PM – 01:45 PM PT – Productionize Serverless Application Building and Deployments with AWS SAM – Learn expert tips and techniques for building and deploying serverless applications at scale with AWS SAM.

Storage

June 26, 2018 | 11:00 AM – 11:45 AM PT – Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway – Learn how you can reduce your on-premises infrastructure by using the AWS Storage Gateway to connect your applications to the scalable and reliable AWS storage services.

June 27, 2018 | 01:00 PM – 01:45 PM PT – Changing the Game: Extending Compute Capabilities to the Edge – Discover how to change the game for IIoT and edge analytics applications with AWS Snowball Edge plus enhanced Compute instances.

June 28, 2018 | 11:00 AM – 11:45 AM PT – Big Data and Analytics Workloads on Amazon EFS – Get best practices and deployment advice for running big data and analytics workloads on Amazon EFS.

Kata Containers 1.0

Post Syndicated from ris original https://lwn.net/Articles/755230/rss

Kata Containers 1.0 has been released. “This first release of Kata Containers completes the merger of Intel’s Clear Containers and Hyper’s runV technologies, and delivers an OCI compatible runtime with seamless integration for container ecosystem technologies like Docker and Kubernetes.”

[$] Securing the container image supply chain

Post Syndicated from corbet original https://lwn.net/Articles/754443/rss

“Security is hard” is a tautology, especially in the fast-moving world of container orchestration. We have previously covered various aspects of Linux container security through, for example, the Clear Containers implementation or the broader question of Kubernetes and security, but those are mostly concerned with container isolation; they do not address the question of trusting a container’s contents. What is a container running? Who built it and when? Even assuming we have good programmers and solid isolation layers, propagating that good code around a Kubernetes cluster and making strong assertions on the integrity of that supply chain is far from trivial. The 2018 KubeCon + CloudNativeCon Europe event featured some projects that could eventually solve that problem.

[$] Updates in container isolation

Post Syndicated from corbet original https://lwn.net/Articles/754433/rss

At KubeCon + CloudNativeCon Europe 2018, several talks explored the topic of container isolation and security. The last year saw the release of Kata Containers which, combined with the CRI-O project, provided strong isolation guarantees for containers using a hypervisor. During the conference, Google released its own hypervisor called gVisor, adding yet another possible solution for this problem. Those new developments prompted the community to work on integrating the concept of “secure containers” (or “sandboxed containers”) deeper into Kubernetes. This work is now coming to fruition; it prompts us to look again at how Kubernetes tries to keep the bad guys from wreaking havoc once they break into a container.

Security updates for Wednesday

Post Syndicated from ris original https://lwn.net/Articles/754653/rss

Security updates have been issued by CentOS (dhcp), Debian (xen), Fedora (dhcp, flac, kubernetes, leptonica, libgxps, LibRaw, matrix-synapse, mingw-LibRaw, mysql-mmm, patch, seamonkey, webkitgtk4, and xen), Mageia (389-ds-base, exempi, golang, graphite2, libpam4j, libraw, libsndfile, libtiff, perl, quassel, spring-ldap, util-linux, and wget), Oracle (dhcp and kernel), Red Hat (389-ds-base, chromium-browser, dhcp, docker-latest, firefox, kernel-alt, libvirt, qemu-kvm, redhat-virtualization-host, rh-haproxy18-haproxy, and rhvm-appliance), Scientific Linux (389-ds-base, dhcp, firefox, libvirt, and qemu-kvm), and Ubuntu (poppler).

[$] Autoscaling for Kubernetes workloads

Post Syndicated from corbet original https://lwn.net/Articles/754153/rss

Technologies like containers, clusters, and Kubernetes offer the prospect of rapidly scaling the available computing resources to match variable demands placed on the system. Actually implementing that scaling can be a challenge, though. During KubeCon + CloudNativeCon Europe 2018, Frederic Branczyk from CoreOS (now part of Red Hat) held a packed session to introduce a standard and officially recommended way to scale workloads automatically in Kubernetes clusters.

Security updates for Friday

Post Syndicated from ris original https://lwn.net/Articles/754257/rss

Security updates have been issued by Arch Linux (libmupdf, mupdf, mupdf-gl, and mupdf-tools), Debian (firebird2.5, firefox-esr, and wget), Fedora (ckeditor, drupal7, firefox, kubernetes, papi, perl-Dancer2, and quassel), openSUSE (cairo, firefox, ImageMagick, libapr1, nodejs6, php7, and tiff), Red Hat (qemu-kvm-rhev), Slackware (mariadb), SUSE (xen), and Ubuntu (openjdk-8).

CI/CD with Data: Enabling Data Portability in a Software Delivery Pipeline with AWS Developer Tools, Kubernetes, and Portworx

Post Syndicated from Kausalya Rani Krishna Samy original https://aws.amazon.com/blogs/devops/cicd-with-data-enabling-data-portability-in-a-software-delivery-pipeline-with-aws-developer-tools-kubernetes-and-portworx/

This post is written by Eric Han, Vice President of Product Management at Portworx, and Asif Khan, Solutions Architect.

Data is the soul of an application. As containers make it easier to package and deploy applications faster, testing plays an even more important role in the reliable delivery of software. Given that all applications have data, development teams want a way to reliably control, move, and test using real application data or, at times, obfuscated data.

For many teams, moving application data through a CI/CD pipeline, while honoring compliance and maintaining separation of concerns, has been a manual task that doesn’t scale. At best, it is limited to a few applications, and is not portable across environments. The goal should be to make running and testing stateful containers (think databases and message buses, where operations are tracked) as easy as running stateless ones (such as web front ends, where operations often are not tracked).

Why is state important in testing scenarios? One reason is that many bugs manifest only when code is tested against real data. For example, we might simply want to test a database schema upgrade but a small synthetic dataset does not exercise the critical, finer corner cases in complex business logic. If we want true end-to-end testing, we need to be able to easily manage our data or state.

In this blog post, we define a CI/CD pipeline reference architecture that can automate data movement between applications. We also provide the steps to follow to configure the CI/CD pipeline.

 

Stateful Pipelines: Need for Portable Volumes

As part of continuous integration, testing, and deployment, a team may need to reproduce a bug found in production against a staging setup. Here, the hosting environment is comprised of a cluster with Kubernetes as the scheduler and Portworx for persistent volumes. The testing workflow is then automated by AWS CodeCommit, AWS CodePipeline, and AWS CodeBuild.

Portworx offers Kubernetes storage that can be used to make persistent volumes portable between AWS environments and pipelines. The addition of Portworx to the AWS Developer Tools continuous deployment for Kubernetes reference architecture adds persistent storage and storage orchestration to a Kubernetes cluster. The example uses MongoDB as the demonstration of a stateful application. In practice, the workflow applies to any containerized application such as Cassandra, MySQL, Kafka, and Elasticsearch.

Using the reference architecture, a developer calls CodePipeline to trigger a snapshot of the running production MongoDB database. Portworx then creates a block-based, writable snapshot of the MongoDB volume. Meanwhile, the production MongoDB database continues serving end users and is uninterrupted.

Without the Portworx integrations, a manual process would require an application-level backup of the database instance that is outside of the CI/CD process. For larger databases, this could take hours and impact production. The use of block-based snapshots follows best practices for resilient and non-disruptive backups.

As part of the workflow, CodePipeline deploys a new MongoDB instance for staging onto the Kubernetes cluster and mounts the second Portworx volume that has the data from production. CodePipeline triggers the snapshot of a Portworx volume through an AWS Lambda function, as shown here

 

 

 

AWS Developer Tools with Kubernetes: Integrated Workflow with Portworx

In the following workflow, a developer is testing changes to a containerized application that calls on MongoDB. The tests are performed against a staging instance of MongoDB. The same workflow applies if changes were on the server side. The original production deployment is scheduled as a Kubernetes deployment object and uses Portworx as the storage for the persistent volume.

The continuous deployment pipeline runs as follows:

  • Developers integrate bug fix changes into a main development branch that gets merged into a CodeCommit master branch.
  • Amazon CloudWatch triggers the pipeline when code is merged into a master branch of an AWS CodeCommit repository.
  • AWS CodePipeline sends the new revision to AWS CodeBuild, which builds a Docker container image with the build ID.
  • AWS CodeBuild pushes the new Docker container image tagged with the build ID to an Amazon ECR registry.
  • Kubernetes downloads the new container (for the database client) from Amazon ECR and deploys the application (as a pod) and staging MongoDB instance (as a deployment object).
  • AWS CodePipeline, through a Lambda function, calls Portworx to snapshot the production MongoDB and deploy a staging instance of MongoDB.
    • Portworx provides a snapshot of the production instance as the persistent storage of the staging MongoDB.
    • The MongoDB instance mounts the snapshot.

At this point, the staging setup mimics a production environment. Teams can run integration and full end-to-end tests, using partner tooling, without impacting production workloads. The full pipeline is shown here.

 

Summary

This reference architecture showcases how development teams can easily move data between production and staging for the purposes of testing. Instead of taking application-specific manual steps, all operations in this CodePipeline architecture are automated and tracked as part of the CI/CD process.

This integrated experience is part of making stateful containers as easy as stateless. With AWS CodePipeline for CI/CD process, developers can easily deploy stateful containers onto a Kubernetes cluster with Portworx storage and automate data movement within their process.

The reference architecture and code are available on GitHub:

● Reference architecture: https://github.com/portworx/aws-kube-codesuite
● Lambda function source code for Portworx additions: https://github.com/portworx/aws-kube-codesuite/blob/master/src/kube-lambda.py

For more information about persistent storage for containers, visit the Portworx website. For more information about CodePipeline, see the AWS CodePipeline User Guide.

Security updates for Tuesday

Post Syndicated from ris original https://lwn.net/Articles/751454/rss

Security updates have been issued by CentOS (libvorbis and thunderbird), Debian (pjproject), Fedora (compat-openssl10, java-1.8.0-openjdk-aarch32, libid3tag, python-pip, python3, and python3-docs), Gentoo (ZendFramework), Oracle (thunderbird), Red Hat (ansible, gcc, glibc, golang, kernel, kernel-alt, kernel-rt, krb5, kubernetes, libvncserver, libvorbis, ntp, openssh, openssl, pcs, policycoreutils, qemu-kvm, and xdg-user-dirs), SUSE (openssl and openssl1), and Ubuntu (python-crypto, ubuntu-release-upgrader, and wayland).

Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/751346/rss

Security updates have been issued by Arch Linux (openssl and zziplib), Debian (ldap-account-manager, ming, python-crypto, sam2p, sdl-image1.2, and squirrelmail), Fedora (bchunk, koji, libidn, librelp, nodejs, and php), Gentoo (curl, dhcp, libvirt, mailx, poppler, qemu, and spice-vdagent), Mageia (389-ds-base, aubio, cfitsio, libvncserver, nmap, and ntp), openSUSE (GraphicsMagick, ImageMagick, spice-gtk, and wireshark), Oracle (kubernetes), Slackware (patch), and SUSE (apache2 and openssl).

timeShift(GrafanaBuzz, 1w) Issue 38

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/03/30/timeshiftgrafanabuzz-1w-issue-38/

Welcome to TimeShift

We have an abridged version of timeShift this week due to the long holiday weekend. Earlier this week we released Grafana v5.0.4 which included fixes for alerting, snapshots, starting Grafana on K8s and more. See the section below for specific bug fixes. Enjoy this issue and we’ll see you next week!

Latest Stable Release

This week we rolled out Grafana 5.0.4. Bug fixes in the latest release include: Docker: Can’t start Grafana on Kubernetes 1.

Kubernetes 1.10 released

Post Syndicated from ris original https://lwn.net/Articles/750236/rss

Kubernetes 1.10 has been released. “This newest version stabilizes features in 3 key areas, including storage, security, and networking. Notable additions in this release include the introduction of external kubectl credential providers (alpha), the ability to switch DNS service to CoreDNS at install time (beta), and the move of Container Storage Interface (CSI) and persistent local volumes to beta.

Task Networking in AWS Fargate

Post Syndicated from Nathan Peck original https://aws.amazon.com/blogs/compute/task-networking-in-aws-fargate/

AWS Fargate is a technology that allows you to focus on running your application without needing to provision, monitor, or manage the underlying compute infrastructure. You package your application into a Docker container that you can then launch using your container orchestration tool of choice.

Fargate allows you to use containers without being responsible for Amazon EC2 instances, similar to how EC2 allows you to run VMs without managing physical infrastructure. Currently, Fargate provides support for Amazon Elastic Container Service (Amazon ECS). Support for Amazon Elastic Container Service for Kubernetes (Amazon EKS) will be made available in the near future.

Despite offloading the responsibility for the underlying instances, Fargate still gives you deep control over configuration of network placement and policies. This includes the ability to use many networking fundamentals such as Amazon VPC and security groups.

This post covers how to take advantage of the different ways of networking your containers in Fargate when using ECS as your orchestration platform, with a focus on how to do networking securely.

The first step to running any application in Fargate is defining an ECS task for Fargate to launch. A task is a logical group of one or more Docker containers that are deployed with specified settings. When running a task in Fargate, there are two different forms of networking to consider:

  • Container (local) networking
  • External networking

Container Networking

Container networking is often used for tightly coupled application components. Perhaps your application has a web tier that is responsible for serving static content as well as generating some dynamic HTML pages. To generate these dynamic pages, it has to fetch information from another application component that has an HTTP API.

One potential architecture for such an application is to deploy the web tier and the API tier together as a pair and use local networking so the web tier can fetch information from the API tier.

If you are running these two components as two processes on a single EC2 instance, the web tier application process could communicate with the API process on the same machine by using the local loopback interface. The local loopback interface has a special IP address of 127.0.0.1 and hostname of localhost.

A network request made to this local interface bypasses the network interface hardware; the operating system simply routes the call from one process to the other directly. This gives the web tier a fast and efficient way to fetch information from the API tier with almost no networking latency.

In Fargate, when you launch multiple containers as part of a single task, they can also communicate with each other over the local loopback interface. Fargate uses a special container networking mode called awsvpc, which gives all the containers in a task a shared elastic network interface to use for communication.

If you specify a port mapping for each container in the task, then the containers can communicate with each other on that port. For example the following task definition could be used to deploy the web tier and the API tier:

{
  "family": "myapp",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "my web image url",
      "portMappings": [
        {
          "containerPort": 80
        }
      ],
      "memory": 500,
      "cpu": 10,
      "essential": true
    },
    {
      "name": "api",
      "image": "my api image url",
      "portMappings": [
        {
          "containerPort": 8080
        }
      ],
      "cpu": 10,
      "memory": 500,
      "essential": true
    }
  ]
}

ECS, with Fargate, is able to take this definition and launch two containers, each of which is bound to a specific static port on the elastic network interface for the task.

Because each Fargate task has its own isolated networking stack, there is no need for dynamic ports to avoid port conflicts between different tasks as in other networking modes. The static ports make it easy for containers to communicate with each other. For example, the web container makes a request to the API container using its well-known static port:

curl 127.0.0.1:8080/my-endpoint

This sends a local network request, which goes directly from one container to the other over the local loopback interface without traversing the network. This deployment strategy allows for fast and efficient communication between two tightly coupled containers. But most application architectures require more than just internal local networking.
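The loopback call above can be imitated on any machine with two processes or threads, which may help when testing the web tier outside Fargate. The following is a minimal sketch, not part of the original post: `ApiHandler`, `fetch_from_api`, and `demo` are hypothetical names, the API tier is a stand-in served from a thread, and the port is chosen by the operating system rather than fixed at 8080.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ApiHandler(BaseHTTPRequestHandler):
    """Stand-in for the API tier: answers every GET with a short text body."""

    def do_GET(self):
        body = f"api saw {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_from_api(port, path="/my-endpoint"):
    """What the web tier would do: call the API tier over the loopback interface."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().read().decode()
    finally:
        conn.close()


def demo():
    # Port 0 asks the OS for any free port. A real task would use the static
    # containerPort from its task definition (8080 in the example above).
    server = HTTPServer(("127.0.0.1", 0), ApiHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_from_api(server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` starts the stand-in API, issues one request to `127.0.0.1`, and returns the response body.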

External Networking

External networking is used for network communications that go outside the task to other servers that are not part of the task, or network communications that originate from other hosts on the internet and are directed to the task.

Configuring external networking for a task is done by modifying the settings of the VPC in which you launch your tasks. A VPC is a fundamental tool in AWS for controlling the networking capabilities of resources that you launch on your account.

When setting up a VPC, you create one or more subnets, which are logical groups that your resources can be placed into. Each subnet has an Availability Zone and its own route table, which defines rules about how network traffic operates for that subnet. There are two main types of subnets: public and private.

Public subnets

A public subnet is a subnet that has an associated internet gateway. Fargate tasks in that subnet are assigned both private and public IP addresses:


A browser or other client on the internet can send network traffic to the task via the internet gateway using its public IP address. The tasks can also send network traffic to other servers on the internet because the route table can route traffic out via the internet gateway.

If tasks want to communicate directly with each other, they can use each other’s private IP address to send traffic directly from one to the other so that it stays inside the subnet without going out to the internet gateway and back in.

Private subnets

A private subnet does not have direct internet access. The Fargate tasks inside the subnet don’t have public IP addresses, only private IP addresses. Instead of an internet gateway, a network address translation (NAT) gateway is attached to the subnet:

 

There is no way for another server or client on the internet to reach your tasks directly, because they don’t even have an address or a direct route to reach them. This is a great way to add another layer of protection for internal tasks that handle sensitive data. Those tasks are protected and can’t receive any inbound traffic at all.

In this configuration, the tasks can still communicate to other servers on the internet via the NAT gateway. They would appear to have the IP address of the NAT gateway to the recipient of the communication. If you run a Fargate task in a private subnet, you must add this NAT gateway. Otherwise, Fargate can’t make a network request to Amazon ECR to download the container image, or communicate with Amazon CloudWatch to store container metrics.
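The difference between the two subnet types comes down to where the default route points. The sketch below classifies a subnet from its route list. It assumes each route is a dictionary with the `DestinationCidrBlock`, `GatewayId`, and `NatGatewayId` keys used in EC2 route table descriptions; `classify_subnet` and the resource IDs are hypothetical and only for illustration.

```python
def classify_subnet(routes):
    """Return 'public', 'private-with-nat', or 'isolated' for a list of routes."""
    for route in routes:
        # Only the default route decides how traffic leaves the subnet.
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("GatewayId", "").startswith("igw-"):
            return "public"
        if route.get("NatGatewayId", "").startswith("nat-"):
            return "private-with-nat"
    # No default route: tasks here cannot reach Amazon ECR or CloudWatch
    # over the internet.
    return "isolated"


# Made-up example data; the IDs do not refer to real resources.
LOCAL_ROUTE = {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"}
PUBLIC_ROUTES = [LOCAL_ROUTE, {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0example"}]
PRIVATE_ROUTES = [LOCAL_ROUTE, {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0example"}]
```

A check like this could run before launching tasks, to confirm that a subnet intended to be private does not route directly to an internet gateway.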

Load balancers

If you are running a container that is hosting internet content in a private subnet, you need a way for traffic from the public to reach the container. This is generally accomplished by using a load balancer such as an Application Load Balancer or a Network Load Balancer.

ECS integrates tightly with AWS load balancers by automatically configuring a service-linked load balancer to send network traffic to containers that are part of the service. When each task starts, the IP address of its elastic network interface is added to the load balancer’s configuration. When the task is being shut down, network traffic is safely drained from the task before removal from the load balancer.

To get internet traffic to containers using a load balancer, the load balancer is placed into a public subnet. ECS configures the load balancer to forward traffic to the container tasks in the private subnet:

This configuration allows your tasks in Fargate to be safely isolated from the rest of the internet. They can still initiate network communication with external resources via the NAT gateway, and still receive traffic from the public via the Application Load Balancer that is in the public subnet.

Another potential use case for a load balancer is for internal communication from one service to another service within the private subnet. This is typically used for a microservice deployment, in which one service such as an internet-facing user account service needs to communicate with an internal service such as a password service. Obviously, it is undesirable for the password service to be directly accessible on the internet, so using an internet-facing load balancer would be a major security vulnerability. Instead, this can be accomplished by hosting an internal load balancer within the private subnet:

With this approach, one container can distribute requests across an Auto Scaling group of other private containers via the internal load balancer, ensuring that the network traffic stays safely protected within the private subnet.

Best Practices for Fargate Networking

Determine whether you should use local task networking

Local task networking is ideal for communicating between containers that are tightly coupled and require maximum networking performance between them. However, when you deploy multiple containers as part of the same task, they are always deployed together, which removes the ability to scale different types of workload up and down independently.

In the example of the application with a web tier and an API tier, it may be the case that powering the application requires only two web tier containers but 10 API tier containers. If local container networking is used between these two container types, then an extra eight unnecessary web tier containers would end up being run instead of allowing the two different services to scale independently.

A better approach would be to deploy the two containers as two different services, each with its own load balancer. This allows clients to communicate with the two web containers via the web service’s load balancer. The web service could distribute requests across the 10 backend API containers via the API service’s load balancer.
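The container counts in this example can be worked through in a few lines. This is an illustrative sketch only; `coupled_deployment` and `independent_deployment` are hypothetical helper names, and the model assumes one web container and one API container per task in the coupled case.

```python
def coupled_deployment(web_needed, api_needed):
    """Both tiers share a task, so the task count follows the larger requirement."""
    tasks = max(web_needed, api_needed)
    return {
        "web": tasks,
        "api": tasks,
        "extra_web": tasks - web_needed,
        "extra_api": tasks - api_needed,
    }


def independent_deployment(web_needed, api_needed):
    """Each tier is its own service and scales to exactly what it needs."""
    return {"web": web_needed, "api": api_needed, "extra_web": 0, "extra_api": 0}
```

With 2 web containers and 10 API containers required, the coupled layout runs 10 of each and leaves 8 web containers idle, while the independent layout runs 12 containers in total.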

Run tasks that require internet access in a public subnet

If you have tasks that require internet access and a lot of bandwidth for communication with other services, it is best to run them in a public subnet. Give them public IP addresses so that each task can communicate with other services directly.

If you run these tasks in a private subnet, then all their outbound traffic has to go through a NAT gateway. AWS NAT gateways support up to 10 Gbps of burst bandwidth. If your bandwidth requirements go over this, then all task networking starts to get throttled. To avoid this, you could distribute the tasks across multiple private subnets, each with its own NAT gateway. It can be easier to just place the tasks into a public subnet, if possible.
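As a rough sizing aid, the number of private subnets (each with its own NAT gateway) follows from the aggregate outbound bandwidth. The sketch below takes the 10 Gbps figure from the text as its default; `nat_gateways_needed` is a hypothetical helper, and the per-gateway limit should be checked against current AWS documentation before relying on it.

```python
import math

# Per-gateway burst bandwidth quoted in the text; treat it as an assumption.
NAT_GATEWAY_BURST_GBPS = 10


def nat_gateways_needed(total_outbound_gbps, per_gateway_gbps=NAT_GATEWAY_BURST_GBPS):
    """Smallest number of NAT gateways that keeps each one under its limit."""
    if total_outbound_gbps < 0 or per_gateway_gbps <= 0:
        raise ValueError("bandwidth values must be non-negative and the limit positive")
    # At least one gateway is always required so tasks can pull images.
    return max(1, math.ceil(total_outbound_gbps / per_gateway_gbps))
```

For example, tasks that together push 25 Gbps of outbound traffic would need to be spread over three private subnets under this assumption.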

Avoid using a public subnet or public IP addresses for private, internal tasks

If you are running a service that handles private, internal information, you should not put it into a public subnet or use a public IP address. For example, imagine that you have one task, which is an API gateway for authentication and access control. You have another background worker task that handles sensitive information.

The intended access pattern is that requests from the public go to the API gateway, which then proxies requests to the background task only if the request is from an authenticated user. If the background task is in a public subnet and has a public IP address, then it could be possible for an attacker to bypass the API gateway entirely. They could communicate directly to the background task using its public IP address, without being authenticated.

Conclusion

Fargate gives you a way to run containerized tasks directly without managing any EC2 instances, but you still have full control over how you want networking to work. You can set up containers to talk to each other over the local network interface for maximum speed and efficiency. For running workloads that require privacy and security, use a private subnet with public internet access locked down. Or, for simplicity with an internet workload, you can just use a public subnet and give your containers a public IP address.

To deploy one of these Fargate task networking approaches, check out some sample CloudFormation templates showing how to configure the VPC, subnets, and load balancers.

If you have questions or suggestions, please comment below.

[$] Changes in Prometheus 2.0

Post Syndicated from corbet original https://lwn.net/Articles/744721/rss

2017 was a big year for the Prometheus project, as it published its 2.0 release in November. The new release ships numerous bug fixes, new features, and, notably, a new storage engine that brings major performance improvements. This comes at the cost of incompatible changes to the storage and configuration-file formats. An overview of Prometheus and its new release was presented to the Kubernetes community in a talk held during KubeCon + CloudNativeCon. This article covers what changed in this new release and what is brewing next in the Prometheus community; it is a companion to this article, which provided a general introduction to monitoring with Prometheus.

timeShift(GrafanaBuzz, 1w) Issue 29

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/01/12/timeshiftgrafanabuzz-1w-issue-29/

Welcome to TimeShift



Latest Stable Release

Grafana 4.6.3 is now available. Latest bugfixes include:

  • Gzip: Fixes bug with Gravatar images when gzip was enabled #5952
  • Alert list: Now shows alert state changes even after adding manual annotations on dashboard #99513
  • Alerting: Fixes bug where rules evaluated as firing when all conditions were false and using OR operator. #93183
  • Cloudwatch: CloudWatch no longer displays metrics’ default alias #101514, thx @mtanda

Download Grafana 4.6.3 Now


From the Blogosphere

Graphite 1.1: Teaching an Old Dog New Tricks: Grafana Labs’ own Dan Cech is a contributor to the Graphite project, and has been instrumental in the addition of some of the newest features. This article discusses five of the biggest additions, how they work, and what you can expect for the future of the project.

Instrument an Application Using Prometheus and Grafana: Chris walks us through how easy it is to get useful metrics from an application to understand bottlenecks and performance. In this article, he shares an application he built that indexes your Gmail account into Elasticsearch, and sends the metrics to Prometheus. Then, he shows you how to set up Grafana to get meaningful graphs and dashboards.

Visualising Serverless Metrics With Grafana Dashboards: Part 3 in this series of blog posts on “Monitoring Serverless Applications Metrics” starts with an overview of Grafana and the UI, covers queries and templating, then dives into creating some great looking dashboards. The series plans to conclude with a post about setting up alerting.

Huawei FAT WLAN Access Points in Grafana: Huawei’s FAT firmware for their WLAN access points lacks a central management overview. To get a sense of the performance of your APs, why not quickly create a templated dashboard in Grafana? This article quickly steps you through the process, and includes a sample dashboard.


Grafana Plugins

Lots of updated plugins this week. Plugin authors add new features and fix bugs often, to make your plugin perform better – so it’s important to keep your plugins up to date. We’ve made updating easy; for on-prem Grafana, use the Grafana-cli tool, or update with 1 click if you’re using Hosted Grafana.

UPDATED PLUGIN

Clickhouse Data Source – The Clickhouse Data Source plugin has been updated a few times with small fixes during the last few weeks.

  • Fix for quantile functions
  • Allow rounding with round option for both time filters: $from and $to

Update

UPDATED PLUGIN

Zabbix App – The Zabbix App had a release with a redesign of the Triggers panel as well as support for multiple data sources for the Triggers panel.

Update

UPDATED PLUGIN

OpenHistorian Data Source – this data source plugin received some new query builder screens and improved documentation.

Update

UPDATED PLUGIN

BT Status Dot Panel – This panel received a small bug fix.

Update

UPDATED PLUGIN

Carpet Plot Panel – A recent update for this panel fixes a D3 import bug.

Update


Upcoming Events

In between code pushes we like to speak at, sponsor and attend all kinds of conferences and meetups. We also like to make sure we mention other Grafana-related events happening all over the world. If you’re putting on just such an event, let us know and we’ll list it here.

Women Who Go Berlin: Go Workshop – Monitoring and Troubleshooting using Prometheus and Grafana | Berlin, Germany – Jan 31, 2018: In this workshop we will learn about one of the most important topics in making apps production ready: Monitoring. We will learn how to use tools you’ve probably heard a lot about – Prometheus and Grafana, and using what we learn we will troubleshoot a particularly buggy Go app.

Register Now

FOSDEM | Brussels, Belgium – Feb 3-4, 2018: FOSDEM is a free developer conference where thousands of developers of free and open source software gather to share ideas and technology. There is no need to register; all are welcome.

Jfokus | Stockholm, Sweden – Feb 5-7, 2018:
Carl Bergquist – Quickie: Monitoring? Not OPS Problem

Why should we monitor our system? Why can’t we just rely on the operations team anymore? They used to be able to do that. What’s currently changing? Presentation content: – Why do we monitor our system – How did it use to work? – What’s changing – Why do we need to shift focus – Everyone should be on call. – Resilience is the goal (Best way of having someone care about quality is to make them responsible).

Register Now

Jfokus | Stockholm, Sweden – Feb 5-7, 2018:
Leonard Gram – Presentation: DevOps Deconstructed

What’s a Site Reliability Engineer and how’s that role different from the DevOps engineer my boss wants to hire? I really don’t want to be on call, should I? Is Docker the right place for my code or am I better off just going straight to Serverless? And why should I care about any of it? I’ll try to answer some of these questions while looking at what DevOps really is about and how commodisation of servers through “the cloud” ties into it all. This session will be an opinionated piece from a developer who’s been on-call for the past 6 years and would like to convince you to do the same, at least once.

Register Now

Stockholm Metrics and Monitoring | Stockholm, Sweden – Feb 7, 2018:
Observability 3 ways – Logging, Metrics and Distributed Tracing

Let’s talk about often confused telemetry tools: Logging, Metrics and Distributed Tracing. We’ll show how you capture latency using each of the tools and how they work differently. Through examples and discussion, we’ll note edge cases where certain tools have advantages over others. By the end of this talk, we’ll better understand how each of Logging, Metrics and Distributed Tracing aids us in different ways to understand our applications.

Register Now

OpenNMS – Introduction to “Grafana” | Webinar – Feb 21, 2018:
IT monitoring helps detect emerging hardware damage and performance bottlenecks in the enterprise network before any consequential damage or disruption to business processes occurs. The powerful open-source OpenNMS software monitors a network, including all connected devices, and provides logging of a variety of data that can be used for analysis and planning purposes. In our next OpenNMS webinar on February 21, 2018, we introduce “Grafana” – a web-based tool for creating and displaying dashboards from various data sources, which can be perfectly combined with OpenNMS.

Register Now

GrafanaCon EU | Amsterdam, Netherlands – March 1-2, 2018:
Lock in your seat for GrafanaCon EU while there are still tickets available! Join us March 1-2, 2018 in Amsterdam for 2 days of talks centered around Grafana and the surrounding monitoring ecosystem including Graphite, Prometheus, InfluxData, Elasticsearch, Kubernetes, and more.

We have some exciting talks lined up from Google, CERN, Bloomberg, eBay, Red Hat, Tinder, Automattic, Prometheus, InfluxData, Percona and more! Be sure to get your ticket before they’re sold out.

Learn More


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove

Nice hack! I know I like to keep one eye on server requests when I’m dropping beats. 😉


Grafana Labs is Hiring!

We are passionate about open source software and thrive on tackling complex challenges to build the future. We ship code from every corner of the globe and love working with the community. If this sounds exciting, you’re in luck – WE’RE HIRING!

Check out our Open Positions


How are we doing?

Thanks for reading another issue of timeShift. Let us know what you think! Submit a comment on this article below, or post something at our community forum.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

Continuous Deployment to Kubernetes using AWS CodePipeline, AWS CodeCommit, AWS CodeBuild, Amazon ECR and AWS Lambda

Post Syndicated from Chris Barclay original https://aws.amazon.com/blogs/devops/continuous-deployment-to-kubernetes-using-aws-codepipeline-aws-codecommit-aws-codebuild-amazon-ecr-and-aws-lambda/

Thank you to my colleague Omar Lari for this blog on how to create a continuous deployment pipeline for Kubernetes!


You can use Kubernetes and AWS together to create a fully managed, continuous deployment pipeline for container-based applications. This approach takes advantage of Kubernetes’ open-source system to manage your containerized applications, and the AWS developer tools to manage your source code, builds, and pipelines.

This post describes how to create a continuous deployment architecture for containerized applications. It uses AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, and AWS Lambda to deploy containerized applications into a Kubernetes cluster. In this environment, developers can remain focused on developing code without worrying about how it will be deployed, and development managers can be satisfied that the latest changes are always deployed.

What is Continuous Deployment?

There are many articles, posts and even conferences dedicated to the practice of continuous deployment. For the purposes of this post, I will summarize continuous deployment into the following points:

  • Code is more frequently released into production environments
  • More frequent releases allow for smaller, incremental changes, reducing risk and enabling simplified rollbacks if needed
  • Deployment is automated and requires minimal user intervention

For more information, see “Practicing Continuous Integration and Continuous Delivery on AWS”.

How can you use continuous deployment with AWS and Kubernetes?

You can leverage AWS services that support continuous deployment to automatically take your code from a source code repository to production in a Kubernetes cluster with minimal user intervention. To do this, you can create a pipeline that will build and deploy committed code changes as long as they meet the requirements of each stage of the pipeline.

To create the pipeline, you will use the following services:

  • AWS CodePipeline. AWS CodePipeline is a continuous delivery service that models, visualizes, and automates the steps required to release software. You define stages in a pipeline to retrieve code from a source code repository, build that source code into a releasable artifact, test the artifact, and deploy it to production. Only code that successfully passes through all these stages will be deployed. In addition, you can optionally add other requirements to your pipeline, such as manual approvals, to help ensure that only approved changes are deployed to production.
  • AWS CodeCommit. AWS CodeCommit is a secure, scalable, and managed source control service that hosts private Git repositories. You can privately store and manage assets such as your source code in the cloud and configure your pipeline to automatically retrieve and process changes committed to your repository.
  • AWS CodeBuild. AWS CodeBuild is a fully managed build service that compiles source code, runs tests, and produces artifacts that are ready to deploy. You can use AWS CodeBuild both to build your artifacts and to test them before they are deployed.
  • AWS Lambda. AWS Lambda is a compute service that lets you run code without provisioning or managing servers. You can invoke a Lambda function in your pipeline to prepare the built and tested artifact for deployment to the Kubernetes cluster.
  • Kubernetes. Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It provides a platform for running, deploying, and managing containers at scale.
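The gating behavior described for AWS CodePipeline above (only code that passes every stage is deployed) can be illustrated with a small conceptual model. This is a sketch in plain Python, not the CodePipeline API: the `Stage` class, the `run_pipeline` function, and the keys of the `change` dictionary are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stage:
    """One pipeline stage (hypothetical model, not an AWS type)."""
    name: str
    action: Callable[[dict], bool]  # returns True when the stage succeeds


def run_pipeline(stages: List[Stage], change: dict) -> Optional[str]:
    """Run the stages in order and stop at the first failure.

    Returns the name of the failing stage, or None if every stage passed
    (that is, the change reached the end of the pipeline).
    """
    for stage in stages:
        if not stage.action(change):
            return stage.name
    return None


# Illustrative stages mirroring the source -> build -> test -> deploy flow.
stages = [
    Stage("Source", lambda c: bool(c.get("commit_id"))),
    Stage("Build", lambda c: c.get("build_ok", False)),
    Stage("Test", lambda c: c.get("tests_ok", False)),
    Stage("Deploy", lambda c: True),
]
```

Because the loop returns at the first failing stage, a change whose tests fail never reaches the Deploy stage. An optional manual approval would simply be one more stage in the list.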

An Example of Continuous Deployment to Kubernetes:

The following example illustrates leveraging AWS developer tools to continuously deploy to a Kubernetes cluster:

  1. Developers commit code to an AWS CodeCommit repository and create pull requests to review proposed changes to the production code. When the pull request is merged into the master branch in the AWS CodeCommit repository, AWS CodePipeline automatically detects the changes to the branch and starts processing the code changes through the pipeline.
  2. AWS CodeBuild packages the code changes as well as any dependencies and builds a Docker image. Optionally, another pipeline stage tests the code and the package, also using AWS CodeBuild.
  3. The Docker image is pushed to Amazon ECR after a successful build and/or test stage.
  4. AWS CodePipeline invokes an AWS Lambda function that includes the Kubernetes Python client as part of the function’s resources. The Lambda function performs a string replacement on the Docker image tag in the Kubernetes deployment file so that it matches the tag applied in the build, and therefore the image stored in Amazon ECR.
  5. After the deployment manifest update is completed, AWS Lambda invokes the Kubernetes API to update the image in the Kubernetes application deployment.
  6. Kubernetes performs a rolling update of the pods in the application deployment to match the Docker image specified in Amazon ECR.
    The pipeline is now live and responds to changes to the master branch of the CodeCommit repository. The pipeline is also fully extensible: you can add steps for testing, or a step that deploys into a staging environment before the code ships into the production cluster.
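Steps 4 and 5 can be sketched roughly as follows. This is a minimal illustration and not the code from the reference architecture: the function names `replace_image_tag` and `build_image_patch` are hypothetical, and the sketch assumes an unquoted `image:` line in the manifest. The first function performs the tag replacement on the manifest text, and the second builds a patch body that names the container and its new image. The cluster call is left as a comment because it needs the `kubernetes` package and cluster credentials.

```python
import re


def replace_image_tag(manifest: str, repository: str, new_tag: str) -> str:
    """Return the manifest text with the tag on `repository` set to `new_tag`.

    Assumes lines of the form `image: <repository>[:<tag>]` without quotes.
    """
    pattern = re.compile(
        r"(image:\s*" + re.escape(repository) + r")"  # image key and repository
        r"(?::[\w.\-]+)?"                             # existing tag, if any
        r"(?=\s|$)"                                   # do not match a longer repository name
    )
    return pattern.sub(lambda m: f"{m.group(1)}:{new_tag}", manifest)


def build_image_patch(container_name: str, image: str) -> dict:
    """Build a patch body that changes one container's image in a Deployment."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [{"name": container_name, "image": image}]
                }
            }
        }
    }


# Sending the patch with the Kubernetes Python client would look roughly like
# this (deployment name, namespace, and values are placeholders):
#
#   from kubernetes import client
#   client.AppsV1Api().patch_namespaced_deployment(
#       name="my-app", namespace="default",
#       body=build_image_patch("my-app", "<repository>:<tag>"))
```

Changing the image in the pod template is what causes Kubernetes to start the rolling update described in step 6.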

An example pipeline in AWS CodePipeline that supports this architecture can be seen below:

Conclusion

We are excited to see how you leverage this pipeline to help ease your developer experience as you develop applications in Kubernetes.

You’ll find an AWS CloudFormation template with everything necessary to spin up your own continuous deployment pipeline at the CodeSuite – Continuous Deployment Reference Architecture for Kubernetes repo on GitHub. The repository details exactly how the pipeline is provisioned and how you can use it to deploy your own applications. If you have any questions, feedback, or suggestions, please let us know!