Tag Archives: ECS

Building, deploying, and operating containerized applications with AWS Fargate

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/building-deploying-and-operating-containerized-applications-with-aws-fargate/

This post was contributed by Jason Umiker, AWS Solutions Architect.

Whether it’s helping facilitate a journey to microservices or deploying existing tools more easily and repeatably, many customers are moving toward containerized infrastructure and workflows. AWS provides many of the services and mechanisms to help you with that.

In this post, I show you how to use Amazon ECS and AWS Fargate, as well as AWS CodeBuild and AWS CodePipeline, for an end-to-end CI/CD container solution.

What is Amazon ECS?

Amazon Elastic Container Service (ECS) helps schedule and orchestrate containers across a fleet of servers. It involves installing an agent on each container host that takes instructions from the ECS control plane and relays them to the local Docker image on each one. ECS makes this easy by providing an optimized Amazon Machine Image (AMI) that launches automatically using the ECS console or CLI and that you can use to launch container hosts yourself.

It is up to you to choose the appropriate instance types, sizes, and quantity for your cluster fleet. You should have the capacity to deploy and scale workloads as well as to spread them across enough failure domains for high availability. Features like Auto Scaling groups help with that.

Also, while AWS provides Amazon Linux and Windows AMIs pre-configured for ECS, you are responsible for ongoing maintenance of the OS, which includes patching and security. Items that require regular patching or updating in this model are the OS, Docker, the ECS agent, and of course the contents of the container images.

Two of the key ECS concepts are Tasks and Services. A task is one or more containers that are to be scheduled together by ECS. A service is like an Auto Scaling group for tasks. It defines the quantity of tasks to run across the cluster, where they should be running (for example, across multiple Availability Zones), automatically associates them with a load balancer, and horizontally scales based on metrics that you define like CPI or memory utilization.

What is Fargate?

AWS Fargate is a new compute engine for Amazon ECS that runs containers without requiring you to deploy or manage the underlying Amazon EC2 instances. With Fargate, you specify an image to deploy and the amount of CPU and memory it requires. Fargate handles the updating and securing of the underlying Linux OS, Docker daemon, and ECS agent as well as all the infrastructure capacity management and scaling.

How to use Fargate?

Fargate is exposed as a launch type for ECS. It uses an ECS task and service definition that is similar to the traditional EC2 launch mode, with a few minor differences. It is easy to move tasks and services back and forth between launch types. The differences include:

  • Using the awsvpc network mode
  • Specifying the CPU and memory requirements for the task in the definition

The best way to learn how to use Fargate is to walk through the process and see it in action.

Walkthrough: Deploying a service with Fargate in the console

At the time of publication, Fargate for ECS is available in the N. Virginia, Ohio, Oregon, and Ireland AWS regions. This walkthrough works in any AWS region where Fargate is available.

If you’d prefer to use a CloudFormation template, this one covers Steps 1-4. After launching this template you can skip ahead to Explore Running Service after Step 4.

Step 1 – Create an ECS cluster

An ECS cluster is a logical construct for running groups of containers known as tasks. Clusters can also be used to segregate different environments or teams from each other. In the traditional EC2 launch mode, there are specific EC2 instances associated with and managed by each ECS cluster, but this is transparent to the customer with Fargate.

  1. Open the ECS console and ensure that Fargate is available in the selected Region (for example, N. Virginia).
  2. Choose Clusters, Create Cluster.
  3. Choose Networking only, Next step.
  4. For Cluster name, enter “Fargate”. If you don’t already have a VPC to use, select the Create VPC check box and accept the defaults as well. Choose Create.

Step 2 – Create a task definition, CloudWatch log group, and task execution role

A task is a collection of one or more containers that is the smallest deployable unit of your application. A task definition is a JSON document that serves as the blueprint for ECS to know how to deploy and run your tasks.

The console makes it easier to create this definition by exposing all the parameters graphically. In addition, the console creates two dependencies:

  • The Amazon CloudWatch log group to store the aggregated logs from the task
  • The task execution IAM role that gives Fargate the permissions to run the task
  1. In the left navigation pane, choose Task Definitions, Create new task definition.
  2. Under Select launch type compatibility, choose FARGATE, Next step.
  3. For Task Definition Name, enter NGINX.
  4. If you had an IAM role for your task, you would enter it in Task Role but you don’t need one for this example.
  5. The Network Mode is automatically set to awsvpc for Fargate
  6. Under Task size, for Task memory, choose 0.5 GB. For Task CPU, enter 0.25.
  7. Choose Add container.
  8. For Container name, enter NGINX.
  9. For Image, put nginx:1.13.9-alpine.
  10. For Port mappings type 80 into Container port.
  11. Choose Add, Create.

Step 3 – Create an Application Load Balancer

Sending incoming traffic through a load balancer is often a key piece of making an application both scalable and highly available. It can balance the traffic between multiple tasks, as well as ensure that traffic is only sent to healthy tasks. You can have the service manage the addition or removal of tasks from an Application Load Balancer as they come and go but that must be specified when the service is created. It’s a dependency that you create first.

  1. Open the EC2 console.
  2. In the left navigation pane, choose Load Balancers, Create Load Balancer.
  3. Under Application Load Balancer, choose Create.
  4. For Name, put NGINX.
  5. Choose the appropriate VPC (10.0.0.0/16 if you let ECS create if for you).
  6. For Availability Zones, select both and choose Next: Configure Security Settings.
  7. Choose Next: Configure Security Groups.
  8. For Assign a security group, choose Create a new security group. Choose Next: Configure Routing.
  9. For Name, enter NGINX. For Target type, choose ip.
  10. Choose Next: Register Targets, Next: Review, Create.
  11. Select the new load balancer and note its DNS name (this is the public address for the service).

Step 4 – Create an ECS service using Fargate

A service in ECS using Fargate serves a similar purpose to an Auto Scaling group in EC2. It ensures that the needed number of tasks are running both for scaling as well as spreading the tasks over multiple Availability Zones for high availability. A service creates and destroys tasks as part of its role and can optionally add or remove them from an Application Load Balancer as targets as it does so.

  1. Open the ECS console and ensure that that Fargate is available in the selected Region (for example, N. Virginia).
  2. In the left navigation pane, choose Task Definitions.
  3. Select the NGINX task definition that you created and choose Actions, Create Service.
  4. For Launch Type, select Fargate.
  5. For Service name, enter NGINX.
  6. For Number of tasks, enter 1.
  7. Choose Next step.
  8. Under Subnets, choose both of the options.
  9. For Load balancer type, choose Application Load Balancer. It should then default to the NGINX version that you created earlier.
  10. Choose Add to load balancer.
  11. For Target group name, choose NGINX.
  12. Under DNS records for service discovery, for TTL, enter 60.
  13. Click Next step, Next step, and Create Service.

Explore the running service

At this point, you have a running NGINX service using Fargate. You can now explore what you have running and how it works. You can also ask it to scale up to two tasks across two Availability Zones in the console.

Go into the service and see details about the associated load balancer, tasks, events, metrics, and logs:

Scale the service from one task to multiple tasks:

  • Choose Update.
  • For Number of tasks, enter 2.
  • Choose Next step, Next step, Next step then Update Service.
  • Watch the event that is logged and the new additional task both appear.

On the service Details tab, open the NGINX Target Group Name link and see the IP address registered targets spread across the two zones.

Go to the DNS name for the Application Load Balancer in your browser and see the default NGINX page. Get the value from the Load Balancers dashboard in the EC2 console.

Walkthrough: Adding a CI/CD pipeline to your service

Now, I’m going to show you how to set up a CI/CD pipeline around this service. It watches a GitHub repo for changes and rebuilds the container with CodeBuild based on the buildspec.yml file and Dockerfile in the repo. If that build is successful, it then updates your Fargate service to deploy the new image.

If you’d prefer to use a CloudFormation Template, this one covers the creation of the dependencies so that the console will pre-fill these (CodeBuild Project and IAM Roles) during the creation of the CodePipeline in the steps below.

Step 1 – Create an ECR repository for the rebuilt container image

An ECR repository is a place to store your container images in a secure and reliable manner. Scaling and self-healing of Fargate tasks requires these images to be always available to be pulled when required. This is an important part of a container platform.

  1. Open the ECS console and ensure that that Fargate is available in the selected Region (for example N. Virginia).
  2. In the left navigation pane, under Amazon ECR, choose Repositories, Get started.
  3. For Repository name, put NGINX and choose Next step.

Step 2 – Fork the nginx-codebuild example into your own GitHub account

I have created an example project that takes the Dockerfile and config files for the official NGINX Docker Hub image and adds a buildspec.yml file to tell CodeBuild how to build the container and push it to your new ECR registry on completion. You can fork it into your own GitHub account for this CI/CD demo.

  1. Go to https://github.com/jasonumiker/nginx-codebuild.
  2. In the upper right corner, choose Fork.

Step 3 – Create the pipeline and associated IAM roles

You have two complementary AWS services for building a CI/CD pipeline for your containers. CodeBuild executes the build jobs and CodePipeline kicks off those builds when it notices that the source GitHub or CodeCommit repo changes. If successful, CodePipeline then deploys the new container image to Fargate.

The CodePipeline console can create the associated CodeBuild project, in addition to other dependencies such as the required IAM roles.

  1. Open the CodePipeline console and ensure that that Fargate is available in the selected Region (for example, N. Virginia).
  2. Choose Get started.
  3. For Pipeline name, enter NGINX and choose Next step.
  4. For Source provider, choose GitHub.
  5. Choose Connect to GitHub and log in.
    • For Repository, choose your forked nginx-codebuild repo. For Branch, enter master. Choose Next step.
  6. For Build provider, enter AWS CodeBuild.
  7. Select Create a new build project.
  8. For Project name, enter NGINX.
  9. For Operating system, choose Ubuntu. For Runtime, choose Docker. For Version, select the latest version.
  10. Expand Advanced and set the following environment variables:
    • AWS_ACCOUNT_ID with a value of the account number
    • IMAGE_REPO_NAME with a value of NGINX (or whatever ECR name that you used)
  11. Choose Save build project, Next step.
  12. For Deployment provider, choose Amazon ECS.
  13. For Cluster name, enter Fargate.
  14. For Service name, choose NGINX.
  15. For Image filename, enter images.json.
  16. Choose Next step.
  17. Choose Create role, Allow, Next step, and then choose Create pipeline.
  18. Open the IAM console and ensure that that Fargate is available in the selected Region (for example, N. Virginia).
  19. In the left navigation pane, choose Roles.
  20. Choose the code-build-nginx-service-role that was just created and choose Attach policy.
  21. For Policy type, choose AmazonEC2ContainerRegistryPowerUser and choose Attach policy.

Step 4 – Start the pipeline

You now have CodePipeline watching the GitHub repo for changes. It kicks off a CodeBuild build job on a change and, if the build is successful, creates a new deployment of the Fargate service with the new image.

Make a change to the source repo (even just adding a new dummy file) and then commit it and push it to master on your GitHub fork. This automatically kicks off the pipeline to build and deploy the change.

Conclusion

As you’ve seen, Fargate is fast and easy to set up, integrates well with the rest of the AWS platform, and saves you from much of the heavy lifting of running containers reliably at scale.

While it is useful to go through creating things in the console to understand them better we suggest automating them with infrastructure-as-code patterns via things like our CloudFormation to ensure that they are repeatable, and any changes can be managed. There are some example templates to help you get started in this post.

In addition, adding things like unit and integration testing, blue/green and/or manual approval gates into CodePipeline are often a good idea before deploying patterns like this to production in many organizations. Some additional examples to look at next include:

EC2 Instance Update – M5 Instances with Local NVMe Storage (M5d)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-instance-update-m5-instances-with-local-nvme-storage-m5d/

Earlier this month we launched the C5 Instances with Local NVMe Storage and I told you that we would be doing the same for additional instance types in the near future!

Today we are introducing M5 instances equipped with local NVMe storage. Available for immediate use in 5 regions, these instances are a great fit for workloads that require a balance of compute and memory resources. Here are the specs:

Instance Name vCPUs RAM Local Storage EBS-Optimized Bandwidth Network Bandwidth
m5d.large 2 8 GiB 1 x 75 GB NVMe SSD Up to 2.120 Gbps Up to 10 Gbps
m5d.xlarge 4 16 GiB 1 x 150 GB NVMe SSD Up to 2.120 Gbps Up to 10 Gbps
m5d.2xlarge 8 32 GiB 1 x 300 GB NVMe SSD Up to 2.120 Gbps Up to 10 Gbps
m5d.4xlarge 16 64 GiB 1 x 600 GB NVMe SSD 2.210 Gbps Up to 10 Gbps
m5d.12xlarge 48 192 GiB 2 x 900 GB NVMe SSD 5.0 Gbps 10 Gbps
m5d.24xlarge 96 384 GiB 4 x 900 GB NVMe SSD 10.0 Gbps 25 Gbps

The M5d instances are powered by Custom Intel® Xeon® Platinum 8175M series processors running at 2.5 GHz, including support for AVX-512.

You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe; this includes the latest Amazon Linux, Microsoft Windows (Server 2008 R2, Server 2012, Server 2012 R2 and Server 2016), Ubuntu, RHEL, SUSE, and CentOS AMIs.

Here are a couple of things to keep in mind about the local NVMe storage on the M5d instances:

Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.

Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.

Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.

Available Now
M5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent M5 instances.

Jeff;

 

EC2 Instance Update – C5 Instances with Local NVMe Storage (C5d)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-instance-update-c5-instances-with-local-nvme-storage-c5d/

As you can see from my EC2 Instance History post, we add new instance types on a regular and frequent basis. Driven by increasingly powerful processors and designed to address an ever-widening set of use cases, the size and diversity of this list reflects the equally diverse group of EC2 customers!

Near the bottom of that list you will find the new compute-intensive C5 instances. With a 25% to 50% improvement in price-performance over the C4 instances, the C5 instances are designed for applications like batch and log processing, distributed and or real-time analytics, high-performance computing (HPC), ad serving, highly scalable multiplayer gaming, and video encoding. Some of these applications can benefit from access to high-speed, ultra-low latency local storage. For example, video encoding, image manipulation, and other forms of media processing often necessitates large amounts of I/O to temporary storage. While the input and output files are valuable assets and are typically stored as Amazon Simple Storage Service (S3) objects, the intermediate files are expendable. Similarly, batch and log processing runs in a race-to-idle model, flushing volatile data to disk as fast as possible in order to make full use of compute resources.

New C5d Instances with Local Storage
In order to meet this need, we are introducing C5 instances equipped with local NVMe storage. Available for immediate use in 5 regions, these instances are a great fit for the applications that I described above, as well as others that you will undoubtedly dream up! Here are the specs:

Instance Name vCPUs RAM Local Storage EBS Bandwidth Network Bandwidth
c5d.large 2 4 GiB 1 x 50 GB NVMe SSD Up to 2.25 Gbps Up to 10 Gbps
c5d.xlarge 4 8 GiB 1 x 100 GB NVMe SSD Up to 2.25 Gbps Up to 10 Gbps
c5d.2xlarge 8 16 GiB 1 x 225 GB NVMe SSD Up to 2.25 Gbps Up to 10 Gbps
c5d.4xlarge 16 32 GiB 1 x 450 GB NVMe SSD 2.25 Gbps Up to 10 Gbps
c5d.9xlarge 36 72 GiB 1 x 900 GB NVMe SSD 4.5 Gbps 10 Gbps
c5d.18xlarge 72 144 GiB 2 x 900 GB NVMe SSD 9 Gbps 25 Gbps

Other than the addition of local storage, the C5 and C5d share the same specs. Both are powered by 3.0 GHz Intel Xeon Platinum 8000-series processors, optimized for EC2 and with full control over C-states on the two largest sizes, giving you the ability to run two cores at up to 3.5 GHz using Intel Turbo Boost Technology.

You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe; this includes the latest Amazon Linux, Microsoft Windows (Server 2008 R2, Server 2012, Server 2012 R2 and Server 2016), Ubuntu, RHEL, SUSE, and CentOS AMIs.

Here are a couple of things to keep in mind about the local NVMe storage:

Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.

Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.

Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.

Available Now
C5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent C5 instances.

Jeff;

PS – We will be adding local NVMe storage to other EC2 instance types in the months to come, so stay tuned!

Introducing the AWS Machine Learning Competency for Consulting Partners

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/introducing-the-aws-machine-learning-competency-for-consulting-partners/

Today I’m excited to announce a new Machine Learning Competency for Consulting Partners in the Amazon Partner Network (APN). This AWS Competency program allows APN Consulting Partners to demonstrate a deep expertise in machine learning on AWS by providing solutions that enable machine learning and data science workflows for their customers. This new AWS Competency is in addition to the Machine Learning comptency for our APN Technology Partners, that we launched at the re:Invent 2017 partner summit.

These APN Consulting Partners help organizations solve their machine learning and data challenges through:

  • Providing data services that help data scientists and machine learning practitioners prepare their enterprise data for training.
  • Platform solutions that provide data scientists and machine learning practitioners with tools to take their data, train models, and make predictions on new data.
  • SaaS and API solutions to enable predictive capabilities within customer applications.

Why work with an AWS Machine Learning Competency Partner?

The AWS Competency Program helps customers find the most qualified partners with deep expertise. AWS Machine Learning Competency Partners undergo a strict validation of their capabilities to demonstrate technical proficiency and proven customer success with AWS machine learning tools.

If you’re an AWS customer interested in machine learning workloads on AWS, check out our AWS Machine Learning launch partners below:

 

Interested in becoming an AWS Machine Learning Competency Partner?

APN Partners with experience in Machine Learning can learn more about becoming an AWS Machine Learning Competency Partner here. To learn more about the benefits of joining the AWS Partner Network, see our APN Partner website.

Thanks to the AWS Partner Team for their help with this post!
Randall

What’s new in HiveMQ 3.4

Post Syndicated from The HiveMQ Team original https://www.hivemq.com/whats-new-in-hivemq-3-4

We are pleased to announce the release of HiveMQ 3.4. This version of HiveMQ is the most resilient and advanced version of HiveMQ ever. The main focus in this release was directed towards addressing the needs for the most ambitious MQTT deployments in the world for maximum performance and resilience for millions of concurrent MQTT clients. Of course, deployments of all sizes can profit from the improvements in the latest and greatest HiveMQ.

This version is a drop-in replacement for HiveMQ 3.3 and of course supports rolling upgrades with zero-downtime.

HiveMQ 3.4 brings many features that your users, administrators and plugin developers are going to love. These are the highlights:

 

New HiveMQ 3.4 features at a glance

Cluster

HiveMQ 3.4 brings various improvements in terms of scalability, availability, resilience and observability for the cluster mechanism. Many of the new features remain under the hood, but several additions stand out:

Cluster Overload Protection

The new version has a first-of-its-kind Cluster Overload Protection. The whole cluster is able to spot MQTT clients that cause overload on nodes or the cluster as a whole and protects itself from the overload. This mechanism also protects the deployment from cascading failures due to slow or failing underlying hardware (as sometimes seen on cloud providers). This feature is enabled by default and you can learn more about the mechanism in our documentation.

Dynamic Replicates

HiveMQ’s sophisticated cluster mechanism is able to scale in a linear fashion due to extremely efficient and true data distribution mechanics based on a configured replication factor. The most important aspect of every cluster is availability, which is achieved by having eventual consistency functions in place for edge cases. The 3.4 version adds dynamic replicates to the cluster so even the most challenging edge cases involving network splits don’t lead to the sacrifice of consistency for the most important MQTT operations.

Node Stress Level Metrics

All MQTT cluster nodes are now aware of their own stress level and the stress levels of other cluster members. While all stress mitigation is handled internally by HiveMQ, experienced operators may want to monitor the individual node’s stress level (e.g with Grafana) in order to start investigating what caused the increase of load.

WebUI

Operators worldwide love the HiveMQ WebUI introduced with HiveMQ 3.3. We gathered all the fantastic feedback from our users and polished the WebUI, so it’s even more useful for day-to-day broker operations and remote debugging of MQTT clients. The most important changes and additions are:

Trace Recording Download

The unique Trace Recordings functionality is without doubt a lifesaver when the behavior of individual MQTT clients needs further investigation as all interactions with the broker can be traced — at runtime and at scale! Huge production deployments may accumulate multiple gigabytes of trace recordings. HiveMQ now offers a convenient way to collect all trace recordings from all nodes, zips them and allows the download via a simple button on the WebUI. Remote debugging was never easier!

Additional Client Detail Information in WebUI

The mission of the HiveMQ WebUI is to provide easy insights to the whole production MQTT cluster for operators and administrators. Individual MQTT client investigations are a piece of cake, as all available information about clients can be viewed in detail. We further added the ability to view the restrictions a concrete client has:

  • Maximum Inflight Queue Size
  • Client Offline Queue Messages Size
  • Client Offline Message Drop Strategy

Session Invalidation

MQTT persistent sessions are one of the outstanding features of the MQTT protocol specification. Sessions which do not expire but are never reused unnecessarily consume disk space and memory. Administrators can now invalidate individual session directly in the HiveMQ WebUI for client sessions, which can be deleted safely. HiveMQ 3.4 will take care and release the resources on all cluster nodes after a session was invalidated

Web UI Polishing

Most texts on the WebUI were revisited and are now clearer and crisper. The help texts also received a major overhaul and should now be more, well, helpful. In addition, many small improvements were added, which are most of the time invisible but are here to help when you need them most. For example, the WebUI now displays a warning if cluster nodes with old versions are in the cluster (which may happen if a rolling upgrade was not finished properly)

Plugin System

One of the most popular features of HiveMQ is the extensive Plugin System, which virtually enables the integration of HiveMQ to any system and allows hooking into all aspects of the MQTT lifecycle. We listened to the feedback and are pleased to announce many improvements, big and small, for the Plugin System:

Client Session Time-to-live for individual clients

HiveMQ 3.3 offered a global configuration for setting the Time-To-Live for MQTT sessions. With the advent of HiveMQ 3.4, users can now programmatically set Time-To-Live values for individual MQTT clients and can discard a MQTT session immediately.

Individual Inflight Queues

While the Inflight Queue configuration is typically sufficient in the HiveMQ default configuration, there are some use cases that require the adjustment of this configuration. It’s now possible to change the Inflight Queue size for individual clients via the Plugin System.
 
 

Plugin Service Overload Protection

The HiveMQ Plugin System is a power-user tool and it’s possible to do unbelievably useful modifications as well as putting major stress on the system as a whole if the programmer is not careful. In order to protect the HiveMQ instances from accidental overload, a Plugin Service Overload Protection can be configured. This rate limits the Plugin Service usage and gives feedback to the application programmer in case the rate limit is exceeded. This feature is disabled by default but we strongly recommend updating your plugins to profit from this feature.

Session Attribute Store putIfNewer

This is one of the small bits you almost never need but when you do, you’re ecstatic for being able to use it. The Session Attribute Store now offers methods to put values, if the values you want to put are newer or fresher than the values already written. This is extremely useful, if multiple cluster nodes want to write to the Session Attribute Store simultaneously, as this guarantees that outdated values can no longer overwrite newer values.
 
 
 
 

Disconnection Timestamp for OnDisconnectCallback

As the OnDisconnectCallback is executed asynchronously, the client might already be gone when the callback is executed. It’s now easy to obtain the exact timestamp when a MQTT client disconnected, even if the callback is executed later on. This feature might be very interesting for many plugin developers in conjunction with the Session Attribute Store putIfNewer functionality.

Operations

We ❤️ Operators and we strive to provide all the tools needed for operating and administrating a MQTT broker cluster at scale in any environment. A key strategy for successful operations of any system is monitoring. We added some interesting new metrics you might find useful.

System Metrics

In addition to JVM Metrics, HiveMQ now also gathers Operating System Metrics for Linux Systems. So HiveMQ is able to see for itself how the operating system views the process, including native memory, the real CPU usage, and open file usage. These metrics are particularly useful, if you don’t have a monitoring agent for Linux systems setup. All metrics can be found here.

Client Disconnection Metrics

The reality of many MQTT scenarios is that not all clients are able to disconnect gracefully by sending MQTT DISCONNECT messages. HiveMQ now also exposes metrics about clients that disconnected by closing the TCP connection instead of sending a DISCONNECT packet first. This is especially useful for monitoring, if you regularly deal with clients that don’t have a stable connection to the MQTT brokers.

 

JMX enabled by default

JMX, the Java Monitoring Extension, is now enabled by default. Many HiveMQ operators use Application Performance Monitoring tools, which are able to hook into the metrics via JMX or use plain JMX for on-the-fly debugging. While we recommend to use official off-the-shelf plugins for monitoring, it’s now easier than ever to just use JMX if other solutions are not available to you.

Other notable improvements

The 3.4 release of HiveMQ is full of hidden gems and improvements. While it would be too much to highlight all small improvements, these notable changes stand out and contribute to the best HiveMQ release ever.

Topic Level Distribution Configuration

Our recommendation for all huge deployments with millions of devices is: Start with separate topic prefixes by bringing the dynamic topic parts directly to the beginning. The reality is that many customers have topics that are constructed like the following: “devices/{deviceId}/status”. So what happens is that all topics in this example start with a common prefix, “devices”, which is the first topic level. Unfortunately the first topic level doesn’t include a dynamic topic part. In order to guarantee the best scalability of the cluster and the best performance of the topic tree, customers can now configure how many topic levels are used for distribution. In the example outlined here, a topic level distribution of 2 would be perfect and guarantees the best scalability.

Mass disconnect performance improvements

Mass disconnections of MQTT clients can happen. This might be the case when e.g. a load balancer in front of the MQTT broker cluster drops the connections or if a mobile carrier experiences connectivity problems. Prior to HiveMQ 3.4, mass disconnect events caused stress on the cluster. Mass disconnect events are now massively optimized and even tens of millions of connection losses at the same time won’t bring the cluster into stress situations.

 
 
 
 
 
 

Replication Performance Improvements

Due to the distributed nature of a HiveMQ, data needs to be replicated across the cluster in certain events, e.g. when cluster topology changes occur. There are various internal improvements in HiveMQ version 3.4, which increase the replication performance significantly. Our engineers put special love into the replication of Queued Messages, which is now faster than ever, even for multiple millions of Queued Messages that need to be transferred across the cluster.

Updated Native SSL Libraries

The Native SSL Integration of HiveMQ was updated to the newest BoringSSL version. This results in better performance and increased security. In case you’re using SSL and you are not yet using the native SSL integration, we strongly recommend to give it a try, more than 40% performance improvement can be observed for most deployments.

 
 

Improvements for Java 9

While Java 9 was already supported for older HiveMQ versions, HiveMQ 3.4 has full-blown Java 9 support. The minimum Java version still remains Java 7, although we strongly recommend to use Java 8 or newer for the best performance of HiveMQ.

EC2 Fleet – Manage Thousands of On-Demand and Spot Instances with One Request

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-fleet-manage-thousands-of-on-demand-and-spot-instances-with-one-request/

EC2 Spot Fleets are really cool. You can launch a fleet of Spot Instances that spans EC2 instance types and Availability Zones without having to write custom code to discover capacity or monitor prices. You can set the target capacity (the size of the fleet) in units that are meaningful to your application and have Spot Fleet create and then maintain the fleet on your behalf. Our customers are creating Spot Fleets of all sizes. For example, one financial service customer runs Monte Carlo simulations across 10 different EC2 instance types. They routinely make requests for hundreds of thousands of vCPUs and count on Spot Fleet to give them access to massive amounts of capacity at the best possible price.

EC2 Fleet
Today we are extending and generalizing the set-it-and-forget-it model that we pioneered in Spot Fleet with EC2 Fleet, a new building block that gives you the ability to create fleets that are composed of a combination of EC2 On-Demand, Reserved, and Spot Instances with a single API call. You tell us what you need, capacity and instance-wise, and we’ll handle all the heavy lifting. We will launch, manage, monitor and scale instances as needed, without the need for scaffolding code.

You can specify the capacity of your fleet in terms of instances, vCPUs, or application-oriented units, and also indicate how much of the capacity should be fulfilled by Spot Instances. The application-oriented units allow you to specify the relative power of each EC2 instance type in a way that directly maps to the needs of your application. All three capacity specification options (instances, vCPUs, and application-oriented units) are known as weights.

I think you’ll find a number ways this feature makes managing a fleet of instances easier, and believe that you will also appreciate the team’s near-term feature roadmap of interest (more on that in a bit).

Using EC2 Fleet
There are a number of ways that you can use this feature, whether you’re running a stateless web service, a big data cluster or a continuous integration pipeline. Today I’m going to describe how you can use EC2 Fleet for genomic processing, but this is similar to workloads like risk analysis, log processing or image rendering. Modern DNA sequencers can produce multiple terabytes of raw data each day, to process that data into meaningful information in a timely fashion you need lots of processing power. I’ll be showing you how to deploy a “grid” of worker nodes that can quickly crunch through secondary analysis tasks in parallel.

Projects in genomics can use the elasticity EC2 provides to experiment and try out new pipelines on hundreds or even thousands of servers. With EC2 you can access as many cores as you need and only pay for what you use. Prior to today, you would need to use the RunInstances API or an Auto Scaling group for the On-Demand & Reserved Instance portion of your grid. To get the best price performance you’d also create and manage a Spot Fleet or multiple Spot Auto Scaling groups with different instance types if you wanted to add Spot Instances to turbo-boost your secondary analysis. Finally, to automate scaling decisions across multiple APIs and Auto Scaling groups you would need to write Lambda functions that periodically assess your grid’s progress & backlog, as well as current Spot prices – modifying your Auto Scaling Groups and Spot Fleets accordingly.

You can now replace all of this with a single EC2 Fleet, analyzing genomes at scale for as little as $1 per analysis. In my grid, each step in in the pipeline requires 1 vCPU and 4 GiB of memory, a perfect match for M4 and M5 instances with 4 GiB of memory per vCPU. I will create a fleet using M4 and M5 instances with weights that correspond to the number of vCPUs on each instance:

  • m4.16xlarge – 64 vCPUs, weight = 64
  • m5.24xlarge – 96 vCPUs, weight = 96

This is expressed in a template that looks like this:

"Overrides": [
{
  "InstanceType": "m4.16xlarge",
  "WeightedCapacity": 64,
},
{
  "InstanceType": "m5.24xlarge",
  "WeightedCapacity": 96,
},
]

By default, EC2 Fleet will select the most cost effective combination of instance types and Availability Zones (both specified in the template) using the current prices for the Spot Instances and public prices for the On-Demand Instances (if you specify instances for which you have matching RIs, your discounts will apply). The default mode takes weights into account to get the instances that have the lowest price per unit. So for my grid, fleet will find the instance that offers the lowest price per vCPU.

Now I can request capacity in terms of vCPUs, knowing EC2 Fleet will select the lowest cost option using only the instance types I’ve defined as acceptable. Also, I can specify how many vCPUs I want to launch using On-Demand or Reserved Instance capacity and how many vCPUs should be launched using Spot Instance capacity:

"TargetCapacitySpecification": {
	"TotalTargetCapacity": 2880,
	"OnDemandTargetCapacity": 960,
	"SpotTargetCapacity": 1920,
	"DefaultTargetCapacityType": "Spot"
}

The above means that I want a total of 2880 vCPUs, with 960 vCPUs fulfilled using On-Demand and 1920 using Spot. The On-Demand price per vCPU is lower for m5.24xlarge than the On-Demand price per vCPU for m4.16xlarge, so EC2 Fleet will launch 10 m5.24xlarge instances to fulfill 960 vCPUs. Based on current Spot pricing (again, on a per-vCPU basis), EC2 Fleet will choose to launch 30 m4.16xlarge instances or 20 m5.24xlarges, delivering 1920 vCPUs either way.

Putting it all together, I have a single file (fl1.json) that describes my fleet:

    "LaunchTemplateConfigs": [
        {
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": "lt-0e8c754449b27161c",
                "Version": "1"
            }
        "Overrides": [
        {
          "InstanceType": "m4.16xlarge",
          "WeightedCapacity": 64,
        },
        {
          "InstanceType": "m5.24xlarge",
          "WeightedCapacity": 96,
        },
      ]
        }
    ],
    "TargetCapacitySpecification": {
        "TotalTargetCapacity": 2880,
        "OnDemandTargetCapacity": 960,
        "SpotTargetCapacity": 1920,
        "DefaultTargetCapacityType": "Spot"
    }
}

I can launch my fleet with a single command:

$ aws ec2 create-fleet --cli-input-json file://home/ec2-user/fl1.json
{
    "FleetId":"fleet-838cf4e5-fded-4f68-acb5-8c47ee1b248a"
}

My entire fleet is created within seconds and was built using 10 m5.24xlarge On-Demand Instances and 30 m4.16xlarge Spot Instances, since the current Spot price was 1.5¢ per vCPU for m4.16xlarge and 1.6¢ per vCPU for m5.24xlarge.

Now lets imagine my grid has crunched through its backlog and no longer needs the additional Spot Instances. I can then modify the size of my fleet by changing the target capacity in my fleet specification, like this:

{         
    "TotalTargetCapacity": 960,
}

Since 960 was equal to the amount of On-Demand vCPUs I had requested, when I describe my fleet I will see all of my capacity being delivered using On-Demand capacity:

"TargetCapacitySpecification": {
	"TotalTargetCapacity": 960,
	"OnDemandTargetCapacity": 960,
	"SpotTargetCapacity": 0,
	"DefaultTargetCapacityType": "Spot"
}

When I no longer need my fleet I can delete it and terminate the instances in it like this:

$ aws ec2 delete-fleets --fleet-id fleet-838cf4e5-fded-4f68-acb5-8c47ee1b248a \
  --terminate-instances   
{
    "UnsuccessfulFleetDletetions": [],
    "SuccessfulFleetDeletions": [
        {
            "CurrentFleetState": "deleted_terminating",
            "PreviousFleetState": "active",
            "FleetId": "fleet-838cf4e5-fded-4f68-acb5-8c47ee1b248a"
        }
    ]
}

Earlier I described how RI discounts apply when EC2 Fleet launches instances for which you have matching RIs, so you might be wondering how else RI customers benefit from EC2 Fleet. Let’s say that I own regional RIs for M4 instances. In my EC2 Fleet I would remove m5.24xlarge and specify m4.10xlarge and m4.16xlarge. Then when EC2 Fleet creates the grid, it will quickly find M4 capacity across the sizes and AZs I’ve specified, and my RI discounts apply automatically to this usage.

In the Works
We plan to connect EC2 Fleet and EC2 Auto Scaling groups. This will let you create a single fleet that mixed instance types and Spot, Reserved and On-Demand, while also taking advantage of EC2 Auto Scaling features such as health checks and lifecycle hooks. This integration will also bring EC2 Fleet functionality to services such as Amazon ECS, Amazon EKS, and AWS Batch that build on and make use of EC2 Auto Scaling for fleet management.

Available Now
You can create and make use of EC2 Fleets today in all public AWS Regions!

Jeff;

Secure Build with AWS CodeBuild and LayeredInsight

Post Syndicated from Asif Khan original https://aws.amazon.com/blogs/devops/secure-build-with-aws-codebuild-and-layeredinsight/

This post is written by Asif Awan, Chief Technology Officer of Layered InsightSubin Mathew – Software Development Manager for AWS CodeBuild, and Asif Khan – Solutions Architect

Enterprises adopt containers because they recognize the benefits: speed, agility, portability, and high compute density. They understand how accelerating application delivery and deployment pipelines makes it possible to rapidly slipstream new features to customers. Although the benefits are indisputable, this acceleration raises concerns about security and corporate compliance with software governance. In this blog post, I provide a solution that shows how Layered Insight, the pioneer and global leader in container-native application protection, can be used with seamless application build and delivery pipelines like those available in AWS CodeBuild to address these concerns.

Layered Insight solutions

Layered Insight enables organizations to unify DevOps and SecOps by providing complete visibility and control of containerized applications. Using the industry’s first embedded security approach, Layered Insight solves the challenges of container performance and protection by providing accurate insight into container images, adaptive analysis of running containers, and automated enforcement of container behavior.

 

AWS CodeBuild

AWS CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers. CodeBuild scales continuously and processes multiple builds concurrently, so your builds are not left waiting in a queue. You can get started quickly by using prepackaged build environments, or you can create custom build environments that use your own build tools.

 

Problem Definition

Security and compliance concerns span the lifecycle of application containers. Common concerns include:

Visibility into the container images. You need to verify the software composition information of the container image to determine whether known vulnerabilities associated with any of the software packages and libraries are included in the container image.

Governance of container images is critical because only certain open source packages/libraries, of specific versions, should be included in the container images. You need support for mechanisms for blacklisting all container images that include a certain version of a software package/library, or only allowing open source software that come with a specific type of license (such as Apache, MIT, GPL, and so on). You need to be able to address challenges such as:

·       Defining the process for image compliance policies at the enterprise, department, and group levels.

·       Preventing the images that fail the compliance checks from being deployed in critical environments, such as staging, pre-prod, and production.

Visibility into running container instances is critical, including:

·       CPU and memory utilization.

·       Security of the build environment.

·       All activities (system, network, storage, and application layer) of the application code running in each container instance.

Protection of running container instances that is:

·       Zero-touch to the developers (not an SDK-based approach).

·       Zero touch to the DevOps team and doesn’t limit the portability of the containerized application.

·       This protection must retain the option to switch to a different container stack or orchestration layer, or even to a different Container as a Service (CaaS ).

·       And it must be a fully automated solution to SecOps, so that the SecOps team doesn’t have to manually analyze and define detailed blacklist and whitelist policies.

 

Solution Details

In AWS CodeCommit, we have three projects:
●     “Democode” is a simple Java application, with one buildspec to build the app into a Docker container (run by build-demo-image CodeBuild project), and another to instrument said container (instrument-image CodeBuild project). The resulting container is stored in ECR repo javatestasjavatest:20180415-layered. This instrumented container is running in AWS Fargate cluster demo-java-appand can be seen in the Layered Insight runtime console as the javatestapplication in us-east-1.
●     aws-codebuild-docker-imagesis a clone of the official aws-codebuild-docker-images repo on GitHub . This CodeCommit project is used by the build-python-builder CodeBuild project to build the python 3.3.6 codebuild image and is stored at the codebuild-python ECR repo. We then manually instructed the Layered Insight console to instrument the image.
●     scan-java-imagecontains just a buildspec.yml file. This file is used by the scan-java-image CodeBuild project to instruct Layered Assessment to perform a vulnerability scan of the javatest container image built previously, and then run the scan results through a compliance policy that states there should be no medium vulnerabilities. This build fails — but in this case that is a success: the scan completes successfully, but compliance fails as there are medium-level issues found in the scan.

This build is performed using the instrumented version of the Python 3.3.6 CodeBuild image, so the activity of the processes running within the build are recorded each time within the LI console.

Build container image

Create or use a CodeCommit project with your application. To build this image and store it in Amazon Elastic Container Registry (Amazon ECR), add a buildspec file to the project and build a container image and create a CodeBuild project.

Scan container image

Once the image is built, create a new buildspec in the same project or a new one that looks similar to below (update ECR URL as necessary):

version: 0.2
phases:
  pre_build:
    commands:
      - echo Pulling down LI Scan API client scripts
      - git clone https://github.com/LayeredInsight/scan-api-example-python.git
      - echo Setting up LI Scan API client
      - cd scan-api-example-python
      - pip install layint_scan_api
      - pip install -r requirements.txt
  build:
    commands:
      - echo Scanning container started on `date`
      - IMAGEID=$(./li_add_image --name <aws-region>.amazonaws.com/javatest:20180415)
      - ./li_wait_for_scan -v --imageid $IMAGEID
      - ./li_run_image_compliance -v --imageid $IMAGEID --policyid PB15260f1acb6b2aa5b597e9d22feffb538256a01fbb4e5a95

Add the buildspec file to the git repo, push it, and then build a CodeBuild project using with the instrumented Python 3.3.6 CodeBuild image at <aws-region>.amazonaws.com/codebuild-python:3.3.6-layered. Set the following environment variables in the CodeBuild project:
●     LI_APPLICATIONNAME – name of the build to display
●     LI_LOCATION – location of the build project to display
●     LI_API_KEY – ApiKey:<key-name>:<api-key>
●     LI_API_HOST – location of the Layered Insight API service

Instrument container image

Next, to instrument the new container image:

  1. In the Layered Insight runtime console, ensure that the ECR registry and credentials are defined (click the Setup icon and the ‘+’ sign on the top right of the screen to add a new container registry). Note the name given to the registry in the console, as this needs to be referenced in the li_add_imagecommand in the script, below.
  2. Next, add a new buildspec (with a new name) to the CodeCommit project, such as the one shown below. This code will download the Layered Insight runtime client, and use it to instruct the Layered Insight service to instrument the image that was just built:
    version: 0.2
    phases:
    pre_build:
    commands:
    echo Pulling down LI API Runtime client scripts
    git clone https://github.com/LayeredInsight/runtime-api-example-python
    echo Setting up LI API client
    cd runtime-api-example-python
    pip install layint-runtime-api
    pip install -r requirements.txt
    build:
    commands:
    echo Instrumentation started on `date`
    ./li_add_image --registry "Javatest ECR" --name IMAGE_NAME:TAG --description "IMAGE DESCRIPTION" --policy "Default Policy" --instrument --wait --verbose
  3. Commit and push the new buildspec file.
  4. Going back to CodeBuild, create a new project, with the same CodeCommit repo, but this time select the new buildspec file. Use a Python 3.3.6 builder – either the AWS or LI Instrumented version.
  5. Click Continue
  6. Click Save
  7. Run the build, again on the master branch.
  8. If everything runs successfully, a new image should appear in the ECR registry with a -layered suffix. This is the instrumented image.

Run instrumented container image

When the instrumented container is now run — in ECS, Fargate, or elsewhere — it will log data back to the Layered Insight runtime console. It’s appearance in the console can be modified by setting the LI_APPLICATIONNAME and LI_LOCATION environment variables when running the container.

Conclusion

In the above blog we have provided you steps needed to embed governance and runtime security in your build pipelines running on AWS CodeBuild using Layered Insight.

 

 

 

Get Started with Blockchain Using the new AWS Blockchain Templates

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/get-started-with-blockchain-using-the-new-aws-blockchain-templates/

Many of today’s discussions around blockchain technology remind me of the classic Shimmer Floor Wax skit. According to Dan Aykroyd, Shimmer is a dessert topping. Gilda Radner claims that it is a floor wax, and Chevy Chase settles the debate and reveals that it actually is both! Some of the people that I talk to see blockchains as the foundation of a new monetary system and a way to facilitate international payments. Others see blockchains as a distributed ledger and immutable data source that can be applied to logistics, supply chain, land registration, crowdfunding, and other use cases. Either way, it is clear that there are a lot of intriguing possibilities and we are working to help our customers use this technology more effectively.

We are launching AWS Blockchain Templates today. These templates will let you launch an Ethereum (either public or private) or Hyperledger Fabric (private) network in a matter of minutes and with just a few clicks. The templates create and configure all of the AWS resources needed to get you going in a robust and scalable fashion.

Launching a Private Ethereum Network
The Ethereum template offers two launch options. The ecs option creates an Amazon ECS cluster within a Virtual Private Cloud (VPC) and launches a set of Docker images in the cluster. The docker-local option also runs within a VPC, and launches the Docker images on EC2 instances. The template supports Ethereum mining, the EthStats and EthExplorer status pages, and a set of nodes that implement and respond to the Ethereum RPC protocol. Both options create and make use of a DynamoDB table for service discovery, along with Application Load Balancers for the status pages.

Here are the AWS Blockchain Templates for Ethereum:

I start by opening the CloudFormation Console in the desired region and clicking Create Stack:

I select Specify an Amazon S3 template URL, enter the URL of the template for the region, and click Next:

I give my stack a name:

Next, I enter the first set of parameters, including the network ID for the genesis block. I’ll stick with the default values for now:

I will also use the default values for the remaining network parameters:

Moving right along, I choose the container orchestration platform (ecs or docker-local, as I explained earlier) and the EC2 instance type for the container nodes:

Next, I choose my VPC and the subnets for the Ethereum network and the Application Load Balancer:

I configure my keypair, EC2 security group, IAM role, and instance profile ARN (full information on the required permissions can be found in the documentation):

The Instance Profile ARN can be found on the summary page for the role:

I confirm that I want to deploy EthStats and EthExplorer, choose the tag and version for the nested CloudFormation templates that are used by this one, and click Next to proceed:

On the next page I specify a tag for the resources that the stack will create, leave the other options as-is, and click Next:

I review all of the parameters and options, acknowledge that the stack might create IAM resources, and click Create to build my network:

The template makes use of three nested templates:

After all of the stacks have been created (mine took about 5 minutes), I can select JeffNet and click the Outputs tab to discover the links to EthStats and EthExplorer:

Here’s my EthStats:

And my EthExplorer:

If I am writing apps that make use of my private network to store and process smart contracts, I would use the EthJsonRpcUrl.

Stay Tuned
My colleagues are eager to get your feedback on these new templates and plan to add new versions of the frameworks as they become available.

Jeff;

 

Amazon ECS Service Discovery

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-ecs-service-discovery/

Amazon ECS now includes integrated service discovery. This makes it possible for an ECS service to automatically register itself with a predictable and friendly DNS name in Amazon Route 53. As your services scale up or down in response to load or container health, the Route 53 hosted zone is kept up to date, allowing other services to lookup where they need to make connections based on the state of each service. You can see a demo of service discovery in an imaginary social networking app over at: https://servicediscovery.ranman.com/.

Service Discovery


Part of the transition to microservices and modern architectures involves having dynamic, autoscaling, and robust services that can respond quickly to failures and changing loads. Your services probably have complex dependency graphs of services they rely on and services they provide. A modern architectural best practice is to loosely couple these services by allowing them to specify their own dependencies, but this can be complicated in dynamic environments as your individual services are forced to find their own connection points.

Traditional approaches to service discovery like consul, etcd, or zookeeper all solve this problem well, but they require provisioning and maintaining additional infrastructure or installation of agents in your containers or on your instances. Previously, to ensure that services were able to discover and connect with each other, you had to configure and run your own service discovery system or connect every service to a load balancer. Now, you can enable service discovery for your containerized services in the ECS console, AWS CLI, or using the ECS API.

Introducing Amazon Route 53 Service Registry and Auto Naming APIs

Amazon ECS Service Discovery works by communicating with the Amazon Route 53 Service Registry and Auto Naming APIs. Since we haven’t talked about it before on this blog, I want to briefly outline how these Route 53 APIs work. First, some vocabulary:

  • Namespaces – A namespace specifies a domain name you want to route traffic to (e.g. internal, local, corp). You can think of it as a logical boundary between which services should be able to discover each other. You can create a namespace with a call to the aws servicediscovery create-private-dns-namespace command or in the ECS console. Namespaces are roughly equivalent to hosted zones in Route 53. A namespace contains services, our next vocabulary word.
  • Service – A service is a specific application or set of applications in your namespace like “auth”, “timeline”, or “worker”. A service contains service instances.
  • Service Instance – A service instance contains information about how Route 53 should respond to DNS queries for a resource.

Route 53 provides APIs to create: namespaces, A records per task IP, and SRV records per task IP + port.

When we ask Route 53 for something like: worker.corp we should get back a set of possible IPs that could fulfill that request. If the application we’re connecting to exposes dynamic ports then the calling application can easily query the SRV record to get more information.

ECS service discovery is built on top of the Route 53 APIs and manages all of the underlying API calls for you. Now that we understand how the service registry, works lets take a look at the ECS side to see service discovery in action.

Amazon ECS Service Discovery

Let’s launch an application with service discovery! First, I’ll create two task definitions: “flask-backend” and “flask-worker”. Both are simple AWS Fargate tasks with a single container serving HTTP requests. I’ll have flask-backend ask worker.corp to do some work and I’ll return the response as well as the address Route 53 returned for worker. Something like the code below:

@app.route("/")
namespace = os.getenv("namespace")
worker_host = "worker" + namespace
def backend():
    r = requests.get("http://"+worker_host)
    worker = socket.gethostbyname(worker_host)
    return "Worker Message: {]\nFrom: {}".format(r.content, worker)

 

Now, with my containers and task definitions in place, I’ll create a service in the console.

As I move to step two in the service wizard I’ll fill out the service discovery section and have ECS create a new namespace for me.

I’ll also tell ECS to monitor the health of the tasks in my service and add or remove them from Route 53 as needed. Then I’ll set a TTL of 10 seconds on the A records we’ll use.

I’ll repeat those same steps for my “worker” service and after a minute or so most of my tasks should be up and running.

Over in the Route 53 console I can see all the records for my tasks!

We can use the Route 53 service discovery APIs to list all of our available services and tasks and programmatically reach out to each one. We could easily extend to any number of services past just backend and worker. I’ve created a simple demo of an imaginary social network with services like “auth”, “feed”, “timeline”, “worker”, “user” and more here: https://servicediscovery.ranman.com/. You can see the code used to run that page on github.

Available Now
Amazon ECS service discovery is available now in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). AWS Fargate is currently only available in US East (N. Virginia). When you use ECS service discovery, you pay for the Route 53 resources that you consume, including each namespace that you create, and for the lookup queries your services make. Container level health checks are provided at no cost. For more information on pricing check out the documentation.

Please let us know what you’ll be building or refactoring with service discovery either in the comments or on Twitter!

Randall

 

P.S. Every blog post I write is made with a tremendous amount of help from numerous AWS colleagues. To everyone that helped build service discovery across all of our teams – thank you :)!

Central Logging in Multi-Account Environments

Post Syndicated from matouk original https://aws.amazon.com/blogs/architecture/central-logging-in-multi-account-environments/

Centralized logging is often required in large enterprise environments for a number of reasons, ranging from compliance and security to analytics and application-specific needs.

I’ve seen that in a multi-account environment, whether the accounts belong to the same line of business or multiple business units, collecting logs in a central, dedicated logging account is an established best practice. It helps security teams detect malicious activities both in real-time and during incident response. It provides protection to log data in case it is accidentally or intentionally deleted. It also helps application teams correlate and analyze log data across multiple application tiers.

This blog post provides a solution and building blocks to stream Amazon CloudWatch log data across accounts. In a multi-account environment this repeatable solution could be deployed multiple times to stream all relevant Amazon CloudWatch log data from all accounts to a centralized logging account.

Solution Summary 

The solution uses Amazon Kinesis Data Streams and a log destination to set up an endpoint in the logging account to receive streamed logs and uses Amazon Kinesis Data Firehose to deliver log data to the Amazon Simple Storage Solution (S3) bucket. Application accounts will subscribe to stream all (or part) of their Amazon CloudWatch logs to a defined destination in the logging account via subscription filters.

Below is a diagram illustrating how the various services work together.


In logging an account, a Kinesis Data Stream is created to receive streamed log data and a log destination is created to facilitate remote streaming, configured to use the Kinesis Data Stream as its target.

The Amazon Kinesis Data Firehose stream is created to deliver log data from the data stream to S3. The delivery stream uses a generic AWS Lambda function for data validation and transformation.

In each application account, a subscription filter is created between each Amazon CloudWatch log group and the destination created for this log group in the logging account.

The following steps are involved in setting up the central-logging solution:

  1. Create an Amazon S3 bucket for your central logging in the logging account
  2. Create an AWS Lambda function for log data transformation and decoding in logging account
  3. Create a central logging stack as a logging-account destination ready to receive streamed logs and deliver them to S3
  4. Create a subscription in application accounts to deliver logs from a specific CloudWatch log group to the logging account destination
  5. Create Amazon Athena tables to query and analyze log data in your logging account

Creating a log destination in your logging account

In this section, we will setup the logging account side of the solution, providing detail on the list above. The example I use is for the us-east-1 region, however any region where required services are available could be used.

It’s important to note that your logging-account destination and application-account subscription must be in the same region. You can deploy the solution multiple times to create destinations in all required regions if application accounts use multiple regions.

Step 1: Create an S3 bucket

Use the CloudFormation template below to create S3 bucket in logging account. This template also configures the bucket to archive log data to Glacier after 60 days.


{
  "AWSTemplateFormatVersion":"2010-09-09",
  "Description": "CF Template to create S3 bucket for central logging",
  "Parameters":{

    "BucketName":{
      "Type":"String",
      "Default":"",
      "Description":"Central logging bucket name"
    }
  },
  "Resources":{
                        
   "CentralLoggingBucket" : {
      "Type" : "AWS::S3::Bucket",
      "Properties" : {
        "BucketName" : {"Ref": "BucketName"},
        "LifecycleConfiguration": {
            "Rules": [
                {
                  "Id": "ArchiveToGlacier",
                  "Prefix": "",
                  "Status": "Enabled",
                  "Transitions":[{
                      "TransitionInDays": "60",
                      "StorageClass": "GLACIER"
                  }]
                }
            ]
        }
      }
    }

  },
  "Outputs":{
    "CentralLogBucket":{
    	"Description" : "Central log bucket",
    	"Value" : {"Ref": "BucketName"} ,
    	"Export" : { "Name" : "CentralLogBucketName"}
    }
  }
} 

To create your central-logging bucket do the following:

  1. Save the template file to your local developer machine as “central-log-bucket.json”
  2. From the CloudFormation console, select “create new stack” and import the file “central-log-bucket.json”
  3. Fill in the parameters and complete stack creation steps (as indicated in the screenshot below)
  4. Verify the bucket has been created successfully and take a note of the bucket name

Step 2: Create data processing Lambda function

Use the template below to create a Lambda function in your logging account that will be used by Amazon Firehose for data transformation during the delivery process to S3. This function is based on the AWS Lambda kinesis-firehose-cloudwatch-logs-processor blueprint.

The function could be created manually from the blueprint or using the cloudformation template below. To find the blueprint navigate to Lambda -> Create -> Function -> Blueprints

This function will unzip the event message, parse it and verify that it is a valid CloudWatch log event. Additional processing can be added if needed. As this function is generic, it could be reused by all log-delivery streams.

{
  "AWSTemplateFormatVersion":"2010-09-09",
  "Description": "Create cloudwatch data processing lambda function",
  "Resources":{
      
    "LambdaRole": {
        "Type": "AWS::IAM::Role",
        "Properties": {
            "AssumeRolePolicyDocument": {
                "Version": "2012-10-17",
                "Statement": [
                    {
                        "Effect": "Allow",
                        "Principal": {
                            "Service": "lambda.amazonaws.com"
                        },
                        "Action": "sts:AssumeRole"
                    }
                ]
            },
            "Path": "/",
            "Policies": [
                {
                    "PolicyName": "firehoseCloudWatchDataProcessing",
                    "PolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [
                            {
                                "Effect": "Allow",
                                "Action": [
                                    "logs:CreateLogGroup",
                                    "logs:CreateLogStream",
                                    "logs:PutLogEvents"
                                ],
                                "Resource": "arn:aws:logs:*:*:*"
                            }
                        ]
                    }
                }
            ]
        }
    },
      
    "FirehoseDataProcessingFunction": {
        "Type": "AWS::Lambda::Function",
        "Properties": {
            "Handler": "index.handler",
            "Role": {"Fn::GetAtt": ["LambdaRole","Arn"]},
            "Description": "Firehose cloudwatch data processing",
            "Code": {
                "ZipFile" : { "Fn::Join" : ["\n", [
                  "'use strict';",
                  "const zlib = require('zlib');",
                  "function transformLogEvent(logEvent) {",
                  "       return Promise.resolve(`${logEvent.message}\n`);",
                  "}",
                  "exports.handler = (event, context, callback) => {",
                  "    Promise.all(event.records.map(r => {",
                  "        const buffer = new Buffer(r.data, 'base64');",
                  "        const decompressed = zlib.gunzipSync(buffer);",
                  "        const data = JSON.parse(decompressed);",
                  "        if (data.messageType !== 'DATA_MESSAGE') {",
                  "            return Promise.resolve({",
                  "                recordId: r.recordId,",
                  "                result: 'ProcessingFailed',",
                  "            });",
                  "         } else {",
                  "            const promises = data.logEvents.map(transformLogEvent);",
                  "            return Promise.all(promises).then(transformed => {",
                  "                const payload = transformed.reduce((a, v) => a + v, '');",
                  "                const encoded = new Buffer(payload).toString('base64');",
                  "                console.log('---------------payloadv2:'+JSON.stringify(payload, null, 2));",
                  "                return {",
                  "                    recordId: r.recordId,",
                  "                    result: 'Ok',",
                  "                    data: encoded,",
                  "                };",
                  "           });",
                  "        }",
                  "    })).then(recs => callback(null, { records: recs }));",
                    "};"

                ]]}
            },
            "Runtime": "nodejs6.10",
            "Timeout": "60"
        }
    }

  },
  "Outputs":{
   "Function" : {
      "Description": "Function ARN",
      "Value": {"Fn::GetAtt": ["FirehoseDataProcessingFunction","Arn"]},
      "Export" : { "Name" : {"Fn::Sub": "${AWS::StackName}-Function" }}
    }
  }
}

To create the function follow the steps below:

  1. Save the template file as “central-logging-lambda.json”
  2. Login to logging account and, from the CloudFormation console, select “create new stack”
  3. Import the file “central-logging-lambda.json” and click next
  4. Follow the steps to create the stack and verify successful creation
  5. Take a note of Lambda function arn from the output section

Step 3: Create log destination in logging account

Log destination is used as the target of a subscription from application accounts, log destination can be shared between multiple subscriptions however according to the architecture suggested in this solution all logs streamed to the same destination will be stored in the same S3 location, if you would like to store log data in different hierarchy or in a completely different bucket you need to create separate destinations.

As noted previously, your destination and subscription have to be in the same region

Use the template below to create destination stack in logging account.

{
  "AWSTemplateFormatVersion":"2010-09-09",
  "Description": "Create log destination and required resources",
  "Parameters":{

    "LogBucketName":{
      "Type":"String",
      "Default":"central-log-do-not-delete",
      "Description":"Destination logging bucket"
    },
    "LogS3Location":{
      "Type":"String",
      "Default":"<BU>/<ENV>/<SOURCE_ACCOUNT>/<LOG_TYPE>/",
      "Description":"S3 location for the logs streamed to this destination; example marketing/prod/999999999999/flow-logs/"
    },
    "ProcessingLambdaARN":{
      "Type":"String",
      "Default":"",
      "Description":"CloudWatch logs data processing function"
    },
    "SourceAccount":{
      "Type":"String",
      "Default":"",
      "Description":"Source application account number"
    }
  },
    
  "Resources":{
    "MyStream": {
      "Type": "AWS::Kinesis::Stream",
      "Properties": {
        "Name": {"Fn::Join" : [ "", [{ "Ref" : "AWS::StackName" },"-Stream"] ]},
        "RetentionPeriodHours" : 48,
        "ShardCount": 1,
        "Tags": [
          {
            "Key": "Solution",
            "Value": "CentralLogging"
          }
       ]
      }
    },
    "LogRole" : {
      "Type"  : "AWS::IAM::Role",
      "Properties" : {
          "AssumeRolePolicyDocument" : {
              "Statement" : [ {
                  "Effect" : "Allow",
                  "Principal" : {
                      "Service" : [ {"Fn::Join": [ "", [ "logs.", { "Ref": "AWS::Region" }, ".amazonaws.com" ] ]} ]
                  },
                  "Action" : [ "sts:AssumeRole" ]
              } ]
          },         
          "Path" : "/service-role/"
      }
    },
      
    "LogRolePolicy" : {
        "Type" : "AWS::IAM::Policy",
        "Properties" : {
            "PolicyName" : {"Fn::Join" : [ "", [{ "Ref" : "AWS::StackName" },"-LogPolicy"] ]},
            "PolicyDocument" : {
              "Version": "2012-10-17",
              "Statement": [
                {
                  "Effect": "Allow",
                  "Action": ["kinesis:PutRecord"],
                  "Resource": [{ "Fn::GetAtt" : ["MyStream", "Arn"] }]
                },
                {
                  "Effect": "Allow",
                  "Action": ["iam:PassRole"],
                  "Resource": [{ "Fn::GetAtt" : ["LogRole", "Arn"] }]
                }
              ]
            },
            "Roles" : [ { "Ref" : "LogRole" } ]
        }
    },
      
    "LogDestination" : {
      "Type" : "AWS::Logs::Destination",
      "DependsOn" : ["MyStream","LogRole","LogRolePolicy"],
      "Properties" : {
        "DestinationName": {"Fn::Join" : [ "", [{ "Ref" : "AWS::StackName" },"-Destination"] ]},
        "RoleArn": { "Fn::GetAtt" : ["LogRole", "Arn"] },
        "TargetArn": { "Fn::GetAtt" : ["MyStream", "Arn"] },
        "DestinationPolicy": { "Fn::Join" : ["",[
		
				"{\"Version\" : \"2012-10-17\",\"Statement\" : [{\"Effect\" : \"Allow\",",
                " \"Principal\" : {\"AWS\" : \"", {"Ref":"SourceAccount"} ,"\"},",
                "\"Action\" : \"logs:PutSubscriptionFilter\",",
                " \"Resource\" : \"", 
                {"Fn::Join": [ "", [ "arn:aws:logs:", { "Ref": "AWS::Region" }, ":" ,{ "Ref": "AWS::AccountId" }, ":destination:",{ "Ref" : "AWS::StackName" },"-Destination" ] ]}  ,"\"}]}"

			]]}
          
          
      }
    },
      
    "S3deliveryStream": {
      "DependsOn": ["S3deliveryRole", "S3deliveryPolicy"],
      "Type": "AWS::KinesisFirehose::DeliveryStream",
      "Properties": {
        "DeliveryStreamName": {"Fn::Join" : [ "", [{ "Ref" : "AWS::StackName" },"-DeliveryStream"] ]},
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": { "Fn::GetAtt" : ["MyStream", "Arn"] },
            "RoleARN": {"Fn::GetAtt" : ["S3deliveryRole", "Arn"] }
        },
        "ExtendedS3DestinationConfiguration": {
          "BucketARN": {"Fn::Join" : [ "", ["arn:aws:s3:::",{"Ref":"LogBucketName"}] ]},
          "BufferingHints": {
            "IntervalInSeconds": "60",
            "SizeInMBs": "50"
          },
          "CompressionFormat": "UNCOMPRESSED",
          "Prefix": {"Ref": "LogS3Location"},
          "RoleARN": {"Fn::GetAtt" : ["S3deliveryRole", "Arn"] },
          "ProcessingConfiguration" : {
              "Enabled": "true",
              "Processors": [
              {
                "Parameters": [ 
                { 
                    "ParameterName": "LambdaArn",
                    "ParameterValue": {"Ref":"ProcessingLambdaARN"}
                }],
                "Type": "Lambda"
              }]
          }
        }

      }
    },
      
    "S3deliveryRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Sid": "",
              "Effect": "Allow",
              "Principal": {
                "Service": "firehose.amazonaws.com"
              },
              "Action": "sts:AssumeRole",
              "Condition": {
                "StringEquals": {
                  "sts:ExternalId": {"Ref":"AWS::AccountId"}
                }
              }
            }
          ]
        }
      }
    },
      
    "S3deliveryPolicy": {
      "Type": "AWS::IAM::Policy",
      "Properties": {
        "PolicyName": {"Fn::Join" : [ "", [{ "Ref" : "AWS::StackName" },"-FirehosePolicy"] ]},
        "PolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:PutObject"
              ],
              "Resource": [
                {"Fn::Join": ["", [ {"Fn::Join" : [ "", ["arn:aws:s3:::",{"Ref":"LogBucketName"}] ]}]]},
                {"Fn::Join": ["", [ {"Fn::Join" : [ "", ["arn:aws:s3:::",{"Ref":"LogBucketName"}] ]}, "*"]]}
              ]
            },
            {
              "Effect": "Allow",
              "Action": [
                "lambda:InvokeFunction",
                "lambda:GetFunctionConfiguration",
                "logs:PutLogEvents",
                "kinesis:DescribeStream",
                "kinesis:GetShardIterator",
                "kinesis:GetRecords",
                "kms:Decrypt"
              ],
              "Resource": "*"
            }
          ]
        },
        "Roles": [{"Ref": "S3deliveryRole"}]
      }
    }

  },
  "Outputs":{
      
   "Destination" : {
      "Description": "Destination",
      "Value": {"Fn::Join": [ "", [ "arn:aws:logs:", { "Ref": "AWS::Region" }, ":" ,{ "Ref": "AWS::AccountId" }, ":destination:",{ "Ref" : "AWS::StackName" },"-Destination" ] ]},
      "Export" : { "Name" : {"Fn::Sub": "${AWS::StackName}-Destination" }}
    }

  }
} 

To create log your destination and all required resources, follow these steps:

  1. Save your template as “central-logging-destination.json”
  2. Login to your logging account and, from the CloudFormation console, select “create new stack”
  3. Import the file “central-logging-destination.json” and click next
  4. Fill in the parameters to configure the log destination and click Next
  5. Follow the default steps to create the stack and verify successful creation
    1. Bucket name is the same as in the “create central logging bucket” step
    2. LogS3Location is the directory hierarchy for saving log data that will be delivered to this destination
    3. ProcessingLambdaARN is as created in “create data processing Lambda function” step
    4. SourceAccount is the application account number where the subscription will be created
  6. Take a note of destination ARN as it appears in outputs section as you did above.

Step 4: Create the log subscription in your application account

In this section, we will create the subscription filter in one of the application accounts to stream logs from the CloudWatch log group to the log destination that was created in your logging account.

Create log subscription filter

The subscription filter is created between the CloudWatch log group and a destination endpoint. Asubscription could be filtered to send part (or all) of the logs in the log group. For example,you can create a subscription filter to stream only flow logs with status REJECT.

Use the CloudFormation template below to create subscription filter. Subscription filter and log destination must be in the same region.

{
  "AWSTemplateFormatVersion":"2010-09-09",
  "Description": "Create log subscription filter for a specific Log Group",
  "Parameters":{

    "DestinationARN":{
      "Type":"String",
      "Default":"",
      "Description":"ARN of logs destination"
    },
    "LogGroupName":{
      "Type":"String",
      "Default":"",
      "Description":"Name of LogGroup to forward logs from"
    },
    "FilterPattern":{
      "Type":"String",
      "Default":"",
      "Description":"Filter pattern to filter events to be sent to log destination; Leave empty to send all logs"
    }
  },
    
  "Resources":{
    "SubscriptionFilter" : {
      "Type" : "AWS::Logs::SubscriptionFilter",
      "Properties" : {
        "LogGroupName" : { "Ref" : "LogGroupName" },
        "FilterPattern" : { "Ref" : "FilterPattern" },
        "DestinationArn" : { "Ref" : "DestinationARN" }
      }
    }
  }
}

To create a subscription filter for one of CloudWatch log groups in your application account, follow the steps below:

  1. Save the template as “central-logging-subscription.json”
  2. Login to your application account and, from the CloudFormation console, select “create new stack”
  3. Select the file “central-logging-subscription.json” and click next
  4. Fill in the parameters as appropriate to your environment as you did above
    a.  DestinationARN is the value of obtained in “create log destination in logging account” step
    b.  FilterPatterns is the filter value for log data to be streamed to your logging account (leave empty to stream all logs in the selected log group)
    c.  LogGroupName is the log group as it appears under CloudWatch Logs
  5. Verify successful creation of the subscription

This completes the deployment process in both the logging- and application-account side. After a few minutes, log data will be streamed to the central-logging destination defined in your logging account.

Step 5: Analyzing log data

Once log data is centralized, it opens the door to run analytics on the consolidated data for business or security reasons. One of the powerful services that AWS offers is Amazon Athena.

Amazon Athena allows you to query data in S3 using standard SQL.

Follow the steps below to create a simple table and run queries on the flow logs data that has been collected from your application accounts

  1. Login to your logging account and from the Amazon Athena console, use the DDL below in your query  editor to create a new table

CREATE EXTERNAL TABLE IF NOT EXISTS prod_vpc_flow_logs (

Version INT,

Account STRING,

InterfaceId STRING,

SourceAddress STRING,

DestinationAddress STRING,

SourcePort INT,

DestinationPort INT,

Protocol INT,

Packets INT,

Bytes INT,

StartTime INT,

EndTime INT,

Action STRING,

LogStatus STRING

)

ROW FORMAT SERDE ‘org.apache.hadoop.hive.serde2.RegexSerDe’

WITH SERDEPROPERTIES (

“input.regex” = “^([^ ]+)\\s+([0-9]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([0-9]+)\\s+([0-9]+)\\s+([^ ]+)\\s+([^ ]+)$”)

LOCATION ‘s3://central-logging-company-do-not-delete/’;

2. Click ”run query” and verify a successful run/ This creates the table “prod_vpc_flow_logs”

3. You can then run queries against the table data as below:

Conclusion

By following the steps I’ve outlined, you will build a central logging solution to stream CloudWatch logs from one application account to a central logging account. This solution is repeatable and could be deployed multiple times for multiple accounts and logging requirements.

 

About the Author

Mahmoud Matouk is a Senior Cloud Infrastructure Architect. He works with our customers to help accelerate migration and cloud adoption at the enterprise level.

 

How I built a data warehouse using Amazon Redshift and AWS services in record time

Post Syndicated from Stephen Borg original https://aws.amazon.com/blogs/big-data/how-i-built-a-data-warehouse-using-amazon-redshift-and-aws-services-in-record-time/

This is a customer post by Stephen Borg, the Head of Big Data and BI at Cerberus Technologies.

Cerberus Technologies, in their own words: Cerberus is a company founded in 2017 by a team of visionary iGaming veterans. Our mission is simple – to offer the best tech solutions through a data-driven and a customer-first approach, delivering innovative solutions that go against traditional forms of working and process. This mission is based on the solid foundations of reliability, flexibility and security, and we intend to fundamentally change the way iGaming and other industries interact with technology.

Over the years, I have developed and created a number of data warehouses from scratch. Recently, I built a data warehouse for the iGaming industry single-handedly. To do it, I used the power and flexibility of Amazon Redshift and the wider AWS data management ecosystem. In this post, I explain how I was able to build a robust and scalable data warehouse without the large team of experts typically needed.

In two of my recent projects, I ran into challenges when scaling our data warehouse using on-premises infrastructure. Data was growing at many tens of gigabytes per day, and query performance was suffering. Scaling required major capital investment for hardware and software licenses, and also significant operational costs for maintenance and technical staff to keep it running and performing well. Unfortunately, I couldn’t get the resources needed to scale the infrastructure with data growth, and these projects were abandoned. Thanks to cloud data warehousing, the bottleneck of infrastructure resources, capital expense, and operational costs have been significantly reduced or have totally gone away. There is no more excuse for allowing obstacles of the past to delay delivering timely insights to decision makers, no matter how much data you have.

With Amazon Redshift and AWS, I delivered a cloud data warehouse to the business very quickly, and with a small team: me. I didn’t have to order hardware or software, and I no longer needed to install, configure, tune, or keep up with patches and version updates. Instead, I easily set up a robust data processing pipeline and we were quickly ingesting and analyzing data. Now, my data warehouse team can be extremely lean, and focus more time on bringing in new data and delivering insights. In this post, I show you the AWS services and the architecture that I used.

Handling data feeds

I have several different data sources that provide everything needed to run the business. The data includes activity from our iGaming platform, social media posts, clickstream data, marketing and campaign performance, and customer support engagements.

To handle the diversity of data feeds, I developed abstract integration applications using Docker that run on Amazon EC2 Container Service (Amazon ECS) and feed data to Amazon Kinesis Data Streams. These data streams can be used for real time analytics. In my system, each record in Kinesis is preprocessed by an AWS Lambda function to cleanse and aggregate information. My system then routes it to be stored where I need on Amazon S3 by Amazon Kinesis Data Firehose. Suppose that you used an on-premises architecture to accomplish the same task. A team of data engineers would be required to maintain and monitor a Kafka cluster, develop applications to stream data, and maintain a Hadoop cluster and the infrastructure underneath it for data storage. With my stream processing architecture, there are no servers to manage, no disk drives to replace, and no service monitoring to write.

Setting up a Kinesis stream can be done with a few clicks, and the same for Kinesis Firehose. Firehose can be configured to automatically consume data from a Kinesis Data Stream, and then write compressed data every N minutes to Amazon S3. When I want to process a Kinesis data stream, it’s very easy to set up a Lambda function to be executed on each message received. I can just set a trigger from the AWS Lambda Management Console, as shown following.

I also monitor the duration of function execution using Amazon CloudWatch and AWS X-Ray.

Regardless of the format I receive the data from our partners, I can send it to Kinesis as JSON data using my own formatters. After Firehose writes this to Amazon S3, I have everything in nearly the same structure I received but compressed, encrypted, and optimized for reading.

This data is automatically crawled by AWS Glue and placed into the AWS Glue Data Catalog. This means that I can immediately query the data directly on S3 using Amazon Athena or through Amazon Redshift Spectrum. Previously, I used Amazon EMR and an Amazon RDS–based metastore in Apache Hive for catalog management. Now I can avoid the complexity of maintaining Hive Metastore catalogs. Glue takes care of high availability and the operations side so that I know that end users can always be productive.

Working with Amazon Athena and Amazon Redshift for analysis

I found Amazon Athena extremely useful out of the box for ad hoc analysis. Our engineers (me) use Athena to understand new datasets that we receive and to understand what transformations will be needed for long-term query efficiency.

For our data analysts and data scientists, we’ve selected Amazon Redshift. Amazon Redshift has proven to be the right tool for us over and over again. It easily processes 20+ million transactions per day, regardless of the footprint of the tables and the type of analytics required by the business. Latency is low and query performance expectations have been more than met. We use Redshift Spectrum for long-term data retention, which enables me to extend the analytic power of Amazon Redshift beyond local data to anything stored in S3, and without requiring me to load any data. Redshift Spectrum gives me the freedom to store data where I want, in the format I want, and have it available for processing when I need it.

To load data directly into Amazon Redshift, I use AWS Data Pipeline to orchestrate data workflows. I create Amazon EMR clusters on an intra-day basis, which I can easily adjust to run more or less frequently as needed throughout the day. EMR clusters are used together with Amazon RDS, Apache Spark 2.0, and S3 storage. The data pipeline application loads ETL configurations from Spring RESTful services hosted on AWS Elastic Beanstalk. The application then loads data from S3 into memory, aggregates and cleans the data, and then writes the final version of the data to Amazon Redshift. This data is then ready to use for analysis. Spark on EMR also helps with recommendations and personalization use cases for various business users, and I find this easy to set up and deliver what users want. Finally, business users use Amazon QuickSight for self-service BI to slice, dice, and visualize the data depending on their requirements.

Each AWS service in this architecture plays its part in saving precious time that’s crucial for delivery and getting different departments in the business on board. I found the services easy to set up and use, and all have proven to be highly reliable for our use as our production environments. When the architecture was in place, scaling out was either completely handled by the service, or a matter of a simple API call, and crucially doesn’t require me to change one line of code. Increasing shards for Kinesis can be done in a minute by editing a stream. Increasing capacity for Lambda functions can be accomplished by editing the megabytes allocated for processing, and concurrency is handled automatically. EMR cluster capacity can easily be increased by changing the master and slave node types in Data Pipeline, or by using Auto Scaling. Lastly, RDS and Amazon Redshift can be easily upgraded without any major tasks to be performed by our team (again, me).

In the end, using AWS services including Kinesis, Lambda, Data Pipeline, and Amazon Redshift allows me to keep my team lean and highly productive. I eliminated the cost and delays of capital infrastructure, as well as the late night and weekend calls for support. I can now give maximum value to the business while keeping operational costs down. My team pushed out an agile and highly responsive data warehouse solution in record time and we can handle changing business requirements rapidly, and quickly adapt to new data and new user requests.


Additional Reading

If you found this post useful, be sure to check out Deploy a Data Warehouse Quickly with Amazon Redshift, Amazon RDS for PostgreSQL and Tableau Server and Top 8 Best Practices for High-Performance ETL Processing Using Amazon Redshift.


About the Author

Stephen Borg is the Head of Big Data and BI at Cerberus Technologies. He has a background in platform software engineering, and first became involved in data warehousing using the typical RDBMS, SQL, ETL, and BI tools. He quickly became passionate about providing insight to help others optimize the business and add personalization to products. He is now the Head of Big Data and BI at Cerberus Technologies.

 

 

 

Migrating Your Amazon ECS Containers to AWS Fargate

Post Syndicated from Tiffany Jernigan original https://aws.amazon.com/blogs/compute/migrating-your-amazon-ecs-containers-to-aws-fargate/

AWS Fargate is a new technology that works with Amazon Elastic Container Service (ECS) to run containers without having to manage servers or clusters. What does this mean? With Fargate, you no longer need to provision or manage a single virtual machine; you can just create tasks and run them directly!

Fargate uses the same API actions as ECS, so you can use the ECS console, the AWS CLI, or the ECS CLI. I recommend running through the first-run experience for Fargate even if you’re familiar with ECS. It creates all of the one-time setup requirements, such as the necessary IAM roles. If you’re using a CLI, make sure to upgrade to the latest version

In this blog, you will see how to migrate ECS containers from running on Amazon EC2 to Fargate.

Getting started

Note: Anything with code blocks is a change in the task definition file. Screen captures are from the console. Additionally, Fargate is currently available in the us-east-1 (N. Virginia) region.

Launch type

When you create tasks (grouping of containers) and clusters (grouping of tasks), you now have two launch type options: EC2 and Fargate. The default launch type, EC2, is ECS as you knew it before the announcement of Fargate. You need to specify Fargate as the launch type when running a Fargate task.

Even though Fargate abstracts away virtual machines, tasks still must be launched into a cluster. With Fargate, clusters are a logical infrastructure and permissions boundary that allow you to isolate and manage groups of tasks. ECS also supports heterogeneous clusters that are made up of tasks running on both EC2 and Fargate launch types.

The optional, new requiresCompatibilities parameter with FARGATE in the field ensures that your task definition only passes validation if you include Fargate-compatible parameters. Tasks can be flagged as compatible with EC2, Fargate, or both.

"requiresCompatibilities": [
    "FARGATE"
]

Networking

"networkMode": "awsvpc"

In November, we announced the addition of task networking with the network mode awsvpc. By default, ECS uses the bridge network mode. Fargate requires using the awsvpc network mode.

In bridge mode, all of your tasks running on the same instance share the instance’s elastic network interface, which is a virtual network interface, IP address, and security groups.

The awsvpc mode provides this networking support to your tasks natively. You now get the same VPC networking and security controls at the task level that were previously only available with EC2 instances. Each task gets its own elastic networking interface and IP address so that multiple applications or copies of a single application can run on the same port number without any conflicts.

The awsvpc mode also provides a separation of responsibility for tasks. You can get complete control of task placement within your own VPCs, subnets, and the security policies associated with them, even though the underlying infrastructure is managed by Fargate. Also, you can assign different security groups to each task, which gives you more fine-grained security. You can give an application only the permissions it needs.

"portMappings": [
    {
        "containerPort": "3000"
    }
 ]

What else has to change? First, you only specify a containerPort value, not a hostPort value, as there is no host to manage. Your container port is the port that you access on your elastic network interface IP address. Therefore, your container ports in a single task definition file need to be unique.

"environment": [
    {
        "name": "WORDPRESS_DB_HOST",
        "value": "127.0.0.1:3306"
    }
 ]

Additionally, links are not allowed as they are a property of the “bridge” network mode (and are now a legacy feature of Docker). Instead, containers share a network namespace and communicate with each other over the localhost interface. They can be referenced using the following:

localhost/127.0.0.1:<some_port_number>

CPU and memory

"memory": "1024",
 "cpu": "256"

"memory": "1gb",
 "cpu": ".25vcpu"

When launching a task with the EC2 launch type, task performance is influenced by the instance types that you select for your cluster combined with your task definition. If you pick larger instances, your applications make use of the extra resources if there is no contention.

In Fargate, you needed a way to get additional resource information so we created task-level resources. Task-level resources define the maximum amount of memory and cpu that your task can consume.

  • memory can be defined in MB with just the number, or in GB, for example, “1024” or “1gb”.
  • cpu can be defined as the number or in vCPUs, for example, “256” or “.25vcpu”.
    • vCPUs are virtual CPUs. You can look at the memory and vCPUs for instance types to get an idea of what you may have used before.

The memory and CPU options available with Fargate are:

CPU Memory
256 (.25 vCPU) 0.5GB, 1GB, 2GB
512 (.5 vCPU) 1GB, 2GB, 3GB, 4GB
1024 (1 vCPU) 2GB, 3GB, 4GB, 5GB, 6GB, 7GB, 8GB
2048 (2 vCPU) Between 4GB and 16GB in 1GB increments
4096 (4 vCPU) Between 8GB and 30GB in 1GB increments

IAM roles

Because Fargate uses awsvpc mode, you need an Amazon ECS service-linked IAM role named AWSServiceRoleForECS. It provides Fargate with the needed permissions, such as the permission to attach an elastic network interface to your task. After you create your service-linked IAM role, you can delete the remaining roles in your services.

"executionRoleArn": "arn:aws:iam::<your_account_id>:role/ecsTaskExecutionRole"

With the EC2 launch type, an instance role gives the agent the ability to pull, publish, talk to ECS, and so on. With Fargate, the task execution IAM role is only needed if you’re pulling from Amazon ECR or publishing data to Amazon CloudWatch Logs.

The Fargate first-run experience tutorial in the console automatically creates these roles for you.

Volumes

Fargate currently supports non-persistent, empty data volumes for containers. When you define your container, you no longer use the host field and only specify a name.

Load balancers

For awsvpc mode, and therefore for Fargate, use the IP target type instead of the instance target type. You define this in the Amazon EC2 service when creating a load balancer.

If you’re using a Classic Load Balancer, change it to an Application Load Balancer or a Network Load Balancer.

Tip: If you are using an Application Load Balancer, make sure that your tasks are launched in the same VPC and Availability Zones as your load balancer.

Let’s migrate a task definition!

Here is an example NGINX task definition. This type of task definition is what you’re used to if you created one before Fargate was announced. It’s what you would run now with the EC2 launch type.

{
    "containerDefinitions": [
        {
            "name": "nginx",
            "image": "nginx",
            "memory": "512",
            "cpu": "100",
            "essential": true,
            "portMappings": [
                {
                    "hostPort": "80",
                    "containerPort": "80",
                    "protocol": "tcp"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ],
    "family": "nginx-ec2"
}

OK, so now what do you need to do to change it to run with the Fargate launch type?

  • Add FARGATE for requiredCompatibilities (not required, but a good safety check for your task definition).
  • Use awsvpc as the network mode.
  • Just specify the containerPort (the hostPortvalue is the same).
  • Add a task executionRoleARN value to allow logging to CloudWatch.
  • Provide cpu and memory limits for the task.
{
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "containerDefinitions": [
        {
            "name": "nginx",
            "image": "nginx",
            "memory": "512",
            "cpu": "100",
            "essential": true,
            "portMappings": [
                {
                    "containerPort": "80",
                    "protocol": "tcp"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ],
    "networkMode": "awsvpc",
    "executionRoleArn": "arn:aws:iam::<your_account_id>:role/ecsTaskExecutionRole",
    "family": "nginx-fargate",
    "memory": "512",
    "cpu": "256"
}

Are there more examples?

Yep! Head to the AWS Samples GitHub repo. We have several sample task definitions you can try for both the EC2 and Fargate launch types. Contributions are very welcome too :).

 

tiffany jernigan
@tiffanyfayj

Reactive Microservices Architecture on AWS

Post Syndicated from Sascha Moellering original https://aws.amazon.com/blogs/architecture/reactive-microservices-architecture-on-aws/

Microservice-application requirements have changed dramatically in recent years. These days, applications operate with petabytes of data, need almost 100% uptime, and end users expect sub-second response times. Typical N-tier applications can’t deliver on these requirements.

Reactive Manifesto, published in 2014, describes the essential characteristics of reactive systems including: responsiveness, resiliency, elasticity, and being message driven.

Being message driven is perhaps the most important characteristic of reactive systems. Asynchronous messaging helps in the design of loosely coupled systems, which is a key factor for scalability. In order to build a highly decoupled system, it is important to isolate services from each other. As already described, isolation is an important aspect of the microservices pattern. Indeed, reactive systems and microservices are a natural fit.

Implemented Use Case
This reference architecture illustrates a typical ad-tracking implementation.

Many ad-tracking companies collect massive amounts of data in near-real-time. In many cases, these workloads are very spiky and heavily depend on the success of the ad-tech companies’ customers. Typically, an ad-tracking-data use case can be separated into a real-time part and a non-real-time part. In the real-time part, it is important to collect data as fast as possible and ask several questions including:,  “Is this a valid combination of parameters?,””Does this program exist?,” “Is this program still valid?”

Because response time has a huge impact on conversion rate in advertising, it is important for advertisers to respond as fast as possible. This information should be kept in memory to reduce communication overhead with the caching infrastructure. The tracking application itself should be as lightweight and scalable as possible. For example, the application shouldn’t have any shared mutable state and it should use reactive paradigms. In our implementation, one main application is responsible for this real-time part. It collects and validates data, responds to the client as fast as possible, and asynchronously sends events to backend systems.

The non-real-time part of the application consumes the generated events and persists them in a NoSQL database. In a typical tracking implementation, clicks, cookie information, and transactions are matched asynchronously and persisted in a data store. The matching part is not implemented in this reference architecture. Many ad-tech architectures use frameworks like Hadoop for the matching implementation.

The system can be logically divided into the data collection partand the core data updatepart. The data collection part is responsible for collecting, validating, and persisting the data. In the core data update part, the data that is used for validation gets updated and all subscribers are notified of new data.

Components and Services

Main Application
The main application is implemented using Java 8 and uses Vert.x as the main framework. Vert.x is an event-driven, reactive, non-blocking, polyglot framework to implement microservices. It runs on the Java virtual machine (JVM) by using the low-level IO library Netty. You can write applications in Java, JavaScript, Groovy, Ruby, Kotlin, Scala, and Ceylon. The framework offers a simple and scalable actor-like concurrency model. Vert.x calls handlers by using a thread known as an event loop. To use this model, you have to write code known as “verticles.” Verticles share certain similarities with actors in the actor model. To use them, you have to implement the verticle interface. Verticles communicate with each other by generating messages in  a single event bus. Those messages are sent on the event bus to a specific address, and verticles can register to this address by using handlers.

With only a few exceptions, none of the APIs in Vert.x block the calling thread. Similar to Node.js, Vert.x uses the reactor pattern. However, in contrast to Node.js, Vert.x uses several event loops. Unfortunately, not all APIs in the Java ecosystem are written asynchronously, for example, the JDBC API. Vert.x offers a possibility to run this, blocking APIs without blocking the event loop. These special verticles are called worker verticles. You don’t execute worker verticles by using the standard Vert.x event loops, but by using a dedicated thread from a worker pool. This way, the worker verticles don’t block the event loop.

Our application consists of five different verticles covering different aspects of the business logic. The main entry point for our application is the HttpVerticle, which exposes an HTTP-endpoint to consume HTTP-requests and for proper health checking. Data from HTTP requests such as parameters and user-agent information are collected and transformed into a JSON message. In order to validate the input data (to ensure that the program exists and is still valid), the message is sent to the CacheVerticle.

This verticle implements an LRU-cache with a TTL of 10 minutes and a capacity of 100,000 entries. Instead of adding additional functionality to a standard JDK map implementation, we use Google Guava, which has all the features we need. If the data is not in the L1 cache, the message is sent to the RedisVerticle. This verticle is responsible for data residing in Amazon ElastiCache and uses the Vert.x-redis-client to read data from Redis. In our example, Redis is the central data store. However, in a typical production implementation, Redis would just be the L2 cache with a central data store like Amazon DynamoDB. One of the most important paradigms of a reactive system is to switch from a pull- to a push-based model. To achieve this and reduce network overhead, we’ll use Redis pub/sub to push core data changes to our main application.

Vert.x also supports direct Redis pub/sub-integration, the following code shows our subscriber-implementation:

vertx.eventBus().<JsonObject>consumer(REDIS_PUBSUB_CHANNEL_VERTX, received -> {

JsonObject value = received.body().getJsonObject("value");

String message = value.getString("message");

JsonObject jsonObject = new JsonObject(message);

eb.send(CACHE_REDIS_EVENTBUS_ADDRESS, jsonObject);

});

redis.subscribe(Constants.REDIS_PUBSUB_CHANNEL, res -> {

if (res.succeeded()) {

LOGGER.info("Subscribed to " + Constants.REDIS_PUBSUB_CHANNEL);

} else {

LOGGER.info(res.cause());

}

});

The verticle subscribes to the appropriate Redis pub/sub-channel. If a message is sent over this channel, the payload is extracted and forwarded to the cache-verticle that stores the data in the L1-cache. After storing and enriching data, a response is sent back to the HttpVerticle, which responds to the HTTP request that initially hit this verticle. In addition, the message is converted to ByteBuffer, wrapped in protocol buffers, and send to an Amazon Kinesis Data Stream.

The following example shows a stripped-down version of the KinesisVerticle:

public class KinesisVerticle extends AbstractVerticle {

private static final Logger LOGGER = LoggerFactory.getLogger(KinesisVerticle.class);

private AmazonKinesisAsync kinesisAsyncClient;

private String eventStream = "EventStream";

@Override

public void start() throws Exception {

EventBus eb = vertx.eventBus();

kinesisAsyncClient = createClient();

eventStream = System.getenv(STREAM_NAME) == null ? "EventStream" : System.getenv(STREAM_NAME);

eb.consumer(Constants.KINESIS_EVENTBUS_ADDRESS, message -> {

try {

TrackingMessage trackingMessage = Json.decodeValue((String)message.body(), TrackingMessage.class);

String partitionKey = trackingMessage.getMessageId();

byte [] byteMessage = createMessage(trackingMessage);

ByteBuffer buf = ByteBuffer.wrap(byteMessage);

sendMessageToKinesis(buf, partitionKey);

message.reply("OK");

}

catch (KinesisException exc) {

LOGGER.error(exc);

}

});

}

Kinesis Consumer
This AWS Lambda function consumes data from an Amazon Kinesis Data Stream and persists the data in an Amazon DynamoDB table. In order to improve testability, the invocation code is separated from the business logic. The invocation code is implemented in the class KinesisConsumerHandler and iterates over the Kinesis events pulled from the Kinesis stream by AWS Lambda. Each Kinesis event is unwrapped and transformed from ByteBuffer to protocol buffers and converted into a Java object. Those Java objects are passed to the business logic, which persists the data in a DynamoDB table. In order to improve duration of successive Lambda calls, the DynamoDB-client is instantiated lazily and reused if possible.

Redis Updater
From time to time, it is necessary to update core data in Redis. A very efficient implementation for this requirement is using AWS Lambda and Amazon Kinesis. New core data is sent over the AWS Kinesis stream using JSON as data format and consumed by a Lambda function. This function iterates over the Kinesis events pulled from the Kinesis stream by AWS Lambda. Each Kinesis event is unwrapped and transformed from ByteBuffer to String and converted into a Java object. The Java object is passed to the business logic and stored in Redis. In addition, the new core data is also sent to the main application using Redis pub/sub in order to reduce network overhead and converting from a pull- to a push-based model.

The following example shows the source code to store data in Redis and notify all subscribers:

public void updateRedisData(final TrackingMessage trackingMessage, final Jedis jedis, final LambdaLogger logger) {

try {

ObjectMapper mapper = new ObjectMapper();

String jsonString = mapper.writeValueAsString(trackingMessage);

Map<String, String> map = marshal(jsonString);

String statusCode = jedis.hmset(trackingMessage.getProgramId(), map);

}

catch (Exception exc) {

if (null == logger)

exc.printStackTrace();

else

logger.log(exc.getMessage());

}

}

public void notifySubscribers(final TrackingMessage trackingMessage, final Jedis jedis, final LambdaLogger logger) {

try {

ObjectMapper mapper = new ObjectMapper();

String jsonString = mapper.writeValueAsString(trackingMessage);

jedis.publish(Constants.REDIS_PUBSUB_CHANNEL, jsonString);

}

catch (final IOException e) {

log(e.getMessage(), logger);

}

}

Similarly to our Kinesis Consumer, the Redis-client is instantiated somewhat lazily.

Infrastructure as Code
As already outlined, latency and response time are a very critical part of any ad-tracking solution because response time has a huge impact on conversion rate. In order to reduce latency for customers world-wide, it is common practice to roll out the infrastructure in different AWS Regions in the world to be as close to the end customer as possible. AWS CloudFormation can help you model and set up your AWS resources so that you can spend less time managing those resources and more time focusing on your applications that run in AWS.

You create a template that describes all the AWS resources that you want (for example, Amazon EC2 instances or Amazon RDS DB instances), and AWS CloudFormation takes care of provisioning and configuring those resources for you. Our reference architecture can be rolled out in different Regions using an AWS CloudFormation template, which sets up the complete infrastructure (for example, Amazon Virtual Private Cloud (Amazon VPC), Amazon Elastic Container Service (Amazon ECS) cluster, Lambda functions, DynamoDB table, Amazon ElastiCache cluster, etc.).

Conclusion
In this blog post we described reactive principles and an example architecture with a common use case. We leveraged the capabilities of different frameworks in combination with several AWS services in order to implement reactive principles—not only at the application-level but also at the system-level. I hope I’ve given you ideas for creating your own reactive applications and systems on AWS.

About the Author

Sascha Moellering is a Senior Solution Architect. Sascha is primarily interested in automation, infrastructure as code, distributed computing, containers and JVM. He can be reached at [email protected]