Tag Archives: microservices

New Network Load Balancer – Effortless Scaling to Millions of Requests per Second

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-network-load-balancer-effortless-scaling-to-millions-of-requests-per-second/

Elastic Load Balancing (ELB)) has been an important part of AWS since 2009, when it was launched as part of a three-pack that also included Auto Scaling and Amazon CloudWatch. Since that time we have added many features, and also introduced the Application Load Balancer. Designed to support application-level, content-based routing to applications that run in containers, Application Load Balancers pair well with microservices, streaming, and real-time workloads.

Over the years, our customers have used ELB to support web sites and applications that run at almost any scale — from simple sites running on a T2 instance or two, all the way up to complex applications that run on large fleets of higher-end instances and handle massive amounts of traffic. Behind the scenes, ELB monitors traffic and automatically scales to meet demand. This process, which includes a generous buffer of headroom, has become quicker and more responsive over the years and works well even for our customers who use ELB to support live broadcasts, “flash” sales, and holidays. However, in some situations such as instantaneous fail-over between regions, or extremely spiky workloads, we have worked with our customers to pre-provision ELBs in anticipation of a traffic surge.

New Network Load Balancer
Today we are introducing the new Network Load Balancer (NLB). It is designed to handle tens of millions of requests per second while maintaining high throughput at ultra low latency, with no effort on your part. The Network Load Balancer is API-compatible with the Application Load Balancer, including full programmatic control of Target Groups and Targets. Here are some of the most important features:

Static IP Addresses – Each Network Load Balancer provides a single IP address for each VPC subnet in its purview. If you have targets in a subnet in us-west-2a and other targets in a subnet in us-west-2c, NLB will create and manage two IP addresses (one per subnet); connections to that IP address will spread traffic across the instances in the subnet. You can also specify an existing Elastic IP for each subnet for even greater control. With full control over your IP addresses, Network Load Balancer can be used in situations where IP addresses need to be hard-coded into DNS records, customer firewall rules, and so forth.

Zonality – The IP-per-subnet feature reduces latency with improved performance, improves availability through isolation and fault tolerance and makes the use of Network Load Balancers transparent to your client applications. Network Load Balancers also attempt to route a series of requests from a particular source to targets in a single subnet while still allowing automatic failover.

Source Address Preservation – With Network Load Balancer, the original source IP address and source ports for the incoming connections remain unmodified, so application software need not support X-Forwarded-For, proxy protocol, or other workarounds. This also means that normal firewall rules, including VPC Security Groups, can be used on targets.

Long-running Connections – NLB handles connections with built-in fault tolerance, and can handle connections that are open for months or years, making them a great fit for IoT, gaming, and messaging applications.

Failover – Powered by Route 53 health checks, NLB supports failover between IP addresses within and across regions.

Creating a Network Load Balancer
I can create a Network Load Balancer opening up the EC2 Console, selecting Load Balancers, and clicking on Create Load Balancer:

I choose Network Load Balancer and click on Create, then enter the details. I can choose an Elastic IP address for each subnet in the target VPC and I can tag the Network Load Balancer:

Then I click on Configure Routing and create a new target group. I enter a name, and then choose the protocol and port. I can also set up health checks that go to the traffic port or to the alternate of my choice:

Then I click on Register Targets and the EC2 instances that will receive traffic, and click on Add to registered:

I make sure that everything looks good and then click on Create:

The state of my new Load Balancer is provisioning, switching to active within a minute or so:

For testing purposes, I simply grab the DNS name of the Load Balancer from the console (in practice I would use Amazon Route 53 and a more friendly name):

Then I sent it a ton of traffic (I intended to let it run for just a second or two but got distracted and it created a huge number of processes, so this was a happy accident):

$ while true;
> do
>   wget http://nlb-1-6386cc6bf24701af.elb.us-west-2.amazonaws.com/phpinfo2.php &
> done

A more disciplined test would use a tool like Bees with Machine Guns, of course!

I took a quick break to let some traffic flow and then checked the CloudWatch metrics for my Load Balancer, finding that it was able to handle the sudden onslaught of traffic with ease:

I also looked at my EC2 instances to see how they were faring under the load (really well, it turns out):

It turns out that my colleagues did run a more disciplined test than I did. They set up a Network Load Balancer and backed it with an Auto Scaled fleet of EC2 instances. They set up a second fleet composed of hundreds of EC2 instances, each running Bees with Machine Guns and configured to generate traffic with highly variable request and response sizes. Beginning at 1.5 million requests per second, they quickly turned the dial all the way up, reaching over 3 million requests per second and 30 Gbps of aggregate bandwidth before maxing out their test resources.

Choosing a Load Balancer
As always, you should consider the needs of your application when you choose a load balancer. Here are some guidelines:

Network Load Balancer (NLB) – Ideal for load balancing of TCP traffic, NLB is capable of handling millions of requests per second while maintaining ultra-low latencies. NLB is optimized to handle sudden and volatile traffic patterns while using a single static IP address per Availability Zone.

Application Load Balancer (ALB) – Ideal for advanced load balancing of HTTP and HTTPS traffic, ALB provides advanced request routing that supports modern application architectures, including microservices and container-based applications.

Classic Load Balancer (CLB) – Ideal for applications that were built within the EC2-Classic network.

For a side-by-side feature comparison, see the Elastic Load Balancer Details table.

If you are currently using a Classic Load Balancer and would like to migrate to a Network Load Balancer, take a look at our new Load Balancer Copy Utility. This Python tool will help you to create a Network Load Balancer with the same configuration as an existing Classic Load Balancer. It can also register your existing EC2 instances with the new load balancer.

Pricing & Availability
Like the Application Load Balancer, pricing is based on Load Balancer Capacity Units, or LCUs. Billing is $0.006 per LCU, based on the highest value seen across the following dimensions:

  • Bandwidth – 1 GB per LCU.
  • New Connections – 800 per LCU.
  • Active Connections – 100,000 per LCU.

Most applications are bandwidth-bound and should see a cost reduction (for load balancing) of about 25% when compared to Application or Classic Load Balancers.

Network Load Balancers are available today in all AWS commercial regions except China (Beijing), supported by AWS CloudFormation, Auto Scaling, and Amazon ECS.



New – Application Load Balancing via IP Address to AWS & On-Premises Resources

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-application-load-balancing-via-ip-address-to-aws-on-premises-resources/

I told you about the new AWS Application Load Balancer last year and showed you how to use it to do implement Layer 7 (application) routing to EC2 instances and to microservices running in containers.

Some of our customers are building hybrid applications as part of a longer-term move to AWS. These customers have told us that they would like to use a single Application Load Balancer to spread traffic across a combination of existing on-premises resources and new resources running in the AWS Cloud. Other customers would like to spread traffic to web or database servers that are scattered across two or more Virtual Private Clouds (VPCs), host multiple services on the same instance with distinct IP addresses but a common port number, and to offer support for IP-based virtual hosting for clients that do not support Server Name Indication (SNI). Another group of customers would like to host multiple instances of a service on the same instance (perhaps within containers), while using multiple interfaces and security groups to implement fine-grained access control.

These situations arise within a broad set of hybrid, migration, disaster recovery, and on-premises use cases and scenarios.

Route to IP Addresses
In order to address these use cases, Application Load Balancers can now route traffic directly to IP addresses. These addresses can be in the same VPC as the ALB, a peer VPC in the same region, on an EC2 instance connected to a VPC by way of ClassicLink, or on on-premises resources at the other end of a VPN connection or AWS Direct Connect connection.

Application Load Balancers already group targets in to target groups. As part of today’s launch, each target group now has a target type attribute:

instance – Targets are registered by way of EC2 instance IDs, as before.

ip – Targets are registered as IP addresses. You can use any IPv4 address from the load balancer’s VPC CIDR for targets within load balancer’s VPC and any IPv4 address from the RFC 1918 ranges (,, and or the RFC 6598 range ( for targets located outside the load balancer’s VPC (this includes Peered VPC, EC2-Classic, and on-premises targets reachable over Direct Connect or VPN).

Each target group has a load balancer and health check configuration, and publishes metrics to CloudWatch, as has always been the case.

Let’s say that you are in the transition phase of an application migration to AWS or want to use AWS to augment on-premises resources with EC2 instances and you need to distribute application traffic across both your AWS and on-premises resources. You can achieve this by registering all the resources (AWS and on-premises) to the same target group and associate the target group with a load balancer. Alternatively, you can use DNS based weighted load balancing across AWS and on-premises resources using two load balancers i.e. one load balancer for AWS and other for on-premises resources. In the scenario where application-A back-ends are in VPC and application-B back-ends are in on-premises locations then you can put back-ends for each application in different target groups and use content based routing to route traffic to each target group.

Creating a Target Group
Here’s how I create a target group that sends traffic to some IP addresses as part of the process of creating an Application Load Balancer. I enter a name (ip-target-1) and select ip as the Target type:

Then I enter IP address targets. These can be from the VPC that hosts the load balancer:

Or they can be other private IP addresses within one of the private ranges listed above, for targets outside of the VPC that hosts the load balancer:

After I review the settings and create the load balancer, traffic will be sent to the designated IP addresses as soon as they pass the health checks. Each load balancer can accommodate up to 1000 targets.

I can examine my target group and edit the set of targets at any time:

As you can see, one of my targets was not healthy when I took this screen shot (this was by design). Metrics are published to CloudWatch for each target group; I can see them in the Console and I can create CloudWatch Alarms:

Available Now
This feature is available now and you can start using it today in all AWS Regions.



How Aussie ecommerce stores can compete with the retail giant Amazon

Post Syndicated from chris desantis original https://www.anchor.com.au/blog/2017/08/aussie-ecommerce-stores-vs-amazon/

The powerhouse Amazon retail store is set to launch in Australia toward the end of 2018 and Aussie ecommerce retailers need to ready themselves for the competition storm ahead.

2018 may seem a while away but getting your ecommerce site in tip top shape and ready to compete can take time. Check out these helpful hints from the Anchor crew.

Speed kills

If you’ve ever heard of the tale of the tortoise and the hare, the moral is that “slow and steady wins the race”. This is definitely not the place for that phrase, because if your site loads as slowly as a 1995 dial up connection, your ecommerce store will not, I repeat, will not win the race.

Site speed can be impacted by a number of factors and getting the balance right between a site that loads at lightning speed and delivering engaging content to your audience. There are many ways to check the performance of your site including Anchor’s free hosting check up or pingdom.

Taking action can boost the performance of your site:

Here’s an interesting blog from the WebCEO team about site speed’s impact on conversion rates on-page, or check out our previous blog on maximising site performance.

Show me the money

As an ecommerce store, getting credit card details as fast as possible is probably at the top of your list, but it’s important to remember that it’s an actual person that needs to hand over the details.

Consider the customer’s experience whilst checking out. Making people log in to their account before checkout, can lead to abandoned carts as customers try to remember the vital details. Similarly, making a customer enter all their details before displaying shipping costs is more of an annoyance than a benefit.

Built for growth

Before you blast out a promo email to your entire database or spend up big on PPC, consider what happens when this 5 fold increase in traffic, all jumps onto your site at around the same time.

Will your site come screeching to a sudden halt with a 504 or 408 error message, or ride high on the wave of increased traffic? If you have fixed infrastructure such as a dedicated server, or are utilising a VPS, then consider the maximum concurrent users that your site can handle.

Consider this. Amazon.com.au will be built on the scalable cloud infrastructure of Amazon Web Services and will utilise all the microservices and data mining technology to offer customers a seamless, personalised shopping experience. How will your business compete?

Search ready

Being found online is important for any business, but for ecommerce sites, it’s essential. Gaining results from SEO practices can take time so beware of ‘quick fix guarantees’ from outsourced agencies.

Search Engine Optimisation (SEO) practices can have lasting effects. Good practices can ensure your site is found via organic search without huge advertising budgets, on the other hand ‘black hat’ practices can push your ecommerce store into search oblivion.

SEO takes discipline and focus to get right. Here are some of our favourite hints for SEO greatness from those who live and breathe SEO:

  • Optimise your site for mobile
  • Use Meta Tags wisely
  • Leverage Descriptive alt tags and image file names
  • Create content for people, not bots (keyword stuffing is a no no!)

SEO best practices are continually evolving, but creating a site that is designed to give users a great experience and give them the content they expect to find.

Google My Business is a free service that EVERY business should take advantage of. It is a listing service where your business can provide details such as address, phone number, website, and trading hours. It’s easy to update and manage, you can add photos, a physical address (if applicable), and display shopper reviews.

Get your site ship shape

Overwhelmed by these starter tips? If you are ready to get your site into tip top shape–get in touch. We work with awesome partners like eWave who can help create a seamless online shopping experience.


The post How Aussie ecommerce stores can compete with the retail giant Amazon appeared first on AWS Managed Services by Anchor.

Deploying an NGINX Reverse Proxy Sidecar Container on Amazon ECS

Post Syndicated from Nathan Peck original https://aws.amazon.com/blogs/compute/nginx-reverse-proxy-sidecar-container-on-amazon-ecs/

Reverse proxies are a powerful software architecture primitive for fetching resources from a server on behalf of a client. They serve a number of purposes, from protecting servers from unwanted traffic to offloading some of the heavy lifting of HTTP traffic processing.

This post explains the benefits of a reverse proxy, and explains how to use NGINX and Amazon EC2 Container Service (Amazon ECS) to easily implement and deploy a reverse proxy for your containerized application.


NGINX is a high performance HTTP server that has achieved significant adoption because of its asynchronous event driven architecture. It can serve thousands of concurrent requests with a low memory footprint. This efficiency also makes it ideal as a reverse proxy.

Amazon ECS is a highly scalable, high performance container management service that supports Docker containers. It allows you to run applications easily on a managed cluster of Amazon EC2 instances. Amazon ECS helps you get your application components running on instances according to a specified configuration. It also helps scale out these components across an entire fleet of instances.

Sidecar containers are a common software pattern that has been embraced by engineering organizations. It’s a way to keep server side architecture easier to understand by building with smaller, modular containers that each serve a simple purpose. Just like an application can be powered by multiple microservices, each microservice can also be powered by multiple containers that work together. A sidecar container is simply a way to move part of the core responsibility of a service out into a containerized module that is deployed alongside a core application container.

The following diagram shows how an NGINX reverse proxy sidecar container operates alongside an application server container:

In this architecture, Amazon ECS has deployed two copies of an application stack that is made up of an NGINX reverse proxy side container and an application container. Web traffic from the public goes to an Application Load Balancer, which then distributes the traffic to one of the NGINX reverse proxy sidecars. The NGINX reverse proxy then forwards the request to the application server and returns its response to the client via the load balancer.

Reverse proxy for security

Security is one reason for using a reverse proxy in front of an application container. Any web server that serves resources to the public can expect to receive lots of unwanted traffic every day. Some of this traffic is relatively benign scans by researchers and tools, such as Shodan or nmap:

[18/May/2017:15:10:10 +0000] "GET /YesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScanningForResearchPurposePleaseHaveALookAtTheUserAgentTHXYesThisIsAReallyLongRequestURLbutWeAreDoingItOnPurposeWeAreScann HTTP/1.1" 404 1389 - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
[18/May/2017:18:19:51 +0000] "GET /clientaccesspolicy.xml HTTP/1.1" 404 322 - Cloud mapping experiment. Contact [email protected]

But other traffic is much more malicious. For example, here is what a web server sees while being scanned by the hacking tool ZmEu, which scans web servers trying to find PHPMyAdmin installations to exploit:

[18/May/2017:16:27:39 +0000] "GET /mysqladmin/scripts/setup.php HTTP/1.1" 404 391 - ZmEu
[18/May/2017:16:27:39 +0000] "GET /web/phpMyAdmin/scripts/setup.php HTTP/1.1" 404 394 - ZmEu
[18/May/2017:16:27:39 +0000] "GET /xampp/phpmyadmin/scripts/setup.php HTTP/1.1" 404 396 - ZmEu
[18/May/2017:16:27:40 +0000] "GET /apache-default/phpmyadmin/scripts/setup.php HTTP/1.1" 404 405 - ZmEu
[18/May/2017:16:27:40 +0000] "GET /phpMyAdmin- HTTP/1.1" 404 397 - ZmEu
[18/May/2017:16:27:40 +0000] "GET /mysql/scripts/setup.php HTTP/1.1" 404 386 - ZmEu
[18/May/2017:16:27:41 +0000] "GET /admin/scripts/setup.php HTTP/1.1" 404 386 - ZmEu
[18/May/2017:16:27:41 +0000] "GET /forum/phpmyadmin/scripts/setup.php HTTP/1.1" 404 396 - ZmEu
[18/May/2017:16:27:41 +0000] "GET /typo3/phpmyadmin/scripts/setup.php HTTP/1.1" 404 396 - ZmEu
[18/May/2017:16:27:42 +0000] "GET /phpMyAdmin- HTTP/1.1" 404 399 - ZmEu
[18/May/2017:16:27:44 +0000] "GET /administrator/components/com_joommyadmin/phpmyadmin/scripts/setup.php HTTP/1.1" 404 418 - ZmEu
[18/May/2017:18:34:45 +0000] "GET /phpmyadmin/scripts/setup.php HTTP/1.1" 404 390 - ZmEu
[18/May/2017:16:27:45 +0000] "GET /w00tw00t.at.blackhats.romanian.anti-sec:) HTTP/1.1" 404 401 - ZmEu

In addition, servers can also end up receiving unwanted web traffic that is intended for another server. In a cloud environment, an application may end up reusing an IP address that was formerly connected to another service. It’s common for misconfigured or misbehaving DNS servers to send traffic intended for a different host to an IP address now connected to your server.

It’s the responsibility of anyone running a web server to handle and reject potentially malicious traffic or unwanted traffic. Ideally, the web server can reject this traffic as early as possible, before it actually reaches the core application code. A reverse proxy is one way to provide this layer of protection for an application server. It can be configured to reject these requests before they reach the application server.

Reverse proxy for performance

Another advantage of using a reverse proxy such as NGINX is that it can be configured to offload some heavy lifting from your application container. For example, every HTTP server should support gzip. Whenever a client requests gzip encoding, the server compresses the response before sending it back to the client. This compression saves network bandwidth, which also improves speed for clients who now don’t have to wait as long for a response to fully download.

NGINX can be configured to accept a plaintext response from your application container and gzip encode it before sending it down to the client. This allows your application container to focus 100% of its CPU allotment on running business logic, while NGINX handles the encoding with its efficient gzip implementation.

An application may have security concerns that require SSL termination at the instance level instead of at the load balancer. NGINX can also be configured to terminate SSL before proxying the request to a local application container. Again, this also removes some CPU load from the application container, allowing it to focus on running business logic. It also gives you a cleaner way to patch any SSL vulnerabilities or update SSL certificates by updating the NGINX container without needing to change the application container.

NGINX configuration

Configuring NGINX for both traffic filtering and gzip encoding is shown below:

http {
  # NGINX will handle gzip compression of responses from the app server
  gzip on;
  gzip_proxied any;
  gzip_types text/plain application/json;
  gzip_min_length 1000;
  server {
    listen 80;
    # NGINX will reject anything not matching /api
    location /api {
      # Reject requests with unsupported HTTP method
      if ($request_method !~ ^(GET|POST|HEAD|OPTIONS|PUT|DELETE)$) {
        return 405;
      # Only requests matching the whitelist expectations will
      # get sent to the application server
      proxy_pass http://app:3000;
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection 'upgrade';
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_cache_bypass $http_upgrade;

The above configuration only accepts traffic that matches the expression /api and has a recognized HTTP method. If the traffic matches, it is forwarded to a local application container accessible at the local hostname app. If the client requested gzip encoding, the plaintext response from that application container is gzip-encoded.

Amazon ECS configuration

Configuring ECS to run this NGINX container as a sidecar is also simple. ECS uses a core primitive called the task definition. Each task definition can include one or more containers, which can be linked to each other:

  "containerDefinitions": [
       "name": "nginx",
       "image": "<NGINX reverse proxy image URL here>",
       "memory": "256",
       "cpu": "256",
       "essential": true,
       "portMappings": [
           "containerPort": "80",
           "protocol": "tcp"
       "links": [
       "name": "app",
       "image": "<app image URL here>",
       "memory": "256",
       "cpu": "256",
       "essential": true
   "networkMode": "bridge",
   "family": "application-stack"

This task definition causes ECS to start both an NGINX container and an application container on the same instance. Then, the NGINX container is linked to the application container. This allows the NGINX container to send traffic to the application container using the hostname app.

The NGINX container has a port mapping that exposes port 80 on a publically accessible port but the application container does not. This means that the application container is not directly addressable. The only way to send it traffic is to send traffic to the NGINX container, which filters that traffic down. It only forwards to the application container if the traffic passes the whitelisted rules.


Running a sidecar container such as NGINX can bring significant benefits by making it easier to provide protection for application containers. Sidecar containers also improve performance by freeing your application container from various CPU intensive tasks. Amazon ECS makes it easy to run sidecar containers, and automate their deployment across your cluster.

To see the full code for this NGINX sidecar reference, or to try it out yourself, you can check out the open source NGINX reverse proxy reference architecture on GitHub.

– Nathan

Now Available: Three New AWS Specialty Training Courses

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-available-three-new-aws-specialty-training-courses/

AWS Training allows you to learn from the experts so you can advance your knowledge with practical skills and get more out of the AWS Cloud. Today I am happy to announce that three of our most popular training bootcamps (a staple at AWS re:Invent and AWS Global Summits) are becoming part of our permanent instructor-led training portfolio:

These one-day courses are intended for individuals who would like to dive deeper into a specialized topic with an expert trainer.

You can explore our complete course catalog, and you can search for a public class near you within the AWS Training and Certification Portal. You can also request a private onsite training session for your team by contacting us.




Guest post: How EmailOctopus built an email marketing platform using Amazon SES

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/ses/guest-post-how-emailoctopus-built-an-email-marketing-platform-using-amazon-ses/

The following guest post was written by Tom Evans, COO of EmailOctopus.

Our product, EmailOctopus, grew from a personal need. We were working on another business venture, and as our email subscriber base grew, the costs of using the larger email service providers became prohibitively expensive for an early-stage startup.

At this point we were already using Amazon SES to send sign up confirmations to our users. We loved Amazon SES’ low pricing and high deliverability, but being a transactional email service, we missed some tracking features offered by our marketing provider. We decided to develop a simple interface to make it easier for us to build and track the performance of marketing emails on top of the Amazon SES platform.

After sharing our accomplishments with other founders, and with no other SaaS solutions on the market that met the same need, we began to turn our basic script into a polished email marketing application. We named our application EmailOctopus. Over 4 years later, and with over 1.5 billion emails delivered through Amazon SES, our mission remains the same: to make contacting your customers as easy and inexpensive as possible.

EmailOctopus is now a fully fledged platform, with thousands of users sending marketing campaigns every day. Our platform integrates directly with our customers’ AWS accounts and provides them with an easy-to-use front end on top of the SES platform. EmailOctopus users can upload or register subscribers who have opted into their correspondence (through an import or one of our many integrations), then send a one-off campaign or an automated marketing series, all while closely tracking the performance of those emails and allowing the recipients to opt-out.

Scaling EmailOctopus to handle millions of emails per day

Building an email marketing platform from scratch has presented a number of challenges, both technical and operational. EmailOctopus has quickly grown from a side project to a mature business that has sent over 1.5 billion emails through Amazon SES.

One of the biggest challenges of our growth has been dealing with a rapidly expanding database. Email marketing generates a huge amount of data. We log every view, bounce, click, spam report, open and unsubscribe for every email sent through our platform. A single campaign can easily generate over 1 million of these events.

Our event processing system sits on a number of microservices spread over EC2 and Lambda, which allows us to selectively scale our services based on demand. Figure 1, below, demonstrates just how irregular this demand is. We save over $500 a month using an on-demand serverless model.

Figure 1. Number of events processed over time.

This model helps us manage our costs and ensures we only pay for the computing power we need.  We rely heavily on CloudFormation scripts to edit that infrastructure; these scripts allow every change to be version-controlled and propagated across all of our environments. In preparing for this blog post, we took a look at how that template had changed over the years—we’d forgotten just how much it had evolved. As our user base grew from 1 customer to 10,000, a single EC2 instance writing to a MySQL database just didn’t cut it. We now rely on a large portion of the AWS suite to reliably consume our event data, as illustrated in Figure 2, below.

Figure 2. Our current event processing infrastructure.

Operationally, our business has needed to make changes to scale too. Processes that worked fine with a handful of clients do not work so well with 10,000 users. We pride ourselves on providing our customers with a superior and personal service; to maintain that commitment, we dedicate 10% of our development time to improving our internal tools. One of these projects resulted in a dashboard which quickly provides us with detailed information on each user and their journey through the platform. The days of asking our support team to assemble database queries are long gone!

What makes EmailOctopus + SES different from the competition?

Amazon SES uses proprietary content filtering technologies and monitors the status of its services rigorously. This means that you’re likely to see increased deliverability on your communication, while saving up to 10x on your current email marketing costs. EmailOctopus pricing plans range from $0 to $109 per month (depending on the number of recipients you need to store), and the cost of sending email through Amazon SES is also very low: you pay nothing for the first 62,000 emails you send through Amazon SES each month, and $0.10 per 1,000 emails after that. Need to send a million emails in a month? You can do it for less than $100 with EmailOctopus + Amazon SES.

Our easy-to-use interface and integrations make it easy to add new subscribers, and our email templates help you create trackable, beautiful, responsive emails. We even offer trigger-based automated email delivery—perfect for onboarding new customers.

I’m ready to get started!

Great! We’ve made it easy to start using EmailOctopus with Amazon SES. First, if you don’t already have one, create an Amazon Web Services account. Once you’ve done that, head over to our website and create an EmailOctopus account. From there, we’ll walk you through the quick and easy process of linking the two services together.

If you’ve never used Amazon SES before, you will also need to provide some information about the types of communication you plan to send. This important step in the process ensures that all new Amazon SES users are reputable, and that they will not have a negative impact on other users who send email through Amazon SES. Once you’ve finished that step, you’ll be ready to start sending emails using EmailOctopus and Amazon SES.

To learn more about what EmailOctopus can do for your business, visit our website at https://emailoctopus.com.


Deploying Java Microservices on Amazon EC2 Container Service

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/deploying-java-microservices-on-amazon-ec2-container-service/

This post and accompanying code graciously contributed by:

Huy Huynh
Sr. Solutions Architect
Magnus Bjorkman
Solutions Architect

Java is a popular language used by many enterprises today. To simplify and accelerate Java application development, many companies are moving from a monolithic to microservices architecture. For some, it has become a strategic imperative. Containerization technology, such as Docker, lets enterprises build scalable, robust microservice architectures without major code rewrites.

In this post, I cover how to containerize a monolithic Java application to run on Docker. Then, I show how to deploy it on AWS using Amazon EC2 Container Service (Amazon ECS), a high-performance container management service. Finally, I show how to break the monolith into multiple services, all running in containers on Amazon ECS.

Application Architecture

For this example, I use the Spring Pet Clinic, a monolithic Java application for managing a veterinary practice. It is a simple REST API, which allows the client to manage and view Owners, Pets, Vets, and Visits.

It is a simple three-tier architecture:

  • Client
    You simulate this by using curl commands.
  • Web/app server
    This is the Java and Spring-based application that you run using the embedded Tomcat. As part of this post, you run this within Docker containers.
  • Database server
    This is the relational database for your application that stores information about owners, pets, vets, and visits. For this post, use MySQL RDS.

I decided to not put the database inside a container as containers were designed for applications and are transient in nature. The choice was made even easier because you have a fully managed database service available with Amazon RDS.

RDS manages the work involved in setting up a relational database, from provisioning the infrastructure capacity that you request to installing the database software. After your database is up and running, RDS automates common administrative tasks, such as performing backups and patching the software that powers your database. With optional Multi-AZ deployments, Amazon RDS also manages synchronous data replication across Availability Zones with automatic failover.


You can find the code for the example covered in this post at amazon-ecs-java-microservices on GitHub.


You need the following to walk through this solution:

  • An AWS account
  • An access key and secret key for a user in the account
  • The AWS CLI installed

Also, install the latest versions of the following:

  • Java
  • Maven
  • Python
  • Docker

Step 1: Move the existing Java Spring application to a container deployed using Amazon ECS

First, move the existing monolith application to a container and deploy it using Amazon ECS. This is a great first step before breaking the monolith apart because you still get some benefits before breaking apart the monolith:

  • An improved pipeline. The container also allows an engineering organization to create a standard pipeline for the application lifecycle.
  • No mutations to machines.

You can find the monolith example at 1_ECS_Java_Spring_PetClinic.

Container deployment overview

The following diagram is an overview of what the setup looks like for Amazon ECS and related services:

This setup consists of the following resources:

  • The client application that makes a request to the load balancer.
  • The load balancer that distributes requests across all available ports and instances registered in the application’s target group using round-robin.
  • The target group that is updated by Amazon ECS to always have an up-to-date list of all the service containers in the cluster. This includes the port on which they are accessible.
  • One Amazon ECS cluster that hosts the container for the application.
  • A VPC network to host the Amazon ECS cluster and associated security groups.

Each container has a single application process that is bound to port 8080 within its namespace. In reality, all the containers are exposed on a different, randomly assigned port on the host.

The architecture is containerized but still monolithic because each container has all the same features of the rest of the containers

The following is also part of the solution but not depicted in the above diagram:

  • One Amazon EC2 Container Registry (Amazon ECR) repository for the application.
  • A service/task definition that spins up containers on the instances of the Amazon ECS cluster.
  • A MySQL RDS instance that hosts the applications schema. The information about the MySQL RDS instance is sent in through environment variables to the containers, so that the application can connect to the MySQL RDS instance.

I have automated setup with the 1_ECS_Java_Spring_PetClinic/ecs-cluster.cf AWS CloudFormation template and a Python script.

The Python script calls the CloudFormation template for the initial setup of the VPC, Amazon ECS cluster, and RDS instance. It then extracts the outputs from the template and uses those for API calls to create Amazon ECR repositories, tasks, services, Application Load Balancer, and target groups.

Environment variables and Spring properties binding

As part of the Python script, you pass in a number of environment variables to the container as part of the task/container definition:

'environment': [
'value': 'mysql'
'value': my_sql_options['dns_name']
'value': my_sql_options['username']
'value': my_sql_options['password']

The preceding environment variables work in concert with the Spring property system. The value in the variable SPRING_PROFILES_ACTIVE, makes Spring use the MySQL version of the application property file. The other environment files override the following properties in that file:

  • spring.datasource.url
  • spring.datasource.username
  • spring.datasource.password

Optionally, you can also encrypt sensitive values by using Amazon EC2 Systems Manager Parameter Store. Instead of handing in the password, you pass in a reference to the parameter and fetch the value as part of the container startup. For more information, see Managing Secrets for Amazon ECS Applications Using Parameter Store and IAM Roles for Tasks.

Spotify Docker Maven plugin

Use the Spotify Docker Maven plugin to create the image and push it directly to Amazon ECR. This allows you to do this as part of the regular Maven build. It also integrates the image generation as part of the overall build process. Use an explicit Dockerfile as input to the plugin.

FROM frolvlad/alpine-oraclejdk8:slim
ADD spring-petclinic-rest-1.7.jar app.jar
RUN sh -c 'touch /app.jar'
ENTRYPOINT [ "sh", "-c", "java $JAVA_OPTS -Djava.security.egd=file:/dev/./urandom -jar /app.jar" ]

The Python script discussed earlier uses the AWS CLI to authenticate you with AWS. The script places the token in the appropriate location so that the plugin can work directly against the Amazon ECR repository.

Test setup

You can test the setup by running the Python script:
python setup.py -m setup -r <your region>

After the script has successfully run, you can test by querying an endpoint:
curl <your endpoint from output above>/owner

You can clean this up before going to the next section:
python setup.py -m cleanup -r <your region>

Step 2: Converting the monolith into microservices running on Amazon ECS

The second step is to convert the monolith into microservices. For a real application, you would likely not do this as one step, but re-architect an application piece by piece. You would continue to run your monolith but it would keep getting smaller for each piece that you are breaking apart.

By migrating microservices, you would get four benefits associated with microservices:

  • Isolation of crashes
    If one microservice in your application is crashing, then only that part of your application goes down. The rest of your application continues to work properly.
  • Isolation of security
    When microservice best practices are followed, the result is that if an attacker compromises one service, they only gain access to the resources of that service. They can’t horizontally access other resources from other services without breaking into those services as well.
  • Independent scaling
    When features are broken out into microservices, then the amount of infrastructure and number of instances of each microservice class can be scaled up and down independently.
  • Development velocity
    In a monolith, adding a new feature can potentially impact every other feature that the monolith contains. On the other hand, a proper microservice architecture has new code for a new feature going into a new service. You can be confident that any code you write won’t impact the existing code at all, unless you explicitly write a connection between two microservices.

Find the monolith example at 2_ECS_Java_Spring_PetClinic_Microservices.
You break apart the Spring Pet Clinic application by creating a microservice for each REST API operation, as well as creating one for the system services.

Java code changes

Comparing the project structure between the monolith and the microservices version, you can see that each service is now its own separate build.
First, the monolith version:

You can clearly see how each API operation is its own subpackage under the org.springframework.samples.petclinic package, all part of the same monolithic application.
This changes as you break it apart in the microservices version:

Now, each API operation is its own separate build, which you can build independently and deploy. You have also duplicated some code across the different microservices, such as the classes under the model subpackage. This is intentional as you don’t want to introduce artificial dependencies among the microservices and allow these to evolve differently for each microservice.

Also, make the dependencies among the API operations more loosely coupled. In the monolithic version, the components are tightly coupled and use object-based invocation.

Here is an example of this from the OwnerController operation, where the class is directly calling PetRepository to get information about pets. PetRepository is the Repository class (Spring data access layer) to the Pet table in the RDS instance for the Pet API:

class OwnerController {

    private PetRepository pets;
    private OwnerRepository owners;
    private static final Logger logger = LoggerFactory.getLogger(OwnerController.class);

    @RequestMapping(value = "/owner/{ownerId}/getVisits", method = RequestMethod.GET)
    public ResponseEntity<List<Visit>> getOwnerVisits(@PathVariable int ownerId){
        List<Pet> petList = this.owners.findById(ownerId).getPets();
        List<Visit> visitList = new ArrayList<Visit>();
        petList.forEach(pet -> visitList.addAll(pet.getVisits()));
        return new ResponseEntity<List<Visit>>(visitList, HttpStatus.OK);

In the microservice version, call the Pet API operation and not PetRepository directly. Decouple the components by using interprocess communication; in this case, the Rest API. This provides for fault tolerance and disposability.

class OwnerController {

    @Value("#{environment['SERVICE_ENDPOINT'] ?: 'localhost:8080'}")
    private String serviceEndpoint;

    private OwnerRepository owners;
    private static final Logger logger = LoggerFactory.getLogger(OwnerController.class);

    @RequestMapping(value = "/owner/{ownerId}/getVisits", method = RequestMethod.GET)
    public ResponseEntity<List<Visit>> getOwnerVisits(@PathVariable int ownerId){
        List<Pet> petList = this.owners.findById(ownerId).getPets();
        List<Visit> visitList = new ArrayList<Visit>();
        petList.forEach(pet -> {
        return new ResponseEntity<List<Visit>>(visitList, HttpStatus.OK);

    private List<Visit> getPetVisits(int petId){
        List<Visit> visitList = new ArrayList<Visit>();
        RestTemplate restTemplate = new RestTemplate();
        Pet pet = restTemplate.getForObject("http://"+serviceEndpoint+"/pet/"+petId, Pet.class);
        return pet.getVisits();

You now have an additional method that calls the API. You are also handing in the service endpoint that should be called, so that you can easily inject dynamic endpoints based on the current deployment.

Container deployment overview

Here is an overview of what the setup looks like for Amazon ECS and the related services:

This setup consists of the following resources:

  • The client application that makes a request to the load balancer.
  • The Application Load Balancer that inspects the client request. Based on routing rules, it directs the request to an instance and port from the target group that matches the rule.
  • The Application Load Balancer that has a target group for each microservice. The target groups are used by the corresponding services to register available container instances. Each target group has a path, so when you call the path for a particular microservice, it is mapped to the correct target group. This allows you to use one Application Load Balancer to serve all the different microservices, accessed by the path. For example, https:///owner/* would be mapped and directed to the Owner microservice.
  • One Amazon ECS cluster that hosts the containers for each microservice of the application.
  • A VPC network to host the Amazon ECS cluster and associated security groups.

Because you are running multiple containers on the same instances, use dynamic port mapping to avoid port clashing. By using dynamic port mapping, the container is allocated an anonymous port on the host to which the container port (8080) is mapped. The anonymous port is registered with the Application Load Balancer and target group so that traffic is routed correctly.

The following is also part of the solution but not depicted in the above diagram:

  • One Amazon ECR repository for each microservice.
  • A service/task definition per microservice that spins up containers on the instances of the Amazon ECS cluster.
  • A MySQL RDS instance that hosts the applications schema. The information about the MySQL RDS instance is sent in through environment variables to the containers. That way, the application can connect to the MySQL RDS instance.

I have again automated setup with the 2_ECS_Java_Spring_PetClinic_Microservices/ecs-cluster.cf CloudFormation template and a Python script.

The CloudFormation template remains the same as in the previous section. In the Python script, you are now building five different Java applications, one for each microservice (also includes a system application). There is a separate Maven POM file for each one. The resulting Docker image gets pushed to its own Amazon ECR repository, and is deployed separately using its own service/task definition. This is critical to get the benefits described earlier for microservices.

Here is an example of the POM file for the Owner microservice:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
        <!-- Generic properties -->
        <!-- Spring and Spring Boot dependencies -->
        <!-- Databases - Uses HSQL by default -->
        <!-- caching -->
        <!-- end of webjars -->

Test setup

You can test this by running the Python script:

python setup.py -m setup -r <your region>

After the script has successfully run, you can test by querying an endpoint:

curl <your endpoint from output above>/owner


Migrating a monolithic application to a containerized set of microservices can seem like a daunting task. Following the steps outlined in this post, you can begin to containerize monolithic Java apps, taking advantage of the container runtime environment, and beginning the process of re-architecting into microservices. On the whole, containerized microservices are faster to develop, easier to iterate on, and more cost effective to maintain and secure.

This post focused on the first steps of microservice migration. You can learn more about optimizing and scaling your microservices with components such as service discovery, blue/green deployment, circuit breakers, and configuration servers at http://aws.amazon.com/containers.

If you have questions or suggestions, please comment below.

Blue/Green Deployments with Amazon EC2 Container Service

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/bluegreen-deployments-with-amazon-ecs/

This post and accompanying code was generously contributed by:

Jeremy Cowan
Solutions Architect
Anuj Sharma
DevOps Cloud Architect
Peter Dalbhanjan
Solutions Architect

Deploying software updates in traditional non-containerized environments is hard and fraught with risk. When you write your deployment package or script, you have to assume that the target machine is in a particular state. If your staging environment is not an exact mirror image of your production environment, your deployment could fail. These failures frequently cause outages that persist until you re-deploy the last known good version of your application. If you are an Operations Manager, this is what keeps you up at night.

Increasingly, customers want to do testing in production environments without exposing customers to the new version until the release has been vetted. Others want to expose a small percentage of their customers to the new release to gather feedback about a feature before it’s released to the broader population. This is often referred to as canary analysis or canary testing. In this post, I introduce patterns to implement blue/green and canary deployments using Application Load Balancers and target groups.

If you’d like to try this approach to blue/green deployments, we have open sourced the code and AWS CloudFormation templates in the ecs-blue-green-deployment GitHub repo. The workflow builds an automated CI/CD pipeline that deploys your service onto an ECS cluster and offers a controlled process to swap target groups when you’re ready to promote the latest version of your code to production. You can quickly set up the environment in three steps and see the blue/green swap in action. We’d love for you to try it and send us your feedback!

Benefits of blue/green

Blue/green deployments are a type of immutable deployment that help you deploy software updates with less risk. The risk is reduced by creating separate environments for the current running or “blue” version of your application, and the new or “green” version of your application.

This type of deployment gives you an opportunity to test features in the green environment without impacting the current running version of your application. When you’re satisfied that the green version is working properly, you can gradually reroute the traffic from the old blue environment to the new green environment by modifying DNS. By following this method, you can update and roll back features with near zero downtime.

A typical blue/green deployment involves shifting traffic between 2 distinct environments.

This ability to quickly roll traffic back to the still-operating blue environment is one of the key benefits of blue/green deployments. With blue/green, you should be able to roll back to the blue environment at any time during the deployment process. This limits downtime to the time it takes to realize there’s an issue in the green environment and shift the traffic back to the blue environment. Furthermore, the impact of the outage is limited to the portion of traffic going to the green environment, not all traffic. If the blast radius of deployment errors is reduced, so is the overall deployment risk.

Containers make it simpler

Historically, blue/green deployments were not often used to deploy software on-premises because of the cost and complexity associated with provisioning and managing multiple environments. Instead, applications were upgraded in place.

Although this approach worked, it had several flaws, including the ability to roll back quickly from failures. Rollbacks typically involved re-deploying a previous version of the application, which could affect the length of an outage caused by a bad release. Fixing the issue took precedence over the need to debug, so there were fewer opportunities to learn from your mistakes.

Containers can ease the adoption of blue/green deployments because they’re easily packaged and behave consistently as they’re moved between environments. This consistency comes partly from their immutability. To change the configuration of a container, update its Dockerfile and rebuild and re-deploy the container rather than updating the software in place.

Containers also provide process and namespace isolation for your applications, which allows you to run multiple versions of them side by side on the same Docker host without conflicts. Given their small sizes relative to virtual machines, you can binpack more containers per host than VMs. This lets you make more efficient use of your computing resources, reducing the cost of blue/green deployments.

Fully Managed Updates with Amazon ECS

Amazon EC2 Container Service (ECS) performs rolling updates when you update an existing Amazon ECS service. A rolling update involves replacing the current running version of the container with the latest version. The number of containers Amazon ECS adds or removes from service during a rolling update is controlled by adjusting the minimum and maximum number of healthy tasks allowed during service deployments.

When you update your service’s task definition with the latest version of your container image, Amazon ECS automatically starts replacing the old version of your container with the latest version. During a deployment, Amazon ECS drains connections from the current running version and registers your new containers with the Application Load Balancer as they come online.

Target groups

A target group is a logical construct that allows you to run multiple services behind the same Application Load Balancer. This is possible because each target group has its own listener.

When you create an Amazon ECS service that’s fronted by an Application Load Balancer, you have to designate a target group for your service. Ordinarily, you would create a target group for each of your Amazon ECS services. However, the approach we’re going to explore here involves creating two target groups: one for the blue version of your service, and one for the green version of your service. We’re also using a different listener port for each target group so that you can test the green version of your service using the same path as the blue service.

With this configuration, you can run both environments in parallel until you’re ready to cut over to the green version of your service. You can also do things such as restricting access to the green version to testers on your internal network, using security group rules and placement constraints. For example, you can target the green version of your service to only run on instances that are accessible from your corporate network.

Swapping Over

When you’re ready to replace the old blue service with the new green service, call the ModifyListener API operation to swap the listener’s rules for the target group rules. The change happens instantaneously. Afterward, the green service is running in the target group with the port 80 listener and the blue service is running in the target group with the port 8080 listener. The diagram below is an illustration of the approach described.


Two services are defined, each with their own target group registered to the same Application Load Balancer but listening on different ports. Deployment is completed by swapping the listener rules between the two target groups.

The second service is deployed with a new target group listening on a different port but registered to the same Application Load Balancer.

By using 2 listeners, requests to blue services are directed to the target group with the port 80 listener, while requests to the green services are directed to target group with the port 8080 listener.

After automated or manual testing, the deployment can be completed by swapping the listener rules on the Application Load Balancer and sending traffic to the green service.


There are a few caveats to be mindful of when using this approach. This method:

  • Assumes that your application code is completely stateless. Store state outside of the container.
  • Doesn’t gracefully drain connections. The swapping of target groups is sudden and abrupt. Therefore, be cautious about using this approach if your service has long-running transactions.
  • Doesn’t allow you to perform canary deployments. While the method gives you the ability to quickly switch between different versions of your service, it does not allow you to divert a portion of the production traffic to a canary or control the rate at which your service is deployed across the cluster.

Canary testing

While this type of deployment automates much of the heavy lifting associated with rolling deployments, it doesn’t allow you to interrupt the deployment if you discover an issue midstream. Rollbacks using the standard Amazon ECS deployment require updating the service’s task definition with the last known good version of the container. Then, you wait for Amazon ECS to schedule and deploy it across the cluster. If the latest version introduces a breaking change that went undiscovered during testing, this might be too slow.

With canary testing, if you discover the green environment is not operating as expected, there is no impact on the blue environment. You can route traffic back to it, minimizing impaired operation or downtime, and limiting the blast radius of impact.

This type of deployment is particularly useful for A/B testing where you want to expose a new feature to a subset of users to get their feedback before making it broadly available.

For canary style deployments, you can use a variation of the blue/green swap that involves deploying the blue and the green service to the same target group. Although this method is not as fast as the swap, it allows you to control the rate at which your containers are replaced by adjusting the task count for each service. Furthermore, it gives you the ability to roll back by adjusting the number of tasks for the blue and green services respectively. Unlike the swap approach described above, connections to your containers are drained gracefully. We plan to address canary style deployments for Amazon ECS in a future post.


With AWS, you can operationalize your blue/green deployments using Amazon ECS, an Application Load Balancer, and target groups. I encourage you to adapt the code published to the ecs-blue-green-deployment GitHub repo for your use cases and look forward to reading your feedback.

If you’re interested in learning more, I encourage you to read the Blue/Green Deployments on AWS and Practicing Continuous Integration and Continuous Delivery on AWS whitepapers.

If you have questions or suggestions, please comment below.

timeShift(GrafanaBuzz, 1w) Issue 2

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/06/30/timeshiftgrafanabuzz-1w-issue-2/

A big thank you to everyone for the likes, retweets, comments and questions from last week’s timeShift debut. We were delighted to learn that people found this new resource useful, and are excited to continue to publish weekly issues. If you know of a recent article about Grafana, or are writing one yourself, please get in touch, we’d be happy to feature it here.

From the Blogosphere

Plugins and Dashboards

We are excited that there have been over 100,000 plugin installations since we launched the new plugable architecture in Grafana v3. You can discover and install plugins in your own on-premises or Hosted Grafana instance from our website. Below are some recent additions and updates.

SimpleJson SimpleJson is a generic backend datasource that has been the foundation of a number of Grafana data source plugins. It’s also a mechanism by which any application can expose metrics over http directly to Grafana. The newest version adds basic auth.

NetXMS Grafana datasource for NetXMS open source monitoring system.

GoogleCalendar This plugin shows the event description as an annotation on your graphs.

Discrete Panel Show discrete values in a horizontal graph. This panel now supports results from the table format.

Alarm Box This panel shows the total count of values across all series. This update adds a new option to customize how the display and color values are calculated.

Status Dot This panel shows a colored dot for each series; useful to monitor latest values at a glance.

This week’s MVC (Most Valuable Contributor)

Each week we’ll recognize a Grafana contributor and thank them for all of their PRs, bug reports and feedback. A majority of fixes and improvements come from our fantastic community!

mtanda (Mitsuhiro Tanda)

159 PR’s during the last 2 years and still going strong. Thank you for your contributions mtanda!

What do you think?

Anything in particular you’d like to see in this series of posts? Too long? Too short? Boring? Let us know. Comment on this article below, or post something at our community forum. With your help, we can make this a worthwhile resource.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

Synchronizing Amazon S3 Buckets Using AWS Step Functions

Post Syndicated from Andy Katz original https://aws.amazon.com/blogs/compute/synchronizing-amazon-s3-buckets-using-aws-step-functions/

Constantin Gonzalez is a Principal Solutions Architect at AWS

In my free time, I run a small blog that uses Amazon S3 to host static content and Amazon CloudFront to distribute it world-wide. I use a home-grown, static website generator to create and upload my blog content onto S3.

My blog uses two S3 buckets: one for staging and testing, and one for production. As a website owner, I want to update the production bucket with all changes from the staging bucket in a reliable and efficient way, without having to create and populate a new bucket from scratch. Therefore, to synchronize files between these two buckets, I use AWS Lambda and AWS Step Functions.

In this post, I show how you can use Step Functions to build a scalable synchronization engine for S3 buckets and learn some common patterns for designing Step Functions state machines while you do so.

Step Functions overview

Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function lets you scale and change applications quickly.

While this particular example focuses on synchronizing objects between two S3 buckets, it can be generalized to any other use case that involves coordinated processing of any number of objects in S3 buckets, or other, similar data processing patterns.

Bucket replication options

Before I dive into the details on how this particular example works, take a look at some alternatives for copying or replicating data between two Amazon S3 buckets:

  • The AWS CLI provides customers with a powerful aws s3 sync command that can synchronize the contents of one bucket with another.
  • S3DistCP is a powerful tool for users of Amazon EMR that can efficiently load, save, or copy large amounts of data between S3 buckets and HDFS.
  • The S3 cross-region replication functionality enables automatic, asynchronous copying of objects across buckets in different AWS regions.

In this use case, you are looking for a slightly different bucket synchronization solution that:

  • Works within the same region
  • Is more scalable than a CLI approach running on a single machine
  • Doesn’t require managing any servers
  • Uses a more finely grained cost model than the hourly based Amazon EMR approach

You need a scalable, serverless, and customizable bucket synchronization utility.

Solution architecture

Your solution needs to do three things:

  1. Copy all objects from a source bucket into a destination bucket, but leave out objects that are already present, for efficiency.
  2. Delete all "orphaned" objects from the destination bucket that aren’t present on the source bucket, because you don’t want obsolete objects lying around.
  3. Keep track of all objects for #1 and #2, regardless of how many objects there are.

In the beginning, you read in the source and destination buckets as parameters and perform basic parameter validation. Then, you operate two separate, independent loops, one for copying missing objects and one for deleting obsolete objects. Each loop is a sequence of Step Functions states that read in chunks of S3 object lists and use the continuation token to decide in a choice state whether to continue the loop or not.

This solution is based on the following architecture that uses Step Functions, Lambda, and two S3 buckets:

As you can see, this setup involves no servers, just two main building blocks:

  • Step Functions manages the overall flow of synchronizing the objects from the source bucket with the destination bucket.
  • A set of Lambda functions carry out the individual steps necessary to perform the work, such as validating input, getting lists of objects from source and destination buckets, copying or deleting objects in batches, and so on.

To understand the synchronization flow in more detail, look at the Step Functions state machine diagram for this example.


Here’s a detailed discussion of how this works.

To follow along, use the code in the sync-buckets-state-machine GitHub repo. The code comes with a ready-to-run deployment script in Python that takes care of all the IAM roles, policies, Lambda functions, and of course the Step Functions state machine deployment using AWS CloudFormation, as well as instructions on how to use it.

Fine print: Use at your own risk

Before I start, here are some disclaimers:

  • Educational purposes only.

    The following example and code are intended for educational purposes only. Make sure that you customize, test, and review it on your own before using any of this in production.

  • S3 object deletion.

    In particular, using the code included below may delete objects on S3 in order to perform synchronization. Make sure that you have backups of your data. In particular, consider using the Amazon S3 Versioning feature to protect yourself against unintended data modification or deletion.

Step Functions execution starts with an initial set of parameters that contain the source and destination bucket names in JSON:

    "source":       "my-source-bucket-name",
    "destination":  "my-destination-bucket-name"

Armed with this data, Step Functions execution proceeds as follows.

Step 1: Detect the bucket region

First, you need to know the regions where your buckets reside. In this case, take advantage of the Step Functions Parallel state. This allows you to use a Lambda function get_bucket_location.py inside two different, parallel branches of task states:

  • FindRegionForSourceBucket
  • FindRegionForDestinationBucket

Each task state receives one bucket name as an input parameter, then detects the region corresponding to "their" bucket. The output of these functions is collected in a result array containing one element per parallel function.

Step 2: Combine the parallel states

The output of a parallel state is a list with all the individual branches’ outputs. To combine them into a single structure, use a Lambda function called combine_dicts.py in its own CombineRegionOutputs task state. The function combines the two outputs from step 1 into a single JSON dict that provides you with the necessary region information for each bucket.

Step 3: Validate the input

In this walkthrough, you only support buckets that reside in the same region, so you need to decide if the input is valid or if the user has given you two buckets in different regions. To find out, use a Lambda function called validate_input.py in the ValidateInput task state that tests if the two regions from the previous step are equal. The output is a Boolean.

Step 4: Branch the workflow

Use another type of Step Functions state, a Choice state, which branches into a Failure state if the comparison in step 3 yields false, or proceeds with the remaining steps if the comparison was successful.

Step 5: Execute in parallel

The actual work is happening in another Parallel state. Both branches of this state are very similar to each other and they re-use some of the Lambda function code.

Each parallel branch implements a looping pattern across the following steps:

  1. Use a Pass state to inject either the string value "source" (InjectSourceBucket) or "destination" (InjectDestinationBucket) into the listBucket attribute of the state document.

    The next step uses either the source or the destination bucket, depending on the branch, while executing the same, generic Lambda function. You don’t need two Lambda functions that differ only slightly. This step illustrates how to use Pass states as a way of injecting constant parameters into your state machine and as a way of controlling step behavior while re-using common step execution code.

  2. The next step UpdateSourceKeyList/UpdateDestinationKeyList lists objects in the given bucket.

    Remember that the previous step injected either "source" or "destination" into the state document’s listBucket attribute. This step uses the same list_bucket.py Lambda function to list objects in an S3 bucket. The listBucket attribute of its input decides which bucket to list. In the left branch of the main parallel state, use the list of source objects to work through copying missing objects. The right branch uses the list of destination objects, to check if they have a corresponding object in the source bucket and eliminate any orphaned objects. Orphans don’t have a source object of the same S3 key.

  3. This step performs the actual work. In the left branch, the CopySourceKeys step uses the copy_keys.py Lambda function to go through the list of source objects provided by the previous step, then copies any missing object into the destination bucket. Its sister step in the other branch, DeleteOrphanedKeys, uses its destination bucket key list to test whether each object from the destination bucket has a corresponding source object, then deletes any orphaned objects.

  4. The S3 ListObjects API action is designed to be scalable across many objects in a bucket. Therefore, it returns object lists in chunks of configurable size, along with a continuation token. If the API result has a continuation token, it means that there are more objects in this list. You can work from token to token to continue getting object list chunks, until you get no more continuation tokens.

By breaking down large amounts of work into chunks, you can make sure each chunk is completed within the timeframe allocated for the Lambda function, and within the maximum input/output data size for a Step Functions state.

This approach comes with a slight tradeoff: the more objects you process at one time in a given chunk, the faster you are done. There’s less overhead for managing individual chunks. On the other hand, if you process too many objects within the same chunk, you risk going over time and space limits of the processing Lambda function or the Step Functions state so the work cannot be completed.

In this particular case, use a Lambda function that maximizes the number of objects listed from the S3 bucket that can be stored in the input/output state data. This is currently up to 32,768 bytes, assuming (based on some experimentation) that the execution of the COPY/DELETE requests in the processing states can always complete in time.

A more sophisticated approach would use the Step Functions retry/catch state attributes to account for any time limits encountered and adjust the list size accordingly through some list site adjusting.

Step 6: Test for completion

Because the presence of a continuation token in the S3 ListObjects output signals that you are not done processing all objects yet, use a Choice state to test for its presence. If a continuation token exists, it branches into the UpdateSourceKeyList step, which uses the token to get to the next chunk of objects. If there is no token, you’re done. The state machine then branches into the FinishCopyBranch/FinishDeleteBranch state.

By using Choice states like this, you can create loops exactly like the old times, when you didn’t have for statements and used branches in assembly code instead!

Step 7: Success!

Finally, you’re done, and can step into your final Success state.

Lessons learned

When implementing this use case with Step Functions and Lambda, I learned the following things:

  • Sometimes, it is necessary to manipulate the JSON state of a Step Functions state machine with just a few lines of code that hardly seem to warrant their own Lambda function. This is ok, and the cost is actually pretty low given Lambda’s 100 millisecond billing granularity. The upside is that functions like these can be helpful to make the data more palatable for the following steps or for facilitating Choice states. An example here would be the combine_dicts.py function.
  • Pass states can be useful beyond debugging and tracing, they can be used to inject arbitrary values into your state JSON and guide generic Lambda functions into doing specific things.
  • Choice states are your friend because you can build while-loops with them. This allows you to reliably grind through large amounts of data with the patience of an engine that currently supports execution times of up to 1 year.

    Currently, there is an execution history limit of 25,000 events. Each Lambda task state execution takes up 5 events, while each choice state takes 2 events for a total of 7 events per loop. This means you can loop about 3500 times with this state machine. For even more scalability, you can split up work across multiple Step Functions executions through object key sharding or similar approaches.

  • It’s not necessary to spend a lot of time coding exception handling within your Lambda functions. You can delegate all exception handling to Step Functions and instead simplify your functions as much as possible.

  • Step Functions are great replacements for shell scripts. This could have been a shell script, but then I would have had to worry about where to execute it reliably, how to scale it if it went beyond a few thousand objects, etc. Think of Step Functions and Lambda as tools for scripting at a cloud level, beyond the boundaries of servers or containers. "Serverless" here also means "boundary-less".


This approach gives you scalability by breaking down any number of S3 objects into chunks, then using Step Functions to control logic to work through these objects in a scalable, serverless, and fully managed way.

To take a look at the code or tweak it for your own needs, use the code in the sync-buckets-state-machine GitHub repo.

To see more examples, please visit the Step Functions Getting Started page.


Building Loosely Coupled, Scalable, C# Applications with Amazon SQS and Amazon SNS

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/building-loosely-coupled-scalable-c-applications-with-amazon-sqs-and-amazon-sns/

Stephen Liedig, Solutions Architect


One of the many challenges professional software architects and developers face is how to make cloud-native applications scalable, fault-tolerant, and highly available.

Fundamental to your project success is understanding the importance of making systems highly cohesive and loosely coupled. That means considering the multi-dimensional facets of system coupling to support the distributed nature of the applications that you are building for the cloud.

By that, I mean addressing not only the application-level coupling (managing incoming and outgoing dependencies), but also considering the impacts of of platform, spatial, and temporal coupling of your systems. Platform coupling relates to the interoperability, or lack thereof, of heterogeneous systems components. Spatial coupling deals with managing components at a network topology level or protocol level. Temporal, or runtime coupling, refers to the ability of a component within your system to do any kind of meaningful work while it is performing a synchronous, blocking operation.

The AWS messaging services, Amazon SQS and Amazon SNS, help you deal with these forms of coupling by providing mechanisms for:

  • Reliable, durable, and fault-tolerant delivery of messages between application components
  • Logical decomposition of systems and increased autonomy of components
  • Creating unidirectional, non-blocking operations, temporarily decoupling system components at runtime
  • Decreasing the dependencies that components have on each other through standard communication and network channels

Following on the recent topic, Building Scalable Applications and Microservices: Adding Messaging to Your Toolbox, in this post, I look at some of the ways you can introduce SQS and SNS into your architectures to decouple your components, and show how you can implement them using C#.


To illustrate some of these concepts, consider a web application that processes customer orders. As good architects and developers, you have followed best practices and made your application scalable and highly available. Your solution included implementing load balancing, dynamic scaling across multiple Availability Zones, and persisting orders in a Multi-AZ Amazon RDS database instance, as in the following diagram.

In this example, the application is responsible for handling and persisting the order data, as well as dealing with increases in traffic for popular items.

One potential point of vulnerability in the order processing workflow is in saving the order in the database. The business expects that every order has been persisted into the database. However, any potential deadlock, race condition, or network issue could cause the persistence of the order to fail. Then, the order is lost with no recourse to restore the order.

With good logging capability, you may be able to identify when an error occurred and which customer’s order failed. This wouldn’t allow you to “restore” the transaction, and by that stage, your customer is no longer your customer.

As illustrated in the following diagram, introducing an SQS queue helps improve your ordering application. Using the queue isolates the processing logic into its own component and runs it in a separate process from the web application. This, in turn, allows the system to be more resilient to spikes in traffic, while allowing work to be performed only as fast as necessary in order to manage costs.

In addition, you now have a mechanism for persisting orders as messages (with the queue acting as a temporary database), and have moved the scope of your transaction with your database further down the stack. In the event of an application exception or transaction failure, this ensures that the order processing can be retired or redirected to the Amazon SQS Dead Letter Queue (DLQ), for re-processing at a later stage. (See the recent post, Using Amazon SQS Dead-Letter Queues to Control Message Failure, for more information on dead-letter queues.)

Scaling the order processing nodes

This change allows you now to scale the web application frontend independently from the processing nodes. The frontend application can continue to scale based on metrics such as CPU usage, or the number of requests hitting the load balancer. Processing nodes can scale based on the number of orders in the queue. Here is an example of scale-in and scale-out alarms that you would associate with the scaling policy.

Scale-out Alarm

aws cloudwatch put-metric-alarm --alarm-name AddCapacityToCustomerOrderQueue --metric-name ApproximateNumberOfMessagesVisible --namespace "AWS/SQS" 
--statistic Average --period 300 --threshold 3 --comparison-operator GreaterThanOrEqualToThreshold --dimensions Name=QueueName,Value=customer-orders
--evaluation-periods 2 --alarm-actions <arn of the scale-out autoscaling policy>

Scale-in Alarm

aws cloudwatch put-metric-alarm --alarm-name RemoveCapacityFromCustomerOrderQueue --metric-name ApproximateNumberOfMessagesVisible --namespace "AWS/SQS" 
 --statistic Average --period 300 --threshold 1 --comparison-operator LessThanOrEqualToThreshold --dimensions Name=QueueName,Value=customer-orders
 --evaluation-periods 2 --alarm-actions <arn of the scale-in autoscaling policy>

In the above example, use the ApproximateNumberOfMessagesVisible metric to discover the queue length and drive the scaling policy of the Auto Scaling group. Another useful metric is ApproximateAgeOfOldestMessage, when applications have time-sensitive messages and developers need to ensure that messages are processed within a specific time period.

Scaling the order processing implementation

On top of scaling at an infrastructure level using Auto Scaling, make sure to take advantage of the processing power of your Amazon EC2 instances by using as many of the available threads as possible. There are several ways to implement this. In this post, we build a Windows service that uses the BackgroundWorker class to process the messages from the queue.

Here’s a closer look at the implementation. In the first section of the consuming application, use a loop to continually poll the queue for new messages, and construct a ReceiveMessageRequest variable.

public static void PollQueue()
    while (_running)
        Task<ReceiveMessageResponse> receiveMessageResponse;

        // Pull messages off the queue
        using (var sqs = new AmazonSQSClient())
            const int maxMessages = 10;  // 1-10

            //Receiving a message
            var receiveMessageRequest = new ReceiveMessageRequest
                // Get URL from Configuration
                QueueUrl = _queueUrl, 
                // The maximum number of messages to return. 
                // Fewer messages might be returned. 
                MaxNumberOfMessages = maxMessages, 
                // A list of attributes that need to be returned with message.
                AttributeNames = new List<string> { "All" },
                // Enable long polling. 
                // Time to wait for message to arrive on queue.
                WaitTimeSeconds = 5 

            receiveMessageResponse = sqs.ReceiveMessageAsync(receiveMessageRequest);

The WaitTimeSeconds property of the ReceiveMessageRequest specifies the duration (in seconds) that the call waits for a message to arrive in the queue before returning a response to the calling application. There are a few benefits to using long polling:

  • It reduces the number of empty responses by allowing SQS to wait until a message is available in the queue before sending a response.
  • It eliminates false empty responses by querying all (rather than a limited number) of the servers.
  • It returns messages as soon any message becomes available.

For more information, see Amazon SQS Long Polling.

After you have returned messages from the queue, you can start to process them by looping through each message in the response and invoking a new BackgroundWorker thread.

// Process messages
if (receiveMessageResponse.Result.Messages != null)
    foreach (var message in receiveMessageResponse.Result.Messages)
        Console.WriteLine("Received SQS message, starting worker thread");

        // Create background worker to process message
        BackgroundWorker worker = new BackgroundWorker();
        worker.DoWork += (obj, e) => ProcessMessage(message);
    Console.WriteLine("No messages on queue");

The event handler, ProcessMessage, is where you implement business logic for processing orders. It is important to have a good understanding of how long a typical transaction takes so you can set a message VisibilityTimeout that is long enough to complete your operation. If order processing takes longer than the specified timeout period, the message becomes visible on the queue. Other nodes may pick it and process the same order twice, leading to unintended consequences.

Handling Duplicate Messages

In order to manage duplicate messages, seek to make your processing application idempotent. In mathematics, idempotent describes a function that produces the same result if it is applied to itself:

f(x) = f(f(x))

No matter how many times you process the same message, the end result is the same (definition from Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions, Hohpe and Wolf, 2004).

There are several strategies you could apply to achieve this:

  • Create messages that have inherent idempotent characteristics. That is, they are non-transactional in nature and are unique at a specified point in time. Rather than saying “place new order for Customer A,” which adds a duplicate order to the customer, use “place order <orderid> on <timestamp> for Customer A,” which creates a single order no matter how often it is persisted.
  • Deliver your messages via an Amazon SQS FIFO queue, which provides the benefits of message sequencing, but also mechanisms for content-based deduplication. You can deduplicate using the MessageDeduplicationId property on the SendMessage request or by enabling content-based deduplication on the queue, which generates a hash for MessageDeduplicationId, based on the content of the message, not the attributes.
var sendMessageRequest = new SendMessageRequest
    QueueUrl = _queueUrl,
    MessageBody = JsonConvert.SerializeObject(order),
    MessageGroupId = Guid.NewGuid().ToString("N"),
    MessageDeduplicationId = Guid.NewGuid().ToString("N")
  • If using SQS FIFO queues is not an option, keep a message log of all messages attributes processed for a specified period of time, as an alternative to message deduplication on the receiving end. Verifying the existence of the message in the log before processing the message adds additional computational overhead to your processing. This can be minimized through low latency persistence solutions such as Amazon DynamoDB. Bear in mind that this solution is dependent on the successful, distributed transaction of the message and the message log.

Handling exceptions

Because of the distributed nature of SQS queues, it does not automatically delete the message. Therefore, you must explicitly delete the message from the queue after processing it, using the message ReceiptHandle property (see the following code example).

However, if at any stage you have an exception, avoid handling it as you normally would. The intention is to make sure that the message ends back on the queue, so that you can gracefully deal with intermittent failures. Instead, log the exception to capture diagnostic information, and swallow it.

By not explicitly deleting the message from the queue, you can take advantage of the VisibilityTimeout behavior described earlier. Gracefully handle the message processing failure and make the unprocessed message available to other nodes to process.

In the event that subsequent retries fail, SQS automatically moves the message to the configured DLQ after the configured number of receives has been reached. You can further investigate why the order process failed. Most importantly, the order has not been lost, and your customer is still your customer.

private static void ProcessMessage(Message message)
    using (var sqs = new AmazonSQSClient())
            Console.WriteLine("Processing message id: {0}", message.MessageId);

            // Implement messaging processing here
            // Ensure no downstream resource contention (parallel processing)
            // <your order processing logic in here…>
            Console.WriteLine("{0} Thread {1}: {2}", DateTime.Now.ToString("s"), Thread.CurrentThread.ManagedThreadId, message.MessageId);
            // Delete the message off the queue. 
            // Receipt handle is the identifier you must provide 
            // when deleting the message.
            var deleteRequest = new DeleteMessageRequest(_queueName, message.ReceiptHandle);
            Console.WriteLine("Processed message id: {0}", message.MessageId);

        catch (Exception ex)
            // Do nothing.
            // Swallow exception, message will return to the queue when 
            // visibility timeout has been exceeded.
            Console.WriteLine("Could not process message due to error. Exception: {0}", ex.Message);

Using SQS to adapt to changing business requirements

One of the benefits of introducing a message queue is that you can accommodate new business requirements without dramatically affecting your application.

If, for example, the business decided that all orders placed over $5000 are to be handled as a priority, you could introduce a new “priority order” queue. The way the orders are processed does not change. The only significant change to the processing application is to ensure that messages from the “priority order” queue are processed before the “standard order” queue.

The following diagram shows how this logic could be isolated in an “order dispatcher,” whose only purpose is to route order messages to the appropriate queue based on whether the order exceeds $5000. Nothing on the web application or the processing nodes changes other than the target queue to which the order is sent. The rates at which orders are processed can be achieved by modifying the poll rates and scalability settings that I have already discussed.

Extending the design pattern with Amazon SNS

Amazon SNS supports reliable publish-subscribe (pub-sub) scenarios and push notifications to known endpoints across a wide variety of protocols. It eliminates the need to periodically check or poll for new information and updates. SNS supports:

  • Reliable storage of messages for immediate or delayed processing
  • Publish / subscribe – direct, broadcast, targeted “push” messaging
  • Multiple subscriber protocols
  • Amazon SQS, HTTP, HTTPS, email, SMS, mobile push, AWS Lambda

With these capabilities, you can provide parallel asynchronous processing of orders in the system and extend it to support any number of different business use cases without affecting the production environment. This is commonly referred to as a “fanout” scenario.

Rather than your web application pushing orders to a queue for processing, send a notification via SNS. The SNS messages are sent to a topic and then replicated and pushed to multiple SQS queues and Lambda functions for processing.

As the diagram above shows, you have the development team consuming “live” data as they work on the next version of the processing application, or potentially using the messages to troubleshoot issues in production.

Marketing is consuming all order information, via a Lambda function that has subscribed to the SNS topic, inserting the records into an Amazon Redshift warehouse for analysis.

All of this, of course, is happening without affecting your order processing application.


While I haven’t dived deep into the specifics of each service, I have discussed how these services can be applied at an architectural level to build loosely coupled systems that facilitate multiple business use cases. I’ve also shown you how to use infrastructure and application-level scaling techniques, so you can get the most out of your EC2 instances.

One of the many benefits of using these managed services is how quickly and easily you can implement powerful messaging capabilities in your systems, and lower the capital and operational costs of managing your own messaging middleware.

Using Amazon SQS and Amazon SNS together can provide you with a powerful mechanism for decoupling application components. This should be part of design considerations as you architect for the cloud.

For more information, see the Amazon SQS Developer Guide and Amazon SNS Developer Guide. You’ll find tutorials on all the concepts covered in this post, and more. To can get started using the AWS console or SDK of your choice visit:

Happy messaging!

Latency Distribution Graph in AWS X-Ray

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/latency-distribution-graph-in-aws-x-ray/

We’re continuing to iterate on the AWS X-Ray service based on customer feedback and today we’re excited to release a set of tools to help you quickly dive deep on latencies in your applications. Visual Node and Edge latency distribution graphs are shown in a handy new “Service Details” side bar in your X-Ray Service Map.

The X-Ray service graph gives you a visual representation of services and their interactions over a period of time that you select. The nodes represent services and the edges between the nodes represent calls between the services. The nodes and edges each have a set of statistics associated with them. While the visualizations provided in the service map are useful for estimating the average latency in an application they don’t help you to dive deep on specific issues. Most of the time issues occur at statistical outliers. To alleviate this X-Ray computes histograms like the one above help you solve those 99th percentile bugs.

To see a Response Distribution for a Node just click on it in the service graph. You can also click on the edges between the nodes to see the Response Distribution from the viewpoint of the calling service.

The team had a few interesting problems to solve while building out this feature and I wanted to share a bit of that with you now! Given the large number of traces an app can produce it’s not a great idea (for your browser) to plot every single trace client side. Instead most plotting libraries, when dealing with many points, use approximations and bucketing to get a network and performance friendly histogram. If you’ve used monitoring software in the past you’ve probably seen as you zoom in on the data you get higher fidelity. The interesting thing about the latencies coming in from X-Ray is that they vary by several orders of magnitude.

If the latencies were distributed between strictly 0s and 1s you could easily just create 10 buckets of 100 milliseconds. If your apps are anything like mine there’s a lot of interesting stuff happening in the outliers, so it’s beneficial to have more fidelity at 1% and 99% than it is at 50%. The problem with fixed bucket sizes is that they’re not necessarily giving you an accurate summary of data. So X-Ray, for now, uses dynamic bucket sizing based on the t-digests algorithm by Ted Dunning and Otmar Ertl. One of the distinct advantages of this algorithm over other approximation algorithms is its accuracy and precision at extremes (where most errors typically are).

An additional advantage of X-Ray over other monitoring software is the ability to measure two perspectives of latency simultaneously. Developers almost always have some view into the server side latency from their application logs but with X-Ray you can examine latency from the view of each of the clients, services, and microservices that you’re interacting with. You can even dive deeper by adding additional restrictions and queries on your selection. You can identify the specific users and clients that are having issues at that 99th percentile.

This info has already been available in API calls to GetServiceGraph as ResponseTimeHistogram but now we’re exposing it in the console as well to make it easier for customers to consume. For more information check out the documentation here.


Using Amazon SQS Dead-Letter Queues to Control Message Failure

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/using-amazon-sqs-dead-letter-queues-to-control-message-failure/

Michael G. Khmelnitsky, Senior Programmer Writer


Sometimes, messages can’t be processed because of a variety of possible issues, such as erroneous conditions within the producer or consumer application. For example, if a user places an order within a certain number of minutes of creating an account, the producer might pass a message with an empty string instead of a customer identifier. Occasionally, producers and consumers might fail to interpret aspects of the protocol that they use to communicate, causing message corruption or loss. Also, the consumer’s hardware errors might corrupt message payload. For these reasons, messages that can’t be processed in a timely manner are delivered to a dead-letter queue.

The recent post Building Scalable Applications and Microservices: Adding Messaging to Your Toolbox gives an overview of messaging in the microservice architecture of modern applications. This post explains how and when you should use dead-letter queues to gain better control over message handling in your applications. It also offers some resources for configuring a dead-letter queue in Amazon Simple Queue Service (SQS).

What are the benefits of dead-letter queues?

The main task of a dead-letter queue is handling message failure. A dead-letter queue lets you set aside and isolate messages that can’t be processed correctly to determine why their processing didn’t succeed. Setting up a dead-letter queue allows you to do the following:

  • Configure an alarm for any messages delivered to a dead-letter queue.
  • Examine logs for exceptions that might have caused messages to be delivered to a dead-letter queue.
  • Analyze the contents of messages delivered to a dead-letter queue to diagnose software or the producer’s or consumer’s hardware issues.
  • Determine whether you have given your consumer sufficient time to process messages.

How do high-throughput, unordered queues handle message failure?

High-throughput, unordered queues (sometimes called standard or storage queues) keep processing messages until the expiration of the retention period. This helps ensure continuous processing of messages, which minimizes the chances of your queue being blocked by messages that can’t be processed. It also ensures fast recovery for your queue.

In a system that processes thousands of messages, having a large number of messages that the consumer repeatedly fails to acknowledge and delete might increase costs and place extra load on the hardware. Instead of trying to process failing messages until they expire, it is better to move them to a dead-letter queue after a few processing attempts.

Note: This queue type often allows a high number of in-flight messages. If the majority of your messages can’t be consumed and aren’t sent to a dead-letter queue, your rate of processing valid messages can slow down. Thus, to maintain the efficiency of your queue, you must ensure that your application handles message processing correctly.

How do FIFO queues handle message failure?

FIFO (first-in-first-out) queues (sometimes called service bus queues) help ensure exactly-once processing by consuming messages in sequence from a message group. Thus, although the consumer can continue to retrieve ordered messages from another message group, the first message group remains unavailable until the message blocking the queue is processed successfully.

Note: This queue type often allows a lower number of in-flight messages. Thus, to help ensure that your FIFO queue doesn’t get blocked by a message, you must ensure that your application handles message processing correctly.

When should I use a dead-letter queue?

  • Do use dead-letter queues with high-throughput, unordered queues. You should always take advantage of dead-letter queues when your applications don’t depend on ordering. Dead-letter queues can help you troubleshoot incorrect message transmission operations. Note: Even when you use dead-letter queues, you should continue to monitor your queues and retry sending messages that fail for transient reasons.
  • Do use dead-letter queues to decrease the number of messages and to reduce the possibility of exposing your system to poison-pill messages (messages that can be received but can’t be processed).
  • Don’t use a dead-letter queue with high-throughput, unordered queues when you want to be able to keep retrying the transmission of a message indefinitely. For example, don’t use a dead-letter queue if your program must wait for a dependent process to become active or available.
  • Don’t use a dead-letter queue with a FIFO queue if you don’t want to break the exact order of messages or operations. For example, don’t use a dead-letter queue with instructions in an Edit Decision List (EDL) for a video editing suite, where changing the order of edits changes the context of subsequent edits.

How do I get started with dead-letter queues in Amazon SQS?

Amazon SQS is a fully managed service that offers reliable, highly scalable hosted queues for exchanging messages between applications or microservices. Amazon SQS moves data between distributed application components and helps you decouple these components. It supports both standard queues and FIFO queues. To configure a queue as a dead-letter queue, you can use the AWS Management Console or the Amazon SQS SetQueueAttributes API action.

To get started with dead-letter queues in Amazon SQS, see the following topics in the Amazon SQS Developer Guide:

To start working with dead-letter queues programmatically, see the following resources:

Event: AWS Serverless Roadshow – Hands-on Workshops

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/event-aws-serverless-roadshow-hands-on-workshops/

Surely, some of you have contemplated how you would survive the possible Zombie apocalypse or how you would build your exciting new startup to disrupt the transportation industry when Unicorn haven is uncovered. Well, there is no need to worry; I know just the thing to get you prepared to handle both of those scenarios: the AWS Serverless Computing Workshop Roadshow.

With the roadshow’s serverless workshops, you can get hands-on experience building serverless applications and microservices so you can rebuild what remains of our great civilization after a widespread viral infection causes human corpses to reanimate around the world in the AWS Zombie Microservices Workshop. In addition, you can give your startup a jump on the competition with the Wild Rydes workshop in order to revolutionize the transportation industry; just in time for a pilot’s crash landing leading the way to the discovery of abundant Unicorn pastures found on the outskirts of the female Amazonian warrior inhabited island of Themyscira also known as Paradise Island.

These free, guided hands-on workshops will introduce the basics of building serverless applications and microservices for common and uncommon scenarios using services like AWS Lambda, Amazon API Gateway, Amazon DynamoDB, Amazon S3, Amazon Kinesis, AWS Step Functions, and more. Let me share some advice before you decide to tackle Zombies and mount Unicorns – don’t forget to bring your laptop to the workshop and make sure you have an AWS account established and available for use for the event.

Check out the schedule below and get prepared today by registering for an upcoming workshop in a city near you. Remember these are workshops are completely free, so participation is on a first come, first served basis. So register and get there early, we need Zombie hunters and Unicorn riders across the globe.  Learn more about AWS Serverless Computing Workshops here and register for your city using links below.

Event Location Date
Wild Rydes New York Thursday, June 8
Wild Rydes Austin Thursday, June 22
Wild Rydes Santa Monica Thursday, July 20
Zombie Apocalypse Chicago Thursday, July 20
Wild Rydes Atlanta Tuesday, September 12
Zombie Apocalypse Dallas Tuesday, September 19


I look forward to fighting zombies and riding unicorns with you all.


Amazon EC2 Container Service – Launch Recap, Customer Stories, and Code

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-ec2-container-service-launch-recap-customer-stories-and-code/

Today seems like a good time to recap some of the features that we have added to Amazon EC2 Container Service over the last year or so, and to share some customer success stories and code with you! The service makes it easy for you to run any number of Docker containers across a managed cluster of EC2 instances, with full console, API, CloudFormation, CLI, and PowerShell support. You can store your Linux and Windows Docker images in the EC2 Container Registry for easy access.

Launch Recap
Let’s start by taking a look at some of the newest ECS features and some helpful how-to blog posts that will show you how to use them:

Application Load Balancing – We added support for the application load balancer last year. This high-performance load balancing option runs at the application level and allows you to define content-based routing rules. It provides support for dynamic ports and can be shared across multiple services, making it easier for you to run microservices in containers. To learn more, read about Service Load Balancing.

IAM Roles for Tasks – You can secure your infrastructure by assigning IAM roles to ECS tasks. This allows you to grant permissions on a fine-grained, per-task basis, customizing the permissions to the needs of each task. Read IAM Roles for Tasks to learn more.

Service Auto Scaling – You can define scaling policies that scale your services (tasks) up and down in response to changes in demand. You set the desired minimum and maximum number of tasks, create one or more scaling policies, and Service Auto Scaling will take care of the rest. The documentation for Service Auto Scaling will help you to make use of this feature.

Blox – Scheduling, in a container-based environment, is the process of assigning tasks to instances. ECS gives you three options: automated (via the built-in Service Scheduler), manual (via the RunTask function), and custom (via a scheduler that you provide). Blox is an open source scheduler that supports a one-task-per-host model, with room to accommodate other models in the future. It monitors the state of the cluster and is well-suited to running monitoring agents, log collectors, and other daemon-style tasks.

Windows – We launched ECS with support for Linux containers and followed up with support for running Windows Server 2016 Base with Containers.

Container Instance Draining – From time to time you may need to remove an instance from a running cluster in order to scale the cluster down or to perform a system update. Earlier this year we added a set of lifecycle hooks that allow you to better manage the state of the instances. Read the blog post How to Automate Container Instance Draining in Amazon ECS to see how to use the lifecycle hooks and a Lambda function to automate the process of draining existing work from an instance while preventing new work from being scheduled for it.

CI/CD Pipeline with Code* – Containers simplify software deployment and are an ideal target for a CI/CD (Continuous Integration / Continuous Deployment) pipeline. The post Continuous Deployment to Amazon ECS using AWS CodePipeline, AWS CodeBuild, Amazon ECR, and AWS CloudFormation shows you how to build and operate a CI/CD pipeline using multiple AWS services.

CloudWatch Logs Integration – This launch gave you the ability to configure the containers that run your tasks to send log information to CloudWatch Logs for centralized storage and analysis. You simply install the Amazon ECS Container Agent and enable the awslogs log driver.

CloudWatch Events – ECS generates CloudWatch Events when the state of a task or a container instance changes. These events allow you to monitor the state of the cluster using a Lambda function. To learn how to capture the events and store them in an Elasticsearch cluster, read Monitor Cluster State with Amazon ECS Event Stream.

Task Placement Policies – This launch provided you with fine-grained control over the placement of tasks on container instances within clusters. It allows you to construct policies that include cluster constraints, custom constraints (location, instance type, AMI, and attribute), placement strategies (spread or bin pack) and to use them without writing any code. Read Introducing Amazon ECS Task Placement Policies to see how to do this!

EC2 Container Service in Action
Many of our customers from large enterprises to hot startups and across all industries, such as financial services, hospitality, and consumer electronics, are using Amazon ECS to run their microservices applications in production. Companies such as Capital One, Expedia, Okta, Riot Games, and Viacom rely on Amazon ECS.

Mapbox is a platform for designing and publishing custom maps. The company uses ECS to power their entire batch processing architecture to collect and process over 100 million miles of sensor data per day that they use for powering their maps. They also optimize their batch processing architecture on ECS using Spot Instances. The Mapbox platform powers over 5,000 apps and reaches more than 200 million users each month. Its backend runs on ECS allowing it to serve more than 1.3 billion requests per day. To learn more about their recent migration to ECS, read their recent blog post, We Switched to Amazon ECS, and You Won’t Believe What Happened Next.

Travel company Expedia designed their backends with a microservices architecture. With the popularization of Docker, they decided they would like to adopt Docker for its faster deployments and environment portability. They chose to use ECS to orchestrate all their containers because it had great integration with the AWS platform, everything from ALB to IAM roles to VPC integration. This made ECS very easy to use with their existing AWS infrastructure. ECS really reduced the heavy lifting of deploying and running containerized applications. Expedia runs 75% of all apps on AWS in ECS allowing it to process 4 billion requests per hour. Read Kuldeep Chowhan‘s blog post, How Expedia Runs Hundreds of Applications in Production Using Amazon ECS to learn more.

Realtor.com provides home buyers and sellers with a comprehensive database of properties that are currently for sale. Their move to AWS and ECS has helped them to support business growth that now numbers 50 million unique monthly users who drive up to 250,000 requests per second at peak times. ECS has helped them to deploy their code more quickly while increasing utilization of their cloud infrastructure. Read the Realtor.com Case Study to learn more about how they use ECS, Kinesis, and other AWS services.

Instacart talks about how they use ECS to power their same-day grocery delivery service:

Capital One talks about how they use ECS to automate their operations and their infrastructure management:

Clever developers are using ECS as a base for their own work. For example:

Rack is an open source PaaS (Platform as a Service). It focuses on infrastructure automation, runs in an isolated VPC, and uses a single-tenant build service for security.

Empire is also an open source PaaS. It provides a Heroku-like workflow and is targeted at small and medium sized startups, with an emphasis on microservices.

Cloud Container Cluster Visualizer (c3vis) helps to visualize resource utilization within ECS clusters:

Stay Tuned
We have plenty of new features in the works for ECS, so stay tuned!



Amazon Simple Queue Service Introduces Server-Side Encryption for Queues

Post Syndicated from Craig Liebendorfer original https://aws.amazon.com/blogs/security/amazon-simple-queue-service-introduces-server-side-encryption-for-queues/

SQS + SSE image

You can now use Amazon Simple Queue Service (SQS) to exchange sensitive data between applications using server-side encryption (SSE). SQS is a fully managed message queuing service for reliably communicating between distributed software components and microservices at any scale. You can use SQS to take advantage of the scale, cost, and operational benefits of a managed messaging service. The addition of SSE allows you to transmit sensitive data with the increased security of using encrypted queues.

SQS SSE uses the 256-bit Advanced Encryption Standard (AES-256 GCM algorithm) to encrypt each message body by using a unique key. AWS Key Management Service (KMS) provides encryption key management. In addition, KMS works with AWS CloudTrail to provide logs of all encryption key usage to help meet your regulatory and compliance needs.

SQS SSE is now available in the US West (Oregon) and US East (Ohio) Regions, with more AWS Regions to follow. There is no additional charge for using encrypted queues, but there is a charge to use KMS. For more information about KMS and CloudTrail pricing, see AWS Key Management Service Pricing.

To learn more, see the full post on the AWS Blog.

– Craig

Roundup of AWS HIPAA Eligible Service Announcements

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/roundup-of-aws-hipaa-eligible-service-announcements/

At AWS we have had a number of HIPAA eligible service announcements. Patrick Combes, the Healthcare and Life Sciences Global Technical Leader at AWS, and Aaron Friedman, a Healthcare and Life Sciences Partner Solutions Architect at AWS, have written this post to tell you all about it.


We are pleased to announce that the following AWS services have been added to the BAA in recent weeks: Amazon API Gateway, AWS Direct Connect, AWS Database Migration Service, and Amazon SQS. All four of these services facilitate moving data into and through AWS, and we are excited to see how customers will be using these services to advance their solutions in healthcare. While we know the use cases for each of these services are vast, we wanted to highlight some ways that customers might use these services with Protected Health Information (PHI).

As with all HIPAA-eligible services covered under the AWS Business Associate Addendum (BAA), PHI must be encrypted while at-rest or in-transit. We encourage you to reference our HIPAA whitepaper, which details how you might configure each of AWS’ HIPAA-eligible services to store, process, and transmit PHI. And of course, for any portion of your application that does not touch PHI, you can use any of our 90+ services to deliver the best possible experience to your users. You can find some ideas on architecting for HIPAA on our website.

Amazon API Gateway
Amazon API Gateway is a web service that makes it easy for developers to create, publish, monitor, and secure APIs at any scale. With PHI now able to securely transit API Gateway, applications such as patient/provider directories, patient dashboards, medical device reports/telemetry, HL7 message processing and more can securely accept and deliver information to any number and type of applications running within AWS or client presentation layers.

One particular area we are excited to see how our customers leverage Amazon API Gateway is with the exchange of healthcare information. The Fast Healthcare Interoperability Resources (FHIR) specification will likely become the next-generation standard for how health information is shared between entities. With strong support for RESTful architectures, FHIR can be easily codified within an API on Amazon API Gateway. For more information on FHIR, our AWS Healthcare Competency partner, Datica, has an excellent primer.

AWS Direct Connect
Some of our healthcare and life sciences customers, such as Johnson & Johnson, leverage hybrid architectures and need to connect their on-premises infrastructure to the AWS Cloud. Using AWS Direct Connect, you can establish private connectivity between AWS and your datacenter, office, or colocation environment, which in many cases can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.

In addition to a hybrid-architecture strategy, AWS Direct Connect can assist with the secure migration of data to AWS, which is the first step to using the wide array of our HIPAA-eligible services to store and process PHI, such as Amazon S3 and Amazon EMR. Additionally, you can connect to third-party/externally-hosted applications or partner-provided solutions as well as securely and reliably connect end users to those same healthcare applications, such as a cloud-based Electronic Medical Record system.

AWS Database Migration Service (DMS)
To date, customers have migrated over 20,000 databases to AWS through the AWS Database Migration Service. Customers often use DMS as part of their cloud migration strategy, and now it can be used to securely and easily migrate your core databases containing PHI to the AWS Cloud. As your source database remains fully operational during the migration with DMS, you minimize downtime for these business-critical applications as you migrate your databases to AWS. This service can now be utilized to securely transfer such items as patient directories, payment/transaction record databases, revenue management databases and more into AWS.

Amazon Simple Queue Service (SQS)
Amazon Simple Queue Service (SQS) is a message queueing service for reliably communicating among distributed software components and microservices at any scale. One way that we envision customers using SQS with PHI is to buffer requests between application components that pass HL7 or FHIR messages to other parts of their application. You can leverage features like SQS FIFO to ensure your messages containing PHI are passed in the order they are received and delivered in the order they are received, and available until a consumer processes and deletes it. This is important for applications with patient record updates or processing payment information in a hospital.

Let’s get building!
We are beyond excited to see how our customers will use our newly HIPAA-eligible services as part of their healthcare applications. What are you most excited for? Leave a comment below.