Tag Archives: load balancing

Unimog – Cloudflare’s edge load balancer

Post Syndicated from David Wragg original https://blog.cloudflare.com/unimog-cloudflares-edge-load-balancer/

As the scale of Cloudflare’s edge network has grown, we sometimes reach the limits of parts of our architecture. About two years ago we realized that our existing solution for spreading load within our data centers could no longer meet our needs. We embarked on a project to deploy a Layer 4 Load Balancer, internally called Unimog, to improve the reliability and operational efficiency of our edge network. Unimog has now been deployed in production for over a year.

This post explains the problems Unimog solves and how it works. Unimog builds on techniques used in other Layer 4 Load Balancers, but there are many details of its implementation that are tailored to the needs of our edge network.

The role of Unimog in our edge network

Cloudflare operates an anycast network, meaning that our data centers in 200+ cities around the world serve the same IP addresses. For example, our own cloudflare.com website uses Cloudflare services, and one of its IP addresses is 104.17.175.85. All of our data centers will accept connections to that address and respond to HTTP requests. By the magic of Internet routing, when you visit cloudflare.com and your browser connects to 104.17.175.85, your connection will usually go to the closest (and therefore fastest) data center.

Inside those data centers are many servers. The number of servers in each varies greatly (the biggest data centers have a hundred times more servers than the smallest ones). The servers run the application services that implement our products (our caching, DNS, WAF, DDoS mitigation, Spectrum, WARP, etc). Within a single data center, any of the servers can handle a connection for any of our services on any of our anycast IP addresses. This uniformity keeps things simple and avoids bottlenecks.

But if any server within a data center can handle any connection, when a connection arrives from a browser or some other client, what controls which server it goes to? That’s the job of Unimog.

There are two main reasons why we need this control. The first is that we regularly move servers in and out of operation, and servers should only receive connections when they are in operation. For example, we sometimes remove a server from operation in order to perform maintenance on it. And sometimes servers are automatically removed from operation because health checks indicate that they are not functioning correctly.

The second reason concerns the management of the load on the servers (by load we mean the amount of computing work each one needs to do). If the load on a server exceeds the capacity of its hardware resources, then the quality of service to users will suffer. The performance experienced by users degrades as a server approaches saturation, and if a server becomes sufficiently overloaded, users may see errors. We also want to prevent servers being underloaded, which would reduce the value we get from our investment in hardware. So Unimog ensures that the load is spread across the servers in a data center. This general idea is called load balancing (balancing because the work has to be done somewhere, and so for the load on one server to go down, the load on some other server must go up).

Note that in this post, we’ll discuss how Cloudflare balances the load on its own servers in edge data centers. But load balancing is a requirement that occurs in many places in distributed computing systems. Cloudflare also has a Layer 7 Load Balancing product to allow our customers to balance load across their servers. And Cloudflare uses load balancing in other places internally.

Deploying Unimog led to a big improvement in our ability to balance the load on our servers in our edge data centers. Here’s a chart for one data center, showing the difference due to Unimog. Each line shows the processor utilization of an individual server (the colour of the lines indicates server model). The load on the servers varies during the day with the activity of users close to this data center. The white line marks the point when we enabled Unimog. You can see that after that point, the load on the servers became much more uniform. We saw similar results when we deployed Unimog to our other data centers.

[Chart: processor utilization per server in one data center, before and after enabling Unimog]

How Unimog compares to other load balancers

There are a variety of techniques for load balancing. Unimog belongs to a category called Layer 4 Load Balancers (L4LBs). L4LBs direct packets on the network by inspecting information up to layer 4 of the OSI network model, which distinguishes them from the more common Layer 7 Load Balancers.

The advantage of L4LBs is their efficiency. They direct packets without processing the payload of those packets, so they avoid the overheads associated with higher level protocols. For any load balancer, it’s important that the resources consumed by the load balancer are low compared to the resources devoted to useful work. At Cloudflare, we already pay close attention to the efficient implementation of our services, and that sets a high bar for the load balancer that we put in front of those services.

The downside of L4LBs is that they can only control which connections go to which servers. They cannot modify the data going over the connection, which prevents them from participating in higher-level protocols like TLS, HTTP, etc. (in contrast, Layer 7 Load Balancers act as proxies, so they can modify data on the connection and participate in those higher-level protocols).

L4LBs are not new. They are mostly used at companies that have scaling needs that would be hard to meet with L7LBs alone. Google has published about Maglev, Facebook has open-sourced Katran, and GitHub has open-sourced its GLB.

Unimog is the L4LB that Cloudflare has built to meet the needs of our edge network. It shares features with other L4LBs, and it is particularly strongly influenced by GLB. But there are some requirements that were not well-served by existing L4LBs, leading us to build our own:

  • Unimog is designed to run on the same general-purpose servers that provide application services, rather than requiring a separate tier of servers dedicated to load balancing.
  • It performs dynamic load balancing: measurements of server load are used to adjust the number of connections going to each server, in order to accurately balance load.
  • It supports long-lived connections that remain established for days.
  • Virtual IP addresses are managed as ranges (Cloudflare serves hundreds of thousands of IPv4 addresses on behalf of our customers, so it is impractical to configure these individually).
  • Unimog is tightly integrated with our existing DDoS mitigation system, and the implementation relies on the same XDP technology in the Linux kernel.

The rest of this post describes these features and the design and implementation choices that follow from them in more detail.

For Unimog to balance load, it’s not enough to send the same (or approximately the same) number of connections to each server, because the performance of our servers varies. We regularly update our server hardware, and we’re now on our 10th generation. Once we deploy a server, we keep it in service for as long as it is cost effective, and the lifetime of a server can be several years. It’s not unusual for a single data center to contain a mix of server models, due to expansion and upgrades over time. Processor performance has increased significantly across our server generations. So within a single data center, we need to send different numbers of connections to different servers to utilize the same percentage of their capacity.

It’s also not enough to give each server a fixed share of connections based on static estimates of their capacity. Not all connections consume the same amount of CPU. And there are other activities running on our servers and consuming CPU that are not directly driven by connections from clients. So in order to accurately balance load across servers, Unimog does dynamic load balancing: it takes regular measurements of the load on each of our servers, and uses a control loop that increases or decreases the number of connections going to each server so that their loads converge to an appropriate value.

Refresher: TCP connections

The relationship between TCP packets and connections is central to the operation of Unimog, so we’ll briefly describe that relationship.

(Unimog supports UDP as well as TCP, but for clarity most of this post will focus on the TCP support. We explain how UDP support differs towards the end.)

Here is the outline of a TCP packet:

[Figure: outline of a TCP packet, with the four 4-tuple header fields labelled]

The TCP connection that this packet belongs to is identified by the four labelled header fields, which span the IPv4/IPv6 (i.e. layer 3) and TCP (i.e. layer 4) headers: the source and destination addresses, and the source and destination ports. Collectively, these four fields are known as the 4-tuple. When we say that Unimog sends a connection to a server, we mean that all the packets with the 4-tuple identifying that connection are sent to that server.

A TCP connection is established via a three-way handshake between the client and the server handling that connection. Once a connection has been established, it is crucial that all the incoming packets for that connection go to that same server. If a TCP packet belonging to the connection is sent to a different server, that server signals to the client that it doesn’t know about the connection by sending a TCP RST (reset) packet. Upon receiving this notification, the client terminates the connection, probably resulting in the user seeing an error. So a misdirected packet is much worse than a dropped packet. As usual, we consider the network to be unreliable, and it’s fine for occasional packets to be dropped. But even a single misdirected packet can lead to a broken connection.

Cloudflare handles a wide variety of connections on behalf of our customers. Many of these connections carry HTTP, and are typically short lived. But some HTTP connections are used for websockets, and can remain established for hours or days. Our Spectrum product supports arbitrary TCP connections. TCP connections can be terminated or stall for many reasons, and ideally all applications that use long-lived connections would be able to reconnect transparently, and applications would be designed to support such reconnections. But not all applications and protocols meet this ideal, so we strive to maintain long-lived connections. Unimog can maintain connections that last for many days.

Forwarding packets

The previous section explained that the function of Unimog is to steer connections to servers. We’ll now explain how this is implemented.

To start with, let’s consider how one of our data centers might look without Unimog or any other load balancer. Here’s a conceptual view:

[Figure: conceptual view of a data center: the router forwards packets from the Internet directly to servers]

Packets arrive from the Internet, and pass through the router, which forwards them on to servers (in reality there is usually additional network infrastructure between the router and the servers, but it doesn’t play a significant role here so we’ll ignore it).

But is such a simple arrangement possible? Can the router spread traffic over servers without some kind of load balancer in between? Routers have a feature called ECMP (equal cost multipath) routing. Its original purpose is to allow traffic to be spread across multiple paths between two locations, but it is commonly repurposed to spread traffic across multiple servers within a data center. In fact, Cloudflare relied on ECMP alone to spread load across servers before we deployed Unimog. ECMP uses a hashing scheme to ensure that packets on a given connection use the same path (Unimog also employs a hashing scheme; we’ll discuss how this works in more detail below). But ECMP is vulnerable to changes in the set of active servers, such as when servers go in and out of service. These changes cause rehashing events, which break connections to all the servers in an ECMP group. Also, routers impose limits on the sizes of ECMP groups, which means that a single ECMP group cannot cover all the servers in our larger edge data centers. Finally, ECMP does not allow us to do dynamic load balancing by adjusting the share of connections going to each server. These drawbacks mean that ECMP alone is not an effective approach.

Ideally, to overcome the drawbacks of ECMP, we could program the router with the appropriate logic to direct connections to servers in the way we want. But although programmable network data planes have been a hot research topic in recent years, commodity routers are still essentially fixed-function devices.

We can work around the limitations of routers by having the router send the packets to some load balancing servers, and then programming those load balancers to forward packets as we want. If the load balancers all act on packets in a consistent way, then it doesn’t matter which load balancer gets which packets from the router (so we can use ECMP to spread packets across the load balancers). That suggests an arrangement like this:

[Figure: the router spreading packets across a dedicated tier of load balancing servers]

And indeed L4LBs are often deployed like this.

Instead, Unimog makes every server into a load balancer. The router can send any packet to any server, and that initial server will forward the packet to the right server for that connection:

[Figure: every server acts as a load balancer; the initial server forwards each packet to the right server for its connection]

We have two reasons to favour this arrangement:

First, in our edge network, we avoid specialised roles for servers. We run the same software stack on the servers in our edge network, providing all of our product features, whether DDoS attack prevention, website performance features, Cloudflare Workers, WARP, etc. This uniformity is key to the efficient operation of our edge network: we don’t have to manage how many load balancers we have within each of our data centers, because all of our servers act as load balancers.

The second reason relates to stopping attacks. Cloudflare’s edge network is the target of incessant attacks. Some of these attacks are volumetric – large packet floods which attempt to overwhelm the ability of our data centers to process network traffic from the Internet, and so impact our ability to service legitimate traffic. To successfully mitigate such attacks, it’s important to filter out attack packets as early as possible, minimising the resources they consume. This means that our attack mitigation system needs to occur before the forwarding done by Unimog. That mitigation system is called l4drop, and we’ve written about it before. l4drop and Unimog are closely integrated. Because l4drop runs on all of our servers, and because l4drop comes before Unimog, it’s natural for Unimog to run on all of our servers too.

XDP and xdpd

Unimog implements packet forwarding using a Linux kernel facility called XDP. XDP allows a program to be attached to a network interface, and the program gets run for every packet that arrives, before it is processed by the kernel’s main network stack. The XDP program returns an action code to tell the kernel what to do with the packet:

  • PASS: Pass the packet on to the kernel’s network stack for normal processing.
  • DROP: Drop the packet. This is the basis for l4drop.
  • TX: Transmit the packet back out of the network interface. The XDP program can modify the packet data before transmission. This action is the basis for Unimog forwarding (a minimal program sketch follows this list).
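
To make this concrete, here is a minimal, self-contained XDP program in C that returns each of the three action codes. It is only a sketch: it drops UDP packets to a hypothetical attacked port and passes everything else, whereas a real forwarder like Unimog would rewrite the packet and return XDP_TX. The program name and port number are illustrative, not anything Cloudflare actually uses.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_actions_demo(struct xdp_md *ctx)
{
  void *data = (void *)(long)ctx->data;
  void *data_end = (void *)(long)ctx->data_end;

  struct ethhdr *eth = data;
  if ((void *)(eth + 1) > data_end)
    return XDP_PASS;                      // truncated: let the kernel handle it
  if (eth->h_proto != bpf_htons(ETH_P_IP))
    return XDP_PASS;                      // non-IPv4: normal processing

  struct iphdr *ip = (void *)(eth + 1);
  if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_UDP)
    return XDP_PASS;

  struct udphdr *udp = (void *)(ip + 1);  // assumes no IP options, for brevity
  if ((void *)(udp + 1) > data_end)
    return XDP_PASS;

  if (udp->dest == bpf_htons(1234))       // hypothetical attack signature
    return XDP_DROP;                      // the basis for l4drop

  // A forwarder would encapsulate the packet here and return XDP_TX
  // to retransmit it out of the same interface.
  return XDP_PASS;
}

char _license[] SEC("license") = "GPL";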

XDP programs run within the kernel, making this an efficient approach even at high packet rates. XDP programs are expressed as eBPF bytecode, and run within an in-kernel virtual machine. Upon loading an XDP program, the kernel compiles its eBPF code into machine code. The kernel also verifies the program to check that it does not compromise security or stability. eBPF is not only used in the context of XDP: many recent Linux kernel innovations employ eBPF, as it provides a convenient and efficient way to extend the behaviour of the kernel.

XDP is much more convenient than alternative approaches to packet-level processing, particularly in our context where the servers involved also have many other tasks. We have continued to enhance Unimog since its initial deployment. Our deployment model for new versions of our Unimog XDP code is essentially the same as for userspace services, and we are able to deploy new versions on a weekly basis if needed. Also, established techniques for optimizing the performance of the Linux network stack provide good performance for XDP.

There are two main alternatives for efficient packet-level processing:

  • Kernel-bypass networking (such as DPDK), where a program in userspace manages a network interface (or some part of one) directly without the involvement of the kernel. This approach works best when servers can be dedicated to a network function (due to the need to dedicate processor or network interface hardware resources, and awkward integration with the normal kernel network stack; see our old post about this). But we avoid putting servers in specialised roles. (GitHub’s open-source GLB uses DPDK, and this is one of the main factors that made GLB unsuitable for us.)
  • Kernel modules, where code is added to the kernel to perform the necessary network functions. The Linux IPVS (IP Virtual Server) subsystem falls into this category. But developing, testing, and deploying kernel modules is cumbersome compared to XDP.

The following diagram shows an overview of our use of XDP. Both l4drop and Unimog are implemented by an XDP program. l4drop matches attack packets, and uses the DROP action to discard them. Unimog forwards packets, using the TX action to resend them. Packets that are not dropped or forwarded pass through to the normal Linux network stack. To support our elaborate use of XDP, we have developed the xdpd daemon, which performs the necessary supervisory and support functions for our XDP programs.

[Figure: XDP overview: l4drop drops attack packets, Unimog retransmits forwarded packets, and remaining packets pass to the Linux network stack]

Rather than a single XDP program, we have a chain of XDP programs that must be run for each packet (l4drop, Unimog, and others we have not covered here). One of the responsibilities of xdpd is to prepare these programs, and to make the appropriate system calls to load them and assemble the full chain.

Our XDP programs come from two sources. Some are developed in a conventional way: engineers write C code, our build system compiles it (with clang) to eBPF ELF files, and our release system deploys those files to our servers. Our Unimog XDP code works like this. In contrast, the l4drop XDP code is dynamically generated by xdpd based on information it receives from attack detection systems.

xdpd has many other duties to support our use of XDP:

  • XDP programs can be supplied with data using data structures called maps. xdpd populates the maps needed by our programs, based on information received from control planes (a map declaration sketch follows this list).
  • Programs (for instance, our Unimog XDP program) may depend upon configuration values which are fixed while the program runs, but are not known when their C code is compiled. It would be possible to supply these values to the program via maps, but that would be inefficient (retrieving a value from a map requires a call to a helper function). So instead, xdpd fixes up the eBPF program to insert these constants before it is loaded.
  • Cloudflare carefully monitors the behaviour of all our software systems, and this includes our XDP programs: they emit metrics (via another use of maps), which xdpd exposes to our metrics and alerting system (Prometheus).
  • When we deploy a new version of xdpd, it gracefully upgrades in such a way that there is no interruption to the operation of Unimog or l4drop.
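
As an illustration of the first point, here is a sketch of how an XDP program might declare a forwarding-table map for a daemon like xdpd to populate from userspace. The names, sizes, and layout are hypothetical, though the two-slot buckets anticipate the daisy-chaining scheme described later in this post:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct bucket {
  __u32 first_hop_dip;   // DIP that receives new connections
  __u32 second_hop_dip;  // previous DIP, kept for daisy-chaining
};

// The userspace daemon creates and loads this map, then writes bucket
// entries into it; the XDP program only ever reads it.
struct {
  __uint(type, BPF_MAP_TYPE_ARRAY);
  __uint(max_entries, 65536);          // a power-of-2 number of buckets
  __type(key, __u32);
  __type(value, struct bucket);
} forwarding_table SEC(".maps");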

Although the XDP programs are written in C, xdpd itself is written in Go. Much of its code is specific to Cloudflare. But in the course of developing xdpd, we have collaborated with Cilium to develop https://github.com/cilium/ebpf, an open source Go library that provides the operations needed by xdpd for manipulating and loading eBPF programs and related objects. We’re also collaborating with the Linux eBPF community to share our experience, and extend the core eBPF technology in ways that make features of xdpd obsolete.

In evaluating the performance of Unimog, our main concern is efficiency: that is, the resources consumed for load balancing relative to the resources used for customer-visible services. Our measurements show that Unimog adds less than 1% to processor utilization, compared to a scenario where no load balancing is in use. Other L4LBs, intended to be used with servers dedicated to load balancing, may place more emphasis on maximum packet throughput. Nonetheless, our experience with Unimog and XDP in general indicates that the throughput is more than adequate for our needs, even during large volumetric attacks.

Unimog is not the first L4LB to use XDP. In 2018, Facebook open sourced Katran, their XDP-based L4LB data plane. We considered the possibility of reusing code from Katran. But it would not have been worthwhile: the core C code needed to implement an XDP-based L4LB is relatively modest (about 1000 lines of C, both for Unimog and Katran). Furthermore, we had requirements that were not met by Katran, and we also needed to integrate with existing components and systems at Cloudflare (particularly l4drop). So very little of the code could have been reused as-is.

Encapsulation

As discussed at the start of this post, clients make connections to one of our edge data centers with a destination IP address that can be served by any one of our servers. These addresses, which do not correspond to a specific server, are known as virtual IPs (VIPs). When our Unimog XDP program forwards a packet destined to a VIP, it must replace that VIP address with the direct IP (DIP) of the appropriate server for the connection, so that when the packet is retransmitted it will reach that server. But it is not sufficient to overwrite the VIP in the packet headers with the DIP, as that would hide the original destination address from the server handling the connection (the original destination address is often needed to correctly handle the connection).

Instead, the packet must be encapsulated: Another set of packet headers is prepended to the packet, so that the original packet becomes the payload in this new packet. The DIP is then used as the destination address in the outer headers, but the addressing information in the headers of the original packet is preserved. The encapsulated packet is then retransmitted. Once it reaches the target server, it must be decapsulated: the outer headers are stripped off to yield the original packet as if it had arrived directly.

Encapsulation is a general concept in computer networking, and is used in a variety of contexts. The headers to be added to the packet by encapsulation are defined by an encapsulation format. Many different encapsulation formats have been defined within the industry, tailored to the requirements in specific contexts. Unimog uses a format called GUE (Generic UDP Encapsulation), in order to allow us to re-use the glb-redirect component from GitHub’s GLB (glb-redirect is discussed below).

GUE is a relatively simple encapsulation format. It encapsulates within a UDP packet, placing a GUE-specific header between the outer IP/UDP headers and the payload packet to allow extension data to be carried (and we’ll see how Unimog takes advantage of this):

[Figure: GUE encapsulation: outer IP and UDP headers, a GUE header with extension data, and the original packet as payload]

An issue that can arise with encapsulation is hitting limits on the maximum packet size, because the encapsulation process makes packets larger. The de facto maximum packet size on the Internet is 1500 bytes, and not coincidentally this is also the maximum packet size on Ethernet networks. For Unimog, encapsulating a 1500-byte packet results in a 1536-byte packet: the extra 36 bytes consist of the outer IP and UDP headers plus the GUE header and its extension data. To allow for these enlarged encapsulated packets, we have enabled jumbo frames on the networks inside our data centers, so that the 1500-byte limit only applies to packets headed out to the Internet.

Forwarding logic

So far, we have described the technology used to implement the Unimog load balancer, but not how our Unimog XDP program selects the DIP address when forwarding a packet. This section describes the basic scheme. But as we’ll see, there is a problem, so then we’ll describe how this scheme is elaborated to solve that problem.

In outline, our Unimog XDP program processes each packet in the following way:

  1. Determine whether the packet is destined for a VIP address. Not all of the packets arriving at a server are for VIP addresses. Other packets are passed through for normal handling by the kernel’s network stack. (xdpd obtains the VIP address ranges from the Unimog control plane.)
  2. Determine the DIP for the server handling the packet’s connection.
  3. Encapsulate the packet, and retransmit it to the DIP.

In step 2, note that all the load balancers must act consistently – when forwarding packets, they must all agree about which connections go to which servers. The rate of new connections arriving at a data center is large, so it’s not practical for load balancers to agree by communicating information about connections amongst themselves. Instead, L4LBs adopt designs which allow the load balancers to reach consistent forwarding decisions independently. To do this, they rely on hashing schemes: take the 4-tuple identifying the packet’s connection, put it through a hash function to obtain a key (the hash function ensures that these key values are uniformly distributed), then perform some kind of lookup into a data structure to turn the key into the DIP for the target server.

Unimog uses such a scheme, with a data structure that is simple compared to some other L4LBs. We call this data structure the forwarding table, and it consists of an array where each entry contains a DIP specifying the target server for the relevant packets (we call these entries buckets). The forwarding table is generated by the Unimog control plane and broadcast to the load balancers (more on this below), so that it has the same contents on all load balancers.

To look up a packet’s key in the forwarding table, the low N bits from the key are used as the index for a bucket (the forwarding table is always a power-of-2 in size):

[Figure: the low N bits of the hashed key index into the forwarding table to select a bucket]

Note that this approach does not provide per-connection control – each bucket typically applies to many connections. All load balancers in a data center use the same forwarding table, so they all forward packets in a consistent manner. This means it doesn’t matter which packets are sent by the router to which servers, and so ECMP re-hashes are a non-issue. And because the forwarding table is immutable and simple in structure, lookups are fast.
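
Sketched in C, and building on the hypothetical forwarding_table map shown earlier, the lookup is only a few lines. The hash function shown is a stand-in: any uniformly distributing hash works, and this post doesn’t specify Unimog’s actual choice.

struct tuple4 {
  __u32 saddr, daddr;    // source and destination IP addresses
  __u16 sport, dport;    // source and destination ports
};

// FNV-1a over the 4-tuple bytes, as a stand-in for the real hash
static __always_inline __u32 hash_tuple(const struct tuple4 *t)
{
  const __u8 *p = (const __u8 *)t;
  __u32 h = 2166136261u;
  for (int i = 0; i < (int)sizeof(*t); i++)
    h = (h ^ p[i]) * 16777619u;
  return h;
}

// table_size is a power of 2, so the low N bits of the key select a bucket
static __always_inline struct bucket *lookup_bucket(const struct tuple4 *t,
                                                    __u32 table_size)
{
  __u32 idx = hash_tuple(t) & (table_size - 1);
  return bpf_map_lookup_elem(&forwarding_table, &idx);
  // the caller then encapsulates the packet to bucket->first_hop_dip
}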

Although the above description only discusses a single forwarding table, Unimog supports multiple forwarding tables, each one associated with a trafficset – the traffic destined for a particular service. Ranges of VIP addresses are associated with a trafficset. Each trafficset has its own configuration settings and forwarding tables. This gives us the flexibility to differentiate how Unimog behaves for different services.

Precise load balancing requires the ability to make fine adjustments to the number of connections arriving at each server. So we make the number of buckets in the forwarding table more than 100 times the number of servers. Our data centers can contain hundreds of servers, and so it is normal for a Unimog forwarding table to have tens of thousands of buckets. The DIP for a given server is repeated across many buckets in the forwarding table, and by increasing or decreasing the number of buckets that refer to a server, we can control the share of connections going to that server. Not all buckets will correspond to exactly the same number of connections at a given point in time (the properties of the hash function make this a statistical matter). But experience with Unimog has demonstrated that the relationship between the number of buckets and resulting server load is sufficiently strong to allow for good load balancing.

But as mentioned, there is a problem with this scheme as presented so far. Updating a forwarding table, and changing the DIPs in some buckets, would break connections that hash to those buckets (because packets on those connections would get forwarded to a different server after the update). But one of the requirements for Unimog is to allow us to change which servers get new connections without impacting the existing connections. For example, sometimes we want to drain the connections to a server, maintaining the existing connections to that server but not forwarding new connections to it, in the expectation that many of the existing connections will terminate of their own accord. The next section explains how we fix this scheme to allow such changes.

Maintaining established connections

To make changes to the forwarding table without breaking established connections, Unimog adopts the “daisy chaining” technique described in the paper Stateless Datacenter Load-balancing with Beamer.

To understand how the Beamer technique works, let’s look at what can go wrong when a forwarding table changes: imagine the forwarding table is updated so that a bucket which contained the DIP of server A now refers to server B. A packet that would formerly have been sent to A by the load balancers is now sent to B. If that packet initiates a new connection (it’s a TCP SYN packet), there’s no problem – server B will continue the three-way handshake to complete the new connection. On the other hand, if the packet belongs to a connection established before the change, then the TCP implementation of server B has no matching TCP socket, and so sends a RST back to the client, breaking the connection.

This explanation hints at a solution: the problem occurs when server B receives a forwarded packet that does not match a TCP socket. If we could change its behaviour in this case to forward the packet a second time to the DIP of server A, that would allow the connection to server A to be preserved. For this to work, server B needs to know the DIP for the bucket before the change.

To accomplish this, we extend the forwarding table so that each bucket has two slots, each containing the DIP for a server. The first slot contains the current DIP, which is used by the load balancer to forward packets as discussed (and here we refer to this forwarding as the first hop). The second slot preserves the previous DIP (if any), in order to allow the packet to be forwarded again on a second hop when necessary.

For example, imagine we have a forwarding table that refers to servers A, B, and C, and then it is updated to stop new connections going to server A, but maintaining established connections to server A. This is achieved by replacing server A’s DIP in the first slot of any buckets where it appears, but preserving it in the second slot:

[Figure: forwarding table buckets before and after the update: server A’s DIP is replaced in the first slots but preserved in the second slots]

In addition to extending the forwarding table, this approach requires a component on each server to forward packets on the second hop when necessary. This diagram shows where this redirector fits into the path a packet can take:

[Figure: the path a packet can take: the load balancer’s first hop, the redirector on the first-hop server, and a possible second hop]

The redirector follows some simple logic to decide whether to process a packet locally on the first-hop server or to forward it to the second-hop server (the decision is sketched in code after this list):

  • If the packet is a SYN packet, initiating a new connection, then it is always processed by the first-hop server. This ensures that new connections go to the first-hop server.
  • For other packets, the redirector checks whether the packet belongs to a connection with a corresponding TCP socket on the first-hop server. If so, it is processed by that server.
  • Otherwise, the packet has no corresponding TCP socket on the first-hop server. So it is forwarded on to the second-hop server to be processed there (in the expectation that it belongs to some connection established on the second-hop server that we wish to maintain).
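
Expressed as plain C, the decision reduces to a few lines. This is a sketch of the logic just listed, not cls-redirect’s actual code; in a BPF implementation, the socket lookup would use a helper such as bpf_sk_lookup_tcp().

enum hop_decision {
  PROCESS_LOCALLY,      // hand the packet to the local network stack
  FORWARD_SECOND_HOP,   // re-encapsulate to the second-hop DIP
};

// tcp_syn: whether the SYN flag is set in the inner TCP header
// have_local_socket: whether the 4-tuple matches a socket on this server
static enum hop_decision redirect_decision(int tcp_syn, int have_local_socket)
{
  if (tcp_syn)
    return PROCESS_LOCALLY;        // new connections stay on the first hop
  if (have_local_socket)
    return PROCESS_LOCALLY;        // established here, so keep it here
  return FORWARD_SECOND_HOP;       // presumably established on the previous DIP
}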

In that last step, the redirector needs to know the DIP for the second hop. To avoid the need for the redirector to do forwarding table lookups, the second-hop DIP is placed into the encapsulated packet by the Unimog XDP program (which already does a forwarding table lookup, so it has easy access to this value). This second-hop DIP is carried in a GUE extension header, so that it is readily available to the redirector if it needs to forward the packet again.

This second hop, when necessary, does have a cost. But in our data centers, the fraction of forwarded packets that take the second hop is usually less than 1% (despite the significance of long-lived connections in our context). The result is that the practical overhead of the second hops is modest.

When we initially deployed Unimog, we adopted the glb-redirect iptables module from GitHub’s GLB to serve as the redirector component. In fact, some implementation choices in Unimog, such as the use of GUE, were made in order to facilitate this re-use. glb-redirect worked well for us initially, but subsequently we wanted to enhance the redirector logic. glb-redirect is a custom Linux kernel module, and developing and deploying changes to kernel modules is more difficult for us than for eBPF-based components such as our XDP programs. This is not merely due to Cloudflare having invested more engineering effort in software infrastructure for eBPF; it also results from the more explicit boundary between the kernel and eBPF programs (for example, we are able to run the same eBPF programs on a range of kernel versions without recompilation). We wanted to achieve the same ease of development for the redirector as for our XDP programs.

To that end, we decided to write an eBPF replacement for glb-redirect. While the redirector could be implemented within XDP, like our load balancer, practical concerns led us to implement it as a TC classifier program instead (TC is the traffic control subsystem within the Linux network stack). A downside to XDP is that the packet contents prior to processing by the XDP program are not visible using conventional tools such as tcpdump, complicating debugging. TC classifiers do not have this downside, and in the context of the redirector, which passes most packets through, the performance advantages of XDP would not be significant.

The result is cls-redirect, a redirector implemented as a TC classifier program. We have contributed our cls-redirect code as part of the Linux kernel test suite. In addition to implementing the redirector logic, cls-redirect also implements decapsulation, removing the need to separately configure GUE tunnel endpoints for this purpose.

There are some features suggested in the Beamer paper that Unimog does not implement:

  • Beamer embeds generation numbers in the encapsulated packets to address a potential corner case where an ECMP rehash event occurs at the same time as a forwarding table update is propagating from the control plane to the load balancers. Given the combination of circumstances required for a connection to be impacted by this issue, we believe that in our context the number of affected connections is negligible, and so the added complexity of the generation numbers is not worthwhile.
  • In the Beamer paper, the concept of daisy-chaining encompasses third hops etc. to preserve connections across a series of changes to a bucket. Unimog only uses two hops (the first and second hops above), so in general it can only preserve connections across a single update to a bucket. But our experience is that even with only two hops, a careful strategy for updating the forwarding tables permits connection lifetimes of days.

To elaborate on this second point: when the control plane is updating the forwarding table, it often has some choice in which buckets to change, depending on the event that led to the update. For example, if a server is being brought into service, then some buckets must be assigned to it (by placing the DIP for the new server in the first slot of the bucket). But there is a choice about which buckets. A strategy of choosing the least-recently modified buckets will tend to minimise the impact to connections.

Furthermore, when updating the forwarding table to adjust the balance of load between servers, Unimog often uses a novel trick: due to the redirector logic, exchanging the first-hop and second-hop DIPs for a bucket only affects which server receives new connections for that bucket, and never impacts any established connections. Unimog is able to achieve load balancing in our edge data centers largely through forwarding table changes of this type.
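
Reusing the hypothetical bucket layout sketched earlier, the trick is nothing more than a swap of the two slots. What makes the swap safe is the redirector logic above: packets without a matching local socket take the second hop, so established connections on either server still reach the right place.

// Swapping the slots changes which server receives *new* connections for
// this bucket; established connections on both servers are unaffected.
static void swap_hops(struct bucket *b)
{
  __u32 tmp = b->first_hop_dip;
  b->first_hop_dip = b->second_hop_dip;
  b->second_hop_dip = tmp;
}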

Control plane

So far, we have discussed the Unimog data plane – the part that processes network packets. But much of the development effort on Unimog has been devoted to the control plane – the part that generates the forwarding tables used by the data plane. In order to correctly maintain the forwarding tables, the control plane consumes information from multiple sources:

  • Server information: Unimog needs to know the set of servers present in a data center, some key information about each one (such as their DIP addresses), and their operational status. It also needs signals about transitional states, such as when a server is being withdrawn from service, in order to gracefully drain connections (preventing the server from receiving new connections, while maintaining its established connections).
  • Health: Unimog should only send connections to servers that are able to correctly handle those connections, otherwise those servers should be removed from the forwarding tables. To ensure this, it needs health information at the node level (indicating that a server is available) and at the service level (indicating that a service is functioning normally on a server).
  • Load: in order to balance load, Unimog needs information about the resource utilization on each server.
  • IP address information: Cloudflare serves hundreds of thousands of IPv4 addresses, and these are something that we have to treat as a dynamic resource rather than something statically configured.

The control plane is implemented by a process called the conductor. In each of our edge data centers, there is one active conductor, but there are also standby instances that will take over if the active instance goes away.

We use Hashicorp’s Consul in a number of ways in the Unimog control plane (we have an independent Consul server cluster in each data center):

  • Consul provides a key-value store, with support for blocking queries so that changes to values can be received promptly. We use this to propagate the forwarding tables and VIP address information from the conductor to xdpd on the servers.
  • Consul provides server- and service-level health checks. We use this as the source of health information for Unimog.
  • The conductor stores its state in the Consul KV store, and uses Consul’s distributed locks to ensure that only one conductor instance is active.

The conductor obtains server load information from Prometheus, which we already use for metrics throughout our systems. It balances the load across the servers using a control loop, periodically adjusting the forwarding tables to send more connections to underloaded servers and fewer connections to overloaded servers. The load for a server is defined by a Prometheus metric expression which measures processor utilization (with some intricacies to better handle characteristics of our workloads). The determination of whether a server is underloaded or overloaded is based on comparison with the average value of the load metric, and the adjustments made to the forwarding table are proportional to the deviation from the average. So the result of the feedback loop is that the load metric for all servers converges on the average.
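
As a rough sketch of that feedback loop in C, with an assumed gain constant, and ignoring details such as keeping the total bucket count constant and the intricacies of the load metric:

#define GAIN 0.25  // fraction of the deviation corrected per iteration (assumed)

// load[i] is the measured load of server i; buckets[i] is the number of
// forwarding-table buckets currently pointing at server i.
void adjust_buckets(const double *load, int *buckets, int n)
{
  double avg = 0;
  for (int i = 0; i < n; i++)
    avg += load[i];
  avg /= n;

  for (int i = 0; i < n; i++) {
    // overloaded servers lose buckets, underloaded servers gain them,
    // in proportion to their deviation from the average load
    double deviation = (load[i] - avg) / avg;
    buckets[i] -= (int)(GAIN * deviation * buckets[i]);
  }
  // a real implementation would then reassign buckets between servers so
  // that every bucket in the forwarding table still has an owner
}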

Finally, the conductor queries internal Cloudflare APIs to obtain the necessary information on servers and addresses.

[Figure: the conductor and its inputs: Consul health checks and KV store, Prometheus load metrics, and internal Cloudflare APIs]

Unimog is a critical system: incorrect, poorly adjusted or stale forwarding tables could cause incoming network traffic to a data center to be dropped, or servers to be overloaded, to the point that a data center would have to be removed from service. To maintain a high quality of service and minimise the overhead of managing our many edge data centers, we have to be able to upgrade all components. So to the greatest extent possible, all components are able to tolerate brief absences of the other components without any impact to service. In some cases this is possible through careful design. In other cases, it requires explicit handling. For example, we have found that Consul can temporarily report inaccurate health information for a server and its services when the Consul agent on that server is restarted (for example, in order to upgrade Consul). So we implemented the necessary logic in the conductor to detect and disregard these transient health changes.

Unimog also forms a complex system with feedback loops: The conductor reacts to its observations of behaviour of the servers, and the servers react to the control information they receive from the conductor. This can lead to behaviours of the overall system that are hard to anticipate or test for. For instance, not long after we deployed Unimog we encountered surprising behaviour when data centers became overloaded. This is of course a scenario that we strive to avoid, and we have automated systems to remove traffic from overloaded data centers if it does. But if a data center became sufficiently overloaded, then health information from its servers would indicate that many servers were degraded to the point that Unimog would stop sending new connections to those servers. Under normal circumstances, this is the correct reaction to a degraded server. But if enough servers become degraded, diverting new connections to other servers would mean those servers became degraded, while the original servers were able to recover. So it was possible for a data center that became temporarily overloaded to get stuck in a state where servers oscillated between healthy and degraded, even after the level of demand on the data center had returned to normal. To correct this issue, the conductor now has logic to distinguish between isolated degraded servers and such data center-wide problems. We have continued to improve Unimog in response to operational experience, ensuring that it behaves in a predictable manner over a wide range of conditions.

UDP Support

So far, we have described Unimog’s support for directing TCP connections. But Unimog also supports UDP traffic. UDP does not have explicit connections between clients and servers, so how it works depends upon how the UDP application exchanges packets between the client and server. There are a few cases of interest:

Request-response UDP applications

Some applications, such as DNS, use a simple request-response pattern: the client sends a request packet to the server, and expects a response packet in return. Here, there is nothing corresponding to a connection (the client only sends a single packet, so there is no requirement to make sure that multiple packets arrive at the same server). But Unimog can still provide value by spreading the requests across our servers.

To cater to this case, Unimog operates as described in previous sections, hashing the 4-tuple from the packet headers (the source and destination IP addresses and ports). But the Beamer daisy-chaining technique that allows connections to be maintained does not apply here, and so the buckets in the forwarding table only have a single slot.

UDP applications with flows

Some UDP applications have long-lived flows of packets between the client and server. Like TCP connections, these flows are identified by the 4-tuple. It is necessary that such flows go to the same server (even when Cloudflare is just passing a flow through to the origin server, it is convenient for detecting and mitigating certain kinds of attack to have that flow pass through a single server within one of Cloudflare’s data centers).

It’s possible to handle these flows by hashing the 4-tuple, skipping the Beamer daisy-chaining technique just as for request-response applications (this would effectively be a form of consistent hashing). But then adding servers will cause some flows to change servers. For UDP applications, we can’t say in general what impact this has, as we can for TCP connections. But it’s possible that it causes some disruption, so it would be nice to avoid this.

So Unimog adapts the daisy-chaining technique to apply it to UDP flows. The outline remains similar to that for TCP: the same redirector component on each server decides whether to send a packet on a second hop. But UDP does not have anything corresponding to TCP’s SYN packet that indicates a new connection. So for UDP, the part that depends on SYNs is removed, and the logic applied for each packet becomes:

  • The redirector checks whether the packet belongs to a flow with a corresponding UDP socket on the first-hop server. If so, it is processed by that server.
  • Otherwise, the packet has no corresponding UDP socket on the first-hop server. So it is forwarded on to the second-hop server to be processed there (in the expectation that it belongs to some flow established on the second-hop server that we wish to maintain).

Although the change compared to the TCP logic is not large, it has the effect of switching the roles of the first- and second-hop servers: for UDP, new flows go to the second-hop server. The Unimog control plane has to take account of this when it updates a forwarding table. When it introduces a server into a bucket, that server should receive new connections or flows. For a TCP trafficset, this means it becomes the first-hop server. For a UDP trafficset, it must become the second-hop server.

This difference between the handling of TCP and UDP also leads to higher overheads for UDP. In the case of TCP, as new connections are formed and old connections terminate over time, fewer packets will require the second hop, and so the overhead tends to diminish. But with UDP, new flows always involve the second hop. This is why we differentiate the two cases, taking advantage of SYN packets in the TCP case.

The UDP logic also places a requirement on services. The redirector must be able to match packets to the corresponding sockets on a server according to their 4-tuple. This is not a problem in the TCP case, because all TCP connections are represented by connected sockets in the BSD sockets API (these sockets are obtained from an accept system call, so that they have a local address and a peer address, determining the 4-tuple). But for UDP, unconnected sockets (lacking a declared peer address) can be used to send and receive packets. So some UDP services only use unconnected sockets. For the redirector logic above to work, services must create connected UDP sockets in order to expose their flows to the redirector.
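
For illustration, here is how a UDP service might create such a connected socket in C. After connect(), the socket is bound to the full 4-tuple, so a socket lookup for the flow’s packets can find it:

#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

int connected_udp_socket(const struct sockaddr_in *local,
                         const struct sockaddr_in *peer)
{
  int fd = socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0)
    return -1;
  // bind the local address/port, then fix the peer address/port:
  // together these determine the flow's 4-tuple
  if (bind(fd, (const struct sockaddr *)local, sizeof(*local)) < 0 ||
      connect(fd, (const struct sockaddr *)peer, sizeof(*peer)) < 0) {
    close(fd);
    return -1;
  }
  return fd;  // send()/recv() on this socket now apply only to this flow
}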

UDP applications with sessions

Some UDP-based protocols have explicit sessions, with a session identifier in each packet. Session identifiers allow sessions to persist even if the 4-tuple changes. This happens in mobility scenarios – for example, if a mobile device passes from a WiFi to a cellular network, causing its IP address to change. An example of a UDP-based protocol with session identifiers is QUIC (which calls them connection IDs).

Our Unimog XDP program allows a flow dissector to be configured for different trafficsets. The flow dissector is the part of the code that is responsible for taking a packet and extracting the value that identifies the flow or connection (this value is then hashed and used for the lookup into the forwarding table). For TCP and UDP, there are default flow dissectors that extract the 4-tuple. But specialised flow dissectors can be added to handle UDP-based protocols.
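
As a toy example of a specialised dissector, the sketch below pulls a 4-byte session identifier from a fixed offset at the start of the UDP payload. The offset and size are purely hypothetical; a real protocol such as QUIC requires protocol-aware parsing to locate its connection IDs.

// Returns 0 and writes the flow key on success, or -1 if the packet is too
// short (the caller can then fall back to the 4-tuple dissector).
static __always_inline int dissect_session_id(void *payload, void *data_end,
                                              __u32 *key_out)
{
  if (payload + 4 > data_end)   // hypothetical 4-byte ID at offset 0
    return -1;
  __builtin_memcpy(key_out, payload, 4);
  return 0;
}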

We have used this functionality in our WARP product. We extended the WireGuard protocol used by WARP in a backwards-compatible way to include a session identifier, and added a flow dissector to Unimog to exploit it. There are more details in our post on the technical challenges of WARP.

Conclusion

Unimog has been deployed to all of Cloudflare’s edge data centers for over a year, and it has become essential to our operations. Throughout that time, we have continued to enhance Unimog (many of the features described here were not present when it was first deployed). So the ease of developing and deploying changes, due to XDP and xdpd, has been a significant benefit. Today we continue to extend it, to support more services, and to help us manage our traffic and the load on our servers in more contexts.

Using WebSockets and Load Balancers Part two

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/using-websockets-and-load-balancers-part-two/

This post was written by Robert Zhu, Principal Developer Advocate at AWS. 

This article continues an earlier blog post about using Load Balancers on Amazon Lightsail. In this article, I demonstrate a few common challenges and solutions when combining stateful applications with load balancers. I start with a simple WebSocket application in Amazon Lightsail that counts the number of seconds the client has been connected. Then, I add a Lightsail Load Balancer, and show you how the application performs routing and retries. Let’s get started.

WebSockets

WebSockets are persistent, duplex sockets that enable bi-directional communication between a client and server. Applications often use WebSockets to provide real-time functionality such as chat and gaming. Let’s start with some sample code for a simple WebSocket server:

const WebSocket = require("ws");
const name = require("./randomName");
const server = require("http").createServer();
const express = require("express");
const app = express();

console.log(`This server is named: ${name}`);

// serve files from the public directory
server.on("request", app.use(express.static("public")));

// tell the WebSocket server to use the same HTTP server
const wss = new WebSocket.Server({
  server,
});

wss.on("connection", function connection(ws, req) {
  const clientId = req.url.replace("/?id=", "");
  console.log(`Client connected with ID: ${clientId}`);

  let n = 0;
  const interval = setInterval(() => {
    ws.send(`${name}: you have been connected for ${n++} seconds`);
  }, 1000);

  ws.on("close", () => {
    clearInterval(interval);
  });
});

const port = process.env.PORT || 80;
server.listen(port, () => {
  console.log(`Server listening on port ${port}`);
});

We serve static files from the public directory and WebSocket connection requests on the same port. An incoming HTTP request from a browser loads public/index.html, and a WebSocket connection initiated from the client triggers the wss.on(“connection”, …) code. Upon receiving a WebSocket connection, I set up a recurring callback where I tell the client how long it has been connected. Now, let’s look at the client code:

buttonConnect.onclick = async () => {
  const serverAddress = inputServerAddress.value;
  messages.innerHTML = "";
  instructions.parentElement.removeChild(instructions);

  appendMessage(`Connecting to ${serverAddress}`);

  try {
    let retries = 0;
    while (retries < 50) {
      appendMessage(`establishing connection... retry #${retries}`);
      await runSession(serverAddress);
      await sleep(1500);
      retries++;
    }

    appendMessage("Reached maximum retries, giving up.");
  } catch (e) {
    appendMessage(e.message || e);
  }
};

async function runSession(address) {
  const ws = new WebSocket(address);

  ws.addEventListener("open", () => {
    appendMessage("connected to server");
  });

  ws.addEventListener("message", ({ data }) => {
    console.log(data);
    appendMessage(data);
  });

  return new Promise((resolve) => {
    ws.addEventListener("close", () => {
      appendMessage("Connection lost with server.");
      resolve();
    });
  });
}

I use the WebSocket DOM API to connect to the server. Once connected, I append any received messages to the console and to the screen via the custom appendMessage function. If the client loses connectivity, it will try to reconnect up to 50 times. Let’s run it:

"slimy-cardinal" is a randomly generated server name

Now, suppose I am running a very demanding real-time application, and need to scale the server capacity beyond a single host. How would I do this? I create two Ubuntu 18.04 instances. Once the instances are up, I SSH to each one, and run the following commands:

sudo apt-get update
sudo apt-get -y install nodejs npm
git clone https://github.com/robzhu/ws-time
cd ws-time && npm install
node server.js

During installation, select Yes when presented with the prompt:

Select "Yes" when prompted to install libssl for npm

Keep these SSH sessions open; you’ll need them shortly. Next, you create the Load Balancer in Amazon Lightsail, and attach the instances:

[Screenshot: attaching the target instances to the Lightsail load balancer]

Note: the Lightsail load balancer only works for port 80, which is part of the reason I use the same port for HTTP and WebSocket requests.

Copy the DNS name for the load balancer, open it in a new browser tab, and enter it as the WebSocket server address in the format:

ws://<DNSName>

[Screenshot: the server address field filled in as ws://&lt;DNSName&gt;]

Make sure the server address does not accidentally start with “ws://http://…”

Next, locate the SSH session that accepted the connection. It looks like this:

[Screenshot: the SSH session that accepted the connection]

The server logs the client ID when it receives a connection.

If you kill this process, the client disconnects and runs its retry logic, hopefully causing the load balancer to route the client to a healthy node. Next, hit connect from the client. After a few seconds, kill the process on the server, and you should see the client reconnect to a healthy instance:

[Screenshot: the client reconnecting to a healthy instance]

The client retry was routed to a healthy instance on the first attempt. This is due to the round-robin algorithm that the Lightsail load balancer uses. In production, you should not expect the load balancer to detect an unhealthy node immediately. If the load balancer continues to route incoming connections to an unhealthy node, the client will need more retry attempts before reconnecting. If this is a large scale system, we will want to implement an exponential backoff on the retry intervals to avoid overwhelming other nodes in the cluster (aka the thundering herd problem).
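
Here is one way to retrofit such a backoff onto the retry loop shown earlier. The initial delay, cap, and jitter strategy are assumptions for illustration, not values from the original sample:

// exponential backoff with "full jitter": each failed session doubles the
// delay cap, and we sleep a random duration up to that cap
async function runWithBackoff(address, maxRetries = 50) {
  let delayCap = 500; // initial cap in milliseconds (assumed)
  for (let retries = 0; retries < maxRetries; retries++) {
    appendMessage(`establishing connection... retry #${retries}`);
    await runSession(address);
    await sleep(Math.random() * delayCap);
    delayCap = Math.min(delayCap * 2, 30000); // never wait more than 30s
  }
  appendMessage("Reached maximum retries, giving up.");
}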

Notice that the message “you have been connected for X seconds” reset X to 0 after the client reconnected. What if you want to make the failover transparent to the user? The problem is that the connection duration (X) is stored in the Node.js process that we killed. That state is lost if the process dies or if the host goes down. The solution is unsurprising: move the state off the WebSocket server and into a distributed cache, such as Redis.

Deep health checks

When you attached your instances to the load balancer, your health checks passed because the Lightsail load balancer issues an HTTP request for the default path (where you serve index.html). However, if you expect most of your server load to come from I/O on the WebSocket connections, the ability to serve the index.html file is not a good health check. You might implement a better health check like so:

app.get('/healthcheck', (req, res) => {
  // report healthy only while the node has spare network capacity
  const serverHasCapacity = getAverageNetworkUsage() < 0.6;
  if (serverHasCapacity) return res.status(200).send("ok");
  res.status(400).send("server is overloaded");
});

This causes the load balancer to consider a node as “unhealthy” when the target node’s network usage reaches a threshold value. In response, the load balancer stops routing new incoming connections to that node. However, note that the load balancer will not terminate existing connections to an over-subscribed node.

When working with persistent connections or sticky sessions, always leave some capacity buffer. For example, do not mark the server as unhealthy only when it reaches 100% capacity. This is because existing connections or sticky clients will continue to generate traffic for that node, and some workloads may increase server usage beyond its threshold (e.g. a chat room that suddenly gets very busy).

 

Conclusion

I hope this post has given you a clear idea of how to use load balancers to improve scalability for stateful applications, and how to implement such a solution using Amazon Lightsail instances and load balancers. Please feel free to leave comments, and try this solution for yourself.

 

High Availability Load Balancers with Maglev

Post Syndicated from Terin Stock original https://blog.cloudflare.com/high-availability-load-balancers-with-maglev/

High Availability Load Balancers with Maglev

Background

High Availability Load Balancers with Maglev

We run many backend services that power our customer dashboard, APIs, and features available at our edge. We own and operate physical infrastructure for our backend services. We need an effective way to route arbitrary TCP and UDP traffic between services and also from outside these data centers.

Previously, all traffic for these backend services would pass through several layers of stateful TCP proxies and NATs before reaching an available instance. This solution worked for several years, but as we grew it caused our service and operations teams many issues. Our service teams had to deal with drops in availability, and our operations teams faced significant toil whenever load balancer servers needed maintenance.

Goals

With the experience with our stateful TCP proxy and NAT solutions in mind, we had several goals for a replacement load balancing service, while remaining on our own infrastructure:

  1. Preserve source IPs through routing decisions to destination servers. This allows us to support servers that require client IP addresses as part of their operation, without workarounds such as X-Forwarded-For headers or the PROXY TCP extension.
  2. Support an architecture where backends are located across many racks and subnets. This prevents solutions that cannot be routed by existing network equipment.
  3. Allow operation teams to perform maintenance with zero downtime. We should be able to remove load balancers at any time without causing any connection resets or downtime for services.
  4. Use Linux tools and features that are commonplace and well-tested. There are a lot of very cool networking features in Linux we could experiment with, but we wanted to optimize for least surprising behavior for operators who do not primarily work with these load balancers.
  5. No explicit connection synchronization between load balancers. We found that communication between load balancers significantly increased the system complexity, allowing for more opportunities for things to go wrong.
  6. Allow for staged rollout from the previous load balancer implementation. We should be able to migrate the traffic of specific services between the two implementations to find issues and gain confidence in the system.

Reaching Zero Downtime

Problems

High Availability Load Balancers with Maglev

Previously, when traffic arrived at our backend data centers, our routers would pick and forward packets to one of the L4 load balancer servers they knew about. These L4 load balancers would determine which service the traffic was for, then forward it to one of the service's L7 servers.

This architecture worked fine during normal operations. However, issues would quickly surface whenever the set of load balancers changed. Our routers would forward traffic to the new set, and it was very likely traffic would arrive at a different load balancer than before. As each load balancer maintained its own connection state, it would be unable to forward traffic for these in-progress connections. These connections would then be reset, potentially causing errors for our customers.

Consistent Hashing

High Availability Load Balancers with Maglev


During normal operations, our new architecture has similar behavior to the previous design. A L4 load balancer would be selected by our routers, which would then forward traffic to a service’s L7 server.

There's a significant change when the set of load balancers changes. Because our load balancers are now stateless, it doesn't matter which load balancer our router selects; packets will end up reaching the same backend server.

Implementation

BGP

Our load balancer servers announce service IP addresses to our data centers’ routers using BGP, unchanged from the previous solution. Our routers choose which load balancers will receive packets based on a routing strategy called equal-cost multi-path routing (ECMP).

ECMP hashes information from packets to pick a path for each packet. The hash function used by routers is often fixed in firmware. Routers that choose a poor hashing function, or poor inputs to it, can create unbalanced network and server load, or break assumptions made by higher protocol layers.

We worked with our networking team to ensure ECMP is configured on our routers to hash only based on the packet’s 5-tuple—the protocol, source address and port, and destination address and port.

For maintenance, our operators can withdraw the BGP session and traffic will transparently shift to other load balancers. However, if a load balancer suddenly becomes unavailable, such as with a kernel panic or power failure, there is a short delay before the BGP keepalive mechanism fails and routers terminate the session.

It's possible for routers to terminate BGP sessions after a much shorter delay by using the Bidirectional Forwarding Detection (BFD) protocol between the router and the load balancers. However, different routers have different limitations and restrictions on BFD that make it difficult to use in an environment that relies heavily on L2 link aggregation and VXLANs.

We’re continuing to work with our networking team to find solutions to reduce the time to terminate BGP sessions, using tools and configurations they’re most comfortable with.

Selecting Backends with Maglev

To ensure all load balancers are sending traffic to the same backends, we decided to use the Maglev connection scheduler. Maglev is a consistent hash scheduler hashing a 5-tuple of information from each packet—the protocol, source address and port, and destination address and port—to determine a backend server.

By being a consistent hash, the same backend server is chosen by every load balancer for a packet without needing to persist any connection state. This allows us to transparently move traffic between load balancers without requiring explicit connection synchronization between them.
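
To make that concrete, here is a compact Go sketch of Maglev's table-population algorithm as described in Google's paper. This is an illustration rather than Cloudflare's implementation: the table size and the FNV-based hash choices are simplified for brevity.

package main

import (
	"fmt"
	"hash/fnv"
)

const tableSize = 65537 // must be prime; real tables are sized much larger than the backend count

// hashStr derives two independent-ish hashes of a string by seeding FNV-1a.
func hashStr(s string, seed byte) uint64 {
	h := fnv.New64a()
	h.Write([]byte{seed})
	h.Write([]byte(s))
	return h.Sum64()
}

// buildTable fills the Maglev lookup table. Every load balancer computing
// this over the same backend list derives the same table, so they all map a
// given connection to the same backend with no shared state.
func buildTable(backends []string) []int {
	n := len(backends)
	offset := make([]uint64, n)
	skip := make([]uint64, n)
	for i, b := range backends {
		offset[i] = hashStr(b, 0) % tableSize
		skip[i] = hashStr(b, 1)%(tableSize-1) + 1
	}
	table := make([]int, tableSize)
	for i := range table {
		table[i] = -1
	}
	next := make([]uint64, n)
	for filled := 0; filled < tableSize; {
		for i := 0; i < n && filled < tableSize; i++ {
			// Walk backend i's permutation to its next unclaimed slot.
			c := (offset[i] + next[i]*skip[i]) % tableSize
			for table[c] != -1 {
				next[i]++
				c = (offset[i] + next[i]*skip[i]) % tableSize
			}
			table[c] = i
			next[i]++
			filled++
		}
	}
	return table
}

func main() {
	backends := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}
	table := buildTable(backends)
	// A connection's 5-tuple hashes to a slot; the slot names the backend.
	five := "tcp|198.51.100.7:40358|198.51.100.1:80"
	fmt.Println("backend:", backends[table[hashStr(five, 2)%tableSize]])
}

Because the table depends only on the backend list, adding or removing a single backend disturbs only a small fraction of the slots, so most in-flight connections keep mapping to their original servers.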

IPVS and Foo-Over-UDP

Where possible, we wanted to use commonplace and reliable Linux features. Linux has implemented a powerful layer 4 load balancer, the IP Virtual Server (IPVS), since the early 2000s. IPVS has supported the Maglev scheduler since Linux 4.18.

High Availability Load Balancers with Maglev

Our load balancer and application servers are spread across multiple racks and subnets. To route traffic from the load balancer we opted to use Foo-Over-UDP encapsulation.

In Foo-Over-UDP encapsulation a new IP and UDP header are added around the original packet. When these packets arrive on the destination server, the Linux kernel removes the outer IP and UDP headers and inserts the inner payload back into the networking stack for processing as if the packet had originally been received on that server.

Compared to other encapsulation methods—such as IPIP, GUE, and GENEVE—we felt Foo-Over-UDP struck a nice balance between features and flexibility. Direct Server Return, where application servers reply directly to clients and bypass the load balancers, came as a byproduct of the encapsulation. There was no state associated with the encapsulation: each server only required one encapsulation interface to receive traffic from all load balancers.

Example Load Balancer Configuration

# Load in the kernel modules required for IPVS and FOU.
$ modprobe ip_vs && modprobe ip_vs_mh && modprobe fou

# Create one tunnel between the load balancer and
# an application server. The IPs are the machines'
# real IPs on the network.
$ ip link add name lbtun1 type ipip \
    remote 192.0.2.1 local 192.0.2.2 ttl 2 \
    encap fou encap-sport auto encap-dport 5555

# Inform the kernel about the VIPs that might be announced here.
$ ip route add table local local 198.51.100.0/24 \
    dev lo proto kernel

# Give the tunnel an IP address local to this machine.
# Traffic on this machine destined for this IP address will
# be sent down the tunnel.
$ ip route add 203.0.113.1 dev lbtun1 scope link

# Tell IPVS about the service, and that it should use the
# Maglev scheduler.
$ ipvsadm -A -t 198.51.100.1:80 -s mh

# Tell IPVS about a backend for this service.
$ ipvsadm -a -t 198.51.100.1:80 -r 203.0.113.1:80

Example Application Server Configuration

# The kernel module may need to be loaded.
$ modprobe fou

# Set up an IPIP receiver.
# ipproto 4 = IPIP (not IPv4)
$ ip fou add port 5555 ipproto 4

# Bring up the tunnel.
$ ip link set dev tunl0 up

# Disable reverse path filtering on the tunnel interface.
$ sysctl -w net.ipv4.conf.tunl0.rp_filter=0
$ sysctl -w net.ipv4.conf.all.rp_filter=0

IPVS does not support Foo-Over-UDP as a packet forwarding method. To work around this limitation, we’ve created virtual interfaces that implement Foo-Over-UDP encapsulation. We can then use IPVS’s direct packet forwarding method along with the kernel routing table to choose a specific interface.

Linux is often configured to ignore packets that arrive on an interface that is different from the interface used for replies. As packets will now be arriving on the virtual “tunl0” interface, we need to disable reverse path filtering on this interface. The kernel uses the higher value of the named and “all” interfaces, so you may need to decrease “all” and adjust other interfaces.

MTUs and Encapsulation

The maximum IPv4 packet size, or maximum transmission unit (MTU), we accept from the Internet is 1500 bytes. To ensure we did not fragment these packets during encapsulation, we increased our internal MTUs from the default to accommodate the extra IP and UDP headers (an outer IPv4 header adds 20 bytes and a UDP header 8, so encapsulated packets can be up to 1528 bytes).

The team had underestimated the complexity of changing the MTU across all our racks of equipment. We had to adjust the MTU on all our routers and switches, on our bonded and VXLAN interfaces, and finally on our Foo-Over-UDP encapsulation. Even with a carefully orchestrated rollout, we still uncovered MTU-related bugs in our switches and server stack, many of which first manifested as issues in other parts of the network.

Node Agent

We’ve written a Go agent running on each load balancer that synchronizes with a control plane layer that’s tracking the location of services. The agent then configures the system based on active services and available backend servers.

To configure IPVS and the routing table, we're using packages built upon the netlink Go package. Today we're open sourcing the IPVS netlink package we built; it supports querying, creating, and updating IPVS virtual servers, destinations, and statistics.

Unfortunately, there is no official programming interface for iptables, so we must instead execute the iptables binary. The agent computes an ideal set of iptables chains and rules, which is then reconciled with the live rules.

Subset of iptables for a service

*filter
-A INPUT -d 198.51.100.0/24 -m comment --comment "leif:nhAi5v93jwQYcJuK" -j LEIFUR-LB
-A LEIFUR-LB -d 198.51.100.1/32 -p tcp -m comment --comment "leif:G4qtNUVFCkLCu4yt" -m multiport --dports 80 -j LEIFUR-GQ4OKHRLCJYOWIN9
-A LEIFUR-GQ4OKHRLCJYOWIN9 -s 10.0.0.0/8 -m comment --comment "leif:G4qtNUVFCkLCu4yt" -j ACCEPT
-A LEIFUR-GQ4OKHRLCJYOWIN9 -s 172.16.0.0/12 -m comment --comment "leif:0XxZ2OwlQWzIYFTD" -j ACCEPT

The iptables output of a rule may differ significantly from the input we gave it. To avoid parsing entire iptables rules in our comparisons, we store a hash of each rule, including its position in the chain, as an iptables comment. We can then compare the comment against our ideal rule to determine whether we need to take any action. On chains that are shared (such as INPUT), the agent ignores unmanaged rules.
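
A sketch of that comparison trick in Go; the hash inputs, encoding, and comment format here are invented for illustration and are not the agent's actual scheme:

package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// ruleToken derives a stable token from a chain name, the rule's position,
// and the rule spec as the agent rendered it. Stored as an iptables comment,
// it lets the agent detect drift by comparing tokens instead of parsing
// iptables' re-normalized rule output.
func ruleToken(chain string, position int, ruleSpec string) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s|%d|%s", chain, position, ruleSpec)))
	return base64.RawURLEncoding.EncodeToString(sum[:])[:16]
}

func main() {
	fmt.Println(ruleToken("LEIFUR-LB", 1, "-d 198.51.100.1/32 -p tcp --dports 80"))
}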

Kubernetes Integration

We use the network load balancer described here as a cloud load balancer for Kubernetes. A controller assigns virtual IP addresses to Kubernetes services requesting a load balancer IP. These IPs get configured by the agent in IPVS. Traffic is directed to a subset of cluster nodes for handling by kube-proxy, unless the External Traffic Policy is set to "Local", in which case traffic is sent to the specific backends the workloads are running on.

This allows us to have internal Kubernetes clusters that better replicate the load balancer behavior of managed clusters on cloud providers. Services running on Kubernetes, such as ingress controllers, API gateways, and databases, have access to the correct client IP addresses of load balanced traffic.

Future Work

  • Continuing to keep a close eye on future developments in IPVS and its alternatives, including nftlb, Express Data Path (XDP), and eBPF.
  • Migrate to nftables. The "flat priorities" and lack of a programmable interface make iptables ill-suited for mixing automated rules with rules added by operators. We hope that as more projects and operators move to nftables, we'll be able to switch without creating a "blind spot" for operations.
  • Failures of a load balancer can result in temporary outages due to BGP hold timers. We'd like to improve how we handle BGP session failures.
  • Investigate using Lightweight Tunnels to reduce the number of Foo-Over-UDP interfaces needed on the load balancer nodes.


Using Load Balancers on Amazon Lightsail

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/using-load-balancers-on-amazon-lightsail/

This post was written by Robert Zhu, Principal Developer Advocate at AWS. 

As your web application grows, it needs to scale to maintain performance and improve availability. A load balancer is an important tool in any developer's tool belt. In this post, I show how to load balance a simple Node.js web application using the Amazon Lightsail Load Balancer. I chose Node.js as an example, but the concepts and techniques I discuss work with any other web application stack, such as PHP, Django, .NET, or even plain old HTML. To demonstrate these concepts, I deploy a load balancer that splits traffic between two Amazon Lightsail instances using round-robin load balancing. Compared to other load balancing solutions on AWS, such as our Application Load Balancer and Network Load Balancer, the Lightsail Load Balancer is simpler, easier to use, and has a fixed cost of $18/month, regardless of how much bandwidth you use or how many connections are open.

Launch the Cluster

For the demo, first create a cluster consisting of two Lightsail instances. To do so, navigate to the Lightsail console, and launch two micro Lightsail instances using the Ubuntu 18.04 image.

After you select your Region and OS, select an instance plan and identify your instance. Make sure that your instances have enough memory and compute to run your application. For this demo, the smallest instance size suffices.

Instance plan and name

 

Once your instances are ready, click each instance, open an SSH session, and run the following commands:

 

sudo apt-get update
sudo apt-get -y install nodejs npm
git clone https://github.com/robzhu/colornode
cd colornode && npm install

 

The above commands update the APT package registry and install Node.js and npm, which are needed to run the application. When installing npm, select "yes" when you see the following prompt:

npm prompt

Once installation is complete, you can run “node -v” and “npm -v” to make sure everything works. While this is an older version of Node.js, you should be able to use it without any issues. Remember to run these steps for both instances.

The application

Let’s take a tour of the application. Here’s the main.js source file:

 

const color = require("randomcolor")();
const name = require("./randomName");
const app = require("express")();

app.get("/", (_, res) => {
  res.send(`
    <html>
      <body bgcolor=${color}>
        <h1>${name}</h1>
      </body>
    </html>
  `);
});

app.get("/health", (_, res) => {
  res.status(200).send("I'm OK");
});

app.listen(80, () => console.log(`${name} started on port 80`));

 

Upon startup, the application generates a random name and color. It then uses express-js to serve responses to requests at two routes: "/" and "/health". When a client requests the "/" route, the application responds with an HTML snippet that contains its name and color. When a client requests the "/health" route, the application responds with HTTP status code 200. Later, I use the health route to let Lightsail determine the health of a node, and whether the load balancer should continue routing requests to that node.

 

You can start the application by running “sudo node main.js”. Do this for both instances, and you should see two distinct webpages at each instance’s IP address:

images of two instance urls

 

Launch the Load Balancer

To create a load balancer, open the Network tab in the Lightsail console and click Create load balancer.

Networking tab of Lightsail console

On the next page, name your load balancer and click Create load balancer. After that, a page appears where you can add target instances to the load balancer:

target instances

From the dropdown menu under Target instances, select each instance and attach it to the load balancer, like so:

 

target instances 2

image of first loadbalancer

Once both instances are attached, copy the DNS name for your load balancer and paste it into a new browser session. When the load balancer receives the browser request, it routes the request to one of the two instances. As you refresh the browser, you should see the returned response alternate between the two instances. This should clearly illustrate the functionality of the load balancer.

Health Checks

Over time, your instances inevitably exhibit faulty behavior due to application errors, network partitions, reboots, etc. Load balancers need to detect faulty nodes in order to avoid routing traffic to them. The simplest way to implement a load balancer health check is to expose an HTTP endpoint on your nodes. By default, the Lightsail Load Balancer performs health checks by issuing an HTTP GET request against the node IP address directly. However, recall that the application serves a simple HTTP 200 response at the "/health" route. You can configure the Load Balancer to use this route to perform health checks on your nodes instead.

 

Within your Lightsail console, open your load balancer, click Customize health checking, and enter health for the route:

health check screenshot

 

Now, the Lightsail load balancer uses http://{instanceIP}/health to perform the health check. You can further customize your load balancer by using a custom domain and adding SSL termination.

 

The custom health check feature allows you to implement deeper health check logic. For example, if you want to make sure that the entire application stack is running, you can write a test value to the database and read it back. That gives much higher assurance of functionality than simply returning status code 200 from an HTTP endpoint.
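
A sketch of what such a deep health check might look like, written in Go with a Redis probe; the route, key name, and addresses are assumptions for illustration (the demo app above would implement the same shape in Node.js):

package main

import (
	"net/http"
	"time"

	"github.com/redis/go-redis/v9"
)

// healthHandler exercises the full path from HTTP listener to datastore:
// it writes a probe value and reads it back, so a 200 means more than
// "the process is up".
func healthHandler(rdb *redis.Client) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()
		if err := rdb.Set(ctx, "healthcheck:probe", "ok", time.Minute).Err(); err == nil {
			if v, err := rdb.Get(ctx, "healthcheck:probe").Result(); err == nil && v == "ok" {
				w.WriteHeader(http.StatusOK)
				return
			}
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	}
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	http.Handle("/health", healthHandler(rdb))
	_ = http.ListenAndServe(":80", nil)
}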

 

Conclusion

In summary, you created a Lightsail Load Balancer and used it to route traffic between two Lightsail instances serving the demo application. If one of these instances crashes or becomes unreachable, the load balancer automatically routes traffic to the remaining healthy nodes in the cluster, based on a customizable HTTP health check. In comparison to other AWS load balancing solutions, the Lightsail Load Balancer is simpler, easier to use, and has a fixed-pricing model. Please leave feedback in the comments or reach out to me on Twitter or email!

 

In a follow-up post, I walk through some more advanced concepts:

  • SSL termination
  • Custom Domains
  • Deep health check
  • Websocket connections

 

About the Author

Robert Zhu is a principal developer advocate at AWS.

Email: [email protected]

Twitter: @rbzhu

Adding the Fallback Pool to the Load Balancing UI and other significant UI enhancements

Post Syndicated from Brian Batraski original https://blog.cloudflare.com/adding-the-fallback-pool-to-the-load-balancing-ui/

Adding the Fallback Pool to the Load Balancing UI and other significant UI enhancements

Adding the Fallback Pool to the Load Balancing UI and other significant UI enhancements

The Cloudflare Load Balancer was introduced over three years ago to provide our customers with a powerful, easy to use tool to intelligently route traffic to their origins across the world. During the initial design process, one of the questions we had to answer was ‘where do we send traffic if all pools are down?’ We did not think it made sense just to drop the traffic, so we used the concept of a ‘fallback pool’ to send traffic to a ‘pool of last resort’ in the case that no pools were detected as available. While this may still result in an error, it gave an eyeball request a chance at being served successfully in case the pool was still up.

As a brief reminder, a load balancer helps route traffic across your origin servers to ensure your overall infrastructure stays healthy and available. Load Balancers are made up of pools, which can be thought of as collections of servers in a particular location.

Over the past three years, we've made many updates to the dashboard, and the new designs now surface the fallback pool in the dashboard UI. A fallback pool is incredibly helpful in a tight spot, but not having it viewable in the dashboard led to confusion around which pool was set as the fallback, or whether a fallback pool was set at all. We want to be sure you have the tools to support your day-to-day work, while also ensuring our dashboard is usable and intuitive.

You can now check which pool is set as the fallback in any given Load Balancer, along with being able to easily designate any pool in the Load Balancer as the fallback. If no fallback pool is set, then the last pool in the list will automatically be chosen. We made the decision to auto-set a pool to be sure that customers are always covered in case the worst scenario happens. You can access the fallback pool within the Traffic App of the Cloudflare dashboard when creating or editing a Load Balancer.

Adding the Fallback Pool to the Load Balancing UI and other significant UI enhancements

Load Balancing UI Improvements

Not only did we add the fallback pool to the UI, but we saw this as an opportunity to update other areas of the Load Balancing app that have caused some confusion in the past.

Facelift and De-modaling

As a start, we gave the main Load Balancing page a facelift and "de-modaled" the majority of the Load Balancing UI, moving content out of small modal screens into a larger area. Moving this content out of a small web element lets users more easily understand the content on the page, and lets us make better use of the available space rather than being limited to the small area of a modal. These changes apply when you create or edit a Load Balancer and when you manage monitors and pools.

Before:

Adding the Fallback Pool to the Load Balancing UI and other significant UI enhancements

After:

Adding the Fallback Pool to the Load Balancing UI and other significant UI enhancements

The updated UI combines the health status and icon to declutter the available space and make clear at a glance what the status is for a particular Load Balancer or Pool. We also switched to a smaller toggle button across the Load Balancing UI, using the reclaimed margin space for clearer action buttons. Now that we are using the page surface area more efficiently, we added more information to our tables so users are more aware of the shared aspects of their Load Balancers.

Shared Objects and Editing

Shared objects have caused some level of concern for companies with teams across the world, all leveraging the Cloudflare dashboard.

The shared objects, Monitors and Pools, now have a column outlining which Pools or Load Balancers currently use a particular Monitor or Pool. This brings more clarity around what will be affected by any changes made by someone in your organization, and it helps users be more autonomous and confident when they make an update in the dashboard. If someone from team A wants to update a monitor for a production server, they can do so without worrying that monitoring for another pool might break, and without having to speak to team B first. The time saved, and the ability to make updates as your business changes, are incredibly valuable: they support the velocity you want to achieve while maintaining a safe environment to operate in. The days of worrying about unforeseen consequences cropping up later down the road are swiftly coming to a close.

This helps teams understand the impact of a given change and what else would be affected. But we did not feel this was enough; we want everyone to be confident in the changes they are making. On top of the additional columns, we added a number of confirmation modals to drive confidence about a particular change, and each modal lists the other Load Balancers or Pools that would be impacted. To really drive the message home about which objects are shared, we made a final change: edits to Monitors can now take place only within the Manage Monitors page. Having users navigate to the manage page itself reinforces that these items are shared. For example, allowing edits to a Monitor in the same view as editing a Load Balancer can make it seem like those changes apply only to that Load Balancer, which is not always the case.

Adding the Fallback Pool to the Load Balancing UI and other significant UI enhancements

Manage Monitors before:

Adding the Fallback Pool to the Load Balancing UI and other significant UI enhancements

Manage monitors after:

Adding the Fallback Pool to the Load Balancing UI and other significant UI enhancements

Updated CTAs/Buttons

Lastly, when users expanded the Manage Load Balancer table to view more details about the Pools or Origins within a specific Load Balancer, they would click the large X icon in the top right of the expanded card to close it, which seems reasonable in that context.

Adding the Fallback Pool to the Load Balancing UI and other significant UI enhancements

But the X icon did not close the expanded card; it deleted the Load Balancer altogether. This is dangerous, and we want to prevent users from making mistakes. With the space we gained from de-modaling large areas of the UI, we updated these buttons to be clickable text buttons that read "Edit" or "Delete" instead of icon buttons. Clearly defined text describes the action that will take place, rather than leaving it up to a user's interpretation of what the icon means and the action it would result in. We felt this was much clearer to users and avoids unwanted changes.

We are very excited about the updates to the Load Balancing dashboard and look forward to improving day in and day out.

After:

Adding the Fallback Pool to the Load Balancing UI and other significant UI enhancements

Introducing Load Balancing Analytics

Post Syndicated from Brian Batraski original https://blog.cloudflare.com/introducing-load-balancing-analytics/

Introducing Load Balancing Analytics

Introducing Load Balancing Analytics

Cloudflare aspires to make Internet properties everywhere faster, more secure, and more reliable. Load Balancing helps with speed and reliability and has been evolving over the past three years.

Let's go through a scenario that highlights a bit more of what a Load Balancer is and the value it can provide. A standard load balancer comprises a set of pools, each of which has origin servers identified by hostnames and/or IP addresses. A routing policy is assigned to each load balancer, which determines the origin pool selection process.

Let’s say you build an API that is using cloud provider ACME Web Services. Unfortunately, ACME had a rough week, and their service had a regional outage in their Eastern US region. Consequently, your website was unable to serve traffic during this period, which resulted in reduced brand trust from users and missed revenue. To prevent this from happening again, you decide to take two steps: use a secondary cloud provider (in order to avoid having ACME as a single point of failure) and use Cloudflare’s Load Balancing to take advantage of the multi-cloud architecture. Cloudflare’s Load Balancing can help you maximize your API’s availability for your new architecture. For example, you can assign health checks to each of your origin pools. These health checks can monitor your origin servers’ health by checking HTTP status codes, response bodies, and more. If an origin pool’s response doesn’t match what is expected, then traffic will stop being steered there. This will reduce downtime for your API when ACME has a regional outage because traffic in that region will seamlessly be rerouted to your fallback origin pool(s). In this scenario, you can set the fallback pool to be origin servers in your secondary cloud provider. In addition to health checks, you can use the ‘random’ routing policy in order to distribute your customers’ API requests evenly across your backend. If you want to optimize your response time instead, you can use ‘dynamic steering’, which will send traffic to the origin determined to be closest to your customer.

Our customers love Cloudflare Load Balancing, and we’re always looking to improve and make our customers’ lives easier. Since Cloudflare’s Load Balancing was first released, the most popular customer request was for an analytics service that would provide insights on traffic steering decisions.

Today, we are rolling out Load Balancing Analytics in the Traffic tab of the Cloudflare dashboard. The three major components in the analytics service are:

  • An overview of traffic flow that can be filtered by load balancer, pool, origin, and region.
  • A latency map that indicates origin health status and latency metrics from Cloudflare’s global network spanning 194 cities and growing!
  • Event logs denoting changes in origin health. This feature was released in 2018 and tracks pool and origin transitions between healthy and unhealthy states. We’ve moved these logs under the new Load Balancing Analytics subtab. See the documentation to learn more.

In this blog post, we’ll discuss the traffic flow distribution and the latency map.

Traffic Flow Overview

Our users want a detailed view into where their traffic is going, why it is going there, and insights into what changes may optimize their infrastructure. With Load Balancing Analytics, users can graphically view traffic demands on load balancers, pools, and origins over variable time ranges.

Understanding how traffic flow is distributed informs the process of creating new origin pools, adapting to peak traffic demands, and observing failover response during origin pool failures.

Introducing Load Balancing Analytics
Figure 1

In Figure 1, we can see an overview of traffic for a given domain. On Tuesday the 24th, the red pool was created and added to the load balancer. In the following 36 hours, as the red pool handled more traffic, the blue and green pools both saw a reduced workload. In this scenario, the traffic distribution graph provided the customer with several new insights. First, it demonstrated that traffic was being steered to the new red pool. It also allowed the customer to understand the new level of traffic distribution across their network. Finally, it allowed the customer to confirm whether traffic decreased in the expected pools. Over time, these graphs can be used to better manage capacity and plan for upcoming infrastructure needs.

Latency Map

The traffic distribution overview is only one part of the puzzle. Another essential component is understanding request performance around the world. This is useful because customers can ensure user requests are handled as fast as possible, regardless of where in the world the request originates.

The standard Load Balancing configuration contains monitors that probe the health of customer origins. These monitors can be configured to run from a particular region(s) or, for Enterprise customers, from all Cloudflare locations. They collect useful information, such as round-trip time, that can be aggregated to create the latency map.

The map provides a summary of how responsive origins are from around the world, so customers can see regions where requests are underperforming and may need further investigation. A common metric used to identify performance is request latency. We found that the p90 latency for all Load Balancing origins being monitored is 300 milliseconds, which means that 90% of all monitors’ health checks had a round trip time faster than 300 milliseconds. We used this value to identify locations where latency was slower than the p90 latency seen by other Load Balancing customers.

Introducing Load Balancing Analytics
Figure 2

In Figure 2, we can see the responsiveness of the Northeast Asia pool. The Northeast Asia pool is slow specifically for monitors in South America, the Middle East, and Southern Africa, but fast for monitors that are probing closer to the origin pool. Unfortunately, this means users routed to this pool from countries like Paraguay are seeing high request latency. High page load times have many unfortunate consequences: a higher visitor bounce rate, decreased visitor satisfaction, and a lower search engine ranking. To avoid these repercussions, a site administrator could consider adding a new origin pool in a region closer to the underserved regions. In Figure 3, we can see the result of adding a new origin pool in Eastern North America: the number of locations where the domain was found to be unhealthy drops to zero, and the number of slow locations is cut by more than 50%.

Introducing Load Balancing Analytics
Figure 3

Tied with the traffic flow metrics from the Overview page, the latency map arms users with insights to optimize their internal systems, reduce their costs, and increase their application availability.

GraphQL Analytics API

Behind the scenes, Load Balancing Analytics is powered by the GraphQL Analytics API. As you'll learn later this week, GraphQL provides many benefits to us at Cloudflare. Customers now only need to learn a single API format that will allow them to extract only the data they require. For internal development, GraphQL eliminates the need for customized analytics APIs for each service, reduces query cost by increasing cache hits, and reduces developer fatigue by using a straightforward query language with standardized input and output formats. Very soon, all Load Balancing customers on paid plans will be given the opportunity to extract insights from the GraphQL API. Let's walk through some examples of how you can utilize the GraphQL API to understand your Load Balancing logs.

Suppose you want to understand the number of requests the pools for a load balancer are seeing from the different locations in Cloudflare’s global network. The query in Figure 4 counts the number of unique (location, pool ID) combinations every fifteen minutes over the course of a week.

Introducing Load Balancing Analytics
Figure 4

For context, our example load balancer, lb.example.com, utilizes dynamic steering. Dynamic steering directs requests to the most responsive, available, origin pool, which is often the closest. It does so using a weighted round-trip time measurement. Let’s try to understand why all traffic from Singapore (SIN) is being steered to our pool in Northeast Asia (asia-ne). We can run the query in Figure 5. This query shows us that the asia-ne pool has an avgRttMs value of 67ms, whereas the other two pools have avgRttMs values that exceed 150ms. The lower avgRttMs value explains why traffic in Singapore is being routed to the asia-ne pool.

Introducing Load Balancing Analytics
Figure 5

Notice how the query in Figure 4 uses the loadBalancingRequestsGroups schema, whereas the query in Figure 5 uses the loadBalancingRequests schema. loadBalancingRequestsGroups queries aggregate data over the requested query interval, whereas loadBalancingRequests provides granular information on individual requests. For those ready to get started, Cloudflare has written a helpful guide. The GraphQL website is also a great resource. We recommend you use an IDE like GraphiQL to make your queries. GraphiQL embeds the schema documentation into the IDE, autocompletes, saves your queries, and manages your custom headers, all of which help make the developer experience smoother.

Conclusion

Now that the Load Balancing Analytics solution is live and available to all Pro, Business, and Enterprise customers, we're excited for you to start using it! We've attached a survey to the Traffic overview page, and we'd love to hear your feedback.

Announcing deeper insights and new monitoring capabilities from Cloudflare Analytics

Post Syndicated from Filipp Nisenzoun original https://blog.cloudflare.com/announcing-deeper-insights-and-new-monitoring-capabilities/

Announcing deeper insights and new monitoring capabilities from Cloudflare Analytics

Announcing deeper insights and new monitoring capabilities from Cloudflare Analytics

This week we’re excited to announce a number of new products and features that provide deeper security and reliability insights, “proactive” analytics when there’s a problem, and more powerful ways to explore your data.

If you’ve been a user or follower of Cloudflare for a little while, you might have noticed that we take pride in turning technical challenges into easy solutions. Flip a switch or run a few API commands, and the attack you’re facing is now under control or your site is now 20% faster. However, this ease of use is even more helpful if it’s complemented by analytics. Before you make a change, you want to be sure that you understand your current situation. After the change, you want to confirm that it worked as intended, ideally as fast as possible.

Because of the front-line position of Cloudflare's network, we can provide comprehensive metrics regarding both your traffic and the security and performance of your Internet property. And best of all, there's nothing to set up or enable. Cloudflare Analytics is automatically available to all Cloudflare users and doesn't rely on JavaScript trackers, meaning that our metrics include traffic from APIs and bots and are not skewed by ad blockers.

Here’s a sneak peek of the product launches. Look out for individual blog posts this week for more details on each of these announcements.

  • Product Analytics:
    • Today, we’re making Firewall Analytics available to Business and Pro plans, so that more customers understand how well Cloudflare mitigates attacks and handles malicious traffic. And we’re highlighting some new metrics, such as the rate of solved captchas (useful for Bot Management), and features, such as customizable reports to facilitate sharing and archiving attack information.
    • We’re introducing Load Balancing Analytics, which shows traffic flows by load balancer, pool, origin, and region, and helps explain why a particular origin was selected to receive traffic.
  • Monitoring:
    • We’re announcing tools to help you monitor your origin either actively or passively and automatically reroute your traffic to a different server when needed. Because Cloudflare sits between your end users and your origin, we can spot problems with your servers without the use of external monitoring services.
  • Data tools:
    • The product analytics we’ll be featuring this week use a new API behind the scenes. We’re making this API generally available, allowing you to easily build custom dashboards and explore all of your Cloudflare data the same way we do, so you can easily gain insights and identify and debug issues.
  • Account Analytics:
    • We’re releasing (in beta) a new dashboard that shows aggregated information for all of the domains under your account, allowing you to know what’s happening at a glance.

We’re excited to tell you about all of these new products in this week’s posts and would love to hear your thoughts. If you’re not already subscribing to the blog, sign up now to receive daily updates in your inbox.

Cloudflare Partners: A New Program with New Partners

Post Syndicated from Dan Hollinger original https://blog.cloudflare.com/cloudflare-partners-a-new-program-with-new-partners/

Cloudflare Partners: A New Program with New Partners

Many overlook a critical portion of the language in Cloudflare’s mission: “to help build a better Internet.” From the beginning, we knew a mission this bold, an undertaking of this magnitude, couldn’t be done alone. We could only help. To ultimately build a better Internet, it would take a diverse and engaged ecosystem of technologies, customers, partners, and end-users. Fortunately, we’ve been able to work with amazing partners as we’ve grown, and we are eager to announce new, specific programs to grow our ecosystem with an increasingly diverse set of partners.

Today, we’re excited to announce the latest iteration of our partnership program for solutions partners. These categories encompass resellers and referral partners, OEM partners, and the new partner services program. Over the past few years, we’ve grown and learned from some amazing partnerships, and want to bring those best practices to our latest partners at scale—to help them grow their business with Cloudflare’s global network.

Cloudflare Partners: A New Program with New Partners
Cloudflare Partner Tiers

Partner Program for Solution Partners

Every partner program out there has tiers, and Cloudflare’s program is no exception. However, our tiering was built to help our partners ramp up, accelerate and move fast. As Matt Harrell highlighted, we want the process to be as seamless as possible, to help partners find the level of engagement that works best for them‚ with world-class enablement paths and best-in-class revenue share models—built-in from the beginning.

World-Class Enablement

Cloudflare offers complimentary training and enablement to all partners. From self-serve paths, to partner-focused webinars, and instructor-based courses, and certification—we want to ensure our partners can learn and develop Cloudflare and product expertise, to make them as effective as possible when utilizing our massive global network.

Driving Business Value

We want our partners to grow and succeed. From self-serve resellers to our most custom integration, we want to make it frictionless to build a profitable business on Cloudflare. From our tenant API system to dedicated account teams, we’re ready to help you go-to-market with solutions that help end-customers. This includes opportunities to co-market, develop target accounts, and directly partner, to succeed with strategic accounts.

Cloudflare recognizes that, in order to help build a better Internet, we need successful partners—and our latest program is built to help partners build and grow profitable business models around Cloudflare’s solutions.

Partner Services Program – SIs, MSPs, MSSPs, PSOs

For the first time, we are expanding our program to also include service providers that want to develop and grow profitable services practices around Cloudflare’s solutions.

Our customers face some of the most complex challenges on the Internet. From those challenges, we’ve already seen some amazing opportunities for service providers to create value, grow their business, and make an impact. From customers migrating to the public, hybrid or multi-cloud for the first time, to entirely re-writing applications using Cloudflare Workers®, the need for expertise has increased across our entire customer base. In early pilots, we’ve seen four major categories in building a successful service practice around Cloudflare:

  • Network Digital Transformations – Help customers migrate and modernize their network solution. Cloudflare is the only cloud-native network to give Enterprise-grade control and visibility into on-prem, hybrid, and multi-cloud architectures.
  • Serverless Architecture Development – Provide serverless application development services, thought leadership, and technical consulting around leveraging Cloudflare Workers and Apps.
  • Managed Security & Insights – Enable CISO and IT leaders to obtain single pane of glass of reporting and policy management across all Internet-facing application with Cloudflare’s Security solutions.
  • Managed Performance & Reliability – Keep customer’s High-Availability applications running quickly and efficiently with Cloudflare’s Global Load Balancing, Smart Routing, and Anycast DNS, which allows performance consulting, traffic analysis, and application monitoring.

As we expand this program, we are looking for audacious service providers and system integrators that want to help us build a better Internet for our mutual customers. Cloudflare can be an essential lynchpin to simplify and accelerate digital transformations. We imagine a future where massive applications run within 10ms of 90% of the global population, and where a single-pane solution provides security, performance, and reliability for mission-critical applications—running across multiple clouds. Cloudflare needs help from amazing global integrators and service partners to help realize this future.

If you are interested in learning more about becoming a service partner and growing your business with Cloudflare, please reach out to [email protected] or explore cloudflare.com/partners/services

Just Getting Started

Metcalfe's law states that a network is only as powerful as the number of nodes within it. And within the global Cloudflare network, we want as many partner nodes as possible—from agencies to systems integrators, managed security providers, Enterprise resellers, and OEMs. A diverse ecosystem of partners is essential to our mission of helping to build a better Internet, together. We are dedicated to the success of our partners, and we will continue to iterate and develop our programs to make sure our partners can grow and develop on Cloudflare's global network. Our commitment moving forward is that Cloudflare will be the easiest and most valuable solution for channel partners to sell and support globally.



MultiCloud… flare

Post Syndicated from Zack Bloom original https://blog.cloudflare.com/multicloudflare/

If you want to start an intense conversation in the halls of Cloudflare, try describing us as a “CDN”. CDNs don’t generally provide you with Load Balancing, they don’t allow you to deploy Serverless Applications, and they certainly don’t get installed onto your phone. One of the costs of that confusion is many people don’t realize everything Cloudflare can do for people who want to operate in multiple public clouds, or want to operate in both the cloud and on their own hardware.

Load Balancing

Cloudflare has countless servers located in 180 data centers around the world. Each one is capable of acting as a Layer 7 load balancer, directing incoming traffic between origins wherever they may be. You could, for example, add load balancing between a set of machines you have in AWS’ EC2, and another set you keep in Google Cloud.

This load balancing isn’t just round-robining traffic. It supports weighting to allow you to control how much traffic goes to each cluster. It supports latency-based routing to automatically route traffic to the cluster which is closer (so adding geographic distribution can be as simple as spinning up machines). It even supports health checks, allowing it to automatically direct traffic to the cloud which is currently healthy.

Most importantly, it doesn't run in any of the providers' clouds and isn't dependent on them to function properly. Even better, since the load balancing runs near virtually every Internet user around the world, it doesn't come at any performance cost. (Using our Argo technology, performance often gets better!)

Argo Tunnel

One of the hardest components to managing a multi-cloud deployment is networking. Each provider has their own method of defining networks and firewalls, and even tools which can deploy clusters across multiple clouds often can’t quite manage to get the networking configuration to work in the same way. The task of setting it up can often be a trial-and-error game where the final config is best never touched again, leaving ‘going multi-cloud’ as a failed experiment within organizations.

At Cloudflare we have a technology called Argo Tunnel which flips networking on its head. Rather than opening ports and directing incoming traffic, each of your virtual machines (or k8s pods) makes outbound tunnels to the nearest Cloudflare PoPs. All of your Internet traffic then flows over those tunnels. You keep all your ports closed to inbound traffic, and never have to think about Internet networking again.

What's so powerful about this configuration is it makes it trivial to spin up machines in new locations. Want a dozen machines in Australia? As long as they start the Argo Tunnel daemon, they will start receiving traffic. Don't need them any more? Shut them down and the traffic will be routed elsewhere. And, of course, none of this relies on any one public cloud provider, making it reliable even if they should have issues.

Argo Tunnel makes it trivial to add machines in new clouds, or to keep machines on-prem even as you start shifting workloads into the Cloud.

Access Control

One thing you’ll realize about using Argo Tunnel is you now have secure tunnels which connect your infrastructure with Cloudflare’s network. Once traffic reaches that network, it doesn’t necessarily have to flow directly to your machines. It could, for example, have access control applied where we use your Identity Provider (like Okta or Active Directory) to decide who should be able to access what. Rather than wrestling with VPCs and VPN configurations, you can move to a zero-trust model where you use policies to decide exactly who can access what on a per-request basis.

In fact, you can now do this with SSH as well! You can manage all your user accounts in a single place and control with precision who can access which piece of infrastructure, irrespective of which cloud it’s in.

Our Reliability

No computer system is perfect, and ours is no exception. We make mistakes, have bugs in our code, and deal with the pain of operating at the scale of the Internet every day. One great innovation in the recent history of computers, however, is the idea that it is possible to build a reliable system on top of many individually unreliable components.

Each of Cloudflare's PoPs is designed to function without communication or support from others, or from a central data center. That alone greatly increases our tolerance for network partitions, and moves us from maintaining a single system to something closer to maintaining 180 independent clouds, any of which can serve all traffic.

We are also a system built on anycast, which allows us to tap into the fundamental reliability of the Internet. The Internet uses a protocol called BGP, in which each system that would like to receive traffic for a particular IP address 'advertises' it. Each router then decides where to forward traffic based on which advertiser of an address is the closest. We advertise all of our IP addresses in every one of our data centers. If a data center goes down, it stops advertising BGP routes, and the very same packets which would have been destined for it arrive in another PoP seamlessly.

Ultimately we are trying to help build a better Internet. We don't believe that the Internet should be built on the back of a single provider. Many of the services provided by these cloud providers are simply too complex to be as reliable as the Internet demands.

True reliability and cost control both require existing on multiple clouds. It is clear that the tools which the Internet of the 80s and 90s gave us may be insufficient to move into that future. With a smarter network we can do more, better.

Designing resilient systems beyond retries (Part 2): Bulkheading, Load Balancing, and Fallbacks

Post Syndicated from Grab Tech original https://engineering.grab.com/beyond-retries-part-2

This post is the second of a three-part series on going beyond retries to improve system resiliency. In the previous post, we discussed rate-limiting as a strategy to improve resiliency. In this article, we will cover three more techniques: bulkheading, load balancing, and fallbacks.

Introducing Bulkheading (Isolation)

Bulkheading is a fundamental pattern which underpins many other resiliency techniques, especially where microservices are concerned, so it’s worth introducing first. The term actually comes from an ancient technique in ship building, where a ship’s hull would be partitioned into several watertight compartments. If one of the compartments has a leak, then the water fills just that compartment and is contained, rather than flooding the entire ship. We can apply this principle to software applications and microservices: by isolating failures to individual components, we can prevent a single failure from cascading and bringing down the entire system.

Bulkheads also help to prevent single points of failure by containing the impact of any failure, so services can maintain some level of service.

Level of bulkheads

It is important to note that bulkheads can be applied at multiple levels in software architecture. The two highest levels of bulkhead are at the infrastructure level. The first is hardware isolation: in a cloud environment, this usually means isolating regions or availability zones. The second is isolating the operating system, which has become a widespread technique with the popularity of virtual machines and now containerization. Previously, it was common for multiple applications to run on a single (very powerful) dedicated server. Unfortunately, this meant that a rogue application could wreak havoc on the entire system in a number of ways, from filling the disk with logs to consuming memory or other resources.

Isolation can be achieved by applying bulkheading at multiple levels

This article focuses on resiliency from the application perspective, so below the system level is process-level isolation. In practical terms, this isolation prevents an application crash from affecting multiple system components. By moving those components into separate processes (or microservices), certain classes of application-level failures are prevented from causing cascading failure.

At the lowest level, and perhaps the most common form of bulkheading to software engineers, are the concepts of connection pooling and thread pools. While these techniques are commonly employed for performance reasons (reusing resources is cheaper than acquiring new ones), they also help to put a finite limit on the number of connections or concurrent threads that an operation is allowed to consume. This ensures that if the load of a particular operation suddenly increases unexpectedly (such as due to external load or downstream latency), the impact is contained to only a partial failure.
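
As a minimal illustration of pool-style bulkheading, here is a Go sketch using a buffered channel as a semaphore; the limit of 10 concurrent calls is an arbitrary choice:

package main

import (
	"errors"
	"fmt"
)

// A buffered channel acts as a semaphore: at most 10 calls to the downstream
// dependency may be in flight, so a slow dependency cannot soak up every
// goroutine in the process.
var sem = make(chan struct{}, 10)

func callDownstream(do func() error) error {
	select {
	case sem <- struct{}{}: // claim a slot
		defer func() { <-sem }() // release it when done
		return do()
	default: // bulkhead full: fail fast instead of queueing unboundedly
		return errors.New("bulkhead full, request rejected")
	}
}

func main() {
	fmt.Println(callDownstream(func() error { return nil }))
}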

Bulkheading support in the Hystrix library

The Hystrix library for Go supports a form of bulkheading through its MaxConcurrentRequests parameter. This is conveniently tied to the circuit name, meaning that different levels of isolation can be achieved by choosing an appropriate circuit name. A good rule of thumb is to use a different circuit name for each operation or API call. This ensures that if just one particular endpoint of a remote service is failing, the other circuits are still free to be used for the remaining healthy endpoints, achieving failure isolation.
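
As a short example, here is roughly what that looks like with the hystrix-go library; the circuit name, timeout, and concurrency limit are illustrative values, and callGetInvoice is a stand-in for a real remote call:

package main

import (
	"errors"
	"fmt"

	"github.com/afex/hystrix-go/hystrix"
)

func main() {
	// One circuit per remote operation: if this endpoint saturates its 20
	// slots, other circuits (other endpoints) still have capacity.
	hystrix.ConfigureCommand("billing.GetInvoice", hystrix.CommandConfig{
		Timeout:               250, // milliseconds
		MaxConcurrentRequests: 20,  // the bulkhead
	})

	err := hystrix.Do("billing.GetInvoice", func() error {
		return callGetInvoice() // the real remote call would go here
	}, func(err error) error {
		// Invoked when the bulkhead is full, the call times out, or the circuit is open.
		return errors.New("invoice temporarily unavailable: " + err.Error())
	})
	fmt.Println(err)
}

// callGetInvoice is a stand-in for the real RPC.
func callGetInvoice() error { return nil }

Because each operation gets its own circuit, exhausting the 20 slots for this call leaves the capacity of every other circuit untouched.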

Load balancing

Load balancing is where network traffic from a client may be served by one of many backend servers. You can think of load balancers as traffic cops who direct traffic on the road to prevent congestion and overload. Assuming the traffic is distributed evenly across the backend servers, this effectively increases the computing power of the backend. Adding capacity like this is a common way to handle an increase in load from the clients, such as when a website becomes more popular.

Load balancers almost always provide high availability for the application as well. With just a single backend server, that server is a ‘single point of failure’: if it is ever unavailable, there are no servers remaining to serve the clients. With a pool of backend servers behind a load balancer, the impact is reduced: if there are 4 backend servers and only 1 is unavailable, evenly distributed requests would fail only 25% of the time instead of 100%. This is already an improvement, but modern load balancers are more sophisticated.

Usually, load balancers include some form of health check: a mechanism that monitors whether servers in the pool are ‘healthy’, i.e. able to serve requests. Implementations vary, from active checks such as periodic ‘pings’ to passive monitoring of responses, with failing backend instances removed from the pool.
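
For illustration, here is a minimal sketch of an active health check in Go; the /healthz path, plain-HTTP scheme, and two-second timeout are assumptions, not a standard:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// healthyBackends probes each backend and returns only those that
// respond with 200 OK within the timeout.
func healthyBackends(addrs []string) []string {
	client := &http.Client{Timeout: 2 * time.Second}
	var healthy []string
	for _, addr := range addrs {
		resp, err := client.Get("http://" + addr + "/healthz")
		if err != nil {
			continue // unreachable: leave it out of rotation
		}
		resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			healthy = append(healthy, addr)
		}
	}
	return healthy
}

func main() {
	fmt.Println(healthyBackends([]string{"10.0.0.1:8080", "10.0.0.2:8080"}))
}
```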

As with rate-limiting, there are many strategies for load balancing to consider.

There are four main types of load balancer to choose from, each with its own pros and cons:

  • Proxy. This is perhaps the most well-known form of load balancer, and is the method used by Amazon’s Elastic Load Balancer. The proxy sits on the boundary between the backend servers and the public clients, and therefore also doubles as a security layer: the clients do not know about or have direct access to the backend servers. The proxy handles all the logic for load balancing and health checking. It is a convenient and popular approach because it requires no special integration with the client or server code. Proxies also typically perform ‘SSL termination’, decrypting incoming HTTPS traffic and using HTTP to communicate with the backend servers.
  • Client-side. This is where the client performs all of the load balancing itself, often using a dedicated library built for the purpose. Compared with the proxy, it is more performant because it avoids an extra network ‘hop’. However, there is a significant cost in developing and maintaining the code, which is necessarily complex, and any bugs have serious consequences.
  • Lookaside. This is a hybrid approach where the majority of the load-balancing logic is handled by a dedicated service, but it does not proxy; the client still makes direct connections to the backend. This reduces the burden on the client-side library while maintaining high performance; however, the load-balancing service becomes another potential point of failure.
  • Service mesh with sidecar. A service mesh is an all-in-one solution for service communication, with many popular open-source products available. They usually include a sidecar, which is a proxy that sits on the same server as the application to route network traffic. Like the traditional proxy load balancer, this handles many concerns of load-balancing for free. However, there is still an extra network hop, and there can be a significant development cost to integrate with existing systems for logging, reporting and so on, so this must be weighed against building a client-side solution in-house.
Comparison of load-balancer architectures

Grab’s load-balancing implementation

At Grab, we have built our own internal client-side solution called CSDP, which uses the distributed key-value store etcd as its backend store.
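
CSDP itself is internal, so as a rough sketch of the general client-side pattern (not CSDP’s actual code), a round-robin balancer over a dynamically updated server list might look like this; in practice the list would be kept fresh by watching a registry such as etcd:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Balancer round-robins over a dynamic set of backend addresses.
type Balancer struct {
	mu    sync.RWMutex
	addrs []string
	next  uint64
}

// Update replaces the backend list, e.g. after a registry watch event.
func (b *Balancer) Update(addrs []string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.addrs = addrs
}

// Pick returns the next backend in round-robin order.
func (b *Balancer) Pick() (string, error) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	if len(b.addrs) == 0 {
		return "", fmt.Errorf("no healthy backends")
	}
	n := atomic.AddUint64(&b.next, 1)
	return b.addrs[(n-1)%uint64(len(b.addrs))], nil
}

func main() {
	lb := &Balancer{}
	lb.Update([]string{"10.0.0.1:8080", "10.0.0.2:8080"})
	for i := 0; i < 4; i++ {
		addr, _ := lb.Pick()
		fmt.Println(addr) // alternates between the two backends
	}
}
```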

Fallbacks

There are scenarios when simply retrying a failed API call doesn’t work. If the remote server is completely down or only returning errors, no amount of retrying is going to help; the failure is unrecoverable. When recovery isn’t an option, mitigation is an alternative. This is related to the concept of graceful degradation: sometimes it is preferable to return a less optimal response than to fail completely, especially for user-facing applications where user experience is important.

One such mitigation strategy is fallbacks. This is a broad topic with many different sub-strategies, but here are a few of the most common:

Fail silently

Starting with the easiest to implement, one basic fallback strategy is to fail silently. This means returning an empty or null response when an error is encountered, as if the call had succeeded. If the data being requested is not critical to the application’s functionality, this can be worth considering: a missing part of a UI is less noticeable than an error page! For example, UI bubbles showing unread notifications are a common feature. If the service providing the notifications is failing and the bubble shows 0 instead of N notifications, the user’s experience is unlikely to be significantly affected.
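
Here is a minimal sketch of failing silently with hystrix-go; the circuit name, Notification type, and the always-failing remote call are illustrative stand-ins:

```go
package main

import (
	"errors"
	"fmt"

	"github.com/afex/hystrix-go/hystrix"
)

type Notification struct{ Text string }

// listNotifications stands in for a call to a remote notification
// service; here it always fails, to demonstrate the fallback.
func listNotifications(userID string) ([]Notification, error) {
	return nil, errors.New("service unavailable")
}

// getNotifications fails silently: on error the fallback returns an
// empty list, so the UI shows zero notifications instead of an error.
func getNotifications(userID string) []Notification {
	var out []Notification
	hystrix.Do("notification-service.list", func() error {
		res, err := listNotifications(userID)
		if err != nil {
			return err
		}
		out = res
		return nil
	}, func(err error) error {
		out = nil // swallow the error: behave as if there are no notifications
		return nil
	})
	return out
}

func main() {
	fmt.Println(len(getNotifications("user-123"))) // prints 0
}
```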

Local computation

A second fallback strategy when a downstream dependency is failing is to compute the value locally instead. This could mean either returning a default (static) value, or using a simple formula to compute the response. For example, a marketplace application might have a service to calculate shipping costs. If it is unavailable, then using a default price might be acceptable. Or even $0 – users are unlikely to complain about errors that benefit them, and it’s better than losing business!
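
A short sketch of local computation as a fallback; the pricing call and the flat-rate formula are invented for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// quoteShipping stands in for a remote pricing service call;
// here it always fails, to demonstrate the fallback.
func quoteShipping(weightKg float64) (float64, error) {
	return 0, errors.New("pricing service unavailable")
}

// shippingFee falls back to a simple local formula when the remote
// pricing service is unavailable; the rates are assumed defaults.
func shippingFee(weightKg float64) float64 {
	fee, err := quoteShipping(weightKg)
	if err != nil {
		return 2.0 + 0.5*weightKg // local fallback computation
	}
	return fee
}

func main() {
	fmt.Printf("fee: $%.2f\n", shippingFee(3.0)) // prints the fallback fee
}
```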

Cached values

Similarly, cached values are often used as fallbacks. If the service isn’t available to calculate the most up-to-date value, returning a stale response might be better than returning nothing. If an application already caches the value with a short expiration to optimize performance, the same cache can double as a fallback by setting two expiration times: one for normal circumstances, and a longer one used when the service providing the response has failed.
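
One possible shape for such a two-expiry fallback cache, sketched in Go (the expiry windows and values are illustrative):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type entry struct {
	value   string
	written time.Time
}

// FallbackCache serves entries as "fresh" within a short TTL, and as
// stale-but-usable fallbacks within a longer one.
type FallbackCache struct {
	mu     sync.RWMutex
	data   map[string]entry
	fresh  time.Duration // normal expiration
	maxAge time.Duration // expiration when the upstream service is failing
}

func (c *FallbackCache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = entry{value: value, written: time.Now()}
}

func (c *FallbackCache) Get(key string, upstreamHealthy bool) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.data[key]
	if !ok {
		return "", false
	}
	age := time.Since(e.written)
	if age < c.fresh {
		return e.value, true // fresh hit: normal operation
	}
	if !upstreamHealthy && age < c.maxAge {
		return e.value, true // stale, but better than nothing
	}
	return "", false
}

func main() {
	c := &FallbackCache{data: map[string]entry{}, fresh: time.Minute, maxAge: time.Hour}
	c.Set("price", "42")
	v, ok := c.Get("price", false) // upstream failing: stale value still served
	fmt.Println(v, ok)
}
```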

Backup service

Finally, if the response is too complex to compute locally or if major functionality of the application is required to have a fallback, then an entirely new service can act as a fallback; a backup service. Such a service is a big investment, so to make it worthwhile some trade-offs must be accepted. The backup service should be considerably simpler than the service it is intended to replace; if it is too complex then it will require constant testing and maintenance, not to mention documentation and training to make sure it is well understood within the engineering team. Also, a complex system is more likely to fail when activated. Usually such systems will have very few or no dependencies, and certainly should not depend on any parts of the original system, since they could have failed, rendering the backup system useless.

Grab’s fallback implementation

At Grab, we make use of various fallback strategies in our services. For example, our microservice framework Grab-Kit has built-in support for returning cached values when a downstream service is unresponsive. We’ve even built a backup service to replicate our core functionality, so we can continue to serve customers despite severe technical difficulties!

Up next, Architecture Patterns and Chaos Engineering…

We’ve covered various techniques in designing reliable and resilient systems in the previous articles. I hope you found them useful. Comments are always welcome.

In our next post, we will look at ways to prevent and reduce failures through architecture patterns and testing.

Please stay tuned!

AWS Achieves Spain’s ENS High Certification Across 29 Services

Post Syndicated from Oliver Bell original https://aws.amazon.com/blogs/security/aws-achieves-spains-ens-high-certification-across-29-services/

AWS has achieved Spain’s Esquema Nacional de Seguridad (ENS) High certification across 29 services. To successfully achieve the ENS High Standard, BDO España conducted an independent audit and attested that AWS meets confidentiality, integrity, and availability standards. This provides the assurance needed by Spanish Public Sector organizations wanting to build secure applications and services on AWS.

The National Security Framework, regulated under Royal Decree 3/2010, was developed through close collaboration among ENAC (Entidad Nacional de Acreditación), the Ministry of Finance and Public Administration, the CCN (National Cryptologic Centre), and other administrative bodies.

The following AWS Services are ENS High accredited across our Dublin and Frankfurt Regions:

  • Amazon API Gateway
  • Amazon DynamoDB
  • Amazon Elastic Container Service
  • Amazon Elastic Block Store
  • Amazon Elastic Compute Cloud
  • Amazon Elastic File System
  • Amazon Elastic MapReduce
  • Amazon ElastiCache
  • Amazon Glacier
  • Amazon Redshift
  • Amazon Relational Database Service
  • Amazon Simple Queue Service
  • Amazon Simple Storage Service
  • Amazon Simple Workflow Service
  • Amazon Virtual Private Cloud
  • Amazon WorkSpaces
  • AWS CloudFormation
  • AWS CloudTrail
  • AWS Config
  • AWS Database Migration Service
  • AWS Direct Connect
  • AWS Directory Service
  • AWS Elastic Beanstalk
  • AWS Key Management Service
  • AWS Lambda
  • AWS Snowball
  • AWS Storage Gateway
  • Elastic Load Balancing
  • VM Import/Export

New AWS Auto Scaling – Unified Scaling For Your Cloud Applications

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-auto-scaling-unified-scaling-for-your-cloud-applications/

I’ve been talking about scalability for servers and other cloud resources for a very long time! Back in 2006, I wrote “This is the new world of scalable, on-demand web services. Pay for what you need and use, and not a byte more.” Shortly after we launched Amazon Elastic Compute Cloud (EC2), we made it easy for you to do this with the simultaneous launch of Elastic Load Balancing, EC2 Auto Scaling, and Amazon CloudWatch. Since then we have added Auto Scaling to other AWS services including ECS, Spot Fleets, DynamoDB, Aurora, AppStream 2.0, and EMR. We have also added features such as target tracking to make it easier for you to scale based on the metric that is most appropriate for your application.

Introducing AWS Auto Scaling
Today we are making it easier for you to use the Auto Scaling features of multiple AWS services from a single user interface with the introduction of AWS Auto Scaling. This new service unifies and builds on our existing, service-specific, scaling features. It operates on any desired EC2 Auto Scaling groups, EC2 Spot Fleets, ECS tasks, DynamoDB tables, DynamoDB Global Secondary Indexes, and Aurora Replicas that are part of your application, as described by an AWS CloudFormation stack or in AWS Elastic Beanstalk (we’re also exploring some other ways to flag a set of resources as an application for use with AWS Auto Scaling).

You no longer need to set up alarms and scaling actions for each resource and each service. Instead, you simply point AWS Auto Scaling at your application and select the services and resources of interest. Then you select the desired scaling option for each one, and AWS Auto Scaling will do the rest, helping you to discover the scalable resources and then creating a scaling plan that addresses the resources of interest.

If you have tried to use any of our Auto Scaling options in the past, you undoubtedly understand the trade-offs involved in choosing scaling thresholds. AWS Auto Scaling gives you a variety of scaling options: You can optimize for availability, keeping plenty of resources in reserve in order to meet sudden spikes in demand. You can optimize for costs, running close to the line and accepting the possibility that you will tax your resources if that spike arrives. Alternatively, you can aim for the middle, with a generous but not excessive level of spare capacity. In addition to optimizing for availability, cost, or a blend of both, you can also set a custom scaling threshold. In each case, AWS Auto Scaling will create scaling policies on your behalf, including appropriate upper and lower bounds for each resource.

AWS Auto Scaling in Action
I will use AWS Auto Scaling on a simple CloudFormation stack consisting of an Auto Scaling group of EC2 instances and a pair of DynamoDB tables. I start by removing the existing Scaling Policies from my Auto Scaling group:

Then I open up the new Auto Scaling Console and select the stack:

Behind the scenes, Elastic Beanstalk applications are always launched via a CloudFormation stack. In the screen shot above, awseb-e-sdwttqizbp-stack is an Elastic Beanstalk application that I launched.

I can click on any stack to learn more about it before proceeding:

I select the desired stack and click on Next to proceed. Then I enter a name for my scaling plan and choose the resources that I’d like it to include:

I choose the scaling strategy for each type of resource:

After I have selected the desired strategies, I click Next to proceed. Then I review the proposed scaling plan, and click Create scaling plan to move ahead:

The scaling plan is created and in effect within a few minutes:

I can click on the plan to learn more:

I can also inspect each scaling policy:

I tested my new policy by applying a load to the initial EC2 instance, and watched the scale out activity take place:

I also took a look at the CloudWatch metrics for the EC2 Auto Scaling group:

Available Now
We are launching AWS Auto Scaling today in the US East (Northern Virginia), US East (Ohio), US West (Oregon), EU (Ireland), and Asia Pacific (Singapore) Regions, with more to follow. There’s no charge for AWS Auto Scaling; you pay only for the CloudWatch Alarms that it creates and any AWS resources that you consume.

As is often the case with our new services, this is just the first step on what we hope to be a long and interesting journey! We have a long roadmap, and we’ll be adding new features and options throughout 2018 in response to your feedback.

Jeff;

Scale Your Web Application — One Step at a Time

Post Syndicated from Saurabh Shrivastava original https://aws.amazon.com/blogs/architecture/scale-your-web-application-one-step-at-a-time/

I often encounter people experiencing frustration as they attempt to scale their e-commerce or WordPress site—particularly around the cost and complexity related to scaling. When I talk to customers about their scaling plans, they often mention phrases such as horizontal scaling and microservices, but usually people aren’t sure about how to dive in and effectively scale their sites.

Now let’s talk about different scaling options. For instance, if your current workload is in a traditional data center, you can leverage the cloud for your on-premises solution. This way you can scale to achieve greater efficiency at lower cost. It’s not necessary to set up a whole powerhouse to light a few bulbs. If your workload is already in the cloud, you can use one of the available out-of-the-box options.

Designing your API in microservices and adding horizontal scaling might seem like the best choice, unless your web application is already running in an on-premises environment and you’ll need to quickly scale it because of unexpected large spikes in web traffic.

So how do you handle this situation? Take things one step at a time when scaling, and you may find that horizontal scaling isn’t the right choice after all.

For example, assume you have a tech news website where you did an early-look review of an upcoming (and highly anticipated) smartphone launch, which went viral. The review, a blog post on your website, includes both video and pictures. Comments are enabled for the post and readers can also rate it. If your website is hosted on a traditional Linux server with a LAMP stack, you may find yourself with immediate scaling problems.

Let’s dig into the details of the current scenario:

  • Where are images and videos stored?
  • How many read/write requests are received per second? Per minute?
  • What is the level of security required?
  • Are these synchronous or asynchronous requests?

We’ll also want to consider the following if your website has a transactional load like e-commerce or banking:

  • How is the website handling sessions?
  • Do you have any compliance requirements, like the Payment Card Industry Data Security Standard (PCI DSS), if your website is using its own payment gateway?
  • How are you recording customer behavior data and fulfilling your analytics needs?
  • What are your load balancing considerations (scaling, caching, session maintenance, etc.)?

So, if we take this one step at a time:

Step 1: Ease server load. We need to quickly handle spikes in traffic generated by activity on the blog post, so let’s reduce server load by moving images and videos to a third-party content delivery network (CDN). AWS provides Amazon CloudFront as a CDN solution, which is highly scalable with built-in security to verify origin access identity and handle any DDoS attacks. CloudFront can direct traffic to your on-premises or cloud-hosted server with its 113 Points of Presence (102 Edge Locations and 11 Regional Edge Caches) in 56 cities across 24 countries, which provides efficient caching.
Step 2: Reduce read load by adding more read replicas. MySQL provides mirror replication for databases, and Oracle has its own replication plug-in. AWS RDS provides up to five read replicas, which can span the region, and Amazon Aurora can have up to 15 read replicas with Amazon Aurora Auto Scaling support. If a workload is highly variable, you should consider Amazon Aurora Serverless to achieve high efficiency and reduced cost. While most mirror technologies do asynchronous replication, AWS RDS can provide synchronous multi-AZ replication, which is good for disaster recovery but not for scalability. Asynchronous replication to a mirror instance means replicated data can sometimes be stale if network bandwidth is low, so you need to plan and design your application accordingly.

I recommend that you always use a read replica for any reporting needs, and try to move non-critical GET services to a read replica to reduce the load on the master database. In this case, comments associated with a blog post can be fetched from a read replica (as that can tolerate some delay) in case there is any lag with asynchronous replication.

Step 3: Reduce write requests. This can be achieved by introducing a queue to process messages asynchronously. Amazon Simple Queue Service (Amazon SQS) is a highly scalable queue, which can handle any kind of work-message load. You can process data, like ratings and reviews, or calculate a Deal Quality Score (DQS) using batch processing via an SQS queue. If your workload is in AWS, I recommend using a job-observer pattern by setting up Auto Scaling to automatically increase or decrease the number of batch servers, using the number of SQS messages, with Amazon CloudWatch, as the trigger. For on-premises workloads, you can use the SQS SDK to create an Amazon SQS queue that holds messages until they’re processed by your stack. Or you can use Amazon SNS to fan out your message processing in parallel for different purposes, like adding a watermark to an image, generating a thumbnail, etc.
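
As a hedged sketch with the AWS SDK for Go (the region, queue URL, and message body are placeholders), enqueuing a rating for later batch processing might look like this:

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/sqs"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{
		Region: aws.String("us-east-1"), // placeholder region
	}))
	svc := sqs.New(sess)

	// Placeholder queue URL; substitute your own queue here.
	queueURL := "https://sqs.us-east-1.amazonaws.com/123456789012/reviews"

	// Producer: enqueue a rating instead of writing to the database inline.
	_, err := svc.SendMessage(&sqs.SendMessageInput{
		QueueUrl:    aws.String(queueURL),
		MessageBody: aws.String(`{"postId": 42, "rating": 5}`),
	})
	if err != nil {
		fmt.Println("send failed:", err)
		return
	}
	fmt.Println("rating queued for batch processing")
}
```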

Step 4: Introduce a more robust caching engine. You can use Amazon ElastiCache for Memcached or Redis to reduce the load on your database. Memcached and Redis have different use cases, so if you can afford to lose and rebuild your cache from your database, use Memcached. If you are looking for more robust data persistence and complex data structures, use Redis. In AWS, these are managed services, which means AWS takes care of the operational workload for you; you can also deploy them on your on-premises instances or use a hybrid approach.
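
For illustration, a cache-aside read with the go-redis client might look like the following sketch; the key, TTL, and database lookup are assumptions:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/go-redis/redis/v8"
)

// loadRatingFromDB stands in for the authoritative database lookup.
func loadRatingFromDB(postID int) string { return "4.5" }

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"}) // assumed local Redis

	// Cache-aside: try the cache first, fall back to the database on a miss.
	val, err := rdb.Get(ctx, "post:42:rating").Result()
	if err == redis.Nil {
		val = loadRatingFromDB(42)
		rdb.Set(ctx, "post:42:rating", val, 5*time.Minute) // illustrative TTL
	} else if err != nil {
		// Cache unavailable: degrade to the database rather than fail.
		val = loadRatingFromDB(42)
	}
	fmt.Println("rating:", val)
}
```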

Step 5: Scale your server. If there are still issues, it’s time to scale your server. For the greatest cost-effectiveness and unlimited scalability, I suggest always using horizontal scaling. However, for some use cases, like databases, vertical scaling may be a better choice until you are ready to shard; or use Amazon Aurora Serverless for variable workloads. For horizontal scaling, it is wise to use Auto Scaling to manage your workload effectively. To achieve that, you also need to persist the session; Amazon DynamoDB can handle session persistence across instances.

If your server is on premises, consider creating a multisite architecture, which will help you achieve quick scalability as required and provide a good disaster recovery solution.  You can pick and choose individual services like Amazon Route 53, AWS CloudFormation, Amazon SQS, Amazon SNS, Amazon RDS, etc. depending on your needs.

Your multisite architecture will look like the following diagram:

In this architecture, you can run your regular workload on premises, and use your AWS workload as required for scalability and disaster recovery. Using Route 53, you can direct a precise percentage of users to an AWS workload.

If you decide to move all of your workloads to AWS, the recommended multi-AZ architecture would look like the following:

In this architecture, you are using a multi-AZ distributed workload for high availability. You can have a multi-region setup and use Route 53 to distribute your workload between AWS Regions. CloudFront helps you scale and distribute static content via an S3 bucket, and DynamoDB maintains your application state so that Auto Scaling can apply horizontal scaling without loss of session data. At the database layer, RDS with a multi-AZ standby provides high availability, and read replicas help achieve scalability.

This is a high-level strategy to help you think through the scalability of your workload by using AWS, even if your workload is on premises and not in the cloud…yet.

I highly recommend creating a hybrid, multisite model by placing a replica of your on-premises environment in a public cloud like AWS Cloud, and using the Amazon Route 53 DNS Service and Elastic Load Balancing to route traffic between the on-premises and cloud environments. AWS now supports load balancing between AWS and on-premises environments, which helps you scale your cloud environment quickly whenever required, and scale it back down by applying Auto Scaling and placing a threshold on your on-premises traffic using Route 53.