Our fleet of over 200 locations comprises various generations of servers and routers. And with the ever changing landscape of services and computing demands, it’s imperative that we manage power in our data centers right. This blog is a brief Electrical Engineering 101 session going over specifically how power distribution units (PDU) work, along with some good practices on how we use them. It appears to me that we could all use a bit more knowledge on this topic, and more love and appreciation of something that’s critical but usually taken for granted, like hot showers and opposable thumbs.
A PDU is a device used in data centers to distribute power to multiple rack-mounted machines. It’s an industrial grade power strip typically designed to power an average consumption of about seven US households. Advanced models have monitoring features and can be accessed via SSH or webGUI to turn on and off power outlets. How we choose a PDU depends on what country the data center is and what it provides in terms of voltage, phase, and plug type.
For each of our racks, all of our dual power-supply (PSU) servers are cabled to one of the two vertically mounted PDUs. As shown in the picture above, one PDU feeds a server’s PSU via a red cable, and the other PDU feeds that server’s other PSU via a blue cable. This is to ensure we have power redundancy maximizing our service uptime; in case one of the PDUs or server PSUs fail, the redundant power feed will be available keeping the server alive.
Faraday’s Law and Ohm’s Law
Like most high-voltage applications, PDUs and servers are designed to use AC power. Meaning voltage and current aren’t constant — they’re sine waves with magnitudes that can alternate between positive and negative at a certain frequency. For example, a voltage feed of 100V is not constantly at 100V, but it bounces between 100V and -100V like a sine wave. One complete sine wave cycle is one phase of 360 degrees, and running at 50Hz means there are 50 cycles per second.
The sine wave can be explained by Faraday’s Law and by looking at how an AC power generator works. Faraday’s Law tells us that a current is induced to flow due to a changing magnetic field. Below illustrates a simple generator with a permanent magnet rotating at constant speed and a coil coming in contact with the magnet’s magnetic field. Magnetic force is strongest at the North and South ends of the magnet. So as it rotates on itself near the coil, current flow fluctuates in the coil. One complete rotation of the magnet represents one phase. As the North end approaches the coil, current increases from zero. Once the North end leaves, current decreases to zero. The South end in turn approaches, now the current “increases” in the opposite direction. And finishing the phase, the South end leaves returning the current back to zero. Current alternates its direction at every half cycle, hence the naming of Alternating Current.
Current and voltage in AC power fluctuate in-phase, or “in tandem”, with each other. So by Ohm’s Law of Power = Voltage x Current, power will always be positive. Notice on the graph below that AC power (Watts) has two peaks per cycle. But for practical purposes, we’d like to use a constant power value. We do that by interpreting AC power into “DC” power using root-mean-square (RMS) averaging, which takes the max value and divides it by √2. For example, in the US, our conditions are 208V 24A at 60Hz. When we look at spec sheets, all of these values can be assumed as RMS’d into their constant DC equivalent values. When we say we’re fully utilizing a PDU’s max capacity of 5kW, it actually means that the power consumption of our machines bounces between 0 and 7.1kW (5kW x √2).
It’s also critical to figure out the sum of power our servers will need in a rack so that it falls under the PDU’s design max power capacity. For our US example, a PDU is typically 5kW (208 volts x 24 amps); therefore, we’re budgeting 5kW and fit as many machines as we can under that. If we need more machines and the total sum power goes above 5kW, we’d need to provision another power source. That would lead to possibly another set of PDUs and racks that we may not fully use depending on demand; e.g. more underutilized costs. All we can do is abide by P = V x I.
However there is a way we can increase the max power capacity economically — 3-phase PDU. Compared to single phase, its max capacity is √3 or 1.7 times higher. A 3-phase PDU of the same US specs above has a capacity of 8.6kW (5kW x √3), allowing us to power more machines under the same source. Using a 3-phase setup might mean it has thicker cables and bigger plug; but despite being more expensive than a 1-phase, its value is higher compared to two 1-phase rack setups for these reasons:
It’s more cost-effective, because there are fewer hardware resources to buy
Say the computing demand adds up to 215kW of hardware, we would need 25 3-phase racks compared to 43 1-phase racks.
Each rack needs two PDUs for power redundancy. Using the example above, we would need 50 3-phase PDUs compared to 86 1-phase PDUs to power 215kW worth of hardware.
That also means a smaller rack footprint and fewer power sources provided and charged by the data center, saving us up to √3 or 1.7 times in opex.
It’s more resilient, because there are more circuit breakers in a 3-phase PDU — one more than in a 1-phase. For example, a 48-outlet PDU that is 1-phase would be split into two circuits of 24 outlets. While a 3-phase one would be split into 3 circuits of 16 outlets. If a breaker tripped, we’d lose 16 outlets using a 3-phase PDU instead of 24.
The PDU shown above is a 3-phase model of 48 outlets. We can see three pairs of circuit breakers for the three branches that are intertwined with each other — white, grey, and black. Industry demands today pressure engineers to maximize compute performance and minimize physical footprint, making the 3-phase PDU a widely-used part of operating a data center.
What is 3-phase?
A 3-phase AC generator has three coils instead of one where the coils are 120 degrees apart inside the cylindrical core, as shown in the figure below. Just like the 1-phase generator, current flow is induced by the rotation of the magnet thus creating power from each coil sequentially at every one-third of the magnet’s rotation cycle. In other words, we’re generating three 1-phase power offset by 120 degrees.
A 3-phase feed is set up by joining any of its three coils into line pairs. L1, L2, and L3 coils are live wires with each on their own phase carrying their own phase voltage and phase current. Two phases joining together form one line carrying a common line voltage and line current. L1 and L2 phase voltages create the L1/L2 line voltage. L2 and L3 phase voltages create the L2/L3 line voltage. L1 and L3 phase voltages create the L1/L3 line voltage.
Let’s take a moment to clarify the terminology. Some other sources may refer to line voltage (or current) as line-to-line or phase-to-phase voltage (or current). It can get confusing, because line voltage is the same as phase voltage in 1-phase circuits, as there’s only one phase. Also, the magnitude of the line voltage is equal to the magnitude of the phase voltage in 3-phase Delta circuits, while the magnitude of the line current is equal to the magnitude of the phase current in 3-phase Wye circuits.
Conversely, the line current equals to phase current times √3 in Delta circuits. In Wye circuits, the line voltage equals to phase voltage times √3.
In Delta circuits: Vline = Vphase Iline = √3 x Iphase
In Wye circuits: Vline = √3 x Vphase Iline = Iphase
Delta and Wye circuits are the two methods that three wires can join together. This happens both at the power source with three coils and at the PDU end with three branches of outlets. Note that the generator and the PDU don’t need to match each other’s circuit types.
On PDUs, these phases join when we plug servers into the outlets. So we conceptually use the wirings of coils above and replace them with resistors to represent servers. Below is a simplified wiring diagram of a 3-phase Delta PDU showing the three line pairs as three modular branches. Each branch carries two phase currents and its own one common voltage drop.
And this one below is of a 3-phase Wye PDU. Note that Wye circuits have an additional line known as the neutral line where all three phases meet at one point. Here each branch carries one phase and a neutral line, therefore one common current. The neutral line isn’t considered as one of the phases.
Thanks to a neutral line, a Wye PDU can offer a second voltage source that is √3 times lower for smaller devices, like laptops or monitors. Common voltages for Wye PDUs are 230V/400V or 120V/208V, particularly in North America.
Where does the √3 come from?
Why are we multiplying by √3? As the name implies, we are adding phasors. Phasors are complex numbers representing sine wave functions. Adding phasors is like adding vectors. Say your GPS tells you to walk 1 mile East (vector a), then walk a 1 mile North (vector b). You walked 2 miles, but you only moved by 1.4 miles NE from the original location (vector a+b). That 1.4 miles of “work” is what we want.
Let’s take in our application L1 and L2 in a Delta circuit. we add phases L1 and L2, we get a L1/L2 line. We assume the 2 coils are identical. Let’s say α represents the voltage magnitude for each phase. The 2 phases are 120 degrees offset as designed in the 3-phase power generator:
Since voltage is a scalar, we’re only interested in the “work”:
|L1/L2| = √3α
Given that α also applies for L3. This means for any of the three line pairs, we multiply the phase voltage by √3 to calculate the line voltage.
Vline = √3 x Vphase
Now with the three line powers being equal, we can add them all to get the overall effective power. The derivation below works for both Delta and Wye circuits.
Poverall = 3 x Pline Poverall = 3 x (Vline x Iline) Poverall = (3/√3) x (Vphase x Iphase) Poverall = √3 x Vphase x Iphase
Using the US example, Vphase is 208V and Iphase is 24A. This leads to the overall 3-phase power to be 8646W (√3 x 208V x 24A) or 8.6kW. There lies the biggest advantage for using 3-phase systems. Adding 2 sets of coils and wires (ignoring the neutral wire), we’ve turned a generator that can produce √3 or 1.7 times more power!
Dealing with 3-phase
The derivation in the section above assumes that the magnitude at all three phases is equal, but we know in practice that’s not always the case. In fact, it’s barely ever. We rarely have servers and switches evenly distributed across all three branches on a PDU. Each machine may have different loads and different specs, so power could be wildly different, potentially causing a dramatic phase imbalance. Having a heavily imbalanced setup could potentially hinder the PDU’s available capacity.
A perfectly balanced and fully utilized PDU at 8.6kW means that each of its three branches has 2.88kW of power consumed by machines. Laid out simply, it’s spread 2.88 + 2.88 + 2.88. This is the best case scenario. If we were to take 1kW worth of machines out of one branch, spreading power to 2.88 + 1.88 + 2.88. Imbalance is introduced, the PDU is underutilized, but we’re fine. However, if we were to put back that 1kW into another branch — like 3.88 + 1.88 + 2.88 — the PDU is over capacity, even though the sum is still 8.6kW. In fact, it would be over capacity even if you just added 500W instead of 1kW on the wrong branch, thus reaching 3.18 + 1.88 + 2.88 (8.1kW).
That’s because a 8.6kW PDU is spec’d to have a maximum of 24A for each phase current. Overloading one of the branches can force phase currents to go over 24A. Theoretically, we can reach the PDU’s capacity by loading one branch until its current reaches 24A and leave the other two branches unused. That’ll render it into a 1-phase PDU, losing the benefit of the √3 multiplier. In reality, the branch would have fuses rated less than 24A (usually 20A) to ensure we won’t reach that high and cause overcurrent issues. Therefore the same 8.6kW PDU would have one of its branches tripped at 4.2kW (208V x 20A).
Loading up one branch is the easiest way to overload the PDU. Being heavily imbalanced significantly lowers PDU capacity and increases risk of failure. To help minimize that, we must:
Ensure that total power consumption of all machines is under the PDU’s max power capacity
Try to be as phase-balanced as possible by spreading cabling evenly across the three branches
Ensure that the sum of phase currents from powered machines at each branch is under the fuse rating at the circuit breaker.
For the sake of simplicity, let’s ignore other machines like switches. Our latest 2U4N servers are rated at 1800W. That means we can only fit a maximum of four of these 2U4N chassis (8600W / 1800W = 4.7 chassis). Rounding them up to 5 would reach a total rack level power consumption of 9kW, so that’s a no-no.
Splitting 4 chassis into 3 branches evenly is impossible, and will force us to have one of the branches to have 2 chassis. That would lead to a non-ideal phase balancing:
Keeping phase currents under 24A, there’s only 1.1A (24A – 22.9A) to add on L1 or L2 before the PDU gets overloaded. Say we want to add as many machines as we can under the PDU’s power capacity. One solution is we can add up to 242W on the L1/L2 branch until both L1 and L2 currents reach their 24A limit.
Alternatively, we can add up to 298W on the L2/L3 branch until L2 current reaches 24A. Note we can also add another 298W on the L3/L1 branch until L1 current reaches 24A.
In the examples above, we can see that various solutions are possible. Adding two 298W machines each at L2/L3 and L3/L1 is the most phase balanced solution, given the parameters. Nonetheless, PDU capacity isn’t optimized at 7.8kW.
Dealing with a 1800W server is not ideal, because whichever branch we choose to power one would significantly swing the phase balance unfavorably. Thankfully, our Gen X servers take up less space and are more power efficient. Smaller footprint allows us to have more flexibility and fine-grained control over our racks in many of our diverse data centers. Assuming each 1U server is 450W, as if we physically split the 1800W 2U4N into fours each with their own power supplies, we’re now able to fit 18 nodes. That’s 2 more nodes than the four 2U4N setup:
Adding two more servers here means we’ve increased our value by 12.5%. While there are more factors not considered here to calculate the Total Cost of Ownership, this is still a great way to show we can be smarter with asset costs.
Cloudflare provides the back-end services so that our customers can enjoy the performance, reliability, security, and global scale of our edge network. Meanwhile, we manage all of our hardware in over 100 countries with various power standards and compliances, and ensure that our physical infrastructure is as reliable as it can be.
At Cloudflare we pride ourselves in our global network that spans more than 200 cities in over 100 countries. To handle all the traffic passing through our network, there are multiple technologies at play. So let’s have a look at one of the cornerstones that makes all of this work… ASICs. No, not the running shoes.
What’s an ASIC?
ASIC stands for Application Specific Integrated Circuit. The name already says it, it’s a chip with a very narrow use case, geared towards a single application. This is in stark contrast to a CPU (Central Processing Unit), or even a GPU (Graphics Processing Unit). A CPU is designed and built for general purpose computation, and does a lot of things reasonably well. A GPU is more geared towards graphics (it’s in the name), but in the last 15 years, there’s been a drastic shift towards GPGPU (General Purpose GPU), in which technologies such as CUDA or OpenCL allow you to use the highly parallel nature of the GPU to do general purpose computing. A good example of GPU use is video encoding, or more recently, computer vision, used in applications such as self-driving cars.
Unlike CPUs or GPUs, ASICs are built with a single function in mind. Great examples are the Google Tensor Processing Units (TPU), used to accelerate machine learning functions, or for orbital maneuvering, in which specific orbital maneuvers are encoded, like the Hohmann Transfer, used to move rockets (and their payloads) to a new orbit at a different altitude. And they are also heavily used in the networking industry. Technically, the use case in the network industry should be called an ASSP (Application Specific Standard Product), but network engineers are simple people, so we prefer to call it an ASIC.
Why an ASIC
ASICs have the major benefit of being hyper-efficient. The more complex hardware is, the more it will need cooling and power. As ASICs only contain the hardware components needed for their function, their overall size can be reduced, and so are their power requirements. This has a positive impact on the overall physical size of the network (devices don’t need to be as bulky to provide sufficient cooling), and helps reduce the power consumption of a data center.
Reducing hardware complexity also reduces the failure rate of the manufacturing process, and allows for easier production.
The downside is that you need to embed a lot of your features in hardware, and once a new technology or specification comes around, any chips made without that technology baked in, won’t be able to support it (VXLAN for example).
For network equipment, this works perfectly. Overall, the networking industry is slow-moving, and considerable time is taken before new technologies make it to the market (as can be seen with IPv6, MPLS implementations, xDSL availability, …). This means the chips don’t need to evolve on a yearly basis, and can instead be created on a much slower cycle, with bigger leaps in technology. For example, it took Broadcom two years to go from Tomahawk 3 to Tomahawk 4, but in that process they doubled the throughput. The benefits listed earlier are super helpful for network equipment, as they allow for considerable throughput in a small form factor.
Building an ASIC
As with chips of any kind, building an ASIC is a long-term process. Just like with CPUs, if there’s a defect in the hardware design, you have to start from scratch, and scrap the entire build line. As such, the development lifecycle is incredibly long. It starts with prototyping in an FPGA (Field Programmable Gate Array), in which chip designers can program their required functionality and confirm compatibility. All of this is done in a HDL (Hardware Description Language), such as Verilog.
Once the prototyping stage is over, they move to baking the new packet processing pipeline into the chip at a foundry. After that, no more changes can be made to the chip, as it’s literally baked into the hardware (unlike an FPGA, which can be reprogrammed). Further difficulty is added by the fact that there are a very small number of hardware companies that will buy ASICs in bulk to build equipment with; as such the unit cost can increase drastically.
All of this means that the iteration cycle of an ASIC tends to be on the slower side of things (compared to the yearly refreshes in the Intel Process-Architecture-Optimization model for example), and will usually be smaller incremental updates: For example, increases in port-speeds are incremental (1G → 10G → 25G → 40G → 100G → 400G → 800G → …), and are tied into upgrades to the SerDes (Serialiser/Deserialiser) part of the chip.
New protocol support is a lot harder, and might require multiple development cycles before it shows up in a chip.
What ASICs do
The ASICs in our network equipment are responsible for the switching and routing of packets, as well as being the first layer of defense (in the form of a stateless firewall). Due to the sheer nature of how fast packets get switched, fast memory access is a primary concern. Most ASICs will use a special sort of memory, called TCAM (Ternary Content-Addressable Memory). This memory will be used to store all sorts of lookup tables. These may be forwarding tables (where does this packet go), ACL (Access Control List) tables (is this packet allowed), or CoS (Class of Service) tables (which priority should be given to this packet)
CAM, and its more advanced sibling, TCAM, are fascinating kinds of memory, as they operate fundamentally different than traditional Random Access Memory (RAM). While you have to use a memory address to access data in RAM, with CAM and TCAM you can directly refer to the content you are looking for. It is a physical implementation of a key-value store.
In CAM you use the exact binary representation of a word, in a network application, that word is likely going to be an IP address, so 11001011.00000000.01110001.00000000 for example (203.0.113.0). While this is definitely useful, networks operate a big collection of IP addresses, and storing each individually would require significant memory. To remedy this memory requirement, TCAM can store three states, instead of the binary two. This third state, sometimes called ‘ignore’ state, allows for the storage of multiple sequential data words as a single entry.
In networking, these sequential data words are IP prefixes. So for the previous example, if we wanted to store the collection of that IP address, and the 254 IPs following it, in TCAM it would as follows: 11001011.00000000.01110001.XXXXXXXX (203.0.113.0/24). This storage method means we can ask questions of the ASIC such as “where should I send packets with the destination IP address of 203.0.113.19”, to which the ASIC can have a reply ready in a single clock cycle, as it does not need to run through all memory, but instead can directly reference the key. This reply will usually be a reference to a memory address in traditional RAM, where more data can be stored, such as output port, or firewall requirements for the packet.
To dig a bit deeper into what ASICs do in network equipment, let’s briefly go over some fundamentals.
Networking can be split into two primary components: routing and switching. Switching allows you to directly interconnect multiple devices, so they can talk with each other across the network. It’s what allows your phone to connect to your TV to play a new family video. Routing is the next level up. It’s the mechanism that interconnects all these switched networks into a network of networks, and eventually, the Internet.
So routers are the devices responsible for steering traffic through this complex maze of networks, so it gets to its destination safely, and hopefully, as fast as possible. On the Internet, routers will usually use a routing protocol called BGP (Border Gateway Protocol) to exchange reachability information for a prefix (a collection of IP addresses), also called NLRI (Network Layer Reachability Information).
As with navigating the roads, there are multiple ways to get from point A to point B on the Internet. To make sure the router makes the right decision, it will store all of the reachability information in the RIB (Routing Information Base). That way, if anything changes with one route, the router still has other options immediately available.
With this information, a BGP daemon can calculate the ideal path to take for any given destination from its own point-of-view. This Cisco documentation explains the decision process the daemon goes through to calculate that ideal path.
Once we have this ideal path for a given destination, we should store this information, as it would be very inefficient to calculate this every single time we need to go there. The storage database is called the FIB (Forwarding Information Base). The FIB will be a subset of the RIB, as it will only ever contain the best path for a destination at any given time, while the RIB keeps all the available paths, even the non-ideal ones.
With these individual components, routers can make packets go from point A to point B in a blink of an eye.
Here’ are some of the more specific functions our ASICs need to perform:
FIB install: Once the router has calculated its FIB, it’s important the router can access this as quickly as possible. To do so, the ASIC will install (write) this calculated FIB into the TCAM, so any lookups can happen as quickly as possible.
Packet forwarding lookups: as we need to know where to send a received packet, we look up this information in TCAM, which is, as we mentioned, incredibly fast.
Stateless Firewall: while a router routes packets between destinations, you also want to ensure that certain packets don’t reach a destination at all. This can be done using either a stateless or stateful firewall. “State” in this case refers to TCP state, so the router would need to understand if a connection is new, or already established. As maintaining state is a complex issue, which requires storing tables, and can quickly consume a lot of memory, most routers will only operate a stateless firewall. Instead, stateful firewalls often have their own appliances. At Cloudflare, we’ve opted to move maintaining state to our compute nodes, as that severely reduces the state-table (one router for all state vs X metals for all state combined). A stateless firewall makes use of the TCAM again to store rules on what to do with specific types of packets. For example, one of the rules we employ at our edge is DENY-BOGON-RANGES , in which we discard traffic sourced from RFC1918 space (and other unroutable space). As this makes use of TCAM, it can all be done at line rate (the maximum speed of the interface).
Advanced features, such as GRE encapsulation: modern networking isn’t just packet switching and packet routing anymore, and more advanced features are needed. One of these is encapsulation. With packet encapsulation, a system will put a data packet into another data packet. Using this technique, it’s possible to build a network on top of an existing network (an overlay). Overlays can be used to build a virtual backbone for example, in which multiple locations can be virtually connected through the Internet. While you can encapsulate packets on a CPU (we do this for Magic Transit), there are considerable challenges in doing so in software. As such, the ASIC can have built-in functionality to encapsulate a packet in a multitude of protocols, such as GRE. You may not want encapsulated packets to have to take a second trip through your entire pipeline, as this adds latency, so these shortcuts can also be built into the chip.
MPLS, EVPN, VXLAN, SDWAN, SDN, …: I ran out of buzzwords to enumerate here, but while MPLS isn’t new (the first RFC was created in 2001), it’s a rather advanced requirement, just as the others listed, which means not all ASIC vendors will implement this for all their chips due to the increased complexity.
At Cloudflare, we interact with both hardware and software vendors on a daily basis while operating our global network. As we’re talking about ASICs today, we’ll explore the hardware landscape, but some hardware vendors also have their own NOS (Network Operating System). There’s a vast selection of hardware out there, all with different features and pricing. It can become incredibly hard to see the wood for the trees, so we’ll focus on 4 important distinguishing factors: Throughput (how many bits can the ASIC push through), buffer size (how many bits can the ASIC store in memory in case of resource contention), programmability (how easy is it for a third party programmer like Cloudflare to interact directly with the ASIC), feature set (how many advanced things outside of routing/switching can the ASIC do).
The landscape is so varied because different companies have different requirements. A company like Cloudflare has different expectations for its network hardware than your typical corner shop. Even within our own network we’ll have different requirements for the different layers that make up our network.
The elephant in the networking room (or is it the jumbo frame in the switch?) is Broadcom. Broadcom is a semiconductor company, with their primary revenue in the wired infrastructure segment (over 50% of revenue). While they’ve been around since 1991, they’ve become an unstoppable force in the last 10 years, in part due to their reliance on Apple (25% of revenue). As a semiconductor manufacturer, their market dominance is primarily achieved by acquiring other companies. A great example is the acquisition of Dune Networks, which has become an excellent revenue generator as the StrataDNX series of ASIC (Arad, QumranMX, Jericho). As such, they have become the biggest ASIC vendor by far, and own 59% of the entire Ethernet Integrated Circuits market.
As such, they supply a lot of merchant silicon to Cisco, Juniper, Arista and others. Up until recently, if you wanted to use the Broadcom SDK to accelerate your packet forwarding, you have to sign so many NDAs you might get a hand cramp, which makes programming them a lot trickier. This changed recently when Broadcom open-sourced their SDK. Let’s have a quick look at some of their products.
The Tomahawk line of ASICs are the bread-and-butter for the enterprise market. They’re cheap and incredibly fast. The first generation of Tomahawk chips did 3.2Tbps linerate, with low-latency switching. The latest generation of this chip (Tomahawk 4) does 25.6Tbps in a 7nm transistor footprint). As you can’t have a cheap, fast, and full feature set for a single package, this means you lose out on features. In this case, you’re missing most of the more advanced networking technologies such as VXLAN, and you have no buffer to speak of. As an example of a different vendor using this silicon, you can have a look at the Juniper QFX5200 switching platform.
StrataDNX (Arad, QumranMX, Jericho)
These chipsets came through the acquisition of Dune Networks, and are a collection of high-bandwidth, deep buffer (large amount of memory available to store (buffer) packets) chips, allowing them to be deployed in versatile environments, including the Cloudflare edge. The Arista DCS-7280SR that we run in some of our edge locations as edge routers run on the Jericho chipset. Since then, the chips have evolved, and with Jericho2, Broadcom now have a 10Tbps deep buffer chip. With their fabric chip (this links multiple ASICs together), you can build switches with 48x400G ports without much effort. Cisco built their NCS5500 line of routers using the QumranMX.
This ASIC is an upgrade from the Tomahawk chipset, with a complex and extensive feature set, while maintaining high throughput rates. The latest Trident4 does 12.8Tbps at incredibly low latencies, making it an incredibly flexible platform. It unfortunately has no buffer space to speak of, which limits its scope for Cloudflare, as we need the buffer space to be able to switch between the different port speeds we have on our edge routers. The Arista 7050X and 7300X are built on top of this.
Intel is well known in the network industry for building stable and high-performance 10G NICs (Network Interface Controller). They’re not known for ASICs. They made an initial attempt with their acquisition of Fulcrum, which built the FM6000 series of ASIC, but nothing of note was really built with them. Intel decided to try again in 2019 with their acquisition of Barefoot. This small manufacturer is responsible for the Barefoot Tofino ASIC, which may well be a fundamental paradigm shift in the network industry.
The Tofino is built using a PISA (Protocol Independent Switch Architecture), and using P4 (Programming Protocol-Independent Packet Processors), you can program the data-plane (packet forwarding) as you see fit. It’s a drastic move away from the traditional method of networking, in which direct programming of the ASIC isn’t easily possible, and definitely not through a standard programming language. As an added benefit, P4 also allows you to perform a formal verification of your forwarding program, and be sure that it will do what you expect it to. Caveat: OpenFlow tried this, but unfortunately never really got much traction. 
There are multiple variations of the Tofino 1 available, but the top-end ASIC has a 6.5Tbps linerate capacity. As the ASIC is programmable, its featureset is as rich as you’d want it to be. Unfortunately, the chip does not come with a lot of buffer memory, so we can’t deploy these as edge devices (yet). Both Arista (7170 Series) and Cisco (Nexus 34180YC and 3464C series) have built equipment with the Tofino chip inside.
As some of you may know, Mellanox is the vendor that recently got acquired by Nvidia, which also provides our 25G NICs in our compute nodes. Besides NICs, Mellanox has a well-established line of ASICs, mostly for switching.
The latest iteration of this ASIC, Spectrum 3 offers 12.8Tbps switching capacity, with an extensive featureset, including Deep Packet Inspection and NAT. This chip allows for building dense high-speed port devices, going up to 25.6Tbps. Buffering wise, there’s none to really speak of (64MB). Mellanox also builds their own hardware platforms. Unlike the other vendors below, they aren’t shipped with the Mellanox Operating System, instead, they offer you a variety of choices to run on top, including Cumulus Linux (which was also acquired by Nvidia 🤔).
As mentioned, while we use their NIC technology extensively, we currently don’t have any Mellanox ASIC silicon in our network.
Juniper is a network hardware supplier, and currently the biggest supplier of network equipment for Cloudflare. As previously mentioned in the Broadcom section, Juniper buys some of their silicon from Broadcom, but they also have a significant lineup of home-grown silicon, which can be split into 2 families: Trio and Express.
The Express family is the switching-skewed family, where bandwidth is a priority, while still maintaining a broad range of feature capabilities. These chips live in the same application landscape as the Broadcom StrataDNX chips.
The Q5 is the new generation of the Juniper switching ASIC. While by itself it doesn’t boast high linerates (500Gbps), when combined into a chassis with a fabric chip (Clos network in this case), they can produce switches (or line cards) with up to 12Tbps of throughput capacity. In addition to allowing for high-throughput, dense network appliances, the chip also comes with a staggering amount of buffer space (4GB per ASIC), provided by external HMC (Hybrid Memory Cube). In this HMC, they’ve also decided to put the FIB, MAC and other tables (so no TCAM). The Q5 chip is used in their QFX1000 lineup of switches, which include the QFX10002-36Q, QFX10002-60C, QFX10002-72Q and QFX10008, all of which are deployed in our datacenters, as either edge routers or core aggregation switches.
The ExpressPlus is the more feature-rich and faster evolution of the Paradise chip. It offers double the bandwidth per chip (1Tbps) and is built into a combined Clos-fabric reaching 6Tbps in a 2U form-factor (PTX10002). It also has an increased logical scale, which comes with bigger buffers, larger FIB storage, and more ACL space.
The ExpressPlus drives some of the PTX line of IP routers, together with its newest sibling, Triton.
Triton is the latest generation of ASIC in the Express family, with 3.6Tbps of capacity per chip, making way for some truly bandwidth-dense hardware. Both Triton and ExpressPlus are 400GE capable.
The Trio family of chips are primarily used in the feature-heavy MX routing platform, and is currently at its 5th generation.
A Juniper MPC4E-3D-32XGE line card
Trio Eagle (Trio 4.0) (EA)
The Trio Eagle is the previous generation of the Trio Penta, and can be found on the MPC7E line cards for example. It’s a feature-rich ASIC, with a 400Gbps forwarding capacity, and significant buffer and TCAM capacity (as is to be expected from a routing platform ASIC)
Trio Penta (Trio 5.0) (ZT)
Penta is the new generation routing chip, which is built for the MX platform routers. On top of being a very beefy chip, capable of 500Gbps per ASIC, allowing Juniper to build line cards of up to 4Tbps of capacity, the chip also has a lot of baked in features, offering advanced hardware offloading for for example MACSec, or Layer 3 IPsec.
The Penta chip is packaged on the MPC10E and MPC11E line card, which can be installed in multiple variations of the MX chassis routers (MX480 included).
Last but not least, there’s Cisco. As the saying goes “nobody ever got fired for buying Cisco”, they’re the biggest vendor of network solutions around. Just like Juniper, they have a mixed product fleet of merchant silicon, as well as home-grown. While we used to operate Cisco routers as edge routers (Cisco ASR 9000), this is no longer the case. We do still use them heavily for our ToR (Top-of-Rack) switching needs, utilizing both their Nexus 5000 series and Nexus 9000 series switches.
Bigsur is custom silicon developed for the Nexus 6000 line of switches (confusingly, the switches themselves are called Cisco Nexus 5672UP and Cisco Nexus 6001). In our specific model, the Cisco Nexus 5672UP, there’s 7 of them interconnected, providing 10G and 40G connectivity. Unfortunately Cisco is a lot more tight-lipped about their ASIC capabilities, so I can’t go as deep as I did with the Juniper chips. Feature-wise, there’s not a lot we require from them in our edge network. They’re simple Layer 2 forwarding switches, with no added requirements. Buffer wise, they use a system called Virtual Output Queueing, just like the Juniper Express chip. Unlike the Juniper silicon, the Bigsur ASIC doesn’t come with a lot of TCAM or buffer space.
The Tahoe is the Cisco ASIC found in the Cisco 9300-EX switches, also known as the LSE (Leaf Spine Engine). It offers higher-density port configurations compared to the Bigsur (1.6Tbps). Overall, this ASIC is a maturation of the Bigsur silicon, offering more advanced features such as advanced VXLAN+EVPN fabrics, greater port flexibility (10G, 25G, 40G and 100G), and increased buffer sizes (40MB). We use this ASIC extensively in both our edge data centers as well as in our core data centers.
A lot of different factors come into play when making the decision to purchase the next generation of Cloudflare network equipment. This post only scratches the surface of technical considerations to be made, and doesn’t come near any other factors, such as ecosystem contributions, openness, interoperability, or pricing. None of this would’ve been possible without the contributions from other network engineers—this post was written on the shoulders of giants. In particular, thanks to the excellent work by Jim Warner at UCSC, the engrossing book on the new MX platforms, written by David Roy (Day One: Inside the MX 5G), as well as the best book on the Juniper QFX lineup: Juniper QFX10000 Series by Douglas Richard Hanks Jr, and to finish it off, the Summary of Network ASICs post by Justin Pietsch.
Many Amazon Web Services (AWS) customers need enhanced insight into IP network flow. Traditionally, cost, the complexity of collection, and the time required for analysis has led to incomplete investigations of network flows. Having good telemetry is paramount, and VPC Flow Logs are a very important part of a robust centralized logging architecture. The information that VPC Flow Logs provide is frequently used by security analysts to determine the scope of security issues, to validate that network access rules are working as expected, and to help analysts investigate issues and diagnose network behaviors. Flow logs capture information about the IP traffic going to and from EC2 interfaces in a VPC. Each record describes aspects of the traffic flow, such as where it originated and where it was sent to, what network ports were used, and how many bytes were sent.
Amazon Detective now enables you to interactively examine the details of the virtual private cloud (VPC) network flows of your Amazon Elastic Compute Cloud (Amazon EC2) instances. Amazon Detective makes it easy to analyze, investigate, and quickly identify the root cause of potential security issues or suspicious activities. Detective automatically collects VPC flow logs from your monitored accounts, aggregates them by EC2 instance, and presents visual summaries and analytics about these network flows. Detective doesn’t require VPC Flow Logs to be configured and doesn’t impact existing flow log collection.
In this blog post, I describe how to use the new VPC flow feature in Detective to investigate an UnauthorizedAccess:EC2/TorClient finding from Amazon GuardDuty. Amazon GuardDuty is a threat detection service that continuously monitors for malicious activity and unauthorized behavior to protect your AWS accounts, workloads, and data stored in Amazon S3. GuardDuty documentation states that this alert can indicate unauthorized access to your AWS resources with the intent of hiding the unauthorized user’s true identity. I’ll demonstrate how to use Amazon Detective to investigate an instance that was flagged by Amazon GuardDuty to determine whether it is compromised or not.
Starting the investigation in GuardDuty
In my GuardDuty console, I’m going to select the UnauthorizedAccess:EC2/TorClient finding shown in Figure 1, choose the Actions menu, and select Investigate.
Figure 1: Investigating from the GuardDuty console
This opens a new browser tab and launches the Amazon Detective console, where I’m presented with the profile page for this finding, shown in Figure 2. You must have Detective enabled to pivot between a GuardDuty finding and Detective. Detective provides profile pages for supported GuardDuty findings and AWS resources (for example, IP address, EC2 instance, user, and role) that include information and data visualizations that summarize observed behaviors and give guidance for interpreting them. Profiles help analysts to determine whether the finding is of genuine concern or a false positive. For resources, profiles provide supporting details for an investigation into a finding or for a general hunt for suspicious activity.
Figure 2: Finding profile panel
In this case, the profile page for this GuardDuty UnauthorizedAccess:EC2/TorClient finding provides contextual and behavioral data about the EC2 instance on which GuardDuty has noted the issue. As I dive into this finding, I’m going to be asking questions that help assess whether the instance was in fact accessed unintentionally, such as, “What IP port or network service was in use at that time?,” “Were any large data transfers involved?,” “Was the traffic allowed by my security groups?” Profile pages in Detective organize content that helps security analysts investigate GuardDuty findings, examine unexpected network behavior, and identify other AWS resources that might be affected by a potential security issue.
I begin scrolling down the page and notice the Findings associated with EC2 instance i-9999999999999999 panel. Detective displays related findings to provide analysts with additional evidence and context about potentially related issues. The finding I’m investigating is listed there, as well as an Unusual Behaviors/VM/Behavior:EC2-NetworkPortUnusual finding. GuardDuty builds a baseline on your network traffic and will generate findings where there is traffic outside the calculated normal. While we might not investigate every instance of anomalous traffic, having these alerts correlated by Detective provides context for validating the issue. Keeping this in mind as I scroll down, at the bottom of this profile page, I find the Overall VPC flow volume panel. If you choose the Info link next to the panel title, you can see helpful tips that describe how to use the visualizations and provide ideas for questions to ask within your investigation. These info links are available throughout Detective. Check them out!
Investigating VPC flow in Detective
In this investigation, I’m very curious about the two large spikes in inbound traffic that I see in the Overall VPC flow volume panel, which seem to be visually associated with some unusual outbound traffic spikes. It’s most likely that these outbound spikes are related to the Unusual Behaviors/VM/Behavior:EC2-NetworkPortUnusual finding I mentioned earlier. To start the investigation, I choose the display details for scope time button, shown circled at the bottom of Figure 2. This expands the VPC Flow Details, shown in Figure 3.
Figure 3: Our first look at VPC Flow Details
We now can see that each entry displays the volume of inbound traffic, the volume of outbound traffic, and whether the access request was accepted or rejected. Detective provides annotations on the VPC flows to help guide your investigation. These From finding annotations make it clear which flows and resources were involved in the finding. In this case, we can easily see (in Figure 3) the three IP addresses at the top of the list that triggered this GuardDuty finding.
I’m first going to focus on the spikes in traffic that are above the baseline. When I click on one of the spikes in the graph, the time window for the VPC flow activity now matches the dates of these spikes I’m investigating.
If I choose the Inbound Traffic column header, shown in Figure 4, I can find the flows that contributed to the spike during this time window.
Figure 4: Inbound traffic spikes
Note that the two large inbound spikes aren’t associated with the IP address from the UnauthorizedAccess:EC2/TorClient finding, based on the Detective annotation From finding. Let’s check the outbound traffic. If I do a quick sort of the table based on the outbound traffic column, as shown in Figure 5, we can also see the outbound spikes, and it isn’t immediately evident whether the spikes are associated with this finding. I could continue to investigate the spikes (because they are a visual anomaly), or focus just on the VPC flow traffic that GuardDuty and Detective have labeled as associated with this TOR finding.
Figure 5: Outbound traffic spikes
Let’s focus on the outbound and inbound spikes and see if we can determine what’s happening. The inbound spikes are on port 443, typically an HTTPS port, or a secure web connection. The outbound spikes are on port 22 (ssh), but go to IP addresses that look to be internal based on their addresses of 172.16.x.x. The port 443 traffic might indicate a web server that’s open to the internet and receiving traffic. With further investigation, we can determine if this idea is valid, and continue hunting for potentially malicious traffic.
A good next step would be to investigate the two specific IP addresses to rule out their involvement in the finding. I can do this by right-clicking on either of the external IP addresses and opening a new tab, where I can focus on investigating these two specific IP addresses. I would take this line of investigation to possibly rule out the involvement of these IP addresses in this finding, determine if they regularly communicate with my resources, find out what instance(s) they’re related to, and see if there are other findings associated with these instances or IP addresses. This deeper investigation is outside the scope of this blog post, but it’s something you should be doing in your own environment.
IP addresses in AWS are ephemeral in nature. The unique identifier in VPC flow logs is the Instance ID. At the time of this investigation, 172.16.0.7 is assigned to the instance related to this finding, so let’s continue to take a look at the internal 172.16.0.7 IP address with 218 MB outbound traffic on port 22. I choose 172.16.0.7, and Detective opens up the profile page for this specific IP address, as shown in Figure 6. Here we see some interesting correlations: two other GuardDuty findings related to SSH brute-force attacks. These could be related to our outbound port 22 spikes, because they’re certainly in the window of time we’re investigating.
Figure 6: IP address profile panel
As part of a deeper investigation, you would investigate the SSH brute-force findings for 198.51.100.254 and 203.0.113.83 but for now I’m interested what this IP is involved in. Detective easily associates this 172.16.0.7 IP address with the instance that was assigned the IP during the scope time. I scroll down to the bottom of the profile page for 172.16.0.7 and investigate the i-9999999999999999 instance by choosing the instance name.
Filtering VPC flow activity
In Detective, as the investigator we are looking at an instance profile panel, similar to the one in Figure 2, and since we’re interested in VPC flow details, I’m going to scroll down and select display details for scope time.
To focus on specific activity, I can filter the activity details by the following values:
Local or remote port
Whether the request was accepted or rejected
I’m going to filter these VPC flow details and just look at port 22 (sshd) inbound traffic. I select the Filter check box and select Local Port and 22, as shown in Figure 7. Detective fills in all the available ports for you, making it easy to complete this filter.
Figure 7: Port 22 traffic
The activity details show a few IP addresses related to port 22, and we’re still following the large outbound spikes of traffic. It’s outside the scope of this blog post, but now it would be time to start looking at your security groups and network access control lists (ACLs) and determine why port 22 is open to the internet and sending all this traffic.
Understanding traffic behavior
As an investigator, I now have a good picture of the traffic related to the initial finding, and by diving deeper we’re able to discover other interesting traffic during the same timeframe. While we may not always determine “who has done it,” the goal should be to improve our understanding of the behavior of our environment and gather important technical evidence. Detective helps you identify and investigate anomalies to give you insight into your environment. If we were to continue our investigation into the finding, here are some actions we can take within Detective.
Investigate VPC findings with Detective:
Perform ports and utilization analysis
Identify service and ephemeral ports
Determine whether traffic was accepted or rejected based on security groups and NACL configurations
Investigate possible reconnaissance traffic by exploring the significant amount of rejected traffic
Correlate EC2 instances to TCP/IP ports and IPs
Analyze traffic spikes and anomalies
Discover traffic patterns and make behavioral correlations
Explore EC2 instance behavior with Detective:
Directional Traffic Analysis
Investigate possible data exfiltration events by digging into large transfers
Enumerate distinct IP connections and sort and filter by protocol, amount of traffic, and traffic direction
Gather data related to a spike in port count from a single IP address (potential brute force) or multiple IP addresses (distributed denial of service (DDoS))
Additional forensics steps to consider
Snapshot EC2 Volumes
Memory dump of EC2 instance
Isolate EC2 instance
Review your authentication strategy and assess whether the chosen authentication method is sufficient to protect your asset
Without requiring you to set up infrastructure or spend time configuring log ingestion, Detective collects, organizes, and presents relevant data for your threat analysis and investigations. Security and operations teams will find this new capability helpful for simplifying EC2 traffic analysis, validating security group permissions, and diagnosing EC2 instance behavior. Detective does the heavy lifting of storing, and analyzing VPC flow data so you can focus on quickly answering your investigative questions. VPC network flow details are available now in all Detective supported Regions and are included as part of your service subscription.
Cloudflare recently shipped improved upload speeds across our network for clients using HTTP/2. This post describes our journey from troubleshooting an issue to fixing it and delivering faster upload speeds to the global Internet.
Upload speed is more important than ever, especially for people using home broadband connections. As many people have been forced to work from home they’re using their broadband connections differently than before. Prior to the pandemic broadband traffic was very asymmetric (you downloaded way more than you uploaded… think listening to music, or streaming a movie), but now we’re seeing an increase in uploading as people video conference from home or create content from their home office.
User reports were focused on particularly fast home networks. I set up a dummynet network simulator to test upload speed in a controlled environment. I launched a linux VM running our code inside my Macbook Pro and set up a dummynet between the VM and Mac host. Measuring upload speed is simple – create a file and upload using curl to an endpoint which accepts a request body. I ran the same test 20 times and took a median upload speed (Mbps).
Stepping up to uploading a 10MB object over a network which has 200Mbps available bandwidth and 40ms RTT, the result was surprising. Using our production configuration, HTTP/2 upload speed tested at almost half of the same test conditions using HTTP/1.1 (higher is better).
The result may differ depending on your network, but the gap is bigger when the network is fast. On a slow network, like my home cable connection (5Mbps upload and 20ms RTT), HTTP/2 upload speed was almost identical to the performance observed with HTTP/1.1.
Receiver Flow Control
Before we get into more detail on this topic, my intuition suggested the issue was related to receiver flow control. Usually the client (browser or any HTTP client) is the receiver of data, but in the case the client is uploading content to the server, the server is the receiver of data. And the receiver needs some type of flow control of the receive buffer.
How we handle receiver flow control differs between HTTP/1.1 and HTTP/2. For example, HTTP/1.1 doesn’t define protocol-level receiver flow control since there is no multiplexing of requests in the connection and it’s up to the TCP layer which handles receiving data. Note that most of the modern OS TCP stacks have auto tuning of the receive buffer (we will revisit that later) and they tune based on the current BDP (bandwidth-delay product).
In the case of HTTP/2, there is a stream-level flow control mechanism because the protocol supports multiplexing of streams. Each HTTP/2 stream has its own flow control window and there is connection level flow control for all streams in the connection. If it’s too tight, the sender will be blocked by the flow control. If it’s too loose we may end up wasting memory for buffering. So keeping it optimal is important when implementing flow control and the most optimal strategy is to keep the receive buffer matching the current BDP. BDP represents the maximum bytes in flight in the given network and can be used as an optimal buffer size.
How NGINX handles the request body buffer
Initially I tried to find a parameter which controls NGINX upload buffering and tried to see if tuning the values improved the result. There are a couple of parameters which are related to uploading a request body.
Cloudflare does not use the proxy_buffering directive, so it can be immediately discounted. client_body_buffer_size is the size of the request body buffer which is used regardless of the protocol, so this one applies to HTTP/1.1 and HTTP/2 as well.
When looking into the code, here is how it works:
HTTP/1.1: use client_body_buffer_size buffer as a buffer between upstream and the client, simply repeating reading and writing using this buffer.
HTTP/2: since we need a flow control window update for the HTTP/2 DATA frame, there are two parameters:
http2_body_preread_size: it specifies the size of the initial request body read before it starts to send to the upstream.
client_body_buffer_size: it specifies the size of the request body buffer.
Those two parameters are used for allocating a request body buffer during uploading. Here is a brief summary of how unbuffered upload works:
Allocate a single request body buffer which size is a maximum of http2_body_preread_size and client_body_buffer_size. This means if http2_body_preread_size is 64KB and client_body_buffer_size is 128KB, then a 128KB buffer is allocated. We use 128KB for client_body_buffer_size.
HTTP/2 Settings INITIAL_WINDOW_SIZE of the stream is set to http2_body_preread_size and we use 64KB as a default (the RFC7540 default value).
HTTP/2 module reads up to http2_body_preread_size before sending it to upstream.
After flushing the preread buffer, keep reading from the client and write to upstream and send WINDOW_UPDATE frame back to the client when necessary until the request body is fully received.
To summarise what this means: HTTP/1.1 simply uses a single buffer, so TCP socket buffers do the flow control. However with HTTP/2, the application layer also has receiver flow control and NGINX uses a fixed size buffer for the receiver. This limits upload speed when the current link has a BDP larger than the current request body buffer size. So the bottleneck is HTTP/2 flow control when the buffer size is too tight.
We’re going to need a bigger buffer?
In theory, bigger buffer sizes should avoid upload bottlenecks, so I tried a few out by running my tests again. The previous chart result is now labelled “prod” and plotted alongside HTTP/2 tests with client_body_buffer_size set to 256KB, 512KB and 1024KB:
It appears 512KB is an optimal value for client_body_buffer_size.
What if I test with some other network parameter? Here is when RTT is 10ms, in this case, 256KB looks optimal.
Both cases look much better than current 128KB and get a similar performance to HTTP/1.1 or even better. However, it seems like the optimal buffer size is a moving target and furthermore having too large a buffer size can hurt the performance: we need a smart way to find the optimal buffer size.
Autotuning request body buffer size
One of the ideas that can help this kind of situation is autotuning. For example, modern TCP stacks autotune their receive buffers automatically. In production, our edge also has TCP receiver buffer autotuning enabled by default.
net.ipv4.tcp_moderate_rcvbuf = 1
But in case of HTTP/2, TCP buffer autotuning is not very effective because the HTTP/2 layer is doing its own flow control and the existing 128KB was too small for a high BDP link. At this point, I decided to pursue autotuning HTTP/2 receive buffer sizing as well, similar to what TCP does.
The basic idea is that NGINX doubles the size of HTTP/2 request body buffer size based on its BDP. Here is an algorithm currently implemented in our version of NGINX:
Allocate a request body buffer as explained above.
For every RTT (using linux tcp_info), update the current BDP.
Double the request body buffer size when the current BDP > (receiver window / 4).
Here is a test result when HTTP/2 autotune upload is enabled (still using client_body_buffer_size 128KB). You can see “h2 autotune” is doing pretty well – similar or even slightly faster than HTTP/1.1 speed (that’s the initial goal). It might be slightly worse than a hand-picked optimal buffer size for given conditions, but you can see now NGINX picks up optimal buffer size automatically based on network conditions.
After we deployed this feature, I ran similar tests against our production edge, uploading a 10MB file from well connected client nodes to our edge. I created a Linux VM instance in Google Cloud and ran the upload test where the network is very fast (a few Gbps) and low latency (<10ms).
Here is when I run the test in the Google Cloud Belgium region to our CDG (Paris) PoP which has 7ms RTT. This looks very good with almost 3x improvement.
I also tested between the Google Cloud Tokyo region and our NRT (Tokyo) PoP, which had a 2.3ms RTT. Although this is not realistic for home users, the results are interesting. A 128KB fixed buffer performs well, but HTTP/2 with buffer autotune outperforms even HTTP/1.1.
HTTP/2 upload buffer autotuning is now fully deployed in the Cloudflare edge. Customers should now benefit from improved upload performance for all HTTP/2 connections, including speed tests on speed.cloudflare.com. Autotuning upload buffer size logic works well for most cases, and now HTTP/2 upload is much faster than before! When we think about the performance we usually tend to think about download speed or latency reduction, but faster uploading can also help users working from home when they need a large amount of upload, such as photo/video sharing apps, content creation, video conferencing or self broadcasting.
Today we’re excited to announce Cloudflare Network Interconnect (CNI). CNI allows our customers to interconnect branch and HQ locations directly with Cloudflare wherever they are, bringing Cloudflare’s full suite of network functions to their physical network edge. Using CNI to interconnect provides security, reliability, and performance benefits vs. using the public Internet to connect to Cloudflare. And because of Cloudflare’s global network reach, connecting to our network is straightforward no matter where on the planet your infrastructure and employees are.
At its most basic level, an interconnect is a link between two networks. Today, we’re offering customers the following options to interconnect with Cloudflare’s network:
Via a private network interconnect (PNI). A physical cable (or a virtual “pseudo-wire”; more on that later) that connects two networks.
Over an Internet Exchange (IX). A common switch fabric where multiple Internet Service Providers (ISPs) and Internet networks can interconnect with each other.
To use a real world analogy: Cloudflare over the years has built a network of highways across the Internet to handle all our customers’ traffic. We’re now providing dedicated on-ramps for our customers’ on-prem networks to get onto those highways.
Why interconnect with Cloudflare?
CNI provides more reliable, faster, and more private connectivity between your infrastructure and Cloudflare’s. This delivers benefits across our product suite. Here are some examples of specific products and how you can combine them with CNI:
Cloudflare Access: Cloudflare Access replaces corporate VPNs with Cloudflare’s network. Instead of placing internal tools on a private network, teams deploy them in any environment, including hybrid or multi-cloud models, and secure them consistently with Cloudflare’s network. CNI allows you to bring your own MPLS network to meet ours, allowing your employees to connect to your network securely and quickly no matter where they are.
CDN: Cloudflare’s CDN places content closer to visitors, improving site speed while minimizing origin load. CNI improves cache fill performance and reduces costs.
Magic Transit: Magic Transit protects datacenter and branch networks from unwanted attack and malicious traffic. Pairing Magic Transit with CNI decreases jitter and drives throughput improvements, and further hardens infrastructure from attack.
Cloudflare Workers: Workers is Cloudflare’s serverless compute platform. Integrating with CNI provides a secure connection to serverless cloud compute that does not traverse the public Internet, allowing customers to use Cloudflare’s unique set of Workers services with tighter network performance tolerances.
Let’s talk more about how CNI delivers these benefits.
Improving performance through interconnection
CNI is a great way to boost performance for many existing Cloudflare products. By utilizing CNI and setting up interconnection with Cloudflare wherever a customer’s origin infrastructure is, customers can get increased performance and security at lower cost than using public transit providers.
CNI makes things faster
As an example of the performance improvements network interconnects can deliver for Cloudflare customers, consider an HTTP application workload which flows through Cloudflare’s CDN and WAF. Many of our customers rely on our CDN to make their HTTP applications more responsive.
Cloudflare caches content very close to end users to provide the best performance possible. But, if content is not in cache, Cloudflare edge PoPs must contact the origin server to retrieve cacheable content. This can be slow, and places more load on an origin server compared to serving directly from cache.
With CNI, these origin pulls can be completed over a dedicated link, improving throughput and reducing overall time needed for origin pulls. Using Argo Tiered Cache, customers can manage tiered cache topologies and specify upstream cache tiers that correspond with locations where network interconnects are in place. Using Tiered Cache in this fashion lowers origin loads and increases cache hit rates, thereby improving performance and reducing origin infrastructures costs.
Here’s anonymized and sampled data from a real Cloudflare customer who recently provisioned interconnections between our network and theirs to further improve performance. Heavy users of our CDN, they were able to shave off precious milliseconds from their origin round trip time (RTT) by adding PNIs in multiple locations.
As an example, their 90th percentile round trip time in Warsaw, Poland decreased by 6.5ms as a result of provisioning a private network interconnect (from 7.5ms to 1ms), which is a performance win of 87%! The jitter (variation in delay in received packets) on the link decreased from 82.9 to 0.3, which speaks to the dedicated, reliable nature of the link. CNI helps deliver reliable and performant network connectivity to your customers and employees.
Enhanced security through private connectivity
Customers with large on-premise networks want to move to the cloud: it’s cheaper, less hassle, and less overhead and maintenance. However, customers want to also preserve their existing security and threat models.
Traditionally, CIOs trying to connect their IP networks to the Internet do so in two steps:
Source connectivity to the Internet from transit providers (ISPs).
Purchase, operate, and maintain network function specific hardware appliances. Think hardware load balancers, firewalls, DDoS mitigation equipment, WAN optimization, and more.
CNI allows CIOs to provision security services on Cloudflare and connect their existing networks to Cloudflare in a way that bypasses the public Internet. Because Cloudflare integrates with on-premise networks and the cloud, customers can enforce security policies across both networks and create a consistent, secure boundary.
CNI increases cloud and network security by providing a private, dedicated link to the Cloudflare network. Since this link is reserved exclusively for the customer that provisions it, the customer’s traffic is isolated and private.
CNI + Magic Transit: Removing public Internet exposure
To use a product-specific example: through CNI’s integration with Magic Transit, customers can take advantage of private connectivity to minimize exposure of their network to the public Internet.
Magic Transit attracts customers’ IP traffic to our data centers by advertising their IP addresses from our edge via BGP. When traffic arrives, it’s filtered and sent along to customers’ data centers. Before CNI, all Magic Transit traffic was sent from Cloudflare to customers via Generic Routing Encapsulation (GRE) tunnels over the Internet. Because GRE endpoints are publicly routable, there is some risk these endpoints could be discovered and attacked, bypassing Cloudflare’s DDoS mitigation and security tools.
Using CNI removes this exposure to the Internet. Advantages of using CNI with Magic Transit include:
Reduced threat exposure. Although there are many steps companies can take to increase network security, some risk-sensitive organizations prefer not to expose endpoints to the public Internet at all. CNI allows Cloudflare to absorb that risk and forward only clean traffic (via Magic Transit) through a truly private interface.
Increased reliability. Traffic traveling over the public Internet is subject to factors outside of your control, including latency and packet loss on intermediate networks. Removing steps between Cloudflare’s network and yours means that after Magic Transit processes traffic, it’s forwarded directly and reliably to your network.
Simplified configuration. Soon, Magic Transit + CNI customers will have the option to skip making MSS (maximum segment size) changes when onboarding, a step that’s required for GRE-over-Internet and can be challenging for customers who need to consider their downstream customers’ MSS as well (eg. service providers).
Example deployment: Penguin Corp uses Cloudflare for Teams, Magic Transit, and CNI to protect branch and core networks, and employees.
Imagine Penguin Corp, a hypothetical company, has a fully connected private MPLS network. Maintaining their network is difficult and they have a dedicated team of network engineers to do this. They are currently paying a lot of money to run their own private cloud. To minimize costs, they limit their network egress points to two worldwide. This creates a major performance problem for their users, whose bits have to travel a long way to accomplish basic tasks while still traversing Penguin’s network boundary.
SASE (Secure Access Service Edge) models look attractive to them, because they can, in theory, move away from their traditional MPLS network and move towards the cloud. SASE deployments provide firewall, DDoS mitigation, and encryption services at the network edge, and bring security as a service to any cloud deployment, as seen in the diagram below:
CNI allows Penguin to use Cloudflare as their true network edge, hermetically sealing their branch office locations and datacenters from the Internet. Penguin can adapt to a SASE-like model while keeping exposure to the public Internet at zero. Penguin establishes PNIs with Cloudflare from their branch office in San Jose to Cloudflare’s San Jose location to take advantage of Cloudflare for Teams, and from their core colocation facility in Austin to Cloudflare’s Dallas location to use Magic Transit to protect their core networks.
Like Magic Transit, Cloudflare for Teams replaces traditional security hardware on-premise with Cloudflare’s global network. Customers who relied on VPN appliances to reach internal applications can instead connect securely through Cloudflare Access. Organizations maintaining physical web gateway boxes can send Internet-bound traffic to Cloudflare Gateway for filtering and logging.
Cloudflare for Teams services run in every Cloudflare data center, bringing filtering and authentication closer to your users and locations to avoid compromising performance. CNI improves that even further with a direct connection from your offices to Cloudflare. With a simple configuration change, all branch traffic reaches Cloudflare’s edge where Cloudflare for Teams policies can be applied. The link improves speed and reliability for users and replaces the need to backhaul traffic to centralized filtering appliances.
Once interconnected this way, Penguin’s network and employees realize two benefits:
They get to use Cloudflare’s full set of security services without having to provision expensive and centralized physical or virtualized network appliances.
Their security and performance services are running across Cloudflare’s global network in over 200 cities. This brings performance and usability improvements for users by putting security functions closer to them.
Scalable, global, and flexible interconnection options
CNI offers a big benefit to customers because it allows them to take advantage of our global footprint spanning 200+ cities: their branch office and datacenter infrastructure can connect to Cloudflare wherever they are.
This matters for two reasons: our globally distributed network makes it easier to interconnect locally, no matter where a customer’s branches and core infrastructure is, and allows for a globally distributed workforce to interact with our edge network with low latency and improved performance.
Customers don’t have to worry about securely expanding their network footprint: that’s our job.
To this point, global companies need to interconnect at many points around the world. Cloudflare Network Interconnect is priced for global network scale: Cloudflare doesn’t charge anything for enterprise customers to provision CNI. Customers may need to pay for access to an interconnection platform or a datacenter cross-connect. We’ll work with you and any other parties involved to make the ordering and provisioning process as smooth as possible.
In other words, CNI’s pricing is designed to accommodate complicated enterprise network topologies and modern IT budgets.
How to interconnect
Customers can interconnect with Cloudflare in one of three ways: over a private network interconnect (PNI), over an IX, or through one of our interconnection platform partners. We have worked closely with our global partners to meet our customers where they are and how they want.
Private Network Interconnects
Private Network Interconnects are available at any of our listed private peering facilities. Getting a physical connection to Cloudflare is easy: specify where you want to connect, port speeds, and target VLANs. From there, we’ll authorize it, you’ll place the order, and let us do the rest. Customers should choose PNI as their connectivity option if they want higher throughput than a virtual connection or connection over an IX, or want to eliminate as many intermediaries from an interconnect as possible.
Customers who want to use existing Internet Exchanges can interconnect with us at any of the 235+ Internet Exchanges we participate in. To connect with Cloudflare via an Internet Exchange, follow the IX’s instructions to connect, and Cloudflare will spin up our side of the connection. Customers should choose Internet Exchanges as their connectivity option if they are either already peered at an IX, or they want to interconnect in a place where an interconnection platform isn’t present.
Interconnection Platform Partners
Cloudflare is proud to be partnering with Equinix, Megaport, PCCW ConsoleConnect, PacketFabric, and Zayo to provide you with easy ways to virtually connect with us in any of the partner-supported locations. Customers should choose to connect with an interconnection platform if they are already using these providers or want a quick and easy way to onboard onto a secure cloud experience.
If you’re interested in learning more, please see this blog post about all the different ways you can interconnect. For all of the interconnect methodologies described above, the BGP session establishment and IP routing are the same. The only thing that is different is the physical way in which we interconnect with other networks.
How do I find the best places to interconnect?
Our product page for CNI includes tools to better understand the right places for your network to interconnect with ours. Customers can use this data to help figure out the optimal place to interconnect to have the most connectivity with other cloud providers and other ISPs in general.
What’s the difference between CNI and peering?
Technically, peering and CNI use similar mechanisms and technical implementations behind the scenes.
We have had an open peering policy for years with any network and will continue to abide by that policy: it allows us to help build a better Internet for everyone by interconnecting networks together, making the Internet more reliable. Traditional networks use interconnect/peering to drive better performance for their customers and connectivity while driving down costs. With CNI, we are opening up our infrastructure to extend the same benefits to our customers as well.
How do I learn more?
CNI provides customers with better performance, reliability, scalability, and security than using the public Internet. A customer can interconnect with Cloudflare in any of our physical locations today, getting dedicated links to Cloudflare that deliver security benefits and more stable latency, jitter, and available bandwidth through each interconnection point.
Contact our enterprise sales team about adding Cloudflare Network Interconnect to your existing offerings.
The proliferation of DDoS attacks of varying size, duration, and persistence has made DDoS protection a foundational part of every business and organization’s online presence. However, there are key considerations including network capacity, management capabilities, global distribution, alerting, reporting and support that security and risk management technical professionals need to evaluate when selecting a DDoS protection solution.
Gartner’s view of the DDoS solutions; How did Cloudflare fare?
Gartner recently published the report Solution Comparison for DDoS Cloud Scrubbing Centers (ID G00467346), authored by Thomas Lintemuth, Patrick Hevesi and Sushil Aryal. This report enables customers to view a side-by-side solution comparison of different DDoS cloud scrubbing centers measured against common assessment criteria. If you have a Gartner subscription, you can view the report here. Cloudflare has received the greatest number of ‘High’ ratings as compared to the 6 other DDoS vendors across 23 assessment criteria in the report.
The vast landscape of DDoS attacks
From our perspective, the nature of DDoS attacks has transformed, as the economics and ease of launching a DDoS attack has changed dramatically. With a rise in cost-effective capabilities of launching a DDoS attack, we have observed a rise in the number of under 10 Gbps DDoS network-level attacks, as shown in the figure below. Even though 10 Gbps from an attack size perspective does not seem that large, it is large enough to significantly affect a majority of the websites existing today.
At the same time, larger-sized DDoS attacks are still prevalent and have the capability of crippling the availability of an organization’s infrastructure. In March 2020, Cloudflare observed numerous 300+ Gbps attacks with the largest attack being 550 Gbps in size.
In the report Gartner also observes a similar trend, “In speaking with the vendors for this research, Gartner discovered a consistent theme: Clients are experiencing more frequent smaller attacks versus larger volumetric attacks.” In addition, they also observe that “For enterprises with Internet connections up to and exceeding 10 Gbps, frequent but short attacks up to 10 Gbps are still quite disruptive without DDoS protection. Not to say that large attacks have gone away. We haven’t seen a 1-plus Tbps attack since spring 2018, but attacks over 500 Gbps are still common.”
Gartner recommends in the report to “Choose a provider that offers scrubbing capacity of three times the largest documented volumetric attack on your continent.”
From an application-level DDoS attack perspective an interesting DDoS attack observed and mitigated by Cloudflare last year, is shown below. This HTTP DDoS attack had a peak of 1.4M requests per second, which isn’t highly rate-intensive. However, the fact that the 1.1M IPs from which the attack originated were unique and not spoofed made the attack quite interesting. The unique IP addresses were actual clients who were able to complete a TCP and HTTPS handshake.
Harness the full power of Cloudflare’s DDoS protection
Cloudflare’s cloud-delivered DDoS solution provides key features that enable security professionals to protect their organizations and customers against even the most sophisticated DDoS attacks. Some of the key features and benefits include:
Massive network capacity: With over 35 Tbps of network capacity, Cloudflare ensures that you are protected against even the most sophisticated and largest DDoS attacks. Cloudflare’s network capacity is almost equal to the total scrubbing capacity of the other 6 leading DDoS vendors combined.
Globally distributed architecture: Having a few scrubbing centers globally to mitigate DDoS attacks is an outdated approach. As DDoS attacks scale and individual attacks originate from millions of unique IPs worldwide, it’s important to have a DDoS solution that mitigates the attack at the source rather than hauling traffic to a dedicated scrubbing center. With every one of our data centers across 200 cities enabled with full DDoS mitigation capabilities, Cloudflare has more points of presence than the 6 leading DDoS vendors combined.
Fast time to mitigation: Automated edge-analyzed and edge-enforced DDoS mitigation capabilities allows us to mitigate attacks at unprecedented speeds. Typical time to mitigate a DDoS attack is less than 10s.
Integrated security: A key design tenet while building products at Cloudflare is integration. Our DDoS solution integrates seamlessly with other product offerings including WAF, Bot Management, CDN and many more. A comprehensive and integrated security solution to bolster the security posture while aiding performance. No tradeoffs between security and performance!
Unmetered and unlimited mitigation: Cloudflare offers unlimited and unmetered DDoS mitigation. This eliminates the legacy concept of ‘Surge Pricing,’ which is especially painful when a business is under duress and experiencing a DDoS attack. This enables you to avoid unpredictable costs from traffic.
Whether you’re part of a large global enterprise, or use Cloudflare for your personal site, we want to make sure that you’re protected and also have the visibility that you need. DDoS Protection is included as part of every Cloudflare service. Enterprise-level plans include advanced mitigation, detailed reporting, enriched logs, productivity enhancements and fine-grained controls. Enterprise Plan customers also receive access to dedicated customer success and solution engineering.
The recommendation for social distancing to slow down the spread of COVID-19 has led many companies to adopt a work-from-home policy for their employees in offices around the world, and Cloudflare is no exception.
As a result, a large portion of Internet access shifted from office-focused areas, like city centers and business parks, towards more residential areas like suburbs and outlying towns. We wanted to find out just precisely how broad this geographical traffic migration was, and how different locations were affected by it.
It turns out it is substantial, and the results are quite stunning:
Gathering the Data
So how can we determine if Internet usage patterns have changed from a geographical perspective?
In each Cloudflare Point of Presence (in more than 200 cities worldwide) there’s an edge router whose responsibility it is to switch Internet traffic to serve the requests of end users in the region.
These edge routers are the network’s entry point and for monitoring and debugging purposes each router samples IP packet information regarding the traffic that traverses them. This data is collected as flow records and contains layer-3 related information, such as the source and destination IP address, port, packet size etc.
These statistical samples allow us to monitor aggregate traffic information, spot unusual activity, and verify that our connections to the greater Internet are working correctly.
By using a geolocation database like Maxmind, we can determine the rough location from which a flow emanates. If the geographic data is too imprecise it is excluded from our set to reduce noise.
By merging together information about flows that come from similar areas, we can infer a geospatial distribution of traffic volume across the globe.
This distribution of traffic volume is an indication of Internet usage patterns. By plotting a geospatial heatmap of the Internet traffic in New York City for two separate dates four weeks apart (February 19, 2020 and March 18, 2020), we can already notice an erosion of traffic volume from the city center dissipating into the surrounding areas.
All the charts in this blog post show day time traffic in each geography. This is done to show the difference working from home is having on Internet traffic.
But the migration pattern really jumps forward when producing a differential heatmap of these two dates:
This chart shows a diff of the “before” and “after” scenarios, highlighting only differences in volume between the two dates. The red color represents areas where Internet usage has decreased since February 19, and green where it has increased.
It strongly suggests a migration of Internet usage towards suburban and residential areas.
The data used is comparing working hours periods (1000 to 1600 local time) between two Wednesdays four weeks apart (February 19 and March 18).
We produced similar visual representations for other impacted locations around the world. All exhibit similar migration patterns.
Additional features can be picked out from the chart. The red dot in Silverdale, WA appears to be Internet access from the Kitsap Mall (which is currently closed). And Sea-Tac Airport is seen as a red area between Burien and Renton.
San Francisco Bay Area, US
Areas where large Silicon Valley businesses have sent employees home are clearly visible.
The red area in north east Paris appears to be the airport at Le Bourget and the surrounding industrial zone. To the west there’s another red area: Le Château de Versailles.
In Berlin a red dot near the Tegeler See is Flughafen Berlin-Tegel likely resulting from fewer passengers passing through the airport.
As shown above, geographical Internet usage changes are visible from Internet traffic exchanged between end users and Cloudflare at various locations around the globe.
Besides the geographical migration in various metropolitan areas, the overall volume of traffic in these locations has also increased between 10% and 40% in just a period of four weeks.
It’s interesting to reflect that the Internet was originally conceived as a communications network for humanity during a crisis, and it’s come a long way since then. But in this moment of crisis, it’s being put to use for that original purpose. The Internet was built for this.
Cloudflare helps power a substantial portion of Internet traffic. During this time of increased strain on networks around the world, our team is working to ensure our service continues to run smoothly.
We are also providing our Cloudflare for Teams products at no cost to companies of any size struggling to support their employees working from home: learn more.
At Cloudflare we develop new products at a great pace. Their needs often challenge the architectural assumptions we made in the past. For example, years ago we decided to avoid using Linux’s "conntrack" – stateful firewall facility. This brought great benefits – it simplified our iptables firewall setup, sped up the system a bit and made the inbound packet path easier to understand.
But eventually our needs changed. One of our new products had a reasonable need for it. But we weren’t confident – can we just enable conntrack and move on? How does it actually work? I volunteered to help the team understand the dark corners of the "conntrack" subsystem.
What is conntrack?
"Conntrack" is a part of Linux network stack, specifically part of the firewall subsystem. To put that into perspective: early firewalls were entirely stateless. They could express only basic logic, like: allow SYN packets to port 80 and 443, and block everything else.
The stateless design gave some basic network security, but was quickly deemed insufficient. You see, there are certain things that can’t be expressed in a stateless way. The canonical example is assessment of ACK packets – it’s impossible to say if an ACK packet is legitimate or part of a port scanning attempt, without tracking the connection state.
To fill such gaps all the operating systems implemented connection tracking inside their firewalls. This tracking is usually implemented as a big table, with at least 6 columns: protocol (usually TCP or UDP), source IP, source port, destination IP, destination port and connection state. On Linux this subsystem is called "conntrack" and is often enabled by default. Here’s how the table looks on my laptop inspected with "conntrack -L" command:
The obvious question is how large this state tracking table can be. This setting is under "/proc/sys/net/nf_conntrack_max":
$ cat /proc/sys/net/nf_conntrack_max
This is a global setting, but the limit is per per-container. On my system each container, or "network namespace", can have up to 256K conntrack entries.
What exactly happens when the number of concurrent connections exceeds the conntrack limit?
Testing conntrack is hard
In past testing conntrack was hard – it required complex hardware or vm setup. Fortunately, these days we can use modern "user namespace" facilities which do permission magic, allowing an unprivileged user to feel like root. Using the tool "unshare" it’s possible to create an isolated environment where we can precisely control the packets going through and experiment with iptables and conntrack without threatening the health of our host system. With appropriate parameters it’s possible to create and manage a networking namespace, including access to namespaced iptables and conntrack, from an unprivileged user.
This script is the heart of our test:
# Enable tun interface
ip tuntap add name tun0 mode tun
ip link set tun0 up
ip addr add 192.0.2.1 peer 192.0.2.2 dev tun0
ip route add 0.0.0.0/0 via 192.0.2.2 dev tun0
# Refer to conntrack at least once to ensure it's enabled
iptables -t raw -A PREROUTING -j CT
# Create a counter in mangle table
iptables -t mangle -A PREROUTING
# Make sure reverse traffic doesn't affect conntrack state
iptables -t raw -A OUTPUT -p tcp – sport 80 -j DROP
tcpdump -ni any -B 16384 -ttt &
# Show iptables counters
iptables -nvx -t raw -L PREROUTING
iptables -nvx -t mangle -L PREROUTING
This bash script is shortened for readability. See the full version here. The accompanying "send_syn.py" is just sending 10 SYN packets over "tun0" interface. Here is the source but allow me to paste it here – showing off "scapy" is always fun:
tun = TunTapInterface("tun0", mode_tun=True)
for i in range(10000,10000+10):
tcp=TCP(sport=i, dport=80, flags="S")
send(ip/tcp, verbose=False, inter=0.01, socket=tun)
The bash script above contains a couple of gems. Let’s walk through them.
First, please note that we can’t just inject packets into the loopback interface using SOCK_RAW sockets. The Linux networking stack is a complex beast. The semantics of sending packets over a SOCK_RAW are different then delivering a packet over a real interface. We’ll discuss this later, but for now, to avoid triggering unexpected behaviour, we will deliver packets over a tun/tap device which better emulates a real interface.
Then we need to make sure the conntrack is active in the network namespace we wish to use for testing. Traditionally, just loading the kernel module would have done that, but in the brave new world of containers and network namespaces, a method had to be found to allow conntrack to be active in some and inactive in other containers. Hence this is tied to usage – rules referencing conntrack must exist in the namespace’s iptables for conntrack to be active inside the container.
After the "-t raw -A PREROUTING" rule, which we added "-t mangle -A PREROUTING" rule, but notice – it doesn’t have any action! This syntax is allowed by iptables and it is pretty useful to get iptables to report rule counters. We’ll need these counters soon. A careful reader might suggest looking at "policy" counters in iptables to achieve our goal. Sadly, "policy" counters (increased for each packet entering a chain), work only if there is at least one rule inside it.
The rest of the steps are self-explanatory. We set up "tcpdump" in the background, send 10 SYN packets to 127.0.0.1:80 using the "scapy" Python library. Ten we print the conntrack table and iptables counters.
Let’s run this script in action. Remember to run it under networking namespace as fake root with "unshare -Ur -n":
This is all nice. First we see a "tcpdump" listing showing 10 SYN packets. Then we see the conntrack table state, showing 10 created flows. Finally, we see iptables counters in two rules we created, each showing 10 packets processed.
Can conntrack table fill up?
Given that the conntrack table is size constrained, what exactly happens when it fills up? Let’s check it out. First, we need to drop the conntrack size. As mentioned it’s controlled by a global toggle – it’s necessary to tune it on the host side. Let’s reduce the table size to 7 entries, and repeat our test:
This is getting interesting. We still see the 10 inbound SYN packets. We still see that the "-t raw PREROUTING" table received 10 packets, but this is where similarities end. The "-t mangle PREROUTING" table saw only 7 packets. Where did the three missing SYN packets go?
It turns out they went where all the dead packets go. They were hard dropped. Conntrack on overfill does exactly that. It even complains in the "dmesg":
This is confirmed by our iptables counters. Let’s review the famous iptables diagram:
As we can see, the "-t raw PREROUTING" happens before conntrack, while "-t mangle PREROUTING" is just after it. This is why we see 10 and 7 packets reported by our iptables counters.
Let me emphasize the gravity of our discovery. We showed three completely valid SYN packets being implicitly dropped by "conntrack". There is no explicit "-j DROP" iptables rule. There is no configuration to be toggled. Just the fact of using "conntrack" means that, when it’s full, packets creating new flows will be dropped. No questions asked.
This is the dark side of using conntrack. If you use it, you absolutely must make sure it doesn’t get filled.
We could end our investigation here, but there are a couple of interesting caveats.
Strict vs loose
Conntrack supports a "strict" and "loose" mode, as configured by "nf_conntrack_tcp_loose" toggle.
By default, it’s set to "loose" which means that stray ACK packets for unseen TCP flows will create new flow entries in the table. We can generalize: "conntrack" will implicitly drop all the packets that create new flow, whether that’s SYN or just stray ACK.
What happens when we clear the "nf_conntrack_tcp_loose=0" setting? This is a subject for another blog post, but suffice to say – mess. First, this setting is not settable in the network namespace scope – although it should be. To test it you need to be in the root network namespace. Then, due to twisted logic the ACK will be dropped on a full conntrack table, even though in this case it doesn’t create a flow. If the table is not full, the ACK packet will pass through it, having "-ctstate INVALID" from "mangle" table forward.
When a conntrack entry is not created?
There are important situations when conntrack entry is not created. For example, we could replace these line in our script:
# Make sure reverse traffic doesn't affect conntrack state
iptables -t raw -A OUTPUT -p tcp – sport 80 -j DROP
# Make sure inbound SYN packets don't go to networking stack
iptables -A INPUT -j DROP
Naively we could think dropping SYN packets past the conntrack layer would not interfere with the created flows. This is not correct. In spite of these SYN packets having been seen by conntrack, no flow state is created for them. Packets hitting "-j DROP" will not create new conntrack flows. Pretty magical, isn’t it?
Full Conntrack causes with EPERM
Recently we hit a case when a "sendto()" syscall on UDP socket from one of our applications was erroring with EPERM. This is pretty weird, and not documented in the man page. My colleague had no doubts:
I’ll save you the gruesome details, but indeed, the full conntrack table will do that to your new UDP flows – you will get EPERM. Beware. Funnily enough, it’s possible to get EPERM if an outbound packet is dropped on OUTPUT firewall in other ways. For example:
If you ever receive EPERM from "sendto()", you might want to treat it as a transient error, if you suspect a filled conntrack problem, or permanent error if you blame iptables configuration.
This is also why we can’t send our SYN packets directly using SOCK_RAW sockets in our test. Let’s see what happens on conntrack overfill with standard "hping3" tool:
$ hping3 -S -i u10000 -c 10 – spoof 18.104.22.168 192.0.2.1 -p 80 -I lo
HPING 192.0.2.1 (lo 192.0.2.1): S set, 40 headers + 0 data bytes
[send_ip] sendto: Operation not permitted
"send()" even on a SOCK_RAW socket fails with EPERM when conntrack table is full.
Full conntrack can happen on a SYN flood
There is one more caveat. During a SYN flood, the conntrack entries will totally be created for the spoofed flows. Take a look at second test case we prepared, this time correctly listening on port 80, and sending SYN+ACK:
We can see 7 SYN+ACK’s flying out of the port 80 listening socket. The final three SYN’s go nowhere as they are dropped by conntrack.
This has important implications. If you use conntrack on publicly accessible ports, during SYN flood mitigation technologies like SYN Cookies won’t help. You are still at risk of running out of conntrack space and therefore affecting legitimate connections.
For this reason, as a rule of thumb consider avoiding conntrack on inbound connections (-j NOTRACK). Alternatively having some reasonable rate limits on iptables layer, doing "-j DROP". This will work well and won’t create new flows, as we discussed above. The best method though, would be to trigger SYN Cookies from a layer before conntrack, like XDP. But this is a subject for another time.
Over the years Linux conntrack has gone through many changes and has improved a lot. While performance used to be a major concern, these days it’s considered to be very fast. Dark corners remain. Correctly applying conntrack is tricky.
In this blog post we showed how it’s possible to test parts of conntrack with "unshare" and a series of scripts. We showed the behaviour when the conntrack table gets filled – packets might implicitly be dropped. Finally, we mentioned the curious case of SYN floods where incorrectly applied conntrack may cause harm.
Stay tuned for more horror stories as we dig deeper and deeper into the Linux networking stack guts.
As the COVID-19 emergency continues and an increasing number of cities and countries are establishing quarantines or cordons sanitaire, the Internet has become, for many, the primary method to keep in touch with their friends and families. And it’s a vital motor of the global economy as many companies have employees who are now working from home.
Traffic towards video conferencing, streaming services and news, e-commerce websites has surged. We’ve seen growth in traffic from residential broadband networks, and a slowing of traffic from businesses and universities.
The Cloudflare team is fully operational and the Network Operating Center (NOC) is watching the changing traffic patterns in the more than 200 cities in which we operate hardware.
Big changes in Internet traffic aren’t unusual. They often occur around large sporting events like the Olympics or World Cup, cultural events like the Eurovision Song Contest and even during Ramadan at the breaking of the fast each day.
The Internet was built to cope with an ever changing environment. In fact, it was literally created, tested, debugged and designed to deal with changing load patterns.
Over the last few weeks, the Cloudflare Network team has noticed some new patterns and we wanted to share a few of them with you.
Entire countries are watching their leaders
Last Friday evening, the US President announced a State of Emergency in the United States. Not so long after, our US data centers served 20% more traffic than usual. The red line shows Friday, the grey lines the preceding days for comparison.
On the Sunday, March 15, the Dutch government announced on the radio at 1730 local time closures of the non-essential business (1630 UTC). A sharp dip in the regular Sunday traffic followed:
The French president made two national announcements, on March 12 (pink curve) and March 16 (red curve) at 2000 local time (1900 UTC). The lockdown announcement on March 16 caused French traffic to dip by half followed by a spike:
Evolution of traffic in quarantine
Italy has seen a 20-40% increase in daily traffic since the lockdown:
With universities closing, some national research networks are remaining (almost) as quiet as a weekend (in purple). Current day in red and previous days in grey (overlaps with previous week):
The Internet Exchange Points, a key part of the Internet infrastructure, where Internet service providers and content providers can exchange data directly (rather than via a third party) have also seen spikes in traffic. Many provide public traffic graphs.
In Amsterdam (AMS-IX), London (LINX) and Frankfurt (DE-CIX), around 10-20% increase is seen around March 9th:
In Milan (MXP-IX), the Exchange point shows a 40% increase on Wednesday, 9th of March 2020, the day of the quarantine:
In Asia, in Hong Kong (HKIX), we can observe a faster increase since the end of January which likely corresponds to the Hubei lockdown on the January 23:
The emergency has a non-negligible impact on Internet services and our lives. Although it is difficult to quantify exactly the increase, we observe numbers from 10% to 40% depending on the region and the state of government action in those regions.
Even though from time to time individual services, such as a web site or an app, have outages the core of the Internet is robust. Traffic is shifting from corporate and university networks to residential broadband, but the Internet was designed for change.
Check back on the Cloudflare blog for further updates and insights.
Back in March 2019, we released Firewall Analytics which provides insights into HTTP security events across all of Cloudflare’s protection suite; Firewall rule matches, HTTP DDoS Attacks, Site Security Level which harnesses Cloudflare’s threat intelligence, and more. It helps customers tailor their security configurations more effectively. The initial release was for Enterprise customers, however we believe that everyone should have access to powerful tools, not just large enterprises, and so in December 2019 we extended those same enterprise-level analytics to our Business and Pro customers.
Since then, we’ve built on top of our analytics platform; improved the usability, added more functionality and extended it to additional Cloudflare services in the form of Account Analytics, DNS Analytics, Load Balancing Analytics, Monitoring Analytics and more.
Our entire analytics platform harnesses the powerful GraphQL framework which is also available to customers that want to build, export and share their own custom reports and dashboards.
Extending Visibility From L7 To L3
Until recently, all of our dashboards were mostly HTTP-oriented and provided visibility into HTTP attributes such as the user agent, hosts, cached resources, etc. This is valuable to customers that use Cloudflare to protect and accelerate HTTP applications, mobile apps, or similar. We’re able to provide them visibility into the application layer (Layer 7 in the OSI model) because we proxy their traffic at L7.
However with Magic Transit, we don’t proxy traffic at L7 but rather route it at L3 (network layer). Using BGP Anycast, customer traffic is routed to the closest point of presence of Cloudflare’s network edge where it is filtered by customer-defined network firewall rules and automatic DDoS mitigation systems. Clean traffic is then routed via dynamic GRE Anycast tunnels to the customer’s data-centers. Routing at L3 means that we have limited visibility into the higher layers. So in order to provide Magic Transit customers visibility into traffic and attacks, we needed to extend our analytics platform to the packet-layer.
On January 16, 2020, we released the Network Analytics dashboard for Magic Transit customers and Bring Your Own IP (BYOIP) customers. This packet and bit oriented dashboard provides near real-time visibility into network- and transport-layer traffic patterns and DDoS attacks that are blocked at the Cloudflare edge in over 200 cities around the world.
Analytics For A Year
The way we’ve architected the analytics data-stores enables us to provide our customers one year’s worth of insights. Traffic is sampled at the edge data-centers. From those samples we structure IP flow logs which are similar to SFlow and bucket one minute’s worth of traffic, grouped by destination IP, port and protocol. IP flows includes multiple packet attributes such as TCP flags, source IPs and ports, Cloudflare data-center, etc. The source IP is considered PII data and is therefore only stored for 30 days, after which the source IPs are discarded and logs are rolled up into one hour groups, and then one day groups. The one hour roll-ups are stored for 6 months and the one day roll-ups for 1 year.
Similarly, attack logs are also stored efficiently. Attacks are stored as summaries with start/end timestamps, min/max/average/total bits and packets per second, attack type, action taken and more. A DDoS attack could easily consist of billions of packets which could impact performance due to the number of read/write calls to the data-store. By storing attacks as summary logs, we’re able to overcome these challenges and therefore provide attack logs for up to 1 year back.
Network Analytics via GraphQL API
We built this dashboard on the same analytics platform, meaning that our packet-level analytics are also available by GraphQL. As an example, below is an attack report query that would show the top attacker IPs, the data-center cities and countries where the attack was observed, the IP version distribution, the ASNs that were used by the attackers and the ports. The query is done at the account level, meaning it would provide a report for all of your IP ranges. In order to narrow the report down to a specific destination IP or port range, you can simply add additional filters. The same filters also exist in the UI.
After running the query using Altair GraphQL Client, the response is returned in a JSON format:
What Do Customers Want?
As part of our product definition and design research stages, we interviewed internal customer-facing teams including Customer Support, Solution Engineering and more. I consider these stakeholders as super-user-aggregators because they’re customer-facing teams and are constantly engaging and helping our users. After the internal research phase, we expanded externally to customers and prospects; particularly network and security engineers and leaders. We wanted to know how they expect the dashboard to fit in their work-flow, what are their use cases and how we can tailor the dashboard to their needs. Long story short, we identified two main use cases: Incident Response and Reporting. Let’s go into each of these use cases in more detail.
I started off by asking them a simple question – “what do you do when you’re paged?” We wanted to better understand their incident response process; specifically, how they’d expect to use this dashboard when responding to an incident and what are the metrics that matter to them to help them make quick calculated decisions.
You’ve Just Been Paged
Let’s say that you’re a security operations engineer. It’s Black Friday. You’re on call. You’ve just been paged. Traffic levels to one of your data-centers has exceeded a safe threshold. Boom. What do you do? Responding quickly and resolving the issue as soon as possible is key.
If your workflows are similar to our customers’, then your objective is to resolve the page as soon as possible. However, before you can resolve it, you need to determine if there is any action that you need to take. For instance, is this a legitimate rise in traffic from excited Black Friday shoppers, perhaps a new game release or maybe an attack that hasn’t been mitigated? Do you need to shift traffic to another data-center or are levels still stable? Our customers tell us that these are the metrics that matter the most:
Top Destination IP and port – helps understand what services are being impacted
Top source IPs, port, ASN, data-center and data-center Country – helps identify the source of the traffic
Real-time packet and bit rates – helps understand the traffic levels
Protocol distribution – helps understand what type of traffic is abnormal
TCP flag distribution – an abnormal distribution could indicate an attack
Attack Log – shows what types of traffic is being dropped/rate-limited
As network and transport layer attacks can be highly distributed and the packet attributes can be easily spoofed, it’s usually not practical to block IPs. Instead, the dashboard enables you to quickly identify patterns such as an increased frequency of a specific TCP flag or increased traffic from a specific country. Identifying these patterns brings you one step closer to resolving the issue. After you’ve identified the patterns, packet-level filtering can be applied to drop or rate-limit the malicious traffic. If the attack was automatically mitigated by Cloudflare’s systems, you’ll be able to see it immediately along with the attack attributes in the activity log. By filtering by the Attack ID, the entire dashboard becomes your attack report.
During our interviews with security and network engineers, we also asked them what metrics and insights they need when creating reports for their managers, C-levels, colleagues and providing evidence to law-enforcement agencies. After all, processing data and creating reports can consume over a third (36%) of a security team’s time (~3 hours a day) and is also one of the most frequent DDoS asks by our customers.
On top of all of these customizable insights, we wanted to also provide a one-line summary that would reflect your recent activity. The one-liner is dynamic and changes based on your activity. It tells you whether you’re currently under attack, and how many attacks were blocked. If your CISO is asking for a security update, you can simply copy-paste it and convey the efficiency of the service:
Our customers say that they want to reflect the value of Cloudflare to their managers and peers:
How much potential downtime and bandwidth did Cloudflare spare me?
What are my top attacked IPs and ports?
Where are the attacks coming from? What types and what are the trends?
The Secret To Creating Good Reports
What does everyone love? Cool Maps! The key to a good report is adding a map; showing where the attack came from. But given that packet attributes can be easily spoofed, including the source IP, it won’t do us any good to plot a map based on the locations of the source IPs. It would result in a spoofed source country and is therefore useless data. Instead, we decided to show the geographic distribution of packets and bits based on the Cloudflare data-center in which they were ingested. As opposed to legacy scrubbing center solutions with limited network infrastructures, Cloudflare has data-centers in more than 200 cities around the world. This enables us to provide precise geographic distribution with high confidence, making your reports accurate.
Tailored For You
One of the main challenges both our customers and we struggle with is how to process and generate actionable insights from all of the data points. This is especially critical when responding to an incident. Under this assumption, we built this dashboard with the purpose of speeding up your reporting and investigation processes. By tailoring it to your needs, we hope to make you more efficient and make the most out of Cloudflare’s services. Got any feedback or questions? Post them below in the comments section.
If you’re an existing Magic Transit or BYOIP customer, then the dashboard is already available to you. Not a customer yet? Click here to learn more.
Today we announced Cloudflare Magic Transit, which makes Cloudflare’s network available to any IP traffic on the Internet. Up until now, Cloudflare has primarily operated proxy services: our servers terminate HTTP, TCP, and UDP sessions with Internet users and pass that data through new sessions they create with origin servers. With Magic Transit, we are now also operating at the IP layer: in addition to terminating sessions, our servers are applying a suite of network functions (DoS mitigation, firewalling, routing, and so on) on a packet-by-packet basis.
Over the past nine years, we’ve built a robust, scalable global network that currently spans 193 cities in over 90 countries and is ever growing. All Cloudflare customers benefit from this scale thanks to two important techniques. The first is anycast networking. Cloudflare was an early adopter of anycast, using this routing technique to distribute Internet traffic across our data centers. It means that any data center can handle any customer’s traffic, and we can spin up new data centers without needing to acquire and provision new IP addresses. The second technique is homogeneous server architecture. Every server in each of our edge data centers is capable of running every task. We build our servers on commodity hardware, making it easy to quickly increase our processing capacity by adding new servers to existing data centers. Having no specialty hardware to depend on has also led us to develop an expertise in pushing the limits of what’s possible in networking using modern Linux kernel techniques.
Magic Transit is built on the same network using the same techniques, meaning our customers can now run their network functions at Cloudflare scale. Our fast, secure, reliable global edge becomes our customers’ edge. To explore how this works, let’s follow the journey of a packet from a user on the Internet to a Magic Transit customer’s network.
Putting our DoS mitigation to work… for you!
In the announcement blog post we describe an example deployment for Acme Corp. Let’s continue with this example here. When Acme brings their IP prefix 203.0.113.0/24 to Cloudflare, we start announcing that prefix to our transit providers, peers, and to Internet exchanges in each of our data centers around the globe. Additionally, Acme stops announcing the prefix to their own ISPs. This means that any IP packet on the Internet with a destination address within Acme’s prefix is delivered to a nearby Cloudflare data center, not to Acme’s router.
Let’s say I want to access Acme’s FTP server on 203.0.113.100 from my computer in Cloudflare’s office in Champaign, IL. My computer generates a TCP SYN packet with destination address 203.0.113.100 and sends it out to the Internet. Thanks to anycast, that packet ends up at Cloudflare’s data center in Chicago, which is the closest data center (in terms of Internet routing distance) to Champaign. The packet arrives on the data center’s router, which uses ECMP (Equal Cost Multi-Path) routing to select which server should handle the packet and dispatches the packet to the selected server.
Once at the server, the packet flows through our XDP- and iptables-based DoS detection and mitigation functions. If this TCP SYN packet were determined to be part of an attack, it would be dropped and that would be the end of it. Fortunately for me, the packet is permitted to pass.
So far, this looks exactly like any other traffic on Cloudflare’s network. Because of our expertise in running a global anycast network we’re able to attract Magic Transit customer traffic to every data center and apply the same DoS mitigation solution that has been protecting Cloudflare for years. Our DoS solution has handled some of the largest attacks ever recorded, including a 942Gbps SYN flood in 2018. Below is a screenshot of a recent SYN flood of 300M packets per second. Our architecture lets us scale to stop the largest attacks.
Network namespaces for isolation and control
The above looked identical to how all other Cloudflare traffic is processed, but this is where the similarities end. For our other services, the TCP SYN packet would now be dispatched to a local proxy process (e.g. our nginx-based HTTP/S stack). For Magic Transit, we instead want to dynamically provision and apply customer-defined network functions like firewalls and routing. We needed a way to quickly spin up and configure these network functions while also providing inter-network isolation. For that, we turned to network namespaces.
Namespaces are a collection of Linux kernel features for creating lightweight virtual instances of system resources that can be shared among a group of processes. Namespaces are a fundamental building block for containerization in Linux. Notably, Docker is built on Linux namespaces. A network namespace is an isolated instance of the Linux network stack, including its own network interfaces (with their own eBPF hooks), routing tables, netfilter configuration, and so on. Network namespaces give us a low-cost mechanism to rapidly apply customer-defined network configurations in isolation, all with built-in Linux kernel features so there’s no performance hit from userspace packet forwarding or proxying.
When a new customer starts using Magic Transit, we create a brand new network namespace for that customer on every server across our edge network (did I mention that every server can run every task?). We built a daemon that runs on our servers and is responsible for managing these network namespaces and their configurations. This daemon is constantly reading configuration updates from Quicksilver, our globally distributed key-value store, and applying customer-defined configurations for firewalls, routing, etc, inside the customer’s namespace. For example, if Acme wants to provision a firewall rule to allow FTP traffic (TCP ports 20 and 21) to 203.0.113.100, that configuration is propagated globally through Quicksilver and the Magic Transit daemon applies the firewall rule by adding an nftables rule to the Acme customer namespace:
Getting the customer’s traffic to their network namespace requires a little routing configuration in the default network namespace. When a network namespace is created, a pair of virtual ethernet (veth) interfaces is also created: one in the default namespace and one in the newly created namespace. This interface pair creates a “virtual wire” for delivering network traffic into and out of the new network namespace. In the default network namespace, we maintain a routing table that forwards Magic Transit customer IP prefixes to the veths corresponding to those customers’ namespaces. We use iptables to mark the packets that are destined for Magic Transit customer prefixes, and we have a routing rule that specifies that these specially marked packets should use the Magic Transit routing table.
(Why go to the trouble of marking packets in iptables and maintaining a separate routing table? Isolation. By keeping Magic Transit routing configurations separate we reduce the risk of accidentally modifying the default routing table in a way that affects how non-Magic Transit traffic flows through our edge.)
Network namespaces provide a lightweight environment where a Magic Transit customer can run and manage network functions in isolation, letting us put full control in the customer’s hands.
GRE + anycast = magic
After passing through the edge network functions, the TCP SYN packet is finally ready to be delivered back to the customer’s network infrastructure. Because Acme Corp. does not have a network footprint in a colocation facility with Cloudflare, we need to deliver their network traffic over the public Internet.
This poses a problem. The destination address of the TCP SYN packet is 203.0.113.100, but the only network announcing the IP prefix 203.0.113.0/24 on the Internet is Cloudflare. This means that we can’t simply forward this packet out to the Internet—it will boomerang right back to us! In order to deliver this packet to Acme we need to use a technique called tunneling.
Tunneling is a method of carrying traffic from one network over another network. In our case, it involves encapsulating Acme’s IP packets inside of IP packets that can be delivered to Acme’s router over the Internet. There are a number of common tunneling protocols, but Generic Routing Encapsulation (GRE) is often used for its simplicity and widespread vendor support.
GRE tunnel endpoints are configured both on Cloudflare’s servers (inside of Acme’s network namespace) and on Acme’s router. Cloudflare servers then encapsulate IP packets destined for 203.0.113.0/24 inside of IP packets destined for a publicly-routable IP address for Acme’s router, which decapsulates the packets and emits them into Acme’s internal network.
Now, I’ve omitted an important detail in the diagram above: the IP address of Cloudflare’s side of the GRE tunnel. Configuring a GRE tunnel requires specifying an IP address for each side, and the outer IP header for packets sent over the tunnel must use these specific addresses. But Cloudflare has thousands of servers, each of which may need to deliver packets to the customer through a tunnel. So how many Cloudflare IP addresses (and GRE tunnels) does the customer need to talk to? The answer: just one, thanks to the magic of anycast.
Cloudflare uses anycast IP addresses for our GRE tunnel endpoints, meaning that any server in any data center is capable of encapsulating and decapsulating packets for the same GRE tunnel. How is this possible? Isn’t a tunnel a point-to-point link? The GRE protocol itself is stateless—each packet is processed independently and without requiring any negotiation or coordination between tunnel endpoints. While the tunnel is technically bound to an IP address it need not be bound to a specific device. Any device that can strip off the outer headers and then route the inner packet can handle any GRE packet sent over the tunnel. Actually, in the context of anycast the term “tunnel” is misleading since it implies a link between two fixed points. With Cloudflare’s Anycast GRE, a single “tunnel” gives you a conduit to every server in every data center on Cloudflare’s global edge.
One very powerful consequence of Anycast GRE is that it eliminates single points of failure. Traditionally, GRE-over-Internet can be problematic because an Internet outage between the two GRE endpoints fully breaks the “tunnel”. This means reliable data delivery requires going through the headache of setting up and maintaining redundant GRE tunnels terminating at different physical sites and rerouting traffic when one of the tunnels breaks. But because Cloudflare is encapsulating and delivering customer traffic from every server in every data center, there is no single “tunnel” to break. This means Magic Transit customers can enjoy the redundancy and reliability of terminating tunnels at multiple physical sites while only setting up and maintaining a single GRE endpoint, making their jobs simpler.
Our scale is now your scale
Magic Transit is a powerful new way to deploy network functions at scale. We’re not just giving you a virtual instance, we’re giving you a global virtual edge. Magic Transit takes the hardware appliances you would typically rack in your on-prem network and distributes them across every server in every data center in Cloudflare’s network. This gives you access to our global anycast network, our fleet of servers capable of running your tasks, and our engineering expertise building fast, reliable, secure networks. Our scale is now your scale.
To learn more about the origins of The Network is the Computer®, I spoke with John Gage, the creator of the phrase and the 21st employee of Sun Microsystems. John had a key role in shaping the vision of Sun and had a lot to share about his vision for the future. Listen to our conversation here and read the full transcript below.
John Graham-Cumming: I’m talking to John Gage who was what, the 21st employee of Sun Microsystems, which is what Wikipedia claims and it also claims that you created this phrase “The Network is the Computer,” and that’s actually one of the things I want to talk about with you a little bit because I remember when I was in Silicon Valley seeing that slogan plastered about the place and not quite understanding what it meant. So do you want to tell me what you meant by it or what Sun meant by it at the time?
John Gage: Well, in 2019, recalling what it meant in 1982 or 83’ will be colored by all our experience since then but at the time it seemed so obvious that when we introduced the first scientific workstations, they were not very powerful computers. The first Suns had a giant screen and they were on the Internet but they were designed as a complementary component to supercomputers. Bill Joy and I had a series of diagrams for talks we’d give, and Bill had the bi-modal, the two node picture. The serious computing occurred on the giant machines where you could fly into the heart of a black hole and the human interface was the workstation across the network. So each had to complement the other, each built on the strengths of the other, and each enhanced the other because to deal in those days with a supercomputer was very ugly. And to run all your very large computations, you could run them on a Sun because we had virtual memory and series of such advanced things but not fast. So the speed of scientific understanding is deeply affected by the tools the scientist has — is it a microscope, is it an optical telescope, is it a view into the heart of a star by running a simulation on a supercomputer? You need to have the loop with the human and the science constantly interacting and constantly modifying each other, and that’s what the network is for, to tie those different nodes together in as seamless a way as possible. Then, the instant anyone that’s ever created a programming language says, “so if I have to create a syntax of this where I’m trying to let you express, do this, how about the delay on the network, the latency”? Does your phrase “The Network is the Computer” really capture this hundreds, thousands, tens of thousands, millions perhaps at that time, now billions and billions and billions today, all these devices interacting and exchanging state with latency, with delay. It’s sort of an oversimplification, and that we would point out, but it’s just network is the computer. Four words, you know, what we tried to do is give a metaphor that allows you to explore it in your mind and think of new things to do and be inspired.
Graham-Cumming: And then by a sort of strange sequence of events, that was a trademark of Sun. It got abandoned. And now Cloudflare has swooped in and trademarked it again. So now it’s our trademark which sort of brings us full circle, I suppose.
Gage: Well, trademarks are dealing with the real world, but the inspiration of Cloudflare is to do exactly what Bill Joy and I were talking about in 1982. It’s to build an environment in which every participant globally can share with security, and we were not as strong. Bill wrote most of the code of TCP/IP implemented by every other computer vendor, and still these questions of latency, these questions of distributed denial of service which was, how do you block that? I was so happy to see that Cloudflare invests real money and real people in addressing those kinds of critical problems, which are at the core, what will destroy the Internet.
Graham-Cumming: Yes, I agree. I mean, it is a significant investment to actually deal with it and what I think people don’t appreciate about the DDoS attack situation is that they are going on all the time and it’s just a continuous, you know, just depends who the target is. It’s funny you mentioned TCP/IP because about 10 years after, so in about ‘92, my first real job, I had to write a TCP/IP stack for an obscure network card. And this was prior to the Internet really being available everywhere. And so I didn’t realize I could go and get the BSD implementation and recompile it. So I did it from scratch from the RFCs.
Gage: You did!
Graham-Cumming: And the thing I recommend here is that nobody ever does that because, you know, the real world, real code that really interacts is really hard when you’re trying to work it with other things, so.
Gage: Do you still, John, do you have that code?
Graham-Cumming: I wonder. I have the binary for it.
Gage: Do hunt for it, because our story was at the time DARPA, the Defense Advanced Research Projects Agency, that had funded networking initiatives around the world. I just had a discussion yesterday with Norway and they were one of the first entities to implement using essentially Bill Joy’s code, but to be placed on the ARPANET. And a challenge went out, and at that time the slightly older generation, the Bolt Beranek and Newman Group, Vint Cerf, Bob Con, those names, as Vint Cerf was a grad student at UCLA where he had built one of the four first Internet sites and the DARPA offices were in Arlington, Virginia, they had massive investments in detection of nuclear underground tests, so seismological data, and the moment we made the very first Suns, I shipped them to DARPA, we got the network up and began serving seismic data globally. Really lovely visualization of events. If you’re trying to detect something, those things go off and then there’s a distinctive signature, a collapse of the underground cavern after. So DARPA had tried to implement, as you did, from the spec, from the RFC, the components, and Vint had designed a lot of this, all the acknowledgement codes and so forth that you had to implement in TCP/IP. So Bill, as a graduate student at Berkeley, we had a meeting in Arlington at DARPA headquarters where BBN and AT&T Bell Labs and a number of other people were in the room. Their code didn’t work, this graduate student from Berkeley named Bill Joy, his code did work, and when Bob Kahn and Vint Cerf asked Bill, “Well, so how did you do it?” What he said was exactly what you just said, he said, “I just read the spec and wrote the code.”
Graham-Cumming: I do remember very distinctly because the company I was working at didn’t have a TCP/IP stack and we didn’t have any IP machines, right, we were doing actually stuff that was all IBM networking, SMA stuff. Somehow we bought what was at that point a HP machine, it was an Apollo workstation and a Sun workstation. I had them on Ethernet and talking to each other. And I do distinctly remember the first time a ping packet came back from that Sun box, saying, yes I managed to send you an IP packet, you managed to send me ICMP response and that was pretty magical. And then I got to TCP and that was hard.
Gage: That was hard. Yeah. When you get down to the details, the spec can be wrong. I mean, it will want you to do something that’s a stupid thing to do. So Bill has such good taste in these things. It would be interesting to do a kind of a diff across the various implementations of the stack. Years and years later we had maybe 50 companies all assemble in a room, only engineers, throw out all the marketing people and all the Ps and VPs and every company in this room—IBM, Hewlett-Packard—oh my God, Hewlett-Packard, fix your TCP—and we just kept going until everybody could work with everybody else in sort of a pact. We’re not going to reveal, Honeywell, that you guys were great with earlier absolute assembly code, determinate time control stuff but you have no clue about how packets work, we’ll help you, so that all of us can make every machine interoperate, which yielded the network show, Interop. Every year we would go put a bunch of fiber inside whatever, you know, Geneva, or pick some, Las Vegas, some big venue.
Graham-Cumming: I used to go to Vegas all the time and that was my great introduction to Vegas was going there for Interop, year after year.
Gage: Oh, you did! Oh, great.
Graham-Cumming: Yes, yes, yes.
Gage: You know in a way, what you’re doing with, for example, just last week with the Verizon problem, everybody implementing what you’re doing now that is not open about their mistakes and what they’ve learned and is not sharing this, it’s a problem. And your global presence to me is another absolutely critical thing. We had about, I forget, 600 engineers in Beijing at the East Gate of Tsinghua a lot of networking expertise and lots of those people are at Tencent and Huawei and those network providers throughout the rest of the world, politics comes and goes but the engineering has to be done in a way that protects us. And so these conversations globally are critical.
Graham-Cumming: Yes, that’s one of the things that’s fascinating actually about doing real things on the real Internet is there is a global community of people making computers talk to each other and you know, that it’s a tremendously complicated thing to actually make that work, and you do it across countries, across languages. But you end up actually making them work, and that’s the Internet we’re sitting on, that you and I are talking on right now that is based on those conversations around the world.
Gage: And only by doing it do you understand more deeply how to do it. It’s very difficult in the abstract to say what should happen as we begin to spread. As Sun grew, every major city in Africa had installations and for network access, you were totally dependent on an often very corrupt national telco or the complications dealing with these people just to make your packet smooth. And as it turned out, many of the intelligence and military entities in all of these countries had very little understanding of any of this. That’s changed to some degree. But the dangerous sides of the Internet. Total surveillance, IPv6, complete control of exact identity of origins of packets. We implemented, let’s see, you had an early Sun. We probably completed our IPv6 implementation, was it still fluid in the 90s, but I remember 10 years after we finished a complete implementation of IPv6, the U.S. was still IPv4, it’s still IPv4.
Graham-Cumming: It still is, it still is. Pretty much. Except for the mobile carriers right now. I think in general the mobile phone operators are the ones who’ve gone more into IPv6 than anybody else.
Gage: It was remarkable in China. We used to have a conference. We’d bring a thousand Chinese universities into a room. Professor Wu from Tsinghua who built the Chinese Education and Research Network, CERNET. And now a thousand universities have a building on campus doing Internet research. We would get up and show this map of China and he kept his head down politically, but he managed at the point when there was a big fight between the Minister of Telecom and the Minister of Railways. The Minister of Railways said, look, I have continuity throughout China because I have railines. I’ve just made a partnership with the People’s Liberation Army, and they are essentially slave labor, and they’re going to dig the ditches, and I’m going to run fiber alongside the railways and I don’t care what you, the Minister of Telecommunications, has to say about it, because I own the territory. And that created a separate pathway for the backbone IPv6 network in China. Cheap, cheap, cheap, get everybody doing things.
Graham-Cumming: Yes, now of course in China that’s resulted in an interesting situation where you have China Telecom and China Unicom, who sort of cooperate with each other but they’re almost rivals which makes IP packets quite difficult to route inside China.
Gage: Yes exactly. At one point I think we had four hunks of China. Everyone was geographically divided. You know there were meetings going on, I remember the moment they merged the telecom ministry with the electronics ministry and since we were working with both of them, I walk in a room and there’s a third group, people I didn’t know, it turns out that’s the People’s Liberation Army.
Graham-Cumming: Yes, they’re part of the team. So okay, going back to this “Network is the Computer” notion. So you were talking about the initial things that you were doing around that, why is it that it’s okay that Cloudflare has gone out and trademarked that phrase now, because you seem to think that we’ve got a leg to stand on, I guess.
Gage: Frankly, I’d only vaguely heard of Cloudflare. I’ve been working in areas, I’ve got a project in the middle of Nairobi in the slum where I’ve spent the last 15 years or so learning a lot about clean water and sewage treatment because we have almost 400,000 people in a very small area, biggest slum in East Africa. How can you introduce sanitary water and clean sewage treatment into a very, an often corrupt, a very difficult environment, and so that’s been a fascination of mine and I’ve been spending a lot of time. What’s a computer person know about fluid dynamics and pathogens? There’s a lot to learn. So as you guys grew so rapidly, I vaguely knew of you but until I started reading your blog about post-quantum crypto and how do we devise a network in these resilient denial of service attacks and all these areas where you’re a growing company, it’s very hard to take time to do serious advanced research-level work on distributed computing and distributed security, and yet you guys are doing it. When Bill created Java, the subsequent step from Java for billions and billions of devices to share resources and share computations was something we call Genie which is a framework for validation of who you are, movement of code from device to device in a secure way, total memory control so that someone is not capable of taking over memory in your device as we’ve seen with Spectre and the failures of these billions of Intel chips out there that all have a flaw on take all branches parallel compute implementations. So the very hardware you’re using can be insecure so your operating systems are insecure, the hardware is insecure, and yet you’re trying to build on top with fallible pieces in infallible systems. And you’re in the middle of this, John, which I’m so impressed by.
Graham-Cumming: And Jini sort of lives on as called Apache River now. It moved away from Sun and into an Apache project.
Gage: Yes, very few people seem to realize that the name Apache is a poetic phrasing of “a patchy system.” We patch everything because everything is broken. We moved a lot of it, Brian Behlendorf and the Apache group. Well, many of the innovations at Sun, Java is one, file systems that are far more secure and far more resilient than older file systems, the SPARC implementation, I think the SPARC processor, even though you’re using the new ARM processors, but Fujitsu, I still think keeps the SPARC architecture as the world’s fastest microprocessor.
Graham-Cumming: Right. Yes. Being British of course, ARM is a great British success. So I’m honor-bound to use that particular architecture. Clearly.
Gage: Oh, absolutely. And the power. That was the one always in a list of what our engineering goals are. We wanted to make, we were building supercomputers, we were building very large file servers for the telcos and the banks and the intelligence agencies and all these different people, but we always wanted to make a low power and it just fell off the list of what you could accomplish and the ARM chips, their ratios of wattage to packets treated are—you have a great metric on your website someplace about measuring these things at a very low level—that’s key.
Graham-Cumming: Yes, and we had Sophie Wilson, who of course is one of the founders of ARM and actually worked on the original chip, tell this wonderful story at our Internet Summit about how the first chip they hooked up was operating fine until they realized they hadn’t hooked the power up and they were asked to. It was so low power that it was able to use the power that was coming in over the logic lines to actually power the whole chip. And they said to me, wait a minute, we haven’t plugged the power in but the thing is running, which was really, I mean that was an amazing achievement to have done that.
Gage: That’s amazing. We open sourced SPARC, the instruction set, so that anybody doing crypto that also had Fab capabilities could implement detection of ones and zeroes, sheep and goats, or other kinds of algorithms that are necessary for very high speed crypto. And that’s another aspect that I’m so impressed by Cloudflare. Cloudflare is paying attention at a machine instruction level because you’re implementing with your own hardware packages in what, 180 cities? You’re moving logistically a package into Ulan Bator, or into Mombasa and you’re coming up live.
Graham-Cumming: And we need that to be inexpensive and fast because we’re promising people that we will make their Internet properties faster and secure at the same time and that’s one of the interesting challenges which is not trading those two things off. Which means your crypto better be fast, for example, and that requires a lot of fiddling around at the hardware level and understanding it. In our case because we’re using Intel, really what Intel chips are doing at the low level.
Gage: Intel did implement a couple of things in one or another of the more recent chips that were very useful for crypto. We had a group of the SPARC engineers, probably 30, at a dinner five or six months ago discussing, yes, we set the world standard for parallel execution branching optimizations for pipelines and chips, and when the overall design is not matched by an implementation that pays attention to protecting the memory, it’s a fundamental, exploitable flaw. So a lot of discussion about this. Selecting precisely which instructions are the most important, the risk analysis with the ability to make a chip specifically to implement a particular algorithm, there’s a lot more to go. We have multiples of performance ahead of us for specific algorithms based on a more fluid way to add instructions that are necessary into a specific piece of hardware. And then we jump to quantum. Oh my.
Graham-Cumming: Yes. To talk about that a little bit, the ever-increasing speed of processors and the things we can do; Do you think we actually need that given that we’re now living in this incredibly distributed world where we are actually now running very distributed algorithms and do we really need beefier machines?
Gage: At this moment, in a way, it’s you making fun of Bill Joy for only wanting a megabit in Aspen. When Steve Jobs started NeXT, sadly his hardware was just terrible, so we sent a group over to boost NeXT. In fact we sort of secretly slipped him $30 million to keep him afloat. And I’d say, “Jobs, if you really understood something about hardware, it would really be useful here.” So one of the main team members that we sent over to NeXT came to live in Aspen and ended up networking the entire valley. At a point, megabit for what you needed to do, seemed reasonable, so at this moment, as things become alive by the introduction of a little bit of intelligence in them, some little flickering chip that’s able to execute an algorithm, many tasks don’t require. If you really want to factor things fast, quantum, quantum. Which will destroy our existing crypto systems. But if you are just bringing the billions of places where a little bit of knowledge can alter locally a little bit of performance, we could do very well with the compute power that we have right now. But making it live on the network, securely, that’s the key part. The attacks that are going on, simple errors as you had yesterday, are simple errors. In a way, across Cloudflare’s network, you’re watching the challenges of the 21st century take place: attacks, obscure, unknown exploits of devices in the power and water control systems. And so, you are in exactly the right spot to not get much sleep and feel a heavy responsibility.
Graham-Cumming: Well it certainly felt like it yesterday when we were offline for 27 minutes, and that’s when we suddenly discovered, we sort of know how many customers we have, and then we really discover when they start phoning us. Our support line had his own DDoS basically where it didn’t work anymore because so many people signed in. But yes, I think that it’s interesting your point about a little bit extra on a device somewhere can do something quite magical and then you link it up to the network and you can do a lot. What we think is going on partly is some things around AI, where large amounts of machine learning are happening on big beefy machines, perhaps in the cloud, perhaps groups of machines, and then devices are doing their own little bits of inference or recognizing faces and stuff like that. And that seems to be an interesting future where we have these devices that are actually intelligent in our pockets.
Gage: Oh, I think that’s exactly right. There’s so much power in your pocket. I’m spending a lot of time trying to catch up that little bit of mathematics that you thought you understood so many years ago and it turns out, oh my, I need a little bit of work here. And I’ve been reading Michael Jordan’s papers and watching his talks and he’s the most cited computer scientist in machine learning and he will always say, “Be very careful about the use of the phrase, ‘Artificial Intelligence’.” Maybe it’s a metaphor like “The Network is the Computer.” But, we’re doing gradient descent optimization. Is the slope going up, or is the slope going down? That’s not smart. It’s useful and the real time language translation and a lot of incredible work can occur when you’re doing phrases. There’s a lot of great pattern work you can do, but he’s out in space essentially combining differentiation and integration in a form of integral. And off we go. Are your hessians rippling in the wind? And what’s the shape of this slope? And is this actually the fastest path from here to there to constantly go downhill. Maybe it’s sometimes going uphill and going over and then downhill that’s faster. So there’s just a huge amount of new mathematics coming in this territory and each time, as we move from 2G to 3G to 4G to 5G, many people don’t appreciate that the compression algorithms changed between 2G, 3G, 4G and 5G and as a result, so much more can move into your mobile device for the same amount of power. 10 or 20 times more for the same about of power. And mathematics leads to insights and applications of it. And you have a working group in that area, I think. I tried to probe around to see if you’re hiring.
Graham-Cumming: Well you could always just come around to just ask us because we’ll probably tell you because we tend to be fairly transparent. But yes, I mean compression is definitely an area where we are interested in doing things. One of the things I first worked on at Cloudflare was a thing that did differential compression based on the insight that web pages don’t actually change that much when you hit ‘refresh’. And so it turns out that if you if you compress based on the delta from the last thing you served to someone you can actually send many orders of magnitude less data and so there’s lots of interesting things you can do with that kind of insight to save a tremendous amount of bandwidth. And so yeah, definitely compression is interesting, crypto is interesting to us. We’ve actually open sourced some of our compression improvements in zlib which was very popular compression algorithm and now it’s been picked up. It turns out that in neuroscience, because there’s a tremendous amount of data which needs compression and there are pipelines used in neuroscience where actually having better compression algorithms makes you work a lot faster. So it’s fascinating to see the sort of overspill of things we’re doing into other areas where I know nothing about what goes on inside the brain.
Gage: Well isn’t that fascinating, John. I mean here you are, the CTO of Cloudflare working on a problem that deeply affects the Internet, enabling a lot more to move across the Internet in less time with less power, and suddenly it turns into a tool for brain modeling and neuroscientists. This is the benefit. There’s a terrific initiative. I’m at Berkeley. The Jupiter notebooks created by Fernando Perez, this environment in which you can write text and code and share things. That environment, taken up by machine learning. I think it’s a major change. And the implementation of diagrams that are causal. These forms of analysis of what caused what. These are useful across every discipline and for you to model traffic and see patterns emerge and find webpages and see the delta has changed and then intelligently change the pattern of traffic in response to it, it’s all pretty much the same thing here.
Graham-Cumming: Yes and then as a mathematician, when I see things that are the same thing, I can’t help wondering what the real deep structure is underneath. There must be another layer another layer down or something. So as you know it’s this thing. There’s some other deeper layer below all this stuff.
Gage: I think this is just endlessly fascinating. So my only recommendations to Cloudflare: first, double what you’re doing. That’s so hard because as you go from 10 people to 100 people to 1,000 people to 10,000 people, it’s a different world. You are a prime example, you are global. Suddenly you’re able to deal with local authorities in 60-70 countries and deal with some of the world’s most interesting terrain and with network connectivity and moving data, surveillance, and some security of the foundation infrastructure of all countries. You couldn’t be engaged in more exciting things.
Graham-Cumming: It’s true. I mean one of the most interesting things to me is that I have grown up with the Internet when I you know I got an email address using actually the crazy JANET scheme in the UK where the DNS names were backwards. I was in Oxford and they gave me an email address and it was I think it was JGC at uk dot ac dot ox dot prg and that then at some point it flipped around and it went to DNS looked like it had won. For a long time my address was the wrong way around. I think that’s a typically British decision to be slightly different to everybody else.
Gage: Well, Oxford’s always had that style, that we’re going to do things differently. There’s an Oxford Center for the 21st century that was created by the money from a wonderful guy who had donated maybe $100 million. And they just branched out into every possible research area. But when you went to meetings, you would enter a building that was built at the time of the Raj. It was the India temple of colonialism.
Graham-Cumming: There’s quite a few of those in the UK. Are you thinking of the Martin School? James Martin. And he gave a lot of money to Oxford. Well the funny thing about that was the programming research group. The one thing they didn’t teach us really as an undergraduate was how to program which was one of the most fascinating things they have because that was a bit getting your hands dirty so you needed to let all the theory. So we learnt all the theory we did a little bit of functional programming and that was the extent of it which set me really up badly for a career in an industry. My first job I had to pretend I knew how to program and see and learn very quickly.
Gage: Oh my. Well now you’ve been writing code in Go.
Graham-Cumming: Yes. Well the thing about Go, the other Oxford thing of course is Tony Hoare, who is a professor of computer science there. He had come up with this thing called CSP (Communicating Sequential Processes) so that was a whole theory around how you do parallel execution. And so of course everybody used his formalism and I did in my doctoral thesis and so when Go came along and they said oh this how Go works, I said, well clearly that’s CSP and I know how to do this. So I can do it again.
Gage: Tony Hoare occasionally would issue a statement about something and it was always a moment. So few people seem to realize the birth of so much of what we took in the 60s, 70s, 80s, in Silicon Valley and Berkeley, derived from the Manchester Group, the virtual memory work, these innovations. Today, Whit Diffie. He used to love these Bletchley stories, they’re so far advanced. That generation has died off.
Graham-Cumming: There’s a very peculiar thing in computer science and the real application of computing which is that we both somehow sit on this great knowledge of the past of computing and at the same time we seem to willfully forget it and reinvent everything every few years. We go through these cycles where it’s like, let’s do centralized computing, now distributed computing. No, let’s have desktop PCs, now let’s have the cloud. We seem to have this collective amnesia and then on occasion people go, “Oh, Leslie Lamport wrote this thing in 1976 about this problem”. What other subject do we willfully forget the past and then have to go and doing archaeology to discover again?
Gage: As a sociological phenomenon it means that the older crowd in a company are depressing because they’ll say, “Oh we tried that and it didn’t work”. Over the years as Sun grew from 15 people or so and ended up being like 45,000 people before we were sold off to Oracle and then everybody dumped out because Oracle didn’t know too much about computing. So Ivan Sutherland, Whit Diffie. Ivan actually stayed on. He may actually still have an Oracle email. Almost all of the research groups, certainly the chip group went off to Intel, Fujitsu, Microsoft. It’s funny to think now that Microsoft’s run by a Sun person.
Graham-Cumming: Well that’s the same thing. Everyone’s forgotten that Microsoft was the evil empire not that long ago. And so now it’s not. Right now it’s cool again.
Gage: Well, all of the embedded stuff from Microsoft is still that legacy that Bill Gates who’s now doing wonderful things with the Gates Foundation. But the embedded insecurity of the global networks is due to, in large part, the insecurities, that horrible engineering of Microsoft embedded everywhere. You go anywhere in China to some old industrial facility and there is some old not updated junky PC running totally insecure software. And it’s controlling the grid. It’s discouraging. It’s like a lot of the SCADA systems.
Graham-Cumming: I’m completely terrified of SCADA systems.
Gage: The simplest exploits. I mean, it’s nothing even complicated. There are a series of emerging journalists today that are paying attention to cybersecurity and people have come out with books even very recently. Well, now because we’re in this China, US, Iran nightmare, a United States presidential directive taking the cybersecurity crowd and saying, oops, now you’re an offensive force. Which means we got some 20-year-old lieutenant somewhere who suddenly might just for fun turn off Tehran’s water supply or something. This is scary because the SCADA systems are embedded everywhere, and they’re, I don’t know, would you say totally insecure? Just the simple things, just simple exploits. One of the journalists described, I guess it was the Russians who took a bunch of small USB sticks and at a shopping center near a military base just gave them away. And people put them into their PCs inside SIPRNet, inside the secure U.S. Department of Defense network. Instantly the network was taken over just by inserting a USB device to something on the net. And there you are, John, protecting against this.
Graham-Cumming: Trying hard to protect against these things, yes absolutely. It’s very interesting because you mentioned before how rapidly Cloudflare had grown over the last few years. And of course Sun also really got going pretty rapidly, didn’t it?
Gage: Well, yes. The first year we were just some students from Berkeley, hardware from Stanford, Andy Bechtolsheim, software from Berkeley, Berkeley Unix BSD, Bill Joy. Combine the two, and 10 of us or so, and we were, I think the first year was 12 million booked, the second year was 50 or 60 million booked, and the third year was 150 or so million booked and then we hit 500 million and then we hit a billion. And now, it’s selling boxes, we were a manufacturing company so that’s different from software or services, but we also needed lots of people and so we instantly raided the immense benefit of variety of people in the San Francisco Bay Area, with Berkeley and Stanford. We had students in computer science, and mechanical engineering, and physics, and mathematics from every country in the world and we recruited from every country in the world. So a great part of Sun’s growth came, as you are, expanding internationally, and at one point I think we ran most of the telcos of the world, we ran China Mobile. 900 million subscribers on China Mobile, all Sun stuff in the back. Throughout Africa, every telco was running Sun and Cisco until Huawei knocked Cisco out. It was an amazing time.
Graham-Cumming: You ran the machine that ran Latek, that let me get my doctoral thesis done.
Gage: You know that’s how I got into it, actually. I was in econometrics and mathematics at Berkeley, and I walk down a hallway and outside a room was that funny smell from photographic paper from something, and there was perfectly typeset mathematics. Troff and nroff, all those old UNIX utilities for the Bell Technical Journal, and I open the door and I’ve got to get in there. There’s two hundred people sitting in front of these beehive-like little terminals all typing away on a UNIX system. And I want to get an account and I walk down the hall and there’s this skinny guy who types about 200 words a minute named Bill Joy. And I said, I need an account, I’ve got to type set integral signs, and he said, what’s your name. I tell him my name, John Gage, and he goes voop, and I’ve never seen anybody type as fast as him in my life. This is a new world, here.
Graham-Cumming: So he was rude then?
Gage: Yeah he was, he was. Well, it’s interesting since the arrival of a device at Berkeley to complement the arrival of an MIT professor who had implemented in LISP, mathematical, not typesetting of mathematics, but actual maxima. To get Professor fetman, maxima god from MIT, to come to Berkeley and live a UNIX environment, we had to put a LISP up outside on the PDP. So Bill took that machine which had virtual memory and implemented the environment for significant computational mathematics. And Steve Wolfram took that CalTech, and Princeton Institute for Advanced Studies, and now we have Mathematica. So in a way, all of Sun and the UNIX world derived from attempting to do executable mathematics.
Graham-Cumming: Which in some ways is what computers are doing. I think one of the things that people don’t really appreciate is the extent to which all numbers underneath.
Gage: Well that’s just this discrete versus continuous problem that Michael Jordan is attempting to address. To my current total puzzlement and complete ignorance, is what in the world is symplectic integration? And how do Lyapunovfunctions work? Oh, no clue.
Graham-Cumming: Are we going to do a second podcast on that? Are you going to come back and teach us?
Gage: Try it. We’re on, you’re on, you’re on. Absolutely. But you’ve got to run a company.
Graham-Cumming: Well I’ve got some things to do. Yeah. But you can go do that and come tell us about it.
Gage: All right, Great John. Well it was terrific to talk to you.
Graham-Cumming: So yes it was wonderful speaking to you as well. Thank you for helping me dig up memories of when I was first fooling around with Sun Systems and, you know, some of the early days and of course “The Network is the Computer,” I’m not sure I fully yet understand quite the metaphor or even if maybe I do somehow deeply in my soul get it, but we’re going to try and make it a reality, whatever it is.
Gage: Well, I count it as a complete success, because you count as one of our successes because you‘re doing what you’re doing, therefore the phrase, “The Network is the Computer,” resides in your brain and when you get up in the morning and decide what to do, a little bit nudges you toward making the network work.
Graham-Cumming: I think that’s probably true. And there’s the dog, the dog is saying you’ve been yakking for an hour and now we better stop. So listen, thank you so much for taking the time. It was wonderful talking to you. You have a good day. Thank you very much.
Interested in hearing more? Listen to my conversations with Ray Rothrock and Greg Papadopoulos of Sun Microsystems:
Last week I spoke with Ray Rothrock, former Director of CAD/CAM Marketing at Sun Microsystems, to discuss his time at Sun and how the Internet has evolved. In this conversation, Ray discusses the importance of trust as a principle, the growth of Sun in sales and marketing, and that time he gave Vice President Bush a Sun demo. Listen to our conversation here and read the full transcript below.
John Graham-Cumming: Here I am very lucky to get to talk with Ray Rothrock who was I think one of the first investors in Cloudflare, a Series A investor and got the company a little bit of money to get going, but if we dial back a few earlier years than that, he was also at Sun as the Director of CAD/CAM Marketing. There is a link between Sun and Cloudflare. At least one, but probably more than one, which is that Cloudflare has recently trademarked, “The Network is the Computer”. And that was a Sun trademark, wasn’t it?
Ray Rothrock: It was, yes.
Graham-Cumming: I talked to John Gage and I asked him about this as well and I asked him to explain to me what it meant. And I’m going to ask you the same thing because I remember walking around the Valley thinking, that sounds cool; I’m not sure I totally understand it. So perhaps you can tell me, was I right that it was cool, and what does it mean?
Rothrock: Well it certainly was cool and it was extraordinarily unique at the time. Just some quick background. In those early days when I was there, the whole concept of networking computers was brand new. Our competitor Apollo had a proprietary network but Sun chose to go with TCP/IP which was a standard at the time but a brand new standard that very few people know about right. So when we started connecting computers and doing some intensive computing which is what I was responsible for—CAD/CAM in those days was extremely intensive whether it was electrical CAD/camera, or mechanical CAD/CAM, or even simulation solid design modeling and things—having a little extra power from other computers was a big deal. And so this concept of “The Network is the Computer” essentially said that you had one window into the network through your desktop computer in those days—there was no mobile computing at that time, this was like 84’, 85’, 86’ I think. And so if you had the appropriate software you could use other people’s computers (for CPU power) and so you could do very hard problems at that single computer could not do because you could offload some of that CPU to the other computers. Now that was very nerdy, very engineering intensive, and not many people did it. We’d go to the SIGGRAPH, which was a huge graphics show in those days and we would demonstrate ten Sun computers for example, doing some graphic rendering of a 3D wireframe that had been created in the CAD/CAM software of some sort. And it was, it was hard, and that was in the mechanical side. On the electrical side, Berkeley had some software that was called Magic—it’s still around and is a very popular EDA software that’s been incorporated in those concepts. But to imagine calculating the paths in a very complicated PCB or a very complicated chip, one computer couldn’t do it, but Sun had the fundamental technology. So from my seat at Sun at the time, I had access to what could be infinite computing power, even though I had a single application running, and that was a big selling point for me when I was trying to convince EDA and MDA companies to put their software on the Sun. That was my job.
Graham-Cumming: And hearing it now, it doesn’t sound very revolutionary, because of course we’re all doing that now. I mean I get my phone out of my pocket and connect to goodness knows what computing power which does image recognition and spots faces and I can do all sorts of things. But walk me through what it felt like at the time.
Rothrock: Just doing a Google search, I mean, how many data stores are being spun up for that? At the time it was incredible, because you could actually do side by side comparisons. We created some demonstrations, where one computer might take ten hours to do a calculation, two computers might take three hours, five computers might take 30 minutes. So with this demo, you could turn on computers and we would go out on the TCP/IP network to look for an available CPU that could give me some time. Let’s go back even further. Probably 15 years before that, we had time sharing. So you had a terminal into a big mainframe and did all this swapping in and out of stuff to give you a time slice computing. We were doing the exact same thing except we were CPU slicing, not just time slicing. That’s pretty nerdy, but that’s what we did. And I had to work with the engineering department, with all these great engineers in those days, to make this work for a demo. It was so unique, you know, their eyes would get big. You remember Novell…
Graham-Cumming: I was literally just thinking about Novell because I actually worked on IPX and SPX networking stuff at the time. I was going to ask you actually, to what extent do you think TCP/IP was a very important part of this revolution?
Rothrock: It was huge. It was fundamentally huge because it was a standard, so it was available and if you implemented it, you didn’t have to pay for it. When Bob Metcalfe did Ethernet, it was on top of the TCP stack. Sun, in my memory, and I could be wrong, was the first company to put a TCP/IP stack on the computer. And so you just plugged in the back, an RJ45 into this TCP/IP network with a switch or a router on it and you were golden. They made it so simple and so cheap that you just did it. And of course if you give an engineer that kind of freedom and it opens up. By the way, as the marketing guy at Sun, this was my first non-engineering job. I came from a very technical world of nuclear physics into Sun. And so it was stunning, just stunning.
Graham-Cumming: It’s interesting that you mentioned Novell and then you mentioned Apollo before that and obviously IBM had SNA networking and there were attempts to do all those networking things. It’s interesting that these open standards have really enabled the explosion of everything else we’ve seen and with everything that’s going on in the Internet.
Rothrock: Sun was open, so to speak, but this concept of open source now that just dominates the conversation. As a venture capitalist, every deal I ever invested in had open source of some sort in it. There was a while when it was very problematic in an M&A event, but the world’s gotten used to it. So open, is very powerful. It’s like freedom. It’s like liberty. Like today, July 4th, it’s a big deal.
Graham-Cumming: Yes, absolutely. It’s just interesting to see it explode today because I spent a lot of my career looking at so many different networking protocols. The thing that really surprises me, or perhaps shouldn’t surprise me when you’ve got these open things, is that you harness so many people’s intelligence that you just end up with something that’s just better. It seems simple.
Rothrock: It seems simple. I think part of the magic of Sun is that they made it easy. Easy is the most powerful thing you can do in computing. Computing can be so nerdy and so difficult. But if you just make it easy, and Cloudflare has done a great job with that at that; they did it with their DNS service, they did it with all the stuff we worked on back when I was on the board and actively involved in the company. You’ve got to make it easy. I mean, I remember when Matthew and Lee worked like 20 hours a day on how to switch your DNS from whoever your provider was to Cloudflare. That was supposed to be one click, done. A to B. And that DNA was part of the magic. And whether we agree that Sun did it that way, to me at least, Sun did it that way as well. So it’s huge, a huge lift.
Graham-Cumming: It’s funny you talk about that because at the time, how that actually worked is that we just asked people to give us their username and password. And we logged in and did it for them. Early on, Matthew asked me if I’d be interested in joining Cloudflare when it was brand new and because of other reasons I’d moved back to the UK and I wasn’t ready to change jobs and I’d just taken another job. And I remember thinking, this thing is crazy this Cloudflare thing. Who’s going to hand over their DNS and their traffic to these four or five people above a nail salon in Palo Alto? And Matthew’s response was, “They’re giving us their passwords, let alone their traffic.” Because they were so desperate for it.
Rothrock: It tells you a lot about Matthew and you know as an attorney, I mean he was very sensitive to that and believes that one of the one of the founding principles is trust. His view was that, if I ever lose the customer’s trust, Cloudflare is toast. And so everything focused around that key value. And he was right.
Graham-Cumming: And you must have, at Sun, been involved with some high performance computing things that involved sensitive customers doing cryptography and things like that. So again trust is another theme that runs through there as well.
Rothrock: Yeah, very true. As the marketing guy of CAD/CAM, I was in the field two-thirds of the time, showing customers what was possible with them. My job was to get third party software onto the Sun box and then to turn that into a presentation to a customer. So I visited many government customers, many aerospace, power, all these very high falutin sort of behind the firewall kinds of guys in those days. So yes, trust was huge. It would come up: “Okay, so I’m using your CPU, how is it that you can’t use mine. And how do you convince me that you’ve not violated something.” In those days it was a whole different conversation that it is today but it was nonetheless just as important. In fact I remember I spent quite a bit of time at NCSA at the University of Illinois Urbana-Champaign. Larry Smarr was the head of NCSA. We spent a lot of time with Larry. I think John was there with me. John Gage and Vinod and some others but it was a big deal taking about high performance computing because that’s what they were doing and doing it with Sun.
Graham-Cumming: So just to dial forward, so you’re at Venrock and you decide to invest in Cloudflare. What was it that made you think that this was worth investing in? Presumably you saw some things that were in some of Sun’s vision. Because Sun had a very wide-ranging visions about what was going to be possible with computing.
Rothrock: Yeah. Let me sort of touch on a few points probably. Certainly Sun was my first computer company I worked for after I got out of the nuclear business and the philosophy of the company was very powerful. Not only we had this cool 19 inch black and white giant Macintosh essentially although the Mac wasn’t even born yet, but it had this ease of use that was powerful and had this open, I mean it was we preached that all the time and we made that possible. And Cloudflare—the related philosophy of Matthew and Michelle’s genius—was they wanted to make security and distribution of data as free and easy as possible for the long tail. That was the first thinking because you didn’t have access if you were in the long tail you were a small company you or you’re just going to get whipped around by the big boys. And so there was a bit of, “We’re here to help you, we’re going to do it.” It’s a good thing that the long tail get mobilized if you will or emboldened to use the Internet like the big boys do. And that was part of the attractiveness. I didn’t say, “Boy, Matthew, this sounds like Sun,” but the concept of open and liberating which is what they were trying to do with this long tail DNS and CDN stuff was very compelling and seemed easy. But nothing ever is. But they made it look easy.
Graham-Cumming: Yeah, it never is. One of the parallels that I’ve noticed is that I think early on at Sun, a lot of Sun equipment went to companies that later became big companies. So some of these small firms that were using crazy work stations ended up becoming some of the big names in the Valley. To your point about the long tail, they were being ignored and couldn’t buy from IBM even if they wanted to.
Rothrock: They couldn’t afford SNA and they couldn’t do lots of things. So Sun was an enabler for these companies with cool ideas for products and software to use Sun as the underpinning. workstations were all the rage, because PCs were very limited in those days. Very very limited, they were all Intel based. Sun was 68000-based originally and then it was their own stuff, SPARC. You know in the beginning it was a cheap microprocessor from Motorola.
Graham-Cumming: What was the growth like at Sun? Because it was very fast, right?
Rothrock: Oh yes, it was extraordinarily fast. I think I was employee 130 or something like that. I left Sun in 1986 to go to business school and they gave me a leave of absence. Carol Bartz was my boss at that moment. The company was like at 2000 people just two and a half years later. So it was growing like a weed. I measured my success by how thick the catalyst—that was our catalog name and our program—how thick and how quickly I could add bonafide software developers to our catalog. We published on one sheet of paper front to back. When I first got there, our catalyst catalogue was a sheet of paper, and when I left, it was a book. It was about three-quarters of an inch thick. My group grew from me to 30 people in about a year and a half. It was extraordinary growth. We went public during that time, had a lot of capital and a lot of buzz. That openness, that our competition was all proprietary just like you were citing there, John. IBM and Apollo were all proprietary networks. You could buy a NIC card and stick it into your PC and talk to a Sun. And vise versa. And you couldn’t do that with IBM or Apollo. Do you remember those?
Graham-Cumming: I do because I was talking to John Gage. In my first job out of college, I wrote a TCP/IP stack from scratch, for a manufacturer of network cards. The test of this stack was I had an HP Apollo box and I had a Sun workstation and there was a sort of magical, can I talk to these devices? And can I ping them? And then that was already magical the first ping as it went across the network. And then, can I Telnet to one of these? So you know, getting the networking actually running was sort of the key thing. How important was networking for Sun in the early days? Was it always there?
Rothrock: Yeah, it was there from the beginning, the idea of having a network capability. When I got there it was network; the machine wasn’t standalone at all. We sort of mimicked the mainframe world where we had green screens hooked into a Sun in a department for example. And there was time sharing. But as soon as you got a Sun on your desk, which was rare because we were shipping as many as we could build, it was fantastic. I was sharing information with engineering and we were working back and forth on stuff. But I think it was fundamental: you have a microprocessor, you’ve got a big screen, you’ve got a graphic UI, and you have a network that hooks into the greater universe. In those days, to send an all-Sun email around the world, modems spun up everywhere. The network wasn’t what it is now.
Graham-Cumming: I remember in about 89’, I was at a conference and Whit Diffie was there. I asked him what he was doing. He was in a little computer room. I was trying to typeset something. And he said, “I’m telnetting into a machine which is in San Diego.” It was the first time I’d seen this and I stepped over and he was like, “look at this.” And he’s hitting the keyboard and the keys are getting echoed back. And I thought, oh my goodness, this is incredible. It’s right across the Atlantic and across the country as well.
Rothrock: I think, and this is just me talking having lived the last years and with all the investing and stuff I did, but you know it enabled the Internet to come about, the TCP/IP standard. You may recall that Microsoft tried to modify the TCP/IP stack slightly, and the world rejected it, because it was just too powerful, too pervasive. And then along comes HTTP and all the other protocols that followed. Telnetting, FTPing, all that file transfer stuff, we were doing that left, right, and center back in the 80s. I mean you know Cloudflare just took all this stuff and made it better, easier, and literally lower friction. That was the core investment thesis at the time and it just exploded. Much like when Sun adopted TCP/IP, it just exploded. You were there when it happened. My little company that I’m the CEO of now, we use Cloudflare services. First thing I did when I got there was switched to Cloudflare.
Graham-Cumming: And that was one of the things when I joined, we really wanted people get to a point where if you’re putting something on the web, you just say, well I’m going to put Cloudflare or a thing like Cloudflare just on it. Because it protects it, it makes it faster, etc. And of course now what we’ve done is we’ve given people compute facility. Right now you can write code and run it in our in our machines worldwide which is another whole thing.
Rothrock: And that is “The Network is the Computer”. The other thing that Sun was pitching then was a paperless office. I remember we had posters of paper flying out of a computer window on a Sun workstation and I don’t think we’ve gotten there yet. But certainly, the network is the computer.
Graham-Cumming: It was probably the case that the paperless office was one of those things that was about to happen for quite a long time.
Rothrock: It’s still about to happen if you ask me. I think e-commerce and the sort of the digital transformation has driven it harder than just networking. You know, the fact that we can now sign legal documents over the Internet without paper and things like that. People had to adopt. People have to trust. People have to adopt these standards and accept them. And lo and behold we are because we made it easy, we made it cheap, and we made it trustworthy.
Graham-Cumming: If you dial back through Sun, what was the hardest thing? I’m asking because I’m at a 1,000-person company and it feels hard some days, so I’m curious. What do I need to start worrying about?
Rothrock: Well yeah, at 1,000 people, I think that’s when John came into the company and sort of organized marketing. I would say, holding engineering to schedules; that was hard. That was hard because we were pushing the envelope our graphics was going from black and white to color. The networking stuff the performance of all the chips into the boards and just the performance was a big deal. And I remember, for me personally, I would go to a trade show. I’d go to Boston to the Association of Mechanical Engineers with the team there and would show up at these workstations and of course the engineers want to show off the latest. So I would be bringing with me tapes that we had of the latest operating system. But getting the engineers to be ready for a tradeshow was very hard because they were always experimenting. I don’t believe the word “code freeze” meant much to them, frankly, but we would we would be downloading the software and building a trade show thing that had to run for three days on the latest and greatest and we knew our competitor would be there right across the aisle from us sort of showing their hot stuff. And working with Eric Schmidt in those days, you know, Eric you just got to be done on this date. But trade shows were wonderful. They focused the company’s endpoints if you will. And marketing and sales drove Sun; Scott McNealy’s culture there was big on that. But we had to show. It’s different today than it was then, I don’t know about the Cloudflare competition, but back then, there were a dozen workstation companies and we were fighting for mindshare and market share every day. So you didn’t dare sort of leave your best jewels at home. You brought them with you. I will give John Gage high, high marks. He showed me how to dance through a reboot in case the code crashed and he’s marvelous and I learned how to work that stuff and to survive.
Rothrock: Can I tell you one sort of sales story?
Graham-Cumming: Yes, I’m very interested in hearing the non-technical stories. As an engineer, I can hear engineering stories all the time, but I’m curious what it was like being in sales and marketing in such an engineering heavy company as Sun.
Rothrock: Yeah. Well it was challenging of course. One of the strategies that Sun had in those days was to get anyone who was building their own computer. This was Computer Vision and Data General and all those guys to adopt the Sun as their hardware platform and then they could put on whatever they wanted. So because I was one of the demo gods, my job was to go along with the sales guys when they wanted to try to convince somebody. So one of the companies we went after was Data General (DG) in Massachusetts. And so I worked for weeks on getting this whole demo suite running MDA, EDA, word processing, I had everything. And this was a big, big, big deal. And I mean like hundreds of millions of dollars of revenue. And so I went out a couple of days early and we were going to put up a bunch of Suns and I had a demo room at DG. So all the gear showed up and I got there at like 5:30 in the morning and started downloading everything, downloading software, making it dance. And at about 8:00 a.m. in the morning the CEO of Data General walks in. I didn’t know who he was but it turned out to be Ed de Castro. And he introduces himself and I didn’t know who he was and he said, “What are you doing?” And I explained, “I’m from Sun, I’m getting ready for a big demo. We’ve got a big executive presentation. Mr. McNealy will be here shortly, etc.” And he said, “Well, show me what you’ve got.” So I’m sort of still in the middle of downloading this software and I start making this thing dance. I’ve got these machines talking to each other and showing all kinds of cool stuff. And he left. And the meeting was about 10 or 11 in the morning. And so when the executive team from Sun showed up they said, “Well, how’s it going?” I said, “Well I gave a demo to a guy,” and they asked, “Who’s the guy,” and I said, “It was Ed de Castro.” And they went, “Oh my God, that was the CEO.” Well, we got the deal. I thought Ed had a little tactic there to come in early, see what he could see, maybe get the true skinny on this thing and see what’s real. I carried the day. But anyway, I got a nice little bonus for that. But Vinod and I would drop into Lockheed down in Southern California. They wanted to put Suns on P-3 airplanes and we’d go down there with an engineer and we’d figure out how to make it. Those were just incredible times. You may remember back in the 80s everyone dressed up except on Fridays. It was dress-down Fridays. And one day I dressed down and Carol Bartz, my boss, saw me wearing blue jeans and just an open collared shirt and she said, “Rothrock, you go home and put on a suit! You never know when a customer is going to walk in the front door.” She was quite right. Kodak shows up. Kodak made a big investment in Sun when it was still private. And I gave that demo and then AT&T, and then interestingly Vice President Bush back in the Reagan administration came to Sun to see the manufacturing and I gave the demo to the Vice President with Scott and Andy and Bill and Vinod standing there.
Graham-Cumming: Do you remember what he saw?
Rothrock: It was my standard two minute Sun demo that I can give in my sleep. We were on the manufacturing floor. We picked up a machine and I created a demo for it and my executive team was there. We have a picture of it somewhere, but it was fun. As John Gage would say, he’d say, “Ray, your job is to make the computer dance.” So I did.
Graham-Cumming: And one of the other things I wanted to ask you about is at some point Sun was almost Amazon Web Services, wasn’t it. There was a rent-a-computer service, right?
Rothrock: I don’t know. I don’t remember the rent-a-computer service. I remember we went after the PC business aggressively and went after the data centers which were brand new in those days pretty aggressively, but I don’t remember the rent-a-computer business that much. It wasn’t in my domain.
Graham-Cumming: So what are you up to these days?
Rothrock: I’m still investing. I do a lot of security investing. I did 15 deals while I was at Venrock. Cloudflare was the last one I did, which turned out really well of course. More to come, I hope. And I’m CEO of one of Venrock’s portfolio companies that had a little trouble a few years back but I fixed that and it’s moving up nicely now. But I’ve started thinking about more of a science base. I’m on the board of the Carnegie Institute of Science. I’m on the board of MIT and I just joined the board of the Nuclear Threat Initiative in Washington which is run by Secretary Ernie Moniz, former secretary of energy. So I’m doing stuff like that. John would be pleased with how well that played through. But I’ll tell you it is this these fundamental principles, just tying it all back to Sun and Cloudflare, and this sort of open, cheap, easy, enabling humans to do things without too much friction, that is exciting. I mean, look at your phone. Steve Jobs was the master of design to make this thing as sweet as it is.
Graham-Cumming: Yes, and as addictive.
Rothrock: Absolutely, right. I haven’t been to a presentation from Cloudflare in two years, but every time I see an announcement like the DNS service, I immediately switched all my DNS here at the house to 22.214.171.124. Stuff like that. Because I know it’s good and I know it’s trustworthy, and it’s got that philosophy built in the DNA.
Graham-Cumming: Yes definitely. Taking it back to what we talked about at the beginning, it’s definitely the trustworthiness is something that Cloudflare has cared about from the beginning and continues to care about. We’re sort of the guardians of the traffic that passes through it.
Rothrock: Back when the Internet started happening and when Sun was doing Java, I mean, all those things in the 90s, I was of course at Venrock, but I was still pretty connected to [Edward] Zander and [Scott] McNealy. We were hoping that it would be liberating, that it would create a world which was much more free and open to conversation and we’ve seen the dark side of some of that. But I continue to believe that transparency and openness is a good thing and we should never shut it down. I don’t mean to get it all waxing philosophical here but way more good comes from being open and transparent than bad.
Graham-Cumming: Listen it’s July 4th. It’s evening here in London. We can be waxing philosophical as much as we like. Well listen, thank you for taking the time to chat with me. Are there any other reminiscences of Sun that you think the public needs to know in this oral history of “The Network is the Computer.”
Rothrock: Well you know the only thing I’d say is having landed in the Silicon Valley in 1981 and getting on with Sun, I can say this given my age and longevity here, everything is built on somebody else’s great ideas. And starting with TCP/IP and then we went to this HTML protocol and browsers, it’s just layer on layer on layer on layer and so Cloudflare is just one of the latest to climb on the shoulders of those giants who put it all together. I mean, we don’t even think about the physical network anymore. But it is there and thank goodness companies like Cloudflare keep providing that fundamental service on which we can build interesting, cool, exciting, and mind-changing things. And without a Cloudflare, without Sun, without Apollo, without all those guys back in the day, it would be different. The world would just be so, so different. I did the New York Times crossword puzzle. I could not do it without Google because I have access to information I would not have unless I went to the library. It’s exponential and it just gets better. Thanks to Michelle and Matthew and Lee for starting Cloudflare and allowing Venrock to invest in it.
Graham-Cumming: Well thank you for being an investor. I mean, it helped us get off the ground and get things moving. I very much agree with you about the standing on the shoulders of giants because people don’t appreciate the extent to which so much of this fundamental work that we did was done in the 70s and 80s.
Rothrock: Yea, it’s just like the automobile and the airplane. We reminisce about the history but boy, there were a lot of giants in those industries as well. And computing is just the latest.
Graham-Cumming: Yep, absolutely. Well, Ray, thank you. Have a good afternoon.
Interested in hearing more? Listen to my conversations with John Gage and Greg Papadopoulos of Sun Microsystems:
Optimizely – Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
Cordial – Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
AO.com – AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
Pwned Passwords – Troy Hunt’s popular “Have I Been Pwned” project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
Timely – Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store. Quintype – Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.
I spoke with Greg Papadopoulos, former CTO of Sun Microsystems, to discuss the origins and meaning of The Network is the Computer®, as well as Cloudflare’s role in the evolution of the phrase. During our conversation, we considered the inevitability of latency, the slowness of the speed of light, and the future of Cloudflare’s newly acquired trademark. Listen to our conversation here and read the full transcript below.
John Graham-Cumming: Thank you so much for taking the time to chat with me. I’ve got Greg Papadopoulos who was CTO of Sun and is currently a venture capitalist. Tell us about “The Network is the Computer.”
Greg Papadopoulos: Well, from certainly a Sun perspective, the very first Sun-1 was connected via Internet protocols and at that time there was a big war about what should win from a networking point of view. And there was a dedication there that everything that we made was going to interoperate on the network over open standards, and from day one in the company, it was always that thought. It’s really about the collection of these machines and how they interact with one another, and of course that puts the network in the middle of it. And then it becomes hard to, you know, where’s the line? But it is one of those things that I think even if you ask most people at Sun, you go, “Okay explain to me ‘The Network is the Computer.’” It would get rather meta. People would see that phrase and sort of react to it in their own way. But it would always come back to something similar to what I had said I think in the earlier days.
Graham-Cumming: I remember it very well because it was obviously plastered everywhere in Silicon Valley for a while. And it sounded incredibly cool but I was never quite sure what it meant. It sounded like it was one of those things that was super deep but I couldn’t dig deep enough. But it sort of seems like this whole vision has come true because if you dial back to I think it’s 2006, you wrote a blog post about how the world was only going to need five or seven or some small number of computers. And that was also linked to this as well, wasn’t it?
Papadopoulos: Yeah, I think as things began to evolve into what we would call cloud computing today, but that you could put substantial resources on the other side of the network and from the end user’s perspective and those could be as effective or more effective than something you’d have in front of you. And so this idea that you really could provide these larger scale computing services in early days — you know, grid was the term used before cloud — but if you follow that logic, and you watch what was happening to the improvements of the network. Dave Patterson at Cal was very fond of saying in that era and in the 90s, networks are getting to the place where the desk connected to another machine is transparent to you. I mean it could be your own, in fact, somebody else’s memory may in fact be closer to you than your own disk. And that’s a pretty interesting thought. And so where we ended up going was really a complete realization that these things we would call servers were actually just components of this network computer. And so it was very mysterious, “The Network is the Computer,” and it actually grew into itself in this way. And I’ll say looking at Cloudflare, you see this next level of scale happening. It’s not just, what are those things that you build inside a data center, how do you connect to it, but in fact, it’s the network that is the computer that is the network.
Graham-Cumming: It’s interesting though that there have been these waves of centralization and then push the computing power to the edge and the PCs at some point and then Larry Ellison came along and he was going to have this networked computer thing, and it sort of seems to swing back and forth, so where do you think we are in this swinging?
Papadopoulos: You know, I don’t think so much swinging. I think it’s a spiral upwards and we come to a place and we look down and it looks familiar. You know, where you’ll say, oh I see, here’s a 3270 connected to a mainframe. Well, that looks like a browser connected to a web server. And you know, here’s the device, it’s connected to the web service. And they look similar but there are some very important differences as we’re traversing this helix of sorts. And if you look back, for example the 3270, it was inextricably bound to a single server that was hosted. And now our devices have really the ability to connect to any other computer on the network. And so then I think we’re seeing something that looks like a pendulum there, it’s really a refactoring question on what software belongs where and how hard is it to maintain where it is, and naturally I think that the Internet protocol clearly is a peer to peer protocol, so it doesn’t take sides on this. And so that we end up in one state, with more on the client or less on the client. I think it really has to do with how well we’ve figured out distributed computing and how well we can deliver code in a management-free way. And that’s a longer conversation.
Graham-Cumming: Well, it’s an interesting conversation. One thing is what you talked about with Sun Grid which then we end up with Amazon Web Services and things like that, is that there was sort of the device, be it your handheld or your laptop talking to some cloud computing, and then what Cloudflare has done with this Workers product to say, well, actually I think there’s three places where code could exist. There’s something you can put inside the network.
Papadopoulos: Yes. And by extension that could grow to another layer too. And it goes back to, I think it’s Dave Clark who I first remember saying you can get all the bandwidth you want, that’s money, but you can’t reduce latency. That’s God, right. And so I think there are certainly things and as I see the Workers architecture, there are two things going on. There’s clearly something to be said about latency there, and having distributed points of presence and getting closer to the clients. And there’s IBM with interaction there too, but it is also something that is around management of software and how we should be thinking in delivery of applications, which ultimately I believe, in the limit, become more distributed-looking than they are now. It’s just that it’s really hard to write distributed applications in kind of the general way we think about it.
Graham-Cumming: Yes, that’s one of these things isn’t it, it is exceedingly hard to actually write these things which is why I think we’re going through a bit of a transition right now where people are trying to figure out where that code should actually execute and what should execute where.
Papadopoulos: Yeah. You had graciously pointed out this blog from a dozen years ago on, hey this is inevitable that we’re going to have this concentration of computing, for a lot of economic reasons as anything else. But it’s both a hammer and a nail. You know, cloud stuff in some ways is unnatural in that why should we expect computing to get concentrated like it is. If you really look into it more deeply, I think it has to do with management and control and capital cycles and really things that are kind of on the economic and the administrative side of things, are not about what’s truth and beauty and the destination for where applications should be.
Graham-Cumming: And I think you also see some companies are now starting to wrestle with the economics of the cloud where they realize that they are kind of locked into their cloud provider and are paying rent kind of thing; it becomes entirely economic at that point.
Papadopoulos: Well it does, and you know, this was also something I was pretty vocal about, although I got misinterpreted for a while there as being, you know, anti-cloud or something which I’m not, I think I’m pragmatic about it. One of the dangers is certainly as people yield particularly to SaaS products, that in fact, your data in many ways, unless you have explicit contracts and abilities to disgorge that data from that service, that data becomes more and more captive. And that’s the part that I think is actually the real question here, which is like, what’s the switching cost from one service to another, from one cloud to another.
Graham-Cumming: Yes, absolutely. That’s one of the things that we faced, one of the reasons why we worked on this thing called the Bandwidth Alliance, which is one of the ways in which stuff gets locked into clouds is the egress fee is so large that you don’t want to get your data out.
Papadopoulos: Exactly. And then there is always the, you know, well we have these particular features in our particular cloud that are very seductive to developers and you write to them and it’s kind of hard to undo, you know, just the physics of moving things around. So what you all have been doing there is I think necessary and quite progressive. But we can do more.
Graham-Cumming: Yes definitely. Just to go back to the thought about latency and bandwidth, I have a jokey pair of slides where I show the average broadband network you can buy over time and it going up, and then the change in the speed of light over the same period, which of course is entirely flat, zero progress in the speed of light. Looking back through your biography, you wrote thinking machines and I assume that fighting latency at a much shorter distance of cabling must have been interesting in those machines because of the speeds at which they were operating.
Papadopoulos: Yes, it surprises most people when you say it, but you know, computer architects complain that the speed of light is really slow. And you know, Grace Hopper who is really one of the founders, the pioneers of modern programming languages and COBOL. I think she was a vice admiral. And she would walk around with a wire that was a foot long and say, “this is a nanosecond”. And that seemed pretty short for a while but, you know a nanosecond is an eternity these days.
Graham-Cumming: Yes, it’s an eternity. People don’t quite appreciate it if they’re not thinking about it, how long it is. I had someone who was new to the computing world learning about it, come to me with a book which was talking about fiber optics, and in the book it said there is a laser that flashes on and off a billion times a second to send data down the fiber optic. And he came to me and said, “This can’t possibly be true; it’s just too fast.”
Papadopoulos: No, it’s too slow!
Graham-Cumming: Right? And I thought, well that’s slow. And then I stepped back and thought, you know, to the average person, that is a ridiculous statement, that somehow we humans have managed to control time at this ridiculously small level. And then we keep pushing and pushing and pushing it and people don’t appreciate how fast and actually how slow the light is, really.
Papadopoulos: Yeah. And I think if it actually comes down to it, if you want to get into a very pure reckoning of this is latency is the only thing that matters. And one can look at bandwidth as a component of latency, so you can see bandwidth as a serialization delay and that kind of goes back to Clark thing, you know that, yeah I can buy that, I can’t bribe God on the other side so you know I’m fundamentally left with this problem that we have. Thank you, Albert Einstein, right? It’s kind of hopeless to think about sending information faster than that.
Graham-Cumming: Yeah exactly. There’s information limits, which is driving why we have such powerful phones, because in fact the latency to the human is very low if you have it in your hand.
Papadopoulos: Yes, absolutely. This is where the edge architecture and the Worker structure that you guys are working on, and I think that’s where it becomes really interesting too because it gives me — you talked about earlier, well we’re now introducing this new tier — but it gives me a really closer place from a latency point of view to have some intimate relationship with a device, and at the same time be well-connected to the network.
Graham-Cumming: Right. And I think the other thing that is interesting about that is that your device fundamentally is an insecure thing, so you know if you put code on that thing, you can’t put secrets in it, like a cryptographic secrets, because the end user has access to them. Normally you would keep that in the server somewhere, but then the other funny thing is if you have this intermediary tier which is both secure and low latency to the end user, you suddenly have a different world in which you can put secrets, you can put code that is privileged, but it can interact with the user very very rapidly because the low latency.
Papadopoulos: Yeah. And that essence of where’s my trust domain. Now I’ve seen all kinds of like, oh my gosh, I cannot believe somebody is doing it, like putting their S3 credentials, putting it down on a device and having it talk, you know, the log in for a database or something. You must be kidding. I mean that trust proxy point at low latency is a really key thing.
Graham-Cumming: Yes, I think it’s just people need to start thinking about that architecture. Is there a sort of parallel with things that were going on with very high-performance computing with sort of the massively parallel stuff and what’s happening today? What lessons can we take from work done in the 70s and 80s and apply it to the Internet of today?
Papadopoulos: Well, we talked about this sort of, there are a couple of fundamental issues here. And one we’ve been speaking about is latency. The other one is synchronization, and this comes up in a bunch of different ways. You know, whether it’s when one looks at the cap theorem kinds of things that Eric Brewer has been famous for, can I get consistency and availability and survive partitionability, all that, at the same time. And so you end up in this kind of place of—goes back to maybe Einstein a bit—but you know, knowing when things have happened and when state has been actually changed or committed is a pretty profound problem.
Graham-Cumming: It is, and what order things have happened.
Papadopoulos: Yes. And that order is going to be relative to an observer here as well. And so if you’re insisting on some total ordering then you’re insisting on slowing things down as well. And that really is fundamental. We were pushing into that in the massively parallel stuff and you’ll see that at Internet scale. You know there’s another thing, if I could. This is one of my greatest “aha”s about networks and it’s due to a fellow at Sun, Rob Gingell, who actually ended up being chief engineer at Sun and was one of the real pioneers of the software development framework that brought Solaris forward. But Rob would talk about this thing that I label as network entropy. It’s basically what happens when you connect systems to networks, what do networks kind of do to those systems? And this is a little bit of a philosophical question; it’s not a physical one. And Rob observed that over time networks have this property of wanting to decompose things into constituent parts, have those parts get specialized and then reintegrated. And so let me make that less abstract. So in the early days of connecting systems to networks, one of the natural observations were, well why don’t we take the storage out of those desktop systems or server systems and put them on the other side of at least a local network and into a file server or storage server. And so you could see that computer sort of get pulled apart between its computing and its storage piece. And then that storage piece, you know in Rob’s step, that would go on and get specialized. So we had whole companies start like Network Appliances, Pure Storage, EMC. And so, you know like big pieces of industry or look the original routers were RADb you know running on workstations and you know Cisco went and took that and made that into something and so you now see this effect happen at the next scale. One of the things that really got me excited when I first saw Cloudflare a decade ago was, wow okay in those early days, well we can take a component like a network firewall and that can get pulled away and created as its own network entity and specialized. And I think one of the things, at least from my history of Cloudflare, one of the most profound things was, particularly as you guys went in and separated off these functions early on, the fear of people was this is going to introduce latency, and in fact things got faster. Figure that.
Graham-Cumming: Part of that of course is caching and then there’s dealing with the speed of light by being close to people. But also if you say your company makes things faster and you do all these different things including security, you are forced to optimize the whole thing to live up to the claim. Whereas if you try and chain things together, nobody’s really responsible for that overall latency budget. It becomes natural that you have to do it.
Papadopoulos: Yes. And you all have done it brilliantly, you know, to sort of Gingell’s view. Okay so this piece got decomposed and now specialized, meaning optimized like heck, because that’s what you do. And so you can see that over and over again and you see it in terms of even Twilio or something. You know, here’s a messaging service. I’m just pulling my applications apart, letting people specialize. But the final piece, and this is really the punchline. The final piece is, Rob will talk about it, the value is in the reintegration of it. And so you know what are those unifying forces that are creating, if you will, the operating system for “The Network is the Computer.” You were asking about the massively parallel scale. Well, we had an operating system we wrote for this. As you get up to the higher scale, you get into these more distributed circumstances where the complexity goes up by some important number of orders of magnitude, and now what’s that reintegration? And so I come back and I look at what Cloudflare is doing here. You’re entering into that phase now of actually being that re-integrator, almost that operating system for the computer that is the network.
Graham-Cumming: I think that’s right. We often talk about actually being an operating system on the Internet, so very similar kind of thoughts.
Papadopoulos: Yes. And you know as we were talking earlier about how developers make sense of this pendulum or cycle or whatever it is. Having this idea of an operating system or of a place where I can have ground truths and trust and sort of fixed points in this are terribly important.
Graham-Cumming: Absolutely. So do you have any final thoughts on, what, it must be 30 years on from when “The Network is the Computer” was a Sun trademark. Now it’s a Cloudflare trademark. What’s the future going to look of that slogan and who’s going to trademark it in 30 years time now?
Papadopoulos: Well, it could be interplanetary at that point.
Graham-Cumming: Well, if you talk about the latency problems of going interplanetary, we definitely have to solve the latency.
Papadopoulos: Yeah. People do understand that. They go, wow it’s like seven minutes within here and Mars, hitting close approach.
Graham-Cumming: The earthly equivalent of that is New Zealand. If you speak to people from New Zealand and they come on holiday to Europe or they move to the US, they suddenly say that the Internet works so much better here. And it’s just that it’s closer. Now the Australians have figured this out because Australia is actually drifting northwards so they’re actually going to get within. That’s going to fix it for them but New Zealand is stuck.
Papadopoulos: I do ask my physicist friends for one of two things. You know, either give me a faster speed of light — so far they have not delivered — or another dimension I can cut through. Maybe we’ll keep working on the latter.
Graham-Cumming: All right. Well listen Greg, thank you for the conversation. Thank you for thinking about this stuff many many years ago. I think we’re getting there slowly on some of this work. And yeah, good talking to you.
Papadopoulos: Well, you too. And thank you for carrying the torch forward. I think everyone from Sun who listens to this, and John, and everybody should feel really proud about what part they played in the evolution of this great invention.
Graham-Cumming: It’s certainly the case that a tremendous amount of work was done at Sun that was really fundamental and, you know, perhaps some of that was ahead of its time but here we are.
Papadopoulos: Thank you.
Graham-Cumming: Thank you very much.
Interested in hearing more? Listen to my conversations with John Gage and Ray Rothrock of Sun Microsystems:
We recently registered the trademark for The Network is the Computer®, to encompass how Cloudflare is utilizing its network to pave the way for the future of the Internet.
The phrase was first coined in 1984 by John Gage, the 21st employee of Sun Microsystems, where he was credited with building Sun’s vision around “The Network is the Computer.” When Sun was acquired in 2010, the trademark was not renewed, but the vision remained.
Take it from him:
“When we built Sun Microsystems, every computer we made had the network at its core. But we could only imagine, over thirty years ago, today’s billions of networked devices, from the smallest camera or light bulb to the largest supercomputer, sharing their packets across Cloudflare’s distributed global network.
We based our vision of an interconnected world on open and shared standards. Cloudflare extends this dedication to new levels by openly sharing designs for security and resilience in the post-quantum computer world.
Most importantly, Cloudflare is committed to immediate, open, transparent accountability for network performance. I’m a dedicated reader of their technical blog, as the network becomes central to our security infrastructure and the global economy, demanding even more powerful technical innovation.”
Cloudflare’s massive network, which spans more than 180 cities in 80 countries, enables the company to deliver its suite of security, performance, and reliability products, including its serverless edge computing offerings.
In March of 2018, we launched our serverless solution Cloudflare Workers, to allow anyone to deploy code at the edge of our network. We also recently announced advancements to Cloudflare Workers in June of 2019 to give application developers the ability to do away with cloud regions, VMs, servers, containers, load balancers—all they need to do is write the code, and we do the rest. With each of Cloudflare’s data centers acting as a highly scalable application origin to which users are automatically routed via our Anycast network, code is run within milliseconds of users worldwide.
In honor of registering Sun’s former trademark, I spoke with John Gage, Greg Papadopoulos, former CTO of Sun Microsystems, and Ray Rothrock, former Director of CAD/CAM Marketing at Sun Microsystems, to learn more about the history of the phrase and what it means for the future:
Today we are introducing M5 instances equipped with local NVMe storage. Available for immediate use in 5 regions, these instances are a great fit for workloads that require a balance of compute and memory resources. Here are the specs:
1 x 75 GB NVMe SSD
Up to 2.120 Gbps
Up to 10 Gbps
1 x 150 GB NVMe SSD
Up to 2.120 Gbps
Up to 10 Gbps
1 x 300 GB NVMe SSD
Up to 2.120 Gbps
Up to 10 Gbps
1 x 600 GB NVMe SSD
Up to 10 Gbps
2 x 900 GB NVMe SSD
4 x 900 GB NVMe SSD
The M5d instances are powered by Custom Intel® Xeon® Platinum 8175M series processors running at 2.5 GHz, including support for AVX-512.
You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe; this includes the latest Amazon Linux, Microsoft Windows (Server 2008 R2, Server 2012, Server 2012 R2 and Server 2016), Ubuntu, RHEL, SUSE, and CentOS AMIs.
Here are a couple of things to keep in mind about the local NVMe storage on the M5d instances:
Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.
Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.
Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.
Available Now M5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent M5 instances.
Join us this month to learn about AWS services and solutions. New this month, we have a fireside chat with the GM of Amazon WorkSpaces and our 2nd episode of the “How to re:Invent” series. We’ll also cover best practices, deep dives, use cases and more! Join us and register today!
Containers June 25, 2018 | 09:00 AM – 09:45 AM PT – Running Kubernetes on AWS – Learn about the basics of running Kubernetes on AWS including how setup masters, networking, security, and add auto-scaling to your cluster.
June 28, 2018 | 01:00 PM – 01:45 PM PT – Fireside Chat: End User Collaboration on AWS – Learn how End User Compute services can help you deliver access to desktops and applications anywhere, anytime, using any device. IoT
Knowing what not to do is the first and very important step. But how do we store production credentials. Database credentials, system secrets (e.g. for HMACs), access keys for 3rd party services like payment providers or social networks. There doesn’t seem to be an agreed upon solution.
I’ve previously argued with the 12-factor app recommendation to use environment variables – if you have a few that might be okay, but when the number of variables grow (as in any real application), it becomes impractical. And you can set environment variables via a bash script, but you’d have to store it somewhere. And in fact, even separate environment variables should be stored somewhere.
This somewhere could be a local directory (risky), a shared storage, e.g. FTP or S3 bucket with limited access, or a separate git repository. I think I prefer the git repository as it allows versioning (Note: S3 also does, but is provider-specific). So you can store all your environment-specific properties files with all their credentials and environment-specific configurations in a git repo with limited access (only Ops people). And that’s not bad, as long as it’s not the same repo as the source code.
Since many companies are using GitHub or BitBucket for their repositories, storing production credentials on a public provider may still be risky. That’s why it’s a good idea to encrypt the files in the repository. A good way to do it is via git-crypt. It is “transparent” encryption because it supports diff and encryption and decryption on the fly. Once you set it up, you continue working with the repo as if it’s not encrypted. There’s even a fork that works on Windows.
You simply run git-crypt init (after you’ve put the git-crypt binary on your OS Path), which generates a key. Then you specify your .gitattributes, e.g. like that:
And you’re done. Well, almost. If this is a fresh repo, everything is good. If it is an existing repo, you’d have to clean up your history which contains the unencrypted files. Following these steps will get you there, with one addition – before calling git commit, you should call git-crypt status -f so that the existing files are actually encrypted.
You’re almost done. We should somehow share and backup the keys. For the sharing part, it’s not a big issue to have a team of 2-3 Ops people share the same key, but you could also use the GPG option of git-crypt (as documented in the README). What’s left is to backup your secret key (that’s generated in the .git/git-crypt directory). You can store it (password-protected) in some other storage, be it a company shared folder, Dropbox/Google Drive, or even your email. Just make sure your computer is not the only place where it’s present and that it’s protected. I don’t think key rotation is necessary, but you can devise some rotation procedure.
git-crypt authors claim to shine when it comes to encrypting just a few files in an otherwise public repo. And recommend looking at git-remote-gcrypt. But as often there are non-sensitive parts of environment-specific configurations, you may not want to encrypt everything. And I think it’s perfectly fine to use git-crypt even in a separate repo scenario. And even though encryption is an okay approach to protect credentials in your source code repo, it’s still not necessarily a good idea to have the environment configurations in the same repo. Especially given that different people/teams manage these credentials. Even in small companies, maybe not all members have production access.
The outstanding questions in this case is – how do you sync the properties with code changes. Sometimes the code adds new properties that should be reflected in the environment configurations. There are two scenarios here – first, properties that could vary across environments, but can have default values (e.g. scheduled job periods), and second, properties that require explicit configuration (e.g. database credentials). The former can have the default values bundled in the code repo and therefore in the release artifact, allowing external files to override them. The latter should be announced to the people who do the deployment so that they can set the proper values.
The whole process of having versioned environment-speific configurations is actually quite simple and logical, even with the encryption added to the picture. And I think it’s a good security practice we should try to follow.
The Mozilla blog has an article describing the addition of DNS over HTTPS (DoH) as an optional feature in the Firefox browser. “DoH support has been added to Firefox 62 to improve the way Firefox interacts with DNS. DoH uses encrypted networking to obtain DNS information from a server that is configured within Firefox. This means that DNS requests sent to the DoH cloud server are encrypted while old style DNS requests are not protected.” The configured server is hosted by Cloudflare, which has posted this privacy agreement about the service.
The German charity Save Nemo works to protect coral reefs, and they are developing Nemo-Pi, an underwater “weather station” that monitors ocean conditions. Right now, you can vote for Save Nemo in the Google.org Impact Challenge.
The organisation says there are two major threats to coral reefs: divers, and climate change. To make diving saver for reefs, Save Nemo installs buoy anchor points where diving tour boats can anchor without damaging corals in the process.
In addition, they provide dos and don’ts for how to behave on a reef dive.
To monitor the effects of climate change, and to help divers decide whether conditions are right at a reef while they’re still on shore, Save Nemo is also in the process of perfecting Nemo-Pi.
This Raspberry Pi-powered device is made up of a buoy, a solar panel, a GPS device, a Pi, and an array of sensors. Nemo-Pi measures water conditions such as current, visibility, temperature, carbon dioxide and nitrogen oxide concentrations, and pH. It also uploads its readings live to a public webserver.
The Save Nemo team is currently doing long-term tests of Nemo-Pi off the coast of Thailand and Indonesia. They are also working on improving the device’s power consumption and durability, and testing prototypes with the Raspberry Pi Zero W.
The web dashboard showing live Nemo-Pi data
Save Nemo aims to install a network of Nemo-Pis at shallow reefs (up to 60 metres deep) in South East Asia. Then diving tour companies can check the live data online and decide day-to-day whether tours are feasible. This will lower the impact of humans on reefs and help the local flora and fauna survive.
A healthy coral reef
Nemo-Pi data may also be useful for groups lobbying for reef conservation, and for scientists and activists who want to shine a spotlight on the awful effects of climate change on sea life, such as coral bleaching caused by rising water temperatures.
A bleached coral reef
Vote now for Save Nemo
If you want to help Save Nemo in their mission today, vote for them to win the Google.org Impact Challenge:
Click “Abstimmen” in the footer of the page to vote
Click “JA” in the footer to confirm
Voting is open until 6 June. You can also follow Save Nemo on Facebook or Twitter. We think this organisation is doing valuable work, and that their projects could be expanded to reefs across the globe. It’s fantastic to see the Raspberry Pi being used to help protect ocean life.
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.