Tag Archives: Network

Cloudflare partners with Microsoft to protect joint customers with a Global Zero Trust Network

Post Syndicated from Abhi Das original https://blog.cloudflare.com/cloudflare-partners-with-microsoft-to-protect-joint-customers-with-global-zero-trust-network/

As a company, we are constantly asking ourselves what we can do to provide more value to our customers, including integrated solutions with our partners. Joint customers benefit from our integrations with Azure Active Directory in three ways:

First, centralized identity and access management via Azure Active Directory, which provides single sign-on, multi-factor authentication, and conditional access.

Second, policy-based access to specific applications using Cloudflare Access, a VPN replacement service.

Third, an additional layer of security for internal applications, which connect to Cloudflare's global network without being opened up to the whole Internet.

Let’s step back a bit.

Why Zero Trust?

Companies of all sizes are faced with an accelerating digital transformation of their IT stack and an increasingly distributed workforce, changing the definition of the security perimeter. We are moving away from the castle-and-moat model to a perimeter that spans the whole Internet, requiring security checks for every user accessing every resource. As a result, all companies, especially those whose use of Azure's broad cloud portfolio is increasing, are adopting Zero Trust architectures as an essential part of their cloud and SaaS journey.

Cloudflare Access provides secure access to Azure-hosted applications and on-premises applications. It also acts as an on-ramp to Azure and the rest of the Internet over the world's fastest network. Users connect from their devices or offices via Cloudflare's network in over 250 cities around the world. You can use Cloudflare Zero Trust on that global network to ensure that every request to every resource is evaluated for security, including user identity. We are excited to bring this secure global network on-ramp to Azure-hosted applications and on-premises applications.

Also, performance is one of our key advantages in addition to security. Cloudflare serves over 32 million HTTP requests per second, on average, for the millions of customers who secure their applications on our network. When those applications do not run on our network, we can rely on our own global private backbone and our connectivity with over 10,000 networks globally to connect the user.

We are excited to bring this global security and performance perimeter, as our Cloudflare Zero Trust product, to your Azure-hosted and on-premises applications.

Cloudflare Access: a modern Zero Trust approach

Cloudflare’s Zero Trust solution Cloudflare Access provides a modern approach to authentication for internally managed applications. When corporate applications on Azure or on-premise are protected with Cloudflare Access, they feel like SaaS applications, and employees can log in to them with a simple and consistent flow. Cloudflare Access acts as a unified reverse proxy to enforce access control by making sure every request is authenticated, authorized, and encrypted.

Identity: Cloudflare Access integrates out of the box with all the major identity providers, including Azure Active Directory, allowing use of the policies and users you already created to provide conditional access to your web applications. For example, you can use Cloudflare Access to ensure that only company employees and no contractors can get to your internal kanban board, or you can lock down the SAP finance application hosted on Azure or on-premise.

Devices: You can use TLS with Client Authentication and limit connections only to devices with a unique client certificate. Cloudflare will ensure the connecting device has a valid client certificate signed by the corporate CA, and authenticate user credentials to grant access to an internal application.

Additional security: Want to use Cloudflare Access in front of an internal application but don’t want to open up that application to the whole Internet? For additional security, you can combine Access with Cloudflare Tunnel. Cloudflare Tunnel will connect from your Azure environment directly to Cloudflare’s network, so there is no publicly accessible IP.

Secure both your legacy and Azure-hosted applications jointly with Azure AD and Cloudflare Access

For on-premises legacy applications, we are excited to announce that Cloudflare is an Azure Active Directory secure hybrid access partner. Azure AD secure hybrid access enables customers to centrally manage access to their legacy on-premises applications using SSO authentication, without incremental development. Starting today, joint customers can easily use the Cloudflare Access solution, with its built-in performance, as an additional layer of security in front of their legacy applications.

Traditionally, for on-premises applications, customers have had to change their existing code or add additional layers of code to integrate Azure AD or Cloudflare Access-like capabilities. With Azure Active Directory secure hybrid access, customers can integrate these capabilities seamlessly without significant code changes. Once integrated, customers can take advantage of the following Azure AD features and more:

  1. Multi-factor authentication (MFA)
  2. Single sign-on (SSO)
  3. Passwordless authentication
  4. Unified user access management
  5. Azure AD Conditional Access and device trust

Similarly, the Azure AD and Cloudflare Access combination can also be used to secure your Azure-hosted applications. Cloudflare Access enables a secure on-ramp to Azure-hosted or on-premises applications via the two integrations below:

1. Cloudflare Access integration with Azure AD:

Cloudflare Access is a Zero Trust Network Access (ZTNA) solution that allows you to configure precise access policies across your applications. You can integrate Microsoft Azure Active Directory with Cloudflare Zero Trust and build rules based on user identity, group membership, and Azure AD Conditional Access policies. Users authenticate with their Azure AD credentials and connect to Cloudflare Access. Additional policy controls include Device Posture, Network, Location, and more. Setup typically takes less than a few hours!

2. Cloudflare Tunnel integration with Azure:

Cloudflare Tunnel can expose applications running on the Microsoft Azure platform; see our guide to install and configure Cloudflare Tunnel. To simplify the process of connecting Azure applications to Cloudflare's network, a prebuilt Cloudflare Linux image is now available on the Azure Marketplace: deploy the prebuilt image to an Azure resource group to get started.

“The hybrid work environment has accelerated the cloud transition and increased the need for CIOs everywhere to provide secure and performant access to applications for their employees. This is especially true for self-hosted applications. Cloudflare’s global network security perimeter via Cloudflare Access provides this additional layer of security together with Azure Active Directory to enable employees to get work done from anywhere securely and performantly,” said David Gregory, Principal Program Manager, Microsoft Azure Active Directory.

Conclusion

Over the last ten years, Cloudflare has built one of the fastest, most reliable, and most secure networks in the world. You now have the ability to use that network as a global security and performance perimeter for your Azure-hosted applications via the integrations above, and it is easy.

What’s next?

In the coming months, we will be further strengthening our integrations with Microsoft Azure allowing customers to better implement our SASE perimeter.

If you’re using Cloudflare Zero Trust products today and are interested in using this integration with Azure, please visit our above developer documentation to learn about how you can enable it. If you want to learn more or have additional questions, please fill out the form or get in touch with your Cloudflare CSM or AE, and we’ll be happy to help you.

Interconnect Anywhere — Reach Cloudflare’s network from 1,600+ locations

Post Syndicated from Matt Lewis original https://blog.cloudflare.com/interconnect-anywhere/

Customers choose Cloudflare for our network performance, privacy, and security. Cloudflare Network Interconnect is the best on-ramp for our customers to utilize our diverse product suite. In the past, we’ve talked about Cloudflare’s physical footprint in more than 200 data centers, and how Cloudflare Network Interconnect enabled companies in those data centers to connect securely to Cloudflare’s network. Today, Cloudflare is excited to announce expanded partnerships that allow customers to connect to Cloudflare from their own Layer 2 service fabric. There are now over 1,600 locations where enterprise security and network professionals have the option to connect to Cloudflare securely and privately from their existing fabric.

Interconnect Anywhere is a journey

Since we launched Cloudflare Network Interconnect (CNI) in August 2020, we’ve been focused on extending the availability of Cloudflare’s network to as many places as possible. The initial launch opened up 150 physical locations alongside 25 global partner locations. During Security Week this year, we grew that availability by adding data center partners to our CNI Partner Program. Today, we are adding even more connectivity options by expanding Cloudflare availability to all of our partners’ locations, as well as welcoming CoreSite Open Cloud Exchange (OCX) and Infiny by Epsilon into our CNI Partner Program. This totals 1,638 locations where our customers can now connect securely to the Cloudflare network. As we continue to expand, customers are able to connect the fabric of their choice to Cloudflare from a growing list of data centers.

Fabric Partner                  Enabled Locations
PacketFabric                    180+
Megaport                        700+
Equinix Fabric                  46+
Console Connect                 440+
CoreSite Open Cloud Exchange    22+
Infiny by Epsilon               250+

“We are excited to expand our partnership with Cloudflare to ensure that our mutual customers can benefit from our carrier-class Software Defined Network (SDN) and Cloudflare’s network security in all Packetfabric locations. Now customers can easily connect from wherever they are located to access best of breed security services alongside Packetfabric’s Cloud Connectivity options.”
Alex Henthorn-Iwane, PacketFabric’s Chief Marketing Officer

“With the significant rise in DDoS attacks over the past year, it’s becoming ever more crucial for IT and Operations teams to prevent and mitigate network security threats. We’re thrilled to enable Cloudflare Interconnect everywhere on Megaport’s global Software Defined Network, which is available in over 700 enabled locations in 24 countries worldwide. Our partnership will give organizations the ability to reduce their exposure to network attacks, improve customer experiences, and simplify their connectivity across our private on-demand network in a matter of a few mouse clicks.”
Misha Cetrone, Megaport VP of Strategic Partnerships

“Expanding the connectivity options to Cloudflare ensures that customers can provision hybrid architectures faster and more easily, leveraging enterprise-class network services and automation on the Open Cloud Exchange. Simplifying the process of establishing modern networks helps achieve critical business objectives, including reducing total cost of ownership, and improving business agility as well as interoperability.”
Brian Eichman, CoreSite Senior Director of Product Development

“Partner accessibility is key in our cloud enablement and interconnection strategy. We are continuously evolving to offer our customers and partners simple and secure connectivity no matter where their network resides in the world. Infiny enables access to Cloudflare from across a global footprint while delivering high-quality cloud connectivity solutions at scale. Customers and partners gain an innovative Network-as-a-Service SDN platform that supports them with programmable and automated connectivity for their cloud and networking needs.”
Mark Daley, Epsilon Director of Digital Strategy

Uncompromising security and increased reliability from your choice of network fabric

Now, companies can connect to Cloudflare’s suite of network and security products without traversing shared public networks by taking advantage of software-defined networking providers. No matter where a customer is connected to one of our fabric partners, Cloudflare’s 200+ data centers ensure that world-class network security is close by and readily available via a secure, low latency, and cost-effective connection. An increased number of locations further allows customers to have multiple secure connections to the Cloudflare network, increasing redundancy and reliability. As we further expand our global network and increase the number of data centers where Cloudflare and our partners are connected, latency becomes shorter and customers will reap the benefits.

Let’s talk about how a customer can use Cloudflare Network Interconnect to improve their security posture through a fabric provider.

Plug and Play Fabric connectivity

Acme Corp is an example company that wants to deliver highly performant digital services to their customers and ensure employees can securely connect to their business apps from anywhere. They’ve purchased Magic Transit and Cloudflare Access and are evaluating Magic WAN to secure their network while getting the performance Cloudflare provides. They want to avoid potential network traffic congestion and latency delays, so they have designed a network architecture with their software-defined network fabric and Cloudflare using Cloudflare Network Interconnect.

With Cloudflare Network Interconnect, provisioning this connection is simple. Acme goes to their partner portal, and requests a virtual layer 2 connection to Cloudflare with the bandwidth of their choice. Cloudflare accepts the connection, provides the BGP session establishment information and organizes a turn up call if required. Easy!

Let’s talk about how Cloudflare and our partners have worked together to simplify the interconnectivity experience for the customer.

With Cloudflare Network Interconnect, availability only gets better

[Diagram: Connection established with Cloudflare Network Interconnect]

When a customer uses CNI to establish a connection between a fabric partner and Cloudflare, that connection runs over layer 2, configured via the partner user interface. Our new partnership model allows customers to connect privately and securely to Cloudflare’s network even when the customer is not located in the same data center as Cloudflare.

[Diagram: Shorter customer path with new partner locations]

The diagram above shows a shorter customer path thanks to an incremental partner-enabled location. Every time Cloudflare brings a new data center online and connects with a partner fabric, all customers in that region immediately benefit from the closer proximity and reduced latency.

Network fabrics in action

For those who want to self-serve, we’ve published documentation that details the steps for provisioning these connections with each partner.

As we expand our network, it’s critical we provide more ways to allow our customers to connect easily. We will continue to shorten the time it takes to set up a new interconnection, drive down costs, strengthen security and improve customer experience with all of our Network On-Ramp partners.

If you are using one of our software-defined networking partners and would like to connect to Cloudflare via their fabric, contact your fabric partner account team or reach out to us using the Cloudflare Network Interconnect page. If you are not using a fabric today, but would like to take advantage of software-defined networking to connect to Cloudflare, reach out to your account team.

Conntrack turns a blind eye to dropped SYNs

Post Syndicated from Jakub Sitnicki original https://blog.cloudflare.com/conntrack-turns-a-blind-eye-to-dropped-syns/

Intro

We have been working with conntrack, the connection tracking layer in the Linux kernel, for years. And yet, despite the collected know-how, questions about its inner workings occasionally come up. When they do, it is hard to resist the temptation to go digging for answers.

One such question popped up while writing the previous blog post on conntrack:

“Why are there no entries in the conntrack table for SYN packets dropped by the firewall?”

Ready for a deep dive into the network stack? Let’s find out.

[Image by chulmin park from Pixabay]

We already know from last time that conntrack is in charge of tracking incoming and outgoing network traffic. By running conntrack -L we can inspect existing network flows, or as conntrack calls them, connections.

So if we spin up a toy VM, connect to it over SSH, and inspect the contents of the conntrack table, we will see…

$ vagrant init fedora/33-cloud-base
$ vagrant up
…
$ vagrant ssh
Last login: Sun Jan 31 15:08:02 2021 from 192.168.122.1
[vagrant@ct-vm ~]$ sudo conntrack -L
conntrack v1.4.5 (conntrack-tools): 0 flow entries have been shown.

… nothing!

Even though the conntrack kernel module is loaded:

[vagrant@ct-vm ~]$ lsmod | grep '^nf_conntrack\b'
nf_conntrack          163840  1 nf_conntrack_netlink

Hold on a minute. Why is the SSH connection to the VM not listed in conntrack entries? SSH is working. With each keystroke we are sending packets to the VM. But conntrack doesn’t register it.

Isn’t conntrack an integral part of the network stack that sees every packet passing through it? 🤔

[Diagram based on an image by Jan Engelhardt, CC BY-SA 3.0]

Clearly everything we learned about conntrack last time is not the whole story.

Calling into conntrack

Our little experiment with SSH’ing into a VM begs the question — how does conntrack actually get notified about network packets passing through the stack?

We can walk the receive path step by step and we won’t find any direct calls into the conntrack code in either the IPv4 or IPv6 stack. Conntrack does not interface with the network stack directly.

Instead, it relies on the Netfilter framework, and its set of hooks baked into the stack:

int ip_rcv(struct sk_buff *skb, struct net_device *dev, …)
{
    …
    return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
               net, NULL, skb, dev, NULL,
               ip_rcv_finish);
}

Netfilter users, like conntrack, can register callbacks with it. Netfilter will then run all registered callbacks when its hook processes a network packet.
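
Netfilter's registration API is worth a quick illustration. Below is a minimal sketch of how an out-of-tree module might hang its own callback off the same PRE_ROUTING hook that conntrack uses; this is not conntrack's actual registration code, the names my_hook_fn and my_ops are made up, and the exact API details vary a bit between kernel versions:

#include <linux/module.h>
#include <linux/skbuff.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <net/net_namespace.h>

/* Called by Netfilter for every packet that reaches the hook. */
static unsigned int my_hook_fn(void *priv, struct sk_buff *skb,
                               const struct nf_hook_state *state)
{
    /* Inspect (or mangle) skb here. */
    return NF_ACCEPT; /* let the packet continue through the stack */
}

static const struct nf_hook_ops my_ops = {
    .hook     = my_hook_fn,
    .pf       = NFPROTO_IPV4,
    .hooknum  = NF_INET_PRE_ROUTING,   /* the hook conntrack also uses */
    .priority = NF_IP_PRI_CONNTRACK,   /* -200, conntrack's slot */
};

static int __init my_hook_init(void)
{
    /* Attach the callback in the initial network namespace. */
    return nf_register_net_hook(&init_net, &my_ops);
}

static void __exit my_hook_exit(void)
{
    nf_unregister_net_hook(&init_net, &my_ops);
}

module_init(my_hook_init);
module_exit(my_hook_exit);
MODULE_LICENSE("GPL");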

For the INET family, that is IPv4 and IPv6, there are five Netfilter hooks [1] to choose from:

[Diagram of the five Netfilter hooks, based on "Nftables – Packet flow and Netfilter hooks in detail", thermalcircle.de, CC BY-SA 4.0]

Which ones does conntrack use? We will get to that in a moment.

First, let’s focus on the trigger. What makes conntrack register its callbacks with Netfilter?

The SSH connection doesn’t show up in the conntrack table just because the module is loaded. We already saw that. This means that conntrack doesn’t register its callbacks with Netfilter at module load time.

Or at least, it doesn’t do it by default. Since Linux v5.1 (May 2019) the conntrack module has the enable_hooks parameter, which causes conntrack to register its callbacks on load:

[vagrant@ct-vm ~]$ modinfo nf_conntrack
…
parm:           enable_hooks:Always enable conntrack hooks (bool)

Going back to our toy VM, let’s try to reload the conntrack module with enable_hooks set:

[vagrant@ct-vm ~]$ sudo rmmod nf_conntrack_netlink nf_conntrack
[vagrant@ct-vm ~]$ sudo modprobe nf_conntrack enable_hooks=1
[vagrant@ct-vm ~]$ sudo conntrack -L
tcp      6 431999 ESTABLISHED src=192.168.122.204 dst=192.168.122.1 sport=22 dport=34858 src=192.168.122.1 dst=192.168.122.204 sport=34858 dport=22 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.5 (conntrack-tools): 1 flow entries have been shown.
[vagrant@ct-vm ~]$

Nice! The conntrack table now contains an entry for our SSH session.

The Netfilter hook notified conntrack about SSH session packets passing through the stack.

Now that we know how conntrack gets called, we can go back to our question — can we observe a TCP SYN packet dropped by the firewall with conntrack?

Listing Netfilter hooks

That is easy to check:

  1. Add a rule to drop anything coming to port tcp/2570 [2]

[vagrant@ct-vm ~]$ sudo iptables -t filter -A INPUT -p tcp --dport 2570 -j DROP

  2. Connect to the VM on port tcp/2570 from the outside

host $ nc -w 1 -z 192.168.122.204 2570

  3. List conntrack table entries

[vagrant@ct-vm ~]$ sudo conntrack -L
tcp      6 431999 ESTABLISHED src=192.168.122.204 dst=192.168.122.1 sport=22 dport=34858 src=192.168.122.1 dst=192.168.122.204 sport=34858 dport=22 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.5 (conntrack-tools): 1 flow entries have been shown.

No new entries. Conntrack didn’t record a new flow for the dropped SYN.

But did it process the SYN packet? To answer that we have to find out which callbacks conntrack registered with Netfilter.

Netfilter keeps track of callbacks registered for each hook in instances of struct nf_hook_entries. We can reach these objects through the Netfilter state (struct netns_nf), which lives inside network namespace (struct net).

struct netns_nf {
    …
    struct nf_hook_entries __rcu *hooks_ipv4[NF_INET_NUMHOOKS];
    struct nf_hook_entries __rcu *hooks_ipv6[NF_INET_NUMHOOKS];
    …
}

struct nf_hook_entries, if you look at its definition, is a bit of an exotic construct. A glance at how the object size is calculated during its allocation gives a hint about its memory layout:

    struct nf_hook_entries *e;
    size_t alloc = sizeof(*e) +
               sizeof(struct nf_hook_entry) * num +
               sizeof(struct nf_hook_ops *) * num +
               sizeof(struct nf_hook_entries_rcu_head);

It’s an element count, followed by two arrays glued together, and some RCU-related state which we’re going to ignore. The two arrays have the same size, but hold different kinds of values.

We can walk the second array, holding pointers to struct nf_hook_ops, to discover the registered callbacks and their priority. Priority determines the invocation order.

With drgn, a programmable C debugger tailored for the Linux kernel, we can locate the Netfilter state in kernel memory and walk its contents relatively easily, given we know what we are looking for.

[vagrant@ct-vm ~]$ sudo drgn
drgn 0.0.8 (using Python 3.9.1, without libkdumpfile)
…
>>> pre_routing_hook = prog['init_net'].nf.hooks_ipv4[0]
>>> for i in range(0, pre_routing_hook.num_hook_entries):
...     pre_routing_hook.hooks[i].hook
...
(nf_hookfn *)ipv4_conntrack_defrag+0x0 = 0xffffffffc092c000
(nf_hookfn *)ipv4_conntrack_in+0x0 = 0xffffffffc093f290
>>>

Neat! We have a way to access Netfilter state.

Let’s take it to the next level and list all registered callbacks for each Netfilter hook (using less than 100 lines of Python):

[vagrant@ct-vm ~]$ sudo /vagrant/tools/list-nf-hooks
🪝 ipv4 PRE_ROUTING
       -400 → ipv4_conntrack_defrag     ☜ conntrack callback
       -300 → iptable_raw_hook
       -200 → ipv4_conntrack_in         ☜ conntrack callback
       -150 → iptable_mangle_hook
       -100 → nf_nat_ipv4_in

🪝 ipv4 LOCAL_IN
       -150 → iptable_mangle_hook
          0 → iptable_filter_hook
         50 → iptable_security_hook
        100 → nf_nat_ipv4_fn
 2147483647 → ipv4_confirm
…

The output from our script shows that conntrack has two callbacks registered with the PRE_ROUTING hook – ipv4_conntrack_defrag and ipv4_conntrack_in. But are they being called?

[Diagram based on "Netfilter PRE_ROUTING hook", thermalcircle.de, CC BY-SA 4.0]

Tracing conntrack callbacks

We expect that when the Netfilter PRE_ROUTING hook processes a TCP SYN packet, it will invoke ipv4_conntrack_defrag and then ipv4_conntrack_in callbacks.

To confirm it we will put to use the tracing powers of BPF 🐝. BPF programs can run on entry to functions. These kinds of programs are known as BPF kprobes. In our case we will attach BPF kprobes to conntrack callbacks.

Usually, when working with BPF, we would write the BPF program in C and use clang -target bpf to compile it. However, for tracing it will be much easier to use bpftrace. With bpftrace we can write our BPF kprobe program in a high-level language inspired by AWK:

kprobe:ipv4_conntrack_defrag,
kprobe:ipv4_conntrack_in
{
    $skb = (struct sk_buff *)arg1;
    $iph = (struct iphdr *)($skb->head + $skb->network_header);
    $th = (struct tcphdr *)($skb->head + $skb->transport_header);

    if ($iph->protocol == 6 /* IPPROTO_TCP */ &&
        $th->dest == 2570 /* htons(2570) */ &&
        $th->syn == 1) {
        time("%H:%M:%S ");
        printf("%s:%u > %s:%u tcp syn %s\n",
               ntop($iph->saddr),
               (uint16)($th->source << 8) | ($th->source >> 8),
               ntop($iph->daddr),
               (uint16)($th->dest << 8) | ($th->dest >> 8),
               func);
    }
}

What does this program do? It is roughly an equivalent of a tcpdump filter:

dst port 2570 and tcp[tcpflags] & tcp-syn != 0

But only for packets passing through conntrack PRE_ROUTING callbacks.

(If you haven’t used bpftrace, it comes with an excellent reference guide and gives you the ability to explore kernel data types on the fly with bpftrace -lv 'struct iphdr'.)

Let’s run the tracing program while we connect to the VM from the outside (nc -z 192.168.122.204 2570):

[vagrant@ct-vm ~]$ sudo bpftrace /vagrant/tools/trace-conntrack-prerouting.bt
Attaching 3 probes...
Tracing conntrack prerouting callbacks... Hit Ctrl-C to quit
13:22:56 192.168.122.1:33254 > 192.168.122.204:2570 tcp syn ipv4_conntrack_defrag
13:22:56 192.168.122.1:33254 > 192.168.122.204:2570 tcp syn ipv4_conntrack_in
^C

[vagrant@ct-vm ~]$

Conntrack callbacks have processed the TCP SYN packet destined to tcp/2570.

But if conntrack saw the packet, why is there no corresponding flow entry in the conntrack table?

Going down the rabbit hole

What actually happens inside the conntrack PRE_ROUTING callbacks?

To find out, we can trace the call chain that starts on entry to the conntrack callback. The function_graph tracer built into the Ftrace framework is perfect for this task.

But because all incoming traffic goes through the PRE_ROUTING hook, including our SSH connection, our trace will be polluted with events from SSH traffic. To avoid that, let’s switch from SSH access to a serial console.

When using libvirt as the Vagrant provider, you can connect to the serial console with virsh:

host $ virsh -c qemu:///session list
 Id   Name                State
-----------------------------------
 1    conntrack_default   running

host $ virsh -c qemu:///session console conntrack_default

Once connected to the console and logged into the VM, we can record the call chain using the trace-cmd wrapper for Ftrace:

[vagrant@ct-vm ~]$ sudo trace-cmd start -p function_graph -g ipv4_conntrack_defrag -g ipv4_conntrack_in
  plugin 'function_graph'
[vagrant@ct-vm ~]$ # … connect from the host with `nc -z 192.168.122.204 2570` …
[vagrant@ct-vm ~]$ sudo trace-cmd stop
[vagrant@ct-vm ~]$ sudo cat /sys/kernel/debug/tracing/trace
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 1)   1.219 us    |  finish_task_switch();
 1)   3.532 us    |  ipv4_conntrack_defrag [nf_defrag_ipv4]();
 1)               |  ipv4_conntrack_in [nf_conntrack]() {
 1)               |    nf_conntrack_in [nf_conntrack]() {
 1)   0.573 us    |      get_l4proto [nf_conntrack]();
 1)               |      nf_ct_get_tuple [nf_conntrack]() {
 1)   0.487 us    |        nf_ct_get_tuple_ports [nf_conntrack]();
 1)   1.564 us    |      }
 1)   0.820 us    |      hash_conntrack_raw [nf_conntrack]();
 1)   1.255 us    |      __nf_conntrack_find_get [nf_conntrack]();
 1)               |      init_conntrack.constprop.0 [nf_conntrack]() {  ❷
 1)   0.427 us    |        nf_ct_invert_tuple [nf_conntrack]();
 1)               |        __nf_conntrack_alloc [nf_conntrack]() {      ❶
                             … 
 1)   3.680 us    |        }
                           … 
 1) + 15.847 us   |      }
                         … 
 1) + 34.595 us   |    }
 1) + 35.742 us   |  }
 …
[vagrant@ct-vm ~]$

What catches our attention here is the allocation, __nf_conntrack_alloc() (❶), inside init_conntrack() (❷). __nf_conntrack_alloc() creates a struct nf_conn object which represents a tracked connection.

This object is not created in vain. A glance at init_conntrack() source shows that it is pushed onto a list of unconfirmed connections [3].

What does it mean that a connection is unconfirmed? As conntrack(8) man page explains:

unconfirmed:
       This table shows new entries, that are not yet inserted into the
       conntrack table. These entries are attached to packets that  are
       traversing  the  stack, but did not reach the confirmation point
       at the postrouting hook.

Perhaps we have been looking for our flow in the wrong table? Does the unconfirmed table have a record for our dropped TCP SYN?

Pulling the rabbit out of the hat

I have bad news…

[vagrant@ct-vm ~]$ sudo conntrack -L unconfirmed
conntrack v1.4.5 (conntrack-tools): 0 flow entries have been shown.
[vagrant@ct-vm ~]$

The flow is not present in the unconfirmed table. We have to dig deeper.

Let’s for a moment assume that a struct nf_conn object was added to the unconfirmed list. If the list is now empty, then the object must have been removed from the list before we inspected its contents.

Has an entry been removed from the unconfirmed table? What function removes entries from the unconfirmed table?

It turns out that nf_ct_add_to_unconfirmed_list(), which init_conntrack() invokes, has its opposite defined right beneath it – nf_ct_del_from_dying_or_unconfirmed_list().

It is worth a shot to check if this function is being called, and if so, from where. For that we can again use a BPF tracing program, attached to function entry. However, this time our program will record a kernel stack trace:

kprobe:nf_ct_del_from_dying_or_unconfirmed_list { @[kstack()] = count(); exit(); }

With bpftrace running our one-liner, we connect to the VM from the host with nc as before:

[vagrant@ct-vm ~]$ sudo bpftrace -e 'kprobe:nf_ct_del_from_dying_or_unconfirmed_list { @[kstack()] = count(); exit(); }'
Attaching 1 probe...

@[
    nf_ct_del_from_dying_or_unconfirmed_list+1 ❹
    destroy_conntrack+78
    nf_conntrack_destroy+26
    skb_release_head_state+78
    kfree_skb+50 ❸
    nf_hook_slow+143 ❷
    ip_local_deliver+152 ❶
    ip_sublist_rcv_finish+87
    ip_sublist_rcv+387
    ip_list_rcv+293
    __netif_receive_skb_list_core+658
    netif_receive_skb_list_internal+444
    napi_complete_done+111
    …
]: 1

[vagrant@ct-vm ~]$

Bingo. The conntrack delete function was called, and the captured stack trace shows that on the local delivery path (❶), where the LOCAL_IN Netfilter hook runs (❷), the packet is destroyed (❸). Conntrack must be getting called when the sk_buff (the packet and its metadata) is destroyed. This causes conntrack to remove the unconfirmed flow entry (❹).

It makes sense. After all we have a DROP rule in the filter/INPUT chain. And that iptables -j DROP rule has a significant side effect. It cleans up an entry in the conntrack unconfirmed table!

This explains why we can’t observe the flow in the unconfirmed table. It lives for only a very short period of time.

Not convinced? You don’t have to take my word for it. I will prove it with a dirty trick!

Making the rabbit disappear, or actually appear

If you recall the output from list-nf-hooks that we’ve seen earlier, there is another conntrack callback there – ipv4_confirm, which I have ignored:

[vagrant@ct-vm ~]$ sudo /vagrant/tools/list-nf-hooks
…
🪝 ipv4 LOCAL_IN
       -150 → iptable_mangle_hook
          0 → iptable_filter_hook
         50 → iptable_security_hook
        100 → nf_nat_ipv4_fn
 2147483647 → ipv4_confirm              ☜ another conntrack callback
… 

ipv4_confirm is “the confirmation point” mentioned in the conntrack(8) man page. When a flow gets confirmed, it is moved from the unconfirmed table to the main conntrack table.

The callback is registered with a “weird” priority – 2,147,483,647. It’s the maximum positive value a 32-bit signed integer can hold and, at the same time, the lowest possible priority a callback can have.

This ensures that the ipv4_confirm callback runs last. We want the flows to graduate from the unconfirmed table to the main conntrack table only once we know the corresponding packet has made it through the firewall.

Luckily for us, it is possible to have more than one callback registered with the same priority. In such cases, the order of registration matters. We can put that to use. Just for educational purposes.

Good old iptables won’t be of much help here. Its Netfilter callbacks have hard-coded priorities which we can’t change. But nftables, the iptables successor, is much more flexible in this regard. With nftables we can create a rule chain with arbitrary priority.

So this time, let’s use nftables to install a filter rule to drop traffic to port tcp/2570. The trick, though, is to register our chain before conntrack registers itself. This way our filter will run last.

First, delete the tcp/2570 drop rule in iptables and unregister conntrack.

vm # iptables -t filter -F
vm # rmmod nf_conntrack_netlink nf_conntrack

Then add a tcp/2570 drop rule in nftables, with the lowest possible priority.

vm # nft add table ip my_table
vm # nft add chain ip my_table my_input { type filter hook input priority 2147483647 \; }
vm # nft add rule ip my_table my_input tcp dport 2570 counter drop
vm # nft -a list ruleset
table ip my_table { # handle 1
        chain my_input { # handle 1
                type filter hook input priority 2147483647; policy accept;
                tcp dport 2570 counter packets 0 bytes 0 drop # handle 4
        }
}

Finally, re-register conntrack hooks.

vm # modprobe nf_conntrack enable_hooks=1

The registered callbacks for the LOCAL_IN hook now look like this:

vm # /vagrant/tools/list-nf-hooks
…
🪝 ipv4 LOCAL_IN
       -150 → iptable_mangle_hook
          0 → iptable_filter_hook
         50 → iptable_security_hook
        100 → nf_nat_ipv4_fn
 2147483647 → ipv4_confirm, nft_do_chain_ipv4
…

What happens if we connect to port tcp/2570 now?

vm # conntrack -L
tcp      6 115 SYN_SENT src=192.168.122.1 dst=192.168.122.204 sport=54868 dport=2570 [UNREPLIED] src=192.168.122.204 dst=192.168.122.1 sport=2570 dport=54868 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.5 (conntrack-tools): 1 flow entries have been shown.

We have fooled conntrack 💥

Conntrack promoted the flow from the unconfirmed to the main conntrack table despite the fact that the firewall dropped the packet. We can observe it.

Outro

Conntrack processes every received packet [4] and creates a flow for it. A flow entry is always created even if the packet is dropped shortly after. The flow might never be promoted to the main conntrack table and can be short lived.

However, this blog post is not really about conntrack. Its internals have been covered in magazines, papers, books, and on other blogs long before. We probably could have learned elsewhere all that has been shown here.

For us, conntrack was really just an excuse to demonstrate various ways to discover the inner workings of the Linux network stack. As good as any other.

Today we have powerful introspection tools like drgn, bpftrace, or Ftrace, and a cross referencer to plow through the source code, at our fingertips. They help us look under the hood of a live operating system and gradually deepen our understanding of its workings.

I have to warn you, though. Once you start digging into the kernel, it is hard to stop…

Footnotes

[1] Actually, since Linux v5.10 (Dec 2020), there is an additional Netfilter hook for the INET family named NF_INET_INGRESS. The new hook type allows users to attach nftables chains to the Traffic Control ingress hook.
[2] Why did I pick this port number? Because 2570 = 0x0a0a. As we will see later, this saves us the trouble of converting between the network byte order and the host byte order.
[3] To be precise, there are multiple lists of unconfirmed connections, one per CPU. This is a common pattern in the kernel. Whenever we want to prevent CPUs from contending for access to a shared state, we give each CPU a private instance of the state.
[4] Unless we explicitly exclude it from being tracked with iptables -j NOTRACK.

Cloudflare recognized as a ‘Leader’ in The Forrester Wave for DDoS Mitigation Solutions

Post Syndicated from Vivek Ganti original https://blog.cloudflare.com/cloudflare-is-named-a-leader-in-the-forrester-wave-for-ddos-mitigation-solutions/

We’re thrilled to announce that Cloudflare has been named a leader in The Forrester Wave™: DDoS Mitigation Solutions, Q1 2021. You can download a complimentary copy of the report here.

According to the report, written by Forrester Senior Analyst for Security and Risk David Holmes, “Cloudflare protects against DDoS from the edge, and fast… customer references view Cloudflare’s edge network as a compelling way to protect and deliver applications.”

Unmetered and unlimited DDoS protection for all

Cloudflare was founded with the mission to help build a better Internet — one where the impact of DDoS attacks is a thing of the past. Over the last 10 years, we have been unwavering in our efforts to protect our customers’ Internet properties from DDoS attacks of any size or kind. In 2017, we announced unmetered DDoS protection for free — as part of every Cloudflare service and plan including the Free plan — to make sure every organization can stay protected and available.

Thanks to our home-grown automated DDoS protection systems, we’re able to provide unmetered and unlimited DDoS protection for free. Our automated systems constantly analyze traffic samples asynchronously so as to avoid impacting performance. They scan for DDoS attacks across layers 3-7 of the OSI model, looking for patterns in IP packets, HTTP requests, and HTTP responses. When an attack is identified, a real-time signature is generated in the form of an ephemeral mitigation rule. The rule is propagated to the optimal location at our edge for the most cost-efficient mitigation: either the Linux kernel’s eXpress Data Path (XDP), Linux userspace iptables, or the HTTP reverse proxy. A cost-efficient mitigation strategy means that we can mitigate the most volumetric, distributed attacks without impacting performance.

Read more about how Cloudflare’s DDoS protection systems work here.

DDoS attacks increasing

We’d like to say DDoS attacks are a thing of the past. But unfortunately, they are not.

On the contrary, we continue to see the frequency, sophistication, and geographical distribution of DDoS attacks rise every quarter – in quantity or size. See our reports from last year (Q1 ‘20, Q2 ‘20, Q3 ‘20, and Q4 ‘20) and view overall Internet traffic trends here on Cloudflare Radar.

Over the past year, Cloudflare has seen and automatically mitigated some of the largest and arguably the most creative cyber attacks. As attackers are getting bolder and smarter in their ways, organizations are looking for ways to battle these kinds of attacks with no disruption to the services they provide.

[Figure: DDoS attacks in 2020]

Organizations are being extorted under threat of DDoS

In January this year, we shared the story of how we helped a Fortune Global 500 company stay online and protected whilst they were targeted by a ransom DDoS attack. They weren’t the only one. In fact, in the fourth quarter of 2020, 17% of surveyed Cloudflare customers reported receiving a ransom or a threat of DDoS attack. In Q1 2021, this increased to 26% — roughly 1 out of every 4 respondents reported a ransom threat and a subsequent DDoS attack on their network infrastructure.

Whether organizations are targeted with ransom attacks or amateur ‘cyber vandalism’, it’s important for organizations to utilize an always-on, automated DDoS protection service that doesn’t require manual human intervention in the hour of need. We take great pride in being able to provide this level of protection to our customers.

Continuous improvement

As attacks have continued to evolve, and the number of customers using our services has increased, Cloudflare has continually invested in our technology to stay several steps ahead of attackers. We’ve made significant investments in bolstering our mitigation capacity, honing our detection algorithms, and providing better analytics capabilities to our customers. Our aim is to make impact from DDoS attacks a thing of the past, for all customers, just like spam in the 90s.

In 2019, we rolled out our autonomous DDoS detection and mitigation system, dosd. This component of our mitigation stack is fully software-defined, leverages Linux’s eXpress Data Path (XDP), and allows us to quickly and automatically deploy eBPF rules that run on each packet received for inspection — mitigating the most sophisticated attacks within less than 3 seconds on average at the edge and other common attacks instantly. It works by detecting patterns in the attack traffic and then quickly deploying rules autonomously to drop the offenders at wire speed. Additionally, because dosd operates independently within each data center, with no reliance on a centralized data center, it greatly increases the resilience of our network.

While dosd is great at mitigating attacks by detecting patterns in the traffic, what about patternless attacks? That’s where flowtrackd comes in: our novel TCP state classification engine, built in 2020 to defend against disruptive L3/L4 attacks targeting our Magic Transit customers. It’s able to detect and mitigate the most randomized, sophisticated attacks. Additionally, at L7, we also learn our customers’ traffic baselines and identify when their origins are in distress. When an origin server shows signs of deterioration, our systems begin soft mitigation in order to reduce the impact on the server and allow it to recuperate.

Building advanced DDoS protection systems is not only about detection, but also about cost-efficient mitigation. We aim to mitigate attacks without the performance impact that excessive computational consumption can cause. This requirement is why we introduced IP Jails to the world: IP Jails is a Gatebot capability that mitigates the most volumetric and distributed attacks without impacting performance. Gatebot activates IP Jails when attacks become significantly volumetric; then, instead of blocking at L7, IP Jails temporarily drops the connection of the offending IP address that generated the request matching the attack signature Gatebot created. IP Jails leverages the Linux iptables mechanism to drop packets at wire speed. Dropping L7 attacks at L4 is significantly more cost-efficient, and benefits both our customers and our Site Reliability Engineering team.

Lastly, to provide our customers better visibility and insight into the increasingly sophisticated attacks we’re seeing and mitigating, we released the Firewall Analytics dashboard in 2019. This dashboard provides insights into both HTTP application security and DDoS activity at L7, allowing customers to configure rules directly from within analytics dashboards thus tightening the feedback loop for responding to events. Later in 2020, we released an equivalent dashboard for L3/4 activity for our enterprise Magic Transit and Spectrum customers, in the form of the Network Analytics dashboard. Network Analytics provides insight into packet-level traffic and DDoS attack activity, along with periodical Insights and Trends. To complement the dashboards and provide our users the right information as they need it, we rolled out real-time DDoS alerts and also periodical DDoS reports — right into your inboxes. Or if you prefer, directly into your SIEM dashboards.

Cloudflare received the top score in the strategy category

This year, due to our advanced DDoS protection capabilities, Cloudflare received the top score in the strategy category and among the top three in the current offering category. Additionally, we were given the highest possible scores in 15 criteria in the report, including:

  • Threat detection
  • Burst attacks
  • Response automation
  • Speed of implementation
  • Product vision
  • Performance
  • Security operation center (SOC) service

We believe that our standing stems from the sustained investments we’ve made over the last few years in our global Anycast network — which serves as the foundation of all services we provide to our customers.

Our network is architected for scale — every service runs on every server in every Cloudflare data center that spans over 200 cities globally. And as opposed to some of the other vendors in the report, every Cloudflare service is delivered from every one of our edge data centers.

Integrated security and performance

A leading application performance monitoring company that uses Cloudflare’s services for serverless compute and content delivery recently told us that they wanted to consolidate their performance and security services under one provider. They got rid of their incumbent L3 services provider and onboarded Cloudflare for their application and network services (with Magic Transit) for easier management and better support.

We see this more and more. The benefits of using a single cloud provider for bundled security and performance services are plentiful:

  • Easier management — users can manage all of Cloudflare’s services such as DDoS protection, WAF, CDN, bot management and serverless compute from a single dashboard and a single API endpoint.
  • Deep service integration – all of our services are deeply integrated which allows our users to truly leverage the power of Cloudflare. As an example, Bot Management rules are implemented with our Application Firewall.
  • Easier troubleshooting — instead of having to reach out to multiple providers, our customers have a single point of contact when troubleshooting. Additionally, we provide immediate human response in our under attack hotline.
  • Lower latency — because every one of our services is delivered from all of our data centers, there are no performance penalties. As an example, there are no additional routing hops from the DDoS service to the Bot Management service to the CDN service.

However, not all cloud services are built the same; most vendors today do not have a comprehensive and robust solution to offer. Cloudflare’s unique architecture enables it to offer an integrated solution that comprises an all-star cast featuring the following, to name a few:

  • CDN: Customer’s Choice LEADER in 2020 Gartner Peer Insights ‘Voice of the Customer’: Global CDN [1]
  • DDoS: Received the highest number of high scores in the 2020 Gartner report for Solution Comparison for DDoS Cloud Scrubbing Centers [2]
  • WAF: Cloudflare is a CHALLENGER in the 2020 Gartner Magic Quadrant for Web Application Firewall (receiving the highest placement in the ‘Ability to Execute’) [3]
  • Zero Trust: Cloudflare is a LEADER in the Omdia Market Radar: Zero-Trust Access Report, 2020 [4]
  • Bot Management: Leader in the 2020 SPARK Matrix of Bot Management Market [5]
  • Integrated solution: Innovation leader in the Global Holistic Web Protection Market for 2020 by Frost & Sullivan [6]

We are pleased to be named a LEADER in The Forrester Wave™: DDoS Mitigation Solutions, Q1 2021 report, and will continue to work tirelessly to remain, as the report puts it, a “compelling way to protect and deliver applications” for our customers.

For more information about Cloudflare’s DDoS protection, reach out to us here, or for a hands-on evaluation of Cloudflare, sign up here.

[1] https://www.gartner.com/reviews/market/global-cdn/vendor/cloudflare/product/cloudflare-cdn
[2] https://www.gartner.com/en/documents/3983636/solution-comparison-for-ddos-cloud-scrubbing-centers
[3] Gartner, “Magic Quadrant for Web Application Firewalls”, Analyst(s): Jeremy D’Hoinne, Adam Hils, John Watts, Rajpreet Kaur, October 19, 2020. https://www.gartner.com/doc/reprints?id=1-249JQ6L1&ct=200929&st=sb
[4] https://www.cloudflare.com/lp/omdia-zero-trust
[5] https://www.cloudflare.com/lp/qks-bot-management-leader/
[6] https://www.cloudflare.com/lp/frost-radar-holistic-web/

An introduction to three-phase power and PDUs

Post Syndicated from Rob Dinh original https://blog.cloudflare.com/an-introduction-to-three-phase-power-and-pdus/

Our fleet of over 200 locations comprises various generations of servers and routers. And with the ever changing landscape of services and computing demands, it’s imperative that we manage power in our data centers right. This blog is a brief Electrical Engineering 101 session going over specifically how power distribution units (PDU) work, along with some good practices on how we use them. It appears to me that we could all use a bit more knowledge on this topic, and more love and appreciation of something that’s critical but usually taken for granted, like hot showers and opposable thumbs.

A PDU is a device used in data centers to distribute power to multiple rack-mounted machines. It’s an industrial-grade power strip typically designed to supply an average consumption of about seven US households. Advanced models have monitoring features and can be accessed via SSH or a web GUI to turn power outlets on and off. How we choose a PDU depends on what country the data center is in and what it provides in terms of voltage, phase, and plug type.

[Photo: two vertically mounted PDUs in a rack, feeding dual-PSU servers via red and blue cables]

For each of our racks, all of our dual power-supply (PSU) servers are cabled to one of the two vertically mounted PDUs. As shown in the picture above, one PDU feeds a server’s PSU via a red cable, and the other PDU feeds that server’s other PSU via a blue cable. This is to ensure we have power redundancy maximizing our service uptime; in case one of the PDUs or server PSUs fail, the redundant power feed will be available keeping the server alive.

Faraday’s Law and Ohm’s Law

Like most high-voltage applications, PDUs and servers are designed to use AC power, meaning voltage and current aren’t constant — they’re sine waves whose magnitudes alternate between positive and negative at a certain frequency. For example, a voltage feed of 100V is not constantly at 100V; it bounces between 100V and -100V like a sine wave. One complete sine wave cycle is one phase of 360 degrees, and running at 50Hz means there are 50 cycles per second.
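
Written as an equation (following the example above, which treats 100V as the peak value), that 100V, 50Hz feed is:

v(t) = 100V x sin(2π x 50Hz x t)

which sweeps from +100V to -100V and back 50 times every second.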

The sine wave can be explained by Faraday’s Law and by looking at how an AC power generator works. Faraday’s Law tells us that a current is induced to flow due to a changing magnetic field. The figure below illustrates a simple generator with a permanent magnet rotating at constant speed and a coil sitting within the magnet’s magnetic field. Magnetic force is strongest at the North and South ends of the magnet. So as the magnet rotates near the coil, the current flow in the coil fluctuates. One complete rotation of the magnet represents one phase. As the North end approaches the coil, current increases from zero. Once the North end leaves, current decreases to zero. The South end in turn approaches, and now the current “increases” in the opposite direction. Finally, finishing the phase, the South end leaves, returning the current back to zero. Current alternates its direction every half cycle, hence the name Alternating Current.

[Diagram: a simple AC generator with a permanent magnet rotating past a coil]
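
To put Faraday’s Law in equation form: if we approximate the flux through the coil from the rotating magnet as a cosine (a simplification of the real field geometry, for illustration only), the induced voltage comes out as exactly the sine wave described above:

Φ(t) = Φmax x cos(ωt)
e(t) = -dΦ/dt = ω x Φmax x sin(ωt)

The voltage is largest when the flux is changing fastest (a pole sweeping past the coil) and zero when the flux momentarily stops changing.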

Current and voltage in AC power fluctuate in-phase, or “in tandem”, with each other. So by Ohm’s Law of Power = Voltage x Current, power will always be positive. Notice on the graph below that AC power (Watts) has two peaks per cycle. But for practical purposes, we’d like to use a constant power value. We do that by interpreting AC power as “DC” power using root-mean-square (RMS) averaging, which takes the peak value and divides it by √2. For example, in the US, our conditions are 208V 24A at 60Hz. When we look at spec sheets, all of these values can be assumed to be RMS’d into their constant DC-equivalent values. When we say we’re fully utilizing a PDU’s max capacity of 5kW, it actually means that the instantaneous power consumption of our machines bounces between 0 and about 10kW (2 x 5kW), while averaging out to 5kW.

[Graphs: AC voltage, current, and power over one cycle, and their RMS equivalents]
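
To make the RMS bookkeeping concrete, here is the arithmetic for the US example, assuming an idealized, purely resistive load:

Vpeak = 208V x √2 ≈ 294V
Ipeak = 24A x √2 ≈ 34A
Average power = Vrms x Irms = 208V x 24A ≈ 5kW
Instantaneous power = Vpeak x Ipeak x sin²(ωt), which swings between 0 and ≈ 10kW

The average of sin² over a full cycle is 1/2, which is exactly where the two factors of √2 (one for voltage, one for current) go, bringing the 10kW peak back down to the 5kW average.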

It’s also critical to figure out the sum of power our servers will need in a rack so that it falls under the PDU’s design max power capacity. For our US example, a PDU is typically 5kW (208 volts x 24 amps); therefore, we budget 5kW and fit as many machines as we can under that. If we need more machines and the total power goes above 5kW, we’d need to provision another power source. That could lead to another set of PDUs and racks that we may not fully use depending on demand, i.e. more underutilized cost. All we can do is abide by P = V x I.

However, there is a way to increase the max power capacity economically — the 3-phase PDU. Compared to single-phase, its max capacity is √3, or about 1.7, times higher. A 3-phase PDU with the same US specs above has a capacity of 8.6kW (5kW x √3), allowing us to power more machines from the same source. A 3-phase setup means thicker cables and a bigger plug, and the PDU is more expensive than a 1-phase one; but its value is higher compared to two 1-phase rack setups for these reasons (the numbers are worked out right after the list):

  • It’s more cost-effective, because there are fewer hardware resources to buy:
    • Say the computing demand adds up to 215kW of hardware; we would need 25 3-phase racks compared to 43 1-phase racks.
    • Each rack needs two PDUs for power redundancy. Using the example above, we would need 50 3-phase PDUs compared to 86 1-phase PDUs to power 215kW worth of hardware.
    • That also means a smaller rack footprint and fewer power sources provided and charged by the data center, saving us up to √3 or 1.7 times in opex.
  • It’s more resilient, because there are more circuit breakers in a 3-phase PDU — one more than in a 1-phase. For example, a 48-outlet PDU that is 1-phase would be split into two circuits of 24 outlets, while a 3-phase one would be split into three circuits of 16 outlets. If a breaker tripped, we’d lose 16 outlets using a 3-phase PDU instead of 24.
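
As a quick sanity check of the numbers in the list above (same US feed as before, and assuming the load is spread evenly across the three phases):

1-phase capacity: 208V x 24A ≈ 5kW
3-phase capacity: √3 x 208V x 24A ≈ 8.6kW
Racks for 215kW of hardware: 215 / 5 = 43 (1-phase) vs. 215 / 8.6 ≈ 25 (3-phase)
PDUs at two per rack: 86 (1-phase) vs. 50 (3-phase)
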
[Photo: a 3-phase, 48-outlet PDU]

The PDU shown above is a 3-phase model with 48 outlets. We can see three pairs of circuit breakers for the three branches, which are interleaved with each other in white, grey, and black. Industry demands today pressure engineers to maximize compute performance and minimize physical footprint, making the 3-phase PDU a widely used part of operating a data center.

What is 3-phase?

A 3-phase AC generator has three coils instead of one, with the coils 120 degrees apart inside the cylindrical core, as shown in the figure below. Just like in the 1-phase generator, current flow is induced by the rotation of the magnet, so each coil produces power sequentially at every one-third of the magnet’s rotation cycle. In other words, we’re generating three 1-phase power waveforms offset by 120 degrees.


A 3-phase feed is set up by joining any of its three coils into line pairs. L1, L2, and L3 coils are live wires with each on their own phase carrying their own phase voltage and phase current. Two phases joining together form one line carrying a common line voltage and line current. L1 and L2 phase voltages create the L1/L2 line voltage. L2 and L3 phase voltages create the L2/L3 line voltage. L1 and L3 phase voltages create the L1/L3 line voltage.


Let’s take a moment to clarify the terminology. Some other sources may refer to line voltage (or current) as line-to-line or phase-to-phase voltage (or current). It can get confusing, because line voltage is the same as phase voltage in 1-phase circuits, as there’s only one phase. Also, the magnitude of the line voltage is equal to the magnitude of the phase voltage in 3-phase Delta circuits, while the magnitude of the line current is equal to the magnitude of the phase current in 3-phase Wye circuits.

Conversely, the line current equals the phase current times √3 in Delta circuits, and in Wye circuits the line voltage equals the phase voltage times √3.

In Delta circuits:
Vline = Vphase
Iline = √3 x Iphase

In Wye circuits:
Vline = √3 x Vphase
Iline = Iphase

Delta and Wye are the two ways in which the three wires can be joined together. This happens both at the power source with its three coils and at the PDU end with its three branches of outlets. Note that the generator and the PDU don’t need to match each other’s circuit type.


On PDUs, these phases join when we plug servers into the outlets. Conceptually, we can take the coil wirings above and replace the coils with resistors representing servers. Below is a simplified wiring diagram of a 3-phase Delta PDU showing the three line pairs as three modular branches. Each branch carries two phase currents and one common voltage drop.


And this one below is of a 3-phase Wye PDU. Note that Wye circuits have an additional line known as the neutral line where all three phases meet at one point. Here each branch carries one phase and a neutral line, therefore one common current. The neutral line isn’t considered as one of the phases.


Thanks to a neutral line, a Wye PDU can offer a second voltage source that is √3 times lower for smaller devices, like laptops or monitors. Common voltages for Wye PDUs are 230V/400V or 120V/208V, particularly in North America.

Where does the √3 come from?

Why are we multiplying by √3? As the name implies, we are adding phasors. Phasors are complex numbers representing sine wave functions. Adding phasors is like adding vectors. Say your GPS tells you to walk 1 mile East (vector a), then walk 1 mile North (vector b). You walked 2 miles, but you only moved 1.4 miles NE of the original location (vector a+b). That 1.4 miles of “work” is what we want.

For our application, let’s take L1 and L2 in a Delta circuit: joining phases L1 and L2 gives us the L1/L2 line. We assume the two coils are identical, and let α represent the voltage magnitude of each phase. The two phases are offset by 120 degrees, as designed in the 3-phase power generator:

|L1| = |L2| = α
L1 = |L1|∠0° = α∠0°
L2 = |L2|∠-120° = α∠-120°

Using vector addition to solve for L1/L2:

The line voltage is the potential difference between the two phases, so we add L1 and the reverse of L2:

L1/L2 = L1 + (−L2)
L1/L2 = α∠0° − α∠−120°
L1/L2 = α(1 + j0) − α(−1/2 − j√3/2)
L1/L2 = α(3/2 + j√3/2)

Convert L1/L2 into polar form:

L1/L2 = √3α∠30°

Since voltage is a scalar, we’re only interested in the “work”:

|L1/L2| = √3α

The same α applies to L3 as well. This means that for any of the three line pairs, we multiply the phase voltage by √3 to calculate the line voltage.

Vline = √3 x Vphase

Now, with the three phases carrying equal power, we can add them all up to get the overall effective power. The relationships below hold for both Delta and Wye circuits (in Delta, Vphase = Vline and Iphase = Iline/√3; in Wye, Vphase = Vline/√3 and Iphase = Iline, so both substitutions lead to the same line-quantity form):

Poverall = 3 x Pphase
Poverall = 3 x (Vphase x Iphase)
Poverall = √3 x Vline x Iline

Using the US example, the line voltage is 208V and the line current is 24A. This gives an overall 3-phase power of 8646W (√3 x 208V x 24A), or 8.6kW. Therein lies the biggest advantage of 3-phase systems: by adding two more sets of coils and wires (ignoring the neutral wire), we’ve built a feed that delivers √3, or about 1.7, times more power.
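
The phasor math above can be double-checked with a few lines of Python (a sketch added for illustration, taking 120V as the per-coil phase voltage from the Wye example above):

import cmath, math

alpha = 120                                  # US phase (coil) voltage in the Wye example
L1 = cmath.rect(alpha, math.radians(0))      # α∠0°
L2 = cmath.rect(alpha, math.radians(-120))   # α∠-120°

line = L1 - L2                               # L1/L2 = L1 + (−L2)
print(abs(line))                             # ≈ 208 V, i.e. √3 x 120
print(math.degrees(cmath.phase(line)))       # ≈ 30°

print(math.sqrt(3) * 208 * 24)               # overall power ≈ 8646 W, the 8.6kW figure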

Dealing with 3-phase

The derivation in the section above assumes that the magnitude of all three phases is equal, but in practice that’s rarely the case. In fact, it hardly ever is. We rarely have servers and switches evenly distributed across all three branches on a PDU. Each machine may have different loads and different specs, so power draw can be wildly different, potentially causing a dramatic phase imbalance. A heavily imbalanced setup can significantly reduce the PDU’s usable capacity.

A perfectly balanced and fully utilized PDU at 8.6kW means that each of its three branches has 2.88kW of power consumed by machines. Laid out simply, it’s spread 2.88 + 2.88 + 2.88. This is the best case scenario. If we take 1kW worth of machines out of one branch, the spread becomes 2.88 + 1.88 + 2.88: imbalance is introduced and the PDU is underutilized, but we’re fine. However, if we put that 1kW back into another branch, making it 3.88 + 1.88 + 2.88, the PDU is over capacity, even though the sum is still 8.6kW. In fact, it would be over capacity even if we added just 500W instead of 1kW on the wrong branch, reaching 3.38 + 1.88 + 2.88 (8.1kW).

That’s because an 8.6kW PDU is spec’d for a maximum of 24A on each phase conductor. Overloading one of the branches can push a phase current over 24A. In the extreme, we could load a single branch until its current reaches 24A and leave the other two branches unused; that effectively turns it into a 1-phase PDU and throws away the benefit of the √3 multiplier. In reality, each branch has fuses rated below 24A (usually 20A) to ensure we never get that far and cause overcurrent issues. Therefore the same 8.6kW PDU would have one of its branches trip at 4.2kW (208V x 20A).

Loading up one branch is the easiest way to overload the PDU. Being heavily imbalanced significantly lowers PDU capacity and increases risk of failure. To help minimize that, we must:

  • Ensure that total power consumption of all machines is under the PDU’s max power capacity
  • Try to be as phase-balanced as possible by spreading cabling evenly across the three branches
  • Ensure that the sum of phase currents from powered machines at each branch is under the fuse rating at the circuit breaker.

This spreadsheet from Raritan is very useful when designing racks.

For the sake of simplicity, let’s ignore other machines like switches. Our latest 2U4N servers are rated at 1800W. That means we can only fit a maximum of four of these 2U4N chassis (8600W / 1800W = 4.7 chassis). Rounding them up to 5 would reach a total rack level power consumption of 9kW, so that’s a no-no.

Splitting 4 chassis evenly across 3 branches is impossible, so one of the branches has to carry 2 chassis. That leads to non-ideal phase balancing, with two of the phase conductors already at about 22.9A.


Keeping phase currents under 24A, there’s only 1.1A (24A – 22.9A) of headroom on L1 or L2 before the PDU gets overloaded. Say we want to add as many machines as we can under the PDU’s power capacity. One option is to add up to 242W on the L1/L2 branch, at which point both the L1 and L2 currents reach their 24A limit.


Alternatively, we can add up to 298W on the L2/L3 branch until L2 current reaches 24A. Note we can also add another 298W on the L3/L1 branch until L1 current reaches 24A.


In the examples above, we can see that various solutions are possible. Adding a 298W machine to each of the L2/L3 and L3/L1 branches is the most phase-balanced solution given the parameters. Even so, the PDU isn’t fully utilized, topping out at about 7.8kW.
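
The 22.9A, 1.1A, and 242W figures above come from combining the two branch currents that share each phase conductor. Here is a minimal Python sketch of that calculation (an illustration assuming a 208V Delta PDU and purely resistive loads):

import math

VOLTS = 208            # branch (line-to-line) voltage on the Delta PDU
PHASE_LIMIT = 24.0     # per-conductor current limit from the spec

def phase_currents(l1l2_w, l2l3_w, l3l1_w):
    """Per-conductor currents given the wattage on each Delta branch.

    The two branch-current phasors attached to a conductor are 120 degrees
    apart, so their combination has magnitude sqrt(Ia^2 + Ib^2 + Ia*Ib).
    """
    i12, i23, i31 = (w / VOLTS for w in (l1l2_w, l2l3_w, l3l1_w))
    combine = lambda a, b: math.sqrt(a * a + b * b + a * b)
    return {"L1": combine(i12, i31), "L2": combine(i12, i23), "L3": combine(i23, i31)}

# Four 1800W 2U4N chassis: two on L1/L2, one each on L2/L3 and L3/L1
currents = phase_currents(2 * 1800, 1800, 1800)
print({k: round(v, 1) for k, v in currents.items()})       # {'L1': 22.9, 'L2': 22.9, 'L3': 15.0}
print("headroom on L1:", round(PHASE_LIMIT - currents["L1"], 1), "A")   # ≈ 1.1 A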

Dealing with an 1800W server is not ideal, because whichever branch we choose to power it from will swing the phase balance significantly. Thankfully, our Gen X servers take up less space and are more power efficient. A smaller footprint gives us more flexibility and finer-grained control over our racks in many of our diverse data centers. Assuming each 1U server is 450W, as if we physically split the 1800W 2U4N into four nodes each with its own power supply, we’re now able to fit 18 nodes. That’s 2 more nodes than the four-chassis 2U4N setup.


Adding two more servers means we’ve increased the compute capacity of the rack by 12.5%. While a full Total Cost of Ownership calculation involves more factors than are considered here, this is still a great way to show we can be smarter with asset costs.

Cloudflare provides the back-end services so that our customers can enjoy the performance, reliability, security, and global scale of our edge network. Meanwhile, we manage all of our hardware in over 100 countries with various power standards and compliances, and ensure that our physical infrastructure is as reliable as it can be.

There’s no Cloudflare without hardware, and there’s no Cloudflare without power. Want to know more? Watch this Cloudflare TV segment about power: https://cloudflare.tv/event/7E359EDpCZ6mHahMYjEgQl.

ASICs at the Edge

Post Syndicated from Tom Strickx original https://blog.cloudflare.com/asics-at-the-edge/


At Cloudflare we pride ourselves on our global network that spans more than 200 cities in over 100 countries. To handle all the traffic passing through our network, there are multiple technologies at play. So let’s have a look at one of the cornerstones that makes all of this work… ASICs. No, not the running shoes.

What’s an ASIC?

ASIC stands for Application Specific Integrated Circuit. The name already says it, it’s a chip with a very narrow use case, geared towards a single application. This is in stark contrast to a CPU (Central Processing Unit), or even a GPU (Graphics Processing Unit). A CPU is designed and built for general purpose computation, and does a lot of things reasonably well. A GPU is more geared towards graphics (it’s in the name), but in the last 15 years, there’s been a drastic shift towards GPGPU (General Purpose GPU), in which technologies such as CUDA or OpenCL allow you to use the highly parallel nature of the GPU to do general purpose computing. A good example of GPU use is video encoding, or more recently, computer vision, used in applications such as self-driving cars.

Unlike CPUs or GPUs, ASICs are built with a single function in mind. Great examples are the Google Tensor Processing Units (TPU), used to accelerate machine learning functions[1], or for orbital maneuvering[2], in which specific orbital maneuvers are encoded, like the Hohmann Transfer, used to move rockets (and their payloads) to a new orbit at a different altitude. And they are also heavily used in the networking industry. Technically, the use case in the network industry should be called an ASSP (Application Specific Standard Product), but network engineers are simple people, so we prefer to call it an ASIC.

Why an ASIC

ASICs have the major benefit of being hyper-efficient. The more complex hardware is, the more cooling and power it needs. As ASICs only contain the hardware components needed for their function, their overall size can be reduced, and so can their power requirements. This has a positive impact on the overall physical size of the network (devices don’t need to be as bulky to provide sufficient cooling) and helps reduce the power consumption of a data center.

Reducing hardware complexity also reduces the failure rate of the manufacturing process, and allows for easier production.

The downside is that you need to embed a lot of your features in hardware, and once a new technology or specification comes around, any chips made without that technology baked in won’t be able to support it (VXLAN, for example).

For network equipment, this works perfectly. Overall, the networking industry is slow-moving, and considerable time is taken before new technologies make it to the market (as can be seen with IPv6, MPLS implementations, xDSL availability, …). This means the chips don’t need to evolve on a yearly basis, and can instead be created on a much slower cycle, with bigger leaps in technology. For example, it took Broadcom two years to go from Tomahawk 3 to Tomahawk 4, but in that process they doubled the throughput. The benefits listed earlier are super helpful for network equipment, as they allow for considerable throughput in a small form factor.

Building an ASIC

As with chips of any kind, building an ASIC is a long-term process. Just like with CPUs, if there’s a defect in the hardware design, you have to start from scratch, and scrap the entire build line. As such, the development lifecycle is incredibly long. It starts with prototyping in an FPGA (Field Programmable Gate Array), in which chip designers can program their required functionality and confirm compatibility. All of this is done in a HDL (Hardware Description Language), such as Verilog.

Once the prototyping stage is over, they move to baking the new packet processing pipeline into the chip at a foundry. After that, no more changes can be made to the chip, as it’s literally baked into the hardware (unlike an FPGA, which can be reprogrammed). Further difficulty is added by the fact that there are a very small number of hardware companies that will buy ASICs in bulk to build equipment with; as such the unit cost can increase drastically.

All of this means that the iteration cycle of an ASIC tends to be on the slower side of things (compared to the yearly refreshes in the Intel Process-Architecture-Optimization model for example), and will usually be smaller incremental updates: For example, increases in port-speeds are incremental (1G → 10G → 25G → 40G → 100G → 400G → 800G → …), and are tied into upgrades to the SerDes (Serialiser/Deserialiser) part of the chip.

New protocol support is a lot harder, and might require multiple development cycles before it shows up in a chip.

What ASICs do

The ASICs in our network equipment are responsible for the switching and routing of packets, as well as being the first layer of defense (in the form of a stateless firewall). Due to the sheer speed at which packets get switched, fast memory access is a primary concern. Most ASICs use a special sort of memory, called TCAM (Ternary Content-Addressable Memory). This memory is used to store all sorts of lookup tables. These may be forwarding tables (where does this packet go), ACL (Access Control List) tables (is this packet allowed), or CoS (Class of Service) tables (which priority should be given to this packet).

CAM, and its more advanced sibling, TCAM, are fascinating kinds of memory, as they operate fundamentally differently from traditional Random Access Memory (RAM). While you have to use a memory address to access data in RAM, with CAM and TCAM you can directly refer to the content you are looking for. It is a physical implementation of a key-value store.

In CAM you use the exact binary representation of a word; in a network application, that word is likely going to be an IP address, for example 11001011.00000000.01110001.00000000 (203.0.113.0). While this is definitely useful, networks operate on big collections of IP addresses, and storing each one individually would require significant memory. To remedy this memory requirement, TCAM can store three states instead of the binary two. This third state, sometimes called the ‘ignore’ state, allows for the storage of multiple sequential data words as a single entry.

In networking, these sequential data words are IP prefixes. So for the previous example, if we wanted to store that IP address and the 255 addresses following it, the TCAM entry would look as follows: 11001011.00000000.01110001.XXXXXXXX (203.0.113.0/24). This storage method means we can ask questions of the ASIC such as “where should I send packets with the destination IP address of 203.0.113.19”, to which the ASIC can have a reply ready in a single clock cycle, as it does not need to run through all memory, but instead can directly reference the key. This reply will usually be a reference to a memory address in traditional RAM, where more data can be stored, such as the output port, or firewall requirements for the packet.
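
To make the idea concrete, here is a toy Python model of that kind of lookup (purely illustrative: a real TCAM evaluates every entry in parallel in hardware, and the prefixes, slot names, and port names below are invented):

import ipaddress

# "TCAM": value/mask entries where the most specific matching prefix wins;
# the result points at a "RAM" slot holding the actual forwarding data.
TCAM = [
    (ipaddress.ip_network("203.0.113.0/24"), "ram_slot_17"),  # 11001011.00000000.01110001.XXXXXXXX
    (ipaddress.ip_network("203.0.0.0/16"), "ram_slot_4"),
    (ipaddress.ip_network("0.0.0.0/0"), "ram_slot_0"),        # default route
]

RAM = {
    "ram_slot_17": {"out_port": "et-0/0/1"},
    "ram_slot_4": {"out_port": "et-0/0/5"},
    "ram_slot_0": {"out_port": "et-0/0/0"},
}

def lookup(dst: str):
    dst_ip = ipaddress.ip_address(dst)
    matches = [(net, slot) for net, slot in TCAM if dst_ip in net]
    net, slot = max(matches, key=lambda m: m[0].prefixlen)    # longest prefix wins
    return net, RAM[slot]

print(lookup("203.0.113.19"))   # matched by the /24 entry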

To dig a bit deeper into what ASICs do in network equipment, let’s briefly go over some fundamentals.

Networking can be split into two primary components: routing and switching. Switching allows you to directly interconnect multiple devices, so they can talk with each other across the network. It’s what allows your phone to connect to your TV to play a new family video. Routing is the next level up. It’s the mechanism that interconnects all these switched networks into a network of networks, and eventually, the Internet.

So routers are the devices responsible for steering traffic through this complex maze of networks, so it gets to its destination safely, and hopefully, as fast as possible. On the Internet, routers will usually use a routing protocol called BGP (Border Gateway Protocol) to exchange reachability information for a prefix (a collection of IP addresses), also called NLRI (Network Layer Reachability Information).

As with navigating the roads, there are multiple ways to get from point A to point B on the Internet. To make sure the router makes the right decision, it will store all of the reachability information in the RIB (Routing Information Base). That way, if anything changes with one route, the router still has other options immediately available.

With this information, a BGP daemon can calculate the ideal path to take for any given destination from its own point-of-view. This Cisco documentation explains the decision process the daemon goes through to calculate that ideal path.

Once we have this ideal path for a given destination, we should store this information, as it would be very inefficient to calculate this every single time we need to go there. The storage database is called the FIB (Forwarding Information Base). The FIB will be a subset of the RIB, as it will only ever contain the best path for a destination at any given time, while the RIB keeps all the available paths, even the non-ideal ones.
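
As a rough illustration of the RIB-to-FIB relationship (not Cloudflare’s actual implementation; the routes and the shortest-AS-path tie-breaker below are simplified stand-ins for the full BGP best-path algorithm):

# RIB: every learned path per prefix. FIB: only the best one gets installed.
RIB = {
    "203.0.113.0/24": [
        {"next_hop": "192.0.2.1", "as_path": [64500, 64501, 64502]},
        {"next_hop": "192.0.2.9", "as_path": [64510, 64502]},
    ],
    "198.51.100.0/24": [
        {"next_hop": "192.0.2.1", "as_path": [64500]},
    ],
}

FIB = {
    prefix: min(paths, key=lambda p: len(p["as_path"]))   # prefer the shortest AS path
    for prefix, paths in RIB.items()
}

for prefix, best in FIB.items():
    print(prefix, "->", best["next_hop"])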

With these individual components, routers can make packets go from point A to point B in a blink of an eye.

Here are some of the more specific functions our ASICs need to perform:

  1. FIB install: Once the router has calculated its FIB, it’s important the router can access this as quickly as possible. To do so, the ASIC will install (write) this calculated FIB into the TCAM, so any lookups can happen as quickly as possible.

  2. Packet forwarding lookups: as we need to know where to send a received packet, we look up this information in TCAM, which is, as we mentioned, incredibly fast.

  3. Stateless Firewall: while a router routes packets between destinations, you also want to ensure that certain packets don’t reach a destination at all. This can be done using either a stateless or stateful firewall. “State” in this case refers to TCP state, so the router would need to understand if a connection is new, or already established. As maintaining state is a complex issue, which requires storing tables, and can quickly consume a lot of memory, most routers will only operate a stateless firewall.
    Instead, stateful firewalls often have their own appliances. At Cloudflare, we’ve opted to move maintaining state to our compute nodes, as that severely reduces the state-table (one router for all state vs X metals for all state combined). A stateless firewall makes use of the TCAM again to store rules on what to do with specific types of packets. For example, one of the rules we employ at our edge is DENY-BOGON-RANGES, in which we discard traffic sourced from RFC1918 space (and other unroutable space); a small illustration of such a rule follows this list. As this makes use of TCAM, it can all be done at line rate (the maximum speed of the interface).

  4. Advanced features, such as GRE encapsulation: modern networking isn’t just packet switching and packet routing anymore, and more advanced features are needed. One of these is encapsulation. With packet encapsulation, a system will put a data packet into another data packet. Using this technique, it’s possible to build a network on top of an existing network (an overlay). Overlays can be used to build a virtual backbone for example, in which multiple locations can be virtually connected through the Internet.
    While you can encapsulate packets on a CPU (we do this for Magic Transit), there are considerable challenges in doing so in software. As such, the ASIC can have built-in functionality to encapsulate a packet in a multitude of protocols, such as GRE. You may not want encapsulated packets to have to take a second trip through your entire pipeline, as this adds latency, so these shortcuts can also be built into the chip.

  5. MPLS, EVPN, VXLAN, SDWAN, SDN, …: I ran out of buzzwords to enumerate here, but while MPLS isn’t new (the first RFC was created in 2001), it’s a rather advanced requirement, just as the others listed, which means not all ASIC vendors will implement this for all their chips due to the increased complexity.
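
Here is the bogon-drop rule from point 3 sketched in Python (an illustration of the logic only: the real rule is a TCAM entry evaluated at line rate in the ASIC, and the rule set is simplified here to just the RFC1918 ranges):

import ipaddress

RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def permit(src_addr: str) -> bool:
    """Stateless check: drop packets whose source is unroutable private space."""
    src = ipaddress.ip_address(src_addr)
    return not any(src in net for net in RFC1918)

print(permit("192.168.1.50"))   # False: discarded at the edge
print(permit("203.0.113.7"))    # True: forwarded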

Vendor Landscape

At Cloudflare, we interact with both hardware and software vendors on a daily basis while operating our global network. As we’re talking about ASICs today, we’ll explore the hardware landscape, but some hardware vendors also have their own NOS (Network Operating System).
There’s a vast selection of hardware out there, all with different features and pricing. It can become incredibly hard to see the wood for the trees, so we’ll focus on four important distinguishing factors:

  • Throughput: how many bits can the ASIC push through?
  • Buffer size: how many bits can the ASIC store in memory in case of resource contention?
  • Programmability: how easy is it for a third party programmer like Cloudflare to interact directly with the ASIC?
  • Feature set: how many advanced things outside of routing/switching can the ASIC do?

The landscape is so varied because different companies have different requirements. A company like Cloudflare has different expectations for its network hardware than your typical corner shop. Even within our own network we’ll have different requirements for the different layers that make up our network.

Broadcom

The elephant in the networking room (or is it the jumbo frame in the switch?) is Broadcom. Broadcom is a semiconductor company, with their primary revenue in the wired infrastructure segment (over 50% of revenue[3]). While they’ve been around since 1991, they’ve become an unstoppable force in the last 10 years, in part due to their reliance on Apple (25% of revenue). As a semiconductor manufacturer, their market dominance is primarily achieved by acquiring other companies. A great example is the acquisition of Dune Networks, which has become an excellent revenue generator as the StrataDNX series of ASIC (Arad, QumranMX, Jericho). As such, they have become the biggest ASIC vendor by far, and own 59% of the entire Ethernet Integrated Circuits market[4].

As such, they supply a lot of merchant silicon to Cisco, Juniper, Arista and others. Up until recently, if you wanted to use the Broadcom SDK to accelerate your packet forwarding, you had to sign so many NDAs you might get a hand cramp, which made programming their chips a lot trickier. This changed recently when Broadcom open-sourced their SDK. Let’s have a quick look at some of their products.

Tomahawk

The Tomahawk line of ASICs is the bread-and-butter of the enterprise market. They’re cheap and incredibly fast. The first generation of Tomahawk chips did 3.2Tbps linerate, with low-latency switching. The latest generation of this chip (Tomahawk 4) does 25.6Tbps in a 7nm transistor footprint[5]. As you can’t have cheap, fast, and fully featured in a single package, this means you lose out on features: you’re missing most of the more advanced networking technologies such as VXLAN, and you have no buffer to speak of.
As an example of a different vendor using this silicon, you can have a look at the Juniper QFX5200 switching platform.

StrataDNX (Arad, QumranMX, Jericho)

These chipsets came through the acquisition of Dune Networks, and are a collection of high-bandwidth, deep-buffer (a large amount of memory available to store, or buffer, packets) chips, allowing them to be deployed in versatile environments, including the Cloudflare edge. The Arista DCS-7280SR that we run in some of our edge locations as an edge router runs on the Jericho chipset. Since then, the chips have evolved, and with Jericho2, Broadcom now has a 10Tbps deep-buffer chip[6]. With their fabric chip (which links multiple ASICs together), you can build switches with 48x400G ports[7] without much effort.
Cisco built their NCS5500 line of routers using the QumranMX[8].

Trident

This ASIC is an upgrade from the Tomahawk chipset, with a complex and extensive feature set, while maintaining high throughput rates. The latest Trident4 does 12.8Tbps at incredibly low latencies[9], making it an incredibly flexible platform. It unfortunately has no buffer space to speak of, which limits its scope for Cloudflare, as we need the buffer space to be able to switch between the different port speeds we have on our edge routers. The Arista 7050X and 7300X are built on top of this.

Intel

Intel is well known in the network industry for building stable and high-performance 10G NICs (Network Interface Controller). They’re not known for ASICs. They made an initial attempt with their acquisition of Fulcrum[10], which built the FM6000[11] series of ASIC, but nothing of note was really built with them. Intel decided to try again in 2019 with their acquisition of Barefoot. This small manufacturer is responsible for the Barefoot Tofino ASIC, which may well be a fundamental paradigm shift in the network industry.

Barefoot Tofino

The Tofino[12] is built using a PISA (Protocol Independent Switch Architecture), and using P4 (Programming Protocol-Independent Packet Processors)[13], you can program the data-plane (packet forwarding) as you see fit. It’s a drastic move away from the traditional method of networking, in which direct programming of the ASIC isn’t easily possible, and definitely not through a standard programming language. As an added benefit, P4 also allows you to perform a formal verification of your forwarding program, and be sure that it will do what you expect it to. Caveat: OpenFlow tried this, but unfortunately never really got much traction.

There are multiple variations of the Tofino 1 available, but the top-end ASIC has a 6.5Tbps linerate capacity. As the ASIC is programmable, its featureset is as rich as you’d want it to be. Unfortunately, the chip does not come with a lot of buffer memory, so we can’t deploy these as edge devices (yet). Both Arista (7170 Series[15]) and Cisco (Nexus 34180YC and 3464C series[16]) have built equipment with the Tofino chip inside.

Mellanox

As some of you may know, Mellanox is the vendor that recently got acquired by Nvidia, which also provides our 25G NICs in our compute nodes. Besides NICs, Mellanox has a well-established line of ASICs, mostly for switching.

Spectrum

The latest iteration of this ASIC, Spectrum 3 offers 12.8Tbps switching capacity, with an extensive featureset, including Deep Packet Inspection and NAT. This chip allows for building dense high-speed port devices, going up to 25.6Tbps[17]. Buffering wise, there’s none to really speak of (64MB). Mellanox also builds their own hardware platforms. Unlike the other vendors below, they aren’t shipped with the Mellanox Operating System, instead, they offer you a variety of choices to run on top, including Cumulus Linux (which was also acquired by Nvidia 🤔).

As mentioned, while we use their NIC technology extensively, we currently don’t have any Mellanox ASIC silicon in our network.

Juniper

Juniper is a network hardware supplier, and currently the biggest supplier of network equipment for Cloudflare. As previously mentioned in the Broadcom section, Juniper buys some of their silicon from Broadcom, but they also have a significant lineup of home-grown silicon, which can be split into 2 families: Trio and Express.

Express

The Express family is the switching-skewed family, where bandwidth is a priority, while still maintaining a broad range of feature capabilities. These chips live in the same application landscape as the Broadcom StrataDNX chips.

Paradise (Q5)

The Q5 is the new generation of the Juniper switching ASIC[18]. While by itself it doesn’t boast high linerates (500Gbps), when combined into a chassis with a fabric chip (Clos network in this case), they can produce switches (or line cards) with up to 12Tbps of throughput capacity[19]. In addition to allowing for high-throughput, dense network appliances, the chip also comes with a staggering amount of buffer space (4GB per ASIC), provided by external HMC (Hybrid Memory Cube). In this HMC, they’ve also decided to put the FIB, MAC and other tables (so no TCAM).
The Q5 chip is used in their QFX1000 lineup of switches, which include the QFX10002-36Q, QFX10002-60C, QFX10002-72Q and QFX10008, all of which are deployed in our datacenters, as either edge routers or core aggregation switches.

ExpressPlus (ZX)

The ExpressPlus is the more feature-rich and faster evolution of the Paradise chip. It offers double the bandwidth per chip (1Tbps) and is built into a combined Clos-fabric reaching 6Tbps in a 2U form-factor (PTX10002). It also has an increased logical scale, which comes with bigger buffers, larger FIB storage, and more ACL space.

The ExpressPlus drives some of the PTX line of IP routers, together with its newest sibling, Triton.

Triton (BT)

Triton is the latest generation of ASIC in the Express family, with 3.6Tbps of capacity per chip, making way for some truly bandwidth-dense hardware. Both Triton and ExpressPlus are 400GE capable.

Trio

The Trio family of chips is primarily used in the feature-heavy MX routing platform, and is currently in its 5th generation.

A Juniper MPC4E-3D-32XGE line card

Trio Eagle (Trio 4.0) (EA)

The Trio Eagle is the previous generation of the Trio Penta, and can be found on the MPC7E line cards, for example. It’s a feature-rich ASIC, with 400Gbps of forwarding capacity and significant buffer and TCAM capacity (as is to be expected from a routing platform ASIC).

Trio Penta (Trio 5.0) (ZT)

Penta is the new generation routing chip, built for the MX platform routers. On top of being a very beefy chip capable of 500Gbps per ASIC, which allows Juniper to build line cards with up to 4Tbps of capacity, it also has a lot of baked-in features, offering advanced hardware offloading for MACsec and Layer 3 IPsec, for example.

The Penta chip is packaged on the MPC10E and MPC11E line card, which can be installed in multiple variations of the MX chassis routers (MX480 included).

Cisco

Last but not least, there’s Cisco. As the saying goes “nobody ever got fired for buying Cisco”, they’re the biggest vendor of network solutions around. Just like Juniper, they have a mixed product fleet of merchant silicon, as well as home-grown. While we used to operate Cisco routers as edge routers (Cisco ASR 9000), this is no longer the case. We do still use them heavily for our ToR (Top-of-Rack) switching needs, utilizing both their Nexus 5000 series and Nexus 9000 series switches.

Bigsur

Bigsur is custom silicon developed for the Nexus 6000 line of switches (confusingly, the switches themselves are called Cisco Nexus 5672UP and Cisco Nexus 6001). In our specific model, the Cisco Nexus 5672UP, there are 7 of them interconnected, providing 10G and 40G connectivity. Unfortunately, Cisco is a lot more tight-lipped about their ASIC capabilities, so I can’t go as deep as I did with the Juniper chips. Feature-wise, there’s not a lot we require from them in our edge network: they’re simple Layer 2 forwarding switches with no added requirements. Buffer-wise, they use a system called Virtual Output Queueing, just like the Juniper Express chip. Unlike the Juniper silicon, the Bigsur ASIC doesn’t come with a lot of TCAM or buffer space.

Tahoe

The Tahoe is the Cisco ASIC found in the Cisco 9300-EX switches, also known as the LSE (Leaf Spine Engine). It offers higher-density port configurations compared to the Bigsur (1.6Tbps)[20]. Overall, this ASIC is a maturation of the Bigsur silicon, offering more advanced features such as advanced VXLAN+EVPN fabrics, greater port flexibility (10G, 25G, 40G and 100G), and increased buffer sizes (40MB). We use this ASIC extensively in both our edge data centers as well as in our core data centers.

Conclusion

A lot of different factors come into play when making the decision to purchase the next generation of Cloudflare network equipment. This post only scratches the surface of technical considerations to be made, and doesn’t come near any other factors, such as ecosystem contributions, openness, interoperability, or pricing. None of this would’ve been possible without the contributions from other network engineers—this post was written on the shoulders of giants. In particular, thanks to the excellent work by Jim Warner at UCSC, the engrossing book on the new MX platforms, written by David Roy (Day One: Inside the MX 5G), as well as the best book on the Juniper QFX lineup: Juniper QFX10000 Series by Douglas Richard Hanks Jr, and to finish it off, the Summary of Network ASICs post by Justin Pietsch.


  1. https://cloud.google.com/tpu/ ↩︎

  2. https://angel.co/company/spacex/jobs/744408-sr-fpga-asic-design-engineer ↩︎

  3. https://marketrealist.com/2017/02/wired-infrastructure-segment-protects-broadcom/ ↩︎

  4. https://www.wsj.com/articles/broadcom-lands-deals-to-place-components-in-apple-smartphones-11579821914 ↩︎

  5. https://www.globenewswire.com/news-release/2019/12/09/1958047/0/en/Broadcom-Ships-Tomahawk-4-Industry-s-Highest-Bandwidth-Ethernet-Switch-Chip-at-25-6-Terabits-per-Second.html ↩︎

  6. https://www.broadcom.com/products/ethernet-connectivity/switching/stratadnx/BCM88690 ↩︎

  7. https://www.ufispace.com/products/telco/core-edge/s9705-48d ↩︎

  8. https://www.ciscolive.com/c/dam/r/ciscolive/emea/docs/2019/pdf/BRKSPG-2900.pdf ↩︎

  9. https://www.broadcom.com/products/ethernet-connectivity/switching/strataxgs/bcm56880-series ↩︎

  10. https://newsroom.intel.com/news-releases/intel-to-acquire-fulcrum-microsystems/ ↩︎

  11. https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/ethernet-switch-fm5000-fm6000-datasheet.pdf ↩︎

  12. https://barefootnetworks.com/products/brief-tofino/ ↩︎

  13. http://www.sigcomm.org/node/3503 ↩︎

  14. https://github.com/p4lang/p4lang.github.io/blob/master/assets/p4_switch_model-600px.png ↩︎

  15. https://www.arista.com/assets/data/pdf/Whitepapers/7170_White_Paper.pdf ↩︎

  16. https://www.barefootnetworks.com/press-releases/barefoot-networks-to-showcase-technologies-to-build-fast-and-resilient-networks-using-deep-insight-and-tofino-powered-cisco-nexus-switches-at-cisco-live-us-2019/ ↩︎

  17. https://www.mellanox.com/products/ethernet-switches/sn4000 ↩︎

  18. https://www.juniper.net/assets/us/en/local/pdf/whitepapers/2000599-en.pdf ↩︎

  19. https://www.juniper.net/assets/us/en/local/pdf/datasheets/1000531-en.pdf#page=7 ↩︎

  20. https://www.cisco.com/c/dam/global/fr_ch/solutions/data-center-virtualization/pdf/Cisco_Nexus_9300_EX_Platform.pdf#page=8 ↩︎

Investigate VPC flow with Amazon Detective

Post Syndicated from Ross Warren original https://aws.amazon.com/blogs/security/investigate-vpc-flow-with-amazon-detective/

Many Amazon Web Services (AWS) customers need enhanced insight into IP network flow. Traditionally, cost, the complexity of collection, and the time required for analysis have led to incomplete investigations of network flows. Having good telemetry is paramount, and VPC Flow Logs are a very important part of a robust centralized logging architecture. The information that VPC Flow Logs provide is frequently used by security analysts to determine the scope of security issues, to validate that network access rules are working as expected, and to help analysts investigate issues and diagnose network behaviors. Flow logs capture information about the IP traffic going to and from EC2 interfaces in a VPC. Each record describes aspects of the traffic flow, such as where it originated and where it was sent to, what network ports were used, and how many bytes were sent.
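
As a rough illustration of what those records contain, the Python sketch below sums accepted bytes per source address from a file of flow log records (this assumes the default flow log field order and a hypothetical local export file; it is not how Detective ingests the data):

from collections import Counter

# Default VPC Flow Log fields: version account-id interface-id srcaddr dstaddr
# srcport dstport protocol packets bytes start end action log-status
bytes_by_src = Counter()
with open("flow-logs.txt") as fh:            # hypothetical export of flow log records
    for line in fh:
        fields = line.split()
        if len(fields) < 14 or fields[0] == "version":
            continue                         # skip headers and malformed lines
        srcaddr, nbytes, action = fields[3], fields[9], fields[12]
        if action == "ACCEPT" and nbytes != "-":
            bytes_by_src[srcaddr] += int(nbytes)

for src, total in bytes_by_src.most_common(5):
    print(src, total)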

Amazon Detective now enables you to interactively examine the details of the virtual private cloud (VPC) network flows of your Amazon Elastic Compute Cloud (Amazon EC2) instances. Amazon Detective makes it easy to analyze, investigate, and quickly identify the root cause of potential security issues or suspicious activities. Detective automatically collects VPC flow logs from your monitored accounts, aggregates them by EC2 instance, and presents visual summaries and analytics about these network flows. Detective doesn’t require VPC Flow Logs to be configured and doesn’t impact existing flow log collection.

In this blog post, I describe how to use the new VPC flow feature in Detective to investigate an UnauthorizedAccess:EC2/TorClient finding from Amazon GuardDuty. Amazon GuardDuty is a threat detection service that continuously monitors for malicious activity and unauthorized behavior to protect your AWS accounts, workloads, and data stored in Amazon S3. GuardDuty documentation states that this alert can indicate unauthorized access to your AWS resources with the intent of hiding the unauthorized user’s true identity. I’ll demonstrate how to use Amazon Detective to investigate an instance that was flagged by Amazon GuardDuty to determine whether it is compromised or not.

Starting the investigation in GuardDuty

In my GuardDuty console, I’m going to select the UnauthorizedAccess:EC2/TorClient finding shown in Figure 1, choose the Actions menu, and select Investigate.
 

Figure 1: Investigating from the GuardDuty console


This opens a new browser tab and launches the Amazon Detective console, where I’m presented with the profile page for this finding, shown in Figure 2. You must have Detective enabled to pivot between a GuardDuty finding and Detective. Detective provides profile pages for supported GuardDuty findings and AWS resources (for example, IP address, EC2 instance, user, and role) that include information and data visualizations that summarize observed behaviors and give guidance for interpreting them. Profiles help analysts to determine whether the finding is of genuine concern or a false positive. For resources, profiles provide supporting details for an investigation into a finding or for a general hunt for suspicious activity.
 

Figure 2: Finding profile panel


In this case, the profile page for this GuardDuty UnauthorizedAccess:EC2/TorClient finding provides contextual and behavioral data about the EC2 instance on which GuardDuty has noted the issue. As I dive into this finding, I’m going to be asking questions that help assess whether the instance was in fact accessed unintentionally, such as, “What IP port or network service was in use at that time?,” “Were any large data transfers involved?,” “Was the traffic allowed by my security groups?” Profile pages in Detective organize content that helps security analysts investigate GuardDuty findings, examine unexpected network behavior, and identify other AWS resources that might be affected by a potential security issue.

I begin scrolling down the page and notice the Findings associated with EC2 instance i-9999999999999999 panel. Detective displays related findings to provide analysts with additional evidence and context about potentially related issues. The finding I’m investigating is listed there, as well as an Unusual Behaviors/VM/Behavior:EC2-NetworkPortUnusual finding. GuardDuty builds a baseline on your network traffic and will generate findings where there is traffic outside the calculated normal. While we might not investigate every instance of anomalous traffic, having these alerts correlated by Detective provides context for validating the issue. Keeping this in mind as I scroll down, at the bottom of this profile page, I find the Overall VPC flow volume panel. If you choose the Info link next to the panel title, you can see helpful tips that describe how to use the visualizations and provide ideas for questions to ask within your investigation. These info links are available throughout Detective. Check them out!

Investigating VPC flow in Detective

In this investigation, I’m very curious about the two large spikes in inbound traffic that I see in the Overall VPC flow volume panel, which seem to be visually associated with some unusual outbound traffic spikes. It’s most likely that these outbound spikes are related to the Unusual Behaviors/VM/Behavior:EC2-NetworkPortUnusual finding I mentioned earlier. To start the investigation, I choose the display details for scope time button, shown circled at the bottom of Figure 2. This expands the VPC Flow Details, shown in Figure 3.
 

Figure 3: Our first look at VPC Flow Details


We now can see that each entry displays the volume of inbound traffic, the volume of outbound traffic, and whether the access request was accepted or rejected. Detective provides annotations on the VPC flows to help guide your investigation. These From finding annotations make it clear which flows and resources were involved in the finding. In this case, we can easily see (in Figure 3) the three IP addresses at the top of the list that triggered this GuardDuty finding.

I’m first going to focus on the spikes in traffic that are above the baseline. When I click on one of the spikes in the graph, the time window for the VPC flow activity now matches the dates of these spikes I’m investigating.

If I choose the Inbound Traffic column header, shown in Figure 4, I can find the flows that contributed to the spike during this time window.
 

Figure 4: Inbound traffic spikes


Note that the two large inbound spikes aren’t associated with the IP address from the UnauthorizedAccess:EC2/TorClient finding, based on the Detective annotation From finding. Let’s check the outbound traffic. If I do a quick sort of the table based on the outbound traffic column, as shown in Figure 5, we can also see the outbound spikes, and it isn’t immediately evident whether the spikes are associated with this finding. I could continue to investigate the spikes (because they are a visual anomaly), or focus just on the VPC flow traffic that GuardDuty and Detective have labeled as associated with this TOR finding.
 

Figure 5: Outbound traffic spikes


Let’s focus on the outbound and inbound spikes and see if we can determine what’s happening. The inbound spikes are on port 443, typically an HTTPS port, or a secure web connection. The outbound spikes are on port 22 (ssh), but go to IP addresses that look to be internal based on their addresses of 172.16.x.x. The port 443 traffic might indicate a web server that’s open to the internet and receiving traffic. With further investigation, we can determine if this idea is valid, and continue hunting for potentially malicious traffic.

A good next step would be to investigate the two specific IP addresses to rule out their involvement in the finding. I can do this by right-clicking on either of the external IP addresses and opening a new tab, where I can focus on investigating these two specific IP addresses. I would take this line of investigation to possibly rule out the involvement of these IP addresses in this finding, determine if they regularly communicate with my resources, find out what instance(s) they’re related to, and see if there are other findings associated with these instances or IP addresses. This deeper investigation is outside the scope of this blog post, but it’s something you should be doing in your own environment.

IP addresses in AWS are ephemeral in nature. The unique identifier in VPC flow logs is the Instance ID. At the time of this investigation, 172.16.0.7 is assigned to the instance related to this finding, so let’s continue to take a look at the internal 172.16.0.7 IP address with 218 MB outbound traffic on port 22. I choose 172.16.0.7, and Detective opens up the profile page for this specific IP address, as shown in Figure 6. Here we see some interesting correlations: two other GuardDuty findings related to SSH brute-force attacks. These could be related to our outbound port 22 spikes, because they’re certainly in the window of time we’re investigating.
 

Figure 6: IP address profile panel


As part of a deeper investigation, you would look into the SSH brute-force findings for 198.51.100.254 and 203.0.113.83, but for now I’m interested in what this IP is involved in. Detective easily associates this 172.16.0.7 IP address with the instance that was assigned the IP during the scope time. I scroll down to the bottom of the profile page for 172.16.0.7 and investigate the i-9999999999999999 instance by choosing the instance name.

Filtering VPC flow activity

In Detective, as the investigator we are looking at an instance profile panel, similar to the one in Figure 2, and since we’re interested in VPC flow details, I’m going to scroll down and select display details for scope time.

To focus on specific activity, I can filter the activity details by the following values:

  • IP address
  • Local or remote port
  • Direction
  • Protocol
  • Whether the request was accepted or rejected

I’m going to filter these VPC flow details and just look at port 22 (sshd) inbound traffic. I select the Filter check box and select Local Port and 22, as shown in Figure 7. Detective fills in all the available ports for you, making it easy to complete this filter.
 

Figure 7: Port 22 traffic


The activity details show a few IP addresses related to port 22, and we’re still following the large outbound spikes of traffic. It’s outside the scope of this blog post, but now it would be time to start looking at your security groups and network access control lists (ACLs) and determine why port 22 is open to the internet and sending all this traffic.

Understanding traffic behavior

As an investigator, I now have a good picture of the traffic related to the initial finding, and by diving deeper we’re able to discover other interesting traffic during the same timeframe. While we may not always determine “who has done it,” the goal should be to improve our understanding of the behavior of our environment and gather important technical evidence. Detective helps you identify and investigate anomalies to give you insight into your environment. If we were to continue our investigation into the finding, here are some actions we can take within Detective.

Investigate VPC findings with Detective:

  • Perform ports and utilization analysis
    • Identify service and ephemeral ports
    • Determine whether traffic was accepted or rejected based on security groups and NACL configurations
    • Investigate possible reconnaissance traffic by exploring the significant amount of rejected traffic
  • Correlate EC2 instances to TCP/IP ports and IPs
  • Analyze traffic spikes and anomalies
  • Discover traffic patterns and make behavioral correlations

Explore EC2 instance behavior with Detective:

  • Directional Traffic Analysis
  • Investigate possible data exfiltration events by digging into large transfers
  • Enumerate distinct IP connections and sort and filter by protocol, amount of traffic, and traffic direction
  • Gather data related to a spike in port count from a single IP address (potential brute force) or multiple IP addresses (distributed denial of service (DDoS))

Additional forensics steps to consider

  • Snapshot EC2 Volumes
  • Memory dump of EC2 instance
  • Isolate EC2 instance
  • Review your authentication strategy and assess whether the chosen authentication method is sufficient to protect your asset

Summary

Without requiring you to set up infrastructure or spend time configuring log ingestion, Detective collects, organizes, and presents relevant data for your threat analysis and investigations. Security and operations teams will find this new capability helpful for simplifying EC2 traffic analysis, validating security group permissions, and diagnosing EC2 instance behavior. Detective does the heavy lifting of storing, and analyzing VPC flow data so you can focus on quickly answering your investigative questions. VPC network flow details are available now in all Detective supported Regions and are included as part of your service subscription.

To get started, you can enable a 30-day free trial of Amazon Detective. See the AWS Regional Services page for all the regions where Detective is available. To learn more, visit the Amazon Detective product page.

Are you a visual learner? Check out Amazon Detective Overview and Demonstration. This video helps you learn how and when to use Amazon Detective to improve the security of your AWS resources.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Ross Warren

Ross Warren is a Solution Architect at AWS based in Northern Virginia. Prior to his work at AWS, Ross’ areas of focus included cyber threat hunting and security operations. Ross has worked at a handful of startups and has enjoyed the transition to AWS because he can continue to build solutions for customers on today’s most innovative platform.

Author

Jim Miller

Jim is a Solution Architect at AWS based in Connecticut. Jim has worked within cyber security his entire career with areas of focus including cyber security architecture and incident response. At AWS he loves building secure solutions for customers to enable teams to build and innovate with confidence.

Delivering HTTP/2 upload speed improvements

Post Syndicated from Junho Choi original https://blog.cloudflare.com/delivering-http-2-upload-speed-improvements/


Cloudflare recently shipped improved upload speeds across our network for clients using HTTP/2. This post describes our journey from troubleshooting an issue to fixing it and delivering faster upload speeds to the global Internet.

We launched speed.cloudflare.com in May 2020 to give our users insight into how well their networks perform. The test provides download, upload and latency tests. Soon after release, we received reports from a small number of users that upload speeds were sometimes underreported. Our investigation determined that it seemed to happen for end users with high upload bandwidth available (several-hundred-Mbps-class cable modem or fiber service). Our speed tests are performed via browser JavaScript, and most browsers use HTTP/2 by default. We found that HTTP/2 upload speeds were sometimes much slower than HTTP/1.1 (both over TLS) when the user had high available upload bandwidth.

Upload speed is more important than ever, especially for people using home broadband connections. As many people have been forced to work from home they’re using their broadband connections differently than before. Prior to the pandemic broadband traffic was very asymmetric (you downloaded way more than you uploaded… think listening to music, or streaming a movie), but now we’re seeing an increase in uploading as people video conference from home or create content from their home office.

Initial Tests

User reports were focused on particularly fast home networks. I set up a dummynet network simulator to test upload speed in a controlled environment. I launched a linux VM running our code inside my Macbook Pro and set up a dummynet between the VM and Mac host.  Measuring upload speed is simple – create a file and upload using curl to an endpoint which accepts a request body. I ran the same test 20 times and took a median upload speed (Mbps).

% dd if=/dev/urandom of=test.dat bs=1M count=10
% curl --http1.1 -w '%{speed_upload}\n' -sf -o/dev/null --data-binary @test.dat https://edge/upload-endpoint
% curl --http2 -w '%{speed_upload}\n' -sf -o/dev/null --data-binary @test.dat https://edge/upload-endpoint

Stepping up to uploading a 10MB object over a network with 200Mbps of available bandwidth and 40ms RTT, the result was surprising. Using our production configuration, HTTP/2 upload speed tested at almost half of HTTP/1.1 under the same test conditions (higher is better).


The result may differ depending on your network, but the gap is bigger when the network is fast. On a slow network, like my home cable connection (5Mbps upload and 20ms RTT), HTTP/2 upload speed was almost identical to the performance observed with HTTP/1.1.

Receiver Flow Control

Before we get into more detail on this topic, my intuition suggested the issue was related to receiver flow control. Usually the client (browser or any HTTP client) is the receiver of data, but in the case the client is uploading content to the server, the server is the receiver of data. And the receiver needs some type of flow control of the receive buffer.

How we handle receiver flow control differs between HTTP/1.1 and HTTP/2. For example, HTTP/1.1 doesn’t define protocol-level receiver flow control since there is no multiplexing of requests in the connection and it’s up to the TCP layer which handles receiving data. Note that most of the modern OS TCP stacks have auto tuning of the receive buffer (we will revisit that later) and they tune based on the current BDP (bandwidth-delay product).

In the case of HTTP/2, there is a stream-level flow control mechanism because the protocol supports multiplexing of streams. Each HTTP/2 stream has its own flow control window and there is connection level flow control for all streams in the connection. If it’s too tight, the sender will be blocked by the flow control. If it’s too loose we may end up wasting memory for buffering. So keeping it optimal is important when implementing flow control and the most optimal strategy is to keep the receive buffer matching the current BDP. BDP represents the maximum bytes in flight in the given network and can be used as an optimal buffer size.
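
To put numbers on that, here is a quick back-of-the-envelope BDP calculation in Python for the 200Mbps / 40ms lab network used in this post (added for illustration):

bandwidth_bps = 200 * 1_000_000      # 200 Mbps of available upload bandwidth
rtt_s = 0.040                        # 40 ms round-trip time

bdp_bytes = bandwidth_bps / 8 * rtt_s
print(f"BDP ≈ {bdp_bytes / 1024:.0f} KB")   # ≈ 977 KB, far larger than a 64KB or 128KB window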

How NGINX handles the request body buffer

Initially I tried to find a parameter that controls NGINX upload buffering, to see whether tuning its value improved the result. There are a couple of parameters related to uploading a request body, such as proxy_buffering and client_body_buffer_size.

And this one is HTTP/2 specific: http2_body_preread_size.

Cloudflare does not use the proxy_buffering directive, so it can be immediately discounted. client_body_buffer_size is the size of the request body buffer, which is used regardless of the protocol, so it applies to HTTP/1.1 and HTTP/2 as well.

When looking into the code, here is how it works:


  • HTTP/1.1: use a buffer of client_body_buffer_size bytes between the client and the upstream, simply repeating reads from the client and writes to the upstream through this buffer.
  • HTTP/2: since we need to send flow control window updates for HTTP/2 DATA frames, two parameters are involved:
    • http2_body_preread_size: the size of the initial request body read before NGINX starts sending it to the upstream.
    • client_body_buffer_size: the size of the request body buffer.
    • These two parameters are used to allocate the request body buffer during an upload. Here is a brief summary of how unbuffered upload works:
      • Allocate a single request body buffer whose size is the maximum of http2_body_preread_size and client_body_buffer_size. This means that if http2_body_preread_size is 64KB and client_body_buffer_size is 128KB, a 128KB buffer is allocated. We use 128KB for client_body_buffer_size.
      • The HTTP/2 SETTINGS_INITIAL_WINDOW_SIZE of the stream is set to http2_body_preread_size, and we use 64KB as the default (the RFC 7540 default value).
      • The HTTP/2 module reads up to http2_body_preread_size before starting to send to the upstream.
      • After flushing the preread buffer, keep reading from the client, writing to the upstream, and sending WINDOW_UPDATE frames back to the client as necessary until the request body is fully received.

To summarise what this means: HTTP/1.1 simply uses a single buffer, so the TCP socket buffers do the flow control. With HTTP/2, however, the application layer also performs receiver flow control, and NGINX uses a fixed-size buffer for the receiver. This limits upload speed whenever the link's BDP is larger than the request body buffer size: the bottleneck is HTTP/2 flow control because the buffer is too tight.
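
To put a number on that bottleneck: a flow-control-limited sender can push at most roughly one receive window per RTT. A minimal sketch using the 128KB production buffer and the 200Mbps / 40ms lab network from above (again just illustrative arithmetic, not NGINX code):

#include <stdio.h>

int main(void) {
    double window_bytes = 128.0 * 1024.0; /* client_body_buffer_size, 128KB */
    double rtt_s = 0.040;                 /* 40 ms */

    /* Sustained throughput is capped at roughly window / RTT. */
    double cap_mbit_per_s = window_bytes * 8.0 / rtt_s / 1e6;
    printf("upload cap ~= %.0f Mbit/s\n", cap_mbit_per_s); /* ~26 Mbit/s */
    return 0;
}

That is only a fraction of the 200Mbps the link can carry, which explains why HTTP/2 uploads lag behind HTTP/1.1, whose flow control is handled by the auto-tuned TCP receive buffer.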

We’re going to need a bigger buffer?

In theory, bigger buffer sizes should avoid upload bottlenecks, so I tried a few out by running my tests again. The previous chart result is now labelled “prod” and plotted alongside HTTP/2 tests with client_body_buffer_size set to 256KB, 512KB and 1024KB:

[Chart: HTTP/2 upload speed with client_body_buffer_size of 128KB ("prod"), 256KB, 512KB and 1024KB]

It appears 512KB is an optimal value for client_body_buffer_size.

What if I test with a different network parameter? Here is the same comparison with a 10ms RTT; in this case 256KB looks optimal, which makes sense if the available bandwidth stays at 200Mbps, since a 10ms RTT then gives a BDP of roughly 250KB.

[Chart: the same buffer size comparison with a 10ms RTT]

Both cases look much better than the current 128KB and reach performance similar to, or even better than, HTTP/1.1. However, the optimal buffer size is a moving target, and making the buffer too large can also hurt performance: we need a smart way to find the optimal size.

Autotuning request body buffer size

One of the ideas that can help in this kind of situation is autotuning. For example, modern TCP stacks autotune their receive buffers automatically. In production, our edge also has TCP receive buffer autotuning enabled by default:

net.ipv4.tcp_moderate_rcvbuf = 1

But in the case of HTTP/2, TCP buffer autotuning is not very effective because the HTTP/2 layer does its own flow control, and the existing 128KB buffer was too small for a high-BDP link. At this point, I decided to pursue autotuning of the HTTP/2 receive buffer size as well, similar to what TCP does.

The basic idea is that NGINX doubles the HTTP/2 request body buffer size based on the measured BDP. Here is the algorithm currently implemented in our version of NGINX (a simplified sketch follows the list):

  • Allocate a request body buffer as explained above.
  • For every RTT (using Linux tcp_info), update the current BDP.
  • Double the request body buffer size when the current BDP > (receiver window / 4).
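
Below is a simplified, self-contained model of that rule, not the actual NGINX patch; the function name, the constant BDP value and the 4MB cap are assumptions made purely for illustration:

#include <stdio.h>
#include <stddef.h>

/* Illustrative upper bound on how far the buffer may grow (assumed, not
 * taken from the real implementation). */
#define BODY_BUFFER_MAX (4u * 1024u * 1024u)

/* Called once per RTT tick: double the request body buffer while the
 * estimated BDP exceeds a quarter of the current receiver window. */
static size_t grow_if_needed(size_t window, size_t bdp)
{
    if (bdp > window / 4 && window < BODY_BUFFER_MAX)
        window *= 2;
    return window;
}

int main(void)
{
    size_t window = 128 * 1024;     /* start from the 128KB production buffer */
    const size_t bdp = 1000 * 1000; /* ~1MB BDP, e.g. 200Mbit/s at 40ms RTT */

    /* In the real implementation the BDP estimate is refreshed every RTT
     * from Linux tcp_info; here it is just a constant. */
    for (int rtt = 1; rtt <= 6; rtt++) {
        window = grow_if_needed(window, bdp);
        printf("after RTT %d: window = %zu KB\n", rtt, window / 1024);
    }
    return 0;
}

On this assumed input the buffer walks up from 128KB until it comfortably covers the BDP, mirroring the behaviour seen in the lab results below.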

Test Result

Lab Test

Here is a test result with HTTP/2 autotuned upload enabled (still using a client_body_buffer_size of 128KB). You can see that “h2 autotune” does pretty well: similar to, or even slightly faster than, HTTP/1.1 speed, which was the initial goal. It might be slightly worse than a hand-picked optimal buffer size for the given conditions, but NGINX now picks a good buffer size automatically based on network conditions.

[Charts: lab upload speed with “h2 autotune” compared to HTTP/1.1 and fixed HTTP/2 buffer sizes]

Production test

After we deployed this feature, I ran similar tests against our production edge, uploading a 10MB file from well connected client nodes to our edge. I created a Linux VM instance in Google Cloud and ran the upload test where the network is very fast (a few Gbps) and low latency (<10ms).

Here is the result of running the test from the Google Cloud Belgium region against our CDG (Paris) PoP, which has a 7ms RTT. It looks very good, with almost a 3x improvement.

[Chart: production upload speed from Google Cloud Belgium to the CDG PoP (7ms RTT)]

I also tested between the Google Cloud Tokyo region and our NRT (Tokyo) PoP, which had a 2.3ms RTT. Although this setup is not realistic for home users, the results are interesting: a 128KB fixed buffer already performs well at such a low RTT, but HTTP/2 with buffer autotuning outperforms even HTTP/1.1.

[Chart: production upload speed from Google Cloud Tokyo to the NRT PoP (2.3ms RTT)]

Summary

HTTP/2 upload buffer autotuning is now fully deployed on the Cloudflare edge. Customers should see improved upload performance for all HTTP/2 connections, including the speed tests on speed.cloudflare.com. The autotuning logic works well in most cases, and HTTP/2 upload is now much faster than before! When we think about performance we usually think about download speed or latency reduction, but faster uploading also helps users working from home who need to push a lot of data: photo and video sharing apps, content creation, video conferencing or self-broadcasting.

Many thanks to Lucas Pardue and Rustam Lalkaka for great feedback and suggestions on the article.