Tag Archives: Edge

The attendee’s guide to hybrid cloud and edge computing at AWS re:Invent 2025

Post Syndicated from Rachel Zheng original https://aws.amazon.com/blogs/compute/the-attendees-guide-to-hybrid-cloud-and-edge-computing-at-aws-reinvent-2025/

AWS re:Invent 2025 returns to Las Vegas, Nevada, from December 1–5, 2025. This year, we’re offering a comprehensive lineup of sessions and booth activities to help you build resilient, performant, and scalable applications wherever you need them—in the cloud, on premises, or at the edge.

Session types

These sessions are available in the following formats. Most of the sessions are under the topic of Hybrid Cloud and Multicloud (HMC) in the event catalog. If you plan to attend in person, lightning talks and theater sessions are walk-up only. For all other session types, you can reserve your seat through the event portal (login required) or join as a walk-up based on availability.

  • Breakout sessions – Lecture-style 60-minute presentations led by AWS experts and customers.
  • Lightning talks – 20-minute content on specific topics. Each Hybrid Cloud lightning talk features a real-world customer implementation.
  • Chalk talks – 60-minute interactive sessions where AWS experts lead discussions and whiteboard in real time.
  • Code talks – 60-minute sessions featuring coding demonstrations and technical implementations.
  • Workshops – Hands-on, 120-minute sessions where you work directly with AWS services in a guided environment.
  • Theater sessions – 15-minute quick sessions on the Expo floor, typically featuring partner solutions.

Our Hybrid Cloud sessions explore how you can extend AWS infrastructure, services, and tools to distributed locations for low latency, data residency, and local processing needs. These sessions focus on AWS Local Zones, AWS Outposts, and AWS Dedicated Local Zones. We’ve curated the following sessions by theme to help you navigate re:Invent and find content most relevant to your needs.

Leadership session

HMC202 | Breakout Session | AWS wherever you need it: From the cloud to the edge

Wednesday, Dec 3 | 2:30 PM – 3:30 PM PST

Wynn | Convention Promenade | Lafite 7 | Content Hub | Turquoise Theater

Presented by our engineering and product management leaders, Spencer Dillard and Madhura Kale, this session provides an overview of all our latest innovations for hybrid cloud and edge computing, and how they help you address infrastructure requirements in digital sovereignty, generative AI with local data processing needs, and migration and modernization with on-premises dependencies.

Theme 1: Generative AI and agentic AI with local data processing

As you scale generative AI implementations from pilots to production, you need to balance speed of innovation with data sovereignty requirements, low-latency edge inference needs, and space, power, and cost efficiency. In the following sessions, you will learn how to address these challenges.

HMC308 | Breakout session | Build generative and agentic AI applications on-premises and at the edge

Thursday, Dec 4 | 2:30 PM – 3:30 PM PST

Wynn | Upper Convention Promenade | Cristal 7

This session shares reference architectures, best practices, and demos for running small language models (SLMs), Retrieval Augmented Generation (RAG), and agentic AI with AWS Hybrid and Edge services. Gain insights into strategies for model selection and optimization.

HMC324 | Lightning talk | BCC: Hybrid architecture for generative AI to meet regulatory needs

Monday, Dec 1 | 10:00AM – 10:20AM PST

Mandalay Bay | Oceanside C | Content Hub | Lightning Theater

Learn how Bank CenterCredit (BCC) implemented two generative AI use cases: anonymizing personally identifiable information (PII) in customer service calls with Outposts before sending it to the parent AWS Region for foundation model (FM) fine-tuning, and implementing local RAG with regulated data to improve HR efficiency.

HMC311 | Chalk talk | Developing end-to-end SLM pipelines from the cloud to the edge

Thursday, Dec 4 | 11:30AM – 12:30 PM PST

MGM | Level 3 | Chairman’s 362

This session presents a comprehensive approach to deploying SLMs to Local Zones and Outposts using Amazon SageMaker and Amazon EKS. Learn how to deliver domain-specific, fine-tuned SLMs from Regions to edge locations for low-latency inference.

HMC312 | Chalk talk | Implement RAG while meeting data residency requirements

Wednesday, Dec 3 | 5:30PM – 6:30 PM PST

Mandalay Bay | Level 3 South | South Seas A

This session explores how to implement RAG with on-premises and edge data to help you meet data residency and digital sovereignty needs.

HMC317 | Code talk | Implement Agentic AI at the edge for industrial automation

Monday, Dec 1 | 10:30AM – 11:30AM PST

Mandalay Bay | Level 3 South | Jasmine H

Manufacturing and industrial operations demand real-time, intelligent decision-making in low-connectivity environments. Learn how to deploy SmolVLMs (small vision language models) and AI agents to automate site operations using Outposts and Strands Agents.

HMC302 | Workshop | Implementing agentic AI solutions on-premises and at the edge

Wednesday, Dec 3 | 4:00PM – 6:00PM PST

MGM | Level 3 | Chairman’s 368

In this workshop, learn how to extend Amazon Bedrock AgentCore to Outposts and Local Zones to build distributed agentic applications using Model Context Protocol (MCP) and agent-to-agent (A2A) communication with on-premises data.

HMC305 | Workshop | Low-latency SLM deployment: Optimizing inference on AWS Hybrid and Edge services

Monday, Dec 1 | 08:00AM – 10:00AM PST

MGM | Level 1 | Grand 117

This workshop demonstrates a fully local deployment approach for running SLMs at the edge using Local Zones and Outposts. The implementation focuses on achieving low-latency inference and enabling data sovereignty compliance through RAG within local infrastructure.

Theme 2: Migration and modernization with on-premises or edge dependencies

Certain workloads need to stay on-premises or closer to end users. These can be driven by data residency and digital sovereignty requirements or the need to access legacy on-premises systems at a low latency. When a Region is not close enough to meet these needs, AWS brings AWS infrastructure and AWS services closer to where you need them to accelerate migration and modernization.

HMC309 | Breakout session | Migrating your VMware workloads with on-premises dependencies

Thursday, Dec 4 | 11:30AM – 12:30PM PST

Caesars Forum | Level 1 | Summit 212 | Content Hub | Orange Theater

Learn how AWS can help you migrate VMware-based workloads while addressing data residency requirements and latency-sensitive application interdependencies. Gain insights from a real-world implementation at Caesars Entertainment.

HMC217 | Lightning talk | Rivian: Modernize mission-critical manufacturing applications with AWS

Wednesday, Dec 3 | 2:30PM – 2:50PM PST

Mandalay Bay | Level 2 South | Oceanside C | Content Hub | Lightning Theater

Manufacturing applications like Ignition, SCADA, MES, and robotic control require low-latency connectivity to on-premises manufacturing equipment. Learn how Rivian is modernizing mission-critical and latency-sensitive manufacturing applications with Outposts.

HMC313 | Chalk talk | Extend Amazon EKS clusters for on-premises and edge use cases

Wednesday, Dec 3 | 4:00PM – 5:00PM PST

MGM | Level 3 | Chairman’s 356

Dive deep into strategies for modernizing distributed applications with Amazon EKS across Regions, Local Zones, and Outposts.

HMC303 | Workshop | Migrate and modernize VMware workloads with on-premises dependencies

Tuesday, Dec 2 | 12:30PM – 2:30PM PST

Wynn | Convention Promenade | Margaux 2

This workshop guides you through migrating VMware and other on-premises applications to Outposts while modernizing them through containerization.

Theater session | Deploying robust disaster recovery for mission-critical workloads

Wednesday, Dec 3 | 1:00PM – 1:15PM PST

The Venetian | Level 2 | Expo Hall B | NetApp Booth (#1039)

With Outposts third-party storage integration, you can modernize right inside your data centers while leveraging your investment in on-premises storage solutions. Join this session to learn how you can implement a robust disaster recovery solution for mission-critical workloads using Amazon FSx for NetApp ONTAP and on-premises ONTAP with Outposts. Learn how to perform real-time SnapMirror replication between Regions and on-premises environments and monitor replication status and RPO metrics.

Theme 3: Data residency and digital sovereignty

As organizations scale innovative solutions globally, they need to navigate complex digital sovereignty requirements. Learn how AWS Hybrid and Edge services help you adopt a consistent approach to security, monitoring, management, and auditing across jurisdictions while meeting regulatory obligations.

HMC310 | Breakout session | Digital sovereignty and data residency with AWS Hybrid and Edge services

Tuesday, Dec 2 | 4:30PM – 5:30PM PST

Caesars Forum | Level 1 | Summit 212 | Content Hub | Yellow Theater

This session examines best practices for data residency, security controls, and operational consistency across distributed locations. Hear how AWS helped Geidea, a leading fintech company, accelerate business expansion in the Middle East while meeting country-specific data residency requirements.

HMC214 | Lightning talk | DraftKings: Meeting gaming regulations at Super Bowl scale with AWS Local Zones

Wednesday, Dec 3 | 2:00PM – 2:20PM PST

Mandalay Bay | Level 2 South | Oceanside C | Content Hub | Lightning Theater

Learn how DraftKings built a scalable edge strategy using Regions and Local Zones to meet Federal Wire Act requirements while accelerating expansion into 26 US states.

HMC316 | Chalk talk | Address digital sovereignty with hybrid cloud solutions

Monday, Dec 1 | 1:30PM – 2:30PM PST

Mandalay Bay | Lower Level North | South Pacific D

Learn how to choose the best AWS infrastructure option for your sovereign needs and architect applications for data residency and resiliency. Discover how to implement security controls to regulate how data can be stored, processed, and transferred, and how to prevent unauthorized data access.

Theme 4: Optimizing for low and ultra-low latency

Systems like online ticketing, real-time threat detection, manufacturing control, and financial trading require network latency ranging from double-digit milliseconds to low double-digit microseconds. The Hybrid Cloud track discusses how AWS brings cloud services closer to end users and data generation points to satisfy various latency profiles.

SPF206 | Breakout session | Ticketmaster: Enhancing live event experience for fans with AWS Local Zones

Thursday, Dec 4 | 2:30PM – 3:30PM PST

Caesars Forum | Level 1 | Sports Forum | Mainstage

Discover how Ticketmaster delivers superior live event experiences by bringing its online ticket sales platform closer to fans using Local Zones.

HMC216 | Lightning talk | LSEG: Modernizing critical financial systems with multicloud and hybrid cloud

Tuesday, Dec 2 | 3:30PM – 3:50PM PST

Mandalay Bay | Level 2 South | Oceanside C | Content Hub | Lightning Theater

Learn how LSEG is transforming its global PriceStream FX trading platform, implementing a sophisticated architecture for ultra-low latency with managed services. Additionally, explore how LSEG is modernizing clearing systems as critical national infrastructure, balancing regulatory compliance, operational excellence, and business continuity with strict RPO/RTO requirements.

HMC215 | Lightning talk | AWS Local Zones – Sophos’ new edge in the global race against cyber-attacks

Tuesday, Dec 2 | 3:00PM – 3:20PM PST

Mandalay Bay | Level 2 South | Oceanside C | Content Hub | Lightning Theater

Discover how AWS Hybrid and Edge solutions transform how organizations deliver low-latency applications at the edge. Learn how Local Zones extends Regions and services closer to population centers, fitting use cases from media streaming to real-time gaming and financial trading.

HMC213 | Lightning talk | AWS Local Zones & Outposts: Verifone’s Global Edge Computing Strategy

Tuesday, Dec 2 | 11:30AM – 11:50AM PST

Mandalay Bay | Level 2 South | Oceanside C | Content Hub | Lightning Theater

Hear from Verifone on how it transformed its global payment ecosystem, managing 35 million terminals with innovative edge computing. Dive into strategies for deploying point-of-sale (POS) software using multi-tier architectures.

HMC402 | Chalk talk | Meet ultra-low latency and high throughput needs with AWS Outposts

Wednesday, Dec 3 | 10:00AM – 11:00AM PST

Mandalay Bay | Level 2 South | Lagoon G

Dive deep into the new category of accelerated networking Amazon EC2 instances on Outposts, purpose-built for modernizing ultra-low latency and high-throughput mission-critical workloads.

Theme 5: Architecture considerations for hybrid cloud

Running applications outside of Regions requires special architectural considerations to address space, power, and networking constraints. We will share best practices and implementation guidance on improving security, resiliency, and availability.

HMC327 | Lightning talk | Nasdaq: Build resilient infrastructure for global financial services

Tuesday, Dec 2 | 11:00AM – 11:20AM PST

Mandalay Bay | Level 2 South | Oceanside C | Content Hub | Lightning Theater

This session discusses how Nasdaq modernizes its mission-critical capital markets infrastructure while upholding the highest level of resiliency. Learn how Nasdaq integrates Outposts and multi-Region deployments into its core trading and surrounding systems, balancing cloud flexibility with performance and reliability.

HMC328 | Lightning talk | Build resilient and low-latency hybrid telecom infrastructure at scale

Thursday, Dec 4 | 5:00PM – 5:20PM PST

Mandalay Bay | Level 2 South | Oceanside C | Content Hub | Lightning Theater

Discover how Liberty Latin America (LLA) built telecom infrastructure in a hybrid cloud architecture for millions of subscribers. Learn how LLA created a resilient, low-latency network with over 180 VPCs.

HMC314 | Chalk talk | Deploying for resilience: HA/DR strategies for AWS Outposts and AWS Local Zones

Monday, Dec 1 | 1:30PM – 2:30PM PST

MGM | Level 1 | Boulevard 169

In this chalk talk, learn how to plan and implement resilient deployments to deliver high availability and disaster recovery, especially for business-critical or mission-critical workloads.

HMC315 | Chalk talk | Deep dive on AWS hybrid and edge networking architectures

Tuesday, Dec 2 | 1:30PM – 2:30PM PST

MGM | Level 1 | Boulevard 156

This chalk talk covers patterns such as private vs. public connectivity, service link and on-premises connectivity, and designing resilient Multi-AZ architecture across Outposts and Local Zones.

HMC403 | Code talk | Build and optimize edge architecture for resiliency with AI

Wednesday, Dec 3 | 2:30PM – 3:30PM PST

MGM | Level 3 | Chairman’s 356

This live coding session explores how to automate edge infrastructure operations with AI. Discover how to build resilient architectures with the latest Outposts and Local Zones APIs and Strands Agents.

HMC301 | Workshop | Build and operate resilient and performant distributed applications [REPEAT]

HMC301-R: Monday, Dec 1 | 3:00PM – 5:00PM PST | MGM | Grand 117

HMC301-R1: Thursday, Dec 4 | 3:00PM – 5:00PM PST | MGM | Premier 318

This workshop explores how to design and implement applications for multi-geo operations while meeting data residency and performance requirements. You will learn how to design fault-tolerant, latency-sensitive applications across distributed locations with limited hardware resources.

Activities in the Expo

Beyond sessions, join us in the re:Invent Expo (The Venetian, Level 2, Hall B) to meet with AWS experts and learn through interactive demos.

AWS Village (Booth #750)

Connect with AWS experts in the Hybrid Cloud kiosk and AWS Global Infrastructure kiosk to discuss your hybrid cloud and edge computing needs. Watch demos on the following topics and ask questions:

  • Consistent Amazon EKS experience across distributed locations and how it can simplify GPU resource orchestration
  • How to optimize and deploy SLMs locally for AI inference with low latency or data residency needs
  • How to build agentic AI workflows at the edge for manufacturing automation

Stop by the Migration & Modernization area of the AWS Village to see hardware innovations inside the latest generation of Outposts up close and personal.

AWS for Industries Pavillion (Booth #111)

Explore how AWS Hybrid and Edge services unlock new use cases and improve operational efficiency across multiple industries through immersive experiences:

  • AWS for Telecom – Modernizing telecom infrastructure while meeting data sovereignty and performance requirements
  • AWS for Public Sector – Accelerating tactical edge deployments to improve mission outcomes
  • AWS for Automotive – Advancing global R&D of embedded hardware and software

Partner booths

Discover how AWS Hybrid and Edge services and AWS Partner solutions unlock additional use cases through demos:

  • Pure Storage (Booth #1756) – Simplifying cloud migration with on-premises dependencies through Outposts integration with Pure Storage FlashArray
  • Intel (Booth #1010): – Powering physical AI and agentic systems for real-time operations in manufacturing
  • Seagate (Booth #159) – Facilitating data ingestion and pre-processing at the edge

Ready for re:Invent 2025?

We hope this guide to hybrid cloud and edge computing helps you maximize your learning experience at re:Invent. Not able to attend in-person? Register for the virtual-only event offered at no additional cost to livestream keynotes and innovation talks, and access on-demand session content. See you in Las Vegas or virtually!

QUIC action: patching a broadcast address amplification vulnerability

Post Syndicated from Josephine Chow original https://blog.cloudflare.com/mitigating-broadcast-address-attack/

Cloudflare was recently contacted by a group of anonymous security researchers who discovered a broadcast amplification vulnerability through their QUIC Internet measurement research. Our team collaborated with these researchers through our Public Bug Bounty program, and worked to fully patch a dangerous vulnerability that affected our infrastructure.

Since being notified about the vulnerability, we’ve implemented a mitigation to help secure our infrastructure. According to our analysis, we have fully patched this vulnerability and the amplification vector no longer exists. 

Summary of the amplification attack

QUIC is an Internet transport protocol that is encrypted by default. It offers equivalent features to TCP (Transmission Control Protocol) and TLS (Transport Layer Security), while using a shorter handshake sequence that helps reduce connection establishment times. QUIC runs over UDP (User Datagram Protocol).

The researchers found that a single client QUIC Initial packet targeting a broadcast IP destination address could trigger a large response of initial packets. This manifested as both a server CPU amplification attack and a reflection amplification attack.

Transport and security handshakes

When using TCP and TLS there are two handshake interactions. First, is the TCP 3-way transport handshake. A client sends a SYN packet to a server, it responds with a SYN-ACK to the client, and the client responds with an ACK. This process validates the client IP address. Second, is the TLS security handshake. A client sends a ClientHello to a server, it carries out some cryptographic operations and responds with a ServerHello containing a server certificate. The client verifies the certificate, confirms the handshake and sends application traffic such as an HTTP request.

QUIC follows a similar process, however the sequence is shorter because the transport and security handshake is combined. A client sends an Initial packet containing a ClientHello to a server, it carries out some cryptographic operations and responds with an Initial packet containing a ServerHello with a server certificate. The client verifies the certificate and then sends application data.


The QUIC handshake does not require client IP address validation before starting the security handshake. This means there is a risk that an attacker could spoof a client IP and cause a server to do cryptographic work and send data to a target victim IP (aka a reflection attack). RFC 9000 is careful to describe the risks this poses and provides mechanisms to reduce them (for example, see Sections 8 and 9.3.1). Until a client address is verified, a server employs an anti-amplification limit, sending a maximum of 3x as many bytes as it has received. Furthermore, a server can initiate address validation before engaging in the cryptographic handshake by responding with a Retry packet. The retry mechanism, however, adds an additional round-trip to the QUIC handshake sequence, negating some of its benefits compared to TCP. Real-world QUIC deployments use a range of strategies and heuristics to detect traffic loads and enable different mitigations.

In order to understand how the researchers triggered an amplification attack despite these QUIC guardrails, we first need to dive into how IP broadcast works.

Broadcast addresses

In Internet Protocol version 4 (IPv4) addressing, the final address in any given subnet is a special broadcast IP address used to send packets to every node within the IP address range. Every node that is within the same subnet receives any packet that is sent to the broadcast address, enabling one sender to send a message that can be “heard” by potentially hundreds of adjacent nodes. This behavior is enabled by default in most network-connected systems and is critical for discovery of devices within the same IPv4 network.


The broadcast address by nature poses a risk of DDoS amplification; for every one packet sent, hundreds of nodes have to process the traffic. 

Dealing with the expected broadcast

To combat the risk posed by broadcast addresses, by default most routers reject packets originating from outside their IP subnet which are targeted at the broadcast address of networks for which they are locally connected. Broadcast packets are only allowed to be forwarded within the same IP subnet, preventing attacks from the Internet from targeting servers across the world.


The same techniques are not generally applied when a given router is not directly connected to a given subnet. So long as an address is not locally treated as a broadcast address, Border Gateway Protocol (BGP) or other routing protocols will continue to route traffic from external IPs toward the last IPv4 address in a subnet. Essentially, this means a “broadcast address” is only relevant within a local scope of routers and hosts connected together via Ethernet. To routers and hosts across the Internet, a broadcast IP address is routed in the same way as any other IP.

Binding IP address ranges to hosts

Each Cloudflare server is expected to be capable of serving content from every website on the Cloudflare network. Because our network utilizes Anycast routing, each server necessarily needs to be listening on (and capable of returning traffic from) every Anycast IP address in use on our network.

To do so, we take advantage of the loopback interface on each server. Unlike a physical network interface, all IP addresses within a given IP address range are made available to the host (and will be processed locally by the kernel) when bound to the loopback interface.

The mechanism by which this works is straightforward. In a traditional routing environment, longest prefix matching is employed to select a route. Under longest prefix matching, routes towards more specific blocks of IP addresses (such as 192.0.2.96/29, a range of 8 addresses) will be selected over routes to less specific blocks of IP addresses (such as 192.0.2.0/24, a range of 256 addresses).

While Linux utilizes longest prefix matching, it consults an additional step — the Routing Policy Database (RPDB) — before immediately searching for a match. The RPDB contains a list of routing tables which can contain routing information and their individual priorities. The default RPDB looks like this:

$ ip rule show
0:	from all lookup local
32766:	from all lookup main
32767:	from all lookup default

Linux will consult each routing table in ascending numerical order to try and find a matching route. Once one is found, the search is terminated and the route immediately used.

If you’ve previously worked with routing rules on Linux, you are likely familiar with the contents of the main table. Contrary to the existence of the table named “default”, “main” generally functions as the default lookup table. It is also the one which contains what we traditionally associate with route table information:

$ ip route show table main
default via 192.0.2.1 dev eth0 onlink
192.0.2.0/24 dev eth0 proto kernel scope link src 192.0.2.2

This is, however, not the first routing table that will be consulted for a given lookup. Instead, that task falls to the local table:

$ ip route show table local
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1
local 192.0.2.2 dev eth0 proto kernel scope host src 192.0.2.2
broadcast 192.0.2.255 dev eth0 proto kernel scope link src 192.0.2.2

Looking at the table, we see two new types of routes — local and broadcast. As their names would suggest, these routes dictate two distinctly different functions: routes that are handled locally and routes that will result in a packet being broadcast. Local routes provide the desired functionality — any prefix with a local route will have all IP addresses in the range processed by the kernel. Broadcast routes will result in a packet being broadcast to all IP addresses within the given range. Both types of routes are added automatically when an IP address is bound to an interface (and, when a range is bound to the loopback (lo) interface, the range itself will be added as a local route).

Vulnerability discovery

Deployments of QUIC are highly dependent on the load-balancing and packet forwarding infrastructure that they sit on top of. Although QUIC’s RFCs describe risks and mitigations, there can still be attack vectors depending on the nature of server deployments. The reporting researchers studied QUIC deployments across the Internet and discovered that sending a QUIC Initial packet to one of Cloudflare’s broadcast addresses triggered a flood of responses. The aggregate amount of response data exceeded the RFC’s 3x amplification limit.

Taking a look at the local routing table of an example Cloudflare system, we see a potential culprit:

$ ip route show table local
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1
local 192.0.2.2 dev eth0 proto kernel scope host src 192.0.2.2
broadcast 192.0.2.255 dev eth0 proto kernel scope link src 192.0.2.2
local 203.0.113.0 dev lo proto kernel scope host src 203.0.113.0
local 203.0.113.0/24 dev lo proto kernel scope host src 203.0.113.0
broadcast 203.0.113.255 dev lo proto kernel scope link src 203.0.113.0

On this example system, the anycast prefix 203.0.113.0/24 has been bound to the loopback interface (lo) through the use of standard tooling. Acting dutifully under the standards of IPv4, the tooling has assigned both special types of routes — a local one for the IP range itself and a broadcast one for the final address in the range — to the interface.

While traffic to the broadcast address of our router’s directly connected subnet is filtered as expected, broadcast traffic targeting our routed anycast prefixes still arrives at our servers themselves. Normally, broadcast traffic arriving at the loopback interface does little to cause problems. Services bound to a specific port across an entire range will receive data sent to the broadcast address and continue as normal. Unfortunately, this relatively simple trait breaks down when normal expectations are broken.

Cloudflare’s frontend consists of several worker processes, each of which independently binds to the entire anycast range on UDP port 443. In order to enable multiple processes to bind to the same port, we use the SO_REUSEPORT socket option. While SO_REUSEPORT has additional benefits, it also causes traffic sent to the broadcast address to be copied to every listener.

Each individual QUIC server worker operates in isolation. Each one reacted to the same client Initial, duplicating the work on the server side and generating response traffic to the client’s IP address. Thus, a single packet could trigger a significant amplification. While specifics will vary by implementation, a typical one-listener-per-core stack (which sends retries in response to presumed timeouts) on a 128-core system could result in 384 replies being generated and sent for each packet sent to the broadcast address.

Although the researchers demonstrated this attack on QUIC, the underlying vulnerability can affect other UDP request/response protocols that use sockets in the same way.

Mitigation

As a communication methodology, broadcast is not generally desirable for anycast prefixes. Thus, the easiest method to mitigate the issue was simply to disable broadcast functionality for the final address in each range.

Ideally, this would be done by modifying our tooling to only add the local routes in the local routing table, skipping the inclusion of the broadcast ones altogether. Unfortunately, the only practical mechanism to do so would involve patching and maintaining our own internal fork of the iproute2 suite, a rather heavy-handed solution for the problem at hand.

Instead, we decided to focus on removing the route itself. Similar to any other route, it can be removed using standard tooling:

$ sudo ip route del 203.0.113.255 table local

To do so at scale, we made a relatively minor change to our deployment system:

  {%- for lo_route in lo_routes %}
    {%- if lo_route.type == "broadcast" %}
        # All broadcast addresses are implicitly ipv4
        {%- do remove_route({
        "dev": "lo",
        "dst": lo_route.dst,
        "type": "broadcast",
        "src": lo_route.src,
        }) %}
    {%- endif %}
  {%- endfor %}

In doing so, we effectively ensure that all broadcast routes attached to the loopback interface are removed, mitigating the risk by ensuring that the specification-defined broadcast address is treated no differently than any other address in the range.

Next steps 

While the vulnerability specifically affected broadcast addresses within our anycast range, it likely expands past our infrastructure. Anyone with infrastructure that meets the relatively narrow criteria (a multi-worker, multi-listener UDP-based service that is bound to all IP addresses on a machine with routable IP prefixes attached in such a way as to expose the broadcast address) will be affected unless mitigations are in place. We encourage network administrators and security professionals to assess their systems for configurations that may present a local amplification attack vector.

Improving platform resilience at Cloudflare through automation

Post Syndicated from Opeyemi Onikute original https://blog.cloudflare.com/improving-platform-resilience-at-cloudflare

Failure is an expected state in production systems, and no predictable failure of either software or hardware components should result in a negative experience for users. The exact failure mode may vary, but certain remediation steps must be taken after detection. A common example is when an error occurs on a server, rendering it unfit for production workloads, and requiring action to recover.

When operating at Cloudflare’s scale, it is important to ensure that our platform is able to recover from faults seamlessly. It can be tempting to rely on the expertise of world-class engineers to remediate these faults, but this would be manual, repetitive, unlikely to produce enduring value, and not scaling. In one word: toil; not a viable solution at our scale and rate of growth.

In this post we discuss how we built the foundations to enable a more scalable future, and what problems it has immediately allowed us to solve.

Growing pains

The Cloudflare Site Reliability Engineering (SRE) team builds and manages the platform that helps product teams deliver our extensive suite of offerings to customers. One important component of this platform is the collection of servers that power critical products such as Durable Objects, Workers, and DDoS mitigation. We also build and maintain foundational software services that power our product offerings, such as configuration management, provisioning, and IP address allocation systems.

As part of tactical operations work, we are often required to respond to failures in any of these components to minimize impact to users. Impact can vary from lack of access to a specific product feature, to total unavailability. The level of response required is determined by the priority, which is usually a reflection of the severity of impact on users. Lower-priority failures are more common — a server may run too hot, or experience an unrecoverable hardware error. Higher-priority failures are rare and are typically resolved via a well-defined incident response process, requiring collaboration with multiple other teams.

The commonality of lower-priority failures makes it obvious when the response required, as defined in runbooks, is “toilsome”. To reduce this toil, we had previously implemented a plethora of solutions to automate runbook actions such as manually-invoked shell scripts, cron jobs, and ad-hoc software services. These had grown organically over time and provided solutions on a case-by-case basis, which led to duplication of work, tight coupling, and lack of context awareness across the solutions.

We also care about how long it takes to resolve any potential impact on users. A resolution process which involves the manual invocation of a script relies on human action, increasing the Mean-Time-To-Resolve (MTTR) and leaving room for human error. This risks increasing the amount of errors we serve to users and degrading trust.

These problems proved that we needed a way to automatically heal these platform components. This especially applies to our servers, for which failure can cause impact across multiple product offerings. While we have mechanisms to automatically steer traffic away from these degraded servers, in some rare cases the breakage is sudden enough to be visible.

Solving the problem

To provide a more reliable platform, we needed a new component that provides a common ground for remediation efforts. This would remove duplication of work, provide unified context-awareness and increase development speed, which ultimately saves hours of engineering time and effort.

A good solution would not allow only the SRE team to auto-remediate, it would empower the entire company. The key to adding self-healing capability was a generic interface for all teams to self-service and quickly remediate failures at various levels: machine, service, network, or dependencies.

A good way to think about auto-remediation is in terms of workflows. A workflow is a sequence of steps to get to a desired outcome. This is not dissimilar to a manual shell script which executes what a human would otherwise do via runbook instructions. Because of this logical fit with workflows, we decided to adopt Temporal

Temporal is a durable execution platform which is useful to gracefully manage infrastructure failures such as network outages and transient failures in external service endpoints. This capability meant we only needed to build a way to schedule “workflow” tasks and have Temporal provide reliability guarantees. This allowed us to focus on building out the orchestration system to support the control and flow of workflow execution in our data centers. 

How does Temporal work?


Before we discuss the system that provides our self-healing functions, let’s explore how the workflow execution engine works, as its native architecture provided numerous benefits that we took advantage of to build a more robust foundation.

The most attractive feature Temporal offered us was the ability to write code that has reliability baked in. Some examples of these primitives are automatic retries, timeouts, rollbacks, and queueing. The Temporal Platform consists of the Temporal Cluster and Worker processes (application code that contains your custom logic).

This architecture allowed us to write our application logic as we normally would, with the added benefits of Temporal. Since Temporal Workers are external to the cluster, we can run tasks anywhere across our global network — a feature that made it easy to build an extensible, easy-to-understand framework for automating tasks.

In Temporal terms, control is provided by the basic principles used to provide workflow execution — Workflows and Activities. A Workflow is simply a sequence of Activities, which are functions that ideally do only ONE task, such as making a request to an external service or rebooting a machine.

Control of workflow behavior can be done using ActivityOptions. This is where you can define timeouts for workflow execution, retry policies, and task queues. Each worker can poll several task queues for both Workflow and Activity tasks. If no worker is polling the task queue in which a Workflow task is declared, nothing happens.

Temporal’s documentation provides a good introduction to writing Temporal workflows.

Building on Temporal

Below, we describe how our automatic remediation system works. It is essentially a way to schedule tasks across our global network with built-in reliability guarantees. With this system, teams can serve their customers more reliably. An unexpected failure mode can be recognized and immediately mitigated, while the root cause can be determined later via a more detailed analysis.

Step one: we need a coordinator


After our initial testing of Temporal, it was now possible to write workflows. But we needed a way to schedule workflow tasks from other internal services. The coordinator was built to serve this purpose, and became the primary mechanism for the authorisation and scheduling of workflows. 

The most important roles of the coordinator are authorisation, workflow task routing, and safety constraints enforcement. Each consumer is authorized via mTLS authentication, and the coordinator uses an ACL to determine whether to permit the execution of a workflow. An ACL configuration looks like the following example.

server_config {
    enable_tls = true
    [...]
    route_rule {
      name  = "global_get"
      method = "GET"
      route_patterns = ["/*"]
      uris = ["spiffe://example.com/worker-admin"]
    }
    route_rule {
      name = "global_post"
      method = "POST"
      route_patterns = ["/*"]
      uris = ["spiffe://example.com/worker-admin"]
      allow_public = true
    }
    route_rule {
      name = "public_access"
      method = "GET"
      route_patterns = ["/metrics"]
      uris = []
      allow_public = true
      skip_log_match = true
    }
}

Each workflow specifies two key characteristics: where to run the tasks and the safety constraints, using an HCL configuration file. Example constraints could be whether to run on only a specific node type (such as a database), or if multiple parallel executions are allowed: if a task has been triggered too many times, that is a sign of a wider problem that might require human intervention. The coordinator uses the Temporal Visibility API to determine the current state of the executions in the Temporal cluster.

An example of a configuration file is shown below:

task_queue_target = "<target>"

# The following entries will ensure that
# 1. This workflow is not run at the same time in a 15m window.
# 2. This workflow will not run more than once an hour.
# 3. This workflow will not run more than 3 times in one day.
#
constraint {
    kind = "concurency"
    value = "1"
    period = "15m"
}

constraint {
    kind = "maxExecution"
    value = "1"
    period = "1h"
}

constraint {
    kind = "maxExecution"
    value = "3"
    period = "24h"
    is_global = true
}

Step two: Task Routing is amazing


An unforeseen benefit of using a central Temporal cluster was the discovery of Task Routing. This feature allows us to schedule a Workflow/Activity on any server that has a running Temporal Worker, and further segment by the type of server, its location, etc. For this reason, we have three primary task queues — the general queue in which tasks can be executed by any worker in the datacenter, the node type queue in which tasks can only be executed by a specific node type in the datacenter, and the individual node queue where we target a specific node for task execution.

We rely on this heavily to ensure the speed and efficiency of automated remediation. Certain tasks can be run in datacenters with known low latency to an external resource, or a node type with better performance than others (due to differences in the underlying hardware). This reduces the amount of failure and latency we see overall in task executions. Sometimes we are also constrained by certain types of tasks that can only run on a certain node type, such as a database.

Task Routing also means that we can configure certain task queues to have a higher priority for execution, although this is not a feature we have needed so far. A drawback of task routing is that every Workflow/Activity needs to be registered to the target task queue, which is a common gotcha. Thankfully, it is possible to catch this failure condition with proper testing.

Step three: when/how to self-heal?

None of this would be relevant if we didn’t put it to good use. A primary design goal for the platform was to ensure we had easy, quick ways to trigger workflows on the most important failure conditions. The next step was to determine what the best sources to trigger the actions were. The answer to this was simple: we could trigger workflows from anywhere as long as they are properly authorized and detect the failure conditions accurately.

Example triggers are an alerting system, a log tailer, a health check daemon, or an authorized engineer via a chatbot. Such flexibility allows a high level of reuse, and permits to invest more in workflow quality and reliability.

As part of the solution, we built a daemon that is able to poll a signal source for any unwanted condition and trigger a configured workflow. We have initially found Prometheus useful as a source because it contains both service-level and hardware/system-level metrics. We are also exploring more event-based trigger mechanisms, which could eliminate the need to use precious system resources to poll for metrics.

We already had internal services that are able to detect widespread failure conditions for our customers, but were only able to page a human. With the adoption of auto-remediation, these systems are now able to react automatically. This ability to create an automatic feedback loop with our customers is the cornerstone of these self-healing capabilities and we continue to work on stronger signals, faster reaction times, and better prevention of future occurrences.

The most exciting part, however, is the future possibility. Every customer cares about any negative impact from Cloudflare. With this platform we can onboard several services (especially those that are foundational for the critical path) and ensure we react quickly to any failure conditions, even before there is any visible impact.

Step four: packaging and deployment

The whole system is written in golang, and a single binary can implement each role. We distribute it as an apt package or a container for maximum ease of deployment.

We deploy a Temporal-based worker to every server we intend to run tasks on, and a daemon in datacenters where we intend to automatically trigger workflows based on the local conditions. The coordinator is more nuanced since we rely on task routing and can trigger from a central coordinator, but we have also found value in running coordinators locally in the datacenters. This is especially useful in datacenters with less capacity or degraded performance, removing the need for a round-trip to schedule the workflows.

Step five: test, test, test

Temporal provides native mechanisms to test an entire workflow, via a comprehensive test suite that supports end-to-end, integration, and unit testing, which we used extensively to prevent regressions while developing. We also ensured proper test coverage for all the critical platform components, especially the coordinator.

Despite the ease of written tests, we quickly discovered that they were not enough. After writing workflows, engineers need an environment as close as possible to the target conditions. This is why we configured our staging environments to support quick and efficient testing. These environments receive the latest changes and point to a different (staging) Temporal cluster, which enables experimentation and easy validation of changes.

After a workflow is validated in the staging environment, we can then do a full release to production. It seems obvious, but catching simple configuration errors before releasing has saved us many hours in development/change-related-task time.

Deploying to production

As you can guess from the title of this post, we put this in production to automatically react to server-specific errors and unrecoverable failures. To this end, we have a set of services that are able to detect single-server failure conditions based on analyzed traffic data. After deployment, we have successfully mitigated potential impact by taking any errant single sources of failure out of production.

We have also created a set of workflows to reduce internal toil and improve efficiency. These workflows can automatically test pull requests on target machines, wipe and reset servers after experiments are concluded, and take away manual processes that cost many hours in toil.

Building a system that is maintained by several SRE teams has allowed us to iterate faster, and rapidly tackle long-standing problems. We have set ambitious goals regarding toil elimination and are on course to achieve them, which will allow us to scale faster by eliminating the human bottleneck.

Looking to the future

Our immediate plans are to leverage this system to provide a more reliable platform for our customers and drastically reduce operational toil, freeing up engineering resources to tackle larger-scale problems. We also intend to leverage more Temporal features such as Workflow Versioning, which will simplify the process of making changes to workflows by ensuring that triggered workflows run expected versions. 

 We are also interested in how others are solving problems using durable execution platforms such as Temporal, and general strategies to eliminate toil. If you would like to discuss this further, feel free to reach out on the Cloudflare Community and start a conversation!

If you’re interested in contributing to projects that help build a better Internet, our engineering teams are hiring.

Thermal design supporting Gen 12 hardware: cool, efficient and reliable

Post Syndicated from Leslye Paniagua original https://blog.cloudflare.com/thermal-design-supporting-gen-12-hardware-cool-efficient-and-reliable

In the dynamic evolution of AI and cloud computing, the deployment of efficient and reliable hardware is critical. As we roll out our Gen 12 hardware across hundreds of cities worldwide, the challenge of maintaining optimal thermal performance becomes essential. This blog post provides a deep dive into the robust thermal design that supports our newest Gen 12 server hardware, ensuring it remains reliable, efficient, and cool (pun very much intended).

The importance of thermal design for hardware electronics

Generally speaking, a server has five core resources: CPU (computing power), RAM (short term memory), SSD (long term storage), NIC (Network Interface Controller, connectivity beyond the server), and GPU (for AI/ML computations). Each of these components can withstand different temperature limits based on their design, materials, location within the server, and most importantly, the power they are designed to work at. This final criteria is known as thermal design power (TDP).

The reason why TDP is so important is closely related to the first law of thermodynamics, which states that energy cannot be created or destroyed, only transformed. In semiconductors, electrical energy is converted into heat, and TDP measures the maximum heat output that needs to be managed to ensure proper functioning.

Back in December 2023, we talked about our decision to move to a 2U form factor, doubling the height of the server chassis to optimize rack density and increase cooling capacity. In this post, we want to share more details on how this additional space is being used to improve performance and reliability supporting up to three times more total system power.

Standardization

In order to support our multi-vendor strategy that mitigates supply chain risks ensuring continuity for our infrastructure, we introduced our own thermal specification to standardize thermal design and system performance. At Cloudflare, we find significant value in building customized hardware optimized for our unique workloads and applications, and we are very fortunate to partner with great hardware vendors who understand and support this vision. However, partnering with multiple vendors can introduce design variables that Cloudflare then controls for consistency within a hardware generation. Some of the most relevant requirements we include in our thermal specification are:

  • Ambient conditions: Given our globally distributed footprint with presence in over 330 cities, environmental conditions can vary significantly.  Hence, servers in our fleet can experience a wide range of temperatures, typically ranging between 28 to 35°C. Therefore, our systems are designed and validated to operate with no issue over temperature ranges from 5 to 40°C (following the ASHRAE A3 definition).

  • Thermal margins: Cloudflare designs with clear requirements for temperature limits on different operating conditions, simulating peak stress, average workloads, and idle conditions. This allows Cloudflare to validate that the system won’t experience thermal throttling, which is a power management control mechanism used to protect electronics from high temperatures.

  • Fan failure support to increase system reliability: This new generation of servers is 100% air cooled. As such, the algorithm that controls fan speed based on critical component temperature needs to be optimized to support continuous operation over the server life cycle. Even though fans are designed with a high (up to seven years) mean time between failure (MTBF), we know fans can and do fail. Losing a server’s worth of capacity due to thermal risks caused by a single fan failure is expensive. Cloudflare requires the server to continue to operate with no issue even in the event of one fan failure. Each Gen 12 server contains four axial fans providing the extra cooling capacity to prevent failures.

  • Maximum power used to cool the system: Because our goal is to serve more Internet traffic using less power, we aim to ensure the hardware we deploy is using power efficiently. Great thermal management must consider the overall cost of cooling relative to the total system power input. It is inefficient to burn power consumption on cooling instead of compute. Thermal solutions should look at the hardware architecture holistically and implement mechanical modifications to the system design in order to optimize airflow and cooling capacity before considering increasing fan speed, as fan power consumption proportionally scales to the cube of its rotational speed. (For example, running the fans at twice (2x) the rotational speed would consume 8x more power,)

System layout

Placing each component strategically within the server will also influence the thermal performance of the system. For this generation of servers, we made several internal layout decisions, where the final component placement takes into consideration optimal airflow patterns, preventing pre-heated air from affecting equipment in the rear end of the chassis. 

Bigger and more powerful fans were selected in order to take advantage of the additional volume available in a 2U form factor. Growing from 40 to 80 millimeters, a single fan can provide up to four times more airflow. Hence, bigger fans can run at slower speeds to provide the required airflow to cool down the same components, significantly improving power efficiency. 

The Extended Volume Air Cooled (EVAC) heatsink was optimized for Gen 12 hardware, and is designed with increased surface area to maximize heat transfer. It uses heatpipes to move the heat effectively away from the CPU to the extended fin region that sits immediately in front of the fans as shown in the picture below.


EVAC heatsink installed in one of our Gen 12 servers. The extended fin region sits right in front of the axial fans. (Photo courtesy of vendor.)

The combination of optimized heatsink design and selection of high-performing fans is expected to significantly reduce the power used for cooling the system. These savings will vary depending on ambient conditions and system stress, but under a typical stress scenario at 25°C ambient temperature, power savings could be as much as 50%.

Additionally, we ensured that the critical components in the rear section of the system, such as the NIC and DC-SCM, were positioned away from the heatsink to promote the use of cooler available air within the system. Learning from past experience, the NIC temperature is monitored by the Baseboard Management Controller (BMC), which provides remote access to the server for administrative tasks and monitoring health metrics. Because the NIC has a built-in feature to protect itself from overheating by going into standby mode when the chip temperature reaches critical limits, it is important to provide air at the lowest possible temperature. As a reference, the temperature of the air right behind the CPU heatsink can reach 70°C or higher, whereas behind the memory banks, it would reach about 55°C under the same circumstances. The image below shows the internal placement of the most relevant components considered while building the thermal solution.  

Using air as cold as possible to cool down any component will increase overall system reliability, preventing potential thermal issues and unplanned system shutdowns. That’s why our fan algorithm uses every thermal sensor available to ensure thermal health while using the minimum possible amount of energy.


Components inside the compute server from one of our vendors, viewed from the rear of the server. (Illustration courtesy of vendor.)

1️. Host Processor Module (HPM)

8. Power Distribution Board (PDB)

2️. DIMMs (x12) 

9. GPUs (up to 2)

3️. CPU (under CPU heatsink)

10. GPU riser card

4. CPU heatsink

11. GPU riser cage

5. System fans (x4: 80mm, dual rotor)

12. Power Supply Units, PSUs (x2)

6. Bracket with power button and intrusion switch

13. DC-SCM 2.0 module

7. E1.S SSD 

14. OCP 3.0 module

Making hardware flexible

With the same thought process of optimizing system layout, we decided to use a PCIe riser above the Power Supply Units (PSUs), enabling the support of up to 2x single wide GPU add-in cards. Once again, the combination of high-performing fans with strategic system architecture gave us the capability to add up to 400W to the original power envelope and incorporate accelerators used in our new and recently announced AI and ML features. 

Hardware lead times are typically long, certainly when compared to software development. Therefore, a reliable strategy for hardware flexibility is imperative in this rapidly changing environment for specialized computing. When we started evaluating Gen 12 hardware architecture and early concept design, we didn’t know for sure we would be needing to implement GPUs for this generation, let alone how many or which type. However, highly efficient design and intentional due diligence analyzing hypothetical use cases help ensure flexibility and scalability of our thermal solution, supporting new requirements from our product teams, and ultimately providing the best solutions to our customers.

Rack-integrated solutions

We are also increasing the volume of integrated racks shipped to our global colocation facilities. Due to the expected increase in rack shipments, it is now more important that we also increase the corresponding mechanical and thermal test coverage from system level (L10) to rack level (L11).

Since our servers don’t use the full depth of a standard rack in order to leave room for cable management and Power Distribution Units (PDUs), there is another fluid mechanics factor that we need to consider to improve our holistic solution. 

We design our hardware based on one of the most typical data center architectures, which have alternating cold and hot aisles. Fans at the front of the server pull in cold air from the corresponding aisle, the air then flows through the server, cooling down the internal components and the hot air is exhausted into the adjacent aisle, as illustrated in the diagram below.


A conventional air-flow diagram of a standard server where the cold air enters from the front of the server and hot air leaves through the rear side of the system. 

In fluid dynamics, the minimum effort principle will drive fluids (air in this case) to move where there is less resistance — i.e. wherever it takes less energy to get from point A to point B. With the help of fans forcing air to flow inside the server and pushing it through the rear, the more crowded systems will naturally get less air than those with more space where the air can move around. Since we need more airflow to pass through the systems with higher power demands, we’ve also ensured that the rack configuration keeps these systems in the bottom of the rack where air tends to be at a lower temperature. Remember that heat rises, so even within the cold aisle, there can be a small but important temperature difference between the bottom and the top section of the rack. It is our duty as hardware engineers to use thermodynamics in our favor. 

Conclusion

Our new generation of hardware is live in our data centers and it represents a significant leap forward in our efficiency, reliability, and sustainability commitments. Combining optimal heat sink design, thoughtful fan selection, and meticulous system layout and hardware architecture, we are confident that these new servers will operate smoothly in our global network with diverse environmental conditions, maintaining optimal performance of our Connectivity Cloud. 

Come join us at Cloudflare to help deliver a better Internet!

Removing uncertainty through “what-if” capacity planning

Post Syndicated from Curt Robords original https://blog.cloudflare.com/scenario-planner

Infrastructure planning for a network serving more than 81 million requests at peak and which is globally distributed across more than 330 cities in 120+ countries is complex. The capacity planning team at Cloudflare ensures there is enough capacity in place all over the world so that our customers have one less thing to worry about – our infrastructure, which should just work. Through our processes, the team puts careful consideration into “what-ifs”. What if something unexpected happens and one of our data centers fails? What if one of our largest customers triples, or quadruples their request count?  Across a gamut of scenarios like these, the team works to understand where traffic will be served from and how the Cloudflare customer experience may change.

This blog post gives a look behind the curtain of how these scenarios are modeled at Cloudflare, and why it’s so critical for our customers.

Scenario planning and our customers

Cloudflare customers rely on the data centers that Cloudflare has deployed all over the world, placing us within 50 ms of approximately 95% of the Internet-connected population globally. But round-trip time to our end users means little if those data centers don’t have the capacity to serve requests. Cloudflare has invested deeply into systems that are working around the clock to optimize the requests flowing through our network because we know that failures happen all the time: the Internet can be a volatile place. See our blog post from August 2024 on how we handle this volatility in real time on our backbone, and our blog post from late 2023 about how another system, Traffic Manager, actively works in and between data centers, moving traffic to optimize the customer experience around constraints in our data centers. Both of these systems do a fantastic job in real time, but there is still a gap — what about over the long term?  

Most of the volatility that the above systems are built to manage is resolved within shorter time scales than which we build plans for. (There are, of course, some failures that are exceptions.) Most scenarios we model still need to take into account the state of our data centers in the future, as well as what actions systems like Traffic Manager will take during those periods.  But before getting into those constraints, it’s important to note how capacity planning measures things: in units of CPU Time, defined as the time that each request takes in the CPU.  This is done for the same reasons that Traffic Manager uses CPU Time, in that it enables the team to 1) use a common unit across different types of customer workloads and 2) speak a common language with other teams and systems (like Traffic Manager). The same reasoning the Traffic Manager team cited in their own blog post is equally applicable for capacity planning: 

…using requests per second as a metric isn’t accurate enough when actually moving traffic. The reason for this is that different customers have different resource costs to our service; a website served mainly from cache with the WAF deactivated is much cheaper CPU wise than a site with all WAF rules enabled and caching disabled. So we record the time that each request takes in the CPU. We can then aggregate the CPU time across each plan to find the CPU time usage per plan. We record the CPU time in ms, and take a per second value, resulting in a unit of milliseconds per second.

This is important for customers for the same reason that the Traffic Manager team cited in their blog post as well: we can correlate CPU time to performance, specifically latency.

Now that we know our unit of measurement is CPU time, we need to set up our models with the new constraints associated with the change that we’re trying to model.  Specifically, there are a subset of constraints that we are particularly interested in because we know that they have the ability to impact our customers by impacting the availability of CPU in a data center.  These are split into two main inputs in our models: Supply and Demand.  We can think of these as “what-if” questions, such as the following examples:

Demand what-ifs

  • What if a new customer onboards to Cloudflare with a significant volume of requests and/or bytes?  

  • What if an existing customer increased its volume of requests and/or bytes by some multiplier (i.e. 2x, 3x, nx), at peak, for the next three months?

  • What if the growth rate, in number of requests and bytes, of all of our data centers worldwide increased from X to Y two months from now, indefinitely?

  • What if the growth rate, in number of requests and bytes, of data center facility A increased from X to Y one month from now?

  • What if traffic egressing from Cloudflare to a last-mile network shifted from one location (such as Boston) to another (such as New York City) next week?

Supply what-ifs

  • What if data center facility A lost some or all of its available servers two months from now?

  • What if we added X servers to data center facility A today?

  • What if some or all of our connectivity to other ASNs (12,500 Networks/nearly 300 Tbps) failed now?

Output

For any one of these, or a combination of them, in our model’s output, we aim to provide answers to the following: 

  • What will the overall capacity picture look like over time? 

  • Where will the traffic go? 

  • How will this impact our costs?

  • Will we need to deploy additional servers to handle the increased load?

Given these sets of questions and outputs, manually creating a model to answer each of these questions, or a combination of these questions, quickly becomes an operational burden for any team.  This is what led us to launch “Scenario Planner”.

Scenario Planner

In August 2024, the infrastructure team finished building “Scenario Planner”, a system that enables anyone at Cloudflare to simulate “what-ifs”. This provides our team the opportunity to quickly model hypothetical changes to our demand and supply metrics across time and in any of Cloudflare’s data centers. The core functionality of the system has to do with the same questions we need to answer in the manual models discussed above.  After we enter the changes we want to model, Scenario Planner converts from units that are commonly associated with each question to our common unit of measurement: CPU Time. These inputs are then used to model the updated capacity across all of our data centers, including how demand may be distributed in cases where capacity constraints may start impacting performance in a particular location.  As we know, if that happens then it triggers Traffic Manager to serve some portion of those requests from a nearby location to minimize impact on customers and user experience.

Updated demand questions with inputs

  • Question: What if a new customer onboards to Cloudflare with a significant volume of requests?  

  • Input: The new customer’s expected volume, geographic distribution, and timeframe of requests, converted to a count of virtual CPUs

  • Calculation(s): Scenario Planner converts from server count to CPU Time, and distributes the new demand across the regions selected according to the aggregate distribution of all customer usage.  

  • Question: What if an existing customer increased its volume of requests and/or bytes by some multiplier (i.e. 2x, 3x, nx), at peak, for the next three months?

  • Input: Select the customer name, the multiplier, and the timeframe

  • Calculation(s): Scenario Planner already has how the selected customer’s traffic is distributed across all data centers globally, so this involves simply multiplying that value by the multiplier selected by the user

  • Question: What if the growth rate, in number of requests and bytes, of all of our data centers worldwide increased from X to Y two months from now, indefinitely?

  • Input: Enter a new global growth rate and timeframe

  • Calculation(s): Scenario Planner distributes this growth across all data centers globally according to their current growth rate.  In other words, the global growth is an aggregation of all individual data center’s growth rates, and to apply a new “Global” growth rate, the system scales up each of the individual data center’s growth rates commensurate with the current distribution of growth.

  • Question: What if the growth rate, in number of requests and bytes, of data center facility A increased from X to Y one month from now?

  • Input: Select a data center facility, enter a new growth rate for that data center and the timeframe to apply that change across.

  • Calculation(s): Scenario Planner passes the new growth rate for the data center to the backend simulator, across the timeline specified by the user

Updated supply questions with inputs

  • Question: What if data center facility A lost some or all of its available servers two months from now?

  • Input: Select a data center, and enter the number of servers to remove, or select to remove all servers in that location, as well as the timeframe for when those servers will not be available

  • Calculation(s): Scenario Planner converts the server count entered (including all servers in a given location) to CPU Time before passing to the backend

  • Question: What if we added X servers to data center facility A today?

  • Input: Select a data center, and enter the number of servers to add, as well as the timeline for when those servers will first go live

  • Calculation(s): Scenario Planner converts the server count entered (including all servers in a given location) to CPU Time before passing to the backend

We made it simple for internal users to understand the impact of those changes because Scenario Planner outputs the same views that everyone who has seen our heatmaps and visual representations of our capacity status is familiar with. There are two main outputs the system provides: a heatmap and an “Expected Failovers” view. Below, we explore what these are, with some examples.

Heatmap

Capacity planning evaluates its success on its ability to predict demand: we generally produce a weekly, monthly, and quarterly forecast of 12 months to three years worth of demand, and nearly all of our infrastructure decisions are based on the output of this forecast.  Scenario Planner provides a view of the results of those forecasts that are implemented via a heatmap: it shows our current state, as well as future planned server additions that are scheduled based on the forecast.  

Here is an example of our heatmap, showing some of our largest data centers in Eastern North America (ENAM). Ashburn is showing as yellow, briefly, because our capacity planning threshold for adding more server capacity to our data centers is 65% utilization (based on CPU time supply and demand): this gives the Cloudflare teams time to procure additional servers, ship them, install them, and bring them live before customers will be impacted and systems like Traffic Manager would begin triggering.  The little cloud icons indicate planned upgrades of varying sizes to get ahead of forecasted future demand well ahead of time to avoid customer performance degradation.


The question Scenario Planner answers then is how this view changes with a hypothetical scenario: What if our Ashburn, Miami, and Atlanta facilities shut down completely?  This is unlikely to happen, but we would expect to see enormous impact on the remaining largest facilities in ENAM. We’ll simulate all three of these failing at the same time, taking them offline indefinitely:




This results in a view of our capacity through the rest of the year in the remaining large data centers in ENAM — capacity is clearly constrained: Traffic Manager will be working hard to mitigate any impact to customer performance if this were to happen. Our capacity view in the heatmap is capped at 75%: this is because Traffic Manager typically engages around this level of CPU utilization. Beyond 75%, Cloudflare customers may begin to experience increased latency, though this is dependent on the product and workload, and is in reality much more dynamic.


This outcome in the heatmap is not unexpected.  But now we typically get a follow-up question: clearly this traffic won’t fit in just Newark, Chicago, and Toronto, so where do all these requests get served from?  Enter the failover simulator: Capacity Planning has been simulating how Traffic Manager may work in the long term for quite a while, and for Scenario Planner, it was simple to extend this functionality to answer exactly this question.

There is currently no traffic being moved by Traffic Manager from these data centers, but our simulation shows a significant portion of the Atlanta CPU time being served from our DFW/Dallas data center as well as Newark (bottom pink), and Chicago (orange) through the rest of the year, during this hypothetical failure. With Scenario Planner, Capacity Planning can take this information and simulate multiple failures all over the world to understand the impact to customers, taking action to ensure that customers trusting Cloudflare with their web properties can expect high performance even in instances of major data center failures. 


Planning with uncertainty

Capacity planning a large global network comes with plenty of uncertainties. Scenario Planner is one example of the work the Capacity Planning team is doing to ensure that the millions of web properties our customers entrust to Cloudflare can expect consistent, top tier performance all over the world.

The Capacity Planning team is hiring — check out the Cloudflare careers page and search for open roles on the Capacity Planning team.

An attendee’s guide to hybrid cloud and edge computing at AWS re:Invent 2023

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/an-attendees-guide-to-hybrid-cloud-and-edge-computing-at-aws-reinvent-2023/

This post is written by Savitha Swaminathan, AWS Sr. Product Marketing Manager

AWS re:Invent 2023 starts on Nov 27th in Las Vegas, Nevada. The event brings technology business leaders, AWS partners, developers, and IT practitioners together to learn about the latest innovations, meet AWS experts, and network among their peer attendees.

This year, AWS re:Invent will once again have a dedicated track for hybrid cloud and edge computing. The sessions in this track will feature the latest innovations from AWS to help you build and run applications securely in the cloud, on premises, and at the edge – wherever you need to. You will hear how AWS customers are using our cloud services to innovate on premises and at the edge. You will also be able to immerse yourself in hands-on experiences with AWS hybrid and edge services through innovative demos and workshops.

At re:Invent there are several session types, each designed to provide you with a way to learn however fits you best:

  • Innovation Talks provide a comprehensive overview of how AWS is working with customers to solve their most important problems.
  • Breakout sessions are lecture style presentations focused on a topic or area of interest and are well liked by business leaders and IT practitioners, alike.
  • Chalk talks deep dive on customer reference architectures and invite audience members to actively participate in the white boarding exercise.
  • Workshops and builder sessions popular with developers and architects, provide the most hands-on experience where attendees can build real-time solutions with AWS experts.

The hybrid edge track will include one leadership overview session and 15 other sessions (4 breakouts, 6 chalk talks, and 5 workshops). The sessions are organized around 4 key themes: Low latency, Data residency, Migration and modernization, and AWS at the far edge.

Hybrid Cloud & Edge Overview

HYB201 | AWS wherever you need it

Join Jan Hofmeyr, Vice President, Amazon EC2, in this leadership session where he presents a comprehensive overview of AWS hybrid cloud and edge computing services, and how we are helping customers innovate on AWS wherever they need it – from Regions, to metro centers, 5G networks, on premises, and at the far edge. Jun Shi, CEO and President of Accton, will also join Jan on stage to discuss how Accton enables smart manufacturing across its global manufacturing sites using AWS hybrid, IoT, and machine learning (ML) services.

Low latency

Many customer workloads require single-digit millisecond latencies for optimal performance. Customers in every industry are looking for ways to run these latency sensitive portions of their applications in the cloud while simplifying operations and optimizing for costs. You will hear about customer use cases and how AWS edge infrastructure is helping companies like Riot Games meet their application performance goals and innovate at the edge.

Breakout session

HYB305 | Delivering low-latency applications at the edge

Chalk talk

HYB308 | Architecting for low latency and performance at the edge with AWS

Workshops

HYB302 | Architecting and deploying applications at the edge

HYB303 | Deploying a low-latency computer vision application at the edge

Data residency

As cloud has become main stream, governments and standards bodies continue to develop security, data protection, and privacy regulations. Having control over digital assets and meeting data residency regulations is becoming increasingly important for public sector customers and organizations operating in regulated industries. The data residency sessions deep dive into the challenges, solutions, and innovations that customers are addressing with AWS to meet their data residency requirements.

Breakout session

HYB309 | Navigating data residency and protecting sensitive data

Chalk talk

HYB307 | Architecting for data residency and data protection at the edge

Workshops

HYB301 | Addressing data residency requirements with AWS edge services

Migration and modernization

Migration and modernization in industries that have traditionally operated with on-premises infrastructure or self-managed data centers is helping customers achieve scale, flexibility, cost savings, and performance. We will dive into customer stories and real-world deployments, and share best practices for hybrid cloud migrations.

Breakout session

HYB203 | A migration strategy for edge and on-premises workloads

Chalk talk

HYB313 | Real-world analysis of successful hybrid cloud migrations

AWS at the far edge

Some customers operate in what we call the far edge: remote oil rigs, military and defense territories, and even space! In these sessions we cover customer use cases and explore how AWS brings cloud services to the far edge and helps customers gain the benefits of the cloud regardless of where they operate.

Breakout session

HYB306 | Bringing AWS to remote edge locations

Chalk talk

HYB312 | Deploying cloud-enabled applications starting at the edge

Workshops

HYB304 | Generative AI for robotics: Race for the best drone control assistant

In addition to the sessions across the 4 themes listed above, the track includes two additional chalk talks covering topics that are applicable more broadly to customers operating hybrid workloads. These chalk talks were chosen based on customer interest and will have repeat sessions, due to high customer demand.

HYB310 | Building highly available and fault-tolerant edge applications

HYB311 | AWS hybrid and edge networking architectures

Learn through interactive demos

In addition to breakout sessions, chalk talks, and workshops, make sure you check out our interactive demos to see the benefits of hybrid cloud and edge in action:

Drone Inspector: Generative AI at the Edge

Location: AWS Village | Venetian Level 2, Expo Hall, Booth 852 | AWS for Every App activation

Embark on a competitive adventure where generative artificial intelligence (AI) intersects with edge computing. Experience how drones can swiftly respond to chat instructions for a time-sensitive object detection mission. Learn how you can deploy foundation models and computer vision (CV) models at the edge using AWS hybrid and edge services for real-time insights and actions.

AWS Hybrid Cloud & Edge kiosk

Location: AWS Village | Venetian Level 2, Expo Hall, Booth 852 | Kiosk #9 & 10

Stop by and chat with our experts about AWS Local Zones, AWS Outposts, AWS Snow Family, AWS Wavelength, AWS Private 5G, AWS Telco Network Builder, and Integrated Private Wireless on AWS. Check out the hardware innovations inside an AWS Outposts rack up close and in person. Learn how you can set up a reliable private 5G network within days and live stream video content with minimal latency.

AWS Next Gen Infrastructure Experience

Location: AWS Village | Venetian Level 2, Expo Hall, Booth 852

Check out demos across Global Infrastructure, AWS for Hybrid Cloud & Edge, Compute, Storage, and Networking kiosks, share on social, and win prizes!

The Future of Connected Mobility

Location: Venetian Level 4, EBC Lounge, wall outside of Lando 4201B

Step into the driver’s seat and experience high fidelity 3D terrain driving simulation with AWS Local Zones. Gain real-time insights from vehicle telemetry with AWS IoT Greengrass running on AWS Snowcone and a broader set of AWS IoT services and Amazon Managed Grafana in the Region. Learn how to combine local data processing with cloud analytics for enhanced safety, performance, and operational efficiency. Explore how you can rapidly deliver the same experience to global users in 75+ countries with minimal application changes using AWS Outposts.

Immersive tourism experience powered by 5G and AR/VR

Location: Venetian, Level 2 | Expo Hall | Telco demo area

Explore and travel to Chichen Itza with an augmented reality (AR) application running on a private network fully built on AWS, which includes the Radio Access Network (RAN), the core, security, and applications, combined with services for deployment and operations. This demo features AWS Outposts.

AWS unplugged: A real time remote music collaboration session using 5G and MEC

Location: Venetian, Level 2 | Expo Hall | Telco demo area

We will demonstrate how musicians in Los Angeles and Las Vegas can collaborate in real time with AWS Wavelength. You will witness songwriters and musicians in Los Angeles and Las Vegas in a live jam session.

Disaster relief with AWS Snowball Edge and AWS Wickr

Location: AWS for National Security & Defense | Venetian, Casanova 606

The hurricane has passed leaving you with no cell coverage and you have a slim chance of getting on the internet. You need to set up a situational awareness and communications network for your team, fast. Using Wickr on Snowball Edge Compute, you can rapidly deploy a platform that provides both secure communications with rich collaboration functionality, as well as real time situational awareness with the Wickr ATAK integration. Allowing you to get on with what’s important.


We hope this guide to the Hybrid Cloud and Edge track at AWS re:Invent 2023 helps you plan for the event and we hope to see you there!

Protect your Amazon Cognito user pool with AWS WAF

Post Syndicated from Maitreya Ranganath original https://aws.amazon.com/blogs/security/protect-your-amazon-cognito-user-pool-with-aws-waf/

Many of our customers use Amazon Cognito user pools to add authentication, authorization, and user management capabilities to their web and mobile applications. You can enable the built-in advanced security in Amazon Cognito to detect and block the use of credentials that have been compromised elsewhere, and to detect unusual sign-in activity and then prompt users for additional verification or block sign-ins. Additionally, you can associate an AWS WAF web access control list (web ACL) with your user pool to allow or block requests to Amazon Cognito user pools, based on security rules.

In this post, we’ll show how you can use AWS WAF with Amazon Cognito user pools and provide a sample set of rate-based rules and advanced AWS WAF rule groups. We’ll also show you how to test and tune the rules to help protect your user pools from common threats.

Rate-based rules for Amazon Cognito user pool endpoints

The following are endpoints exposed publicly by an Amazon Cognito user pool that you can protect with AWS WAF:

  • Hosted UI — These endpoints are listed in the OIDC and hosted UI API reference. Cognito creates these endpoints when you assign a domain to your user pool. Your users will interact with these endpoints when they use the Hosted UI web interface directly, or when your application calls Cognito OAuth endpoints such as Authorize or Token.
  • Public API operations — These generate a request to Cognito API actions that are either unauthenticated or authenticated with a session string or access token, but not with AWS credentials.

A good way to protect these endpoints is to deploy rate-based AWS WAF rules. These rules will detect and block requests with high rates that could indicate an attempt to exceed your Amazon Cognito API request rate quotas and that could subsequently impact requests from legitimate users.

When you apply rate limits, it helps to group Amazon Cognito API actions into four action categories. You can set specific rate limits per action category giving you traffic visibility for each category.

  • User Creation — This category includes operations that create new users in Cognito. Setting a rate limit for this category provides visibility for traffic of these operations and threats such as fake users being created in Cognito, which drives up your Monthly Active User (MAU) costs for Cognito.
  • Sign-in — This category includes operations to initiate a sign-in operation. Setting a rate limit for this category can provide visibility into the abuse of these operations. This could indicate high frequency, automated attempts to guess user credentials, sometimes referred to as credential stuffing.
  • Account Recovery — This category includes operations to recover accounts, including “forgot password” flows. Setting a rate limit for this category can provide visibility into the abuse of these operations, malicious activity can include: sending fake reset attempts, which might result in emails and SMS messages being sent to users.
  • Default — This is a catch-all rate limit that applies to an operation that is not in one of the prior categories. Setting a default rate limit can provide visibility and mitigation from request flooding attacks.

Table 1 below shows selected Hosted UI endpoint paths (the equivalent of individual API actions) and the recommended rate-based rule limit category for each.

Table 1: Amazon Cognito Hosted UI URL paths mapped to action categories

Hosted UI URL path Authentication method Action category
/signup Unauthenticated User Creation
/confirmUser Confirmation code User Creation
/resendcode Unauthenticated User Creation
/login Unauthenticated Sign-in
/oauth2/authorize Unauthenticated Sign-in
/forgotPassword Unauthenticated Account Recovery
/confirmForgotPassword Confirmation code Account Recovery
/logout Unauthenticated Default
/oauth2/revoke Refresh token Default
/oauth2/token Auth code, or refresh token, or client credentials Default
/oauth2/userInfo Access token Default
/oauth2/idpresponse Authorization code Default
/saml2/idpresponse SAML assertion Default

Table 2 below shows selected Cognito API actions and the recommended rate-based rule category for each.

Table 2: Selected Cognito API actions mapped to action categories

API action name Authentication method Action category
SignUp Unauthenticated User Creation
ConfirmSignUp Confirmation code User Creation
ResendConfirmationCode Unauthenticated User Creation
InitiateAuth Unauthenticated Sign-in
RespondToAuthChallenge Unauthenticated Sign-in
ForgotPassword Unauthenticated Account Recovery
ConfirmForgotPassword Confirmation code Account Recovery
AssociateSoftwareToken Access token or session Default
VerifySoftwareToken Access token or session Default

Additionally, the rate-based rules we provide in this post include the following:

  • Two IP sets that represent allow lists for IPv4 and IPv6. You can add IPs that represent your trusted source IP addresses to these IP sets so that other AWS WAF rules don’t apply to requests that originate from these IP addresses.
  • Two IP sets that represent deny lists for IPv4 and IPv6. Add IPs to these IP sets that you want to block in all cases, regardless of the result of other rules.
  • An AWS managed IP reputation rule group: The AWS managed IP reputation list rule group contains rules that are based on Amazon internal threat intelligence, to identify IP addresses typically associated with bots or other threats. You can limit requests that match rules in this rule group to a specific rate limit.

Deploy rate-based rules

You can deploy the rate-based rules described in the previous section by using the AWS CloudFormation template that we provide here.

To deploy rate-based rules using the template

  1. (Optional but recommended) If you want to enable AWS WAF logging and resources to analyze request rates, create an Amazon Simple Storage Service (Amazon S3) bucket in the same AWS Region as your Amazon Cognito user pool, with a bucket name starting with the prefix aws-waf-logs-. If you previously created an S3 bucket for AWS WAF logs, you can choose to reuse it, or you can create a new bucket to store AWS WAF logs for Amazon Cognito.
  2. Choose the following Launch Stack button to launch a CloudFormation stack in your account.

    Launch Stack

    Note: The stack will launch in the N. Virginia (us-east-1) Region. To deploy this solution into other AWS Regions, download the solution’s CloudFormation template and deploy it to the selected Region.

    This template creates the following resources in your AWS account:

    • A rule group for the rate-based rules, according to the limits shown in Tables 1 and 2.
    • Four IP sets for an allow list and deny list for IPv4 and IPv6 addresses.
    • A web ACL that includes the rule group that is created, IP set based rules, and the AWS managed IP reputation rule group.
    • (Optional) The template enables AWS WAF logging for the web ACL to an S3 bucket that you specify.
    • (Optional) The template creates resources to help you analyze AWS WAF logs in S3 to calculate peak request rates that you can use to set rate limits for the rate-based rules.
  3. Set the template parameters as needed. The following table shows the default values for the parameters. We recommend that you deploy the template with the default values and with TestMode set to Yes so that all rules are set to Count. This allows all requests but emits Amazon CloudWatch metrics and AWS WAF log events for each rule that matches. You can then follow the guidance in the next section to analyze the logs and tune the rate limits to match the traffic patterns to your user pool. When you are satisfied with the unique rate limits for each parameter, you can update the stack and set TestMode to No to start blocking requests that exceed the rate limits.

    The rate limits for AWS WAF rate-based rules are configured as the number of requests per 5-minute period per unique source IP. The value of the rate limit can be between 100 and 2,000,000,000 (2 billion).

    Table 3: Default values for template parameters

    Parameter name Description Default value Allowed values
    Request rate limits by action category
    UserCreationRateLimit Rate limit applied to User Creation actions 2000 100–2,000,000,000
    SignInRateLimit Rate limit applied to Sign-in actions 4000 100–2,000,000,000
    AccountRecoveryRateLimit Rate limit applied to Account Recovery actions 1000 100–2,000,000,000
    IPReputationRateLimit Rate limit applied to requests that match the AWS Managed IP reputation list 1000 100–2,000,000,000
    DefaultRateLimit Default rate limit applied to actions that are not in any of the prior categories 6000 100–2,000,000,000
    Test mode
    TestMode Set to Yes to test rules by overriding rule actions to Count. Set to No to apply the default actions for rules after you’ve tested the impact of these rules. Yes Yes or No
    AWS WAF logging and rate analysis
    EnableWAFLogsAndRateAnalysis Set to Yes to enable logging for the AWS WAF web ACL to an S3 bucket and create resources for request rate analysis. Set to No to disable AWS WAF logging and skip creating resources for rate analysis. If No, the rest of the parameter values in this section are ignored. If Yes, choose values for the rest of the parameters in this section. Yes Yes or No
    WAFLogsS3Bucket The name of an existing S3 bucket where AWS WAF logs are delivered. The bucket name must start with aws-waf-logs- and can end with any suffix.
    Only used if the parameter EnableWAFLogsAndRateAnalysis is set to Yes.
    None Name of an existing S3 bucket that starts with the prefix aws-waf-logs-
    DatabaseName The name of the AWS Glue database to create, which will contain the request rate analysis tables created by this template. (Important: The name cannot contain hyphens.)
    Only used if the parameter EnableWAFLogsAndRateAnalysis is set to Yes.
    rate_analysis
    WorkgroupName The name of the Amazon Athena workgroup to create for rate analysis.
    Only used if the parameter EnableWAFLogsAndRateAnalysis is set to Yes.
    rate_analysis
    WAFLogsTableName The name of the AWS Glue table for AWS WAF logs.
    Only used if the parameter EnableWAFLogsAndRateAnalysis is set to Yes.
    waf_logs
    WAFLogsProjectionStartDate The earliest date to analyze AWS WAF logs, in the format YYYY/MM/DD (example: 2023/02/28).
    Only used if the parameter EnableWAFLogsAndRateAnalysis is set to Yes.
    None Set this to the current date, in the format YYYY/MM/DD
  4. Wait for the CloudFormation template to be created successfully.
  5. Go to the AWS WAF console and choose the web ACL created by the template. It will have a name ending with CognitoWebACL.
  6. Choose the Associated AWS resources tab, and then choose Add AWS resource.
  7. For Resource type, choose Amazon Cognito user pool, and then select the Amazon Cognito user pools that you want to protect with this web ACL.
  8. Choose Add.

Now that your user pool is being protected by the rate-based rules in the web ACL you created, you can proceed to tune the rate-based rule limits by analyzing AWS WAF logs.

Tune AWS WAF rate-based rule limits

As described in the previous section, the rate-based rules give you the ability to set separate rate limit values for each category of Amazon Cognito API actions.

Although the CloudFormation template has default starting values for these rate limits, it is important that you tune these values to match the traffic patterns for your user pool. To begin the tuning process, deploy the template with default values for all parameters, including Yes for TestMode. This overrides all rule actions to Count, allowing all requests but emitting CloudWatch metrics and AWS WAF log events for each rule that matches.

After you collect AWS WAF logs for a period of time (this period can vary depending on your traffic, from a couple of hours to a couple of days), you can analyze them, as shown in the next section, to get peak request rates to tune the rate limits to match observed traffic patterns for your user pool.

Query AWS WAF logs to calculate peak request rates by request type

You can calculate peak request rates by analyzing information that is present in AWS WAF logs. One way to analyze these is to send AWS WAF logs to S3 and to analyze the logs by using SQL queries in Amazon Athena. If you deploy the template in this post with default values, it creates the resources you need to analyze AWS WAF logs in S3 to calculate peak requests rates by request type.

If you are instead ingesting AWS WAF logs into your security information and event management (SIEM) system or a different analytics environment, you can create equivalent queries by using the query language for your SIEM or analytics environment to get similar results.

To access and edit the queries built by the CloudFormation template for use

  1. Open the Athena console and switch to the Athena workgroup that was created by the template (the default name is rate_analysis).
  2. On the Saved queries tab, choose the query named Peak request rate per 5-minute period by source IP and request category. The following SQL query will be loaded into the edit panel.
    -- Gets the top 5 source IPs sending the most requests in a 5-minute period per request category
    ‐‐ NOTE: change the start and end timestamps to match the duration of interest
    SELECT request_category, from_unixtime(time_bin*60*5) AS date_time, client_ip, request_count FROM (
      SELECT *, row_number() OVER (PARTITION BY request_category ORDER BY request_count DESC, time_bin DESC) AS row_num FROM (
        SELECT
          CASE
            WHEN ip_reputation_labels.name IN (
              'awswaf:managed:aws:amazon-ip-list:AWSManagedIPReputationList',
              'awswaf:managed:aws:amazon-ip-list:AWSManagedReconnaissanceList',
              'awswaf:managed:aws:amazon-ip-list:AWSManagedIPDDoSList'
            ) THEN 'IPReputation'
            WHEN target.value IN (
              'AWSCognitoIdentityProviderService.InitiateAuth',
              'AWSCognitoIdentityProviderService.RespondToAuthChallenge'
            ) THEN 'SignIn'
            WHEN target.value IN (
              'AWSCognitoIdentityProviderService.ResendConfirmationCode',
              'AWSCognitoIdentityProviderService.SignUp',
              'AWSCognitoIdentityProviderService.ConfirmSignUp'
            ) THEN 'UserCreation'
            WHEN target.value IN (
              'AWSCognitoIdentityProviderService.ForgotPassword',
              'AWSCognitoIdentityProviderService.ConfirmForgotPassword'
            ) THEN 'AccountRecovery'
            WHEN httprequest.uri IN (
              '/login',
              '/oauth2/authorize'
            ) THEN 'SignIn'
            WHEN httprequest.uri IN (
              '/signup',
              '/confirmUser',
              '/resendcode'
            ) THEN 'UserCreation'
            WHEN  httprequest.uri IN (
              '/forgotPassword',
              '/confirmForgotPassword'
            ) THEN 'AccountRecovery'
            ELSE 'Default'
          END AS request_category,
          httprequest.clientip AS client_ip,
          FLOOR("timestamp"/(1000*60*5)) AS time_bin,
          COUNT(*) AS request_count
        FROM waf_logs
          LEFT OUTER JOIN UNNEST(FILTER(httprequest.headers, h -> h.name = 'x-amz-target')) AS t(target) ON TRUE
          LEFT OUTER JOIN UNNEST(FILTER(labels, l -> l.name like 'awswaf:managed:aws:amazon-ip-list:%')) AS t(ip_reputation_labels) ON TRUE
        WHERE
          from_unixtime("timestamp"/1000) BETWEEN TIMESTAMP '2022-01-01 00:00:00' AND TIMESTAMP '2023-01-01 00:00:00'
        GROUP BY 1, 2, 3
        ORDER BY 1, 4 DESC
      )
    ) WHERE row_num <= 5 ORDER BY request_category ASC, row_num ASC
  3. Scroll down to Line 48 in the Query Editor and edit the timestamps to match the start and end time of the time window of interest.
  4. Run the query to calculate the top 5 peak request rates per 5-minute period by source IP and by action category.

The results show the action category, source IP, time, and count of requests. You can use the request count to tune the rate limits for each action category.

The lowest rate limit you can set for AWS WAF rate-based rules is 100 requests per 5-minute period. If your query results show that the peak request count is less than 100, set the rate limit as 100 or higher.

After you have tuned the rate limits, you can apply the changes to your web ACL by updating the CloudFormation stack.

To update the CloudFormation stack

  1. On the CloudFormation console, choose the stack you created earlier.
  2. Choose Update. For Prepare template, choose Use current template, and then choose Next.
  3. Update the values of the parameters with rate limits to match the tuned values from your analysis.
  4. You can choose to enable blocking of requests by setting TestMode to No. This will set the action to Block for the rate-based rules in the web ACL and start blocking traffic that exceeds the rate limits you have chosen.
  5. Choose Next and then Next again to update the stack.

Now the rate-based rules are updated with your tuned limits, and requests will be blocked if you set TestMode to No.

Protect endpoints with user interaction

Now that we’ve covered the bases with rate-based rules, we’ll show you some more advanced AWS WAF rules that further help protect your user pool. We’ll explore two sample scenarios in detail, and provide AWS WAF rules for each. You can use the rules provided as a guideline to build others that can help with similar use cases.

Rules to verify human activity

The first scenario is protecting endpoints where users have interaction with the page. This will be a browser-based interaction, and a human is expected to be behind the keyboard. This scenario applies to the Hosted UI endpoints such as /login, /signup, and /forgotPassword, where a CAPTCHA can be rendered on the user’s browser for the user to solve. Let’s take the login (sign-in) endpoint as an example, and imagine you want to make sure that only actual human users are attempting to sign in and you want to block bots that might try to guess passwords.

To illustrate how to protect this endpoint with AWS WAF, we’re sharing a sample rule, shown in Figure 1. In this rule, you can take input from prior rules like the Amazon IP reputation list or the Anonymous IP list (which are configured to Count requests and add labels) and combine that with a CAPTCHA action. The logic of the rule says that if the request matches the reputation rules (and has received the corresponding labels) and is going to the /login endpoint, then the AWS WAF action should be to respond with a CAPTCHA challenge. This will present a challenge that increases the confidence that a human is performing the action, and it also adds a custom label so you can efficiently identify and have metrics on how many requests were matched by this rule. The rule is provided in the CloudFormation template and is in JSON format, because it has advanced logic that cannot be displayed by the console. Learn more about labels and CAPTCHA actions in the AWS WAF documentation.

Figure 1: Login sample rule flow

Figure 1: Login sample rule flow

Note that the rate-based rules you created in the previous section are evaluated before the advanced rules. The rate-based rules will block requests to the /login endpoint that exceed the rate limit you have configured, while this advanced rule will match requests that are below the rate limit but match the other conditions in the rule.

Rules for specific activity

The second scenario explores activity on specific application clients within the user pool. You can spot this activity by monitoring the logs provided by AWS WAF, or other traffic logs like Application Load Balancer (ALB) logs. The application client information is provided in the call to the service.

In the Amazon Cognito user pool in this scenario, we have different application clients and they’re constrained by geography. For example, for one of the application clients, requests are expected to come from the United States at or below a certain rate. We can create a rule that combines the rate and geographical criteria to block requests that don’t meet the conditions defined.

The flow of this rule is shown in Figure 2. The logic of the rule will evaluate the application client information provided in the request and the geographic information identified by the service, and apply the selected rate limit. If blocked, the rule will provide a custom response code by using HTTP code 429 Too Many Requests, which can help the sender understand the reason for the block. For requests that you make with the Amazon Cognito API, you could also customize the response body of a request that receives a Block response. Adding a custom response helps provide the sender context and adjust the rate or information that is sent.

Figure 2: AppClientId sample rule flow

Figure 2: AppClientId sample rule flow

AWS WAF can detect geo location with Region accuracy and add specific labels for the location. These can then be used in other rule evaluations. This rule is also provided as a sample in the CloudFormation template.

Advanced protections

To build on the rules we’ve shared so far, you can consider using some of the other intelligent threat mitigation rules that are available as managed rules—namely, bot control for common or targeted bots. These rules offer advanced capabilities to detect bots in sensitive endpoints where automation or non-browser user agents are not expected or allowed. If you receive machine traffic to the endpoint, these rules will result in false positives that would need to be tuned. For more information, see Options for intelligent threat mitigation.

The sample rule flow in Figure 3 shows an example for our Hosted UI, which builds on the first rule we built for specific activity and adds signals coming from the Bot Control common bots managed rule, in this case the non-browser-user-agent label.

Figure 3: Login sample rule with advanced protections

Figure 3: Login sample rule with advanced protections

Adding the bot detection label will also add accuracy to the evaluation, because AWS WAF will consider multiple different sources of information when analyzing the request. This can also block attacks that come from a small set of IPs or easily recognizable bots.

We’ve shared this rule in the CloudFormation template sample. The rule requires you to add AWS WAF Bot Control (ABC) before the custom rule evaluation. ABC has additional costs associated with it and should only be used for specific use cases. For more information on ABC and how to enable it, see this blog post.

After adding these protections, we have a complete set of rules for our Hosted UI–specific needs; consider that your traffic and needs might be different. Figure 4 shows you what the rule priority looks like. All rules except the last are included in the provided CloudFormation template. Managed rule evaluations need to have higher priority and be in Count mode; this way, a matching request can get labels that can be evaluated further down the priority list by using the custom rules that were created. For more information, see How labeling works.

Figure 4: Summary of the rules discussed in this post

Figure 4: Summary of the rules discussed in this post

Conclusion

In this post, we examined the different protections provided by the integration between AWS WAF and Amazon Cognito. This integration makes it simpler for you to view and monitor the activity in the different Amazon Cognito endpoints and APIs, while also adding rate-based rules and IP reputation evaluations. For more specific use cases and advanced protections, we provided sample custom rules that use labels, as well as an advanced rule that uses bot control for common bots. You can use these advanced rules as examples to create similar rules that apply to your use cases.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the re:Post with tag AWS WAF or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Maitreya Ranganath

Maitreya is an AWS Security Solutions Architect. He enjoys helping customers solve security and compliance challenges and architect scalable and cost-effective solutions on AWS.

Diana Alvarado

Diana Alvarado

Diana is Sr security solutions architect at AWS. She is passionate about helping customers solve difficult cloud challenges, she has a soft spot for all things logs.

Oxy: Fish/Bumblebee/Splicer subsystems to improve reliability

Post Syndicated from Quang Luong original https://blog.cloudflare.com/oxy-fish-bumblebee-splicer-subsystems-to-improve-reliability/

Oxy: Fish/Bumblebee/Splicer subsystems to improve reliability

Oxy: Fish/Bumblebee/Splicer subsystems to improve reliability

At Cloudflare, we are building proxy applications on top of Oxy that must be able to handle a huge amount of traffic. Besides high performance requirements, the applications must also be resilient against crashes or reloads. As the framework evolves, the complexity also increases. While migrating WARP to support soft-unicast (Cloudflare servers don’t own IPs anymore), we needed to add different functionalities to our proxy framework. Those additions increased not only the code size but also resource usage and states required to be preserved between process upgrades.

To address those issues, we opted to split a big proxy process into smaller, specialized services. Following the Unix philosophy, each service should have a single responsibility, and it must do it well. In this blog post, we will talk about how our proxy interacts with three different services – Splicer (which pipes data between sockets), Bumblebee (which upgrades an IP flow to a TCP socket), and Fish (which handles layer 3 egress using soft-unicast IPs). Those three services help us to improve system reliability and efficiency as we migrated WARP to support soft-unicast.

Oxy: Fish/Bumblebee/Splicer subsystems to improve reliability

Splicer

Most transmission tunnels in our proxy forward packets without making any modifications. In other words, given two sockets, the proxy just relays the data between them: read from one socket and write to the other. This is a common pattern within Cloudflare, and we reimplement very similar functionality in separate projects. These projects often have their own tweaks for buffering, flushing, and terminating connections, but they also have to coordinate long-running proxy tasks with their process restart or upgrade handling, too.

Turning this into a service allows other applications to send a long-running proxying task to Splicer. The applications pass the two sockets to Splicer and they will not need to worry about keeping the connection alive when restart. After finishing the task, Splicer will return the two original sockets and the original metadata attached to the request, so the original application can inspect the final state of the sockets – for example using TCP_INFO – and finalize audit logging if required.

Bumblebee

Many of Cloudflare’s on-ramps are IP-based (layer 3) but most of our services operate on TCP or UDP sockets (layer 4). To handle TCP termination, we want to create a kernel TCP socket from IP packets received from the client (and we can later forward this socket and an upstream socket to Splicer to proxy data between the eyeball and origin). Bumblebee performs the upgrades by spawning a thread in an anonymous network namespace with unshare syscall, NAT-ing the IP packets, and using a tun device there to perform TCP three-way handshakes to a listener. You can find a more detailed write-up on how we upgrade an IP flows to a TCP stream here.

In short, other services just need to pass a socket carrying the IP flow, and Bumblebee will upgrade it to a TCP socket, no user-space TCP stack involved! After the socket is created, Bumblebee will return the socket to the application requesting the upgrade. Again, the proxy can restart without breaking the connection as Bumblebee pipes the IP socket while Splicer handles the TCP ones.

Fish

Fish forwards IP packets using soft-unicast IP space without upgrading them to layer 4 sockets. We previously implemented packet forwarding on shared IP space using iptables and conntrack. However, IP/port mapping management is not simple when you have many possible IPs to egress from and variable port assignments. Conntrack is highly configurable, but applying configuration through iptables rules requires careful coordination and debugging iptables execution can be challenging. Plus, relying on configuration when sending a packet through the network stack results in arcane failure modes when conntrack is unable to rewrite a packet to the exact IP or port range specified.

Fish attempts to overcome this problem by rewriting the packets and configuring conntrack using the netlink protocol. Put differently, a proxy application sends a socket containing IP packets from the client, together with the desired soft-unicast IP and port range, to Fish. Then, Fish will ensure to forward those packets to their destination. The client’s choice of IP address does not matter; Fish ensures that egressed IP packets have a unique five-tuple within the root network namespace and performs the necessary packet rewriting to maintain this isolation. Fish’s internal state is also survived across restarts.

The Unix philosophy, manifest

To sum up what we are having so far: instead of adding the functionalities directly to the proxy application, we create smaller and reusable services. It becomes possible to understand the failure cases present in a smaller system and design it to exhibit reliable behavior. Then if we can remove the subsystems of a larger system, we can apply this logic to those subsystems. By focusing on making the smaller service work correctly, we improve the whole system’s reliability and development agility.

Although those three services’ business logics are different, you can notice what they do in common: receive sockets, or file descriptors, from other applications to allow them to restart. Those services can be restarted without dropping the connection too. Let’s take a look at how graceful restart and file descriptor passing work in our cases.

File descriptor passing

We use Unix Domain Sockets for interprocess communication. This is a common pattern for inter-process communication. Besides sending raw data, unix sockets also allow passing file descriptors between different processes. This is essential for our architecture as well as graceful restarts.

Oxy: Fish/Bumblebee/Splicer subsystems to improve reliability

There are two main ways to transfer a file descriptor: using pid_getfd syscall or SCM_RIGHTS. The latter is the better choice for us here as the use cases gear toward the proxy application “giving” the sockets instead of the microservices “taking” them. Moreover, the first method would require special permission and a way for the proxy to signal which file descriptor to take.

Currently we have our own internal library named hot-potato to pass the file descriptors around as we use stable Rust in production. If you are fine with using nightly Rust, you may want to consider the unix_socket_ancillary_data feature. The linked blog post above about SCM_RIGHTS also explains how that can be implemented. Still, we also want to add some “interesting” details you may want to know before using your SCM_RIGHTS in production:

  • There is a maximum number of file descriptors you can pass per message
    The limit is defined by the constant SCM_MAX_FD in the kernel. This is set to 253 since kernel version 2.6.38
  • Getting the peer credentials of a socket may be quite useful for observability in multi-tenant settings
  • A SCM_RIGHTS ancillary data forms a message boundary.
  • It is possible to send any file descriptors, not only sockets
    We use this trick together with memfd_create to get around the maximum buffer size without implementing something like length-encoded frames. This also makes zero-copy message passing possible.

Graceful restart

We explored the general strategy for graceful restart in “Oxy: the journey of graceful restarts” blog. Let’s dive into how we leverage tokio and file descriptor passing to migrate all important states in the old process to the new one. We can terminate the old process almost instantly without leaving any connection behind.

Passing states and file descriptors

Applications like NGINX can be reloaded with no downtime. However, if there are pending requests then there will be lingering processes that handle those connections before they terminate. This is not ideal for observability. It can also cause performance degradation when the old processes start building up after consecutive restarts.

In three micro-services in this blog post, we use the state-passing concept, where the pending requests will be paused and transferred to the new process. The new process will pick up both new requests and the old ones immediately on start. This method indeed requires a higher complexity than keeping the old process running. At a high level, we have the following extra steps when the application receives an upgrade request (usually SIGHUP): pause all tasks, wait until all tasks (in groups) are paused, and send them to the new process.

Oxy: Fish/Bumblebee/Splicer subsystems to improve reliability

WaitGroup using JoinSet

Problem statement: we dynamically spawn different concurrent tasks, and each task can spawn new child tasks. We must wait for some of them to complete before continuing.

In other words, tasks can be managed as groups. In Go, waiting for a collection of tasks to complete is a solved problem with WaitGroup. We discussed a way to implement WaitGroup in Rust using channels in a previous blog. There also exist crates like waitgroup that simply use AtomicWaker. Another approach is using JoinSet, which may make the code more readable. Considering the below example, we group the requests using a JoinSet.

    let mut task_group = JoinSet::new();

    loop {
        // Receive the request from a listener
        let Some(request) = listener.recv().await else {
            println!("There is no more request");
            break;
        };
        // Spawn a task that will process request.
        // This returns immediately
        task_group.spawn(process_request(request));
    }

    // Wait for all requests to be completed before continue
    while task_group.join_next().await.is_some() {}

However, an obvious problem with this is if we receive a lot of requests then the JoinSet will need to keep the results for all of them. Let’s change the code to clean up the JoinSet as the application processes new requests, so we have lower memory pressure

    loop {
        tokio::select! {
            biased; // This is optional

            // Clean up the JoinSet as we go
            // Note: checking for is_empty is important 😉
            _task_result = task_group.join_next(), if !task_group.is_empty() => {}

            req = listener.recv() => {
                let Some(request) = req else {
                    println!("There is no more request");
                    break;
                };
                task_group.spawn(process_request(request));
            }
        }
    }

    while task_group.join_next().await.is_some() {}

Cancellation

We want to pass the pending requests to the new process as soon as possible once the upgrade signal is received. This requires us to pause all requests we are processing. In other terms, to be able to implement graceful restart, we need to implement graceful shutdown. The official tokio tutorial already covered how this can be achieved by using channels. Of course, we must guarantee the tasks we are pausing are cancellation-safe. The paused results will be collected into the JoinSet, and we just need to pass them to the new process using file descriptor passing.

For example, in Bumblebee, a paused state will include the environment’s file descriptors, client socket, and the socket proxying IP flow. We also need to transfer the current NAT table to the new process, which could be larger than the socket buffer. So the NAT table state is encoded into an anonymous file descriptor, and we just need to pass the file descriptor to the new process.

Conclusion

We considered how a complex proxy app can be divided into smaller components. Those components can run as new processes, allowing different life-times. Still, this type of architecture does incur additional costs: distributed tracing and inter-process communication. However, the costs are acceptable nonetheless considering the performance, maintainability, and reliability improvements. In the upcoming blog posts, we will talk about different debug tricks we learned when working with a large codebase with complex service interactions using tools like strace and eBPF.

Oxy: the journey of graceful restarts

Post Syndicated from Chris Branch original https://blog.cloudflare.com/oxy-the-journey-of-graceful-restarts/

Oxy: the journey of graceful restarts

Oxy: the journey of graceful restarts

Any software under continuous development and improvement will eventually need a new version deployed to the systems running it. This can happen in several ways, depending on how much you care about things like reliability, availability, and correctness. When I started out in web development, I didn’t think about any of these qualities; I simply blasted my new code over FTP directly to my /cgi-bin/ directory, which was the style at the time. For those of us producing desktop software, often you sidestep this entirely by having the user save their work, close the program and install an update – but they usually get to decide when this happens.

At Cloudflare we have to take this seriously. Our software is in constant use and cannot simply be stopped abruptly. A dropped HTTP request can cause an entire webpage to load incorrectly, and a broken connection can kick you out of a video call. Taking away reliability creates a vacuum filled only by user frustration.

The limitations of the typical upgrade process

There is no one right way to upgrade software reliably. Some programming languages and environments make it easier than others, but in a Turing-complete language few things are impossible.

One popular and generally applicable approach is to start a new version of the software, make it responsible for a small number of tasks at first, and then gradually increase its workload until the new version is responsible for everything and the old version responsible for nothing. At that point, you can stop the old version.

Oxy: the journey of graceful restarts

Most of Cloudflare’s proxies follow a similar pattern: they receive connections or requests from many clients over the Internet, communicate with other internal services to decide how to serve the request, and fetch content over the Internet if we cannot serve it locally. In general, all of this work happens within the lifetime of a client’s connection. If we aren’t serving any clients, we aren’t doing any work.

The safest time to restart, therefore, is when there is nobody to interrupt. But does such a time really exist? The Internet operates 24 hours a day and many users rely on long-running connections for things like backups, real-time updates or remote shell sessions. Even if you defer restarts to a “quiet” period, the next-best strategy of “interrupt the fewest number of people possible” will fail when you have a critical security fix that needs to be deployed immediately.

Despite this challenge, we have to start somewhere. You rarely arrive at the perfect solution in your first try.

(╯°□°)╯︵ ┻━┻

We have previously blogged about implementing graceful restarts in Cloudflare’s Go projects, using a library called tableflip. This starts a new version of your program and allows the new version to signal to the old version that it started successfully, then lets the old version clear its workload. For a proxy like any Oxy application, that means the old version stops accepting new connections once the new version starts accepting connections, then drives its remaining connections to completion.

Oxy: the journey of graceful restarts

This is the simplest case of the migration strategy previously described: the new version immediately takes all new connections, instead of a gradual rollout. But in aggregate across Cloudflare’s server fleet the upgrade process is spread across several hours and the result is as gradual as a deployment orchestrated by Kubernetes or similar.

tableflip also allows your program to bind to sockets, or to reuse the sockets opened by a previous instance. This enables the new instance to accept new connections on the same socket and let the old instance release that responsibility.

Oxy is a Rust project, so we can’t reuse tableflip. We rewrote the spawning/signaling section in Rust, but not the socket code. For that we had an alternative approach.

Socket management with systemd

systemd is a widely used suite of programs for starting and managing all of the system software needed to run a useful Linux system. It is responsible for running software in the correct order – for example ensuring the network is ready before starting a program that needs network access – or running it only if it is needed by another program.

Socket management falls in this latter category, under the term ‘socket activation’. Its intended and original use is interesting but ultimately irrelevant here; for our purposes, systemd is a mere socket manager. Many Cloudflare services configure their sockets using systemd .socket files, and when their service is started the socket is brought into the process with it. This is how we deploy most Oxy-based services, and Oxy has first-class support for sockets opened by systemd.

Oxy: the journey of graceful restarts

Using systemd decouples the lifetime of sockets from the lifetime of the Oxy application. When Oxy creates its sockets on startup, if you restart or temporarily stop the Oxy application the sockets are closed. When clients attempt to connect to the proxy during this time, they will get a very unfriendly “connection refused” error. If, however, systemd manages the socket, that socket remains open even while the Oxy application is stopped. Clients can still connect to the socket and those connections will be served as soon as the Oxy application starts up successfully.

Channeling your inner WaitGroup

A useful piece of library code our Go projects use is WaitGroups. These are essential in Go, where goroutines – asynchronously-running code blocks – are pervasive. Waiting for goroutines to complete before continuing another task is a common requirement. Even the example for tableflip uses them, to demonstrate how to wait for tasks to shut down cleanly before quitting your process.

There is not an out-of-the-box equivalent in tokio – the async Rust runtime Oxy uses – or async/await generally, so we had to create one ourselves. Fortunately, most of the building blocks to roll your own exist already. Tokio has multi-producer, single consumer (MPSC) channels, generally used by multiple tasks to push the results of work onto a queue for a single task to process, but we can exploit the fact that it signals to that single receiver when all the sender channels have been closed and no new messages are expected.

Oxy: the journey of graceful restarts

To start, we create an MPSC channel. Each task takes a clone of the producer end of the channel, and when that task completes it closes its instance of the producer. When we want to wait for all of the tasks to complete, we await a result on the consumer end of the MPSC channel. When every instance of the producer channel is closed – i.e. all tasks have completed – the consumer receives a notification that all of the channels are closed. Closing the channel when a task completes is an automatic consequence of Rust’s RAII rules. Because the language enforces this rule it is harder to write incorrect code, though in fact we need to write very little code at all.

Getting feedback on failure

Many programs that implement a graceful reload/restart mechanism use Unix signals to trigger the process to perform an action. Signals are an ancient technique introduced in early versions of Unix to solve a specific problem while creating dozens more. A common pattern is to change a program’s configuration on disk, then send it a signal (often SIGHUP) which the program handles by reloading those configuration files.

The limitations of this technique are obvious as soon as you make a mistake in the configuration, or when an important file referenced in the configuration is deleted. You reload the program and wonder why it isn’t behaving as you expect. If an error is raised, you have to look in the program’s log output to find out.

This problem compounds when you use an automated configuration management tool. It is not useful if that tool makes a configuration change and reports that it successfully reloaded your program, when in fact the program failed to read the change. The only thing that was successful was sending the reload signal!

We solved this in Oxy by creating a Unix socket specifically for coordinating restarts, and adding a new mode to Oxy that triggers a restart. In this mode:

  1. The restarter process validates the configuration file.
  2. It connects to the restart coordination socket defined in that file.
  3. It sends a “restart requested” message.
  4. The current proxy instance receives this message.
  5. A new instance is started, inheriting a pipe it will use to notify its parent instance.
  6. The current instance waits for the new instance to report success or fail.
  7. The current instance sends a “restart response” message back to the restarter process, containing the result.
  8. The restarter process reports this result back to the user, using exit codes for automated systems to detect failure.
Oxy: the journey of graceful restarts

Now when we make a change to any of our Oxy applications, we can be confident that failures are detected using nothing more than our SREs’ existing tooling. This lets us discover failures earlier, narrow down root causes sooner, and avoid our systems getting into an inconsistent state.

This technique is described more generally in a coworker’s blog, using an internal HTTP endpoint instead. Yet HTTP is missing one important property of Unix sockets for the purpose of replacing signals. A user may only send a signal to a process if the process belongs to them – i.e. they started it – or if the user is root. This prevents another user logged into the same machine from you from terminating all of your processes. As Unix sockets are files, they also follow the Unix permission model. Write permissions are required to connect to a socket. Thus we can trivially reproduce the signals security model by making the restart coordination socket user writable only. (Root, as always, bypasses all permission checks.)

Leave no connection behind

We have put a lot of effort into making restarts as graceful as possible, but there are still certain limitations. After restarting, eventually the old process has to terminate, to prevent a build-up of old processes after successive restarts consuming excessive memory and reducing the performance of other running services. There is an upper bound to how long we’ll let the old process run for; when this is reached, any connections remaining are forcibly broken.

The configuration changes that can be applied using graceful restart is limited by the design of systemd. While some configuration like resource limits can now be applied without restarting the service it applies to, others cannot; most significantly, new sockets. This is a problem inherent to the fork-and-inherit model.

For UDP-based protocols like HTTP/3, there is not even a concept of listener socket. The new process may open UDP sockets, but by default incoming packets are balanced between all open unconnected UDP sockets for a given address. How does the old process drain existing sessions without receiving packets intended for the new process and vice versa?

Is there a way to carry existing state to a new process to avoid some of these limitations? This is a hard problem to solve generally, and even in languages designed to support hot code upgrades there is some degree of running old tasks with old versions of code. Yet there are some common useful tasks that can be carried between processes so we can “interrupt the fewest number of people possible”.

Let’s not forget the unplanned outages: segfaults, oomkiller and other crashes. Thankfully rare in Rust code, but not impossible.

You can find the source for our Rust implementation of graceful restarts, named shellflip, in its GitHub repository. However, restarting correctly is just the first step of many needed to achieve our ultimate reliability goals. In a follow-up blog post we’ll talk about some creative solutions to these limitations.

Introducing Rollbacks for Workers Deployments

Post Syndicated from Cloudflare original https://blog.cloudflare.com/introducing-rollbacks-for-workers-deployments/

Introducing Rollbacks for Workers Deployments

Introducing Rollbacks for Workers Deployments

In November, 2022, we introduced deployments for Workers. Deployments are created as you make changes to a Worker. Each one is unique. These let you track changes to your Workers over time, seeing who made the changes, and where they came from.

Introducing Rollbacks for Workers Deployments

When we made the announcement, we also said our intention was to build more functionality on top of deployments.

Today, we’re proud to release rollbacks for deployments.

Rollbacks

As nice as it would be to know that every deployment is perfect, it’s not always possible – for various reasons. Rollbacks provide a quick way to deploy past versions of a Worker – providing another layer of confidence when developing and deploying with Workers.

Via the dashboard

In the dashboard, you can navigate to the Deployments tab. For each deployment that’s not the most recent, you should see a new icon on the far right of the deployment. Hovering over that icon will display the option to rollback to the specified deployment.

Introducing Rollbacks for Workers Deployments

Clicking on that will bring up a confirmation dialog, where you can enter a reason for rollback. This provides another mechanism of record-keeping and helps give more context for why the rollback was necessary.

Introducing Rollbacks for Workers Deployments

Once you enter a reason and confirm, a new rollback deployment will be created. This deployment has its own ID, but is a duplicate of the one you rolled back to. A message appears with the new deployment ID, as well as an icon showing the rollback message you entered above.

Introducing Rollbacks for Workers Deployments

Via Wrangler

With Wrangler version 2.13, rolling back deployments via Wrangler can be done via a new command – wrangler rollback. This command takes an optional ID to rollback to a specific deployment, but can also be run without an ID to rollback to the previous deployment. This provides an even faster way to rollback in a situation where you know that the previous deployment is the one that you want.

Introducing Rollbacks for Workers Deployments

Just like the dashboard, when you initiate a rollback you will be prompted to add a rollback reason and to confirm the action.

In addition to wrangler rollback we’ve done some refactoring to the wrangler deployments command. Now you can run wrangler deployments list to view up to the last 10 deployments.

Introducing Rollbacks for Workers Deployments

Here, you can see two new annotations: rollback from and message. These match the dashboard experience, and provide more visibility into your deployment history.

To view an individual deployment, you can run wrangler deployments view. This will display the last deployment made, which is the active deployment. If you would like to see a specific deployment, you can run wrangler deployments view [ID].

Introducing Rollbacks for Workers Deployments

We’ve updated this command to display more data like: compatibility date, usage model, and bindings. This additional data will help you to quickly visualize changes to Worker or to see more about a specific Worker deployment without having to open your editor and go through source code.

Keep deploying!

We hope this feature provides even more confidence in deploying Workers, and encourages you to try it out! If you leverage the Cloudflare dashboard to manage deployments, you should have access immediately. Wrangler users will need to update to version 2.13 to see the new functionality.

Make sure to check out our updated deployments docs for more information, as well as information on limitations to rollbacks. If you have any feedback, please let us know via this form.

Oxy is Cloudflare’s Rust-based next generation proxy framework

Post Syndicated from Ivan Nikulin original https://blog.cloudflare.com/introducing-oxy/

Oxy is Cloudflare's Rust-based next generation proxy framework

Oxy is Cloudflare's Rust-based next generation proxy framework

In this blog post, we are proud to introduce Oxy – our modern proxy framework, developed using the Rust programming language. Oxy is a foundation of several Cloudflare projects, including the Zero Trust Gateway, the iCloud Private Relay second hop proxy, and the internal egress routing service.

Oxy leverages our years of experience building high-load proxies to implement the latest communication protocols, enabling us to effortlessly build sophisticated services that can accommodate massive amounts of daily traffic.

We will be exploring Oxy in greater detail in upcoming technical blog posts, providing a comprehensive and in-depth look at its capabilities and potential applications. For now, let us embark on this journey and discover what Oxy is and how we built it.

What Oxy does

We refer to Oxy as our “next-generation proxy framework”. But what do we really mean by “proxy framework”? Picture a server (like NGINX, that reader might be familiar with) that can proxy traffic with an array of protocols, including various predefined common traffic flow scenarios that enable you to route traffic to specific destinations or even egress with a different protocol than the one used for ingress. This server can be configured in many ways for specific flows and boasts tight integration with the surrounding infrastructure, whether telemetry consumers or networking services.

Now, take all of that and add in the ability to programmatically control every aspect of the proxying: protocol decapsulation, traffic analysis, routing, tunneling logic, DNS resolution, and so much more. And this is what Oxy proxy framework is: a feature-rich proxy server tightly integrated with our internal infrastructure that’s customizable to meet application requirements, allowing engineers to tweak every component.

This design is in line with our belief in an iterative approach to development, where a basic solution is built first and then gradually improved over time. With Oxy, you can start with a basic solution that can be deployed to our servers and then add additional features as needed, taking advantage of the many extensibility points offered by Oxy. In fact, you can avoid writing any code, besides a few lines of bootstrap boilerplate and get a production-ready server with a wide variety of startup configuration options and traffic flow scenarios.

Oxy is Cloudflare's Rust-based next generation proxy framework
High-level Oxy architecture

For example, suppose you’d like to implement an HTTP firewall. With Oxy, you can proxy HTTP(S) requests right out of the box, eliminating the need to write any code related to production services, such as request metrics and logs. You simply need to implement an Oxy hook handler for HTTP requests and responses. If you’ve used Cloudflare Workers before, then you should be familiar with this extensibility model.

Similarly, you can implement a layer 4 firewall by providing application hooks that handle ingress and egress connections. This goes beyond a simple block/accept scenario, as you can build authentication functionality or a traffic router that sends traffic to different destinations based on the geographical information of the ingress connection. The capabilities are incredibly rich, and we’ve made the extensibility model as ergonomic and flexible as possible. As an example, if information obtained from layer 4 is insufficient to make an informed firewall decision, the app can simply ask Oxy to decapsulate the traffic and process it with HTTP firewall.

The aforementioned scenarios are prevalent in many products we build at Cloudflare, so having a foundation that incorporates ready solutions is incredibly useful. This foundation has absorbed lots of experience we’ve gained over the years, taking care of many sharp and dark corners of high-load service programming. As a result, application implementers can stay focused on the business logic of their application with Oxy taking care of the rest. In fact, we’ve been able to create a few privacy proxy applications using Oxy that now serve massive amounts of traffic in production with less than a couple of hundred lines of code. This is something that would have taken multiple orders of magnitude more time and lines of code before.

As previously mentioned, we’ll dive deeper into the technical aspects in future blog posts. However, for now, we’d like to provide a brief overview of Oxy’s capabilities. This will give you a glimpse of the many ways in which Oxy can be customized and used.

On-ramps

On-ramp defines a combination of transport layer socket type and protocols that server listeners can use for ingress traffic.

Oxy supports a wide variety of traffic on-ramps:

  • HTTP 1/2/3 (including various CONNECT protocols for layer 3 and 4 traffic)
  • TCP and UDP traffic over Proxy Protocol
  • general purpose IP traffic, including ICMP

With Oxy, you have the ability to analyze and manipulate traffic at multiple layers of the OSI model – from layer 3 to layer 7. This allows for a wide range of possibilities in terms of how you handle incoming traffic.

One of the most notable and powerful features of Oxy is the ability for applications to force decapsulation. This means that an application can analyze traffic at a higher level, even if it originally arrived at a lower level. For example, if an application receives IP traffic, it can choose to analyze the UDP traffic encapsulated within the IP packets. With just a few lines of code, the application can tell Oxy to upgrade the IP flow to a UDP tunnel, effectively allowing the same code to be used for different on-ramps.

The application can even go further and ask Oxy to sniff UDP packets and check if they contain HTTP/3 traffic. In this case, Oxy can upgrade the UDP traffic to HTTP and handle HTTP/3 requests that were originally received as raw IP packets. This allows for the simultaneous processing of traffic at all three layers (L3, L4, L7), enabling applications to analyze, filter, and manipulate the traffic flow from multiple perspectives. This provides a robust toolset for developing advanced traffic processing applications.

Oxy is Cloudflare's Rust-based next generation proxy framework
Multi-layer traffic processing in Oxy applications

Off-ramps

Off-ramp defines a combination of transport layer socket type and protocols that proxy server connectors can use for egress traffic.

Oxy offers versatility in its egress methods, supporting a range of protocols including HTTP 1 and 2, UDP, TCP, and IP. It is equipped with internal DNS resolution and caching, as well as customizable resolvers, with automatic fallback options for maximum system reliability. Oxy implements happy eyeballs for TCP, advanced tunnel timeout logic and has the ability to route traffic to internal services with accompanying metadata.

Additionally, through collaboration with one of our internal services (which is an Oxy application itself!) Oxy is able to offer geographical egress — allowing applications to route traffic to the public Internet from various locations in our extensive network covering numerous cities worldwide. This complex and powerful feature can be easily utilized by Oxy application developers at no extra cost, simply by adjusting configuration settings.

Tunneling and request handling

We’ve discussed Oxy’s communication capabilities with the outside world through on-ramps and off-ramps. In the middle, Oxy handles efficient stateful tunneling of various traffic types including TCP, UDP, QUIC, and IP, while giving applications full control over traffic blocking and redirection.

Additionally, Oxy effectively handles HTTP traffic, providing full control over requests and responses, and allowing it to serve as a direct HTTP or API service. With built-in tools for streaming analysis of HTTP bodies, Oxy makes it easy to extract and process data, such as form data from uploads and downloads.

In addition to its multi-layer traffic processing capabilities, Oxy also supports advanced HTTP tunneling methods, such as CONNECT-UDP and CONNECT-IP, using the latest extensions to HTTP 3 and 2 protocols. It can even process HTTP CONNECT request payloads on layer 4 and recursively process the payload as HTTP if the encapsulated traffic is HTTP.

Oxy is Cloudflare's Rust-based next generation proxy framework
Recursive processing of HTTP CONNECT body payload in HTTP pipeline

TLS

The modern Internet is unimaginable without traffic encryption, and Oxy, of course, provides this essential aspect. Oxy’s cryptography and TLS are based on BoringSSL, providing both a FIPS-compliant version with a limited set of certified features and the latest version that supports all the currently available TLS features. Oxy also allows applications to switch between the two versions in real-time, on a per-request or per-connection basis.

Oxy’s TLS client is designed to make HTTPS requests to upstream servers, with the functionality and security of a browser-grade client. This includes the reconstruction of certificate chains, certificate revocation checks, and more. In addition, Oxy applications can be secured with TLS v1.3, and optionally mTLS, allowing for the extraction of client authentication information from x509 certificates.

Oxy has the ability to inspect and filter HTTPS traffic, including HTTP/3, and provides the means for dynamically generating certificates, serving as a foundation for implementing data loss prevention (DLP) products. Additionally, Oxy’s internal fork of BoringSSL, which is not FIPS-compliant, supports the use of raw public keys as an alternative to WebPKI, making it ideal for internal service communication. This allows for all the benefits of TLS without the hassle of managing root certificates.

Gluing everything together

Oxy is more than just a set of building blocks for network applications. It acts as a cohesive glue, handling the bootstrapping of the entire proxy application with ease, including parsing and applying configurations, setting up an asynchronous runtime, applying seccomp hardening and providing automated graceful restarts functionality.

With built-in support for panic reporting to Sentry, Prometheus metrics with a Rust-macro based API, Kibana logging, distributed tracing, memory and runtime profiling, Oxy offers comprehensive monitoring and analysis capabilities. It can also generate detailed audit logs for layer 4 traffic, useful for billing and network analysis.

To top it off, Oxy includes an integration testing framework, allowing for easy testing of application interactions using TypeScript-based tests.

Extensibility model

To take full advantage of Oxy’s capabilities, one must understand how to extend and configure its features. Oxy applications are configured using YAML configuration files, offering numerous options for each feature. Additionally, application developers can extend these options by leveraging the convenient macros provided by the framework, making customization a breeze.

Suppose the Oxy application uses a key-value database to retrieve user information. In that case, it would be beneficial to expose a YAML configuration settings section for this purpose. With Oxy, defining a structure and annotating it with the #[oxy_app_settings] attribute is all it takes to accomplish this:

///Application’s key-value (KV) database settings
#[oxy_app_settings]
pub struct MyAppKVSettings {
    /// Key prefix.
    pub prefix: Option<String>,
    /// Path to the UNIX domain socket for the appropriate KV 
    /// server instance.
    pub socket: Option<String>,
}

Oxy can then generate a default YAML configuration file listing available options and their default values, including those extended by the application. The configuration options are automatically documented in the generated file from the Rust doc comments, following best Rust practices.

Moreover, Oxy supports multi-tenancy, allowing a single application instance to expose multiple on-ramp endpoints, each with a unique configuration. But, sometimes even a YAML configuration file is not enough to build a desired application, this is where Oxy’s comprehensive set of hooks comes in handy. These hooks can be used to extend the application with Rust code and cover almost all aspects of the traffic processing.

To give you an idea of how easy it is to write an Oxy application, here is an example of basic Oxy code:

struct MyApp;

// Defines types for various application extensions to Oxy's
// data types. Contexts provide information and control knobs for
// the different parts of the traffic flow and applications can extend // all of them with their custom data. As was mentioned before,
// applications could also define their custom configuration.
// It’s just a matter of defining a configuration object with
// `#[oxy_app_settings]` attribute and providing the object type here.
impl OxyExt for MyApp {
    type AppSettings = MyAppKVSettings;
    type EndpointAppSettings = ();
    type EndpointContext = ();
    type IngressConnectionContext = MyAppIngressConnectionContext;
    type RequestContext = ();
    type IpTunnelContext = ();
    type DnsCacheItem = ();

}
   
#[async_trait]
impl OxyApp for MyApp {
    fn name() -> &'static str {
        "My app"
    }

    fn version() -> &'static str {
        env!("CARGO_PKG_VERSION")
    }

    fn description() -> &'static str {
        "This is an example of Oxy application"
    }

    async fn start(
        settings: ServerSettings<MyAppSettings, ()>
    ) -> anyhow::Result<Hooks<Self>> {
        // Here the application initializes various hooks, with each
        // hook being a trait implementation containing multiple
        // optional callbacks invoked during the lifecycle of the
        // traffic processing.
        let ingress_hook = create_ingress_hook(&settings);
        let egress_hook = create_egress_hook(&settings);
        let tunnel_hook = create_tunnel_hook(&settings);
        let http_request_hook = create_http_request_hook(&settings);
        let ip_flow_hook = create_ip_flow_hook(&settings);

        Ok(Hooks {
            ingress: Some(ingress_hook),
            egress: Some(egress_hook),
            tunnel: Some(tunnel_hook),
            http_request: Some(http_request_hook),
            ip_flow: Some(ip_flow_hook),
            ..Default::default()
        })
    }
}

// The entry point of the application
fn main() -> OxyResult<()> {
    oxy::bootstrap::<MyApp>()
}

Technology choice

Oxy leverages the safety and performance benefits of Rust as its implementation language. At Cloudflare, Rust has emerged as a popular choice for new product development, and there are ongoing efforts to migrate some of the existing products to the language as well.

Rust offers memory and concurrency safety through its ownership and borrowing system, preventing issues like null pointers and data races. This safety is achieved without sacrificing performance, as Rust provides low-level control and the ability to write code with minimal runtime overhead. Rust’s balance of safety and performance has made it popular for building safe performance-critical applications, like proxies.

We intentionally tried to stand on the shoulders of the giants with this project and avoid reinventing the wheel. Oxy heavily relies on open-source dependencies, with hyper and tokio being the backbone of the framework. Our philosophy is that we should pull from existing solutions as much as we can, allowing for faster iteration, but also use widely battle-tested code. If something doesn’t work for us, we try to collaborate with maintainers and contribute back our fixes and improvements. In fact, we now have two team members who are core team members of tokio and hyper projects.

Even though Oxy is a proprietary project, we try to give back some love to the open-source community without which the project wouldn’t be possible by open-sourcing some of the building blocks such as https://github.com/cloudflare/boring and https://github.com/cloudflare/quiche.

The road to implementation

At the beginning of our journey, we set out to implement a proof-of-concept  for an HTTP firewall using Rust for what would eventually become Zero Trust Gateway product. This project was originally part of the WARP service repository. However, as the PoC rapidly advanced, it became clear that it needed to be separated into its own Gateway proxy for both technical and operational reasons.

Later on, when tasked with implementing a relay proxy for iCloud Private Relay, we saw the opportunity to reuse much of the code from the Gateway proxy. The Gateway project could also benefit from the HTTP/3 support that was being added for the Private Relay project. In fact, early iterations of the relay service were forks of the Gateway server.

It was then that we realized we could extract common elements from both projects to create a new framework, Oxy. The history of Oxy can be traced back to its origins in the commit history of the Gateway and Private Relay projects, up until its separation as a standalone framework.

Since our inception, we have leveraged the power of Oxy to efficiently roll out multiple projects that would have required a significant amount of time and effort without it. Our iterative development approach has been a strength of the project, as we have been able to identify common, reusable components through hands-on testing and implementation.

Our small core team is supplemented by internal contributors from across the company, ensuring that the best subject-matter experts are working on the relevant parts of the project. This contribution model also allows us to shape the framework’s API to meet the functional and ergonomic needs of its users, while the core team ensures that the project stays on track.

Relation to Pingora

Although Pingora, another proxy server developed by us in Rust, shares some similarities with Oxy, it was intentionally designed as a separate proxy server with a different objective. Pingora was created to serve traffic from millions of our client’s upstream servers, including those with ancient and unusual configurations. Non-UTF 8 URLs or TLS settings that are not supported by most TLS libraries being just a few such quirks among many others. This focus on handling technically challenging unusual configurations sets Pingora apart from other proxy servers.

The concept of Pingora came about during the same period when we were beginning to develop Oxy, and we initially considered merging the two projects. However, we quickly realized that their objectives were too different to do that. Pingora is specifically designed to establish Cloudflare’s HTTP connectivity with the Internet, even in its most technically obscure corners. On the other hand, Oxy is a multipurpose platform that supports a wide variety of communication protocols and aims to provide a simple way to develop high-performance proxy applications with business logic.

Conclusion

Oxy is a proxy framework that we have developed to meet the demanding needs of modern services. It has been designed  to provide a flexible and scalable solution that can be adapted to meet the unique requirements of each project and by leveraging the power of Rust, we made it both safe and fast.

Looking forward, Oxy is poised to play one of the critical roles in our company’s larger effort to modernize and improve our architecture. It provides a solid block in foundation on which we can keep building the better Internet.

As the framework continues to evolve and grow, we remain committed to our iterative approach to development, constantly seeking out new opportunities to reuse existing solutions and improve our codebase. This collaborative, community-driven approach has already yielded impressive results, and we are confident that it will continue to drive the future success of Oxy.

Stay tuned for more tech savvy blog posts on the subject!

Incremental adoption of micro-frontends with Cloudflare Workers

Post Syndicated from Peter Bacon Darwin original https://blog.cloudflare.com/fragment-piercing/

Incremental adoption of micro-frontends with Cloudflare Workers

Bring micro-frontend benefits to legacy Web applications

Incremental adoption of micro-frontends with Cloudflare Workers

Recently, we wrote about a new fragment architecture for building Web applications that is fast, cost-effective, and scales to the largest projects, while enabling a fast iteration cycle. The approach uses multiple collaborating Cloudflare Workers to render and stream micro-frontends into an application that is interactive faster than traditional client-side approaches, leading to better user experience and SEO scores.

This approach is great if you are starting a new project or have the capacity to rewrite your current application from scratch. But in reality most projects are too large to be rebuilt from scratch and can adopt architectural changes only in an incremental way.

In this post we propose a way to replace only selected parts of a legacy client-side rendered application with server-side rendered fragments. The result is an application where the most important views are interactive sooner, can be developed independently, and receive all the benefits of the micro-frontend approach, while avoiding large rewrites of the legacy codebase. This approach is framework-agnostic; in this post we demonstrate fragments built with React, Qwik, and SolidJS.

The pain of large frontend applications

Many large frontend applications developed today fail to deliver good user experience. This is often caused by architectures that require large amounts of JavaScript to be downloaded, parsed and executed before users can interact with the application. Despite efforts to defer non-critical JavaScript code via lazy loading, and the use of server-side rendering, these large applications still take too long to become interactive and respond to the user’s inputs.

Furthermore, large monolithic applications can be complex to build and deploy. Multiple teams may be collaborating on a single codebase and the effort to coordinate testing and deployment of the project makes it hard to develop, deploy and iterate on individual features.

As outlined in our previous post, micro-frontends powered by Cloudflare Workers can solve these problems but converting an application monolith to a micro-frontend architecture can be difficult and expensive. It can take months, or even years, of engineering time before any benefits are perceived by users or developers.

What we need is an approach where a project can incrementally adopt micro-frontends into the most impactful parts of the application incrementally, without needing to rewrite the whole application in one go.

Fragments to the rescue

The goal of a fragment based architecture is to significantly decrease loading and interaction latency for large web applications (as measured via Core Web Vitals) by breaking the application into micro-frontends that can be quickly rendered (and cached) in Cloudflare Workers. The challenge is how to integrate a micro-frontend fragment into a legacy client-side rendered application with minimal cost to the original project.

The technique we propose allows us to convert the most valuable parts of a legacy application’s UI, in isolation from the rest of the application.

It turns out that, in many applications, the most valuable parts of the UI are often nested within an application “shell” that provides header, footer, and navigational elements. Examples of these include a login form, product details panel in an e-commerce application, the inbox in an email client, etc.

Let’s take a login form as an example. If it takes our application several seconds to display the login form, the users will dread logging in, and we might lose them. We can however convert the login form into a server-side rendered fragment, which is displayed and interactive immediately, while the rest of the legacy application boots up in the background. Since the fragment is interactive early, the user can even submit their credentials before the legacy application has started and rendered the rest of the page.

Animation showing the login form being available before the main application

This approach enables engineering teams to deliver valuable improvements to users in just a fraction of the time and engineering cost compared to traditional approaches, which either sacrifice user experience improvements, or require a lengthy and high-risk rewrite of the entire application. It allows teams with monolithic single-page applications to adopt a micro-frontend architecture incrementally, target the improvements to the most valuable parts of the application, and therefore front-load the return on investment.

An interesting challenge in extracting parts of the UI into server-side rendered fragments is that, once displayed in the browser, we want the legacy application and the fragments to feel like a single application. The fragments should be neatly embedded within the legacy application shell, keeping the application accessible by correctly forming the DOM hierarchy, but we also want the server-side rendered fragments to be displayed and become interactive as quickly as possible — even before the legacy client-side rendered application shell comes into existence. How can we embed UI fragments into an application shell that doesn’t exist yet? We resolved this problem via a technique we devised, which we call “fragment piercing”.

Fragment piercing

Fragment piercing combines HTML/DOM produced by server-side rendered micro-frontend fragments with HTML/DOM produced by a legacy client-side rendered application.

The micro-frontend fragments are rendered directly into the top level of the HTML response, and are designed to become immediately interactive. In the background, the legacy application is client-side rendered as a sibling of these fragments. When it is ready, the fragments are “pierced” into the legacy application – the DOM of each fragment is moved to its appropriate place within the DOM of the legacy application – without causing any visual side effects, or loss of client-side state, such as focus, form data, or text selection. Once “pierced”, a fragment can begin to communicate with the legacy application, effectively becoming an integrated part of it.

Here, you can see a “login” fragment and the empty legacy application “root” element at the top level of the DOM, before piercing.

<body>
  <div id="root"></div>
  <piercing-fragment-host fragment-id="login">
    <login q:container...>...</login>
  </piercing-fragment-host>
</body>

And here you can see that the fragment has been pierced into the “login-page” div in the rendered legacy application.

<body>
  <div id="root">
    <header>...</header>
    <main>
      <div class="login-page">
        <piercing-fragment-outlet fragment-id="login">
          <piercing-fragment-host fragment-id="login">
            <login  q:container...>...</login>
          </piercing-fragment-host>
        </piercing-fragment-outlet>
      </div>
    </main>
    <footer>...</footer>
  </div>
</body>

To keep the fragment from moving and causing a visible layout shift during this transition, we apply CSS styles that position the fragment in the same way before and after piercing.

At any time an application can be displaying any number of pierced fragments, or none at all. This technique is not limited only to the initial load of the legacy application. Fragments can also be added to and removed from an application, at any time. This allows fragments to be rendered in response to user interactions and client-side routing.

With fragment piercing, you can start to incrementally adopt micro-frontends, one fragment at a time. You decide on the granularity of fragments, and which parts of the application to turn into fragments. The fragments don’t all have to use the same Web framework, which can be useful when switching stacks, or during a post-acquisition integration of multiple applications.

The “Productivity Suite” demo

As a demonstration of fragment piercing and incremental adoption we have developed a “productivity suite” demo application that allows users to manage to-do lists, read hacker news, etc. We implemented the shell of this application as a client-side rendered React application — a common tech choice in corporate applications. This is our “legacy application”. There are three routes in the application that have been updated to use micro-frontend fragments:

  • /login – a simple dummy login form with client-side validation, displayed when users are not authenticated (implemented in Qwik).
  • /todos – manages one or more todo lists, implemented as two collaborating fragments:
    • Todo list selector – a component for selecting/creating/deleting Todo lists (implemented in Qwik).
    • Todo list editor – a clone of the TodoMVC app (implemented in React).
  • /news – a clone of the HackerNews demo (implemented in SolidJS).

This demo showcases that different independent technologies can be used for both the legacy application and for each of the fragments.

Incremental adoption of micro-frontends with Cloudflare Workers
A visualization of the fragments that are pierced into the legacy application

The application is deployed at https://productivity-suite.web-experiments.workers.dev/.

To try it out, you first need to log in – simply use any username you like (no password needed). The user’s data is saved in a cookie, so you can log out and back in using the same username. After you’ve logged in, navigate through the various pages using the navigation bar at the top of the application. In particular, take a look at the “Todo Lists” and “News” pages to see the piercing in action.

At any point, try reloading the page to see that fragments are rendered instantly while the legacy application loads slowly in the background. Try interacting with the fragments even before the legacy application has appeared!

At the very top of the page there are controls to let you see the impact of fragment piercing in action.

Incremental adoption of micro-frontends with Cloudflare Workers
  • Use the “Legacy app bootstrap delay” slider to set the simulated delay before the legacy application starts.
  • Toggle “Piercing Enabled” to see what the user experience would be if the app did not use fragments.
  • Toggle “Show Seams” to see where each fragment is on the current page.

How it works

The application is composed of a number of building blocks.

Incremental adoption of micro-frontends with Cloudflare Workers
An overview of the collaborating Workers and legacy application host

The Legacy application host in our demo serves the files that define the client-side React application (HTML, JavaScript and stylesheets). Applications built with other tech stacks would work just as well. The Fragment Workers host the micro-frontend fragments, as described in our previous fragment architecture post. And the Gateway Worker handles requests from the browser, selecting, fetching and combining response streams from the legacy application and micro-frontend fragments.

Once these pieces are all deployed, they work together to handle each request from the browser. Let’s look at what happens when you go to the `/login` route.

Incremental adoption of micro-frontends with Cloudflare Workers
The flow of requests when viewing the login page

The user navigates to the application and the browser makes a request to the Gateway Worker to get the initial HTML. The Gateway Worker identifies that the browser is requesting the login page. It then makes two parallel sub-requests – one to fetch the index.html of the legacy application, and another to request the server-side rendered login fragment. It then combines these two responses into a single response stream containing the HTML that is delivered to the browser.

The browser displays the HTML response containing the empty root element for the legacy application, and the server-side rendered login fragment, which is immediately interactive for the user.

The browser then requests the legacy application’s JavaScript. This request is proxied by the Gateway Worker to the Legacy application host. Similarly, any other assets for the legacy application or fragments get routed through the Gateway Worker to the legacy application host or appropriate Fragment Worker.

Once the legacy application’s JavaScript has been downloaded and executed, rendering the shell of the application in the process, the fragment piercing kicks in, moving the fragment into the appropriate place in the legacy application, while preserving all of its UI state.

While focussed on the login fragment to explain fragment piercing, the same ideas apply to the other fragments implemented in the /todos and /news routes.

The piercing library

Despite being implemented using different Web frameworks, all the fragments are integrated into the legacy application in the same way using helpers from a “Piercing Library”. This library is a collection of server-side and client-side utilities that we developed, for the demo, to handle integrating the legacy application with micro-frontend fragments. The main features of the library are the PiercingGateway class, fragment host and fragment outlet custom elements, and the MessageBus class.

PiercingGateway

The PiercingGateway class can be used to instantiate a Gateway Worker that handles all requests for our application’s HTML, JavaScript and other assets. The `PiercingGateway` routes requests through to the appropriate Fragment Workers or to the host of the Legacy Application. It also combines the HTML response streams from these fragments with the response from the legacy application into a single HTML stream that is returned to the browser.

Implementing a Gateway Worker is straightforward using the Piercing Library. Create a new gateway instance of PiercingGateway, passing it the URL to the legacy application host and a function to determine whether piercing is enabled for the given request. Export the gateway as the default export from the Worker script so that the Workers runtime can wire up its fetch() handler.

const gateway = new PiercingGateway<Env>({
  // Configure the origin URL for the legacy application.
  getLegacyAppBaseUrl: (env) => env.APP_BASE_URL,
  shouldPiercingBeEnabled: (request) => ...,
});
...

export default gateway;

Fragments can be registered by calling the registerFragment() method so that the gateway can automatically route requests for a fragment’s HTML and assets to its Fragment Worker. For example, registering the login fragment would look like:

gateway.registerFragment({
  fragmentId: "login",
  prePiercingStyles: "...",
  shouldBeIncluded: async (request) => !(await isUserAuthenticated(request)),
});

Fragment host and outlet

Routing requests and combining HTML responses in the Gateway Worker is only half of what makes piercing possible. The other half needs to happen in the browser where the fragments need to be pierced into the legacy application using the technique we described earlier.

The fragment piercing in the browser is facilitated by a pair of custom elements, the fragment host (<piercing-fragment-host>) and the fragment outlet (<piercing-fragment-outlet>).

The Gateway Worker wraps the HTML for each fragment in a fragment host. In the browser, the fragment host manages the life-time of the fragment and is used when moving the fragment’s DOM into position in the legacy application.

<piercing-fragment-host fragment-id="login">
  <login q:container...>...</login>
</piercing-fragment-host>

In the legacy application, the developer marks where a fragment should appear when it is pierced by adding a fragment outlet. Our demo application’s Login route looks as follows:

export function Login() {
  …
  return (
    <div className="login-page" ref={ref}>
      <piercing-fragment-outlet fragment-id="login" />
    </div>
  );
}

When a fragment outlet is added to the DOM, it searches the current document for its associated fragment host. If found, the fragment host and its contents are moved inside the outlet. If the fragment host is not found, the outlet will make a request to the gateway worker to fetch the fragment HTML, which is then streamed directly into the fragment outlet, using the writable-dom library (a small but powerful library developed by the MarkoJS team).

This fallback mechanism enables client-side navigation to routes that contain new fragments. This way fragments can be rendered in the browser via both initial (hard) navigation and client-side (soft) navigation.

Message bus

Unless the fragments in our application are completely presentational or self-contained, they also need to communicate with the legacy application and other fragments. The MessageBus is a simple asynchronous, isomorphic, and framework-agnostic communication bus that the legacy application and each of the fragments can access.

In our demo application the login fragment needs to inform the legacy application when the user has authenticated. This message dispatch is implemented in the Qwik LoginForm component as follows:

const dispatchLoginEvent = $(() => {
  getBus(ref.value).dispatch("login", {
    username: state.username,
    password: state.password,
  });
  state.loading = true;
});

The legacy application can then listen for these messages like this:

useEffect(() => {
  return getBus().listen<LoginMessage>("login", async (user) => {
    setUser(user);
    await addUserDataIfMissing(user.username);
    await saveCurrentUser(user.username);
    getBus().dispatch("authentication", user);
    navigate("/", { replace: true, });
  });
}, []);

We settled on this message bus implementation because we needed a solution that was framework-agnostic, and worked well on both the server as well as client.

Give it a go!

With fragments, fragment piercing, and Cloudflare Workers, you can improve performance as well as the development cycle of legacy client-side rendered applications. These changes can be adopted incrementally, and you can even do so while implementing fragments with a Web framework for your choice.

The “Productivity Suite” application demonstrating these capabilities can be found at https://productivity-suite.web-experiments.workers.dev/.

All the code we have shown is open-source and published to Github: https://github.com/cloudflare/workers-web-experiments/tree/main/productivity-suite.

Feel free to clone the repository. It is easy to run locally and even deploy your own version (for free) to Cloudflare. We tried to make the code as reusable as possible. Most of the core logic is in the piercing library that you could try in your own projects. We’d be thrilled to receive feedback, suggestions, or hear about applications you’d like to use it for. Join our GitHub discussion or also reach us on our discord channel.

We believe that combining Cloudflare Workers with the latest ideas from frameworks will drive the next big steps forward in improved experiences for both users and developers in Web applications. Expect to see more demos, blog posts and collaborations as we continue to push the boundaries of what the Web can offer. And if you’d also like to be directly part of this journey, we are also happy to share that we are hiring!

Cloudflare Workers and micro-frontends: made for one another

Post Syndicated from Peter Bacon Darwin original https://blog.cloudflare.com/better-micro-frontends/

Cloudflare Workers and micro-frontends: made for one another

To help developers build better web applications we researched and devised a fragments architecture to build micro-frontends using Cloudflare Workers that is lightning fast, cost-effective to develop and operate, and scales to the needs of the largest enterprise teams without compromising release velocity or user experience.

Here we share a technical overview and a proof of concept of this architecture.

Why micro-frontends?

One of the challenges of modern frontend web development is that applications are getting bigger and more complex. This is especially true for enterprise web applications supporting e-commerce, banking, insurance, travel, and other industries, where a unified user interface provides access to a large amount of functionality. In such projects it is common for many teams to collaborate to build a single web application. These monolithic web applications, usually built with JavaScript technologies like React, Angular, or Vue, span thousands, or even millions of lines of code.

When a monolithic JavaScript architecture is used with applications of this scale, the result is a slow and fragile user experience with low Lighthouse scores. Furthermore, collaborating development teams often struggle to maintain and evolve their parts of the application, as their fates are tied with fates of all the other teams, so the mistakes and tech debt of one team often impacts all.

Drawing on ideas from microservices, the frontend community has started to advocate for micro-frontends to enable teams to develop and deploy their features independently of other teams. Each micro-frontend is a self-contained mini-application, that can be developed and released independently, and is responsible for rendering a “fragment” of the page. The application then combines these fragments together so that from the user’s perspective it feels like a single application.

Cloudflare Workers and micro-frontends: made for one another
An application consisting of multiple micro-frontends

Fragments could represent vertical application features, like “account management” or “checkout”, or horizontal features, like “header” or “navigation bar”.

Client-side micro-frontends

A common approach to micro-frontends is to rely upon client-side code to lazy load and stitch fragments together (e.g. via Module Federation). Client-side micro-frontend applications suffer from a number of problems.

Common code must either be duplicated or published as a shared library. Shared libraries are problematic themselves. It is not possible to tree-shake unused library code at build time resulting in more code than necessary being downloaded to the browser and coordinating between teams when shared libraries need to be updated can be complex and awkward.

Also, the top-level container application must bootstrap before the micro-frontends can even be requested, and they also need to boot before they become interactive. If they are nested, then you may end up getting a waterfall of requests to get micro-frontends leading to further runtime delays.

These problems can result in a sluggish application startup experience for the user.

Server-side rendering could be used with client-side micro-frontends to improve how quickly a browser displays the application but implementing this can significantly increase the complexity of development, deployment and operation. Furthermore, most server-side rendering approaches still suffer from a hydration delay before the user can fully interact with the application.

Addressing these challenges was the main motivation for exploring an alternative solution, which relies on the distributed, low latency properties provided by Cloudflare Workers.

Micro-frontends on Cloudflare Workers

Cloudflare Workers is a compute platform that offers a highly scalable, low latency JavaScript execution environment that is available in over 275 locations around the globe. In our exploration we used Cloudflare Workers to host and render micro-frontends from anywhere on our global network.

Fragments architecture

In this architecture the application consists of a tree of “fragments” each deployed to Cloudflare Workers that collaborate to server-side render the overall response. The browser makes a request to a “root fragment”, which will communicate with “child fragments” to generate the final response. Since Cloudflare Workers can communicate with each other with almost no overhead, applications can be server-side rendered quickly by child fragments, all working in parallel to render their own HTML, streaming their results to the parent fragment, which combines them into the final response stream delivered to the browser.

Cloudflare Workers and micro-frontends: made for one another
A high-level overview of a fragments architecture

We have built an example of a “Cloud Gallery” application to show how this can work in practice. It is deployed to Cloudflare Workers at  https://cloud-gallery.web-experiments.workers.dev/

The demo application is a simple filtered gallery of cloud images built using our fragments architecture. Try selecting a tag in the type-ahead to filter the images listed in the gallery. Then change the delay on the stream of cloud images to see how the type-ahead filtering can be interactive before the page finishes loading.

Multiple Cloudflare Workers

The application is composed of a tree of six collaborating but independently deployable Cloudflare Workers, each rendering their own fragment of the screen and providing their own client-side logic, and assets such as CSS stylesheets and images.

Cloudflare Workers and micro-frontends: made for one another
Architectural overview of the Cloud Gallery app

The “main” fragment acts as the root of the application. The “header” fragment has a slider to configure an artificial delay to the display of gallery images. The “body” fragment contains the “filter” fragment and “gallery” fragments. Finally, the “footer” fragment just shows some static content.

The full source code of the demo app is available on GitHub.

Benefits and features

This architecture of multiple collaborating server-side rendered fragments, deployed to Cloudflare Workers has some interesting features.

Encapsulation

Fragments are entirely encapsulated, so they can control what they own and what they make available to other fragments.

Fragments can be developed and deployed independently

Updating one of the fragments is as simple as redeploying that fragment. The next request to the main application will use the new fragment. Also, fragments can host their own assets (client-side JavaScript, images, etc.), which are streamed through their parent fragment to the browser.

Server-only code is not sent to the browser

As well as reducing the cost of downloading unnecessary code to the browser, security sensitive code that is only needed for server-side rendering the fragment is never exposed to other fragments and is not downloaded to the browser. Also, features can be safely hidden behind feature flags in a fragment, allowing more flexibility with rolling out new behavior safely.

Composability

Fragments are fully composable – any fragment can contain other fragments. The resulting tree structure gives you more flexibility in how you architect and deploy your application. This helps larger projects to scale their development and deployment. Also, fine-grain control over how fragments are composed, could allow fragments that are expensive to server-side render to be cached individually.

Fantastic Lighthouse scores

Streaming server-rendered HTML results in great user experiences and Lighthouse scores, which in practice means happier users and higher chance of conversions for your business.

Cloudflare Workers and micro-frontends: made for one another
Lighthouse scores for the Cloud Gallery app

Each fragment can parallelize requests to its child fragments and pipe the resulting HTML streams into its own single streamed server-side rendered response. Not only can this reduce the time to render the whole page but streaming each fragment through to the browser reduces the time to the first byte of each fragment.

Eager interactivity

One of the powers of a fragments architecture is that fragments can become interactive even while the rest of the application (including other fragments) is still being streamed down to the browser.

In our demo, the “filter” fragment is immediately interactive as soon as it is rendered, even if the image HTML for the “gallery” fragment is still loading.

To make it easier to see this, we added a slider to the top of the “header” that can simulate a network or database delay that slows down the HTML stream which renders the “gallery” images. Even when the “gallery” fragment is still loading, the type-ahead input, in the “filter” fragment, is already fully interactive.

Just think of all the frustration that this eager interactivity could avoid for web application users with unreliable Internet connection.

Under the hood

As discussed already this architecture relies upon deploying this application as many cooperating Cloudflare Workers. Let’s look into some details of how this works in practice.

We experimented with various technologies, and while this approach can be used with many frontend libraries and frameworks, we found the Qwik framework to be a particularly good fit, because of its HTML-first focus and low JavaScript overhead, which avoids any hydration problems.

Implementing a fragment

Each fragment is a server-side rendered Qwik application deployed to its own Cloudflare Worker. This means that you can even browse to these fragments directly. For example, the “header” fragment is deployed to https://cloud-gallery-header.web-experiments.workers.dev/.

Cloudflare Workers and micro-frontends: made for one another
A screenshot of the self-hosted “header” fragment

The header fragment is defined as a Header component using Qwik. This component is rendered in a Cloudflare Worker via a fetch() handler:

export default {
  fetch(request: Request, env: Record<string, unknown>): Promise<Response> {
    return renderResponse(request, env, <Header />, manifest, "header");
  },
};

cloud-gallery/header/src/entry.ssr.tsx

The renderResponse() function is a helper we wrote that server-side renders the fragment and streams it into the body of a Response that we return from the fetch() handler.

The header fragment serves its own JavaScript and image assets from its Cloudflare Worker. We configure Wrangler to upload these assets to Cloudflare and serve them from our network.

Implementing fragment composition

Fragments that contain child fragments have additional responsibilities:

  • Request and inject child fragments when rendering their own HTML.
  • Proxy requests for child fragment assets through to the appropriate fragment.

Injecting child fragments

The position of a child fragment inside its parent can be specified by a FragmentPlaceholder helper component that we have developed. For example, the “body” fragment has the “filter” and “gallery” fragments.

<div class="content">
  <FragmentPlaceholder name="filter" />
  <FragmentPlaceholder name="gallery" />
</div>

cloud-gallery/body/src/root.tsx

The FragmentPlaceholder component is responsible for making a request for the fragment and piping the fragment stream into the output stream.

Proxying asset requests

As mentioned earlier, fragments can host their own assets, especially client-side JavaScript files. When a request for an asset arrives at the parent fragment, it needs to know which child fragment should receive the request.

In our demo we use a convention that such asset paths will be prefixed with /_fragment/<fragment-name>. For example, the header logo image path is /_fragment/header/cf-logo.png. We developed a tryGetFragmentAsset() helper which can be added to the parent fragment’s fetch() handler to deal with this:

export default {
  async fetch(
    request: Request,
    env: Record<string, unknown>
  ): Promise<Response> {
    // Proxy requests for assets hosted by a fragment.
    const asset = await tryGetFragmentAsset(env, request);
    if (asset !== null) {
      return asset;
    }
    // Otherwise server-side render the template injecting child fragments.
    return renderResponse(request, env, <Root />, manifest, "div");
  },
};

cloud-gallery/body/src/entry.ssr.tsx

Fragment asset paths

If a fragment hosts its own assets, then we need to ensure that any HTML it renders uses the special _fragment/<fragment-name> path prefix mentioned above when referring to these assets. We have implemented a strategy for this in the helpers we developed.

The FragmentPlaceholder component adds a base searchParam to the fragment request to tell it what this prefix should be. The renderResponse() helper extracts this prefix and provides it to the Qwik server-side renderer. This ensures that any request for client-side JavaScript has the correct prefix. Fragments can apply a hook that we developed called useFragmentRoot(). This allows components to gather the prefix from a FragmentContext context.

For example, since the “header” fragment hosts the Cloudflare and GitHub logos as assets, it must call the useFragmentRoot() hook:

export const Header = component$(() => {
  useStylesScoped$(HeaderCSS);
  useFragmentRoot();

  return (...);
});

cloud-gallery/header/src/root.tsx

The FragmentContext value can then be accessed in components that need to apply the prefix. For example, the Image component:

export const Image = component$((props: Record<string, string | number>) => {
  const { base } = useContext(FragmentContext);
  return <img {...props} src={base + props.src} />;
});

cloud-gallery/helpers/src/image/image.tsx

Service-binding fragments

Cloudflare Workers provide a mechanism called service bindings to make requests between Cloudflare Workers efficiently that avoids network requests. In the demo we use this mechanism to make the requests from parent fragments to their child fragments with almost no performance overhead, while still allowing the fragments to be independently deployed.

Comparison to current solutions

This fragments architecture has three properties that distinguish it from other current solutions.

Unlike monoliths, or client-side micro-frontends, fragments are developed and deployed as independent server-side rendered applications that are composed together on the server-side. This significantly improves rendering speed, and lowers interaction latency in the browser.

Unlike server-side rendered micro-frontends with Node.js or cloud functions, Cloudflare Workers is a globally distributed compute platform with a region-less deployment model. It has incredibly low latency, and a near-zero communication overhead between fragments.

Unlike solutions based on module federation, a fragment’s client-side JavaScript is very specific to the fragment it is supporting. This means that it is small enough that we don’t need to have shared library code, eliminating the version skew issues and coordination problems when updating shared libraries.

Future possibilities

This demo is just a proof of concept, so there are still areas to investigate. Here are some of the features we’d like to explore in the future.

Caching

Each micro-frontend fragment can be cached independently of the others based on how static its content is. When requesting the full page, the fragments only need to run server-side rendering for micro-frontends that have changed.

Cloudflare Workers and micro-frontends: made for one another
An application where the output of some fragments are cached

With per-fragment caching you can return the HTML response to the browser faster, and avoid incurring compute costs in re-rendering content unnecessarily.

Fragment routing and client-side navigation

Our demo application used micro-frontend fragments to compose a single page. We could however use this approach to implement page routing as well. When server-side rendering, the main fragment could insert the appropriate “page” fragment based on the visited URL. When navigating, client-side, within the app, the main fragment would remain the same while the displayed “page” fragment would change.

Cloudflare Workers and micro-frontends: made for one another
An application where each route is delegated to a different fragment

This approach combines the best of server-side and client-side routing with the power of fragments.

Using other frontend frameworks

Although the Cloud Gallery application uses Qwik to implement all fragments, it is possible to use other frameworks as well. If really necessary, it’s even possible to mix and match frameworks.

To achieve good results, the framework of choice should be capable of server-side rendering, and should have a small client-side JavaScript footprint. HTML streaming capabilities, while not required, can significantly improve performance of large applications.

Cloudflare Workers and micro-frontends: made for one another
An application using different frontend frameworks

Incremental migration strategies

Adopting a new architecture, compute platform, and deployment model is a lot to take in all at once, and for existing large applications is prohibitively risky and expensive. To make this  fragment-based architecture available to legacy projects, an incremental adoption strategy is a key.

Developers could test the waters by migrating just a single piece of the user-interface within their legacy application to a fragment, integrating with minimal changes to the legacy application. Over time, more of the application could then be moved over, one fragment at a time.

Convention over configuration

As you can see in the Cloud Gallery demo application, setting up a fragment-based micro-frontend requires quite a bit of configuration. A lot of this configuration is very mechanical and could be abstracted away via conventions and better tooling. Following productivity-focused precedence found in Ruby on Rails, and filesystem based routing meta-frameworks, we could make a lot of this configuration disappear.

Try it yourself!

There is still so much to dig into! Web applications have come a long way in recent years and their growth is hard to overstate. Traditional implementations of micro-frontends have had only mixed success in helping developers scale development and deployment of large applications. Cloudflare Workers, however, unlock new possibilities which can help us tackle many of the existing challenges and help us build better web applications.

Thanks to the generous free plan offered by Cloudflare Workers, you can check out the Gallery Demo code and deploy it yourself.

If all of these sounds interesting to you, and you would like to work with us on improving the developer experience for Cloudflare Workers, we are also happy to share that we are hiring!

Let’s Architect! Architecting for the edge

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-architecting-for-the-edge/

Edge computing comprises elements of geography and networking and brings computing closer to the end users of the application.

For example, using a content delivery network (CDN) such as AWS CloudFront can help video streaming providers reduce latency for distributing their material by taking advantage of caching at the edge. Another example might look like an Internet of Things (IoT) solution that helps a company run business logic in remote areas or with low latency.

IoT is a challenging field because there are multiple aspects to consider as architects, like hardware, protocols, networking, and software. All of these aspects must be designed to interact together and be fault tolerant.

In this edition of Let’s Architect!, we share resources that are helpful for teams that are approaching or expanding their workloads for edge computing We cover macro topics such as security, best practices for IoT, patterns for machine learning (ML), and scenarios with strict latency requirements.

Build Machine Learning at the edge applications

In Let’s Architect! Architecting for Machine Learning, we touched on some of the most relevant aspects to consider while putting ML into production. However, in many scenarios, you may also have specific constraints like latency or a lack of connectivity that require you to design a deployment at the edge.

This blog post considers a solution based on ML applied to agriculture, where a reliable connection to the Internet is not always available. You can learn from this scenario, which includes information from model training to deployment, to design your ML workflows for the edge. The solution uses Amazon SageMaker in the cloud to explore, train, package, and deploy the model to AWS IoT Greengrass, which is used for inference at the edge.

 High-level architecture of the components that reside on the farm and how they interact with the cloud environment

High-level architecture of the components that reside on the farm and how they interact with the cloud environment

Security at the edge

Security is one of the fundamental pillars described in the AWS Well-Architected Framework. In all organizations, security is a major concern both for the business and the technical stakeholders. It impacts the products they are building and the perception that customers have.

We covered security in Let’s Architect! Architecting for Security, but we didn’t focus specifically on edge technologies. This whitepaper shows approaches for implementing a security strategy at the edge, with a focus on describing how AWS services can be used. You can learn how to secure workloads designed for content delivery, as well as how to implement network protection to defend against DDoS attacks and protect your IoT solutions.

The AWS Well-Architected Tool is designed to help you review the state of your applications and workloads. It provides a central place for architectural best practices and guidance

The AWS Well-Architected Tool is designed to help you review the state of your applications and workloads. It provides a central place for architectural best practices and guidance

AWS Outposts High Availability Design and Architecture Considerations

AWS Outposts allows companies to run some AWS services on-premises, which may be crucial to comply with strict data residency or low latency requirements. With Outposts, you can deploy servers and racks from AWS directly into your data center.

This whitepaper introduces architectural patterns, anti-patterns, and recommended practices for building highly available systems based on Outposts. You will learn how to manage your Outposts capacity and use networking and data center facility services to set up highly available solutions. Moreover, you can learn from mental models that AWS engineers adopted to consider the different failure modes and the corresponding mitigations, and apply the same models to your architectural challenges.

An Outpost deployed in a customer data center and connected back to its anchor Availability Zone and parent Region

An Outpost deployed in a customer data center and connected back to its anchor Availability Zone and parent Region

AWS IoT Lens

The AWS Well-Architected Lenses are designed for specific industry or technology scenarios. When approaching the IoT domain, the AWS IoT Lens is a key resource to learn the best practices to adopt for IoT. This whitepaper breaks down the IoT workloads into the different subdomains (for example, communication, ingestion) and maps the AWS services for IoT with each specific challenge in the corresponding subdomain.

As architects and developers, we tend to automate and reduce the risk of human errors, so the IoT Lens Checklist is a great resource to review your workloads by following a structured approach.

Workload context checklist from the IoT Lens Checklist

Workload context checklist from the IoT Lens Checklist

See you next time!

Thanks for joining our discussion on architecting for the edge! See you in two weeks when we talk about database architectures on AWS.

Other posts in this series

Looking for more architecture content?

AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Packet captures at the edge

Post Syndicated from Annika Garbers original https://blog.cloudflare.com/packet-captures-at-edge/

Packet captures at the edge

Packet captures at the edge

Packet captures are a critical tool used by network and security engineers every day. As more network functions migrate from legacy on-prem hardware to cloud-native services, teams risk losing the visibility they used to get by capturing 100% of traffic funneled through a single device in a datacenter rack. We know having easy access to packet captures across all your network traffic is important for troubleshooting problems and deeply understanding traffic patterns, so today, we’re excited to announce the general availability of on-demand packet captures from Cloudflare’s global network.

What are packet captures and how are they used?

A packet capture is a file that contains all packets that were seen by a particular network box, usually a firewall or router, during a specific time frame. Packet captures are a powerful and commonly used tool for debugging network issues or getting better visibility into attack traffic to tighten security (e.g. by adding firewall rules to block a specific attack pattern).

A network engineer might use a pcap file in combination with other tools, like mtr, to troubleshoot problems with reachability to their network. For example, if an end user reports intermittent connectivity to a specific application, an engineer can set up a packet capture filtered to the user’s source IP address to record all packets received from their device. They can then analyze that packet capture and compare it to other sources of information (e.g. pcaps from the end user’s side of the network path, traffic logs and analytics) to understand the magnitude and isolate the source of the problem.

Security engineers can also use packet captures to gain a better understanding of potentially malicious traffic. Let’s say an engineer notices an unexpected spike in traffic that they suspect could be an attempted attack. They can grab a packet capture to record the traffic as it’s hitting their network and analyze it to determine whether the packets are valid. If they’re not, for example, if the packet payload is randomly generated gibberish, the security engineer can create a firewall rule to block traffic that looks like this from entering their network.

Packet captures at the edge
Example of a packet capture from a recent DDoS attack targeted at Cloudflare infrastructure. The contents of this pcap can be used to create a “signature” to block the attack.

Fragmenting traffic creates gaps in visibility

Traditionally, users capture packets by logging into their router or firewall and starting a process like tcpdump. They’d set up a filter to only match on certain packets and grab the file. But as networks have become more fragmented and users are moving security functions out to the edge, it’s become increasingly challenging to collect packet captures for relevant traffic. Instead of just one device that all traffic flows through (think of a drawbridge in the “castle and moat” analogy) engineers may have to capture packets across many different physical and virtual devices spread across locations. Many of these packets may not allow taking pcaps at all, and then users have to try to  stitch them back together to create a full picture of their network traffic. This is a nearly impossible task today and only getting harder as networks become more fractured and complex.

Packet captures at the edge

On-demand packet captures from the Cloudflare global network

With Cloudflare, you can regain this visibility. With Magic Transit and Magic WAN, customers route all their public and private IP traffic through Cloudflare’s network to make it more secure, faster, and more reliable, but also to increase visibility. You can think of Cloudflare like a giant, globally distributed version of the drawbridge in our old analogy: because we act as a single cloud-based router and firewall across all your traffic, we can capture packets across your entire network and deliver them back to you in one place.

How does it work?

Customers can request a packet capture using our Packet Captures API. To get the packets you’re looking for you can provide a filter with the IP address, ports, and protocol of the packets you want.

curl -X POST https://api.cloudflare.com/client/v4/accounts/${account_id}/pcaps \
-H 'Content-Type: application/json' \
-H 'X-Auth-Email: [email protected]' \
-H 'X-Auth-Key: 00000000000' \
--data '{
        "filter_v1": {
               "source_address": "1.2.3.4",
               "protocol": 6
        },
        "time_limit": 300,
        "byte_limit": "10mb",
        "packet_limit": 10000,
        "type": "simple",
        "system": "magic-transit"
}'

Example of a request for packet capture using our API.

We leverage nftables to apply the filter to the customer’s incoming packets and log them using nflog:

table inet pcaps_1 {
    chain pcap_1 {
        ip protocol 6 ip saddr 1.2.3.4 log group 1 comment “packet capture”
    }
}

Example nftables configuration used to filter log customer packets

nflog creates a netfilter socket through which logs of a packet are sent from the Linux kernel to user space. In user space, we use tcpdump to read packets off the netfilter socket and generate a packet capture file:

tcpdump -i nflog:1 -w pcap_1.pcap

Example tcpdump command to create a packet capture file.

Usually tcpdump is used by listening to incoming packets on a network interface, but in our case we configure it to read packet logs from an nflog group. tcpdump will convert the packet logs into a packet capture file.

Once we have a packet capture file, we need to deliver it to customers. Because packet capture files can be large and contain sensitive information (e.g. packet payloads), we send them to customers directly from our machines to a cloud storage service of their choice. This means we never store sensitive data, and it’s easy for customers to manage and store these large files.

Get started today

On-demand packet captures are now generally available for customers who have purchased the Advanced features of Magic Firewall. The packet capture API allows customers to capture the first 160 bytes of packets, sampled at a default rate of 1/100. More functionality including full packet captures and on-demand packet capture control in the Cloudflare Dashboard is coming in the following weeks. Contact your account team to stay updated on the latest!

New – Securely manage your AWS IoT Greengrass edge devices using AWS Systems Manager

Post Syndicated from Sean M. Tracey original https://aws.amazon.com/blogs/aws/new-securely-manage-your-aws-iot-greengrass-edge-devices-using-aws-systems-manager/

A header image with the text AWS IoT Greengrass announces AWS System Manager

In 2020, we launched AWS IoT Greengrass 2.0, an open-source edge runtime and cloud service for building, deploying, and managing device software and applications. Today, we’re very excited to announce the ability to securely manage your AWS IoT Greengrass edge devices using AWS Systems Manager (SSM).

Managing vast fleets of varying systems and applications remotely can be a challenge for administrators of edge devices. AWS IoT Greengrass was built to enable these administrators to manage their edge device application stack. While this addressed the needs of many typical edge device administrators, system software on these devices still needed to be updated and maintained through operational policies consistent with those of their broader IT organizations. To this end, administrators would typically have to build or integrate tools to create a centralized interface for managing their edge and IT device software stacks – from security updates, to remote access, and operating system patches.

Until today, IT administrators have had to build or integrate custom tools to make sure edge devices can be managed alongside EC2 and on-prem instances, through a consistent set of policies. At scale, managing device and systems software across a wide variety of edge and IT systems becomes a significant investment in time and money. This is time that could be better spent deploying, optimizing, and managing the very edge devices that they’re maintaining.

What’s New?
Today, we have integrated IoT Greengrass and Systems Manager to simplify the management and maintenance of system software for edge devices. When coupled with the AWS IoT Greengrass Client Software, edge device administrators now can remotely access and securely manage with the multitude of devices that they own – from OS patching, to application deployments. Additionally, regularly scheduled operations that maintain edge compute systems can be automated, all without the need for creating additional custom processes. For IT administrators, this release gives a complete overview of all of their devices through a centralized interface, and a consistent set of tools and policies with the AWS Systems Manager.

For customers new to the AWS IoT Greengrass platform, the integration with Systems Manager simplifies setup even further with a new on- boarding wizard that can reduce the time it takes to create operational management systems for edge devices from weeks to hours.

How is this achieved?
This new capability is enabled by the AWS Systems Manager (SSM) Agent. As of today, customers can deploy the AWS Systems Manager Agent, via the AWS IoT Greengrass console, to their existing edge devices. Once installed on each device, AWS Systems Manager will list all of the devices in the Systems Manager Console, thereby giving administrators and IoT stakeholders an overview of their entire fleet. When coupled with the AWS IoT Greengrass console, administrators can manage their newly configured devices remotely; patching or updating operating systems, troubleshooting remotely, and deploying new applications, all through a centralized, integrated user interface. Devices can be patched individually, or in groups organized by tags or resource groups.

Further information
These new features are now available in all regions where AWS Systems Manager and AWS IoT Greengrass are available. To get started, please visit the IoT Greengrass home page.

Computer Vision at the Edge with AWS Panorama

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/computer-vision-at-the-edge-with-aws-panorama/

Today, the AWS Panorama Appliance is generally available to all of you. The AWS Panorama Appliance is a computer vision (CV) appliance designed to be deployed on your network to analyze images provided by your on-premises cameras.

AWS Panorama Appliance

Every week, I read about new and innovative use cases for computer vision. Some customers are using CV to verify pallet trucks are parked in designated areas to ensure worker safety in warehouses, some are analyzing customer walking flows in retail stores to optimize space and product placement, and some are using it to recognize cats and mice, just to name a few.

AWS customers agree the cloud is the most convenient place to train computer vision models thanks to its virtually infinite access to storage and compute resources. In the cloud, data scientists have access to powerful tools such as Amazon SageMaker and a wide variety of compute resources and frameworks.

However, when it’s time to analyze images from one or multiple video feeds, many of you are telling us the cloud is not the place where you want to run such workloads. There are a number of reasons for that: sometimes the facilities where the images are captured do not have enough bandwidth to send video feeds to the cloud, some use cases require very low latency, or some just want to keep their images on premises and not send them for analysis outside of their network.

At re:Invent 2020, we announced the AWS Panorama Appliance and SDK to address these requirements.

AWS Panorama is a machine learning appliance and software development kit (SDK) that allows you to bring computer vision to on-premises cameras to make predictions locally with high accuracy and low latency. With the AWS Panorama Appliance, you can automate tasks that have traditionally required human inspection to improve visibility into potential issues. For example, you can use AWS Panorama Appliance to evaluate manufacturing quality, identify bottlenecks in industrial processes, and monitor workplace security even in environments with limited or no internet connectivity. The software development kit allows camera manufacturers to bring equivalent capabilities directly inside their IP camera.

As usual on this blog, I would like to walk you through the development and deployment of a computer vision application for the AWS Panorama Appliance. The demo application from this blog uses a machine learning model to recognise objects in frames of video from a network camera. The application loads a model onto the AWS Panorama Appliance, gets images from a camera, and runs those images through the model. The application then overlays the results on top of the original video and outputs it to a connected display. The application uses libraries provided by AWS Panorama to interact with input and output video streams and the model, no low level programming is required.

Let’s first define a few concepts. I borrowed the following definitions from the AWS Panorama documentation page.

Concepts
The AWS Panorama Appliance is the hardware that runs your applications. You use the AWS Panorama console or AWS SDKs to register an appliance, update its software, and deploy applications to it. The software that runs on the appliance discovers and connects to camera streams, sends frames of video to your application, and optionally displays video output on an attached display.

The appliance is an edge device. Instead of sending images to the AWS Cloud for processing, it runs applications locally on optimized hardware. This enables you to analyze video in real time and process the results with limited connectivity. The appliance only requires an internet connection to report its status, upload logs, and get software updates and deployments.

An application comprises multiple components called nodes, which represent cameras, models, code, or global variables. A node can be configuration only (inputs and outputs), or include artifacts (models and code). Application nodes are bundled in node packages that you upload to an S3 access point, where the AWS Panorama Appliance can access them. An application manifest is a configuration file that defines connections between the nodes.

A computer vision model is a machine learning network that is trained to process images. Computer vision models can perform various tasks such as classification, detection, segmentation, and tracking. A computer vision model takes an image as input and outputs information about the image or objects in the image.

AWS Panorama supports models built with Apache MXNet, DarkNet, GluonCV, Keras, ONNX, PyTorch, TensorFlow, and TensorFlow Lite. You can build models with Amazon SageMaker and import them from an Amazon Simple Storage Service (Amazon S3) bucket.

AWS Panorama Context

Now that we grasp the concepts, let’s get our hands on.

Unboxing Your AWS Panorama Appliance
In the box the service team sent me, I found the appliance itself (no surprise!), a power cord and two ethernet cables. The box also contains a USB key to initially configure the appliance. The device is designed to work in industrial environments. It has two ethernet ports next to the power connector on the back. On the front, protected behind a sliding door, I found a SD card reader, one HDMI connector and two USB ports. There is also a power button and a reset button to reinitialise the device to its factory state.

AWS Panorama Appliance Front Side AWS Panorama Appliance Back Side AWS Panorama USB key

Configuring Your Appliance
I first configured it for my network (cable + DHCP, but it also supports static IP configuration) and registered it to securely connect back to my AWS Account. To do so, I navigated to the AWS Management Console, entered my network configuration details. It generated a set of configuration files and certificates. I copied them to the appliance using the provided USB key. My colleague Martin Beeby shared screenshots of this process. The team slightly modified the screens based on the feedback they received during the preview, but I don’t think it is worth going through the step-by-step process again. Tip from the field: be sure to use the USB key provided in the box, it is correctly formatted and automatically recognised by the appliance (my own USB key was not recognized properly).

I then downloaded a sample application from the Panorama GitHub repository and tried it with the Test Utility for Panorama, also available on this GitHub (the test utility is an EC2 instance configured to act as a simulator). The Test Utility for Panorama uses Jupyter notebooks to quickly experiment with sample applications or your code before deploying it to the appliance. It also lists commands allowing you to deploy your applications to the appliance programmatically.

Panorama Command Line
The Panorama command line simplifies the operations to create a project, import assets, package it, and deploy it to the AWS Panorama Appliance. You can follow these instructions to download and install the Panorama command line.

When receiving an application developed by someone else, like the sample application, I have to replace AWS account IDs in all application files and directory names. I do this with one single command:

panorama-cli import-application

Application Structure
A Panorama application structure looks as follows:

├── assets
├── graphs
│   └── example_project
│       └── graph.json
└── packages
    ├── accountXYZ-model-1.0
    │   ├── descriptor.json
    │   └── package.json
    └── accountXYZ-sample-app-1.0
        ├── Dockerfile
        ├── descriptor.json
        ├── package.json
        └── src
            └── app.py

  • graph.json lists all the packages and nodes in this application. Nodes are the way to define an application in Panorama.
  • in each package package.json has details about the package and the assets it uses.
  • model package model has a descriptor.json which contains the metadata required for compiling the model.
  • container packagesample-app package contains the application code in the src directory and a Dockerfile to build the container. descriptor.json has details about which command and file to use when the container is launched.
  • assets directory is where all the assets reside, such as packaged code and compiled models. You should not make any changes in this directory.

Note that package names are prefixed with your account number.

When my application is ready, I build the container (I am using a Linux machine with Docker Engine and Docker CLI to avoid using Docker Desktop for macOS or Windows.)

$ panorama-cli build-container                               \
               --container-asset-name {container_asset_name} \ 
               --package-path packages/{account_id}-{package_name}-1.0 

A Note About the Cameras
AWS Panorama Appliance has a concept of “abstract cameras”. Abstract camera sources are placeholders that can be mapped to actual camera devices during application deployment. The Test Utility for Panorama allows you to map abstract cameras to video files for easy, repeatable tests.

Adding a ML Model
The AWS Panorama Appliance supports multiple ML Model frameworks. Models may be trained on Amazon SageMaker or any other solution of your choice. I downloaded my ML model from S3 and import it to my project:

panorama-cli add-raw-model                                                 \
    --model-asset-name {asset_name}                                        \
    --model-s3-uri s3://{S3_BUCKET}/{project_name}/{ML_MODEL_FNAME}.tar.gz \
    --descriptor-path {descriptor_path}                                    \
    --packages-path {package_path}

Behind the scenes, ML Models are compiled to optimise them to the Nvidia Accelerated Linux Arm64 architecture of the AWS Panorama Appliance.

Package the Application
Now that I have a ML model and my application code packaged in a container, I am ready to package my application assets for AWS Panorama Appliance:

panorama-cli package-application

This command uploads all my application assets to the AWS cloud account along with all the manifests.

Deploy the Application
Finally I deploy the application to the AWS Panorama Appliance. A deployment copies the application and its configuration, like camera stream selection, from the AWS cloud to my on-premise AWS Panorama Appliance. I may deploy my application programmatically using Python code (and the Boto3 SDK you might know already):


client = boto3.client('panorama')
client.create_application_instance(
    Name="AWS News Blog Sample Application",
    Description="An object detection app",
    ManifestPayload={
       'PayloadData': manifest         # <== this is the graph.json file content 
    },
    RuntimeRoleArn=role,               # <== this is a role that gives my app permissions to use AWS Services such as Cloudwatch
    DefaultRuntimeContextDevice=device # <== this is my device name 
)

Alternatively, I may use the AWS Management Console:

On Deployed applications, I select Deploy application.

AWS Panorama - Start the deployment

I copy and paste the content of graphs/<project name>/graph.json to the console and select Next.

AWS Panorama - Copy Graph JSON

I give my application a name and an optional description. I select Proceed to deploy.

AWS Panoram Deploy - Application Name

The next steps are

  • declare an IAM role to give permissions to my application to use AWS Service. The minimal permissions set allows to call the PuMetricData API on CloudWatch.
  • select the AWS Panorama Appliance I want to deploy to
  • map the abstract cameras defined in the application descriptors.json to physical cameras known by the AWS Panorama Appliance
  • fill in any application-specific inputs, such as acceptable threshold value, log level etc.

An example IAM policy is

AWSTemplateFormatVersion: '2010-09-09'
Description: Resources for an AWS Panorama application.
Resources:
  runtimeRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: Allow
            Principal:
              Service:
                - panorama.amazonaws.com
            Action:
              - sts:AssumeRole
      Policies:
        - PolicyName: cloudwatch-putmetrics
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action: 'cloudwatch:PutMetricData'
                Resource: '*'
      Path: /service-role/

These six screenhots capture this process:

AWS Panorama - Deploy 2 AWS Panorama - Deploy 1 AWS Panorama - Deploy 3
AWS Panorama - Deploy 4 AWS Panorama - Deploy 5 AWS Panorama - Deploy 5

The deployment takes 15-30 minutes depending on the size of your code and your ML models, and the appliance available bandwidth. Eventually, the status turn green to “Running”.

AWS Panorama - Deployment Completed

Once the application is deployed to your AWS Panorama Appliance it begins to run, continuously analyzing video and generating highly accurate predictions locally within milliseconds. I connect an HDMI cable to the AWS Panorama Appliance to monitor the output, and I can see:

AWS Panorama HDMI output

Should anything goes wrong during the deployment or during the life of the application, I have access to the logs on Amazon CloudWatch. There are two log streams created, one for the AWS Panorama Appliance itself and one for the application.

AWS Panorama Appliance log AWS Panorama application log

Pricing and Availability
The AWS Panorama Appliance is available to purchase at AWS Elemental order page in the AWS Console. You can place orders from the United States, Canada, the United Kingdom, and the European Union. There is a one-time charge of $4,000 for the appliance itself.

There is a usage charge of $8.33 / month / camera feed.

AWS Panorama stores versioned copies of all assets deployed to the AWS Panorama Appliance (including ML models and business logic) in the cloud. You are charged $0.10 per-GB, per-month for this storage.

You may incur additional charges if the business logic deployed to your AWS Panorama Appliance uses other AWS services. For example, if your business logic uploads ML predictions to S3 for offline analysis, you will be billed separately by S3 for any storage charges incurred.

The AWS Panorama Appliance can be installed anywhere. The appliance connects back to the AWS Panorama service in the AWS cloud in one of the following AWS Region : US East (N. Virginia), US West (Oregon), Canada (Central), or Europe (Ireland).

Go and build your first computer vision model today.

— seb

Introducing the Security at the Edge: Core Principles whitepaper

Post Syndicated from Maddie Bacon original https://aws.amazon.com/blogs/security/introducing-the-security-at-the-edge-core-principles-whitepaper/

Amazon Web Services (AWS) recently released the Security at the Edge: Core Principles whitepaper. Today’s business leaders know that it’s critical to ensure that both the security of their environments and the security present in traditional cloud networks are extended to workloads at the edge. The whitepaper provides security executives the foundations for implementing a defense in depth strategy for security at the edge by addressing three areas of edge security:

  • AWS services at AWS edge locations
  • How those services and others can be used to implement the best practices outlined in the design principles of the AWS Well-Architected Framework Security Pillar
  • Additional AWS edge services, which customers can use to help secure their edge environments or expand operations into new, previously unsupported environments

Together, these elements offer core principles for designing a security strategy at the edge, and demonstrate how AWS services can provide a secure environment extending from the core cloud to the edge of the AWS network and out to customer edge devices and endpoints. You can find more information in the Security at the Edge: Core Principles whitepaper.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Maddie Bacon

Maddie (she/her) is a technical writer for AWS Security with a passion for creating meaningful content. She previously worked as a security reporter and editor at TechTarget and has a BA in Mathematics. In her spare time, she enjoys reading, traveling, and all things Harry Potter.

Author

Jana Kay

Since 2018, Jana has been a cloud security strategist with the AWS Security Growth Strategies team. She develops innovative ways to help AWS customers achieve their objectives, such as security table top exercises and other strategic initiatives. Previously, she was a cyber, counter-terrorism, and Middle East expert for 16 years in the Pentagon’s Office of the Secretary of Defense.

Enhancing Existing Building Systems with AWS IoT Services

Post Syndicated from Lewis Taylor original https://aws.amazon.com/blogs/architecture/enhancing-existing-building-systems-with-aws-iot-services/

With the introduction of cloud technology and by extension the rapid emergence of Internet of Things (IoT), the barrier to entry for creating smart building solutions has never been lower. These solutions offer commercial real estate customers potential cost savings and the ability to enhance their tenants’ experience. You can differentiate your business from competitors by offering new amenities and add new sources of revenue by understanding more about your buildings’ operations.

There are several building management systems to consider in commercial buildings, such as air conditioning, fire, elevator, security, and grey/white water. Each system continues to add more features and become more automated, meaning that control mechanisms use all kinds of standards and protocols. This has led to fragmented building systems and inefficiency.

In this blog, we’ll show you how to use AWS for the Edge to bring these systems into one data path for cloud processing. You’ll learn how to use AWS IoT services to review and use this data to build smart building functions. Some common use cases include:

  • Provide building facility teams a holistic view of building status and performance, alerting them to problems sooner and helping them solve problems faster.
  • Provide a detailed record of the efficiency and usage of the building over time.
  • Use historical building data to help optimize building operations and predict maintenance needs.
  • Offer enriched tenant engagement through services like building control and personalized experiences.
  • Allow building owners to gather granular usage data from multiple buildings so they can react to changing usage patterns in a single platform.

Securely connecting building devices to AWS IoT Core

AWS IoT Core supports connections with building devices, wireless gateways, applications, and services. Devices connect to AWS IoT Core to send and receive data from AWS IoT Core services and other devices. Buildings often use different device types, and AWS IoT Core has multiple options to ingest data and enabling connectivity within your building. AWS IoT Core is made up of the following components:

  • Device Gateway is the entry point for all devices. It manages your device connections and supports HTTPS and MQTT (3.1.1) protocols.
  • Message Broker is an elastic and fully managed pub/sub message broker that securely transmits messages (for example, device telemetry data) to and from all your building devices.
  • Registry is a database of all your devices and associated attributes and metadata. It allows you to group devices and services based upon attributes such as building, software version, vendor, class, floor, etc.

The architecture in Figure 1 shows how building devices can connect into AWS IoT Core. AWS IoT Core supports multiple connectivity options:

  • Native MQTT – Multiple building management systems or device controllers have MQTT support immediately.
  • AWS IoT Device SDK – This option supports MQTT protocol and multiple programming languages.
  • AWS IoT Greengrass – The previous options assume that devices are connected to the internet, but this isn’t always possible. AWS IoT Greengrass extends the cloud to the building’s edge. Devices can connect directly to AWS IoT Greengrass and send telemetry to AWS IoT Core.
  • AWS for the Edge partner products – There are several partner solutions, such as Ignition Edge from Inductive Automation, that offer protocol translation software to normalize in-building sensor data.
Data ingestion options from on-premises devices to AWS

Figure 1. Data ingestion options from on-premises devices to AWS

Challenges when connecting buildings to the cloud

There are two common challenges when connecting building devices to the cloud:

  • You need a flexible platform to aggregate building device communication data
  • You need to transform the building data to a standard protocol, such as MQTT

Building data is made up of various protocols and formats. Many of these are system-specific or legacy protocols. To overcome this, we suggest processing building device data at the edge, extracting important data points/values before transforming to MQTT, and then sending the data to the cloud.

Transforming protocols can be complex because they can abstract naming and operation types. AWS IoT Greengrass and partner products such as Ignition Edge make it possible to read that data, normalize the naming, and extract useful information for device operation. Combined with AWS IoT Greengrass, this gives you a single way to validate the building device data and standardize its processing.

Using building data to develop smart building solutions

The architecture in Figure 2 shows an in-building lighting system. It is connected to AWS IoT Core and reports on devices’ status and gives users control over connected lights.

The architecture in Figure 2 has two data paths, which we’ll provide details on in the following sections, but here’s a summary:

  1. The “cold” path gathers all incoming data for batch data analysis and historical dashboarding.
  2. The “warm” bidirectional path is for faster, real-time data. It gathers devices’ current state data. This path is used by end-user applications for sending control messages, real-time reporting, or initiating alarms.
Figure 2. Architecture diagram of a building lighting system connected to AWS IoT Core

Figure 2. Architecture diagram of a building lighting system connected to AWS IoT Core

Cold data path

The cold data path gathers all lighting device telemetry data, such as power consumption, operating temperature, health data, etc. to help you understand how the lighting system is functioning.

Building devices can often deliver unstructured, inconsistent, and large volumes of data. AWS IoT Analytics helps clean up this data by applying filters, transformations, and enrichment from other data sources before storing it. By using Amazon Simple Storage Service (Amazon S3), you can analyze your data in different ways. Here we use Amazon Athena and Amazon QuickSight for building operational dashboard visualizations.

Let’s discuss a real-world example. For building lighting systems, understanding your energy consumption is important for evaluating energy and cost efficiency. Data ingested into AWS IoT Core can be stored long term in Amazon S3, making it available for historical reporting. Athena and QuickSight can quickly query this data and build visualizations that show lighting state (on or off) and annual energy consumption over a set period of time. You can also overlay this data with sunrise and sunset data to provide insight into whether you are using your lighting systems efficiently. For example, adjusting the lighting schedule accordingly to the darker winter months versus the brighter summer months.

Warm data path

In the warm data path, AWS IoT Device Shadow service makes the device state available. Shadow updates are forwarded by an AWS IoT rule into downstream services such an AWS IoT Event, which tracks and monitors multiple devices and data points. Then it initiates actions based on specific events. Further, you could build APIs that interact with AWS IoT Device Shadow. In this architecture, we have used AWS AppSync and AWS Lambda to enable building controls via a tenant smartphone application.

Let’s discuss a real-world example. In an office meeting room lighting system, maintaining a certain brightness level is important for health and safety. If that space is unoccupied, you can save money by turning the lighting down or off. AWS IoT Events can take inputs from lumen sensors, lighting systems, and motorized blinds and put them into a detector model. This model calculates and prompts the best action to maintain the room’s brightness throughout the day. If the lumen level drops below a specific brightness threshold in a room, AWS IoT Events could prompt an action to maintain an optimal brightness level in the room. If an occupancy sensor is added to the room, the model can know if someone is in the room and maintain the lighting state. If that person leaves, it will turn off that lighting. The ongoing calculation of state can also evaluate the time of day or weather conditions. It would then select the most economical option for the room, such as opening the window blinds rather than turning on the lighting system.

Conclusion

In this blog, we demonstrated how to collect and aggregate the data produced by on-premises building management platforms. We discussed how augmenting this data with the AWS IoT Core platform allows for development of smart building solutions such as building automation and operational dashboarding. AWS products and services can enable your buildings to be more efficient while and also provide engaging tenant experiences. For more information on how to get started please check out our getting started with AWS IoT Core developer guide.

Automating data center expansions with Airflow

Post Syndicated from Jet Mariscal original https://blog.cloudflare.com/automating-data-center-expansions-with-airflow/

Automating data center expansions with Airflow

Cloudflare’s network keeps growing, and that growth doesn’t just come from building new data centers in new cities. We’re also upgrading the capacity of existing data centers by adding newer generations of servers — a process that makes our network safer, faster, and more reliable for our users.

Connecting new Cloudflare servers to our network has always been complex, in large part because of the amount of manual effort that used to be required. Members of our Data Center and Infrastructure Operations, Network Operations, and Site Reliability Engineering teams had to carefully follow steps in an extremely detailed standard operating procedure (SOP) document, often copying command-line snippets directly from the document and pasting them into terminal windows.

But such a manual process can only scale so far, and we knew must be a way to automate the installation of new servers.

Here’s how we tackled that challenge by building our own Provisioning-as-a-Service (PraaS) platform and cut by 90% the amount of time our team spent on mundane operational tasks.

Choosing and using an automation framework

When we began our automation efforts, we quickly realized it made sense to replace each of these manual SOP steps with an API-call equivalent and to present them in a self-service web-based portal.

To organize these new automatic steps, we chose Apache Airflow, an open-source workflow management platform. Airflow is built around directed acyclic graphs, or DAGs, which are collections of all the tasks you want to run, organized in a way that reflects their relationships and dependencies.

In this new system, each SOP step is implemented as a task in the DAG. The majority of these tasks are API calls to Salt — software which automates the management and configuration of any infrastructure or application, and which we use to manage our servers, switches, and routers. Other DAG tasks are calls to query Prometheus (systems monitoring and alerting toolkit), Thanos (a highly available Prometheus setup with long-term storage capabilities), Google Chat webhooks, JIRA, and other internal systems.

Here is an example of one of these tasks. In the original SOP, SREs were given the following instructions to enable anycast:

  1. Login to a remote system.
  2. Copy and Paste the command in the terminal.
  3. Replace the router placeholder in the command snippet with the actual value.
  4. Execute the command.

MeanwhileIn our new workflow, this step becomes a single task in the DAG named “enable_anycast”:

enable_anycast = builder.wrap_class(AsyncSaltAPIOperator)(
             task_id='enable_anycast',
             target='{{ params.netops }}',
             function='cmd.run',
             fun_kwargs={'cmd': 'salt {{ get_router(params.colo_name) }} '
                         'anycast.enable --out=json --out-indent=-1'},
             salt_conn_id='salt_api',
             trigger_rule='one_success')

As you can see, automation eliminates the need for a human operator to login to a remote system, and to figure out the router that will be used to replace the placeholder in the command to be executed.

In Airflow, a task is an implementation of an Operator. The Operator in the automated step is the “AsyncSaltAPIOperator”, a custom operator built in-house. This extensibility is one of the many reasons that made us decide to use Apache Airflow. It allowed us to extend its functionality by writing custom operators that suit our needs.

SREs from various teams have written quite a lot of custom Airflow Operators that integrate with Salt, Prometheus, Bitbucket, Google Chat, JIRA, PagerDuty, among others.

Manual SOP steps transformed into a feature-packed automation

The tasks that replaced steps in the SOP are marvelously feature-packed. Here are some highlights of what they are capable of, on top of just executing a command:

Failure Handling
When a task fails for whatever reason, it automatically retries until it exhausts its maximum retry limit that we set for the task. We employ various retry strategies, including instructing tasks to not retry at all, especially when it’s impractical to retry, or when we deliberately do not want it to retry at all regardless of whether there are any retry attempts remaining, such as when an exception is encountered or a condition that is unlikely to change for the better.

Logging
Each task provides a comprehensive log during executions. We’ve written our tasks to ensure that we log as much information as possible that would help us audit and troubleshoot issues.

Notifications
We’ve written our tasks to send a notification with information such as the name of the DAG, the name of the task, its task state, the number of attempts it took to reach a certain state, and a link to view the task logs.

When a task fails, we definitely want to be notified, so we also set tasks to additionally provide information such as the number of retry attempts and links to view relevant wiki pages or Grafana dashboards.

Depending on the criticality of the failure, we can also instruct it to page the relevant on-call person on the provisioning shift, should it require immediate attention.

Jinja Templating
Jinja templating allows providing dynamic content using code to otherwise static objects such as strings. We use this in combination with macros wherein we provide parameters that can change during the execution, since macros are evaluated while the task gets run.

Macros
Macros are used to pass dynamic information into task instances at runtime. Macros are a way to expose objects to templates. In other words, macros are functions that take input, modify that input, and give the modified output.

Adapting tasks for preconditions and human intervention

There are a few steps in the SOP that require certain preconditions to be met. We use sensors to set dependencies between these tasks, and even between different DAGs, so that one does not run until the dependency has been met.

Below is an example of a sensor that waits until all nodes resolve to their assigned DNS records:

verify_node_dns = builder.wrap_class(DNSSensor)(
            task_id='verify_node_dns',
            zone=domain,
            nodes_from='{{ to_json(run_ctx.globals.import_nodes_via_mpl) }}',
            timeout=60 * 30,
            poke_interval=60 * 10,
	mode='reschedule')

In addition, some of our tasks still require input from a  human operator. In these circumstances, we use sensors as blocking tasks that prevent work from starting until certain preconditions are met. We use these to set dependencies between tasks and even DAGs so that one does not run until the dependency has finished successfully.

The code below is a simple example of a task that will send notifications to get the attention of a human operator, and waits until a Change Request ticket has been provided and verified:

verify_jira_input = builder.wrap_class(InputSensor)(
            task_id='verify_jira_input',
            var_key='jira',
            prompt='Please provide the Change Request ticket.',
            notify=True,
            require_human=True)

Another sensor task example is waiting until a zone has been deployed by a Cloudflare engineer as described in https://blog.cloudflare.com/improving-the-resiliency-of-our-infrastructure-dns-zone/.

In order for PraaS to be able to accept human inputs, we’ve written a separate DAG we call our DAG Manager. Whenever we need to submit input back to a running expansion DAG, we simply trigger the DAG Manager and pass in our input as a JSON configuration, which will then be processed accordingly and submit the input back to the expansion DAG.

Automating data center expansions with Airflow

Managing Dependencies Between Tasks

Replacing SOP steps with DAG tasks was only the first part of our journey towards greater automation. We also had to define the dependencies between these tasks and construct the workflow accordingly.

Here’s an example of what this looks like in code:

verify_cr >> parse_cr >> [execute_offline, execute_online]
        execute_online >> silence_highstate_runner >> silence_metals >> \
            disable_highstate_runner

The code simply uses bit shift operators to chain the operations. A list of tasks can also be set as dependencies:

change_metal_status >>  [wait_for_change_metal_status, verify_zone_update] >> \
evaluate_ecmp_management

With the bit shift operator, chaining multiple dependencies becomes concise.

By default, a downstream task will only run if its upstream has succeeded. For a more complex dependency setup, we set a trigger_rule which defines the rule by which the generated task gets triggered.

All operators have a trigger_rule argument. The Airflow scheduler decides whether to run the task or not depending on what rule was specified in the task. An example rule that we use a lot in PraaS is “one_success” — it fires as soon as at least one parent succeeds, and it does not wait for all parents to be done.

Solving Complex Workflows with Branching and Multi-DAGs

Having complex workflows means that we need a workflow to branch, or only go down a certain path, based on an arbitrary condition, which is typically related to something that happened in an upstream task. Branching is used to perform conditional logic, that is, execute a set of tasks based on a condition. We use BranchPythonOperator to achieve this.

At some point in the workflow, our data center expansion DAGs trigger various external DAGs to accomplish complex tasks. This is why we have written our DAGs to be fully reusable. We did not try to incorporate all the logic into a single DAG; instead, we created other separable DAGs that are fully reusable and can be triggered on-demand manually by hand or programmatically — our DAG Manager and the “helper” DAG is an example of this.

The Helper DAG comprises logic that allows us to mimic a “for loop” by having the DAG respawn itself if needed, technically doing cycles. If you recall, a DAG is acyclic, but we have some tasks in our workflow that require us to do complex loops and are solved by using a helper DAG.

Automating data center expansions with Airflow

We designed reusable DAGs early on, which allowed us to build complex automation workflows from separable DAGs, each of which handles distinct and well-defined tasks. Each data center DAG could easily reuse other DAGs by triggering them programmatically.

Having separate DAGs that run independently, that are triggered by other DAGs, and that keep inter-dependencies between them, is a pattern we use a lot. It has allowed us to execute very complex workflows.

Creating DAGs that Scale and Executing Tasks at Scale

Data center expansions are done in two phases:

Phase 1 – this is the phase in which servers are powered on. It boots our custom Linux kernel, and begins the provisioning process.

Phase 2 – this is the phase in which newly provisioned servers are enabled in the cluster to receive production traffic.

To reflect these phases in the automation workflow, we also wrote two separate DAGs, one for each phase. However, we have over 200 data centers, so if we were to write a pair of DAGs for each, we would end up writing and maintaining 400 files!

A viable option could be to parameterize our DAGs. At first glance, this approach sounds reasonable. However, it poses one major challenge: tracking the progress of DAG runs will be too difficult and confusing for the human operator using PraaS.

Following the software design principle called DRY (Don’t Repeat Yourself), and inspired by the Factory Method design pattern in programming, we’ve instead written both phase 1 and phase 2 DAGs in a way that allow them to dynamically create multiple different DAGs with exactly the same tasks, and to fully reuse the exact same code. As a result, we only maintain one code base, and as we add new data centers, we are also able to generate a DAG for each new data center instantly, without writing a single line of code.

Automating data center expansions with Airflow

And Airflow even made it easy to put a simple customized web UI on top of the process, which made it simple to use by more employees who didn’t have to understand all the details.

The death of SOPs?

We would like to think that all of this automation removes the need for our original SOP document. But this is not really the case.  Automation can fail, the components in it can fail, and   a particular task in the DAG may fail. When this happens, our SOPs will be used again to prevent provisioning and expansion activities from stopping completely.

Introducing automation paved the way for what we call an SOP-as-Code practice. We made sure that every task in the DAG had an equivalent manual step in the SOP that SREs can execute by hand, should the need arise, and that every change in the SOP has a corresponding pull request (PR) in the code.

What’s next for PraaS

Onboarding of the other provisioning activities into PraaS, such as decommissioning, is already ongoing.

For expansions, our ultimate goal is a fully autonomous system that monitors whether new servers have been racked in our edge data centers — and automatically triggers expansions — with no human intervention.