Tag Archives: Privacy

AI Industry is Trying to Subvert the Definition of “Open Source AI”

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/11/ai-industry-is-trying-to-subvert-the-definition-of-open-source-ai.html

The Open Source Initiative has published (news article here) its definition of “open source AI,” and it’s terrible. It allows for secret training data and mechanisms. It allows for development to be done in secret. Since for a neural network, the training data is the source code—it’s how the model gets programmed—the definition makes no sense.

And it’s confusing; most “open source” AI models—like LLAMA—are open source in name only. But the OSI seems to have been co-opted by industry players that want both corporate secrecy and the “open source” label. (Here’s one rebuttal to the definition.)

This is worth fighting for. We need a public AI option, and open source—real open source—is a necessary component of that.

But while open source should mean open source, there are some partially open models that need some sort of definition. There is a big research field of privacy-preserving, federated methods of ML model training and I think that is a good thing. And OSI has a point here:

Why do you allow the exclusion of some training data?

Because we want Open Source AI to exist also in fields where data cannot be legally shared, for example medical AI. Laws that permit training on data often limit the resharing of that same data to protect copyright or other interests. Privacy rules also give a person the rightful ability to control their most sensitive information ­ like decisions about their health. Similarly, much of the world’s Indigenous knowledge is protected through mechanisms that are not compatible with later-developed frameworks for rights exclusivity and sharing.

How about we call this “open weights” and not open source?

Are Automatic License Plate Scanners Constitutional?

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/10/are-automatic-license-plate-scanners-constitutional.html

An advocacy groups is filing a Fourth Amendment challenge against automatic license plate readers.

“The City of Norfolk, Virginia, has installed a network of cameras that make it functionally impossible for people to drive anywhere without having their movements tracked, photographed, and stored in an AI-assisted database that enables the warrantless surveillance of their every move. This civil rights lawsuit seeks to end this dragnet surveillance program,” the lawsuit notes. “In Norfolk, no one can escape the government’s 172 unblinking eyes,” it continues, referring to the 172 Flock cameras currently operational in Norfolk. The Fourth Amendment protects against unreasonable searches and seizures and has been ruled in many cases to protect against warrantless government surveillance, and the lawsuit specifically says Norfolk’s installation violates that.”

Deebot Robot Vacuums Are Using Photos and Audio to Train Their AI

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/10/deebot-robot-vacuums-are-using-photos-and-audio-to-train-their-ai.html

An Australian news agency is reporting that robot vacuum cleaners from the Chinese company Deebot are surreptitiously taking photos and recording audio, and sending that data back to the vendor to train their AIs.

Ecovacs’s privacy policy—available elsewhere in the app—allows for blanket collection of user data for research purposes, including:

  • The 2D or 3D map of the user’s house generated by the device
  • Voice recordings from the device’s microphone
  • Photos or videos recorded by the device’s camera

It also states that voice recordings, videos and photos that are deleted via the app may continue to be held and used by Ecovacs.

No word on whether the recorded audio is being used to train the vacuum in some way, or whether it is being used to train a LLM.

Slashdot thread.

New standards for a faster and more private Internet

Post Syndicated from Matt Bullock original https://blog.cloudflare.com/new-standards

As the Internet grows, so do the demands for speed and security. At Cloudflare, we’ve spent the last 14 years simplifying the adoption of the latest web technologies, ensuring that our users stay ahead without the complexity. From being the first to offer free SSL certificates through Universal SSL to quickly supporting innovations like TLS 1.3, IPv6, and HTTP/3, we’ve consistently made it easy for everyone to harness cutting-edge advancements.

One of the most exciting recent developments in web performance is Zstandard (zstd) — a new compression algorithm that we have found compresses data 42% faster than Brotli while maintaining almost the same compression levels. Not only that, but Zstandard reduces file sizes by 11.3% compared to GZIP, all while maintaining comparable speeds. As compression speed and efficiency directly impact latency, this is a game changer for improving user experiences across the web.

We’re also re-starting the rollout of Encrypted Client Hello (ECH), a new proposed standard that prevents networks from snooping on which websites a user is visiting. Encrypted Client Hello (ECH) is a successor to ESNI and masks the Server Name Indication (SNI) that is used to negotiate a TLS handshake. This means that whenever a user visits a website on Cloudflare that has ECH enabled, no one except for the user, Cloudflare, and the website owner will be able to determine which website was visited. Cloudflare is a big proponent of privacy for everyone and is excited about the prospects of bringing this technology to life.

In this post, we also further explore our work measuring the impact of HTTP/3 prioritization, and the development of Bottleneck Bandwidth and Round-trip propagation time (BBR) congestion control to further optimize network performance.

Introducing Zstandard compression

Zstandard, an advanced compression algorithm, was developed by Yann Collet at Facebook and open sourced in August 2016 to manage large-scale data processing.  It has gained popularity in recent years due to its impressive compression ratios and speed. The protocol was included in Chromium-based browsers and Firefox in March 2024 as a supported compression algorithm. 

Today, we are excited to announce that Zstandard compression between Cloudflare and browsers is now available to everyone. 

Our testing shows that Zstandard compresses data up to 42% faster than Brotli while achieving nearly equivalent data compression. Additionally, Zstandard outperforms GZIP by approximately 11.3% in compression efficiency, all while maintaining similar compression speeds. This means Zstandard can compress files to the same size as Brotli but in nearly half the time, speeding up your website without sacrificing performance.

This is exciting because compression speed and file size directly impacts latency. When a browser requests a resource from the origin server, the server needs time to compress the data before it’s sent over the network. A faster compression algorithm, like Zstandard, reduces this initial processing time. By also reducing the size of files transmitted over the Internet, better compression means downloads take less time to complete, websites load quicker, and users ultimately get a better experience.

Why is compression so important?

Website performance is crucial to the success of online businesses. Study after study has shown that an increased load time directly affects sales. In highly competitive markets, the performance of a website is crucial for success. Just like a physical shop situated in a remote area faces challenges in attracting customers, a slow website encounters similar difficulties in attracting traffic.

Think about buying a piece of flat pack furniture such as a bookshelf. Instead of receiving the bookshelf fully assembled, which would be expensive and cumbersome to transport, you receive it in a compact, flat box with all the components neatly organized, ready for assembly. The parts are carefully arranged to take up the least amount of space, making the package much smaller and easier to handle. When you get the item, you simply follow the instructions to assemble it to its proper state. 

This is similar to how data compression works. The data is “disassembled” and packed tightly to reduce its size before being transmitted. Once it reaches its destination, it’s “reassembled” to its original form. This compression process reduces the amount of data that needs to be sent, saving bandwidth, reducing costs, and speeding up the transfer, just like how flat pack furniture reduces shipping costs and simplifies delivery logistics.

However, with compression, there is a tradeoff: time to compress versus the overall compression ratio. A compression ratio is a measure of how much a file’s size is reduced during compression. For example, a 10:1 compression ratio means that the compressed file is one-tenth the size of the original. Just like assembling flat-pack furniture takes time and effort, achieving higher compression ratios often requires more processing time. While a higher compression ratio significantly reduces file size — making data transmission faster and more efficient — it may take longer to compress and decompress the data. Conversely, quicker compression methods might produce larger files, leading to faster processing but at the cost of greater bandwidth usage. Balancing these factors is key to optimizing performance in data transmission.

W3 Technologies reports that as of September 12, 2024, 88.6% of websites rely on compression to optimize speed and reduce bandwidth usage. GZIP, introduced in 1996, remains the default algorithm for many, used by 57.0% of sites due to its reasonable compression ratios and fast compression speeds. Brotli, released by Google in 2016, delivers better compression ratios, leading to smaller file sizes, especially for static assets like JavaScript and CSS, and is used by 45.5% of websites. However, this also means that 11.4% of websites still operate without any compression, missing out on crucial performance improvements.

As the Internet and its supporting infrastructure have evolved, so have user demands for faster, more efficient performance. This growing need for higher efficiency without compromising speed is where Zstandard comes into play.

Enter Zstandard

Zstandard offers higher compression ratios comparable to GZIP, but with significantly faster compression and decompression speeds than Brotli. This makes it ideal for real-time applications that require both speed and relatively high compression ratios.

To understand Zstandard’s advantages, it’s helpful to know about Zlib. Zlib was developed in the mid-1990s based on the DEFLATE compression algorithm, which combines LZ77 and Huffman coding to reduce file sizes. While Zlib has been a compression standard since the mid-1990s and is used in Cloudflare’s open-source GZIP implementation, its design is limited by a 32 KB sliding window — a constraint from the memory limitations of that era. This makes Zlib less efficient on modern hardware, which can access far more memory.

Zstandard enhances Zlib by leveraging modern innovations and hardware capabilities. Unlike Zlib’s fixed 32 KB window, Zstandard has no strict memory constraints and can theoretically address terabytes of memory. However,  in practice, it typically uses much less, around 1 MB at lower compression levels. This flexibility allows Zstandard to buffer large amounts of data, enabling it to identify and compress repeating patterns more effectively. Zstandard also employs repcode modeling to efficiently compress structured data with repetitive sequences, further reducing file sizes and enhancing its suitability for modern compression needs.

Zstandard is optimized for modern CPUs, which can execute multiple tasks simultaneously using multiple Arithmetic Logic Units (ALUs) that are used to perform mathematical tasks. Zstandard achieves this by processing data in parallel streams, dividing it into multiple parts that are processed concurrently. The Huffman decoder, Huff0, can decode multiple symbols in parallel on a single CPU core, and when combined with multi-threading, this leads to substantial speed improvements during both compression and decompression.

Zstandard’s branchless design is a crucial innovation that enhances CPU efficiency, especially in modern processors. To understand its significance, consider how CPUs execute instructions.

Modern CPUs use pipelining, where different stages of an instruction are processed simultaneously—like a production line—keeping all parts of the processor busy. However, when CPUs encounter a branch, such as an ‘if-else’ decision, they must make a branch prediction to guess the next step. If the prediction is wrong, the pipeline must be cleared and restarted, causing slowdowns.

Zstandard avoids this issue by eliminating conditional branching. Without relying on branch predictions, it ensures the CPU can execute instructions continuously, keeping the pipeline full and avoiding performance bottlenecks.

A key feature of Zstandard is its use of Finite State Entropy (FSE), an advanced compression method that encodes data more efficiently based on probability. FSE, built on the Asymmetric Numeral System (ANS), allows Zstandard to use fractional bits for encoding, unlike traditional Huffman coding, which only uses whole bits. This allows heavily repeated data to be compressed more tightly without sacrificing efficiency.

Zstandard findings

In the third quarter of 2024, we conducted extensive tests on our new Zstandard compression module, focusing on a 24-hour period where we switched the default compression algorithm from Brotli to Zstandard across our Free plan traffic. This experiment spanned billions of requests, covering a wide range of file types and sizes, including HTML, CSS, and JavaScript. The results were very promising, with significant improvements in both compression speed and file size reduction, leading to faster load times and more efficient bandwidth usage.

Compression ratios

In terms of compression efficiency, Zstandard delivers impressive results. Below are the average compression ratios we observed during our testing.

Compression Algorithm

Average Compression Ratio

GZIP

2.56

Zstandard

2.86

Brotli

3.08

As the table shows, Zstandard achieves an average compression ratio of 2.86:1, which is notably higher than gzip’s 2.56:1 and close to Brotli’s 3.08:1. While Brotli slightly edges out Zstandard in terms of pure compression ratio, what is particularly exciting is that we are only using Zstandard’s default compression level of 3 (out of 22) on our traffic. In the fourth quarter of 2024, we plan to experiment with higher compression levels and multithreading capabilities to further enhance Zstandard’s performance and optimize results even more.

Compression speeds

What truly sets Zstandard apart is its speed. Below are the average times to compress data from our traffic-based tests measured in milliseconds:

Compression Algorithm

Average Time to Compress (ms)

GZIP

0.872

Zstandard

0.848

Brotli

1.544

Zstandard not only compresses data efficiently, but it also does so 42% faster than Brotli, with an average compression time of 0.848 ms compared to Brotli’s 1.544 ms. It even outperforms gzip, which compresses at 0.872 ms on average.

From our results, we have found Zstandard strikes an excellent balance between achieving a high compression ratio and maintaining fast compression speed, making it particularly well-suited for dynamic content such as HTML and non-cacheable sensitive data. Zstandard can compress these responses from the origin quickly and efficiently, saving time compared to Brotli while providing better compression ratios than GZIP.

Implementing Zstandard at Cloudflare

To implement Zstandard compression at Cloudflare, we needed to build it into our Nginx-based service which already handles GZIP and Brotli compression. Nginx is modular by design, with each module performing a specific function, such as compressing a response. Our custom Nginx module leverages Nginx’s function ‘hooks’ — specifically, the header filter and body filter — to implement Zstandard compression.

Header filter

The header filter allows us to access and modify response headers. For example, Cloudflare only compresses responses above a certain size (50 bytes for Zstandard), which is enforced with this code:

if (r->headers_out.content_length_n != -1 &&
    r->headers_out.content_length_n < conf->min_length) {
    return ngx_http_next_header_filter(r);
}

Here, we check the “Content-Length” header. If the content length is less than our minimum threshold, we skip compression and let Nginx execute the next module.

We also need to ensure the content is not already compressed by checking the “Content-Encoding” header:

if (r->headers_out.content_encoding &&
    r->headers_out.content_encoding->value.len) {
    return ngx_http_next_header_filter(r);
}

If the content is already compressed, the module is bypassed, and Nginx proceeds to the next header filter.

Body filter

The body filter hook is where the actual processing of the response body occurs. In our case, this involves compressing the data with the Zstandard encoder and streaming the compressed data back to the client. Since responses can be very large, it’s not feasible to buffer the entire response in memory, so we manage internal memory buffers carefully to avoid running out of memory.

The Zstandard library is well-suited for streaming compression and provides the ZSTD_compressStream2 function:

ZSTDLIB_API size_t ZSTD_compressStream2(ZSTD_CCtx* cctx,
                                        ZSTD_outBuffer* output,
                                        ZSTD_inBuffer* input,
                                        ZSTD_EndDirective endOp);

This function can be called repeatedly with chunks of input data to be compressed. It accepts input and output buffers and an “operation” parameter (ZSTD_EndDirective endOp) that controls whether to continue feeding data, flush the data, or finalize the compression process.

Nginx uses a “flush” flag on memory buffers to indicate when data can be sent. Our module uses this flag to set the appropriate Zstandard operation:

switch (zstd_operation) {
    case ZSTD_e_continue: {
        if (flush) {
            zstd_operation = ZSTD_e_flush;
        }
    }
}

This logic allows us to switch from the “ZSTD_e_continue” operation, which feeds more input data into the encoder, to “ZSTD_e_flush”, which extracts compressed data from the encoder.

Compression cycle

The compression module operates in the following cycle:

  1. Receive uncompressed data.

  2. Locate an internal buffer to store compressed data.

  3. Compress the data with Zstandard.

  4. Send the compressed data back to the client.

Once a buffer is filled with compressed data, it’s passed to the next Nginx module and eventually sent to the client. When the buffer is no longer in use, it can be recycled, avoiding unnecessary memory allocation. This process is managed as follows:

if (free) {
    // A free buffer is available, so use it
    buffer = free;
} else if (buffers_used < maximum_buffers) {
    // No free buffers, but we're under the limit, so allocate a new one
    buffer = create_buf();
} else {
    // No free buffers and can't allocate more
    err = no_memory;
}

Handling backpressure

If no buffers are available, it can lead to backpressure — a situation where the Zstandard module generates compressed data faster than the client can receive it. This causes data to become “stuck” inside Nginx, halting further compression due to memory constraints. In such cases, we stop compression and send an empty buffer to the next Nginx module, allowing Nginx to attempt to send the data to the client again. When successful, this frees up memory buffers that our module can reuse, enabling continued streaming of the compressed response without buffering the entire response in memory.

What’s next? Compression dictionaries

The future of Internet compression lies in the use of compression dictionaries. Both Brotli and Zstandard support dictionaries, offering up to 90% improvement on compression levels compared to using static dictionaries. 

Compression dictionaries store common patterns or sequences of data, allowing algorithms to compress information more efficiently by referencing these patterns rather than repeating them. This concept is akin to how an iPhone’s predictive text feature works. For example, if you frequently use the phrase “On My Way,” you can customize your iPhone’s dictionary to recognize the abbreviation “OMW” and automatically expand it to “On My Way” when you type it, saving the user from typing six extra letters.

O

M

W

O

n

M

y

W

a

y

Traditionally, compression algorithms use a static dictionary defined by its RFC that is shared between clients and origin servers. This static dictionary is designed to be broadly applicable, balancing size and compression effectiveness for general use. However, Zstandard and Brotli support custom dictionaries, tailored specifically to the content being sent to the client. For example, Cloudflare could create a specialized dictionary that focuses on frequently used terms like “Cloudflare”. This custom dictionary would compress these terms more efficiently, and a browser using the same dictionary could decode them accurately, leading to significant improvements in compression and performance.

In the future, we will enable users to leverage origin-generated dictionaries for Zstandard and Brotli to enhance compression. Another exciting area we’re exploring is the use of AI to create these dictionaries dynamically without them needing to be generated at the origin. By analyzing data streams in real-time, Cloudflare could develop context-aware dictionaries tailored to the specific characteristics of the data being processed. This approach would allow users to significantly improve both compression ratios and processing speed for their applications.

Compression Rules for everyone

Today we’re also excited to announce the introduction of Compression Rules for all our customers. By default, Cloudflare will automatically compress certain content types based on their Content-Type headers. Customers can use compression rules to optimize how and what Cloudflare compresses. This feature was previously exclusive to our Enterprise plans.

Compression Rules is built on the same robust framework as our other rules products, such as Origin Rules, Custom Firewall Rules, and Cache Rules, with additional fields for Media Type and Extension Type. This allows you to easily specify the content you wish to compress, providing granular control over your site’s performance optimization.

Compression rules are now available on all our pay-as-you-go plans and will be added to free plans in October 2024. This feature was previously exclusive to our Enterprise customers. In the table below, you’ll find the updated limits, including an increase to 125 Compression Rules for Enterprise plans, aligning with our other rule products’ quotas.

Plan Type

Free*

Pro

Business

Enterprise

Available Compression Rules

10

25

50

125

Using Compression Rules to enable Zstandard

To integrate our Zstandard module into our platform, we also added support for it within our Compression Rules framework. This means that customers can now specify Zstandard as their preferred compression method, and our systems will automatically enable the Zstandard module in Nginx, disabling other compression modules when necessary.

The Accept-Encoding header determines which compression algorithms a client supports. If a browser supports Zstandard (zstd), and both Cloudflare and the zone have enabled the feature, then Cloudflare will return a Zstandard compressed response.

If the client does not support Zstandard, then Cloudflare will automatically fall back to Brotli, GZIP, or serve the content uncompressed where no compression algorithm is supported, ensuring compatibility.

To enable Zstandard for your entire site or specifically filter on certain file types, all Cloudflare users can deploy a simple compression rule.

Further details and examples of what can be accomplished with Compression Rules can be found in our developer documentation.

Currently, we support Zstandard, Brotli, and GZIP as compression algorithms for traffic sent to clients, and support GZIP and Brotli (since 2023) compressed data from the origin. We plan to implement full end-to-end support for Zstandard in 2025, offering customers another effective way to reduce their egress costs.

Once Zstandard is enabled, you can view your browser’s Network Activity log to check the content-encoding headers of the response.

Enable Zstandard now!

Zstandard is now available to all Cloudflare customers through Compression Rules on our Enterprise and pay as you go plans, with free plans gaining access in October 2024. Whether you’re optimizing for speed or aiming to reduce bandwidth, Compression Rules give all customers granular control over their site’s performance.

Encrypted Client Hello (ECH)


While performance is crucial for delivering a fast user experience, ensuring privacy is equally important in today’s Internet landscape. As we optimize for speed with Zstandard, Cloudflare is also working to protect users’ sensitive information from being exposed during data transmission. With web traffic growing more complex and interconnected, it’s critical to keep both performance and privacy in balance. This is where technologies like Encrypted Client Hello (ECH) come into play, securing connections without sacrificing speed.

Ten years ago, we embarked on a mission to create a more secure and encrypted web. At the time, much of the Internet remained unencrypted, leaving user data vulnerable to interception. On September 27, 2014, we took a major step forward by enabling HTTPS for free for all Cloudflare customers. Overnight, we doubled the size of the encrypted web. This set the stage for a more secure Internet, ensuring that encryption was not a privilege limited by budget but a right accessible to everyone.

Since then, both Cloudflare and the broader community have helped encrypt more of the Internet. Projects like Let’s Encrypt launched to make certificates free for everyone. Cloudflare invested to encrypt more of the connection, and future-proof that encryption from coming technologies like quantum computers. We’ve always believed that it was everyone’s right, regardless of your budget, to have an encrypted Internet at no cost.

One of the last major challenges has been securing the SNI (Server Name Identifier), which remains exposed in plaintext during the TLS handshake. This is where Encrypted Client Hello (ECH) comes in, and today, we are proud to announce that we’re closing that gap. 

Cloudflare announced support for Encrypted Client Hello (ECH) in 2023 and has continued to enhance its implementation in collaboration with our Internet browser partners. During a TLS handshake, one of the key pieces of information exchanged is the Server Name Identifier (SNI), which is used to initiate a secure connection. Unfortunately, the SNI is sent in plaintext, meaning anyone can read it. Imagine hand-delivering a letter — anyone following you can see where you’re delivering it, even if they don’t know the contents. With ECH, it is like sending the same confidential letter to a P.O. Box. You place your sensitive letter in a sealed inner envelope with the actual address. Then, you put that envelope into a larger, standard envelope addressed to a public P.O. Box, trusted to securely forward your intended recipient. The larger envelope containing the non-sensitive information is visible to everyone, while the inner envelope holds the confidential details, such as the actual address and recipient. Just as the P.O. Box maintains the anonymity of the true recipient’s address, ECH ensures that the SNI remains protected. 

While encrypting the SNI is a primary motivation for ECH, its benefits extend further. ECH encrypts the entire TLS handshake, ensuring user privacy and enabling TLS to evolve without exposing sensitive connection data. By securing the full handshake, ECH allows for flexible, future-proof encryption designs that safeguard privacy as the Internet continues to grow.

How ECH works

Encrypted Client Hello (ECH) introduces a layer of privacy by dividing the ClientHello message into two distinct parts: an outer ClientHello and an inner ClientHello.

  1. Outer ClientHello: This part remains unencrypted and contains general information such as the list of ciphers and the TLS version. It includes a placeholder SNI, which is a common name used across Cloudflare’s network. For instance, all ECH-enabled websites on Cloudflare share the SNI cloudflare-ech.com. Cloudflare manages this domain and possesses the necessary certificates to handle TLS negotiations for it.

  2. Inner ClientHello: This part is encrypted and includes the actual server name the client wants to visit. Encryption ensures that this sensitive data can only be decrypted by Cloudflare.

During the TLS handshake, the outer ClientHello reveals only the placeholder SNI (e.g., cloudflare-ech.com), while the encrypted inner ClientHello carries the real server name. As a result, intermediaries observing the traffic will only see the generic outer ClientHello, concealing the actual destination.


The design of ECH effectively addresses many challenges in securely deploying handshake encryption, thanks to the collaborative efforts within the IETF community. The key to ECH’s success is its integration with other IETF standards, including the new HTTPS DNS resource record, which enables HTTPS endpoints to advertise different TLS capabilities and simplifies key distribution. By using Encrypted DNS methods, browsers and clients can anonymously query these HTTPS records. These records contain the ECH parameters needed to initiate a secure connection. 

ECH leverages the Hybrid Public Key Encryption (HPKE) standard, which streamlines the handshake encryption process, making it more secure and easier to implement. Before initiating a TCP connection, the user’s browser makes a DNS request for an HTTPS record, and zones with ECH enabled will include an ECH configuration in the HTTPS record containing an encryption public key and some associated metadata. For example, looking at the zone cloudflare-ech.com, you can see the following record returned:

dig cloudflare-ech.com https +short


1 . alpn="h3,h2" ipv4hint=104.18.10.118,104.18.11.118 ech=AEX+DQBB2gAgACD1W1B+GxY3nZ53Rigpsp0xlL6+80qcvZtgwjsIs4YoOwAEAAEAAQASY2xvdWRmbGFyZS1lY2guY29tAAA= ipv6hint=2606:4700::6812:a76,2606:4700::6812:b76

Aside from the public key used by the client to encrypt ClientHelloInner and other parameters that specify the ECH configuration, the ClientOuterHello is also present.

Y2xvdWRmbGFyZS1lY2guY29t

When the string is decoded it reveals:

cloudflare-ech.com

This indicates the public outer SNI endpoint and where the TLS handshake should be forwarded to.

Practical implications

With ECH, any observer monitoring the traffic between the client and Cloudflare will see only uniform TLS handshakes that appear to be directed towards cloudflare-ech.com, regardless of the actual website being accessed. For instance, if a user visits example.com, intermediaries will not discern this specific destination but will only see cloudflare-ech.com in the visible handshake data. 

The problem with middleboxes

In a basic HTTPS connection, a browser (client) establishes a TLS connection directly with an origin server to send requests and download content. However, many connections on the Internet do not go directly from a browser to the server but instead pass through some form of proxy or middlebox (often referred to as a “monster-in-the-middle” or MITM). This routing through intermediaries can occur for various reasons, both benign and malicious.

One common type of HTTPS interceptor is the TLS-terminating forward proxy. This proxy sits between the client and the destination server, transparently forwarding and potentially modifying traffic. To perform this task, the proxy terminates the TLS connection from the client, decrypts the traffic, and then re-encrypts and forwards it to the destination server over a new TLS connection. To avoid browser certificate validation errors, these forward proxies typically require users to install a root certificate on their devices. This root certificate allows the proxy to generate and present a trusted certificate for the destination server, a process often managed by network administrators in corporate environments, as seen with Cloudflare WARP. These services can help prevent sensitive company data from being transmitted to unauthorized destinations, safeguarding confidentiality.


However, TLS-terminating forward proxies are not equipped to handle Encrypted Client Hello (ECH) correctly. ECH separates the ClientHello message into an outer, unencrypted message and an inner, encrypted message. Since the proxy terminates the TLS connection and decrypts the traffic, it cannot manage or re-encrypt these messages as intended by ECH. Consequently, the proxy’s intervention can disrupt the ECH mechanism, potentially causing connection failures.

We also observed that specific Cloudflare setups, such as CNAME Flattening and Orange-to-Orange configurations, could cause ECH to break. This issue arose because the end destination for these requests did not support TLS 1.3, preventing ECH from being processed correctly. Fortunately, in close collaboration with our browser partners, we implemented a fallback in our BoringSSL implementation that handles TLS terminations. This fallback allows browsers to retry connections over TLS 1.2 without ECH, ensuring that a connection can be established and not break.

As a result of these improvements, we have enabled ECH by default for all Free plans, while all other plan types can manually enable it through their Cloudflare dashboard or via the API. We are excited to support ECH at scale, enhancing the privacy and security of users’ browsing activities. ECH plays a crucial role in safeguarding online interactions from potential eavesdroppers and maintaining the confidentiality of web activities.

HTTP/3 Prioritization and QUIC congestion control

Two other areas we are investing in to improve performance for all our customers are HTTP/3 Prioritization and QUIC congestion control. 

HTTP/3 Prioritization focuses on efficiently managing the order in which web assets are loaded, thereby improving web performance by ensuring critical assets are delivered faster. HTTP/3 Prioritization uses Extensible Priorities to simplify prioritization with two parameters: urgency (ranging from 0-7) and a true/false value indicating whether the resource can be processed progressively. This allows resources like HTML, CSS, and images to be prioritized based on importance.

On the other hand, QUIC congestion control aims to optimize the flow of data, preventing network bottlenecks and ensuring smooth, reliable transmission even under heavy traffic conditions. 

Both of these improvements significantly impact how Cloudflare’s network serves requests to clients. Before deploying these technologies across our global network, which handles peak traffic volumes of over 80 million requests per second, we first developed a reliable method to measure their impact through rigorous experimentation.

Measuring impact

Accurately measuring the impact of features implemented by Cloudflare for our customers is crucial for several reasons. These measurements ensure that optimizations related to performance, security, or reliability deliver the intended benefits without introducing new issues. Precise measurement validates the effectiveness of these changes, allowing Cloudflare to assess improvements in metrics such as load times, user experience, and overall site security. One of the best ways to measure performance changes is through aggregated real-world data.

Cloudflare Web Analytics offers free, privacy-first analytics for your website, helping you understand the performance of your web pages as experienced by your visitors. Real User Metrics (RUM) is a vital tool in web performance optimization, capturing data from real users interacting with a website, providing insights into site performance under real-world conditions. RUM tracks various metrics directly from the user’s device, including load times, resource usage, and user interactions. This data is essential for understanding the actual user experience, as it reflects the diverse environments and conditions under which the site is accessed.

A key performance indicator measured through RUM is Core Web Vitals (CWV), a set of metrics defined by Google that quantify crucial aspects of user experience on the web. CWV focuses on three main areas: loading performance, interactivity, and visual stability. The specific metrics include Largest Contentful Paint (LCP), which measures loading performance; First Input Delay (FID), which gauges interactivity; and Cumulative Layout Shift (CLS), which assesses visual stability. By using the CWV measurement in RUM, developers can monitor and optimize their applications to ensure a smoother, faster, and more stable user experience and track the impact of any changes they release.

Over the last three months we have developed the capability to include valuable information in Server-Timing response headers. When a page that uses Cloudflare Web Analytics is loaded in a browser, the privacy-first client-side script from Web Analytics collects browser metrics and server-timing headers, then sends back this performance data. This data is ingested, aggregated, and made available for querying. The server-timing header includes Layer 4 information, such as Round-Trip Time (RTT) and protocol type (TCP or QUIC). Combined with Core Web Vitals data, this allows us to determine whether an optimization has positively impacted a request compared to a control sample. This capability enables us to release large-scale changes such as HTTP/3 Prioritization or BBR with a clear understanding of their impact across our global network.

An example of this header contains several key properties that provide valuable information about the network performance as observed by the server:

server-timing: cfL4;desc="?proto=TCP&rtt=7337&sent=8&recv=8&lost=0&retrans=0&sent_bytes=3419&recv_bytes=832&delivery_rate=548023&cwnd=25&unsent_bytes=0&cid=94dae6b578f91145&ts=225
  • proto: Indicates the transport protocol used

  • rtt: Round-Trip Time (RTT), representing the duration of the network round trip as measured by the layer 4 connection using a smoothing algorithm.

  • sent: Number of packets sent.

  • recv: Number of packets received.

  • lost: Number of packets lost.

  • retrans: Number of retransmitted packets.

  • sent_bytes: Total number of bytes sent.

  • recv_bytes: Total number of bytes received.

  • delivery_rate: Rate of data delivery, an instantaneous measurement in bytes per second.

  • cwnd: Congestion Window, an instantaneous measurement of packet or byte count depending on the protocol.

  • unsent_bytes: Number of bytes not yet sent.

  • cid: A 16-byte hexadecimal opaque connection ID.

  • ts: Timestamp in milliseconds, representing when the data was captured.

This real-time collection of performance data via RUM and Server-Timing headers allows Cloudflare to make data-driven decisions that directly enhance user experience. By continuously analyzing these detailed network and performance insights, we can ensure that future optimizations, such as HTTP/3 Prioritization or BBR deployment, are delivering tangible benefits for our customers.

Enabling HTTP/3 Prioritization for all plans

As part of our focus on improving observability through the integration of the server-timing header, we implemented several minor changes to optimize QUIC handshakes. Notably, we observed positive improvements in our telemetry due to the Layer 4 observability enhancements provided by the server-timing header. These internal findings coincided with third-party measurements, which showed similar improvements in handshake performance.

In the fourth quarter of 2024, we will apply the same experimental methodology to the HTTP/3 Prioritization support announced during Speed Week 2023. HTTP/3 Prioritization is designed to enhance the efficiency and speed of loading web pages by intelligently managing the order in which web assets are delivered to users. This is crucial because modern web pages are composed of numerous elements — such as images, scripts, and stylesheets — that vary in importance. Proper prioritization ensures that critical elements, like primary content and layout, load first, delivering a faster and more seamless browsing experience.

We will use this testing framework to measure performance improvements before enabling the feature across all plan types. This process allows us not only to quantify the benefits but, most importantly, to ensure there are no performance regressions.

Congestion control

Following the completion of the HTTP/3 Prioritization experiments we will then begin testing different congestion control algorithms, specifically focusing on BBR (Bottleneck Bandwidth and Round-trip propagation time) version 3. Congestion control is a crucial mechanism in network communication that aims to optimize data transfer rates while avoiding network congestion. When too much data is sent too quickly over a network, it can lead to congestion, causing packet loss, delays, and reduced overall performance. Think of a busy highway during rush hour. If too many cars (data packets) flood the highway at once, traffic jams occur, slowing everyone down.


Congestion control algorithms act like traffic managers, regulating the flow of data to prevent these “traffic jams,” ensuring that data moves smoothly and efficiently across the network. Each side of a connection runs an algorithm in real time, dynamically adjusting the flow of data based on the current and predicted network conditions.

BBR is an advanced congestion control algorithm, initially developed by Google. BBR seeks to estimate the actual available bandwidth and the minimum round-trip time (RTT) to determine the optimal data flow. This approach allows BBR to maintain high throughput while minimizing latency, leading to more efficient and stable network performance.

BBR v3, the latest iteration, builds on the strengths of its predecessors BBRv1 and BBRv2 by further refining its bandwidth estimation techniques and enhancing its adaptability to varying network conditions. We found BBR v3 to be faster in several cases compared to our previous implementation of CUBIC. Most importantly, it reduced loss and retransmission rates in our Oxy proxy implementation.

With these promising results, we are excited to test various congestion control algorithms including BBRv3 for quiche, our QUIC implementation, across our HTTP/3 traffic. Combining the layer 4 server-timing information with experiments in this area will enable us to explicitly control and measure the impact on real-world metrics.

The future

The future of the Internet relies on continuous innovation to meet the growing demands for speed, security, and scalability. Technologies like Zstandard for compression, BBR for congestion control, HTTP/3 prioritization, and Encrypted Client Hello are setting new standards for performance and privacy. By implementing these protocols, web services can achieve faster page load times, more efficient bandwidth usage, and stronger protections for user data.

These advancements don’t just offer incremental improvements, they provide a significant leap forward in optimizing the user experience and safeguarding online interactions. At Cloudflare, we are committed to making these technologies accessible to everyone, empowering businesses to deliver better, faster, and more secure services. 

Stay tuned for more developments as we continue to push the boundaries of what’s possible on the web and if you’re passionate about building and implementing the latest Internet innovations, we’re hiring!

Cloudflare helps verify the security of end-to-end encrypted messages by auditing key transparency for WhatsApp

Post Syndicated from Thibault Meunier original https://blog.cloudflare.com/key-transparency

Chances are good that today you’ve sent a message through an end-to-end encrypted (E2EE) messaging app such as WhatsApp, Signal, or iMessage. While we often take the privacy of these conversations for granted, they in fact rely on decades of research, testing, and standardization efforts, the foundation of which is a public-private key exchange. There is, however, an oft-overlooked implicit trust inherent in this model: that the messaging app infrastructure is distributing the public keys of all of its users correctly.

Here’s an example: if Joe and Alice are messaging each other on WhatsApp, Joe uses Alice’s phone number to retrieve Alice’s public key from the WhatsApp database, and Alice receives Joe’s public key. Their messages are then encrypted using this key exchange, so that no one — even WhatsApp — can see the contents of their messages besides Alice and Joe themselves. However, in the unlikely situation where an attacker, Bob, manages to register a different public key in WhatsApp’s database, Joe would try to message Alice but unknowingly be messaging Bob instead. And while this threat is most salient for journalists, activists, and those most vulnerable to cyber attacks, we believe that protecting the privacy and integrity of end-to-end encrypted conversations is for everyone.

There are several methods that end-to-end encrypted messaging apps have deployed thus far to protect the integrity of public key distribution, the most common of which is to do an in-person verification of the QR code fingerprint of your public key (WhatsApp and Signal both have a version of this). As you can imagine, this experience is inconvenient and unwieldy, especially as your number of contacts and group chats increase.

Over the past few years, there have been significant developments in this area of cryptography, and WhatsApp has paved the way with their Key Transparency announcement. But as an independent third party, Cloudflare can provide stronger reassurance: that’s why we’re excited to announce that we’re now verifying WhatsApp’s Key Transparency audit proofs. 

Auditing: the next frontier of encryption 

We didn’t build this in a vacuum: similar to how the web and messaging apps became encrypted over time, we see auditing public key infrastructure as the next logical step in securing Internet infrastructure. This solution builds upon learnings from Certificate Transparency and Binary Transparency, which share some of the underlying data structure and cryptographic techniques, and we’re excited about the formation of a working group at the IETF to make multi-party operation of Key Transparency-like systems tractable for a broader set of use cases. 

We see our role here as a pioneer of a real world deployment of this auditing infrastructure, working through and sharing the operational challenges of operating a system that is critical for a messaging app used by billions of people around the world.   

We’ve also done this before — in 2022, Cloudflare announced Code Verify, a partnership in which we verify that the code delivered in the browser for WhatsApp Web has not been tampered with. When users run WhatsApp in their browser, the WhatsApp Code Verify extension compares a hash of the code that is executing in the browser with the hash that Cloudflare has of the codebase, enabling WhatsApp web users to easily see whether the code that is executing is the code that was publicly committed to. 

In Code Verify, Cloudflare builds a non-mutable chain associating the WhatsApp version with the hash of its code.

Cloudflare’s role in Key Transparency is similar in that we are checking that a tree-based directory of public keys (more on this later) has been constructed correctly, and has been done so consistently over time.

How Key Transparency works

The architectural foundation of Key Transparency is the Auditable Key Directory (AKD): a tree-shaped data structure, constructed and maintained by WhatsApp, in which the nodes contain hashed contact details of each user. We’ll explain the basics here but if you’re interested in learning more, check out the SEEMless and Parakeet papers.

The AKD tree is constructed by building a binary tree, each parent node of which is a hash of each of its left and right child nodes:

Each child node on the tree contains contact and public key details for a user (shown here for illustrative purposes). In reality, Cloudflare only sees a hash of each node rather than Alice and Bob’s contact info in plaintext.

An epoch describes a specific version of the tree at a given moment in time, identified by its root node. Using a structure similar to Code Verify, the WhatsApp Log stores each root node hash as part of an append-only time structure of updates.

What kind of changes are valid to be included in a given epoch? When a new person, Brian, joins WhatsApp, WhatsApp inserts a new “B” node in the AKD tree, and a new epoch. If Alice loses her phone and rotates her key, her “version” is updated to v1 in the next update.  

How we built the Auditor on Cloudflare Workers 

The role of the Auditor is to provide two main guarantees: that epochs are globally unique, and that they are valid. They are, however, quite different: global uniqueness requires consistency on whether an epoch and its associated root hash has been seen, while validity is a matter of computation — is the transition from the previous epoch to the current one a correct tree transformation?

Timestamping service

Timestamping service architecture (Cloudflare Workers in Rust, using a Durable Object for storage)

At regular intervals, the WhatsApp Log puts all new updates into the tree, and cuts a new epoch in the format “{counter}/{previous}/{current}”. The counter is a number, whereby “previous” is a hexadecimal encoded hash of the previous tree root, and “current” is a hexadecimal encoded hash for the new tree root. As a shorthand, epochs can be referred to by their counter only.

Here’s an example:

1001/d0bbf29c48716f26a951ae2a244eb1d070ee38865c29c8ad8174e8904e3cdc1a/e1006114485e8f0bbe2464e0ebac77af37bce76851745592e8dd5991ff2cd411

Once an epoch is constructed, the WhatsApp Log sends it to the Auditor for cross-signing, to ensure it has only been seen once. The Auditor adds a timestamp as to when this new epoch has been seen. Cloudflare’s Auditor uses a Durable Object for every epoch to create their timestamp. This guarantees the global uniqueness of an epoch, and the possibility of replay in the event the WhatsApp Log experiences an outage or is distributed across multiple locations. WhatsApp’s Log is expected to produce new epochs at regular intervals, given this constrains the propagation of public key updates seen by their users. Therefore, Cloudflare Auditor does not have to keep the durable object state forever. Once replay and consistency have been accounted for, this state is cleared. This is done after a month, thanks to durable object alarms.

Additional checks are performed by the service, such as checking that the epochs are consecutive, or that their digest is unique. This enforces a chain of epochs and their associated digests, provided by the WhatsApp Log and signed by the Auditor, providing a consistent view for all to see.

We decided to write this service in Rust because Workers rely on cloudflare/workers-rs bindings, and the auditable key directory library is also in Rust (facebook/akd).

Tree validation service

With the timestamping service above, WhatsApp users (as well as their Log) have assurance that epochs are transparent. WhatsApp’s directory can be audited at any point in time, and if it were to be tampered with by WhatsApp or an intermediary, the WhatsApp Log can be held accountable for it.

Epochs and their digests are only representations of their underlying key directory. To fully audit the directory, the transition from the previous digest to a current digest has to be validated. To perform validation, we need to run the epoch validation method. Specifically, we want to run verify_consecutive_append_only on every epoch constructed by the Log. The size of an epoch varies with the number of updates it contains, and therefore the number of associated nodes in the tree to construct as well. While Workers are able to run such validation for a small number of updates, this is a compute-intensive task. Therefore, still leveraging the same Rust codebase, the Auditor leverages a container that only performs the tree construction and validation. The Auditor retrieves the updates for a given epoch, copies them into its own R2 bucket, and delegates the validation to a container running on Cloudflare. Once validated, the epoch is marked as verified.

Architecture for Cloudflare’s Plexi Auditor. The proof verification and signatures stored do not contain personally identifiable information such as your phone number, public key, or other metadata tied to your WhatsApp account.

This decouples global uniqueness requirements and epoch validation, which happens at two distinct times. It allows the validation to take more time, and not be latency sensitive.

How can I verify Cloudflare has signed an epoch?

Anyone can perform audit proof verification — the proofs are publicly available — but Cloudflare will be doing so automatically and publicly to make the results accessible to all. Verify that Cloudflare’s signature matches WhatsApp’s by visiting our Key Transparency website, or via our command line tool.

To use our command line tool, you’ll need to download the plexi client. It helps construct data structures which are used for signatures, and requires you to have git and cargo installed.

cargo install plexi

With the client installed, let’s now check the audit proofs for WhatsApp namespace: whatsapp.key-transparency.v1. Plexi Auditor is represented by one public key, which can verify and vouch for multiple Logs with their own dedicated “namespace.” To validate an epoch, such as epoch 458298 (the epoch at which the log decided to start sharing data), you can run the following command:

plexi audit --remote-url 'https://akd-auditor.cloudflare.com' --namespace 'whatsapp.key-transparency.v1' --long
Namespace
  Name              	: whatsapp.key-transparency.v1
  Ciphersuite       	: ed25519(protobuf)

Signature (2024-09-23T16:53:45Z)
  Epoch height      	: 489193
  Epoch digest      	: cbe5097ae832a3ae51ad866104ffd4aa1f7479e873fd18df9cb96a02fc91ebfe
  Signature         	: fe94973e19da826487b637c019d3ce52f0c08093ada00b4fe6563e2f8117b4345121342bc33aae249be47979dfe704478e2c18aed86e674df9f934b718949c08
  Signature verification: success
  Proof verification	: success

Interested in having Cloudflare audit your public key infrastructure?

At the end of the day, security threats shouldn’t become usability problems — everyday messaging app users shouldn’t have to worry about whether the public keys of the people they’re talking to have been compromised. In the same way that certificate transparency is now built into the issuance and use of digital certificates to encrypt web traffic, we think that public key transparency and auditing should be built into end-to-end encrypted systems by default, so that users never have to do manual QR code verification again.

We built our auditing service to be general purpose, reliable, and fast, and WhatsApp’s Key Transparency is just the first of several use cases it will be used for – Cloudflare is interested in helping audit the delivery of code binaries and integrity of all types of end-to-end encrypted infrastructure. If your company or organization is interested in working with us, you can reach out to us here.

Cloudflare partners with Internet Service Providers and network equipment providers to deliver a safer browsing experience to millions of homes

Post Syndicated from Kelly May Johnston original https://blog.cloudflare.com/safer-resolver


A committed journey of privacy and security

In 2018, Cloudflare announced 1.1.1.1, one of the fastest, privacy-first consumer DNS services. 1.1.1.1 was the first consumer product Cloudflare ever launched, focused on reaching a wider audience. This service was designed to be fast and private, and does not retain information that would identify who is making a request.

In 2020, Cloudflare announced 1.1.1.1 for Families, designed to add a layer of protection to our existing 1.1.1.1 public resolver. The intent behind this product was to provide consumers, namely families, the ability to add a security and adult content filter to block unsuspecting users from accessing specific sites when browsing the Internet.

Today, we are officially announcing that any ISP and equipment manufacturer can use our DNS resolvers for free. Internet service, network, and hardware equipment providers can sign up and join this program to partner with Cloudflare to deliver a safer browsing experience that is easy to use, industry leading, and at no cost to anyone.

Leading companies have already partnered with Cloudflare to deliver superior and customized offerings to protect their customers. By delivering this service in a place where the customer is familiar, you can help us make the Internet a safe place for all. 

A need to intentionally focus on families

COVID-19 presented new challenges beginning in 2020 as kids’ online activity increased and the reliance on home networks was more present than ever before. Research shows around 67% of adolescents have access to a tablet, with ages as low as two years old accessing media content. While it is often impressive to watch the ease with which a young child can navigate a smartphone or tablet handed to them and pull up their favorite YouTube show, it becomes increasingly concerning that kids may unintentionally stumble onto harmful content in the process.

Our launch of 1.1.1.1 for Families in 2020 provided that peace of mind to users around the globe, and it continues to deliver those protections. Today, households can set up this service using our guide. They can select the DNS resolver they want to use, focusing on just privacy, or include blocking security threats and adult content across their entire home network.

Although this service is available and free for anyone to use, there are still many users who browse online daily without protections in place. Setting up protection like this can feel daunting, and many users are at a loss on where to begin and/or how to configure this on their devices or home network. Today we are announcing a partnership that will make setup and configuration much easier for users.

Partnering to extend security even further 

ISPs and network providers can use Cloudflare’s different resolver services to provide various offerings to their customers. Our existing partners have taken these offerings and built them into their platforms as an extension of the services that they are already providing to their customers. This built-in model allows for easy adoption and a consistent and comprehensive end customer journey. Each service is designed with a specific purpose in mind, outlined below:

Our core privacy resolver (1.1.1.1) is designed for speed and privacy.  Additionally, DNS requests to our public resolver can be sent over a secure channel using DNS over HTTPS (DoH) or DNS over TLS (DoT), significantly decreasing the odds of any unwanted spying or monster-in-the-middle attacks.

Our security resolver (1.1.1.2) has all the benefits of 1.1.1.1, with the additional benefit of protecting users from sites that contain malware, spam, botnet command and control attacks, or phishing threats.

Our family resolver (1.1.1.3) provides all the benefits of 1.1.1.2, with the additional benefit of blocking unwanted adult content from both direct site navigation, as well as locking public search engines to Safe Search only. This prevents anyone from unknowingly searching for something that might return an unwanted result. 

Premium Safety & Customizations 

If users want even more flexibility than what our public DNS resolvers provide, Cloudflare also offers a Gateway product that allows customized filtering, reporting, logging, analytics, and scheduling. This advanced Gateway offering includes over 114 categories ranging from social media, online messaging platforms, gaming, and “safe search” results, all the way to “home & garden”.

The additional filters and scheduling functionality empowers users to exercise more nuanced and time-based controls, such as limiting social media during school hours or dinner time. 

If you are an ISP or equipment manufacturer looking to provide easily customizable options for your customers, this is also an available option. We have a multi-tenant environment available for our Gateway offering that enables our customers to empower their individual subscribers to configure their own individual filters for their users and homes. If you are a device manufacturer or ISP looking to offer customizable protections for your individual subscribers, this may be a good option for you.

Our continued commitment to privacy, security, and safety

An easy choice 

Simply put, Cloudflare is an easy and obvious choice for protecting individuals and families. This is why leading companies have all chosen to partner with Cloudflare to help protect customers and their families. In 2020, after launching 1.1.1.1 for Families, we were serving 200+ billion DNS queries per day for 1.1.1.1. Today, we serve 1.7 trillion queries per day for 1.1.1.1 and our network presence spans over 330 cities and interconnects with over 12,500+ other networks. It is this network that puts us within a blink of an eye for 95% of the world’s Internet-connected population (your customers), ensuring they receive lightning fast speed while browsing.

Beyond our speed, Cloudflare is used as a reverse proxy by nearly ~ 20% of all websites across the globe. This gives us incredible insight to the latest Internet trends, threats, and research. In partnering with us, you can leverage our strengths — powerful infrastructure, extensive data insights, and a dedicated threat intelligence team – while focusing on your core priorities.  There is no better partner to have than one who provides global reach, excellent performance, and built-in privacy.

Join us in making a safe browsing experience easy for everyone

Cloudflare began with a singular goal of helping build a better Internet, and our annual Birthday Week is a catalyst for many developments that have shaped a better Internet for everyone.

We remain committed to helping to protect and build a better Internet for every user, and to do so, we need to meet them where they are. Our partnerships are critical in making this a reality, and we want you to be a part of the solution with us.

Whether you are interested in our public DNS resolvers or our more advanced Gateway options, Cloudflare has easy and scalable options for everyone. You can sign up to join this program as a partner today by submitting this form, and we will be in touch to understand your needs and bring you on board.


Introducing Ephemeral IDs: a new tool for fraud detection

Post Syndicated from Oliver Payne original https://blog.cloudflare.com/turnstile-ephemeral-ids-for-fraud-detection

In the early days of the Internet, a single IP address was a reliable indicator of a single user. However, today’s Internet is more complex. Shared IP addresses are now common, with users connecting via mobile IP address pools, VPNs, or behind CGNAT (Carrier Grade Network Address Translation). This makes relying on IP addresses alone a weak method to combat modern threats like automated attacks and fraudulent activity. Additionally, many Internet users have no option but to use an IP address which they don’t have sole control over, and as such, should not be penalized for that.

At Cloudflare, we are solving this complexity with Turnstile, our CAPTCHA alternative. And now, we’re taking the next step in advancing security with Ephemeral IDs, a new feature that generates a unique short-lived ID, without relying on any network-level information.

When a website visitor interacts with Turnstile, we now calculate an Ephemeral ID that can link behavior to a specific client instead of an IP address. This means that even when attackers rotate through large pools of IP addresses, we can still identify and block malicious actions. For example, in attacks like credential stuffing or account signups, where fraudsters attempt to disguise themselves using different IP addresses, Ephemeral IDs allow us to detect abuse patterns more accurately beyond just determining whether the visitor is a human or a bot. Multiple fraudulent actions from the same client are grouped together, improving our detection rate while reducing false positives.

How Ephemeral IDs work

Turnstile detects bots by analyzing browser attributes and signals. Using these aggregated client-side signals, we generate a short-lived Ephemeral ID without setting any cookies or using similar client-side storage. These IDs are intentionally not 100% unique and have a brief lifespan, making them highly effective in identifying patterns of fraud and abuse, without compromising user privacy.

When the same visitor interacts with Turnstile widgets from different Cloudflare customers, they receive different Ephemeral IDs for each one. Additionally, because these IDs change frequently, they cannot be used to track a single visitor over multiple days.


Blue: A single IP address | Green: A single Ephemeral ID
The bigger the node, the more frequently seen that ID or IP address was in our dataset.

The graphic above illustrates the complex reality of the modern Internet, where the relationship between clients and IP addresses is far from a simple one-to-one mapping. While some straightforward mappings still exist, they are no longer the norm.

During a period where a site or service is under attack, we observe a “nest” of highly correlated Ephemeral IDs. In the example below, the correlation is based on both Ephemeral ID and IP address.


Nest in the center of the diagram visualizes thousands of IP addresses (blue) which are correlated by the commonly identified Ephemeral IDs (green). The bigger the node, the more frequently seen that ID or IP address was in our dataset.

This is real-world data showing fraudulent activity on one of Cloudflare’s public-facing forms. Even with access to a broad range of IP addresses, attackers struggle to completely disguise their requests because Ephemeral IDs are generated based on patterns beyond IP addresses. This means that even if they rotate addresses, the underlying client characteristics are still detected, making it harder for them to evade our security measures. This makes it easier for us to group these requests and apply appropriate business logic, whether that means discarding the requests, requiring further validation, enforcing multi-factor authentication (MFA), or other actions. 

This new client identification technology seamlessly integrates into the broader advancements we’ve made to Turnstile over the past year. Whether you’re protecting login forms, signup pages, or high value transactions, you’ll immediately benefit from this extra layer of abuse detection without needing to change a single line of code. We’ll take care of all the heavy lifting and analysis behind the scenes, and our system will continue to improve its accuracy and effectiveness over time.

What does this mean for you? Starting today, Turnstile will go beyond just identifying bots. All websites protected by Turnstile will automatically benefit from the integration of Ephemeral IDs into our detection logic. This means we can more effectively identify and penalize offending clients without impacting other users on the same network, or IP address, improving security and user experience for everyone.

Ephemeral IDs in action

Everyone benefits from the addition of Ephemeral IDs to the Challenge Platform, but for those who want to use it beyond that, the Ephemeral ID is available through the Turnstile siteverify response. A practical use case for Ephemeral IDs is preventing fraudulent account signups. Imagine a bad actor, a real person using a real device, creating hundreds of fake accounts while rotating IP addresses to avoid detection. By ingesting Ephemeral IDs and logging them alongside your account creation logs, you can set up alerts based on account creation thresholds in real-time or retroactively investigate suspicious activity. Even though Ephemeral IDs are short-lived and may have changed by the time an investigation begins, they still provide valuable insights through aggregate analysis, and provide an extra dimension to identify fraud and abuse.

For our Turnstile Enterprise and Bot Management Enterprise customers, you now have the option to access Ephemeral IDs directly through the Turnstile siteverify response. Get in touch with your Account Executive to enable it on your account.

Below is an example of siteverify response for those who have enabled Ephemeral IDs.

curl 'https://challenges.cloudflare.com/turnstile/v0/siteverify' --data 'secret=verysecret&response=<RESPONSE>'
{
    "success": true,
    "error-codes": [],
    "challenge_ts": "2024-09-10T17:29:00.463Z",
    "hostname": "example.com",
    "metadata": {
        "ephemeral_id": "x:9f78e0ed210960d7693b167e"
    }
}

What’s next for Turnstile?

We launched Turnstile with a bold mission: to redefine CAPTCHAs with a frictionless, privacy-first solution that eliminates the annoyance of picking puzzles, selecting stoplights, and clicking crosswalks to prove our humanity. It’s incredible to think that Turnstile has been generally available for a whole year now! During this time, it has blocked over one trillion bots, and is actively protecting more than 350,000 domains worldwide.

As we celebrate Turnstile’s second birthday, we’re proud of the progress we’ve made and thrilled to introduce our latest innovations. While Ephemeral IDs represent the newest evolution of Turnstile, they’re part of our ongoing commitment to continuous improvement. Over the past year, we’ve also introduced a Cloudflare Pages Plugin and partnered with Google Firebase, ensuring that developers have easy access to Turnstile.

Earlier this year, we also launched Pre-Clearance for Turnstile, integrating it with Cloudflare WAF’s Challenge action, making it easier for customers to use Cloudflare’s Application Security products together. If you want to learn more about how to use Turnstile with Cloudflare’s Bot Management and WAF in more detail, check it out here!

We’re incredibly excited about what’s ahead. The introduction of Ephemeral IDs is just one of many innovations on the horizon. We’re committed to making the Internet a safer, more private place for everyone, eliminating the need for frustrating CAPTCHA puzzles while keeping security our top priority. And with our free tier remaining open and unlimited for all, there’s no barrier to getting started with Turnstile today.

Join us in revolutionizing online security – get started with Turnstile now or dive straight into our how-to guides. Let’s help make the Internet a better place, together!

People-Search Site Removal Services Largely Ineffective

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/08/people-search-site-removal-services-largely-ineffective.html

Consumer Reports has a new study of people-search site removal services, concluding that they don’t really work:

As a whole, people-search removal services are largely ineffective. Private information about each participant on the people-search sites decreased after using the people-search removal services. And, not surprisingly, the removal services did save time compared with manually opting out. But, without exception, information about each participant still appeared on some of the 13 people-search sites at the one-week, one-month, and four-month intervals. We initially found 332 instances of information about the 28 participants who would later be signed up for removal services (that does not include the four participants who were opted out manually). Of those 332 instances, only 117, or 35%, were removed within four months.

Online Privacy and Overfishing

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/06/online-privacy-and-overfishing.html

Microsoft recently caught state-backed hackers using its generative AI tools to help with their attacks. In the security community, the immediate questions weren’t about how hackers were using the tools (that was utterly predictable), but about how Microsoft figured it out. The natural conclusion was that Microsoft was spying on its AI users, looking for harmful hackers at work.

Some pushed back at characterizing Microsoft’s actions as “spying.” Of course cloud service providers monitor what users are doing. And because we expect Microsoft to be doing something like this, it’s not fair to call it spying.

We see this argument as an example of our shifting collective expectations of privacy. To understand what’s happening, we can learn from an unlikely source: fish.

In the mid-20th century, scientists began noticing that the number of fish in the ocean—so vast as to underlie the phrase “There are plenty of fish in the sea”—had started declining rapidly due to overfishing. They had already seen a similar decline in whale populations, when the post-WWII whaling industry nearly drove many species extinct. In whaling and later in commercial fishing, new technology made it easier to find and catch marine creatures in ever greater numbers. Ecologists, specifically those working in fisheries management, began studying how and when certain fish populations had gone into serious decline.

One scientist, Daniel Pauly, realized that researchers studying fish populations were making a major error when trying to determine acceptable catch size. It wasn’t that scientists didn’t recognize the declining fish populations. It was just that they didn’t realize how significant the decline was. Pauly noted that each generation of scientists had a different baseline to which they compared the current statistics, and that each generation’s baseline was lower than that of the previous one.

What seems normal to us in the security community is whatever was commonplace at the beginning of our careers.

Pauly called this “shifting baseline syndrome” in a 1995 paper. The baseline most scientists used was the one that was normal when they began their research careers. By that measure, each subsequent decline wasn’t significant, but the cumulative decline was devastating. Each generation of researchers came of age in a new ecological and technological environment, inadvertently masking an exponential decline.

Pauly’s insights came too late to help those managing some fisheries. The ocean suffered catastrophes such as the complete collapse of the Northwest Atlantic cod population in the 1990s.

Internet surveillance, and the resultant loss of privacy, is following the same trajectory. Just as certain fish populations in the world’s oceans have fallen 80 percent, from previously having fallen 80 percent, from previously having fallen 80 percent (ad infinitum), our expectations of privacy have similarly fallen precipitously. The pervasive nature of modern technology makes surveillance easier than ever before, while each successive generation of the public is accustomed to the privacy status quo of their youth. What seems normal to us in the security community is whatever was commonplace at the beginning of our careers.

Historically, people controlled their computers, and software was standalone. The always-connected cloud-deployment model of software and services flipped the script. Most apps and services are designed to be always-online, feeding usage information back to the company. A consequence of this modern deployment model is that everyone—cynical tech folks and even ordinary users—expects that what you do with modern tech isn’t private. But that’s because the baseline has shifted.

AI chatbots are the latest incarnation of this phenomenon: They produce output in response to your input, but behind the scenes there’s a complex cloud-based system keeping track of that input—both to improve the service and to sell you ads.

Shifting baselines are at the heart of our collective loss of privacy. The U.S. Supreme Court has long held that our right to privacy depends on whether we have a reasonable expectation of privacy. But expectation is a slippery thing: It’s subject to shifting baselines.

The question remains: What now? Fisheries scientists, armed with knowledge of shifting-baseline syndrome, now look at the big picture. They no longer consider relative measures, such as comparing this decade with the last decade. Instead, they take a holistic, ecosystem-wide perspective to see what a healthy marine ecosystem and thus sustainable catch should look like. They then turn these scientifically derived sustainable-catch figures into limits to be codified by regulators.

In privacy and security, we need to do the same. Instead of comparing to a shifting baseline, we need to step back and look at what a healthy technological ecosystem would look like: one that respects people’s privacy rights while also allowing companies to recoup costs for services they provide. Ultimately, as with fisheries, we need to take a big-picture perspective and be aware of shifting baselines. A scientifically informed and democratic regulatory process is required to preserve a heritage—whether it be the ocean or the Internet—for the next generation.

This essay was written with Barath Raghavan, and previously appeared in IEEE Spectrum.

Privacy Implications of Tracking Wireless Access Points

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/05/privacy-implications-of-tracking-wireless-access-points.html

Brian Krebs reports on research into geolocating routers:

Apple and the satellite-based broadband service Starlink each recently took steps to address new research into the potential security and privacy implications of how their services geolocate devices. Researchers from the University of Maryland say they relied on publicly available data from Apple to track the location of billions of devices globally—including non-Apple devices like Starlink systems—and found they could use this data to monitor the destruction of Gaza, as well as the movements and in many cases identities of Russian and Ukrainian troops.

Really fascinating implications to this research.

Research paper: “Surveilling the Masses with Wi-Fi-Based Positioning Systems:

Abstract: Wi-Fi-based Positioning Systems (WPSes) are used by modern mobile devices to learn their position using nearby Wi-Fi access points as landmarks. In this work, we show that Apple’s WPS can be abused to create a privacy threat on a global scale. We present an attack that allows an unprivileged attacker to amass a worldwide snapshot of Wi-Fi BSSID geolocations in only a matter of days. Our attack makes few assumptions, merely exploiting the fact that there are relatively few dense regions of allocated MAC address space. Applying this technique over the course of a year, we learned the precise
locations of over 2 billion BSSIDs around the world.

The privacy implications of such massive datasets become more stark when taken longitudinally, allowing the attacker to track devices’ movements. While most Wi-Fi access points do not move for long periods of time, many devices—like compact travel routers—are specifically designed to be mobile.

We present several case studies that demonstrate the types of attacks on privacy that Apple’s WPS enables: We track devices moving in and out of war zones (specifically Ukraine and Gaza), the effects of natural disasters (specifically the fires in Maui), and the possibility of targeted individual tracking by proxy—all by remotely geolocating wireless access points.

We provide recommendations to WPS operators and Wi-Fi access point manufacturers to enhance the privacy of hundreds of millions of users worldwide. Finally, we detail our efforts at responsibly disclosing this privacy vulnerability, and outline some mitigations that Apple and Wi-Fi access point manufacturers have implemented both independently and as a result of our work.

Personal AI Assistants and Privacy

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/05/personal-ai-assistants-and-privacy.html

Microsoft is trying to create a personal digital assistant:

At a Build conference event on Monday, Microsoft revealed a new AI-powered feature called “Recall” for Copilot+ PCs that will allow Windows 11 users to search and retrieve their past activities on their PC. To make it work, Recall records everything users do on their PC, including activities in apps, communications in live meetings, and websites visited for research. Despite encryption and local storage, the new feature raises privacy concerns for certain Windows users.

I wrote about this AI trust problem last year:

One of the promises of generative AI is a personal digital assistant. Acting as your advocate with others, and as a butler with you. This requires an intimacy greater than your search engine, email provider, cloud storage system, or phone. You’re going to want it with you 24/7, constantly training on everything you do. You will want it to know everything about you, so it can most effectively work on your behalf.

And it will help you in many ways. It will notice your moods and know what to suggest. It will anticipate your needs and work to satisfy them. It will be your therapist, life coach, and relationship counselor.

You will default to thinking of it as a friend. You will speak to it in natural language, and it will respond in kind. If it is a robot, it will look humanoid—­or at least like an animal. It will interact with the whole of your existence, just like another person would.

[…]

And you will want to trust it. It will use your mannerisms and cultural references. It will have a convincing voice, a confident tone, and an authoritative manner. Its personality will be optimized to exactly what you like and respond to.

It will act trustworthy, but it will not be trustworthy. We won’t know how they are trained. We won’t know their secret instructions. We won’t know their biases, either accidental or deliberate.

We do know that they are built at enormous expense, mostly in secret, by profit-maximizing corporations for their own benefit.

[…]

All of this is a long-winded way of saying that we need trustworthy AI. AI whose behavior, limitations, and training are understood. AI whose biases are understood, and corrected for. AI whose goals are understood. That won’t secretly betray your trust to someone else.

The market will not provide this on its own. Corporations are profit maximizers, at the expense of society. And the incentives of surveillance capitalism are just too much to resist.

We are going to need some sort of public AI to counterbalance all of these corporate AIs.

EDITED TO ADD (5/24): Lots of comments about Microsoft Recall and security:

This:

Because Recall is “default allow” (it relies on a list of things not to record) … it’s going to vacuum up huge volumes and heretofore unknown types of data, most of which are ephemeral today. The “we can’t avoid saving passwords if they’re not masked” warning Microsoft included is only the tip of that iceberg. There’s an ocean of data that the security ecosystem assumes is “out of reach” because it’s either never stored, or it’s encrypted in transit. All of that goes out the window if the endpoint is just going to…turn around and write it to disk. (And local encryption at rest won’t help much here if the data is queryable in the user’s own authentication context!)

This:

The fact that Microsoft’s new Recall thing won’t capture DRM content means the engineers do understand the risk of logging everything. They just chose to preference the interests of corporates and money over people, deliberately.

This:

Microsoft Recall is going to make post-breach impact analysis impossible. Right now IR processes can establish a timeline of data stewardship to identify what information may have been available to an attacker based on the level of access they obtained. It’s not trivial work, but IR folks can do it. Once a system with Recall is compromised, all data that has touched that system is potentially compromised too, and the ML indirection makes it near impossible to confidently identify a blast radius.

This:

You may be in a position where leaders in your company are hot to turn on Microsoft Copilot Recall. Your best counterargument isn’t threat actors stealing company data. It’s that opposing counsel will request the recall data and demand it not be disabled as part of e-discovery proceedings.

New Consent and Bot Management features for Cloudflare Zaraz

Post Syndicated from Yo'av Moshe original https://blog.cloudflare.com/new-consent-and-bot-management-features-for-cloudflare-zaraz


Managing consent online can be challenging. After you’ve figured out the necessary regulations, you usually need to configure some Consent Management Platform (CMP) to load all third-party tools and scripts on your website in a way that respects these demands. Cloudflare Zaraz manages the loading of all of these third-party tools, so it was only natural that in April 2023 we announced the Cloudflare Zaraz CMP: the simplest way to manage consent in a way that seamlessly integrates with your third-party tools manager.

As more and more third-party tool vendors are required to handle consent properly, our CMP has evolved to integrate with these new technologies and standardization efforts. Today, we’re happy to announce that the Cloudflare Zaraz CMP is now compatible with the Interactive Advertising Bureau Transparency and Consent Framework (IAB TCF) requirements, and fully supports Google’s Consent Mode v2 signals. Separately, we’ve taken efforts to improve the way Cloudflare Zaraz handles traffic coming from online bots.

IAB TCF Compatibility

Earlier this year, Google announced that websites that would like to use AdSense and other advertising solutions in the European Economic Area (EEA), the UK, and Switzerland, will be required to use a CMP that is approved by IAB Europe, an association for digital marketing and advertising. Their Transparency and Consent Framework sets guidelines for how CMPs should operate. Since March 2024, the Cloudflare Zaraz CMP is compliant with these guidelines, and Zaraz users in Europe can use Google’s advertising products without any restrictions.

Since the IAB TCF requirements can make the consent modal a little complex for users, we have made this compliance mode an opt-in feature. See the official documentation for information on how to enable it.

Google Consent Mode v2

Another new requirement from Google was the need to send “Consent Signals”. These signals are part of what is also known as “Consent Mode”, and later, Consent Mode v2. Together with each event sent to Google Analytics and Google Ads, they tell the Google servers about the consent status of the current visitor – did they agree to have their data used for personalized advertising? Did they accept cookies? These and other questions are answered by Consent Mode v2, telling the Google servers how to treat the data it receives.

Consent Mode v2 usually requires setting two values for each consent category – a default value and an updated one. The default value represents the consent status (granted or denied) a certain category (e.g. using cookies) has before the user has submitted their personal preferences. Usually, and especially within the EU, the default value would be `denied`. Once the user submits their preferences, Consent Mode v2 sends an additional “updated” value that represents the choice the user made.

Implementing Consent Mode v2 is quick and easy with Cloudflare Zaraz, although the specific implementation depends on your CMP. Examples, including integration with the Cloudflare Zaraz CMP, are available in our official documentation.

We believe that better standardization around online consent benefits everyone, and we are glad to be working on tools that respect users’ privacy and improve online user experience.

Bot Management

We also recently integrated better Bot Management support within Cloudflare Zaraz. You often want crawlers to be able to access your website, but you don’t want them to trigger your analytics and conversion pixels. Using the Bot Management feature in the Cloudflare Zaraz Settings page allows you to fine tune which requests will make it to Cloudflare Zaraz and which ones will be skipped. Since Zaraz pricing is based on the total number of Zaraz Events, this can also be useful if you want more control over your Cloudflare Zaraz costs, ensuring you will not be paying for events triggered by bots. Like all other Cloudflare Zaraz features, these new features are also available to users on all plans, including the free plan. For us, it is part of making sure that everyone can benefit from a faster, safer, and more private way to manage third parties online. If you haven’t started using Cloudflare Zaraz already, now is a great time. Go to the Cloudflare dashboard and set it up in just a few clicks.

Dan Solove on Privacy Regulation

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/04/dan-solove-on-privacy-regulation.html

Law professor Dan Solove has a new article on privacy regulation. In his email to me, he writes: “I’ve been pondering privacy consent for more than a decade, and I think I finally made a breakthrough with this article.” His mini-abstract:

In this Article I argue that most of the time, privacy consent is fictitious. Instead of futile efforts to try to turn privacy consent from fiction to fact, the better approach is to lean into the fictions. The law can’t stop privacy consent from being a fairy tale, but the law can ensure that the story ends well. I argue that privacy consent should confer less legitimacy and power and that it be backstopped by a set of duties on organizations that process personal data based on consent.

Full abstract:

Consent plays a profound role in nearly all privacy laws. As Professor Heidi Hurd aptly said, consent works “moral magic”—it transforms things that would be illegal and immoral into lawful and legitimate activities. As to privacy, consent authorizes and legitimizes a wide range of data collection and processing.

There are generally two approaches to consent in privacy law. In the United States, the notice-and-choice approach predominates; organizations post a notice of their privacy practices and people are deemed to consent if they continue to do business with the organization or fail to opt out. In the European Union, the General Data Protection Regulation (GDPR) uses the express consent approach, where people must voluntarily and affirmatively consent.

Both approaches fail. The evidence of actual consent is non-existent under the notice-and-choice approach. Individuals are often pressured or manipulated, undermining the validity of their consent. The express consent approach also suffers from these problems ­ people are ill-equipped to decide about their privacy, and even experts cannot fully understand what algorithms will do with personal data. Express consent also is highly impractical; it inundates individuals with consent requests from thousands of organizations. Express consent cannot scale.

In this Article, I contend that most of the time, privacy consent is fictitious. Privacy law should take a new approach to consent that I call “murky consent.” Traditionally, consent has been binary—an on/off switch—but murky consent exists in the shadowy middle ground between full consent and no consent. Murky consent embraces the fact that consent in privacy is largely a set of fictions and is at best highly dubious.

Because it conceptualizes consent as mostly fictional, murky consent recognizes its lack of legitimacy. To return to Hurd’s analogy, murky consent is consent without magic. Rather than provide extensive legitimacy and power, murky consent should authorize only a very restricted and weak license to use data. Murky consent should be subject to extensive regulatory oversight with an ever-present risk that it could be deemed invalid. Murky consent should rest on shaky ground. Because the law pretends people are consenting, the law’s goal should be to ensure that what people are consenting to is good. Doing so promotes the integrity of the fictions of consent. I propose four duties to achieve this end: (1) duty to obtain consent appropriately; (2) duty to avoid thwarting reasonable expectations; (3) duty of loyalty; and (4) duty to avoid unreasonable risk. The law can’t make the tale of privacy consent less fictional, but with these duties, the law can ensure the story ends well.