Tag Archives: Privacy

Like Social Media, AI Requires Difficult Choices

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/12/like-social-media-ai-requires-difficult-choices.html

In his 2020 book, “Future Politics,” British barrister Jamie Susskind wrote that the dominant question of the 20th century was “How much of our collective life should be determined by the state, and what should be left to the market and civil society?” But in the early decades of this century, Susskind suggested that we face a different question: “To what extent should our lives be directed and controlled by powerful digital systems—and on what terms?”

Artificial intelligence (AI) forces us to confront this question. It is a technology that in theory amplifies the power of its users: A manager, marketer, political campaigner, or opinionated internet user can utter a single instruction, and see their message—whatever it is—instantly written, personalized, and propagated via email, text, social, or other channels to thousands of people within their organization, or millions around the world. It also allows us to individualize solicitations for political donations, elaborate a grievance into a well-articulated policy position, or tailor a persuasive argument to an identity group, or even a single person.

But even as it offers endless potential, AI is a technology that—like the state—gives others new powers to control our lives and experiences.

We’ve seen this play out before. Social media companies made the same sorts of promises 20 years ago: instant communication enabling individual connection at massive scale. Fast-forward to today, and the technology that was supposed to give individuals power and influence ended up controlling us. Today social media dominates our time and attention, assaults our mental health, and—together with its Big Tech parent companies—captures an unfathomable fraction of our economy, even as it poses risks to our democracy.

The novelty and potential of social media were as present then as they are for AI now, which should make us wary of AI’s potentially harmful consequences for society and democracy. We legitimately fear artificial voices and manufactured reality drowning out real people on the internet: on social media, in chat rooms, everywhere we might try to connect with others.

It doesn’t have to be that way. Alongside these evident risks, AI has legitimate potential to transform both everyday life and democratic governance in positive ways. In our new book, “Rewiring Democracy,” we chronicle examples from around the globe of democracies using AI to make regulatory enforcement more efficient, catch tax cheats, speed up judicial processes, synthesize input from constituents to legislatures, and much more. Because democracies distribute power across institutions and individuals, making the right choices about how to shape AI and its uses requires both clarity and alignment across society.

To that end, we spotlight four pivotal choices facing private and public actors. These choices are similar to those we faced during the advent of social media, and in retrospect we can see that we made the wrong decisions back then. Our collective choices in 2025—choices made by tech CEOs, politicians, and citizens alike—may dictate whether AI is applied to positive and pro-democratic, or harmful and civically destructive, ends.

A Choice for the Executive and the Judiciary: Playing by the Rules

The Federal Election Commission (FEC) calls it fraud when a candidate hires an actor to impersonate their opponent. More recently, they had to decide whether doing the same thing with an AI deepfake makes it okay. (They concluded it does not.) Although in this case the FEC made the right decision, this is just one example of how AIs could skirt laws that govern people.

Likewise, courts are having to decide if and when it is okay for an AI to reuse creative materials without compensation or attribution, which might constitute plagiarism or copyright infringement if carried out by a human. (The court outcomes so far are mixed.) Courts are also adjudicating whether corporations are responsible for upholding promises made by AI customer service representatives. (In the case of Air Canada, the answer was yes, and insurers have started covering the liability.)

Social media companies faced many of the same hazards decades ago and have largely been shielded by the combination of Section 230 of the Communications Decency Act of 1996 and the safe harbor offered by the Digital Millennium Copyright Act of 1998. Even in the absence of congressional action to strengthen or add rigor to this law, the Federal Communications Commission (FCC) and the Supreme Court could take action to enhance its effects and to clarify which humans are responsible when technology is used, in effect, to bypass existing law.

A Choice for Congress: Privacy

As AI-enabled products increasingly ask Americans to share yet more of their personal information—their “context”—to use digital services like personal assistants, safeguarding the interests of the American consumer should be a bipartisan cause in Congress.

It has been nearly 10 years since Europe adopted comprehensive data privacy regulation. Today, American companies exert massive efforts to limit data collection, acquire consent for use of data, and hold it confidential under significant financial penalties—but only for their customers and users in the EU.

Regardless, a decade later the U.S. has still failed to make progress on any serious attempts at comprehensive federal privacy legislation written for the 21st century, and there are precious few data privacy protections that apply to narrow slices of the economy and population. This inaction comes in spite of scandal after scandal regarding Big Tech corporations’ irresponsible and harmful use of our personal data: Oracle’s data profiling, Facebook and Cambridge Analytica, Google ignoring data privacy opt-out requests, and many more.

Privacy is just one side of the obligations AI companies should have with respect to our data; the other side is portability—that is, the ability for individuals to choose to migrate and share their data between consumer tools and technology systems. To the extent that knowing our personal context really does enable better and more personalized AI services, it’s critical that consumers have the ability to extract and migrate their personal context between AI solutions. Consumers should own their own data, and with that ownership should come explicit control over who and what platforms it is shared with, as well as withheld from. Regulators could mandate this interoperability. Otherwise, users are locked in and lack freedom of choice between competing AI solutions—much like the time invested to build a following on a social network has locked many users to those platforms.

A Choice for States: Taxing AI Companies

It has become increasingly clear that social media is not a town square in the utopian sense of an open and protected public forum where political ideas are distributed and debated in good faith. If anything, social media has coarsened and degraded our public discourse. Meanwhile, the sole act of Congress designed to substantially rein in the social and political effects of social media platforms—the TikTok ban, which aimed to protect the American public from Chinese influence and data collection, citing it as a national security threat—is one it seems to no longer even acknowledge.

While Congress has waffled, regulation in the U.S. is happening at the state level. Several states have limited children’s and teens’ access to social media. With Congress having rejected—for now—a threatened federal moratorium on state-level regulation of AI, California passed a new slate of AI regulations after mollifying a lobbying onslaught from industry opponents. Perhaps most interesting, Maryland has recently become the first in the nation to levy taxes on digital advertising platform companies.

States now face a choice of whether to apply a similar reparative tax to AI companies to recapture a fraction of the costs they externalize on the public to fund affected public services. State legislators concerned with the potential loss of jobs, cheating in schools, and harm to those with mental health concerns caused by AI have options to combat it. They could extract the funding needed to mitigate these harms to support public services—strengthening job training programs and public employment, public schools, public health services, even public media and technology.

A Choice for All of Us: What Products Do We Use, and How?

A pivotal moment in the social media timeline occurred in 2006, when Facebook opened its service to the public after years of catering to students of select universities. Millions quickly signed up for a free service where the only source of monetization was the extraction of their attention and personal data.

Today, about half of Americans are daily users of AI, mostly via free products from Facebook’s parent company Meta and a handful of other familiar Big Tech giants and venture-backed tech firms such as Google, Microsoft, OpenAI, and Anthropic—with every incentive to follow the same path as the social platforms.

But now, as then, there are alternatives. Some nonprofit initiatives are building open-source AI tools that have transparent foundations and can be run locally and under users’ control, like AllenAI and EleutherAI. Some governments, like Singapore, Indonesia, and Switzerland, are building public alternatives to corporate AI that don’t suffer from the perverse incentives introduced by the profit motive of private entities.

Just as social media users have faced platform choices with a range of value propositions and ideological valences—as diverse as X, Bluesky, and Mastodon—the same will increasingly be true of AI. Those of us who use AI products in our everyday lives as people, workers, and citizens may not have the same power as judges, lawmakers, and state officials. But we can play a small role in influencing the broader AI ecosystem by demonstrating interest in and usage of these alternatives to Big AI. If you’re a regular user of commercial AI apps, consider trying the free-to-use service for Switzerland’s public Apertus model.

None of these choices are really new. They were all present almost 20 years ago, as social media moved from niche to mainstream. They were all policy debates we did not have, choosing instead to view these technologies through rose-colored glasses. Today, though, we can choose a different path and realize a different future. It is critical that we intentionally navigate a path to a positive future for societal use of AI—before the consolidation of power renders it too late to do so.

This post was written with Nathan E. Sanders, and originally appeared in Lawfare.

Banning VPNs

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/12/banning-vpns.html

This is crazy. Lawmakers in several US states are contemplating banning VPNs, because…think of the children!

As of this writing, Wisconsin lawmakers are escalating their war on privacy by targeting VPNs in the name of “protecting children” in A.B. 105/S.B. 130. It’s an age verification bill that requires all websites distributing material that could conceivably be deemed “sexual content” to both implement an age verification system and also to block the access of users connected via VPN. The bill seeks to broadly expand the definition of materials that are “harmful to minors” beyond the type of speech that states can prohibit minors from accessing, potentially encompassing things like depictions and discussions of human anatomy, sexuality, and reproduction.

The EFF link explains why this is a terrible idea.

Async QUIC and HTTP/3 made easy: tokio-quiche is now open-source

Post Syndicated from Mendes original https://blog.cloudflare.com/async-quic-and-http-3-made-easy-tokio-quiche-is-now-open-source/

A little over 6 years ago, we presented quiche, our open source QUIC implementation written in Rust. Today we’re announcing the open sourcing of tokio-quiche, our battle-tested, asynchronous QUIC library combining both quiche and the Rust Tokio async runtime. Powering Cloudflare’s Proxy B in Apple iCloud Private Relay and our next-generation Oxy-based proxies, tokio-quiche handles millions of HTTP/3 requests per second with low latency and high throughput. tokio-quiche also powers Cloudflare WARP’s MASQUE client, replacing our WireGuard tunnels with QUIC-based tunnels, and the async version of h3i.

quiche was developed as a sans-io library, meaning that it implements the state machine required to handle the QUIC transport protocol while not making any assumptions about how its user intends to perform IO. This means that, with enough elbow grease, anyone can write an IO integration with quiche! This entails connecting or listening on a UDP socket, managing sending and receiving UDP datagrams on that socket while feeding all network information to quiche. Given we need this integration to be async, we’d have to do all this while integrating with an async Rust runtime. tokio-quiche does all of that for you, no grease required.
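To make the sans-io idea concrete, here is a minimal sketch using only the Rust standard library. The `UppercaseEcho` type and the `drive` function are hypothetical names invented for illustration; they are not part of quiche or tokio-quiche, and the toy "protocol" simply uppercases whatever it receives. The point is the split: the core only consumes and produces byte buffers, while the caller owns every piece of IO.

```rust
use std::collections::VecDeque;

/// A toy sans-io protocol core: it never touches a socket.
/// (Hypothetical type for illustration; not part of quiche.)
pub struct UppercaseEcho {
    pending: VecDeque<Vec<u8>>,
}

impl UppercaseEcho {
    pub fn new() -> Self {
        Self { pending: VecDeque::new() }
    }

    /// Feed one received datagram into the state machine.
    pub fn recv(&mut self, datagram: &[u8]) {
        self.pending.push_back(datagram.to_ascii_uppercase());
    }

    /// Write the next outgoing datagram into `out` and return its length,
    /// or `None` when there is nothing left to send.
    pub fn send(&mut self, out: &mut [u8]) -> Option<usize> {
        let reply = self.pending.pop_front()?;
        let len = reply.len().min(out.len());
        out[..len].copy_from_slice(&reply[..len]);
        Some(len)
    }
}

/// The IO integration: the caller shuttles bytes in and out of the core.
/// Here the "network" is an in-memory slice of datagrams; a real
/// integration would read from and write to a UDP socket instead.
pub fn drive(core: &mut UppercaseEcho, inbound: &[&[u8]]) -> Vec<Vec<u8>> {
    let mut outbound = Vec::new();
    let mut buf = [0u8; 1500];
    for datagram in inbound {
        core.recv(datagram);
        // Drain everything the core wants to send before reading again.
        while let Some(len) = core.send(&mut buf) {
            outbound.push(buf[..len].to_vec());
        }
    }
    outbound
}
```

Calling `drive` with the datagrams `ping` and `quic` yields `PING` and `QUIC`. Swapping the in-memory slice for a socket changes only `drive`, never the core, which is the property that makes sans-io libraries portable and also what makes each integration extra work.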

Lowering the barrier to entry

Originally, tokio-quiche was only used as the core of Oxy’s HTTP/3 server. But the spark to create tokio-quiche as a standalone library was our need for a MASQUE-capable HTTP/3 client. Our Zero Trust and Privacy Teams need MASQUE clients to tunnel data through WARP and our Privacy Proxies respectively, and we wanted to use the same technology to build both the client and server.

We initially open-sourced quiche to share our memory-safe QUIC and HTTP/3 implementation with as many stakeholders as possible. Our focus at the time was a low-level, sans-io design that could integrate into many types of software and be deployed widely. We achieved this goal, with quiche deployed in many different clients and servers. However, integrating sans-io libraries into applications is an error-prone and time-consuming process. Our aim with tokio-quiche is to lower the barrier to entry by providing much of the needed code ourselves.

Cloudflare alone embracing HTTP/3 is not of much use if others wanting to interact with our products and systems don’t also adopt it. Open sourcing tokio-quiche makes integration with our systems more straightforward, and helps propel the industry into the new standard of HTTP. By contributing tokio-quiche back to the Rust ecosystem, we hope to promote the development and usage of HTTP/3, QUIC and new privacy preserving technologies.

tokio-quiche has been used internally for some years now. This gave us time to refine and battle-test it, demonstrating that it can handle millions of RPS. tokio-quiche is not intended to be a standalone HTTP/3 client or server, but implements low-level protocols and allows for higher-level projects in the future. The README contains examples of server and client event loops.

It’s actors all the way down

Tokio is a wildly popular asynchronous Rust runtime. It efficiently manages, schedules and executes the billions of asynchronous tasks which run on our edge. We use Tokio extensively at Cloudflare, so we decided to tightly integrate quiche with it – thus the name, tokio-quiche. Under the hood, tokio-quiche uses actors to drive different parts of the QUIC and HTTP/3 state machine. Actors are small tasks with internal state that usually use message passing over channels to communicate with the outside world.

The actor model is a great abstraction to use for async-ifying sans-io libraries due to the conceptual similarities between the two. Both actors and sans-io libraries have some kind of internal state which they want exclusive access to. They both usually interact with the outside world by sending and receiving “messages”. quiche’s “messages” are really raw byte buffers which represent incoming and outgoing network data. One of tokio-quiche’s “messages” is the Incoming struct which describes incoming UDP packets. Due to these similarities, async-ifying a sans-io library means: awaiting new messages or IO, translating the messages or IO into something the sans-io library understands, advancing the internal state machine, translating the state machine’s output to a message or IO, and finally sending the message or IO. (For more discussion on actors with Tokio, make sure to take a look at Alice Ryhl’s excellent blog post on the topic.)
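The five steps listed above can be sketched as a small actor. To keep the example dependency-free, it uses a standard-library thread and `std::sync::mpsc` channels as a stand-in for a Tokio task and Tokio channels; the shape of the loop is the same, but this is not how tokio-quiche itself is written. `Msg`, `ByteCounter`, `run_actor`, and `demo` are all hypothetical names.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Messages the actor accepts (hypothetical, for illustration only).
pub enum Msg {
    /// Raw bytes that arrived from the "network".
    Incoming(Vec<u8>),
    /// Ask the actor to stop.
    Shutdown,
}

/// The sans-io core: plain state plus a method that advances it.
struct ByteCounter {
    total: usize,
}

impl ByteCounter {
    fn on_bytes(&mut self, bytes: &[u8]) -> usize {
        self.total += bytes.len();
        self.total
    }
}

/// The actor owns the core exclusively and talks to the world via channels.
fn run_actor(inbox: Receiver<Msg>, outbox: Sender<String>) {
    let mut core = ByteCounter { total: 0 };
    // Step 1: await the next message.
    while let Ok(msg) = inbox.recv() {
        match msg {
            Msg::Incoming(bytes) => {
                // Steps 2 and 3: hand the bytes to the core and advance its state.
                let total = core.on_bytes(&bytes);
                // Steps 4 and 5: translate the output into a message and send it.
                if outbox.send(format!("total={}", total)).is_err() {
                    break; // The receiver is gone; nothing left to do.
                }
            }
            Msg::Shutdown => break,
        }
    }
}

/// Spawns the actor, feeds it two chunks, and collects its replies.
pub fn demo() -> Vec<String> {
    let (to_actor, inbox) = channel();
    let (outbox, from_actor) = channel();
    let handle = thread::spawn(move || run_actor(inbox, outbox));

    to_actor.send(Msg::Incoming(vec![1, 2, 3])).unwrap();
    to_actor.send(Msg::Incoming(vec![4, 5])).unwrap();
    to_actor.send(Msg::Shutdown).unwrap();
    handle.join().unwrap();

    // The actor has exited and dropped `outbox`, so this iterator terminates.
    from_actor.iter().collect()
}
```

Because only the actor ever touches `ByteCounter`, no mutex is needed around the state. In an async runtime the blocking `recv` would become an awaited receive, and the thread would become a spawned task.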

The primary actor in tokio-quiche is the IO loop actor, which moves packets between quiche and the socket. Since QUIC is a transport protocol, it can carry any application protocol you want. HTTP/3 is quite common, but DNS over QUIC and the upcoming Media over QUIC are other examples. There’s even an RFC to help you create your own QUIC application! tokio-quiche exposes the ApplicationOverQuic trait to abstract over application protocols. The trait abstracts over quiche’s methods and the underlying I/O, allowing you to focus on your application logic. For example, our HTTP/3 debug and test client, h3i, is powered by a client-focused, non-HTTP/3 ApplicationOverQuic implementation.
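As a rough illustration of why a trait is a natural fit here, the sketch below defines a hypothetical `ToyApplication` trait and a generic loop iteration that knows nothing about the protocol running on top. The trait, its method names, and `ToyConnection` are assumptions made for this example; they do not reproduce the real signature of `ApplicationOverQuic`, for which the crate documentation is the authority.

```rust
/// A stand-in for a QUIC connection: just queues of application data.
/// (Hypothetical; the real quiche connection type is far richer.)
#[derive(Default)]
pub struct ToyConnection {
    pub received: Vec<Vec<u8>>,
    pub to_send: Vec<Vec<u8>>,
}

/// Hypothetical trait in the spirit of an "application over QUIC" hook.
/// This is NOT the signature of tokio-quiche's `ApplicationOverQuic`.
pub trait ToyApplication {
    /// Called after new data has been read into the connection.
    fn process_reads(&mut self, conn: &mut ToyConnection);
    /// Called before the IO loop flushes outgoing data.
    fn process_writes(&mut self, conn: &mut ToyConnection);
}

/// One possible application: echo every received chunk back.
#[derive(Default)]
pub struct Echo {
    backlog: Vec<Vec<u8>>,
}

impl ToyApplication for Echo {
    fn process_reads(&mut self, conn: &mut ToyConnection) {
        // Take ownership of everything received so far.
        self.backlog.append(&mut conn.received);
    }

    fn process_writes(&mut self, conn: &mut ToyConnection) {
        // Queue the backlog for sending, leaving it empty.
        conn.to_send.append(&mut self.backlog);
    }
}

/// One iteration of a generic IO loop. It is written once and works for
/// any `ToyApplication`, whatever protocol that application speaks.
pub fn io_loop_once<A: ToyApplication>(
    app: &mut A,
    conn: &mut ToyConnection,
    incoming: Vec<u8>,
) -> Vec<Vec<u8>> {
    conn.received.push(incoming);
    app.process_reads(conn);
    app.process_writes(conn);
    // Hand back whatever the application queued for the "socket".
    std::mem::take(&mut conn.to_send)
}
```

Replacing `Echo` with a different implementation changes the application protocol without touching `io_loop_once`, which is the separation the paragraph above describes.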


Server Architecture Diagram

tokio-quiche ships with an HTTP/3-focused ApplicationOverQuic called H3Driver. H3Driver hooks up quiche’s HTTP/3 module to this IO loop to provide the building blocks for an async HTTP/3 client or server. The driver turns quiche’s raw HTTP/3 events into higher-level events and asynchronous body data streams, allowing you to respond to them in kind. H3Driver is itself generic, exposing ServerH3Driver and ClientH3Driver variants that each stack additional behavior on top of the core driver’s events.


Internal Data Flow

Inside tokio-quiche, we spawn two important tasks that facilitate data movement from a socket to quiche. The first is the InboundPacketRouter, which owns the receiving half of the socket and routes inbound datagrams by their destination connection ID (DCID) to a per-connection channel. The second task, the IoWorker actor, is the aforementioned IO loop and drives a single quiche Connection. It intersperses quiche calls with ApplicationOverQuic methods, ensuring you can inspect the connection before and after any IO interaction.
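The routing step can be sketched with a map from connection ID to channel sender. This is a simplified, standard-library-only illustration: `Router` and `Datagram` are hypothetical names, the DCID is assumed to be already parsed out of the packet header, and the real InboundPacketRouter also has to deal with concerns this sketch ignores, such as new connections and the socket itself.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// A received datagram, reduced to the two fields this sketch needs.
/// (Hypothetical; real QUIC packets need header parsing to find the DCID.)
pub struct Datagram {
    pub dcid: Vec<u8>,
    pub payload: Vec<u8>,
}

/// Routes datagrams to per-connection channels keyed by DCID.
#[derive(Default)]
pub struct Router {
    routes: HashMap<Vec<u8>, Sender<Vec<u8>>>,
}

impl Router {
    /// Register a connection and return the receiving end of its channel.
    pub fn register(&mut self, dcid: &[u8]) -> Receiver<Vec<u8>> {
        let (tx, rx) = channel();
        self.routes.insert(dcid.to_vec(), tx);
        rx
    }

    /// Forward one datagram. Returns false if no live connection took it.
    pub fn route(&mut self, dgram: Datagram) -> bool {
        let delivered = match self.routes.get(&dgram.dcid) {
            Some(tx) => tx.send(dgram.payload).is_ok(),
            None => return false,
        };
        if !delivered {
            // The connection's worker has gone away; forget the stale route.
            self.routes.remove(&dgram.dcid);
        }
        delivered
    }
}
```

Each per-connection worker would own one `Receiver` and feed the payloads it yields into its own connection state, so the router never needs to touch connection state at all.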

More blog posts on the creation of tokio-quiche are coming soon. We’ll discuss actor models and mutexes, UDP GRO and GSO, tokio task coop budgeting, and more.

Next up: more on QUIC and beyond!

tokio-quiche is an important foundation for Cloudflare’s investment into the QUIC and HTTP/3 ecosystem for Tokio – but it is still only a building block with its own complexity. In the future, we plan to release the same easy-to-use HTTP client and server abstractions that power our Oxy proxies and WARP clients today. Stay tuned for more blog posts on QUIC and HTTP/3 at Cloudflare, including an open-source client for customers of our Privacy Proxies and a completely new service that’s handling millions of RPS with tokio-quiche!

For now, check out the tokio-quiche crate on crates.io and its source code on GitHub to build your very own QUIC application. Could be a simple echo server, a DNS-over-QUIC client, a custom VPN, or even a fully-fledged HTTP server. Maybe you will beat us to the punch?

First Wap: A Surveillance Computer You’ve Never Heard Of

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/10/first-wap-a-surveillance-computer-youve-never-heard-of.html

Mother Jones has a long article on surveillance arms manufacturers, their wares, and how they avoid export control laws:

Operating from their base in Jakarta, where permissive export laws have allowed their surveillance business to flourish, First Wap’s European founders and executives have quietly built a phone-tracking empire, with a footprint extending from the Vatican to the Middle East to Silicon Valley.

It calls its proprietary system Altamides, which it describes in promotional materials as “a unified platform to covertly locate the whereabouts of single or multiple suspects in real-time, to detect movement patterns, and to detect whether suspects are in close vicinity with each other.”

Altamides leaves no trace on the phones it targets, unlike spyware such as Pegasus. Nor does it require a target to click on a malicious link or show any of the telltale signs (such as overheating or a short battery life) of remote monitoring.

Its secret is shrewd use of the antiquated telecom language Signaling System No. 7, known as SS7, that phone carriers use to route calls and text messages. Any entity with SS7 access can send queries requesting information about which cell tower a phone subscriber is nearest to, an essential first step to sending a text message or making a call to that subscriber. But First Wap’s technology uses SS7 to zero in on phone numbers and trace the location of their users.

Much more in this Lighthouse Reports analysis.

The Trump Administration’s Increased Use of Social Media Surveillance

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/10/the-trump-administrations-increased-use-of-social-media-surveillance.html

This chilling paragraph is in a comprehensive Brookings report about the use of tech to deport people from the US:

The administration has also adapted its methods of social media surveillance. Though agencies like the State Department have gathered millions of handles and monitored political discussions online, the Trump administration has been more explicit in who it’s targeting. Secretary of State Marco Rubio announced a new, zero-tolerance “Catch and Revoke” strategy, which uses AI to monitor the public speech of foreign nationals and revoke visas of those who “abuse [the country’s] hospitality.” In a March press conference, Rubio remarked that at least 300 visas, primarily student and visitor visas, had been revoked on the grounds that visitors are engaging in activity contrary to national interest. A State Department cable also announced a new requirement for student visa applicants to set their social media accounts to public—reflecting stricter vetting practices aimed at identifying individuals who “bear hostile attitudes toward our citizens, culture, government, institutions, or founding principles,” among other criteria.

Flock License Plate Surveillance

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/10/flok-license-plate-surveillance.html

The company Flock is surveilling us as we drive:

A retired veteran named Lee Schmidt wanted to know how often Norfolk, Virginia’s 176 Flock Safety automated license-plate-reader cameras were tracking him. The answer, according to a U.S. District Court lawsuit filed in September, was more than four times a day, or 526 times from mid-February to early July. No, there’s no warrant out for Schmidt’s arrest, nor is there a warrant for Schmidt’s co-plaintiff, Crystal Arrington, whom the system tagged 849 times in roughly the same period.

You might think this sounds like it violates the Fourth Amendment, which protects American citizens from unreasonable searches and seizures without probable cause. Well, so does the American Civil Liberties Union. Norfolk, Virginia Judge Jamilah LeCruise also agrees, and in 2024 she ruled that plate-reader data obtained without a search warrant couldn’t be used against a defendant in a robbery case.

Digital Threat Modeling Under Authoritarianism

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/09/digital-threat-modeling-under-authoritarianism.html

Today’s world requires us to make complex and nuanced decisions about our digital security. Evaluating when to use a secure messaging app like Signal or WhatsApp, which passwords to store on your smartphone, or what to share on social media requires us to assess risks and make judgments accordingly. Arriving at any conclusion is an exercise in threat modeling.

In security, threat modeling is the process of determining what security measures make sense in your particular situation. It’s a way to think about potential risks, possible defenses, and the costs of both. It’s how experts avoid being distracted by irrelevant risks or overburdened by undue costs.

We threat model all the time. We might decide to walk down one street instead of another, or use an internet VPN when browsing dubious sites. Perhaps we understand the risks in detail, but more likely we are relying on intuition or some trusted authority. But in the U.S. and elsewhere, the average person’s threat model is changing—specifically involving how we protect our personal information. Previously, most concern centered on corporate surveillance; companies like Google and Facebook engaging in digital surveillance to maximize their profit. Increasingly, however, many people are worried about government surveillance and how the government could weaponize personal data.

Since the beginning of this year, the Trump administration’s actions in this area have raised alarm bells: The Department of Government Efficiency (DOGE) took data from federal agencies, Palantir combined disparate streams of government data into a single system, and Immigration and Customs Enforcement (ICE) used social media posts as a reason to deny someone entry into the U.S.

These threats, and others posed by a techno-authoritarian regime, are vastly different from those presented by a corporate monopolistic regime—and different yet again in a society where both are working together. Contending with these new threats requires a different approach to personal digital devices, cloud services, social media, and data in general.

What Data Does the Government Already Have?

For years, most public attention has centered on the risks of tech companies gathering behavioral data. This is an enormous amount of data, generally used to predict and influence consumers’ future behavior—rather than as a means of uncovering our past. Although commercial data is highly intimate—such as knowledge of your precise location over the course of a year, or the contents of every Facebook post you have ever created—it’s not the same thing as tax returns, police records, unemployment insurance applications, or medical history.

The U.S. government holds extensive data about everyone living inside its borders, some of it very sensitive—and there’s not much that can be done about it. This information consists largely of facts that people are legally obligated to tell the government. The IRS has a lot of very sensitive data about personal finances. The Treasury Department has data about any money received from the government. The Office of Personnel Management has an enormous amount of detailed information about government employees—including the very personal form required to get a security clearance. The Census Bureau possesses vast data about everyone living in the U.S., including, for example, a database of real estate ownership in the country. The Department of Defense and the Department of Veterans Affairs have data about present and former members of the military, the Department of Homeland Security has travel information, and various agencies possess health records. And so on.

It is safe to assume that the government has—or will soon have—access to all of this government data. This sounds like a tautology, but in the past, the U.S. government largely followed the many laws limiting how those databases were used, especially regarding how they were shared, combined, and correlated. Under the second Trump administration, this no longer seems to be the case.

Augmenting Government Data with Corporate Data

The mechanisms of corporate surveillance haven’t gone away. Computer technology is constantly spying on its users—and that data is being used to influence us. Companies like Google and Meta are vast surveillance machines, and they use that data to fuel advertising. A smartphone is a portable surveillance device, constantly recording things like location and communication. Cars, and many other Internet of Things devices, do the same. Credit card companies, health insurers, internet retailers, and social media sites all have detailed data about you—and there is a vast industry that buys and sells this intimate data.

This isn’t news. What’s different in a techno-authoritarian regime is that this data is also shared with the government, either as a paid service or as demanded by local law. Amazon shares Ring doorbell data with the police. Flock, a company that collects license plate data from cars around the country, shares data with the police as well. And just as Chinese corporations share user data with the government and companies like Verizon shared calling records with the National Security Agency (NSA) after the Sept. 11 terrorist attacks, an authoritarian government will use this data as well.

Personal Targeting Using Data

The government has vast capabilities for targeted surveillance, both technically and legally. If a high-level figure is targeted by name, it is almost certain that the government can access their data. The government will use its investigatory powers to the fullest: It will go through government data, remotely hack phones and computers, spy on communications, and raid a home. It will compel third parties, like banks, cell providers, email providers, cloud storage services, and social media companies, to turn over data. To the extent those companies keep backups, the government will even be able to obtain deleted data.

This data can be used for prosecution—possibly selectively. This has been made evident in recent weeks, as the Trump administration personally targeted perceived enemies for “mortgage fraud.” This was a clear example of weaponization of data. Given all the data the government requires people to divulge, there will be something there to prosecute.

Although alarming, this sort of targeted attack doesn’t scale. As vast as the government’s information is and as powerful as its capabilities are, they are not infinite. They can be deployed against only a limited number of people. And most people will never be that high on the priorities list.

The Risks of Mass Surveillance

Mass surveillance is surveillance without specific targets. For most people, this is where the primary risks lie. Even if we’re not targeted by name, personal data could raise red flags, drawing unwanted scrutiny.

The risks here are twofold. First, mass surveillance could be used to single out people to harass or arrest: when they cross the border, show up at immigration hearings, attend a protest, are stopped by the police for speeding, or just as they’re living their normal lives. Second, mass surveillance could be used to threaten or blackmail. In the first case, the government is using that database to find a plausible excuse for its actions. In the second, it is looking for an actual infraction that it could selectively prosecute—or not.

Mitigating these risks is difficult, because it would require not interacting with either the government or corporations in everyday life—and living in the woods without any electronics isn’t realistic for most of us. Additionally, this strategy protects only future information; it does nothing to protect the information generated in the past. That said, going back and scrubbing social media accounts and cloud storage does have some value. Whether it’s right for you depends on your personal situation.

Opportunistic Use of Data

Beyond data given to third parties—either corporations or the government—there is also data users keep in their possession. This data may be stored on personal devices such as computers and phones or, more likely today, in some cloud service and accessible from those devices. Here, the risks are different: Some authority could confiscate your device and look through it.

This is not just speculative. There are many stories of ICE agents examining people’s phones and computers when they attempt to enter the U.S.: their emails, contact lists, documents, photos, browser history, and social media posts.

There are several different defenses you can deploy, presented from least to most extreme. First, you can scrub devices of potentially incriminating information, either as a matter of course or before entering a higher-risk situation. Second, you could consider deleting—even temporarily—social media and other apps so that someone with access to a device doesn’t get access to those accounts—this includes your contacts list. If a phone is swept up in a government raid, your contacts become their next targets.

Third, you could choose not to carry your device with you at all, opting instead for a burner phone without contacts, email access, and accounts, or go electronics-free entirely. This may sound extreme—and getting it right is hard—but I know many people today who have stripped-down computers and sanitized phones for international travel. At the same time, there are also stories of people being denied entry to the U.S. because they are carrying what is obviously a burner phone—or no phone at all.

Encryption Isn’t a Magic Bullet—But Use It Anyway

Encryption protects your data while it’s not being used, and your devices when they’re turned off. This doesn’t help if a border agent forces you to turn on your phone and computer. And it doesn’t protect metadata, which needs to be unencrypted for the system to function. This metadata can be extremely valuable. For example, Signal, WhatsApp, and iMessage all encrypt the contents of your text messages—the data—but information about who you are texting and when must remain unencrypted.

Also, if the NSA wants access to someone’s phone, it can get it. Encryption is no help against that sort of sophisticated targeted attack. But, again, most of us aren’t that important and even the NSA can target only so many people. What encryption safeguards against is mass surveillance.

I recommend Signal for text messages above all other apps. But if you are in a country where having Signal on a device is in itself incriminating, then use WhatsApp. Signal is better, but everyone has WhatsApp installed on their phones, so it doesn’t raise the same suspicion. Also, it’s a no-brainer to turn on your computer’s built-in encryption: BitLocker for Windows and FileVault for Macs.

On the subject of data and metadata, it’s worth noting that data poisoning doesn’t help nearly as much as you might think. That is, it doesn’t do much good to add hundreds of random strangers to an address book or bogus internet searches to a browser history to hide the real ones. Modern analysis tools can see through all of that.

Shifting Risks of Decentralization

This notion of individual targeting, and the inability of the government to do that at scale, starts to fail as the authoritarian system becomes more decentralized. After all, if repression comes from the top, it affects only senior government officials and people whom those in power personally dislike. If it comes from the bottom, it affects everybody. But decentralization looks much like the events playing out with ICE harassing, detaining, and disappearing people—everyone has to fear it.

This can go much further. Imagine there is a government official assigned to your neighborhood, or your block, or your apartment building. It’s worth that person’s time to scrutinize everybody’s social media posts, email, and chat logs. For anyone in that situation, limiting what you do online is the only defense.

Being Innocent Won’t Protect You

This is vital to understand. Surveillance systems and sorting algorithms make mistakes. This is apparent in the fact that we are routinely served advertisements for products that don’t interest us at all. Those mistakes are relatively harmless—who cares about a poorly targeted ad?—but a similar mistake at an immigration hearing can get someone deported.

An authoritarian government doesn’t care. Mistakes are a feature and not a bug of authoritarian surveillance. If ICE targets only people it can go after legally, then everyone knows whether or not they need to fear ICE. If ICE occasionally makes mistakes by arresting Americans and deporting innocents, then everyone has to fear it. This is by design.

Effective Opposition Requires Being Online

For most people, phones are an essential part of daily life. If you leave yours at home when you attend a protest, you won’t be able to film police violence. Or coordinate with your friends and figure out where to meet. Or use a navigation app to get to the protest in the first place.

Threat modeling is all about trade-offs. Understanding yours depends not only on the technology and its capabilities but also on your personal goals. Are you trying to keep your head down and survive—or get out? Are you wanting to protest legally? Are you doing more, maybe throwing sand into the gears of an authoritarian government, or even engaging in active resistance? The more you are doing, the more technology you need—and the more technology will be used against you. There are no simple answers, only choices.

Details About Chinese Surveillance and Propaganda Companies

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/09/details-about-chinese-surveillance-and-propaganda-companies.html

Details from leaked documents:

While people often look at China’s Great Firewall as a single, all-powerful government system unique to China, the actual process of developing and maintaining it works the same way as surveillance technology in the West. Geedge collaborates with academic institutions on research and development, adapts its business strategy to fit different clients’ needs, and even repurposes leftover infrastructure from its competitors.

[…]

The parallels with the West are hard to miss. A number of American surveillance and propaganda firms also started as academic projects before they were spun out into startups and grew by chasing government contracts. The difference is that in China, these companies operate with far less transparency. Their work comes to light only when a trove of documents slips onto the internet.

[…]

It is tempting to think of the Great Firewall or Chinese propaganda as the outcome of a top-down master plan that only the Chinese Communist Party could pull off. But these leaks suggest a more complicated reality. Censorship and propaganda efforts must be marketed, financed, and maintained. They are shaped by the logic of corporate quarterly financial targets and competitive bids as much as by ideology—except the customers are governments, and the products can control or shape entire societies.

More information about one of the two leaks.

The RUM Diaries: enabling Web Analytics by default

Post Syndicated from Alex Krivit original https://blog.cloudflare.com/the-rum-diaries-enabling-web-analytics-by-default/

Measuring and improving performance on the Internet can be a daunting task because it spans multiple layers: from the user’s device and browser, to DNS lookups and the network routes, to edge configurations and origin server location. Each layer introduces its own variability such as last-mile bandwidth constraints, third-party scripts, or limited CPU resources, that are often invisible unless you have robust observability tooling in place. Even if you gather data from most of these Internet hops, performance engineers still need to correlate different metrics like front-end events, network processing times, and server-side logs in order to pinpoint where and why elusive “latency” occurs to understand how to fix it.

We want to solve this problem by providing a powerful, in-depth monitoring solution that helps you debug and optimize applications, so you can understand and trace performance issues across the Internet, end to end.

That’s why we’re excited to announce the start of a major upgrade to Cloudflare’s performance analytics suite: Web Analytics as part of our real user monitoring (RUM) tools will soon be combined with network-level insights to help you pinpoint performance issues anywhere on a packet’s journey — from a visitor’s browser, through Cloudflare’s network, to your origin.

Some popular web performance monitoring tools have also sacrificed user privacy in order to achieve depth of visibility. We’re also going to remove that tradeoff. By correlating client-side metrics (like Core Web Vitals) with detailed network and origin data, developers can see where slowdowns occur — and why —  all while preserving end user privacy (by dropping client-specific information and aggregating data by visits as explained in greater detail below).

Over the next several months we’ll share:

  • How Web Analytics works

  • Real-world debugging examples from across the Internet

  • Tips to get the most value from Cloudflare’s analytics tools

The journey starts on October 15, 2025, when Cloudflare will enable Web Analytics for all free domains by default — helping you see how your site actually performs for visitors around the world in real time, without ever collecting any personal data (not applicable to traffic originating from the EU or UK, see below). By the middle of 2026, we’ll deliver something nobody has ever had before: a comprehensive, privacy-first platform for performance monitoring and debugging. Unlike many other tools, this platform won’t just show you where latency lives, it will help you fix it, all in one place. From untangling the trickiest bottlenecks, to getting a crystal-clear view of global performance, this new tool will change how you see your web application and experiment with new performance features. And we’re not building it behind closed doors, we want to bring you along as we launch it in public. Follow along in this series, The RUM Diaries, as we share the journey.

Why this matters

Performance monitoring is only as good as the detail you can see — and the trust your users have that while you’re watching traffic performance, you aren’t watching them. As we explain below, by combining real user metrics with deep, in-network instrumentation, we’ll give developers the visibility to debug any layer of the stack while maintaining Cloudflare’s zero-compromise stance on privacy.

What problem are we solving? 

Many performance monitoring solutions provide only a narrow slice of the performance layer cake, focusing on either the client or the origin while lumping everything in between under a vague “processing time” due to lack of visibility. But as web applications get more complex and user expectations continue to rise, traditional analytics alone don’t cut it. Knowing what happened is just the tip of the iceberg; modern teams need to understand why a bottleneck occurred and how network conditions, code changes, or even a single external script can degrade load times. Moreover, often the tools available can only observe performance rather than helping to optimize it, which leaves teams unable to understand what to try to move the needle on latency.

We want to pull back the curtain so you can understand performance implications of the services you use on our platform and how you can make sure you’re getting the best performance possible. 

Consider Shannon in Detroit, Michigan. She operates an e-commerce site selling hard-to-find watches to horology enthusiasts around the globe. Shannon knows that her customers are impatient (she pictures them frequently checking their wrists). If her site loads slowly, she loses sales, her SEO drops, and her customers go to a different store where they have a better online shopping experience. 

As a result, Shannon continually monitors her site performance, but she frequently runs into problems trying to understand how her site is experienced by customers in different parts of the world. After updating her site, she frequently spot checks its performance using her browser on her office wifi in Detroit, but she continually hears complaints about slow load from her customers in Germany. So Shannon shops around for a solution that monitors performance around the globe. 

This off-the-shelf performance monitoring solution offers her the ability to run similar tests from virtual machines situated around the world across various desktops, mobile devices, and even ISPs, close to her customers. Shannon receives data from these tests, ranging from how fast these synthetic clients’ DNS resolved, how quickly they connected to a particular server, and even when a response was on its way back to a client. Thankfully for Shannon, the off-the-shelf performance monitoring solution identified “server processing time” as the latency culprit in Germany. However, she can’t help but wonder, is it my server that is slow or the transit connection of my users in Germany? Can I make my site faster by adding another server in Germany, or updating my CDN configuration? It’s a three option head-scratcher: is it a networking problem, a server problem, or something else?

Cloudflare can help Shannon (and others!) because we sit in a unique place to provide richer performance analytics. As a reverse proxy positioned between the client and the origin, we are often the first web server a user connects to when requesting content. In addition to moving what’s important closer to your customers, our product suite can generate responses at our edge (e.g. Workers), steer traffic through our dedicated backbone (e.g. cloudflared and more), and route around Internet traffic jams (e.g. Argo). By tailoring a solution that brings together: 

  • client performance data, 

  • real-time network metrics,

  • customer configuration settings, and

  • origin performance measurements

we can provide more insightful information about what’s happening in the vague “processing time.” This will allow developers like Shannon to understand what they should tweak to make their site more performant, build their business, and make their customers happier. 

What is Web Analytics? 

Turning back to what’s happening on October 15, 2025: We’re enabling Web Analytics so teams can track down performance bottlenecks. Web Analytics works by adding a lightweight JavaScript snippet to your website, which helps monitor performance metrics from visitors to your site. In the Web Analytics dashboard you can see aggregate performance data related to: how a browser has painted the page (via LCP, INP, and CLS), general load time metrics associated with server processing, as well as aggregate counts of visitors.

If you’ve ever popped open DevTools in your browser and stared at the waterfall chart of a slow-loading page, you’ve had a taste of what Web Analytics is doing, except instead of measuring your load times from your laptop, it’s measuring it directly from the browsers of real visitors.

Here’s the high-level architecture:

A lightweight beacon in the browser
Every page that you track with Cloudflare’s Web Analytics includes a tiny JavaScript snippet, optimized to load asynchronously so it won’t block rendering.

  • This snippet hooks into modern browser APIs such as the Performance API and Resource Timing.

  • This is how Cloudflare collects Core Web Vitals metrics like Largest Contentful Paint and Interaction to Next Paint, plus data such as resource load times and TLS handshake duration, all from the perspective of the client.

Aggregation at the edge
When the browser sends performance data, it goes to the nearest Cloudflare data center. Instead of pushing raw events straight to a database, we pre-process at the edge. This reduces storage needs, minimizes latency, and removes personal information like IP addresses. After this pre-processing, the data is sent to a core data center, where it is processed and made available for users to query.
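To make the idea of pre-processing concrete, here is a minimal Python sketch of edge-side aggregation. The event fields (`ip`, `path`, `lcp_ms`) and the function name are hypothetical and are not Cloudflare's actual schema or pipeline; the sketch only illustrates that per-visitor fields can be dropped before anything is stored.

```python
from collections import defaultdict


def aggregate_at_edge(raw_events):
    """Collapse raw beacon events into per-path summaries.

    Hypothetical schema: each event is a dict with 'ip', 'path' and 'lcp_ms'.
    The 'ip' field is never read and never copied into the output.
    """
    totals = defaultdict(lambda: {"page_views": 0, "lcp_ms_sum": 0})
    for event in raw_events:
        bucket = totals[event["path"]]
        bucket["page_views"] += 1
        bucket["lcp_ms_sum"] += event["lcp_ms"]
    # Only aggregate rows leave the edge; no per-visitor fields survive.
    return {
        path: {
            "page_views": b["page_views"],
            "avg_lcp_ms": b["lcp_ms_sum"] / b["page_views"],
        }
        for path, b in totals.items()
    }


events = [
    {"ip": "203.0.113.7", "path": "/", "lcp_ms": 1200},
    {"ip": "198.51.100.4", "path": "/", "lcp_ms": 1800},
    {"ip": "203.0.113.7", "path": "/watches", "lcp_ms": 900},
]
summary = aggregate_at_edge(events)
print(summary)
```

Three raw events become two summary rows, and the addresses in the input never appear in what would be forwarded onward.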


Web Analytics sits under the Analytics & Logs section of the dashboard (at both the account and domain level of the dashboard). Starting on October 15, 2025, free domains will begin to see Web Analytics enabled by default and will be able to view the performance of their visitors in their dashboard. Pro, Biz and ENT accounts can enable Web Analytics by selecting the hostname of the website to add the snippet to and selecting Automatic Setup. Alternatively, you can manually paste the JavaScript beacon before the closing </body> tag on any HTML page you’d like to track from your origin. Just select “manage site” from the Web Analytics tab in the dashboard. 


Once enabled, the JS snippet works with visitors’ browsers to measure how the user experienced page load times and reports on critical client-side metrics. Below these metrics are resource attribution tables that help users understand which assets take the most time to load for each metric so that they can better optimize their site performance. 


What does privacy-first mean?

From the beginning, our Web Analytics tools have centered on providing insights without compromising privacy. Being privacy-first means we don’t track individual users for analytics. We don’t use any client-side state (like cookies or localStorage) for analytics purposes, and we don’t track users over time by IP address, User Agent, or any other fingerprinting technique.

Moreover, when enabling Web Analytics, you can choose to drop requests from European and UK visitors if you so desire (listed here specifically), meaning we will not collect any RUM metrics from traffic that passes through our European and UK data centers. The version of Web Analytics that will be enabled by default excludes data from EU visitors (this can be changed in the dashboard if you want). 

The concept of a visit is key to our privacy approach. Rather than count unique IP addresses (requiring storing state about each visitor), we simply count page views that originate from a distinct referral or navigation event, avoiding the need to store information that might be considered personal data. We believe this same concept that we’ve used for years in providing our privacy-first Web Analytics can be logically extended to network and origin metrics. This will allow customers to gain the insights they need to debug and solve performance issues while ensuring they are not collecting unneeded data on visitors.
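As a rough illustration of counting visits without per-visitor state, the Python sketch below treats a page view as a visit when its referrer is empty or points at a different hostname. That rule is an assumption made for the example, as are the function names; the exact definition Cloudflare applies may differ.

```python
from urllib.parse import urlparse


def is_visit(page_url, referrer):
    """Treat a page view as a visit when it did not come from the same site.

    Assumption: "same site" means the referrer hostname equals the page
    hostname; an empty referrer (direct navigation) also counts as a visit.
    """
    if not referrer:
        return True
    return urlparse(referrer).hostname != urlparse(page_url).hostname


def count_visits(page_views):
    # page_views is a list of (page_url, referrer) pairs; no IP address,
    # cookie, or other per-visitor state is needed to produce the count.
    return sum(1 for page_url, referrer in page_views if is_visit(page_url, referrer))


page_views = [
    ("https://example.com/", "https://search.example.org/?q=watches"),  # external referral
    ("https://example.com/watches", "https://example.com/"),            # internal click
    ("https://example.com/", ""),                                       # direct navigation
]
print(count_visits(page_views))
```

The count depends only on each request's own URL and referrer, so nothing has to be remembered about a visitor between requests.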


Opting-out

We built our Web Analytics service to give you the insights you need to run your website, all while maintaining a privacy-first approach. However, if you do want to opt-out, here are the steps to do so.

Via Dashboard

If you have a free domain and do not want Web Analytics automatically enabled for your zone you should do the following before October 15, 2025: 

  1. Navigate to the zone in the Cloudflare dashboard

  2. In the list on the left of the screen, navigate to Web Analytics


  3. On the next page, select either `Enable Globally` or `Exclude EU` to activate the feature


  4. Once Web Analytics has been activated, navigate to `Manage RUM Settings` in the Web Analytics dashboard


  5. Then, on the next page, select `Disable` to disable Web Analytics for the zone


  6. OR, to remove Web Analytics from the zone entirely, delete the configs by clicking Advanced Options and then Delete


    Once you have disabled the product, we will not re-enable it automatically. You can choose to enable it yourself whenever you want, however.

Via API

  1. Create a Web Analytics configuration with the following API call:

    curl https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/rum/site_info \
        -H 'Content-Type: application/json' \
        -H "X-Auth-Email: $CLOUDFLARE_EMAIL" \
        -H "X-Auth-Key: $CLOUDFLARE_API_KEY" \
        -d '{
              "auto_install": false,
              "host": "example.com",
              "zone_tag": "023e105f4ecef8ad9ca31a8372d0c353"
            }'
    

    Note: This will not cause your zone to collect RUM data because auto_install is set to `false`

  2. Collect the site_tag and zone_tag fields from the response to this call

    1. site_tag in this response will correspond to $SITE_ID in the following calls

  3. EITHER Disable the Web Analytics configuration with the following API call:

    curl https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/rum/site_info/$SITE_ID \
        -X PUT \
        -H 'Content-Type: application/json' \
        -H "X-Auth-Email: $CLOUDFLARE_EMAIL" \
        -H "X-Auth-Key: $CLOUDFLARE_API_KEY" \
        -d '{
              "auto_install": true,
              "enabled": false,
              "host": "example.com",
              "zone_tag": "023e105f4ecef8ad9ca31a8372d0c353"
            }'
    
    

  4. OR Delete the Web Analytics configuration with the following API call:

    curl https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/rum/site_info/$SITE_ID \
        -X DELETE \
        -H "X-Auth-Email: $CLOUDFLARE_EMAIL" \
        -H "X-Auth-Key: $CLOUDFLARE_API_KEY"

Where We’re Going Next

Today, Web Analytics gives you visibility into how people experience your site in the browser. Next, we’re expanding that lens to show what’s happening across the entire request path, from the click in a user’s browser, through Cloudflare’s global network, to your origin servers, and back.

Here’s what’s coming:

  1. Correlating Across Layers
    We’ll match RUM data from the client with network timing, Cloudflare edge processing, and origin response latency, allowing you to pinpoint whether a spike in TTFB comes from a slow script, a cache miss, or an origin bottleneck.

  2. Proactive Alerting
    Configurable alerts will tell you when performance regresses in specific geographies, when a data center underperforms, or when origin latency spikes.

  3. Actionable Insights
    We’ll go beyond “processing time” as a single number, breaking it into the real-world steps that make up the journey: proxy routing, security checks, cache lookups, origin fetches, and more.

  4. Unified View
    All of this will live in one place (your Cloudflare dashboard) alongside your analytics, logs, firewall events, and configuration settings, so you can see cause and effect in one workflow.

Conclusion

Stay tuned as we work alongside you, in public, to build the most comprehensive, privacy-focused performance analytics platform. Together, we will illuminate every corner of the request journey so you can optimize, innovate, and deliver the best experiences to your users, every time.

The next chapters of this journey will unlock proactive alerts, cross-layer correlation, and actionable insights you can’t get anywhere else. Follow along as the RUM Diaries are just getting started.

Position Regarding the “Chat Control” EU Regulation Proposal

Post Syndicated from Bozho original https://techblog.bozho.net/position-regarding-the-chat-control-eu-regulation-proposal/

Interest in a very sensitive digital topic has been gaining momentum in recent weeks – the so-called “chat control” – a draft EU regulation under which every message we send, even through encrypted applications, would be scanned for child sexual abuse materials (the so-called CSAM).

I will make a retrospective and explain the technical problems, but before that I must state that the political party I represent holds the position that invasive measures against private correspondence, which create conditions for mass surveillance, must not be implemented. Therefore, the proposal – both in its original form and in the version seen by the Danish presidency – is unacceptable.

Even without the provisions concerning encrypted applications, the regulation takes serious steps toward improving the effectiveness of combating the spread of CSAM. Thus, at the upcoming Council of the EU meeting in the fall, the hot issue will be precisely encrypted applications – on the rest there is largely consensus, since it is indisputable that more serious and effective counteraction against such crimes is needed. Therefore, the remaining provisions of the regulation should be supported.

Initially, this proposal included the possibility of sending images centrally to a European body for scanning. This was met with strong disapproval, since in practice it eliminates end-to-end encryption – if every message containing a photo or a link is sent somewhere, encryption is effectively nullified.

Therefore, under a previous Council presidency, there was a working proposal to limit this measure only to already known content (CSAM) and for scanning to be carried out only on the device, before encryption, without sending anything anywhere. At first glance, this sounded more reasonable, as it moved the proposal away from mass surveillance. It even seemed, at first glance, that artificial intelligence could be applied directly on the device. At the time, I made such an assumption, with the caveat that careful analysis was needed.

But once such careful analysis is done, it becomes clear that this approach is both dangerous and not particularly useful for achieving the goal. I will list a few details:

  1. Organized crime groups involved in the distribution of CSAM would simply start using their own applications, which, thanks to another EU regulation (the DMA), they would be able to install on their phones without complying with the new requirements. In other words, the protection of ordinary people’s private correspondence would be weakened and risks of mass surveillance and abuse would be created, while criminal groups would bypass it.

  2. At present, there is no technology capable of implementing the Danish presidency’s and the Commission’s vision in a workable way. Algorithms for so-called perceptual hashing (or fuzzy hashing) were not designed to withstand malicious modifications – with small visual effects or transformations, altered images will go undetected. Likewise, both these algorithms and AI models that would work on end devices produce false positives, which risks flooding law enforcement with entirely legal photos. For such a technology to be introduced by regulation, it must meet all these (and other) challenges – we cannot allow proprietary, experimental technologies to become part of legal frameworks, especially when fundamental constitutional rights are at stake.

  3. The technology, if one day a sufficiently good one is developed, must be open source, and if it uses AI – the model must also be open, with a very clear and transparent process for auditing the training data. The perceptual hashing algorithm should be resistant to malicious image alterations, because otherwise it’s pointless to even try to impose such techniques. Furthermore, the central database must be subject to very strict procedures for submission and verification of content, because otherwise a member state with a low level of rule of law could submit other content, including political content, that it wishes to monitor or censor. Last summer’s example in Bulgaria with the takedown of the satirical website New Beginning (the party of the strongest local oligarch) is just an indication of how such abuse could happen. Apart from the initial takedown, the website also appeared on lists by cybersecurity companies as “adult content” and was blocked in networks where software by those companies was installed.
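The two weaknesses named in point 2 can be seen even in a toy setting. The Python sketch below uses a deliberately simplified "average hash" over 16 made-up grayscale values; it is not any algorithm named in the proposal or used by a real scanning product, and the pixel values are hypothetical. It only shows the general shape of the problem: a benign edit leaves the hash unchanged, a tiny edit near the threshold changes half of it, and a different image can produce the same hash.

```python
def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set when the pixel is above the mean."""
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hamming(a, b):
    """Number of bit positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))


# A 16-pixel grayscale "thumbnail" (hypothetical values, 0-255).
original = [20, 220, 118, 122] * 4
brightened = [p + 10 for p in original]  # benign edit: uniform brightness change
nudged = [20, 220, 121, 119] * 4         # pixels near the mean moved by 3 levels
unrelated = [0, 255, 100, 140] * 4       # a different picture altogether

h = average_hash(original)
print(hamming(h, average_hash(brightened)))  # 0: hash unchanged
print(hamming(h, average_hash(nudged)))      # 8: half of the 16 bits differ
print(hamming(h, average_hash(unrelated)))   # 0: a false positive
```

Any matching threshold loose enough to catch the nudged copy would also match far more unrelated images, which is the trade-off between evasion and false positives described above.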

These are only part of the arguments why the proposal is ill-conceived. A much longer debate on the issue is needed, as well as many more academic studies researching and developing technological readiness for such approaches. The good news is that many countries are still hesitant, among them Germany, and thus there is no majority in the Council, while the mandate of the European Parliament is against this type of invasive change.

When there is legitimate criticism of the EU, it is that such types of regulation are possible. But the answer to this criticism is that member states evidently value guarantees for personal freedom, and that within a serious debate across the entire European Union, Orwellian measures can be stopped and working solutions can be found instead of good-sounding but nonfunctional technological regulations.

The post Position Regarding the “Chat Control” EU Regulation Proposal appeared first on Bozho's tech blog.

Reducing double spend latency from 40 ms to < 1 ms on privacy proxy

Post Syndicated from Ben Yang original https://blog.cloudflare.com/reducing-double-spend-latency-from-40-ms-to-less-than-1-ms-on-privacy-proxy/

One of Cloudflare’s big focus areas is making the Internet faster for end users. Part of the way we do that is by looking at the “big rocks” or bottlenecks that might be slowing things down — particularly processes on the critical path. When we recently turned our attention to our privacy proxy product, we found a big opportunity for improvement.

What is our privacy proxy product? These proxies let users browse the web without exposing their personal information to the websites they’re visiting. Cloudflare runs infrastructure for privacy proxies like Apple’s Private Relay and Microsoft’s Edge Secure Network.

Like any secure infrastructure, we make sure that users authenticate to these privacy proxies before we open up a connection to the website they’re visiting. In order to do this in a privacy-preserving way (so that Cloudflare collects the least possible information about end-users) we use an open Internet standard – Privacy Pass – to issue tokens that authenticate to our proxy service.

Every time a user visits a website via our Privacy Proxy, we check the validity of the Privacy Pass token which is included in the Proxy-Authorization header in their request. Before we cryptographically validate a user’s token, we check if this token has already been spent. If the token is unspent, we let the user’s request through. Otherwise, it’s a “double-spend”. From an access control perspective, double-spends are indicative of a problem. From a privacy perspective, double-spends can reduce the anonymity set and privacy characteristics. From a performance perspective, our privacy proxies see millions of requests per second – and any time spent authenticating delays people from accessing sites – so the check needs to be fast. Let’s see how we reduced the latency of these double-spend checks from ~40 ms to <1 ms.

How did we discover the issue?

We use a tracing platform, Jaeger. It lets us see which paths our code took and how long functions took to run. When we looked into these traces, we saw latencies of ~ 40 ms. It was a good lead, but it alone was not enough to conclude it was an issue. The reason was we only sample a small percentage of our traces, so what we saw was not the whole picture. We needed to look at more data. We could’ve increased how many traces we sampled, but traces are large and heavy for our systems to process. Metrics are a lighter weight solution. We added metrics to get data on all double-spend checks.


The lines in this graph are median latencies we saw for the slowest privacy proxies around the world. The metrics data gave us confidence that it was a problem affecting a large portion of requests… assuming that ~ 45 ms was longer than expected. But, was it expected? What numbers did we expect?

The expected latency

To understand what times are reasonable to expect, let’s go into detail on what makes up a “double-spend check”. When we do a double-spend check, we ask a backing data store if a Privacy Pass token exists. The data store we use is memcached. We have many memcached instances running on servers around the world, so which server do we ask? For this, we use mcrouter. Instead of figuring out which memcached server to ask, we give our request to mcrouter, and it will handle choosing a good memcached server to use. We looked at the median time it took for mcrouter to process our request. This graph shows the average latencies per server over time. There are spikes, but most of the time the latency is < 1 ms. 


By this point, we were confident that double-spend check latencies were longer than expected everywhere, and we started looking for the root cause.

How did we investigate the issue?

We took inspiration from the scientific method. We analyzed our code, created theories for why sections of code caused latency, and used data to reject those theories. For any remaining theories, we implemented fixes and tested if they worked.

Let’s look at the code. At a high level, the double-spend checking logic is:

  1. Get a connection, which can be broken down into:

    1. Send a memcached version command. This serves as a health check for whether the connection is still good to send data on.

    2. If the connection is still good, acquire it. Otherwise, establish a new connection.

  2. Send a memcached get command on the connection.

Let’s go through the theories we had for each step listed above.

Theory 1: health check takes long

We measured the health check primarily as a sanity check. The version command is simple and fast to process, so it should not take long. And we remained sane. The median latency was < 1 ms.


Theory 2: waiting to get a connection

To understand why we may need to wait to get a connection, let’s go into more detail on how we get a connection. In our code, we use a connection pool. The pool is a set of ready-to-go connections to mcrouter. The benefit of having a pool is that we do not have to pay the overhead of establishing a connection every time we want to make a request. Pools have a size limit, though. Our limit was 20 per server, and this is where a potential problem lies. Imagine we have a server that processes 5,000 requests every second, and requests stay for 45 ms. We can use something called Little’s Law to estimate the average number of requests in our system: 5000 x 0.045 = 225. Due to our pool size limits, we can only have 20 connections at a time, so we can only process 20 requests at any point in time. That means 205 requests are just waiting! When we do a double-spend check, maybe we’re waiting ~ 40 ms to get a connection?
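The Little’s Law estimate above can be written as a small sketch. The function names are hypothetical, and the inputs are the illustrative figures from the paragraph (5,000 requests per second, 45 ms per request, a pool limit of 20).

```rust
/// Little's Law: the average number of requests in the system equals the
/// arrival rate (requests per second) times the average time each request
/// spends in the system (seconds).
pub fn avg_in_flight(requests_per_sec: f64, latency_secs: f64) -> f64 {
    requests_per_sec * latency_secs
}

/// Average number of requests left waiting once every pooled connection
/// is busy. Never negative: a lightly loaded server has nothing waiting.
pub fn avg_waiting(in_flight: f64, pool_size: f64) -> f64 {
    (in_flight - pool_size).max(0.0)
}
```

With the figures above, `avg_in_flight(5000.0, 0.045)` comes to about 225, and `avg_waiting` of that against a pool of 20 comes to about 205. The same formula is a quick way to test the theory on other servers: if the predicted in-flight count is below the pool limit, pool contention cannot explain the latency.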

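We looked at the metrics of many different servers. No matter what the request rate was, the latency was consistently ~ 40 ms, disproving the theory. For example, this graph shows data from a server that saw a maximum of 20 requests per second. It shows a histogram over time, and the large majority of requests fall in the 40 – 50 ms bucket. By Little’s Law, that server would average fewer than one request in flight (20 x 0.045 = 0.9), far below the pool limit of 20, so it should never have had to wait for a connection.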


Theory 3: delays in Nagle’s algorithm and delayed acks

We decided to chat with Gemini, giving it the observations we had so far. It suggested many things, but the most interesting was to check if TCP_NODELAY was set. If we had set this option in our code, it would’ve disabled something called Nagle’s algorithm. Nagle’s algorithm itself was not a problem, but when enabled alongside another feature, delayed ACKs, latencies could creep in. To explain why, let’s go through an analogy.

Suppose we run a group chat app. Normally, people type a full thought and send it in one message. But, we have a friend who sends one word at a time: “Hi”. Send. “how”. Send. “are”. Send. “you”. Send. That’s a lot of notifications. Nagle’s algorithm aims to prevent this. Nagle says that if the friend wants to send one short message, that’s fine, but it only lets them do it once per turn. When they try to send more single words right after, Nagle will save the words in a draft message. Once the draft message hits a certain length, Nagle sends. But what if the draft message never hits that length? To manage this, delayed ACKs initiates a 40 ms timer whenever the friend sends a message. If the app gets no further input before the timer ends, the message is sent to the group.

We took a closer look at the code, both Cloudflare-authored code and code from dependencies we rely on. We depended on the memcache-async crate for implementing the code that lets us send memcached commands. Here is the code for sending a memcached version command:

self.io.write_all(b"version\r\n").await?;
self.io.flush().await?;

Nothing out of the ordinary. Then, we looked inside the get function.

let writer = self.io.get_mut();
writer.write_all(b"get ").await?;
writer.write_all(key.as_ref()).await?;
writer.write_all(b"\r\n").await?;
writer.flush().await?;

In our code, we set io as a TcpStream, meaning that each write_all call resulted in sending a message. With Nagle’s algorithm enabled, the data flow looked like this:


Oof. We tried to send all three small messages, but after we sent the “get ”, the kernel put the token and \r\n in a buffer and started waiting. When mcrouter got the “get ”, it could not do anything because it did not have the full command. So, it waited 40 ms. Then, it sent an ACK in response. We got the ACK, and sent the rest of the command in the buffer. mcrouter got the rest of the command, processed it, and returned a response telling us if the token exists. What would the data flow look like with Nagle’s algorithm disabled?


We would send all three small messages. mcrouter would have the full command, and return a response immediately. No waiting, whatsoever.

Why 40 ms?

Our Linux servers have minimum bounds for the delay. Here is a snippet of Linux source code that defines those bounds.

#if HZ >= 100
#define TCP_DELACK_MIN	((unsigned)(HZ/25))	/* minimal time to delay before sending an ACK */
#define TCP_ATO_MIN	((unsigned)(HZ/25))
#else
#define TCP_DELACK_MIN	4U
#define TCP_ATO_MIN	4U
#endif

The comment tells us that TCP_DELACK_MIN is the minimum time delayed ACKs will wait before sending an ACK. We spent some time digging through Cloudflare’s custom kernel settings and found this:

CONFIG_HZ=1000

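CONFIG_HZ eventually propagates to HZ, so TCP_DELACK_MIN works out to HZ/25 = 1000/25 = 40 timer ticks (jiffies). At 1000 ticks per second, each tick lasts 1 ms, which results in a 40 ms delay. That’s where the number comes from!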

The fix

We were sending three separate messages for a single command when we only needed to send one. We captured what a get command looked like in Wireshark to verify we were sending three separate messages. (We captured this locally on macOS. Interestingly, we got an ACK for every message.)


The fix was to use BufWriter<TcpStream> so that write_all would buffer the small messages in a user-space memory buffer, and flush would send the entire memcached command in one message. The Wireshark capture looked much cleaner.
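To make the effect of buffering concrete, here is a minimal synchronous sketch using the standard library’s `std::io::BufWriter`. It is not the production code, which is async and writes to a `TcpStream`. `CountingWriter`, `write_get`, `unbuffered_writes`, and `buffered_writes` are hypothetical names; `CountingWriter` stands in for the socket and simply records the size of each write that reaches it.

```rust
use std::io::{self, BufWriter, Write};

/// Stand-in for a socket: records the size of every write it receives.
/// On a real TCP stream, each of these writes may become its own segment.
#[derive(Default)]
pub struct CountingWriter {
    pub writes: Vec<usize>,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes.push(buf.len());
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes a memcached `get` command in three pieces, mirroring the
/// structure of the dependency's `get` function.
pub fn write_get<W: Write>(w: &mut W, key: &[u8]) -> io::Result<()> {
    w.write_all(b"get ")?;
    w.write_all(key)?;
    w.write_all(b"\r\n")?;
    w.flush()
}

/// Write sizes that reach the underlying writer with no buffering.
pub fn unbuffered_writes(key: &[u8]) -> io::Result<Vec<usize>> {
    let mut raw = CountingWriter::default();
    write_get(&mut raw, key)?;
    Ok(raw.writes)
}

/// Write sizes that reach the underlying writer behind a `BufWriter`.
/// The three pieces are held in memory and handed over together on flush.
pub fn buffered_writes(key: &[u8]) -> io::Result<Vec<usize>> {
    let mut buffered = BufWriter::new(CountingWriter::default());
    write_get(&mut buffered, key)?;
    Ok(buffered.get_ref().writes.clone())
}
```

For a short key, the unbuffered path hands the underlying writer three separate writes, while the buffered path hands it one write containing the whole command. The calling code does not change; only the type wrapped around the stream does.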


Conclusion

After deploying the fix to production, we saw the median double-spend check latency drop to expected values everywhere.


Our investigation followed a systematic, data-driven approach. We began by using observability tools to confirm the problem’s scale. From there, we formed testable hypotheses and used data to systematically disprove them. This process ultimately led us to a subtle interaction between Nagle’s algorithm and delayed ACKs, caused by how we made use of a third-party dependency.

Ultimately, our mission is to help build a better Internet. Every millisecond saved contributes to a faster and more seamless, private browsing experience for end users. We’re excited to have this rolled out and excited to continue to chase further performance improvements!

How the Solid Protocol Restores Digital Agency

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/07/how-solid-protocol-restores-digital-agency.html

The current state of digital identity is a mess. Your personal information is scattered across hundreds of locations: social media companies, IoT companies, government agencies, websites you have accounts on, and data brokers you’ve never heard of. These entities collect, store, and trade your data, often without your knowledge or consent. It’s both redundant and inconsistent. You have hundreds, maybe thousands, of fragmented digital profiles that often contain contradictory or logically impossible information. Each serves its own purpose, yet there is no central override and control to serve you—as the identity owner.

We’re used to the massive security failures resulting from all of this data under the control of so many different entities. Years of privacy breaches have resulted in a multitude of laws—in US states, in the EU, elsewhere—and calls for even more stringent protections. But while these laws attempt to protect data confidentiality, there is nothing to protect data integrity.

In this context, data integrity refers to its accuracy, consistency, and reliability…throughout its lifecycle. It means ensuring that data is not only accurately recorded but also remains logically consistent across systems, is up-to-date, and can be verified as authentic. When data lacks integrity, it can contain contradictions, errors, or outdated information—problems that can have serious real-world consequences.

Without data integrity, someone could classify you as a teenager while simultaneously attributing to you three teenage children: a biological impossibility. What’s worse, you have no visibility into the data profiles assigned to your identity, no mechanism to correct errors, and no authoritative way to update your information across all platforms where it resides.

Integrity breaches don’t get the same attention that confidentiality breaches do, but the picture isn’t pretty. A 2017 write-up in The Atlantic found error rates exceeding 50% in some categories of personal information. A 2019 audit of data brokers found at least 40% of data broker sourced user attributes are “not at all” accurate. In 2022, the Consumer Financial Protection Bureau documented thousands of cases where consumers were denied housing, employment, or financial services based on logically impossible data combinations in their profiles. Similarly, the National Consumer Law Center report called “Digital Denials” showed inaccuracies in tenant screening data that blocked people from housing.

And integrity breaches can have significant effects on our lives. In one 2024 British case, two companies blamed each other for the faulty debt information that caused catastrophic financial consequences for an innocent victim. Breonna Taylor was killed in 2020 during a police raid on her apartment in Louisville, Kentucky, when officers executed a “no-knock” warrant on the wrong house based on bad data. They had faulty intelligence connecting her address to a suspect who actually lived elsewhere.

In some instances, we have rights to view our data, and in others, rights to correct it, but these sorts of solutions have only limited value. When journalist Julia Angwin attempted to correct her information across major data brokers for her book Dragnet Nation, she found that even after submitting corrections through official channels, a significant number of errors reappeared within six months.

In some instances, we have the right to delete our data, but—again—this only has limited value. Some data processing is legally required, and some is necessary for services we truly want and need.

Our focus needs to shift from the binary choice of either concealing our data entirely or surrendering all control over it. Instead, we need solutions that prioritize integrity in ways that balance privacy with the benefits of data sharing.

It’s not as if we haven’t made progress in better ways to manage online identity. Over the years, numerous trustworthy systems have been developed that could solve many of these problems. For example, imagine digital verification that works like a locked mobile phone—it works when you’re the one who can unlock and use it, but not if someone else grabs it from you. Or consider a storage device that holds all your credentials, like your driver’s license, professional certifications, and healthcare information, and lets you selectively share one without giving away everything at once. Imagine being able to share just a single cell in a table or a specific field in a file. These technologies already exist, and they could let you securely prove specific facts about yourself without surrendering control of your whole identity. This isn’t just theoretically better than traditional usernames and passwords; the technologies represent a fundamental shift in how we think about digital trust and verification.

Standards to do all these things emerged during the Web 2.0 era. We mostly haven’t used them because platform companies have been more interested in building barriers around user data and identity. They’ve used control of user identity as a key to market dominance and monetization. They’ve treated data as a corporate asset, and resisted open standards that would democratize data ownership and access. Closed, proprietary systems have better served their purposes.

There is another way. The Solid protocol, invented by Sir Tim Berners-Lee, represents a radical reimagining of how data operates online. Solid stands for “SOcial LInked Data.” At its core, it decouples data from applications by storing personal information in user-controlled “data wallets”: secure, personal data stores that users can host anywhere they choose. Applications can access specific data within these wallets, but users maintain ownership and control.

Solid is more than distributed data storage. This architecture inverts the current data ownership model. Instead of companies owning user data, users maintain a single source of truth for their personal information. It integrates and extends all those established identity standards and technologies mentioned earlier, and forms a comprehensive stack that places personal identity at the architectural center.

This identity-first paradigm means that every digital interaction begins with the authenticated individual who maintains control over their data. Applications become interchangeable views into user-owned data, rather than data silos themselves. This enables unprecedented interoperability, as services can securely access precisely the information they need while respecting user-defined boundaries.

Solid ensures that user intentions are transparently expressed and reliably enforced across the entire ecosystem. Instead of each application implementing its own custom authorization logic and access controls, Solid establishes a standardized declarative approach where permissions are explicitly defined through control lists or policies attached to resources. Users can specify who has access to what data with granular precision, using simple statements like “Alice can read this document” or “Bob can write to this folder.” These permission rules remain consistent, regardless of which application is accessing the data, eliminating the fragmentation and unpredictability of traditional authorization systems.

This architectural shift decouples applications from data infrastructure. Unlike Web 2.0 platforms like Facebook, which require massive back-end systems to store, process, and monetize user data, Solid applications can be lightweight and focused solely on functionality. Developers no longer need to build and maintain extensive data storage systems, surveillance infrastructure, or analytics pipelines. Instead, they can build specialized tools that request access to specific data in users’ wallets, with the heavy lifting of data storage and access control handled by the protocol itself.

Let’s take healthcare as an example. The current system forces patients to spread pieces of their medical history across countless proprietary databases controlled by insurance companies, hospital networks, and electronic health record vendors. Patients frustratingly become a patchwork rather than a person, because they often can’t access their own complete medical history, let alone correct mistakes. Meanwhile, those third-party databases suffer regular breaches. The Solid protocol enables a fundamentally different approach. Patients maintain their own comprehensive medical record, with data cryptographically signed by trusted providers, in their own data wallet. When visiting a new healthcare provider, patients can arrive with their complete, verifiable medical history rather than starting from zero or waiting for bureaucratic record transfers.

When a patient needs to see a specialist, they can grant temporary, specific access to relevant portions of their medical history. For example, a patient referred to a cardiologist could share only cardiac-related records and essential background information. Or, on the flip side, the patient can share new and rich sources of related data to the specialist, like health and nutrition data. The specialist, in turn, can add their findings and treatment recommendations directly to the patient’s wallet, with a cryptographic signature verifying medical credentials. This process eliminates dangerous information gaps while ensuring that patients maintain an appropriate role in who sees what about them and why.

When a patient-doctor relationship ends, the patient retains all records generated during that relationship, unlike in today’s system, where changing providers often means losing access to one’s historical records. The departing doctor’s signed contributions remain verifiable parts of the medical history, but the doctor no longer has direct access to the patient’s wallet without explicit permission.

For insurance claims, patients can provide temporary, auditable access to specific information needed for processing—no more and no less. Insurance companies receive verified data directly relevant to claims but should not be expected to have uncontrolled hidden comprehensive profiles or retain information longer than safe under privacy regulations. This approach dramatically reduces unauthorized data use, risk of breaches (privacy and integrity), and administrative costs.

Perhaps most transformatively, this architecture enables patients to selectively participate in medical research while maintaining privacy. They could contribute anonymized or personalized data to studies matching their interests or conditions, with granular control over what information is shared and for how long. Researchers could gain access to larger, more diverse datasets while participants would maintain control over their information—creating a proper ethical model for advancing medical knowledge.

The implications extend far beyond healthcare. In financial services, customers could maintain verified transaction histories and creditworthiness credentials independently of credit bureaus. In education, students could collect verified credentials and portfolios that they truly own rather than relying on institutions’ siloed records. In employment, workers could maintain portable professional histories with verified credentials from past employers. In each case, Solid enables individuals to be the masters of their own data while allowing verification and selective sharing.

The economics of Web 2.0 pushed us toward centralized platforms and surveillance capitalism, but there has always been a better way. Solid brings different pieces together into a cohesive whole that enables the identity-first architecture we should have had all along. The protocol doesn’t just solve technical problems; it corrects the fundamental misalignment of incentives that has made the modern web increasingly hostile to both users and developers.

As we look to a future of increased digitization across all sectors of society, the need for this architectural shift becomes even more apparent. Individuals should be able to maintain and present their own verified digital identity and history, rather than being at the mercy of siloed institutional databases. The Solid protocol makes this future technically possible.

This essay was written with Davi Ottenheimer, and originally appeared on The Inrupt Blog.

New Mobile Phone Forensics Tool

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/07/new-mobile-phone-forensics-tool.html

The Chinese have a new tool called Massistant.

  • Massistant is the presumed successor to Chinese forensics tool, “MFSocket”, reported in 2019 and attributed to publicly traded cybersecurity company, Meiya Pico.
  • The forensics tool works in tandem with a corresponding desktop software.
  • Massistant gains access to device GPS location data, SMS messages, images, audio, contacts and phone services.
  • Meiya Pico maintains partnerships with domestic and international law enforcement partners, both as a surveillance hardware and software provider, as well as through training programs for law enforcement personnel.

From a news article:

The good news, per Balaam, is that Massistant leaves evidence of its compromise on the seized device, meaning users can potentially identify and delete the malware, either because the hacking tool appears as an app, or can be found and deleted using more sophisticated tools such as the Android Debug Bridge, a command line tool that lets a user connect to a device through their computer.

The bad news is that at the time of installing Massistant, the damage is done, and authorities already have the person’s data.

Slashdot thread.

Security Vulnerabilities in ICEBlock

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/07/security-vulnerabilities-in-iceblock.html

The ICEBlock tool has vulnerabilities:

The developer of ICEBlock, an iOS app for anonymously reporting sightings of US Immigration and Customs Enforcement (ICE) officials, promises that it “ensures user privacy by storing no personal data.” But that claim has come under scrutiny. ICEBlock creator Joshua Aaron has been accused of making false promises regarding user anonymity and privacy, being “misguided” about the privacy offered by iOS, and of being an Apple fanboy. The issue isn’t what ICEBlock stores. It’s about what it could accidentally reveal through its tight integration with iOS.

Surveillance Used by a Drug Cartel

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/07/surveillance-used-by-a-drug-cartel.html

Once you build a surveillance system, you can’t control who will use it:

A hacker working for the Sinaloa drug cartel was able to obtain an FBI official’s phone records and use Mexico City’s surveillance cameras to help track and kill the agency’s informants in 2018, according to a new US justice department report.

The incident was disclosed in a justice department inspector general’s audit of the FBI’s efforts to mitigate the effects of “ubiquitous technical surveillance,” a term used to describe the global proliferation of cameras and the thriving trade in vast stores of communications, travel, and location data.

[…]

The report said the hacker identified an FBI assistant legal attaché at the US embassy in Mexico City and was able to use the attaché’s phone number “to obtain calls made and received, as well as geolocation data.” The report said the hacker also “used Mexico City’s camera system to follow the [FBI official] through the city and identify people the [official] met with.”

FBI report.

Orange Me2eets: We made an end-to-end encrypted video calling app and it was easy

Post Syndicated from Michael Rosenberg original https://blog.cloudflare.com/orange-me2eets-we-made-an-end-to-end-encrypted-video-calling-app-and-it-was/

Developing a new video conferencing application often begins with a peer-to-peer setup using WebRTC, facilitating direct data exchange between clients. While effective for small demonstrations, this method encounters scalability hurdles as participants are added. Each client’s upload load grows linearly with the number of users, because each client must send its data to every other client (n-1 streams in a call with n participants).

Selective Forwarding Units (SFUs) are essential to scaling video conferencing applications. An SFU is a media stream routing hub: it receives media and data flows from participants and intelligently determines which streams to forward. By strategically distributing media based on network conditions and participant needs, this mechanism minimizes bandwidth usage and greatly enhances scalability. Nearly every video conferencing application today uses SFUs.
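The scaling difference can be made concrete with a small sketch that counts outgoing streams. The function names are hypothetical, and the model is deliberately simplified: one stream per participant, ignoring separate audio and video tracks and any simulcast layers.

```rust
/// Streams each client uploads in a full-mesh peer-to-peer call:
/// one copy to every other participant (n - 1).
pub fn mesh_uploads_per_client(participants: u64) -> u64 {
    participants.saturating_sub(1)
}

/// Total streams sent across the whole mesh: n * (n - 1).
pub fn mesh_total_uploads(participants: u64) -> u64 {
    participants * mesh_uploads_per_client(participants)
}

/// With an SFU, each client uploads a single copy to the SFU,
/// regardless of how many participants are in the call.
pub fn sfu_uploads_per_client(participants: u64) -> u64 {
    if participants == 0 { 0 } else { 1 }
}
```

In a 10-person call, each mesh client uploads 9 copies of its media, for 90 streams in total, while each SFU client still uploads just one. The fan-out work moves from the clients to the SFU.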

In 2024, we announced Cloudflare Realtime (then called Cloudflare Calls), our suite of WebRTC products, and we also released Orange Meets, an open source video chat application built on top of our SFU.

We also realized that use of an SFU often comes with a privacy cost, as there is now a centralized hub that could see and listen to all the media contents, even though its sole job is to forward media bytes between clients as a data plane.

We believe end-to-end encryption should be the industry standard for secure communication and that’s why today we’re excited to share that we’ve implemented and open sourced end-to-end encryption in Orange Meets. Our generic implementation is client-only, so it can be used with any WebRTC infrastructure. Finally, our new designated committer distributed algorithm has been checked in a bounded model checker to verify that it handles edge cases gracefully.

End-to-end encryption for video conferencing is different than for text messaging

End-to-end encryption describes a secure communication channel whereby only the intended participants can read, see, or listen to the contents of the conversation, not anybody else. WhatsApp and iMessage, for example, are end-to-end-encrypted, which means that the companies that operate those apps or any other infrastructure can’t see the contents of your messages. 

Whereas encrypted group chats are usually long-lived, highly asynchronous, and low bandwidth sessions, video and audio calls are short-lived, highly synchronous, and require high bandwidth. This difference comes with plenty of interesting tradeoffs, which influenced the design of our system.

The ephemeral nature of calls, compared to the persistent nature of group text messages, also influenced the way we designed E2EE for Orange Meets. In chat messages, users must be able to decrypt messages sent to them while they were offline (e.g. while taking a flight). This is not a problem for real-time communication.

The bandwidth limitations around audio/video communication and the use of an SFU prevented us from using some of the E2EE technologies already available for text messages. Apple’s iMessage, for example, encrypts a message N-1 times for an N-user group chat. We can’t encrypt the video for each recipient, as that could saturate the upload capacity of Internet connections as well as slow down the client. Media has to be encrypted once and decrypted by each client while preserving secrecy around only the current participants of the call.

Messaging Layer Security (MLS)

Around the same time we were working on Orange Meets, we saw a lot of excitement around new apps being built with Messaging Layer Security (MLS), an IETF-standardized protocol that describes how you can do a group key exchange in order to establish end-to-end-encryption for group communication. 

Previously, the only way to achieve these properties was to essentially run your own fork of the Signal protocol, which itself is more of a living protocol than a solidified standard. Since MLS is standardized, we’ve now seen multiple high-quality implementations appear, and we’re able to use them to achieve Signal-level security with far less effort.

Implementing MLS here wasn’t easy: it required a moderate amount of client modification, and the development and verification of an encrypted room-joining protocol. Nonetheless, we’re excited to be pioneering a standards-based approach that any customer can run on our network, and to share more details about how our implementation works. 

We did not have to make any changes to the SFU to get end-to-end encryption working. Cloudflare’s SFU doesn’t care about the contents of the data forwarded on our data plane and whether it’s encrypted or not.

Orange Meets: the basics 

Orange Meets is a video calling application built on Cloudflare Workers that uses the Cloudflare Realtime SFU service as the data plane. The roles played by the three main entities in the application are as follows:

  • The user is a participant in the video call. They connect to the Orange Meets server and SFU, described below.

  • The Orange Meets Server is a simple service run on a Cloudflare Worker that runs the small-scale coordination logic of Orange Meets, which is concerned with which user is in which video call (called a room) and what the state of the room is. Whenever something in the room changes, like a participant joining or leaving, or someone muting themselves, the app server broadcasts the change to all room participants. You can use any backend server for this component; we just chose Cloudflare Workers for its convenience.

  • Cloudflare Realtime Selective Forwarding Unit (SFU) is a service that Cloudflare runs, which takes everyone’s audio and video and broadcasts it to everyone else. These connections are potentially lossy, using UDP for transmission. This is done because a dropped video frame from five seconds ago is not very important in the context of a video call, and so should not be re-sent, as it would be in a TCP connection.


The network topology of Orange Meets

Next, we have to define what we mean by end-to-end encryption in the context of video chat.

End-to-end encrypting Orange Meets 

The most immediate way to end-to-end encrypt Orange Meets is to simply have the initial users agree on a symmetric encryption/decryption key at the beginning of a call, and just encrypt every video frame using that key. This is sufficient to hide calls from Cloudflare’s SFU. Some source-encrypted video conferencing implementations, such as Jitsi Meet, work this way.

The issue, however, is that kicking a malicious user from a call does not invalidate their key, since the keys are negotiated just once. A joining user learns the key that was used to encrypt video from before they joined. These failures are more formally referred to as failures of post-compromise security and perfect forward secrecy. When a protocol successfully implements these in a group setting, we call the protocol a continuous group key agreement protocol.

Fortunately for us, MLS is a continuous group key agreement protocol that works out of the box, and the nice folks at Phoenix R&D and Cryspen have a well-documented open-source Rust implementation of most of the MLS protocol. 

All we needed to do was write an MLS client and compile it to WASM, so we could decrypt video streams in-browser. We’re using WASM since that’s one way of running Rust code in the browser. If you’re running a video conferencing application on a desktop or mobile native environment, there are other MLS implementations in your preferred programming language.

Our setup for encryption is as follows:

Make a web worker for encryption. We wrote a web worker in Rust that accepts a WebRTC video stream, broken into individual frames, and encrypts each frame. This code is quite simple, as it’s just an MLS encryption:

group.create_message(
	&self.mls_provider,
	self.my_signing_keys.as_ref()?,
	frame,
)

Postprocess outgoing audio/video. We take our normal stream and, using some newer features of the WebRTC API, add a transform step to it. This transform step simply sends the stream to the worker:

const senderStreams = sender.createEncodedStreams()
const { readable, writable } = senderStreams
this.worker.postMessage(
	{
    	    type: 'encryptStream',
    	    in: readable,
    	    out: writable,
	},
	[readable, writable]
)

And the same for decryption:

const receiverStreams = receiver.createEncodedStreams()
const { readable, writable } = receiverStreams
this.worker.postMessage(
	{
    	    type: 'decryptStream',
    	    in: readable,
    	    out: writable,
	},
	[readable, writable]
)

Once we do this for both audio and video streams, we’re done.

Handling different codec behaviors

The streams are now encrypted before sending and decrypted before rendering, but the browser doesn’t know this. To the browser, the stream is still an ordinary video or audio stream. This can cause errors to occur in the browser’s depacketizing logic, which expects to see certain bytes in certain places, depending on the codec. This results in some extremely cypherpunk artifacts every dozen seconds or so:


Fortunately, this exact issue was discovered by engineers at Discord, who handily documented it in their DAVE E2EE video calling protocol. For the VP8 codec, which we use by default, the solution is simple: split off the first 1 or 10 bytes of each frame (10 for keyframes, 1 otherwise), and send them unencrypted:

fn split_vp8_header(frame: &[u8]) -> Option<(&[u8], &[u8])> {
    // If this is a keyframe, keep 10 bytes unencrypted. Otherwise, 1 is enough
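    // The VP8 frame-type flag is the lowest bit of the first byte (0 means keyframe).
    // Using first()? also returns None on an empty frame instead of panicking.
    let is_keyframe = (frame.first()? & 0x01) == 0;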
    let unencrypted_prefix_size = if is_keyframe { 10 } else { 1 };
    frame.split_at_checked(unencrypted_prefix_size)
}

These bytes are not particularly important to encrypt, since they only contain versioning info, whether or not this frame is a keyframe, some constants, and the width and height of the video.
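As a rough sketch of how the split fits into the per-frame path, the code below keeps the cleartext prefix, transforms only the rest, and reverses the process on the receiving side. Everything here is illustrative: `protect_frame`, `unprotect_frame`, and `placeholder_cipher` are hypothetical names, the XOR stand-in is not encryption (the real worker would call MLS at that point), and the keyframe flag is passed in by the caller rather than parsed.

```rust
/// Number of leading bytes left unencrypted: 10 for keyframes, 1 otherwise.
fn clear_prefix_len(is_keyframe: bool) -> usize {
    if is_keyframe { 10 } else { 1 }
}

/// Stand-in for the real MLS encrypt/decrypt call. This is NOT a cipher; it
/// only flips bits so the example runs without a crypto dependency. Because
/// XOR is its own inverse, the same function serves both directions.
fn placeholder_cipher(bytes: &[u8]) -> Vec<u8> {
    bytes.iter().map(|b| b ^ 0xAA).collect()
}

/// Sender side: copy the prefix as-is, transform the remainder.
/// Returns None if the frame is shorter than the required prefix.
fn protect_frame(frame: &[u8], is_keyframe: bool) -> Option<Vec<u8>> {
    let prefix_len = clear_prefix_len(is_keyframe);
    if frame.len() < prefix_len {
        return None;
    }
    let (header, body) = frame.split_at(prefix_len);
    let mut out = header.to_vec();
    out.extend(placeholder_cipher(body));
    Some(out)
}

/// Receiver side: the prefix arrives in the clear, so the same split applies.
fn unprotect_frame(frame: &[u8], is_keyframe: bool) -> Option<Vec<u8>> {
    let prefix_len = clear_prefix_len(is_keyframe);
    if frame.len() < prefix_len {
        return None;
    }
    let (header, body) = frame.split_at(prefix_len);
    let mut out = header.to_vec();
    out.extend(placeholder_cipher(body));
    Some(out)
}
```

Since the prefix is never transformed, the browser's depacketizer still finds the bytes it expects, while everything after the prefix is opaque to the SFU.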

And that’s truly it for the stream encryption part! The only thing remaining is to figure out how we will let new users join a room.

“Join my Orange Meet” 

Usually, the only way to join the call is to click a link. And since the protocol is encrypted, a joining user needs to have some cryptographic information in order to decrypt any messages. How do they receive this information, though? There are a few options.

DAVE does it by using an MLS feature called external proposals. In short, the Discord server registers itself as an external sender, i.e., a party that can send administrative messages to the group, but cannot receive any. When a user wants to join a room, they provide their own cryptographic material, called a key package, and the server constructs and sends an MLS External Add message to the group to let them know about the new user joining. Eventually, a group member will commit this External Add, sending the joiner a Welcome message containing all information necessary to send and receive video.


A user joining a group via MLS external proposals. Recall the Orange Meets app server functions as a broadcast channel for the whole group. We consider a group of 3 members. We write member #2 as the one committing to the proposal, but this can be done by any member. Member #2 also sends a Commit message to the other members, but we omit this for space. 

This is a perfectly viable way to implement room joining, but implementing it would require us to extend the Orange Meets server logic to have some concept of MLS. Since part of our goal is to keep things as simple as possible, we would like to do all our cryptography client-side.

So instead we do what we call the designated committer algorithm. When a user joins a group, they send their cryptographic material to one group member, the designated committer, who then constructs and sends the Add message to the rest of the group. Similarly, when notified of a user’s exit, the designated committer constructs and sends a Remove message to the rest of the group. With this setup, the server’s job remains nothing more than broadcasting messages! It’s quite simple too: the full implementation of the designated committer state machine comes out to 300 lines of Rust, including the MLS boilerplate, and it’s about as efficient as the external proposal approach.


A user joining a group via the designated committer algorithm.

One interesting aspect of the designated committer algorithm is that it would not work in a text group chat setting, since any given user (in particular, the designated committer) may be offline for an arbitrary period of time. Our method works because it leverages the fact that video calls are an inherently synchronous medium.
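To make the division of labor concrete, here is a minimal sketch of how a client might react to join and leave notifications. The type and method names (`RoomEvent`, `GroupMessage`, `Client`, `on_room_event`, `take_over`) are hypothetical and not taken from the Orange Meets code. Real MLS messages are replaced by plain enums, and the backlog handling is deliberately simplified: a real client would also clear entries once it sees them committed.

```rust
/// Room events the app server broadcasts to every client (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
enum RoomEvent {
    Joined { key_package: Vec<u8> },
    Left { member_index: u32 },
}

/// Stand-ins for the MLS messages the designated committer sends in response.
#[derive(Debug, Clone, PartialEq)]
enum GroupMessage {
    Add { key_package: Vec<u8> },
    Remove { member_index: u32 },
}

struct Client {
    is_designated_committer: bool,
    /// Events this client has seen but not acted on, kept in case it has to
    /// take over as designated committer later.
    pending: Vec<RoomEvent>,
}

impl Client {
    /// Only the designated committer turns an event into a group message;
    /// everyone else just remembers it.
    fn on_room_event(&mut self, event: RoomEvent) -> Option<GroupMessage> {
        if !self.is_designated_committer {
            self.pending.push(event);
            return None;
        }
        Some(match event {
            RoomEvent::Joined { key_package } => GroupMessage::Add { key_package },
            RoomEvent::Left { member_index } => GroupMessage::Remove { member_index },
        })
    }

    /// Called when this client becomes the designated committer: it replays
    /// its backlog so outstanding joins and exits are still handled.
    fn take_over(&mut self) -> Vec<GroupMessage> {
        self.is_designated_committer = true;
        let backlog = std::mem::take(&mut self.pending);
        backlog
            .into_iter()
            .filter_map(|event| self.on_room_event(event))
            .collect()
    }
}
```

The server never appears in this logic beyond delivering `RoomEvent`s, which is the point of the design: all group management decisions are made by clients.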

Verifying the Designated Committer Algorithm with TLA+

The designated committer algorithm is a pretty neat simplification, but it comes with some non-trivial edge cases that we need to make sure we handle, such as:

  • How do we make sure there is only one designated committer at a time? The designated committer is the alive user with the smallest index in the MLS group state, which all users share.

  • What happens if the designated committer exits? Then the next user will take its place. Every user keeps track of pending Adds and Removes, so it can continue where the previous designated committer left off.

  • If a user has not caught up to all messages, could they think they’re the designated committer? No, they have to believe first that all prior eligible designated committers are disconnected.
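The election rule in the list above can be sketched in a few lines. This is an illustration only: `Member`, `designated_committer`, and `believes_is_committer` are hypothetical names, and the `alive` flag stands in for whatever liveness information a client has gathered from the messages it has processed so far.

```rust
/// One entry in a client's local view of the MLS group (hypothetical type).
#[derive(Debug, Clone)]
struct Member {
    /// Index of the member in the shared MLS group state.
    index: u32,
    /// Whether this client currently believes the member is connected.
    alive: bool,
}

/// The designated committer is the alive member with the smallest index.
/// Returns None if no member is alive.
fn designated_committer(members: &[Member]) -> Option<u32> {
    members.iter().filter(|m| m.alive).map(|m| m.index).min()
}

/// A client considers itself the designated committer only if, in its own
/// view, every member with a smaller index is already disconnected.
fn believes_is_committer(my_index: u32, members: &[Member]) -> bool {
    designated_committer(members) == Some(my_index)
}
```

A client that has not yet processed the exit of a lower-indexed member still sees that member as alive, so `believes_is_committer` returns false for it, which matches the third bullet.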

To make extra sure that this algorithm was correct, we formally modeled it and put it through the TLA+ model checker. To our surprise, it caught some low-level bugs! In particular, it found that, if the designated committer dies while adding a user, the protocol does not recover. We fixed these by breaking up MLS operations and enforcing a strict ordering on messages locally (e.g., a Welcome is always sent before its corresponding Add).

You can find an explainer, lessons learned, and the full PlusCal program (a high-level language that compiles to TLA+) here. The caveat, as with any use of a bounded model checker, is that the checking is, well, bounded. We verified that no invalid protocol states are possible in a group of up to five users. We think this is good evidence that the protocol is correct for an arbitrary number of users. Because there are only two distinct roles in the protocol (designated committer and other group member), any weird behavior ought to be reproducible with two or three users, max.

Preventing Man-in-the-Middle attacks

One important concern to address in any end-to-end encryption setup is how to prevent the service provider from replacing users’ key packages with their own. If the Orange Meets app server did this, and colluded with a malicious SFU to decrypt and re-encrypt video frames on the fly, then the SFU could see all the video sent through the network, and nobody would know.

To resolve this, like DAVE, we include a safety number in the corner of the screen for all calls. This number uniquely represents the cryptographic state of the group. If you check out-of-band (e.g., in a Signal group chat) that everyone agrees on the safety number, then you can be sure nobody’s key material has been secretly replaced.
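For illustration, here is one way a digest of the group state could be turned into a number that is easy to compare by eye or read over another channel. This is a sketch under assumptions: `format_safety_number` is a hypothetical helper, the digest is assumed to come from a cryptographic hash computed elsewhere, and the grouping into six 5-digit blocks is an arbitrary choice, not the format Orange Meets or DAVE actually uses.

```rust
/// Render the first 12 bytes of a digest as up to six groups of five decimal
/// digits. Each group encodes one big-endian u16 (0 to 65535), zero-padded.
/// The digest itself must be produced by a cryptographic hash over the group
/// state; computing it is out of scope for this sketch.
fn format_safety_number(digest: &[u8]) -> String {
    digest
        .chunks_exact(2)
        .take(6)
        .map(|pair| {
            let value = u16::from_be_bytes([pair[0], pair[1]]);
            format!("{:05}", value)
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Two clients that hold the same group state derive the same digest and therefore display the same string. If a key package was swapped, the digests differ and the mismatch shows up when participants compare numbers.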

In fact, you could also read the safety number aloud in the video call itself, but doing this is not provably secure. Reading a safety number aloud is an in-band verification mechanism, i.e., one where a party authenticates a channel within that channel. If a malicious app server colluding with a malicious SFU were able to construct believable video and audio of the user reading the safety number aloud, it could bypass this safety mechanism. So if your threat model includes adversaries that are able to break into a Worker and Cloudflare’s SFU, and simultaneously generate real-time deep-fakes, you should use out-of-band verification 😄.

Future work

There are some areas we could improve on:

  • There is another attack vector for a malicious app server: it is possible to simply serve users malicious JavaScript. This problem, more generally called the JavaScript Cryptography Problem, affects any in-browser application where the client wants to hide data from the server. Fortunately, we are working on a standard to address this, called Web Application Manifest Consistency, Integrity, and Transparency. In short, like our Code Verify solution for WhatsApp, this would allow every website to commit to the JavaScript it serves, and have a third party create an auditable log of the code. With transparency, malicious JavaScript can still be distributed, but at least now there is a log that records the code.

  • We can make out-of-band authentication easier by placing trust in an identity provider. Using OpenPubkey, it would be possible for a user to get the identity provider to sign their cryptographic material, and then present that. Then all the users would check the signature before using the material. Transparency would also help here to ensure no signatures were made in secret.

Conclusion

We built end-to-end encryption into the Orange Meets video chat app without a lot of engineering time, and by modifying just the client code. To do so, we built a web worker in Rust (compiled to WASM) that sets up an MLS group and handles stream encryption and decryption. We also designed a new joining protocol for groups, called the designated committer algorithm, and formally modeled it in TLA+. We left comments in the code marking the optimizations that remain to be done, so please send us a PR if you’re so inclined!

Try using Orange Meets with E2EE enabled at e2ee.orange.cloudflare.dev, or deploy your own instance using the open source repository on GitHub.