These days, the most important meeting attendee isn’t a person: It’s the AI notetaker.
This system assigns action items and determines the importance of what is said. If it becomes necessary to revisit the facts of the meeting, its summary is treated as impartial evidence.
But clever meeting attendees can manipulate this system’s record by speaking more to what the underlying AI weights for summarization and importance than to their colleagues. As a result, you can expect some meeting attendees to use language more likely to be captured in summaries, timing their interventions strategically, repeating key points, and employing formulaic phrasing that AI models are more likely to pick up on. Welcome to the world of AI summarization optimization (AISO).
Optimizing for algorithmic manipulation
AI summarization optimization has a well-known precursor: SEO.
Search-engine optimization is as old as the World Wide Web. The idea is straightforward: Search engines scour the internet digesting every possible page, with the goal of serving the best results to every possible query. The objective for a content creator, company, or cause is to optimize for the algorithm search engines have developed to determine their webpage rankings for those queries. That requires writing for two audiences at once: human readers and the search-engine crawlers indexing content. Techniques to do this effectively are passed around like trade secrets, and a $75 billion industry offers SEO services to organizations of all sizes.
More recently, researchers have documented techniques for influencing AI responses, including large-language model optimization (LLMO) and generative engine optimization (GEO). Tricks include content optimization—adding citations and statistics—and adversarial approaches: using specially crafted text sequences. These techniques often target sources that LLMs heavily reference, such as Reddit, which is claimed to be cited in 40% of AI-generated responses. The effectiveness and real-world applicability of these methods remains limited and largely experimental, although there is substantial evidence that countries such as Russia are activelypursuingthis.
AI summarization optimization follows the same logic on a smaller scale. Human participants in a meeting may want a certain fact highlighted in the record, or their perspective to be reflected as the authoritative one. Rather than persuading colleagues directly, they adapt their speech for the notetaker that will later define the “official” summary. For example:
“The main factor in last quarter’s delay was supply chain disruption.”
“The key outcome was overwhelmingly positive client feedback.”
“Our takeaway here is in alignment moving forward.”
“What matters here is the efficiency gains, not the temporary cost overrun.”
The techniques are subtle. They employ high-signal phrases such as “key takeaway” and “action item,” keep statements short and clear, and repeat them when possible. They also use contrastive framing (“this, not that”), and speak early in the meeting or at transition points.
Once spoken words are transcribed, they enter the model’s input. Cue phrases—and even transcription errors—can steer what makes it into the summary. In many tools, the output format itself is also a signal: Summarizers often offer sections such as “Key Takeaways” or “Action Items,” so language that mirrors those headings is more likely to be included. In effect, well-chosen phrases function as implicit markers that guide the AI toward inclusion.
Research confirms this. Early AI summarization research showedthat models trained to reconstruct summary-style sentences systematically overweigh such content. Models over-rely on early-positioncontent in news. And models often overweigh statements at the start or end of a transcript, underweighting the middle. Recent work further confirms vulnerability to phrasing-based manipulation: models cannot reliably distinguish embedded instructions from ordinary content, especially when phrasing mimics salient cues.
How to combat AISO
If AISO becomes common, three forms of defense will emerge. First, meeting participants will exert social pressure on one another. When researchers secretly deployed AI bots in Reddit’s r/changemyview community, users and moderators responded with strong backlash calling it “psychological manipulation.” Anyone using obvious AI-gaming phrases may face similar disapproval.
Second, organizations will start governing meeting behavior using AI: risk assessments and access restrictions before the meetings even start, detection of AISO techniques in meetings, and validation and auditing after the meetings.
Third, AI summarizers will have their own technical countermeasures. For example, the AI security company CloudSEK recommends content sanitization to strip suspicious inputs, prompt filtering to detect meta-instructions and excessive repetition, context window balancing to weight repeated content less heavily, and user warnings showing content provenance.
AI summarization optimization is a small, subtle shift, but it illustrates how the adoption of AI is reshaping human behavior in unexpected ways. The potential implications are quietly profound.
Meetings—humanity’s most fundamental collaborative ritual—are being silently reengineered by those who understand the algorithm’s preferences. The articulate are gaining an invisible advantage over the wise. Adversarial thinking is becoming routine, embedded in the most ordinary workplace rituals, and, as AI becomes embedded in organizational life, strategic interactions with AI notetakers and summarizers may soon be a necessary executive skill for navigating corporate culture.
AI summarization optimization illustrates how quickly humans adapt communication strategies to new technologies. As AI becomes more embedded in workplace communication, recognizing these emerging patterns may prove increasingly important.
This essay was written with Gadi Evron, and originally appeared in CSO.
Below, co-authors Bruce Schneier and Nathan E. Sanders share five key insights from their new book, Rewiring Democracy: How AI Will Transform Our Politics, Government, and Citizenship.
What’s the big idea?
AI can be used both for and against the public interest within democracies. It is already being used in the governing of nations around the world, and there is no escaping its continued use in the future by leaders, policy makers, and legal enforcers. How we wire AI into democracy today will determine if it becomes a tool of oppression or empowerment.
1. AI’s global democratic impact is already profound.
It’s been just a few years since ChatGPT stormed into view and AI’s influence has already permeated every democratic process in governments around the world:
In 2022, an artist collective in Denmark founded the world’s first political party committed to an AI-generated policy platform.
Also in 2022, South Korean politicians running for the presidency were the first to use AI avatars to communicate with voters en masse.
In 2023, a Brazilian municipal legislator passed the first enacted law written by AI.
In 2024, a U.S. federal court judge started using AI to interpret the plain meaning of words in U.S. law.
Also in 2024, the Biden administration disclosed more than two thousand discrete use cases for AI across the agencies of the U.S. federal government.
The examples illustrate the diverse uses of AI across citizenship, politics, legislation, the judiciary, and executive administration.
Not all of these uses will create lasting change. Some of these will be one-offs. Some are inherently small in scale. Some were publicity stunts. But each use case speaks to a shifting balance of supply and demand that AI will increasingly mediate.
Legislators need assistance drafting bills and have limited staff resources, especially at the local and state level. Historically, they have looked to lobbyists and interest groups for help. Increasingly, it’s just as easy for them to use an AI tool.
2. The first places AI will be used are where there is the least public oversight.
Many of the use cases for AI in governance and politics have vocal objectors. Some make us uncomfortable, especially in the hands of authoritarians or ideological extremists.
In some cases, politics will be a regulating force to prevent dangerous uses of AI. Massachusetts has banned the use of AI face recognition in law enforcement because of real concerns voiced by the public about their tendency to encode systems of racial bias.
Some of the uses we think might be most impactful are unlikely to be adopted fast because of legitimate concern about their potential to make mistakes, introduce bias, or subvert human agency. AIs could be assistive tools for citizens, acting as their voting proxies to help us weigh in on larger numbers of more complex ballot initiatives, but we know that many will object to anything that verges on AIs being given a vote.
But AI will continue to be rapidly adopted in some aspects of democracy, regardless of how the public feels. People within democracies, even those in government jobs, often have great independence. They don’t have to ask anyone if it’s ok to use AI, and they will use it if they see that it benefits them. The Brazilian city councilor who used AI to draft a bill did not ask for anyone’s permission. The U.S. federal judge who used AI to help him interpret law did not have to check with anyone first. And the Trump administration seems to be using AI for everything from drafting tariff policies to writing public health reports—with some obvious drawbacks.
It’s likely that even the thousands of disclosed AI uses in government are only the tip of the iceberg. These are just the applications that governments have seen fit to share; the ones they think are the best vetted, most likely to persist, or maybe the least controversial to disclose.
3. Elites and authoritarians will use AI to concentrate power.
Many Westerners point to China as a cautionary tale of how AI could empower autocracy, but the reality is that AI provides structural advantages to entrenched power in democratic governments, too. The nature of automation is that it gives those at the top of a power structure more control over the actions taken at its lower levels.
It’s famously hard for newly elected leaders to exert their will over the many layers of human bureaucracies. The civil service is large, unwieldy, and messy. But it’s trivial for an executive to change the parameters and instructions of an AI model being used to automate the systems of government.
The dynamic of AI effectuating concentration of power extends beyond government agencies. Over the past five years, Ohio has undertaken a project to do a wholesale revision of its administrative code using AI. The leaders of that project framed it in terms of efficiency and good governance: deleting millions of words of outdated, unnecessary, or redundant language. The same technology could be applied to advance more ideological ends, like purging all statutory language that places burdens on business, neglects to hold businesses accountable, protects some class of people, or fails to protect others.
Whether you like or despise automating the enactment of those policies will depend on whether you stand with or are opposed to those in power, and that’s the point. AI gives any faction with power the potential to exert more control over the levers of government.
4. Organizers will find ways to use AI to distribute power instead.
We don’t have to resign ourselves to a world where AI makes the rich richer and the elite more powerful. This is a technology that can also be wielded by outsiders to help level the playing field.
In politics, AI gives upstart and local candidates access to skills and the ability to do work on a scale that used to only be available to well-funded campaigns. In the 2024 cycle, Congressional candidates running against incumbents like Glenn Cook in Georgia and Shamaine Daniels in Pennsylvania used AI to help themselves be everywhere all at once. They used AI to make personalized robocalls to voters, write frequent blog posts, and even generate podcasts in the candidate’s voice. In Japan, a candidate for Governor of Tokyo used an AI avatar to respond to more than eight thousand online questions from voters.
Outside of public politics, labor organizers are also leveraging AI to build power. The Worker’s Lab is a U.S. nonprofit developing assistive technologies for labor unions, like AI-enabled apps that help service workers report workplace safety violations. The 2023 Writers’ Guild of America strike serves as a blueprint for organizers. They won concessions from Hollywood studios that protect their members against being displaced by AI while also winning them guarantees for being able to use AI as assistive tools to their own benefit.
5. The ultimate democratic impact of AI depends on us.
If you are excited about AI and see the potential for it to make life, and maybe even democracy, better around the world, recognize that there are a lot of people who don’t feel the same way.
If you are disturbed about the ways you see AI being used and worried about the future that leads to, recognize that the trajectory we’re on now is not the only one available.
The technology of AI itself does not pose an inherent threat to citizens, workers, and the public interest. Like other democratic technologies—voting processes, legislative districts, judicial review—its impacts will depend on how it’s developed, who controls it, and how it’s used.
Constituents of democracies should do four things:
Reform the technology ecosystem to be more trustworthy, so that AI is developed with more transparency, more guardrails around exploitative use of data, and public oversight.
Resist inappropriate uses of AI in government and politics, like facial recognition technologies that automate surveillance and encode inequity.
Responsibly use AI in government where it can help improve outcomes, like making government more accessible to people through translation and speeding up administrative decision processes.
Renovate the systems of government vulnerable to the disruptive potential of AI’s superhuman capabilities, like political advertising rules that never anticipated deepfakes.
These four Rs are how we can rewire our democracy in a way that applies AI to truly benefit the public interest.
This essay was written with Nathan E. Sanders, and originally appeared in The Next Big Idea Club.
EDITED TO ADD (11/6): This essay was republished by Fast Company.
Current artificial intelligence (AI) methods, especially machine learning (ML), rely heavily on data. To complement our work on AI literacy, we have been investigating what data science teaching resources and education research are currently available. Our goal is to work out what data science concepts should be taught in a data science curriculum for schools.
Read on to find out what resources and materials we have reviewed, and what concept themes we have identified.
What is data science? Why is teaching it important?
Data science is an interdisciplinary science of learning from large datasets, aided by modern computational tools and methods (Ow‑Yeong et al., 2023). We see data science skills as fundamental for using, creating, and thinking critically about:
Insights from data, generally
Data-driven computational tools and methods (such as machine learning) and their outputs and predictions, specifically
To navigate a world where decision making in many areas is influenced by data-driven insights and predictions, young people need to be taught about data science. Data science skills empower young people to become critical thinkers, discerning consumers, adaptable professionals, and informed citizens.
In some countries, such as India and Israel, data science education is an established school subject. It is taught as part of the curriculum in at least one of the primary, secondary, or post-16 age phases. Meanwhile in other countries, for example Canada, Germany, and Poland, data science is a very new school subject, or there are still only recommendations to develop it into a school subject.
While we are currently considering what a comprehensive data science curriculum should include, we already offer several resources to support you with your teaching about data science and data-driven technologies. You can find a list of these resources at the end of this blog. Now, however, I’ll give you an overview of our recent work to identify concepts for a data science curriculum that fits with our approach to AI literacy.
Data science education: What should we teach?
To answer the question ‘What should we teach about data science to learners aged 5 to 19?’, we undertook a grey literature review of data science teaching materials. A grey literature review is structured like an academic literature review and conducted with the same rigour. The difference is that a grey literature review also considers publications that have not been peer-reviewed, including reports, white papers, curriculum materials, and similar resources.
To orient our work, we combined four frameworks for data science and AI/ML education:
With these combined frameworks as our map, we reviewed 79 data science learning resources. The resources varied:
In quality in terms of clarity and teaching approach
In their focus, e.g. on maths, coding, or a specific field such as biology
In their perspective on data science, with some prioritising theory and others real-world applications
From among the 79 resources, we chose 9 that included clear learning outcomes, and that together covered a wide field of concepts. We examined these 9 in detail to extract 181 explicit and implicit data science concepts. Next, we grouped the concepts into themes, and finally we refined these themes by comparing them against the four frameworks listed above.
The themes we have identified for a data science curriculum are:
Fundamentals of data literacy: Key terms and definitions
Understanding bias in data
Ethical responsibility in data use
Data creation, curation, and transformation
Analysis and modelling: Maths and statistics fundamentals
ML principles
Deploying and maintaining ML applications
Software tools and programming
Data visualisation
Presenting findings effectively
This set of themes both fits with the frameworks by Olari and Romeike and Data Science 4 Everyone, and expands them by covering ML principles and programming approaches and calling out data bias and ethics.
What’s next for this work?
Through our grey literature review on data science education, we’ve:
Pinpointed a large set of candidate concepts that could be taught within a data science curriculum
Created a set of clear themes to structure our work going forward
Our next step is to shape these candidate concepts into a progression framework to describe their relationships and establish which concepts could be taught at each age or phase of schooling.
The literature review also gave us an overview of the pedagogical approaches and tools used for teaching data science concepts. These findings will become useful once we start designing learning activities.
You’ll hear more about how this work is going here on our blog and on our social channels. In the meantime, comment below to let us know what you think about the themes, or to tell us what you’d like to see in a data science curriculum for the learners you work with.
The report lists the data-related units within The Computing Curriculum materials, which we no longer update but continue to offer as free downloads. Updated classroom materials are available as part of the Computing materials we created for Oak National Academy in the UK for ages 5–11 and ages 12–19.
The Ada Computer Science platform offers learning materials on data and information, and on AI and ML, for ages 14–19.
You might also be interested in exploring the Experience AI programme, which offers everything teachers need to help students develop a foundational understanding of data-driven AI technologies, their social and ethical implications, and the role that AI can play in their lives.
Teacher training and development resources
Our free online course ‘Teach teens computing: Machine learning and AI‘ helps teachers understand and explain the types of problems that ML can help to solve, discuss how AI is changing the world, and think about the ethics of collecting data to train a ML model.
Teaching young people to understand data-driven AI technologies means teaching them thinking skills that are different to those needed to understand rule-based computer systems. You can read about these Computational Thinking 2.0 skills in our Quick Read PDF.
Interesting article about the arms race between AI systems that invent/design new biological pathogens, and AI systems that detect them before they’re created:
The team started with a basic test: use AI tools to design variants of the toxin ricin, then test them against the software that is used to screen DNA orders. The results of the test suggested there was a risk of dangerous protein variants slipping past existing screening software, so the situation was treated like the equivalent of a zero-day vulnerability.
[…]
Details of that original test are being made available today as part of a much larger analysis that extends the approach to a large range of toxic proteins. Starting with 72 toxins, the researchers used three open source AI packages to generate a total of about 75,000 potential protein variants.
And this is where things get a little complicated. Many of the AI-designed protein variants are going to end up being non-functional, either subtly or catastrophically failing to fold up into the correct configuration to create an active toxin.
[…]
In any case, DNA sequences encoding all 75,000 designs were fed into the software that screens DNA orders for potential threats. One thing that was very clear is that there were huge variations in the ability of the four screening programs to flag these variant designs as threatening. Two of them seemed to do a pretty good job, one was mixed, and another let most of them through. Three of the software packages were updated in response to this performance, which significantly improved their ability to pick out variants.
There was also a clear trend in all four screening packages: The closer the variant was to the original structurally, the more likely the package (both before and after the patches) was to be able to flag it as a threat. In all cases, there was also a cluster of variant designs that were unlikely to fold into a similar structure, and these generally weren’t flagged as threats.
The research is all preliminary, and there are a lot of ways in which the experiment diverges from reality. But I am not optimistic about this particular arms race. I think that the ability of AI systems to create something deadly will advance faster than the ability of AI systems to detect its components.
The Internet is constantly changing in ways that are difficult to see. How do we measure its health, spot new threats, and track the adoption of new technologies? When we launched Cloudflare Radar in 2020, our goal was to illuminate the Internet’s patterns, helping anyone understand what was happening from a security, performance, and usage perspective, based on aggregated data from Cloudflare services. From the start, Internet measurement, transparency, and resilience has been at the core of our mission.
The launch blog post noted, “There are three key components that we’re launching today: Radar Internet Insights, Radar Domain Insights and Radar IP Insights.” These components have remained at the core of Radar, and they have been continuously expanded and complemented by other data sets and capabilities to support that mission. By shining a brighter light on Internet security, routing, traffic disruptions, protocol adoption, DNS, and now AI, Cloudflare Radar has become an increasingly comprehensive source of information and insights. And despite our expanding scope, we’ve focused on maintaining Radar’s “easy access” by evolving our information architecture, making our search capabilities more powerful, and building everything on top of a powerful, publicly-accessible API.
Now more than ever, Internet observability matters. New protocols and use cases compete with new security threats. Connectivity is threatened not only by errant construction equipment, but also by governments practicing targeted content blocking. Cloudflare Radar is uniquely positioned to provide actionable visibility into these trends, threats, and events with local, network, and global level insights, spanning multiple data sets. Below, we explore some highlights of Radar’s evolution over the five years since its launch, looking at how Cloudflare Radar is building one of the industry’s most comprehensive views of what is happening on the Internet.
Making Internet security more transparent
The Cloudflare Research team takes a practical approach to research, tackling projects that have the potential to make a big impact. A number of these projects have been in the security space, and for three of them, we’ve collaborated to bring associated data sets to Radar, highlighting the impact of these projects.
The 2025 launch of the Certificate Transparency (CT) section on Radar was the culmination of several months of collaborative work to expand visibility into key metrics for the Certificate Transparency ecosystem, enabling us to deprecate the original Merkle Town CT dashboard, which was launched in 2018. Digital certificates are the foundation of trust on the modern Internet, and Certificate Authorities (CAs) serve as trusted gatekeepers, issuing those certificates, with CT logs providing a public, auditable record of every certificate issued, making it possible to detect fraudulent or mis-issued certificates. The information available in the new CT section allows users to explore information about these certificates and CAs, as well as about the CT logs that capture information about every issued certificate.
In 2024, members of Cloudflare’s Research team collaborated with outside researchers to publish a paper titled “Global, Passive Detection of Connection Tampering”. Among the findings presented in the paper, it noted that globally, about 20% of all connections to Cloudflare close unexpectedly before any useful data exchange occurs. This unexpected closure is consistent with connection tampering by a third party, which may occur, for instance, when repressive governments seek to block access to websites or applications. Working with the Research team, we added visibility into TCP resets and timeouts to the Network Layer Security page on Radar. This graph, such as the example below for Turkmenistan, provides a perspective on potential connection tampering activity globally, and at a country level. Changes and trends visible in this graph can be used to corroborate reports of content blocking and other local restrictions on Internet connectivity.
The research team has been working on post-quantum encryption since 2017, racing improvements in quantum computing to help ensure that today’s encrypted data and communications are resistant to being decrypted in the future. They have led the drive to incorporate post-quantum encryption across Cloudflare’s infrastructure and services, and in 2023 we announced that it would be included in our delivery services, available to everyone and free of charge, forever. However, to take full advantage, support is needed on the client side as well, so to track that, we worked together to add a graph to Radar’s Adoption & Usage page that tracks the post-quantum encrypted share of HTTPS request traffic. Starting 2024 at under 3%, it has grown to just over 47%, thanks to major browsers and code libraries activating post-quantum support by default.
Measuring AI bot & crawler activity
The rapid proliferation and growth of AI platforms since the launch of OpenAI’s ChatGPT in November 2022 has upended multiple industries. This is especially true for content creators. Over the last several decades, they generally allowed their sites to be crawled in exchange for the traffic that the search engines would send back to them — traffic that could be monetized in various ways. However, two developments have changed this dynamic. First, AI platforms began aggressively crawling these sites to vacuum up content to use for training their models (with no compensation to content creators). Second, search engines have evolved into answer engines, drastically reducing the amount of traffic they send back to sites. This has led content owners to demand solutions.
Among these solutions is providing customers with increased visibility into how frequently AI crawlers are scraping their content, and Radar has built on that to provide aggregated perspectives on this activity. Radar’s AI Insights page provides graphs based on crawling traffic, including traffic trends by bot and traffic trends by crawl purpose, both of which can be broken out by industry set as well. Customers can compare the traffic trends we show on the dashboard with trends across their industry.
One key insight is the crawl-to-refer ratio: a measure of how many HTML pages a crawler consumes in comparison to the number of page visits that they refer back to the crawled site. A view into these ratios by platform, and how they change over time, gives content creators insight into just how significant the reciprocal traffic imbalances are, and the impact of the ongoing transition of search engines into answer engines.
Over the three decades, the humble robots.txt file has served as something of a gatekeeper for websites, letting crawlers know if they are allowed to access content on the site, and if so, which content. Well-behaved crawlers read and parse the file, and adjust their crawling activity accordingly. Based on the robots.txt files found across Radar’s top 10,000 domains, Radar’s AI Insights page shows how many of these sites explicitly allow or disallow these AI crawlers to access content, and how complete that access/restriction is. With the ability to filter the data by domain category, this graph can provide site owners with visibility into how their peers may be dealing with these AI crawlers.
Improving Internet resilience with routing visibility
Routing is the process of selecting a path across one or more networks, and in the context of the Internet, routing selects the paths for Internet Protocol (IP) packets to travel from their origin to their destination. It is absolutely critical to the functioning of the Internet, but lots of things can go wrong, and when they do, they can take a whole network offline. (And depending on the network, a larger blast radius of sites, applications, and other service providers may be impacted.
Routing visibility provides insights into the health of a network, and its relationship to other networks. These insights can help identify or troubleshoot problems when they occur. Among the more significant things that can go wrong are route leaks and origin hijacks. Route leaks occur when a routing announcement propagates beyond its intended scope — that is, when the announcement reaches networks that it shouldn’t. An origin hijack occurs when an attacker creates fake announcements for a targeted prefix, falsely identifying an autonomous systems (AS) under their control as the origin of the prefix — in other words, the attacker claims that their network is responsible for a given set of IP addresses, which would cause traffic to those addresses to be routed to them.
In 2022 and 2023 respectively, we added route leak and origin hijack detection to Radar, providing network operators and other interested groups (such as researchers) with information to help identify which networks may be party to such events, whether as a leaker/hijacker, or a victim. And perhaps more importantly, in 2023 we also launched notifications for route leaks and origin hijacks, automatically notifying subscribers via email or webhook when such an event is detected, enabling them to take immediate action.
In 2025, we further improved this visibility by adding two additional capabilities. The first was real-time BGP route visibility, which illustrates how a given network prefix is connected to other networks — what is the route that packets take to get from that set of IP addresses to the large “tier 1” network providers? Network administrators can use this information when facing network outages, implementing new deployments, or investigating route leaks.
An AS-SET is a grouping of related networks, historically used for multiple purposes such as grouping together a list of downstream customers of a particular network provider. Our recently announced AS-SET monitoring enables network operators to monitor valid and invalid AS-SET memberships for their networks, which can help prevent misuse and issues like route leaks.
Not just pretty pictures
While Radar has been historically focused on providing clear, informative visualizations, we have also launched capabilities that enable users to get at the underlying data more directly, enabling them to use it in a more programmatic fashion. The most important one is the Radar API, launched in 2022. Requiring just an access token, users can get access to all the data shown on Radar, as well as some more advanced filters that provide more specific data, enabling them to incorporate Radar data into their own tools, websites, and applications. The example below shows a simple API call that returns the global distribution of human and bot traffic observed over the last seven days.
The Model Context Protocol is a standard way to make information available to large language models (LLMs). Somewhat similar to the way an application programming interface (API) works, MCP offers a documented, standardized way for a computer program to integrate services from an external source. It essentially allows AI programs to exceed their training, enabling them to incorporate new sources of information into their decision-making and content generation, and helps them connect to external tools. The Radar MCP server allows MCP clients to gain access to Radar data and tools, enabling exploration using natural language queries.
Radar’s URL Scanner has proven to be one of its most popular tools, scanning millions of sites since launching in 2023. It allows users to safely determine whether a site may contain malicious content, as well as providing information on technologies used and insights into the site’s headers, cookies, and links. In addition to being available on Radar, it is also accessible through the API and MCP server.
Finally, Radar’s user interface has seen a number of improvements over the last several years, in service of improved usability and a better user experience. As new data sets and capabilities are launched, they are added to the search bar, allowing users to search not only for countries and ASNs, but also IP address prefixes, certificate authorities, bot names, IP addresses, and more. Initially launching with just a few default date ranges (such as last 24 hours, last 7 days, etc.), we’ve expanded the number of default options, as well as enabling the user to select custom date ranges of up to one year in length. And because the Internet is global, Radar should be too. In 2024, we launched internationalized versions of Radar, marking availability of the site in 14 languages/dialects, including downloaded and embedded content.
This is a sampling of the updates and enhancements that we have made to Radar over the last five years in support of Internet measurement, transparency, and resilience. These individual data sets and tools combine to provide one of the most comprehensive views of the Internet available. And we’re not close to being done. We’ll continue to bring additional visibility to the unseen ways that the Internet is changing by adding more tools, data sets, and visualizations, to help users answer more questions in areas including AI, performance, adoption and usage, and security.
Visit radar.cloudflare.com to explore all the great data sets, capabilities, and tools for yourself, and to use the Radar API or MCP server to incorporate Radar data into your own tools, sites, and applications. Keep an eye on the Radar changelog feed, Radar release notes, and the Cloudflare blog for news about the latest changes and launches, and don’t hesitate to reach out to us with feedback, suggestions, and feature requests.
(*The work was done when Keertana interned at Netflix.)
Introduction
This blog focuses on post-training generative recommender systems. Generative recommenders (GRs) represent a new paradigm in the field of recommendation systems (e.g. HSTU, OneRec). These models draw inspiration from recent advancements in transformer architectures used for language and vision tasks. They approach the recommendation problem, including both ranking and retrieval, as a sequential transduction task. This perspective enables generative training, where the model learns by imitating the next event in a sequence of user activities, thereby effectively modeling user behavior over time.
However, a key challenge with simply replicating observed user patterns is that it may not always lead to the best possible recommendations. User interactions are influenced by a variety of factors — such as trends, or external suggestions — and the system’s view of these interactions is inherently limited. For example, if a user tries a popular show but later indicates it wasn’t a good fit, a model that only imitates this behavior might continue to recommend similar content, missing the chance to enhance the user’s experience.
This highlights the importance of incorporating user preferences and feedback, rather than solely relying on observed behavior, to improve recommendation quality. In the context of recommendation systems, we benefit from a wealth of user feedback, which includes explicit signals such as ratings and reviews, as well as implicit signals like watch time, click-through rates, and overall engagement. This abundance of feedback serves as a valuable resource for improving model performance.
Given the recent success of reinforcement learning techniques in post-training large language models, such as DPO and GRPO, this study investigates whether similar methods can be applied to generative recommenders. Ultimately, our goal is to identify both the opportunities and challenges in using these techniques to enhance the quality and relevance of recommendations.
Unlike language models, post-training generative recommenders presents unique challenges. One of the most significant is the difficulty of obtaining counterfactual feedback in recommendation scenarios. The recommendation feedback is generated on-policy — that is, it reflects users’ real-time interactions with the system as they naturally use it. Since a typical user sequence can span weeks or even years of activity, it is impractical to ask users to review or provide feedback on hypothetical, counterfactual experiences. As a result, the absence of counterfactual data makes it challenging to apply post-training methods such as PPO or DPO, which require feedback from counterfactual user sequences.
Furthermore, post-training methods typically rely on a reward model — either implicit or explicit — to guide optimization. The quality of reward models heavily influences the effectiveness of post-training. In the context of recommendation systems, however, reward signals tend to be much noisier. For instance, if we use watch time as an implicit reward, it may not always accurately reflect user satisfaction: a viewer might stop watching a favorite show simply due to time constraints, while finishing a lengthy show doesn’t necessarily indicate genuine enjoyment.
To address these post-training challenges, we introduce a novel algorithm called Advantage-Weighted Supervised Fine-tuning (A-SFT). Our analysis first demonstrates that reward models in recommendation systems often exhibit higher uncertainty due to the issues discussed above. Rather than relying solely on these uncertain reward models, A-SFT combines supervised fine-tuning with the advantage function to more effectively guide post-training optimization. This approach proves especially effective when the reward model has high variance but still provides valuable directional signals. We benchmark A-SFT against four other representative methods, and our results show that A-SFT achieves better alignment between the pre-trained generative recommendation model and the reward model.
In Figure 1, we conceptualize the pros and cons of different post-training paradigms. For example, Online Reinforcement Learning is most useful when the reward model has a good generalization ability, and behavior cloning is suitable when no reward models are available. Using these algorithms under fitting use cases is the key to a successful post-training. For example, over-exploitation of noisy reward models will hurt task performance, as guidance from the reward models can be simply noise. Conversely, not leveraging a good reward model leaves out potential improvements. We find A-SFT fits the sweet point between offline reinforcement learning and behavior cloning, where it benefits from the directional signals in those noisy estimations and is less dependent on the reward accuracy.
Figure 1: The landscape of RL algorithms based on the reward models’ accuracy
Challenges in Post-training for Recommendation
Reinforcement Learning from Human Feedback (RLHF) is the most popular framework for post-training large language models. In this framework, human annotators evaluate and rank different outputs generated by a model. This feedback is then used to train a reward model that predicts how well a model output aligns with human preferences. This reward model then serves as a proxy for human judgment during reinforcement learning, guiding the model to generate outputs that are more likely to be preferred by humans.
While traditional RLHF methods like PPO or DPO are effective for aligning LLMs, there are several challenges in applying them directly to large-scale recommendation systems:
Lack of Counter-factual Observations
As in typical RLHF settings, collecting real-time feedback from a diverse user base across a wide range of items is both costly and impractical. The data in recommendation are generated by the real-time user interests. Any third-party annotators or even the user themselves lack the practical means to evaluate an alternative reality. For example, it is impractical to ask the Netflix users to evaluate hundreds of unseen movies. Consequently, we lack a live environment in which to perform reinforcement learning.
2. Noisy Reward Models
In addition to the limited counter-factual data, the recommendation task itself has a higher randomness by its nature. The recommendation data has less structure than language data. Users choose to watch some shows not because there is a grammar rule that nouns need to follow by the verbs. In fact, the users’ choices usually exhibit a level of permutation invariance, where swapping the order of events in the user history still makes a valid activity sequence. This randomness in the behaviors makes learning a good reward model extremely difficult. Often the reward models we learnt still have a large margin of errors.
Here is an ablation study we did on the reward model performance with O(Millions) users and O(Billions) of tokens. The reward model uses an open-sourced HSTU architecture in the convenience of reproducing this study. We adopt the standard RLHF approach of training a reward model using offline, human-collected feedback. We start by creating a proxy reward, scored on a scale from 1 to 5 in the convenience of understanding. This reward model is co-trained as a shallow reward head on top of the generative recommender. It predicts the reward for the most recently selected title based on a user’s interaction history. To evaluate its effectiveness, we compare the model’s performance against two simple baselines: (1) predicting the next reward as the average reward the user has given in their past interactions, and (2) predicting it as the average reward that all users have assigned to that particular title in previous interactions.
Table 1: Reward model performance metrics
We observe that the model’s predictions do not significantly outperform the simple baselines. This result is intuitive, as a user’s historical interactions typically cover only a small subset of titles, making it difficult to accurately predict their responses to the vast number of unexplored titles in the catalogue. We expect this to be a potential issue for any large recommendation systems where the ratio between explored and unexplored titles is very small.
3. Lack of Logged Policy
In recommendation systems, the policy that generated the logged data is typically unknown and cannot be directly estimated. Offline reinforcement learning methods often rely on Inverse Propensity Scoring (IPS) to debias such data by reweighting interactions according to the logging policy’s action probabilities. However, estimating the logging policy accurately is challenging and prone to error, which can introduce additional biases, and IPS itself is known to suffer from high variance. Consequently, offline RL approaches that depend on IPS are ill-suited for our setting.
Advantage Weighted Supervised Fine Tuning
Given the three challenges outlined above, we propose a new algorithm Advantage-Weighted SFT (A-SFT). It leverages a combination of supervised fine-tuning and advantage reweighting from reinforcement learning. The key observation is as follows. Despite the reward estimation for each individual event having a high uncertainty, we find the estimations of rewards contain directional signals between high-reward and low-reward events. These signals could help better align the model during post-training.
A central factor in this study is the generalization ability of the reward model. Better generalization enables more accurate predictions of user preferences for unseen titles, thereby making exploration more effective. For reward models with moderate to high generalization power, both online RL methods such as PPO and offline RL methods such as CQL can perform effectively. However, in our setting, reward model generalization is worse than the language counterparts’, which makes these algorithms less appropriate. In addition, the use of techniques like inverse propensity scoring (IPS) introduces a heightened risk of high-variance estimates, prompting us to exclude algorithms such as off-policy REINFORCE.
Our proposed method A-SFT does not rely on IPS. With no need of prior knowledge of the logging policy, it can be generally applied to cases where observation of the environments are limited or biased. This is particularly useful to the recommendation setting due to the user feedback loop and distribution shifts with time. Without knowing the logging policy, A-SFT still provides means to control the policy deviation between the current policy and logging policy by tuning the parameter. This design provides essential means to control the learnt bias from uncertain reward models. We show that A-SFT outperforms baseline behavior cloning by directly optimizing observed rewards.
The advantage-weighted SFT algorithm is as follows:
For the results presented in this blog post, we treat the recommendation problem as a contextual bandit, i.e. given a history of user interactions as the context, can we recommend a high reward next title recommendation for the user?
Benchmarks
We compared representative algorithms including PPO, IPO, DPO, CQL and SFT as the baselines:
Reward weighted Behavior Cloning: This benchmark algorithm modifies supervised fine-tuning (SFT) by weighting the loss with the raw rewards of the chosen item instead of weighing the loss with advantage as in the proposed algorithm.
Rejection Sampling Direct Preference Optimization / Identity Preference Optimization (RS DPO/IPO): this is a variant of DPO/IPO where, for each user history x, we generate contrasting response pairs by training an ensemble of reward models to estimate confidence intervals for the reward of multiple potential responses y. If the lower bound of the reward confidence interval for one response is less than the upper bound for another response, then this pair is used to train DPO/IPO.
Conservative Q-Learning (CQL): This is a standard offline algorithm that learns a conservative Q function, penalizing overestimation of Q-values, particularly in regions of the state-action space with little or no reward data.
Proximal Policy Optimization (PPO): This is a standard RLHF (Reinforcement Learning from Human Feedback) algorithm that uses reward models as an online environment. PPO learns an advantage function and optimizes the policy to maximize expected reward while maintaining proximity to the initial policy.
We sampled a separate test set of O(Millions) users. This test set is collected on a future date after the training.
Offline Evaluation Results
We evaluate our algorithm on a dataset of high-reward user trajectories. For sake of simplicity, we consider a trajectory to have a high reward if the accumulated reward is higher than the median of the population. We present the following metrics for the held out test dataset:
NDCG@k: This measures the ranking quality of the recommended items up to position k. It accounts for the position of relevant items in the recommendation list, assigning higher scores when relevant items appear higher in the ranking. The gain is discounted logarithmically at lower ranks, and the result is normalized by the ideal ranking (i.e., the best possible ordering of items).
HR@k: This measures the proportion of test cases in which the ground-truth chosen item y appears in the top k recommendations. It is a binary metric per test case (hit or miss) and is averaged over all test cases.
MRR: MRR evaluates the ranking quality by measuring the reciprocal of the rank at which the chosen item appears in the recommendation list. The metric is averaged across all test cases.
Reward Model as A Judge: We use the reward model to evaluate the policy for future user events. We propose to use an ensemble of reward models for the evaluation to increase confidence. The result is based on the discounted reward generated for a few steps. The standard deviation is less than 4%.
We measure the percentage improvement in each metric compared to the baseline, Reward Weighted Behavior Cloning(BC). We notice that advantage weighted SFT shows the largest improvement in metrics, outweighing BC as well as reward model dependent algorithms like CQL, PPO, DPO and IPO.
Our experiments show that advantage weighted SFT is a simple but promising approach for post-training generative recommenders as it deals with the issue of poor reward model generalizations and lack of IPS. More specifically, we find PPO, IPO and DPO achieve a good reward score, but also causes the overfitting from the reward model. Conservative Q-Learning achieves more robust improvements but does not fully capture the potential signals in the reward modeling. A-SFT achieved both better recommendation metrics and reward scores.
In early September 2025, attackers used a phishing email to compromise one or more trusted maintainer accounts on npm. They used this to publish malicious releases of 18 widely used npm packages (for example chalk, debug, ansi-styles) that account for more than 2 billion downloads per week. Websites and applications that used these compromised packages were vulnerable to hackers stealing crypto assets (“crypto stealing” or “wallet draining”) from end users. In addition, compromised packages could also modify other packages owned by the same maintainers (using stolen npm tokens) and included code to steal developer tokens for CI/CD pipelines and cloud accounts.
As it relates to end users of your applications, the good news is that Cloudflare Page Shield, our client-side security offering will detect compromised JavaScript libraries and prevent crypto-stealing. More importantly, given the AI powering Cloudflare’s detection solutions, customers are protected from similar attacks in the future, as we explain below.
Excerpt from the injected malicious payload, along with the rest of the innocuous normal code.Among other things, the payload replaces legitimate crypto addresses with attacker’s addresses (for multiple currencies, including bitcoin, ethereum, solana).
Finding needles in a 3.5 billion script haystack
Everyday, Cloudflare Page Shield assesses 3.5 billion scripts per day or 40,000 scripts per second. Of these, less than 0.3% are malicious, based on our machine learning (ML)-based malicious script detection. As explained in a prior blog post, we preprocess JavaScript code into an Abstract Syntax Tree to train a message-passing graph convolutional network (MPGCN) that classifies a given JavaScript file as either malicious or benign.
The intuition behind using a graph-based model is to use both the structure (e.g. function calling, assertions) and code text to learn hacker patterns. For example, in the npm compromise, the malicious code injected in compromised packages uses code obfuscation and also modifies code entry points for crypto wallet interfaces, such as Ethereum’s window.ethereum, to swap payment destinations to accounts in the attacker’s control. Crucially, rather than engineering such behaviors as features, the model learns to distinguish between good and bad code purely from structure and syntax. As a result, it is resilient to techniques used not just in the npm compromise but also future compromise techniques.
Our ML model outputs the probability that a script is malicious which is then transformed into a score ranging from 1 to 99, with low scores indicating likely malicious and high scores indicating benign scripts. Importantly, like many Cloudflare ML models, inferencing happens in under 0.3 seconds.
Model Evaluation
Since the initial launch, our JavaScript classifiers are constantly being evolved to optimize model evaluation metrics, in this case, F1 measure. Our current metrics are
Metric
Latest: Version 2.7
Improvement over prior version
Precision
98%
5%
Recall
90%
233%
F1
94%
123%
Some of the improvements were accomplished through:
More training examples, curated from a combination of open source datasets, security partners, and labeling of Cloudflare traffic
Better training examples, for instance, by removing samples with pure comments in them or scripts with nearly equal structure
Better training set stratification, so that training, validation and test sets all have similar distribution of classes of interest
Tweaking the evaluation criteria to maximize recall with 99% precision
Given the confusion matrix, we should expect about 2 false positives per second, if we assume ~0.3% of the 40,000 scripts per second are flagged as malicious. We employ multiple LLMs alongside expert human security analysts to review such scripts around the clock. Most False Positives we encounter in this way are rather challenging. For example, scripts that read all form inputs except credit card numbers (e.g. reject input values that test true using the Luhn algorithm), injecting dynamic scripts, heavy user tracking, heavy deobfuscation, etc. User tracking scripts often exhibit a combination of these behaviors, and the only reliable way to distinguish truly malicious payloads is by assessing the trustworthiness of their connected domains. We feed all newly labeled scripts back into our ML training (& testing) pipeline.
Most importantly, we verified that Cloudflare Page Shield would have successfully detected all 18 compromised npm packages as malicious (a novel attack, thus, not in the training data)..
Planned improvements
Static script analysis has proven effective and is sometimes the only viable approach (e.g., for npm packages). To address more challenging cases, we are enhancing our ML signals with contextual data including script URLs, page hosts, and connected domains. Modern Agentic AI approaches can wrap JavaScript runtimes as tools in an overall AI workflow. Then, they can enable a hybrid approach that combines static and dynamic analysis techniques to tackle challenging false positive scenarios, such as user tracking scripts.
In the npm attack, we did not see any activity in the Cloudflare network related to this compromise among Page Shield users, though for other exploits, we catch its traffic within minutes. In this case, patches of the compromised npm packages were released in 2 hours or less, and given that the infected payloads had to be built into end user facing applications for end user impact, we suspect that our customers dodged the proverbial bullet. That said, had traffic gotten through, Page Shield was already equipped to detect and block this threat.
Also make sure to consult our Page Shield Script detection to find malicious packages. Consult the Connections tab within Page Shield to view suspicious connections made by your applications.
Several scripts are marked as malicious.
Several connections are marked as malicious.
And be sure to complete the following steps:
Audit your dependency tree for recently published versions (check package-lock.json / npm ls) and look for versions published around early–mid September 2025 of widely used packages.
Rotate any credentials that may have been exposed to your build environment.
Revoke and reissue CI/CD tokens and service keys that might have been used in build pipelines (GitHub Actions, npm tokens, cloud credentials).
Pin dependencies to known-good versions (or use lockfiles), and consider using a package allowlist / verified publisher features from your registry provider.
Scan build logs and repos for suspicious commits/GitHub Actions changes and remove any unknown webhooks or workflows.
While vigilance is key, automated defenses provide a crucial layer of protection against fast-moving supply chain attacks. Interested in better understanding your client-side supply chain? Sign up for our free, custom Client-Side Risk Assessment.
The era of agentic commerce is coming, and it brings with it significant new challenges for security. That’s why Cloudflare is partnering with Visa and Mastercard to help secure automated commerce as AI agents search, compare, and purchase on behalf of consumers.
Through our collaboration, Visa developed the Trusted Agent Protocol and Mastercard developed Agent Pay to help merchants distinguish legitimate, approved agents from malicious bots. Both Trusted Agent Protocol and Agent Pay leverage Web Bot Auth as the agent authentication layer to allow networks like Cloudflare to verify traffic from AI shopping agents that register with a payment network.
The challenges with agentic commerce
Agentic commerce is commerce driven by AI agents. As AI agents execute more transactions, merchants need to protect themselves and maintain trust with their customers. Merchants are beginning to see the promise of agentic commerce but face significant challenges:
How can they distinguish a helpful, approved AI shopping agent from a malicious bot or web crawler?
Is the agent representing a known, repeat customer or someone entirely new?
Are there particular instructions the consumer gave to their agent that the merchant should respect?
We are working with Visa and Mastercard, two of the most trusted consumer brands in payments, to address each of these challenges.
Web Bot Auth is the foundation to securing agentic commerce
In May, we shared a new proposal called Web Bot Auth to cryptographically authenticate agent traffic. Historically, agent traffic has been classified using the user agent and IP address. However, these fields can be spoofed, leading to inaccurate classifications and bot mitigations can be applied inaccurately. Web Bot Auth allows an agent to provide a stable identifier by using HTTP Message Signatures with public key cryptography.
As we spent time collaborating with the teams at Visa and Mastercard, we found that we could leverage Web Bot Auth as the foundation to ensure that each commerce agent request was verifiable, time-based, and non-replayable.
Visa’s Trusted Agent Protocol and Mastercard’s Agent Pay present three key solutions for merchants to manage agentic commerce transactions. First, merchants can identify a registered agent and distinguish whether a particular interaction is intended to browse or to pay. Last, merchants can indicate to agents how a payment is expected, whether that is through a network token, browser-use guest checkout, or a micropayment.
This allows merchants that integrate with these protocols to instantly recognize a trusted agent during two key interactions: the initial browsing phase to determine product details and final costs, and the final payment interaction to complete a purchase. Ultimately, this provides merchants with the tools to verify these signatures, identify trusted interactions, and securely manage how these agents can interact with their site.
How it works: leveraging HTTP message signatures
To make this work, an ecosystem of participants need to be on the same page. It all starts with agentdevelopers, who build the agents to shop on behalf of consumers. These agents then interact with merchants, who need a reliable way to assess the request is made on behalf of consumers. Merchants rely on networks like Cloudflare to verify the agent’s cryptographic signatures and ensure the interaction is legitimate. Finally, there are payment networks like Visa and Mastercard, who can link cardholder identity to agentic commerce transactions, helping ensure that transactions are verifiable and accountable.
When developing their protocols, Visa and Mastercard needed a secure way to authenticate each agent developer and securely transmit information from the agent to the merchant’s website. That’s where we came in and worked with their teams to build upon Web Bot Auth. Web Bot Auth proposals specify how developers of bots and agents can attach their cryptographic signatures in HTTP requests by using HTTP Message Signatures.
Both Visa and Mastercard protocols require agents to register and have their public keys (referenced as the keyid in the Signature-Input header) in a well-known directory, allowing merchants and networks to fetch the keys to validate these HTTP message signatures. To start, Visa and Mastercard will be hosting their own directories for Visa-registered and Mastercard-registered agents, respectively
The newly created agents then communicate their registration, identity, and payment details with the merchant using these HTTP Message Signatures. Both protocols build on Web Bot Auth by introducing a new tag that agents must supply in the Signature-Input header, which indicates whether the agent is browsing or purchasing. Merchants can use the tag to determine whether to interact with the agent. Agents must also include the nonce field, a unique sequence included in the signature, to provide protection against replay attacks.
An agent visiting a merchant’s website to browse a catalog would include an HTTP Message Signature in their request to verify their agent is authorized to browse the merchant’s storefront on behalf of a specific Visa cardholder:
Trusted Agent Protocol and Agent Pay are designed for merchants to benefit from its validation mechanisms without changing their infrastructure. Instead, merchants can set the rules for agent interactions on their site and rely upon Cloudflare as the validator. For these requests, Cloudflare will run the following checks:
Confirm the presence of the Signature-Input and Signature headers.
Pull the keyid from the Signature-Input. If Cloudflare has not previously retrieved and cached the key, fetch it from the public key directory.
Confirm the current time falls between the created and expires timestamps.
Check nonce uniqueness in the cache. By checking if a nonce has been recently used, Cloudflare can reject reused or expired signatures, ensuring the request is not a malicious copy of a prior, legitimate interaction.
Check the validity of the tag, as defined by the protocol. If the agent is browsing, the tag should be agent-browser-auth. If the agent is paying, the tag should be agent-payer-auth.
We recently introduced support for x402 transactions into Cloudflare’s Agent SDK, allowing anyone building an agent to easily transact using the new x402 protocol. We will similarly be working with Visa and Mastercard over the coming months to bring support for their protocols directly to the Agents SDK. This will allow developers to manage their registered agent’s private keys and to easily create the correct HTTP message signatures to authorize their agent to browse and transact on a merchant website.
Conceptually, the requests in a Cloudflare Worker would look something like this:
/**
* Pseudocode example of a Cloudflare Worker acting as a trusted agent.
* This version explicitly illustrates the signing logic to show the core flow.
*/
// Helper function to encapsulate the signing protocol logic.
async function createSignatureHeaders(targetUrl, credentials) {
// Internally, this function would perform the detailed cryptographic steps:
// 1. Generate timestamps and a unique nonce.
// 2. Construct the 'Signature-Input' header string with all required parameters.
// 3. Build the canonical 'Signature Base' string according to the spec.
// 4. Use the private key to sign the base string.
// 5. Return the fully formed 'Signature-Input' and 'Signature' headers.
const signedHeaders = new Headers();
signedHeaders.set('Signature-Input', 'sig2=(...); keyid="..."; ...');
signedHeaders.set('Signature', 'sig2=:...');
return signedHeaders;
}
export default {
async fetch(request, env) {
// 1. Load the final API endpoint and private signing credentials.
const targetUrl = new URL(request.url).searchParams.get('target');
const credentials = {
privateKey: env.PAYMENT_NETWORK_PRIVATE_KEY,
keyId: env.PAYMENT_NETWORK_KEY_ID
};
// 2. Generate the required signature headers using the helper.
const signatureHeaders = await createSignatureHeaders(targetUrl, credentials);
// 3. Attach the newly created signature headers to the request for authentication.
const signedRequestHeaders = new Headers(request.headers);
signedRequestHeaders.set('Host', new URL(targetUrl).hostname);
signedRequestHeaders.set('Signature-Input', signatureHeaders.get('Signature-Input'));
signedRequestHeaders.set('Signature', signatureHeaders.get('Signature'));
// 4. Forward the fully signed request to the protected API.
return fetch(targetUrl, { headers: signedRequestHeaders });
},
};
We’ll also be creating new managed rulesets for our customers that make it easy to allow agents that are using the Trusted Agent Protocol or Agent Pay. You might want to disallow most automated traffic to your storefront but not miss out on revenue opportunities from agents authorized to make a purchase on behalf of a cardholder. A managed rule would make this straightforward to implement. As the website owner, you could enable a managed rule that automatically allows all trusted agents registered with Visa or Mastercard to come to your site, passing your other bot protection & WAF rules.
These will continue to evolve, and we will incorporate feedback to ensure that agent registration and validation works seamlessly across all networks and aligns with the Web Bot Auth proposal. American Express will also be leveraging Web Bot Auth as the foundation to their agentic commerce offering.
The OODA loop—for observe, orient, decide, act—is a framework to understand decision-making in adversarial situations. We apply the same framework to artificial intelligence agents, who have to make their decisions with untrustworthy observations and orientation. To solve this problem, we need new systems of input, processing, and output integrity.
Many decades ago, U.S. Air Force Colonel John Boyd introduced the concept of the “OODA loop,” for Observe, Orient, Decide, and Act. These are the four steps of real-time continuous decision-making. Boyd developed it for fighter pilots, but it’s long been applied in artificial intelligence (AI) and robotics. An AI agent, like a pilot, executes the loop over and over, accomplishing its goals iteratively within an ever-changing environment. This is Anthropic’s definition: “Agents are models using tools in a loop.”1
OODA Loops for Agentic AI
Traditional OODA analysis assumes trusted inputs and outputs, in the same way that classical AI assumed trusted sensors, controlled environments, and physical boundaries. This no longer holds true. AI agents don’t just execute OODA loops; they embed untrusted actors within them. Web-enabled large language models (LLMs) can query adversary-controlled sources mid-loop. Systems that allow AI to use large corpora of content, such as retrieval-augmented generation (https://en.wikipedia.org/wiki/Retrieval-augmented_generation), can ingest poisoned documents. Tool-calling application programming interfaces can execute untrusted code. Modern AI sensors can encompass the entire Internet; their environments are inherently adversarial. That means that fixing AI hallucination is insufficient because even if the AI accurately interprets its inputs and produces corresponding output, it can be fully corrupt.
In 2022, Simon Willison identified a new class of attacks against AI systems: “prompt injection.”2 Prompt injection is possible because an AI mixes untrusted inputs with trusted instructions and then confuses one for the other. Willison’s insight was that this isn’t just a filtering problem; it’s architectural. There is no privilege separation, and there is no separation between the data and control paths. The very mechanism that makes modern AI powerful—treating all inputs uniformly—is what makes it vulnerable. The security challenges we face today are structural consequences of using AI for everything.
Insecurities can have far-reaching effects. A single poisoned piece of training data can affect millions of downstream applications. In this environment, security debt accrues like technical debt.
AI security has a temporal asymmetry. The temporal disconnect between training and deployment creates unauditable vulnerabilities. Attackers can poison a model’s training data and then deploy an exploit years later. Integrity violations are frozen in the model. Models aren’t aware of previous compromises since each inference starts fresh and is equally vulnerable.
AI increasingly maintains state—in the form of chat history and key-value caches. These states accumulate compromises. Every iteration is potentially malicious, and cache poisoning persists across interactions.
Agents compound the risks. Pretrained OODA loops running in one or a dozen AI agents inherit all of these upstream compromises. Model Context Protocol (MCP) and similar systems that allow AI to use tools create their own vulnerabilities that interact with each other. Each tool has its own OODA loop, which nests, interleaves, and races. Tool descriptions become injection vectors. Models can’t verify tool semantics, only syntax. “Submit SQL query” might mean “exfiltrate database” because an agent can be corrupted in prompts, training data, or tool definitions to do what the attacker wants. The abstraction layer itself can be adversarial.
For example, an attacker might want AI agents to leak all the secret keys that the AI knows to the attacker, who might have a collector running in bulletproof hosting in a poorly regulated jurisdiction. They could plant coded instructions in easily scraped web content, waiting for the next AI training set to include it. Once that happens, they can activate the behavior through the front door: tricking AI agents (think a lowly chatbot or an analytics engine or a coding bot or anything in between) that are increasingly taking their own actions, in an OODA loop, using untrustworthy input from a third-party user. This compromise persists in the conversation history and cached responses, spreading to multiple future interactions and even to other AI agents. All this requires us to reconsider risks to the agentic AI OODA loop, from top to bottom.
Observe: The risks include adversarial examples, prompt injection, and sensor spoofing. A sticker fools computer vision, a string fools an LLM. The observation layer lacks authentication and integrity.
Orient: The risks include training data poisoning, context manipulation, and semantic backdoors. The model’s worldview—its orientation—can be influenced by attackers months before deployment. Encoded behavior activates on trigger phrases.
Decide: The risks include logic corruption via fine-tuning attacks, reward hacking, and objective misalignment. The decision process itself becomes the payload. Models can be manipulated to trust malicious sources preferentially.
Act: The risks include output manipulation, tool confusion, and action hijacking. MCP and similar protocols multiply attack surfaces. Each tool call trusts prior stages implicitly.
AI gives the old phrase “inside your adversary’s OODA loop” new meaning. For Boyd’s fighter pilots, it meant that you were operating faster than your adversary, able to act on current data while they were still on the previous iteration. With agentic AI, adversaries aren’t just metaphorically inside; they’re literally providing the observations and manipulating the output. We want adversaries inside our loop because that’s where the data are. AI’s OODA loops must observe untrusted sources to be useful. The competitive advantage, accessing web-scale information, is identical to the attack surface. The speed of your OODA loop is irrelevant when the adversary controls your sensors and actuators.
Worse, speed can itself be a vulnerability. The faster the loop, the less time for verification. Millisecond decisions result in millisecond compromises.
The Source of the Problem
The fundamental problem is that AI must compress reality into model-legible forms. In this setting, adversaries can exploit the compression. They don’t have to attack the territory; they can attack the map. Models lack local contextual knowledge. They process symbols, not meaning. A human sees a suspicious URL; an AI sees valid syntax. And that semantic gap becomes a security gap.
Prompt injection might be unsolvable in today’s LLMs. LLMs process token sequences, but no mechanism exists to mark token privileges. Every solution proposed introduces new injection vectors: Delimiter? Attackers include delimiters. Instruction hierarchy? Attackers claim priority. Separate models? Double the attack surface. Security requires boundaries, but LLMs dissolve boundaries. More generally, existing mechanisms to improve models won’t help protect against attack. Fine-tuning preserves backdoors. Reinforcement learning with human feedback adds human preferences without removing model biases. Each training phase compounds prior compromises.
This is Ken Thompson’s “trusting trust” attack all over again.3 Poisoned states generate poisoned outputs, which poison future states. Try to summarize the conversation history? The summary includes the injection. Clear the cache to remove the poison? Lose all context. Keep the cache for continuity? Keep the contamination. Stateful systems can’t forget attacks, and so memory becomes a liability. Adversaries can craft inputs that corrupt future outputs.
This is the agentic AI security trilemma. Fast, smart, secure; pick any two. Fast and smart—you can’t verify your inputs. Smart and secure—you check everything, slowly, because AI itself can’t be used for this. Secure and fast—you’re stuck with models with intentionally limited capabilities.
This trilemma isn’t unique to AI. Some autoimmune disorders are examples of molecular mimicry—when biological recognition systems fail to distinguish self from nonself. The mechanism designed for protection becomes the pathology as T cells attack healthy tissue or fail to attack pathogens and bad cells. AI exhibits the same kind of recognition failure. No digital immunological markers separate trusted instructions from hostile input. The model’s core capability, following instructions in natural language, is inseparable from its vulnerability. Or like oncogenes, the normal function and the malignant behavior share identical machinery.
Prompt injection is semantic mimicry: adversarial instructions that resemble legitimate prompts, which trigger self-compromise. The immune system can’t add better recognition without rejecting legitimate cells. AI can’t filter malicious prompts without rejecting legitimate instructions. Immune systems can’t verify their own recognition mechanisms, and AI systems can’t verify their own integrity because the verification system uses the same corrupted mechanisms.
In security, we often assume that foreign/hostile code looks different from legitimate instructions, and we use signatures, patterns, and statistical anomaly detection to detect it. But getting inside someone’s AI OODA loop uses the system’s native language. The attack is indistinguishable from normal operation because it is normal operation. The vulnerability isn’t a defect—it’s the feature working correctly.
Where to Go Next?
The shift to an AI-saturated world has been dizzying. Seemingly overnight, we have AI in every technology product, with promises of even more—and agents as well. So where does that leave us with respect to security?
Physical constraints protected Boyd’s fighter pilots. Radar returns couldn’t lie about physics; fooling them, through stealth or jamming, constituted some of the most successful attacks against such systems that are still in use today. Observations were authenticated by their presence. Tampering meant physical access. But semantic observations have no physics. When every AI observation is potentially corrupted, integrity violations span the stack. Text can claim anything, and images can show impossibilities. In training, we face poisoned datasets and backdoored models. In inference, we face adversarial inputs and prompt injection. During operation, we face a contaminated context and persistent compromise. We need semantic integrity: verifying not just data but interpretation, not just content but context, not just information but understanding. We can add checksums, signatures, and audit logs. But how do you checksum a thought? How do you sign semantics? How do you audit attention?
Computer security has evolved over the decades. We addressed availability despite failures through replication and decentralization. We addressed confidentiality despite breaches using authenticated encryption. Now we need to address integrity despite corruption.4
Trustworthy AI agents require integrity because we can’t build reliable systems on unreliable foundations. The question isn’t whether we can add integrity to AI but whether the architecture permits integrity at all.
AI OODA loops and integrity aren’t fundamentally opposed, but today’s AI agents observe the Internet, orient via statistics, decide probabilistically, and act without verification. We built a system that trusts everything, and now we hope for a semantic firewall to keep it safe. The adversary isn’t inside the loop by accident; it’s there by architecture. Web-scale AI means web-scale integrity failure. Every capability corrupts.
Integrity isn’t a feature you add; it’s an architecture you choose. So far, we have built AI systems where “fast” and “smart” preclude “secure.” We optimized for capability over verification, for accessing web-scale data over ensuring trust. AI agents will be even more powerful—and increasingly autonomous. And without integrity, they will also be dangerous.
This chilling paragraph is in a comprehensive Brookings report about the use of tech to deport people from the US:
The administration has also adapted its methods of social media surveillance. Though agencies like the State Department have gathered millions of handles and monitored political discussions online, the Trump administration has been more explicit in who it’s targeting. Secretary of State Marco Rubio announced a new, zero-tolerance “Catch and Revoke” strategy, which uses AI to monitor the public speech of foreign nationals and revoke visas of those who “abuse [the country’s] hospitality.” In a March press conference, Rubio remarked that at least 300 visas, primarily student and visitor visas, had been revoked on the grounds that visitors are engaging in activity contrary to national interest. A State Department cable also announced a new requirement for student visa applicants to set their social media accounts to public—reflecting stricter vetting practices aimed at identifying individuals who “bear hostile attitudes toward our citizens, culture, government, institutions, or founding principles,” among other criteria.
To provide the best experiences, we use technologies like cookies to store and/or access device information. Consenting to these technologies will allow us to process data such as browsing behavior or unique IDs on this site. Not consenting or withdrawing consent, may adversely affect certain features and functions.
Functional
Always active
The technical storage or access is strictly necessary for the legitimate purpose of enabling the use of a specific service explicitly requested by the subscriber or user, or for the sole purpose of carrying out the transmission of a communication over an electronic communications network.
Preferences
The technical storage or access is necessary for the legitimate purpose of storing preferences that are not requested by the subscriber or user.
Statistics
The technical storage or access that is used exclusively for statistical purposes.The technical storage or access that is used exclusively for anonymous statistical purposes. Without a subpoena, voluntary compliance on the part of your Internet Service Provider, or additional records from a third party, information stored or retrieved for this purpose alone cannot usually be used to identify you.
Marketing
The technical storage or access is required to create user profiles to send advertising, or to track the user on a website or across several websites for similar marketing purposes.