Tag Archives: Content Independence Day

Your site, your rules: new AI traffic options for all customers

2026-07-01 Jin-Hee Lee

Post Syndicated from Jin-Hee Lee original https://blog.cloudflare.com/content-independence-day-ai-options/

One year ago, we declared the first Content Independence Day, and we gave website owners the means to take back control of their content. The deal between crawlers and website owners that had held up for 30 years — we crawl you, and you get referrals — was no longer true. AI was taking everything and sending back nothing, presenting an existential threat to website owners. And so we launched a one-click “Block AI Bots” option, along with a Pay-Per-Crawl marketplace.

A lot has changed in a year. Last July, conversations around “AI bots” centered on blocking AI training without compensation, pointing to the win–lose deal in which content was used for model training with no value returned to the website owner. But a desire for more nuance has emerged: Content owners still want to be able to protect their content, and they should be compensated for the original content that they work hard to create, curate, and share. We also know that locking down content isn’t a one-size-fits-all solution; website owners want options beyond “block all automation, every time.”

If you run a small site, the problem isn’t just that someone could train models on your content; it’s that nobody can find you in the first place. So you are forced into a Faustian bargain: either show up in search and let AI train on you, or block the crawler and lose discoverability. This unfairly advantages incumbent search providers if they use the same bots for both search and training, and that advantage in turn gives new players an incentive to be evasive as they try to close the competitive gap.

Now, AI can be anything

Today, AI can be in anything. Google Search has evolved from ranking results with AI into a full answer engine that responds to your question directly on the results page. And Google is not unique: this is the direction in which “search” is moving.

We could debate the cutoff for what qualifies as “AI” today, just to find that the standard changes tomorrow. So, instead of defining a bot primarily as “AI” or not, our updated approach to classification will ask deeper questions about bot or agent behavior: What are they doing on my site? What are they storing? And how will they reshare my content?

A pragmatic taxonomy

To address these questions, we need a more nuanced view — a pragmatic taxonomy that aligns with the AI use cases our customers care about. So we are opening the discussion beyond AI training alone and focusing on three AI use cases that we want all customers to be able to manage:

Search: any behavior that collects or indexes your content so it can answer questions about it later. The key is that Search proactively builds a database of your site that it later uses to respond to queries. Site owners should expect to get referral traffic or other equitable compensation as a result.
Agent: automated behavior that is acting, usually in real time, on a person’s behalf, to get something done right now. This includes chat fetch bots (e.g., ChatGPT-User) and browser-use agents (e.g., Gemini or Claude driving Chrome). The key is that it visits your web application in order to complete a job, and often there’s a human waiting on the other end.
Training: a crawler taking your content to train or fine-tune a model. The key is that your data is permanently absorbed into the underlying architecture of the AI to improve its capabilities.

Many popular crawlers on the web fall into one of the classifications above; some fall into several. We classify plenty of other behaviors beyond these three, including ads verification, feed fetching, and agentic transactions (more on this below). But we believe it should be simple for all website owners to manage access for these three AI-centered use cases. We believe that bot operators should separate their crawlers because that creates more transparency for website owners: it helps them understand why a given crawler is visiting and better manage the access they extend to it. If a company runs automation that builds Search indexes, acts as an Agent, and collects data to train its models, we strongly encourage that company to split that automation into three distinct crawlers.

We want a classification system that is scalable and representative of the world of automated traffic as it evolves. Tracking a bot’s purposes is nothing new, but our new taxonomy involves a few updates that better represent the state of bot traffic today. Most notably, we want to recognize that bots that have multiple purposes should be tracked with all purposes, not just one of them.

New options to manage AI traffic

We want to provide more options for managing different kinds of AI traffic, to all website owners on the Cloudflare network.

The managed “Block AI bots” preset that we’ve announced in the past covered single-purpose bots that crawled data for model training, as shown below:

^{Screenshot of the existing setting to manage AI bot traffic on July 1, 2025.}

But not all AI use is the same, and we want our customers to have the controls they need. So, we’re launching the ability to manage AI traffic based on three major use cases: Search, Agent, and Training crawlers. With these new options, our customers can more finely tune how they manage AI bot traffic — including customers on our Free tier.

^{Screenshot of the new options to manage AI bot traffic on July 1, 2026.}

Setting new defaults

On September 15, 2026, we’ll be setting new defaults for each of these three classifications. For all new domains onboarding to Cloudflare, the categories of Training and Agent will be blocked by default on the pages that display ads, while Search will remain allowed by default.

An ad is a signal that a website owner meant for a person to land there and see it — something monetizable that fuels the business. So, on those pages, we treat human attention as the end goal, and keep away the bots that may prevent this attention (i.e., Training and Agent bots). On the other hand, Search is the behavior that most naturally funnels back visitors, and we believe it’s in the interest of most site owners to allow this.

Another change takes effect on September 15: multi-purpose crawlers (specifically those that combine Search with Training) will be allowed or blocked according to all of their behaviors, in line with our call for transparency for website owners. Because the most restrictive applicable rule will be enforced, multi-purpose crawlers such as Googlebot, Applebot, and Bingbot will be blocked for customers who have chosen to block Training (either through the new options to manage AI traffic, or through the legacy Block AI bots service).

Of course, customer choice is paramount: a website owner who wants to opt out of these new defaults can say so in their Security settings any time before September 15, confirming that they want no change to how Training crawlers that also crawl for Search are handled. We’ll also keep notifying customers about the upcoming change as September 15 approaches, so that customers who want settings different from the defaults have the opportunity to choose them.
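To make the “most restrictive rule” concrete, here is a minimal Python sketch, not Cloudflare’s implementation. It assumes a bot’s purposes can be modeled as a set of flags and that the September 15 defaults for pages that display ads reduce to a set of blocked purposes; the names `Purpose`, `AD_PAGE_BLOCKED`, and `is_blocked` are hypothetical.

```python
from enum import Flag, auto


class Purpose(Flag):
    """Hypothetical model of the three AI use cases; one bot may carry several."""
    SEARCH = auto()
    AGENT = auto()
    TRAINING = auto()


# Assumed default for pages that display ads on new domains (from September 15, 2026):
# Training and Agent are blocked, Search stays allowed.
AD_PAGE_BLOCKED = Purpose.AGENT | Purpose.TRAINING

# A site that blocks no AI purpose at all.
NOTHING_BLOCKED = Purpose(0)


def is_blocked(bot_purposes: Purpose, blocked: Purpose) -> bool:
    """Most restrictive rule: a bot is blocked if ANY of its purposes is blocked."""
    return bool(bot_purposes & blocked)


search_only = Purpose.SEARCH
search_and_training = Purpose.SEARCH | Purpose.TRAINING  # a multi-purpose crawler

for name, bot in [("search_only", search_only), ("search_and_training", search_and_training)]:
    print(name, "blocked" if is_blocked(bot, AD_PAGE_BLOCKED) else "allowed")
# search_only allowed
# search_and_training blocked
```

Modeling purposes as flags rather than a single label is what lets a Search-plus-Training crawler be caught by a Training block; because the check is a bitwise intersection, adding another purpose later would not change `is_blocked`.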

BotBase: a new visibility plane for Enterprise customers

We’re also excited to launch a major visibility update as a new feature of Enterprise Bot Management. As Cloudflare’s directory of tracked bots has grown, so has the desire to manage these bots in sensible groupings and to understand more detail about a particular bot.

Introducing BotBase. BotBase is our new database tracking all known bots, including Verified bots and agents. This database provides a comprehensive, searchable view of our entire directory of bots, directly on the Cloudflare dashboard. We’re tackling visibility first, but later this year we’ll expand BotBase to provide a direct control center for known automated traffic on your website.

With this new view, Enterprise Bot Management customers can see the full catalogue of all Verified bots/agents and where they are classified in this updated taxonomy — a view we’ve never shown dynamically on the Cloudflare dashboard before. Customers who want to precisely target a specific bot can also easily filter for all traffic from this bot, plus copy the detection ID to use in Security rules. All of this is now live within a dedicated page, which can be accessed through the Bot Management configuration card.

As we built BotBase, we wanted to account for every piece of information that would let us build scalable, powerful insights across bots. One of these pieces is a cornerstone of our updated taxonomy: what a bot may do on your site, that is, its behavior. The classifications are listed below, and each bot is classified with one or more of these behaviors.

Bot classification	Behaviors and uses
*Search*	*Crawling to scan your site to help it appear in search engine results*
*Agent*	*User-directed agents visiting a page on behalf of a human*
*Training*	*Crawling to train or fine-tune models*
Transact	Checkout actions on behalf of users
Data Collection	Includes price scraping, competitive intelligence gathering, and third-party analytics
Security Testing	Includes vulnerability scanning and penetration testing
SEO	SEO crawling, site auditing, accessibility checks
Ads Verification	Ad placement verification, ad fraud detection
Social / Link Preview	Link previews for social platforms and messaging apps
Feed Fetching	Includes RSS readers, podcast aggregators, and news feed bots
Monitoring & Operations	Includes uptime monitoring, webhooks, and health checks

^{Bold italicized rows indicate the new configurable options that are available to all customers.}

How does a crawler use my content?

Customers have also told us that a bot’s content use matters: what a bot may keep and reshare after it has crawled your content. To address this, we are building capabilities for Bot Management customers to select and block bots based on their content use. This setting takes one of three levels, from least to most permissive:

immediate — interact, but store and reuse nothing
reference (default) — index, excerpt, and link back
full — summarize and reproduce

These values can be combined with bot classifications to express nuanced rules, such as “allow all bots that are used for Search, SEO, and Ads Verification, but only up to the reference use level.” This lets website owners make decisions in sensible groupings rather than managing rules bot by bot.
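The example rule above can be expressed as a small predicate. This Python sketch assumes the three content-use levels are totally ordered as listed, and that a multi-purpose bot passes only if every one of its classifications is covered by the rule (mirroring how multi-purpose crawlers are handled under the new defaults); `ContentUse`, `Rule`, and `permits` are illustrative names, not a Cloudflare API.

```python
from dataclasses import dataclass
from enum import IntEnum


class ContentUse(IntEnum):
    """Content-use levels, ordered from least to most permissive."""
    IMMEDIATE = 0  # interact, but store and reuse nothing
    REFERENCE = 1  # index, excerpt, and link back (the default level)
    FULL = 2       # summarize and reproduce


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: allow bots in these classifications, up to max_use."""
    classifications: frozenset
    max_use: ContentUse = ContentUse.REFERENCE

    def permits(self, bot_classifications: set, bot_use: ContentUse) -> bool:
        # Assumption: every classification the bot carries must be covered by the rule.
        covered = bool(bot_classifications) and bot_classifications <= self.classifications
        # IntEnum ordering lets us compare levels directly.
        return covered and bot_use <= self.max_use


# "Allow all bots that are used for Search, SEO, and Ads Verification,
#  but only up to the reference use level."
rule = Rule(frozenset({"Search", "SEO", "Ads Verification"}), ContentUse.REFERENCE)

print(rule.permits({"Search"}, ContentUse.REFERENCE))              # True
print(rule.permits({"Search"}, ContentUse.FULL))                   # False: exceeds the ceiling
print(rule.permits({"Search", "Training"}, ContentUse.IMMEDIATE))  # False: Training not covered
```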

To further support this, starting today, we’re testing a new signal, use, that extends Content Signals and lives in your robots.txt. It adds a fourth, optional field to the three fields of the first version of Content Signals, expressing the same preference as above:

use=immediate
use=reference
use=full

As with all other items listed in the robots.txt file, the content-use values signal a website owner’s preference rather than issuing blocks directly. We’re now adding support for this extension to managed robots.txt, which already prepends a preference stating that crawling for search is okay but crawling for training is not. All customers who have enabled managed robots.txt will now also have use=reference added to their robots.txt.

# Cloudflare Managed content with original Content Signals

User-agent: *
Content-Signal: search=yes,ai-train=no
Allow: /

^{The contents of Cloudflare managed robots.txt with the original Content Signals values.}

# Cloudflare Managed content with the new content-use signal

User-agent: *
Content-Signal: search=yes,ai-train=no,use=reference
Allow: /

^{The contents of Cloudflare managed robots.txt with the added parameter.}
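For crawler authors, honoring these preferences starts with pulling the key=value pairs out of the Content-Signal line. The Python sketch below is a simplified reader, not a reference implementation: it assumes pairs are comma-separated as in the examples above, and it merges every Content-Signal line it finds instead of scoping them to the matching User-agent group, which a real crawler should do.

```python
def read_content_signals(robots_txt: str) -> dict:
    """Collect Content-Signal key=value pairs from a robots.txt body (simplified)."""
    signals = {}
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "content-signal":
            continue
        for pair in value.split(","):
            key, eq, val = pair.strip().partition("=")
            if eq:
                signals[key.strip().lower()] = val.strip().lower()
    return signals


managed = """# Cloudflare Managed content with the new content-use signal

User-agent: *
Content-Signal: search=yes,ai-train=no,use=reference
Allow: /
"""

signals = read_content_signals(managed)
print(signals)  # {'search': 'yes', 'ai-train': 'no', 'use': 'reference'}
# The use field is optional, so a crawler has to handle its absence explicitly.
print(signals.get("use", "not stated"))
```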

We’re also starting to track content uses for every bot in BotBase. When we discover a bot abusing these signals, it will lose its “Verified” status and will no longer be allowed. As of today, bots that reproduce content in full cannot hold the Verified status.

What does it mean for a bot to be Verified?

Speaking of “Verified,” its definition is being updated to reflect the upcoming changes to default allow and block baselines. Previously, all Verified bots were allowed by default, which was reflected in our basic Bot Fight Mode offering to block unwanted automated traffic and in our rule templates for Enterprise Bot Management customers.

Starting today, we’re adding nuance: non-verified bots are still blocked by default, but Verified no longer means “allowed by default.” Instead, the Verified label makes a bot eligible to be allowed within its relevant category, so whether it can access a website depends on which categories that site allows (e.g., allowing Search).

To balance this change, we’re opening up the process of becoming a Verified bot and making it more transparent, too. To get a bot Verified, its operator needs to show two things: that they represent themselves honestly, and that they don’t abuse the access that honesty earns. To make this easier, we’re building management tools that let bot operators ensure they are accurately represented by Cloudflare’s classification system (to be announced in the near future).

^{A preview screenshot of the upcoming platform built directly for bot operators who are part of or want to be a part of BotBase, the next generation of the Cloudflare Bots Directory.}
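Taken together, the Verified rules describe a two-step decision: can a bot be Verified at all (honest self-representation, no abuse of content signals, no full reproduction), and if so, are its categories allowed on this site? The Python sketch below is one reading of that logic, not Cloudflare’s implementation. The `Bot` fields such as `represents_itself_honestly` and `abuse_detected` are hypothetical stand-ins for a verification process this post does not detail, and requiring every category to be allowed is an assumption borrowed from the most-restrictive handling of multi-purpose crawlers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bot:
    """Hypothetical bot record; field names are illustrative only."""
    name: str
    categories: frozenset
    content_use: str                  # "immediate", "reference", or "full"
    represents_itself_honestly: bool  # stand-in for identity checks
    abuse_detected: bool = False      # e.g. caught ignoring content signals


def is_verified(bot: Bot) -> bool:
    # Honest representation, no abuse of that access, and no full reproduction.
    return (
        bot.represents_itself_honestly
        and not bot.abuse_detected
        and bot.content_use != "full"
    )


def decide(bot: Bot, allowed_categories: set) -> str:
    if not is_verified(bot):
        return "block"  # non-verified bots are still blocked by default
    # Verified alone is not enough: every category the bot carries must be allowed.
    if bot.categories and bot.categories <= allowed_categories:
        return "allow"
    return "block"


indexer = Bot("example-indexer", frozenset({"Search"}), "reference", True)
print(decide(indexer, {"Search"}))  # allow
print(decide(indexer, {"Agent"}))   # block: Search is not allowed on this site
```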

Experimenting with transitive trust

One more piece: The bot (or agent) at your door increasingly isn’t run by the company that built it. A platform like Cloudflare’s Developer Platform runs automations for thousands of different operators at once, ranging from enterprises to a developer you’ve never heard of. You might trust Stripe, but you don’t necessarily trust everyone who wired Stripe’s tools into a weekend project.

We call this chain (site owner → bot-owning company → end user) a matter of transitive trust. We’re proposing to carry it in the existing Forwarded header defined in RFC 7239, which rides along with the request and allows “proxy components to disclose information lost in the proxying process.”

This is similar to what X-Forwarded-For does for IP addresses, or what X-Forwarded-Host does to preserve the original Host header. So when a website owner says, “Allow this operator,” that preference will hold whether the operator comes to you directly or through three layers of trusted intermediaries. More details are in our documentation; a brief example of the format follows.

Forwarded: for="openai"

With the content-use extension discussed above, the header would also state how the operator says it will use the content it accesses, looking something like this:

Forwarded: for="openai";use="reference"
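On the receiving side, a site or proxy has to split this header back into its parameters. The Python sketch below follows the generic grammar of RFC 7239 (forwarded-elements separated by commas, parameter pairs separated by semicolons, values given as tokens or quoted-strings with backslash escapes). It does not validate node syntax, and treating the first element’s for and use values as the operator’s identity and stated content use is an assumption for illustration; the post defers the exact semantics to Cloudflare’s documentation.

```python
def _split_unquoted(text: str, sep: str) -> list:
    """Split text on sep, ignoring separators inside quoted-strings."""
    parts, buf = [], []
    in_quotes = escaped = False
    for ch in text:
        if escaped:
            escaped = False
        elif in_quotes and ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == sep and not in_quotes:
            parts.append("".join(buf))
            buf = []
            continue
        buf.append(ch)
    parts.append("".join(buf))
    return parts


def _unquote(value: str) -> str:
    """Strip surrounding quotes and resolve backslash escapes in a quoted-string."""
    value = value.strip()
    if len(value) < 2 or not (value.startswith('"') and value.endswith('"')):
        return value  # plain token
    out, escaped = [], False
    for ch in value[1:-1]:
        if escaped:
            out.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        else:
            out.append(ch)
    return "".join(out)


def parse_forwarded(header: str) -> list:
    """Return one dict (lowercase parameter name -> value) per forwarded-element."""
    elements = []
    for element in _split_unquoted(header, ","):
        params = {}
        for pair in _split_unquoted(element, ";"):
            name, eq, value = pair.partition("=")
            if eq:
                params[name.strip().lower()] = _unquote(value)
        if params:
            elements.append(params)
    return elements


hops = parse_forwarded('for="openai";use="reference"')
print(hops)  # [{'for': 'openai', 'use': 'reference'}]
first = hops[0]  # assumption: the first element identifies the originating operator
print(first.get("for"), first.get("use", "unstated"))  # openai reference
```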

This also reinforces the incentive model we want to foster. Losing trusted status across the more than 20% of web domains that sit behind Cloudflare is a deterrent with teeth. Trust becomes something you can carry with you, and something you can lose.

However, as bot traffic blends with human traffic, it’s possible that this system of transitive trust doesn’t carry beyond the users who can afford to be identifiable. The measures we are proposing today help to convey trust, but they won’t fit the entire web for all time. Small sources of traffic need privacy, and companies that want to preserve their own privacy commitments should be able to explore fair building blocks for the future of an agentic Internet, such as private rate limiting.

Set your terms today

These are small changes that move in the same direction: site owners get more control over who uses their content, and how. We believe the new defaults we discussed today and will soon implement are ones that encourage transparency and are more reflective of where the world is going.

Of course, the web will keep ebbing and flowing under us, and we’ll keep adjusting with it. But the direction won’t change, because it’s the one Cloudflare started with: a web ecosystem built around trust, where the people who make things can decide how they’re used, and where being honest about what you do earns you more access, not less.

These new options to manage AI traffic are live now, and can be configured by all existing customers in their zone Settings. Not on Cloudflare yet? Start for free to set the traffic controls that you want today.

Happy Content Independence Day.

Making AI search smarter

2026-07-01 Matthew Conroy

Post Syndicated from Matthew Conroy original https://blog.cloudflare.com/making-ai-search-smarter/

Search drives most experiences on the web. It’s how we get things done, and how nearly everything on the web gets found: the creators, the merchants, the answer to whatever you just typed into a box. For nearly 30 years, that discovery journey ran on a simple bargain: let a search engine crawl your content, and it sends you visitors. You turned those visitors into a business through ads, subscriptions, or just the audience itself. Being discoverable and getting paid were the same thing. A year ago, on the first Content Independence Day, we drew a line to defend that bargain in the AI era. But a line in the sand was only a first step. Since then, AI search has become an ever larger part of consumers’ lives, and more than 50% of traffic online is now non-human. The threat is no longer a handful of training crawlers you can block; it’s search itself being rebuilt around AI answers.

Today’s answer engines read your page and hand the user a summary, so the visit — and the revenue that depended on it — isn’t needed. We see it firsthand, and independent research backs it up: a 2025 Pew Research Center study found that when Google shows an AI summary, users clicked on a traditional search result link just 8% of the time (about half as often as when there’s no summary) and clicked a link inside the summary only 1% of the time. That leaves our customers in a bind: opt out of AI and be hard to find, or opt in and deliver significant value to users while seeing increasingly little in return. Our customers want to be found and compensated for the value they provide, and right now they’re forced to choose.

Today, we’ve announced new bot options to help our customers better control who can access their site and what they can do with it. But blocking was only step one: saying “no” protects content without rebuilding the business models that sustain it. So, it’s time to start building the new economic model of the Internet, starting with search.

Rebuilding the bargain

Transparency and control are the foundation, but more is needed. In 2025, we laid that foundation with a set of responsible AI bot principles: bots should be transparent about who they are and what they’re for, respect site owners’ choices, and act in good faith. Our tools hold bots to that bar. But enforcing good bot behavior doesn’t make AI search any better for the people relying on it, and it doesn’t send a dollar back to the creator whose work made the answer possible. We can do more than help the web say “no”; we can help rebuild what it says “yes” to.

So today, we’re announcing two initiatives that move from defense to offense and start putting both halves of that old bargain back together.

Make AI search smarter: By using the signals we see across our global network, like what’s fresh, what’s high quality, and what’s actually changed, we can help answer engines surface the most relevant content and reduce unwanted crawling. People searching get better answers, while costs are reduced for both AI companies and site owners if webpages are only recrawled when they’ve changed.

Pay creators for the value they provide: When your work is used to answer someone’s question, you should be rewarded instead of just being scraped for free. And you should be able to see what’s being used and what people are asking. This should be a real revenue stream, and an incentive to keep producing original content worth finding.

Making search smarter

Today we’re launching a research program to make AI search smarter and to stop our customers from footing the bill for crawls that produce nothing new.

More than 20% of the web sits behind Cloudflare’s network, which gives us a unique perspective. We can tell which pages have genuinely changed and which ones people and agents are flocking to. Through this program, we will explore using signals our customers have chosen to share about the freshness of their content, and we will combine those with our own insight into traffic flows, both human and bot. For answer engines, that’s a roadmap to high-quality content. For our customers, it provides a view of what users are actually asking, and how their content shows up in AI results. The aim is to measure two things: how much these signals help answer engines surface fresher, higher-quality content, and how much unnecessary crawling they cut out.

That second benefit, cutting unnecessary crawling, is bigger than it sounds. Cloudflare data suggests that more than 50% of crawl traffic from good bots goes to re-fetching pages that haven’t changed, and that number is likely to climb as crawl volumes do. A signal that just says “nothing’s changed here” lets a crawler skip the trip. That saves the answer engine compute. More importantly, it saves site owners from serving and paying for requests that never needed to happen.
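The skip-if-unchanged idea resembles HTTP conditional requests (If-Modified-Since answered with 304 Not Modified, per RFC 9110), except that the decision is made before any request is sent. The post does not specify the signal’s format, so this Python sketch assumes a hypothetical per-URL “last changed” timestamp shared by the site and compares it with when the crawler last fetched each page; `plan_recrawl` is an illustrative name.

```python
from datetime import datetime, timezone


def plan_recrawl(last_fetched: dict, last_changed: dict) -> tuple:
    """Split known URLs into (to_fetch, to_skip).

    last_fetched: URL -> when this crawler last fetched the page.
    last_changed: URL -> hypothetical freshness signal (when the page last changed).
    """
    to_fetch, to_skip = [], []
    for url, fetched_at in last_fetched.items():
        changed_at = last_changed.get(url)
        # Without a signal we cannot prove the page is unchanged, so fetch it.
        if changed_at is None or changed_at > fetched_at:
            to_fetch.append(url)
        else:
            to_skip.append(url)  # "nothing's changed here": skip the trip
    return to_fetch, to_skip


june_1 = datetime(2026, 6, 1, tzinfo=timezone.utc)
june_15 = datetime(2026, 6, 15, tzinfo=timezone.utc)

fetched = {"/pricing": june_15, "/blog/post": june_1, "/about": june_1}
changed = {"/pricing": june_1, "/blog/post": june_15}  # no signal for /about

print(plan_recrawl(fetched, changed))  # (['/blog/post', '/about'], ['/pricing'])
```

Falling back to a fetch when no signal exists keeps the sketch conservative: the savings come only from pages whose owners have opted to share freshness information.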

The program is neutral by design: our goal is to make it work for every answer engine willing to play fair. It’s limited to search. We aren’t sharing any content, and nothing is used to train foundation models. We intend to publish what we learn, including the benefits to site owners such as better content discoverability and reduced server strain. We plan to make the capability broadly available later this year and reduce unnecessary crawling across our network.

From Pay Per Crawl to Pay Per Use

Last year we launched Pay Per Crawl so publishers could charge AI companies for crawling their content. It was a real start, but crawling is a crude measure of value. A single page might be crawled once and then cited in thousands of answers, or crawled over and over and never used at all. Creators want to be paid fairly for the value they provide.
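A toy calculation shows the mismatch. The figures below are hypothetical (one cent per crawl and one cent per use, with invented crawl and use counts), chosen only to contrast the two pricing units; they are not Cloudflare, Ceramic, or You.com prices.

```python
from dataclasses import dataclass


@dataclass
class PageActivity:
    """Hypothetical activity counts for one page over some period."""
    url: str
    crawls: int  # times an AI crawler fetched the page
    uses: int    # times the page was used in an answer or search result


RATE_CENTS = 1  # hypothetical price, in cents, per crawl or per use


def pay_per_crawl(page: PageActivity) -> int:
    return page.crawls * RATE_CENTS


def pay_per_use(page: PageActivity) -> int:
    return page.uses * RATE_CENTS


pages = [
    PageActivity("/cited-everywhere", crawls=1, uses=2_000),  # crawled once, cited often
    PageActivity("/never-used", crawls=500, uses=0),          # crawled repeatedly, never used
]

for page in pages:
    print(page.url, pay_per_crawl(page), pay_per_use(page))
# /cited-everywhere 1 2000
# /never-used 500 0
```

Under per-crawl pricing the unused page earns 500 times more than the page that actually informs answers; pricing the use reverses that.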

So we’re starting to shape Pay Per Crawl into Pay Per Use. We’re running experiments with top AI companies, like Ceramic.ai and You.com, and the arrangement is straightforward: organizations can bring their payment models and easily scale them to content owners across the Cloudflare network.

Ceramic has built what it calls a “pay-per-query” model, so publishers who opt in can be paid when their content appears in Ceramic’s search results. This means payment is designed to follow the value the work delivers rather than the number of times a crawler happens to fetch it.

“To scale the future of AI search, we need a partner with massive reach and a shared commitment to transparency and fair compensation,” says Anna Patterson, founder and CEO of Ceramic.ai. “Cloudflare allows us to easily and programmatically scale our operations. By bringing our pay-per-query model to their network, we ensure millions of content owners can seamlessly opt in to be compensated every single time their content appears in our search results.”

In addition to compensation, content owners participating in the Cloudflare/Ceramic program will unlock new reporting to help with answer engine optimization (AEO). Customers can finally see the top queries leading to their content appearing in search results, the specific webpage and snippet, their average search result ranking position, and more. This is the first of many products we’ll be launching to help our customers with discoverability.

This is just one emerging approach. Another comes from You.com: agents can pay on demand for a specific piece of premium content they need, without any upfront commitment. New payment models from AI providers are being tested (e.g., Pay per Query, Pay per Result, etc.) and we have the infrastructure to support them all.

We want to be honest that this is an experiment. There’s a lot to learn, including exactly how this holds up at the scale of the Internet. We’ll work that out with our partners and our customers as we go, and share what we learn. But the goal is clear: AI search companies get fresher, better-grounded answers, and the customers whose work makes the answers possible get paid when they help. Cloudflare’s job in all of this is to provide the infrastructure layer that makes this market flourish.

We think this is a more natural fit for where the economics of search are heading. The old, human web optimized search to save time — providing excerpts, ten blue links, and a click. The agentic Internet is different: an agent can read fast and search continuously. Search is becoming something an agent does dozens of times to answer a single question, closer to a utility than a destination. In that world, the unit that matters isn’t the crawl or the click. It’s the outcome. Pricing the outcome, and paying the people who made it possible, is how the web continues to thrive.

The headline we want to earn

A year ago on Content Independence Day, the headline was a default ‘no’: AI can’t crawl without compensation. This year, our focus is on giving our users more products and controls to say ‘yes’ on terms that benefit them.

Today’s announcements are just the beginning. Cloudflare’s research project is designed to test whether our signals produce better results with less crawling. Pay Per Use is a promising direction we’ll experiment with alongside partners who believe that content creators deserve fair compensation for their work. This is how the last 30 years of the web got built too: somebody runs the pilot that turns “the model is broken” into “here’s the new model,” one experiment at a time. We believe our customers benefit from being discoverable in this new agentic era and from optimizing their content for maximum discovery. But they should be able to do this without giving away their most valuable creative assets for free.

The web is changing, and the business models it’s relied on are changing with it. The old Internet was open, neutral, and worth contributing to. We have a rare chance to keep it that way, and to build the business models that fund it in the future. Smarter answers for humans and agents asking the questions. A fair deal for the people whose skill, creativity, and commitment make the answers worthwhile. That’s how we pursue Cloudflare’s mission: to help build a better Internet.

Happy Content Independence Day!

Building on the open, agent-ready web? If you are interested in learning more about the Ceramic and You.com programs, please fill out this form. If you’re building an answer engine and want to crawl smarter, we’d love to hear from you too: [email protected].

Content Independence Day, one year on: building the business model for the agentic Internet

2026-07-01 Arielle Weiss

Post Syndicated from Arielle Weiss original https://blog.cloudflare.com/agentic-internet-bot-report/

One year ago, we declared Content Independence Day. At the time, we could see what many in the industry were beginning to sense: the fundamental economics of the Internet were shifting. AI adoption was accelerating, publishers were experiencing rapid declines in referral traffic, and AI companies were crawling the web at unprecedented scale, often without clearly declaring intent, and almost always without compensation.

We changed the defaults. For all new domains on Cloudflare, AI training crawlers would be blocked by default unless domain owners chose otherwise. We didn’t do this to wall off the web. We did it because we believed a healthier ecosystem required transparency, control, scarcity, and ultimately, a market where high-quality content could be valued and exchanged fairly.

A year later, that market has emerged. But the transformation of the Internet has happened even faster than we anticipated. In this report, we share key data points that illustrate how quickly the business model of the Internet has shifted – and what this new content market means for publishers and site owners.

Part I: The Internet has changed – faster than anyone expected

The vertical adoption curve

AI is not just another technology cycle. It is a platform shift happening more than twice as fast as smartphone adoption. In just 3.5 years, 2.5 billion active users, over 30% of humanity, have made generative AI part of their regular routine. The adoption curve isn’t merely steep: it’s going vertical.

The decline of the open web

Never before have we seen such a rapid change in how humans interact with information, perform work, and spend time online.

The way people use the Internet is changing dramatically. Today, for every hour spent online searching for information, only 15 minutes is spent on the open web. Traditional search behavior is collapsing as users shift to AI-driven discovery and consumption. Instead of visiting multiple sites to source and compare information, users simply type a prompt and receive a nearly instantaneous, consolidated answer.

The agentic Internet is here

This year, agent traffic crossed a historic threshold for the first time: more than 50% of traffic on the Internet is now non-human. This shift has staggering implications for publishers, content owners, and the future of the open web.

Crawlers have changed their purpose

Among the crawlers Cloudflare identifies by purpose, the composition of traffic tells the story clearly:

52% of crawler requests are now for AI training as of June 2026, up from 22% in Spring 2025.
Mixed-use crawlers (those blending search, agent use, and training) represent over 36% of activity.
Pure search crawling now represents a small and declining share of overall crawler activity, despite remaining critical for publisher visibility.

As AI training becomes a primary driver of crawler activity, the ability to distinguish between discovery and training becomes increasingly important. Mixed-use crawlers blur that distinction and put content owners in a difficult position: remaining discoverable in the agentic era means giving away their most valuable content without compensation.

The old business model is gone

For decades, the economic model of the open web was straightforward. Content creators exchanged access to their content for visibility in search engines, which returned referral traffic. That traffic became the primary mechanism through which publishers, creators, and businesses generated economic value.

But today, that exchange is breaking down. Content is still being crawled, indexed, and used — but increasingly without corresponding traffic being returned to the source. As AI systems answer questions, compare products, conduct research, and complete tasks directly, information across the open web is increasingly becoming part of AI training and retrieval systems. The existential question this raises is simple: if content is consumed without audiences ever visiting the source, how do content creators sustain themselves?

The implications are industry-agnostic

The earliest industries to feel the impact were news organizations and media companies. Today, similar dynamics are impacting businesses across retail, software, IT, and finance. Some of the most heavily crawled categories have seen human traffic decline by as much as 40% in less than one year.

Many publishers are now preparing for what they call “Google Zero” — a world where little to no traffic comes from search referrals.

The implications extend to essentially every industry. Any organization that publishes proprietary information on the Internet will need to understand how to operate in an agentic era. This dynamic matters not just to content owners, but to all of us. The Internet is a critical part of the global economy and one of the world’s most important public resources for surfacing information. Ensuring it remains healthy and sustainable is essential for all.

Part II: The market has emerged

What we built

When we launched Content Independence Day, we committed to three things:

Transparency and control for site owners, enabling them to define how their content is accessed and monetized.
Tools that create scarcity, shifting the balance of power back to content owners.
A marketplace where content creators and AI companies of all sizes can discover, license, and determine the value of content more efficiently.

One year later, a market for monetized content is here, and the conditions for a dynamic marketplace are forming.

Transparency and control created scarcity

Historically, publishers have had limited visibility into how AI companies accessed and used their content. As referral traffic declined, that lack of visibility became an economic problem, prompting publishers to seek new ways to capture value.

Cloudflare’s attribution, business intelligence, and enforcement tools gave publishers visibility into AI consumption at the network level — an enforcement mechanism far more effective than voluntary standards like robots.txt. For the first time, publishers could determine how their content was accessed and monetized. That control created scarcity and drove a supply-and-demand content economy.

Scarcity created leverage

Publishers that exercised control over access successfully created scarcity, giving them negotiating leverage that led to better deals. For the first time, publishers gained operator-level attribution data — evidence of how often LLMs attempted to access their content, which competitive LLMs were crawling, what their most in-demand URLs were, and what their crawl-to-referral ratios looked like. This reduced information asymmetry in licensing discussions and enabled publishers to negotiate from a position of knowledge.

Leverage is changing the balance of power

This leverage has empowered our customers. As they have gained greater visibility into how AI systems access and use their content, they’ve become better equipped to understand the implications for their businesses and more confidently articulate the value of the information, brand, and audiences they have built.

As the balance of power between content owners and AI companies begins to change, a licensing economy is emerging:

More than 50 publisher-AI agreements have been signed since 2023.
Major AI companies now actively license content, increasingly recognizing the value of differentiated and premium content.
Collective licensing models continue to emerge and scale.
Large publishers are securing meaningful licensing agreements, demonstrating that content has real economic value within the AI ecosystem.

The conversation is no longer whether content should be compensated. The conversation now is how.

The market is maturing, but inefficiencies remain

Early licensing agreements proved demand exists, but licensing today remains largely bespoke and unlikely to fully replace lost referral, advertising, and affiliate revenue. As a result, publishers are increasingly optimizing for AI consumption alongside traditional human discovery while exploring new monetization pathways.

Supply and demand remain difficult to match efficiently, and while there’s an understanding that not all content carries the same value, content valuation is still unresolved.

The Google convergence problem

No discussion of this market is complete without addressing Google’s unique role. Google remains the dominant gateway to online discovery, accounting for approximately 88% of referral traffic. But increasingly, Google is helping users consume content directly within Google-owned AI experiences.

Discovery and consumption serve fundamentally different purposes. Search drives users to content, while AI-powered experiences increasingly summarize and reuse it without requiring users to visit the source. Website owners view these activities differently because one generates traffic, while the other increasingly substitutes for it.

These differences become especially important when site owners are deciding who should be allowed to access their content and for what purpose. Most leading AI companies separate discovery crawlers from training crawlers, making it relatively simple for publishers to enable content access for one purpose or the other. Google does not. Today, Google has access to roughly twice as much information as leading AI companies because Google leverages a mixed-use bot that makes it difficult for customers to participate in Google’s search ecosystem without also participating in Google’s AI ecosystem.

Unlike other AI providers, Google’s mixed-use crawler also limits transparency for site owners. Because discovery and AI access are combined into a single crawler, publishers cannot tell why Google is accessing their content or distinguish between traffic used for search and traffic used for AI experiences. They also lose the visibility and evidence that comes from being able to allow or block these activities independently at the network level.

This dynamic has accelerated demand for greater transparency and control, as well as new monetization models to better serve both content owners and AI companies of all sizes.

Part III: A unique view of the ecosystem

Cloudflare sits at the intersection of content owners and AI companies in the emerging agentic economy.

More than 20% of the web sits behind Cloudflare’s network. Of the world’s most-visited websites, 36% rely on our network, and more than 40% of the Fortune 500 are Cloudflare customers. Nearly 80% of leading AI companies use Cloudflare, alongside thousands of developers and emerging AI companies.

This position gives us visibility into both sides of the market. We see the content owners creating content, the AI companies consuming it, and the signals increasingly connecting them. That perspective has given us a unique view into how the market has evolved over the past year, and what it now requires.

Part IV: Lessons from an emerging market

As publishers and AI companies adapt to a new agentic economy, Cloudflare has gained a clearer understanding of what the ecosystem now needs.

Transparency must become the standard

Content owners increasingly need visibility and control over who is accessing their content, how it is being used, and for what purpose. AI companies increasingly recognize that transparency builds trust and reduces friction with publishers. Visibility and enforcement are no longer security concerns alone — they have become business requirements that directly influence licensing negotiations and commercial decision making.

To help make transparency the standard, Cloudflare is continuing to invest in enhanced attribution, measurement, and publisher controls that give content owners greater visibility into and control over how their content is accessed and used.

As the industry shifts toward greater transparency, we believe that verifiable bot self-identification and declarations of crawl intent are fundamental to a sustainable ecosystem. Today, more than one-third of crawler activity on our network still comes from mixed-use bots that make it impossible for content owners to distinguish crawl intent. We are actively engaging with the ecosystem and investing in tooling to help drive that number to zero by this time next year.

Better AI requires better signals

Over the past year, it has become increasingly clear that AI companies need more than access to content. They need better ways to determine what to access, when to access it, and how frequently it has changed. Indiscriminate crawling wastes compute for AI companies and creates unnecessary bandwidth burden for publishers, reducing efficiency across the ecosystem.

We believe better answers require better intelligence. We are investing in real-time freshness signals, enriched with trust, quality, and relevance data, to help AI companies discover differentiated information while reducing unnecessary crawling across the web.
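To make the idea concrete, here is a minimal sketch of how a crawler could use a per-URL freshness signal to skip pages that have not changed since its last fetch. The post does not describe the format of these signals, so the `FreshnessSignal` record and the `urls_to_recrawl` helper are hypothetical, and a single last-modified timestamp stands in for whatever richer signal is eventually offered.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class FreshnessSignal:
    """Hypothetical per-URL signal: when the publisher says the content last changed."""
    url: str
    last_modified: datetime


def urls_to_recrawl(signals: List[FreshnessSignal],
                    last_crawled: Dict[str, datetime]) -> List[str]:
    """Return URLs that changed after our last fetch, or that were never fetched."""
    stale = []
    for signal in signals:
        previous = last_crawled.get(signal.url)
        # Fetch only if we have no copy, or the publisher reports a newer version
        if previous is None or signal.last_modified > previous:
            stale.append(signal.url)
    return stale
```

HTTP conditional requests (`If-Modified-Since`, `ETag`) make the same comparison one request at a time; a published signal lets the crawler make that decision before sending any request at all.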

Markets need better discovery before better pricing

We believe better discovery must precede better pricing. In order for the market to mature, publishers and AI companies need better information about one another. We are investing in richer market intelligence, content signaling, and capabilities that improve discovery between both sides of the ecosystem, laying the foundation for more scalable market mechanisms over time.

Part V: Building the infrastructure for the agentic Internet

One year ago, Content Independence Day introduced a simple idea: content owners should have greater control over how AI companies access and use their information.

Over the past twelve months, that control helped give rise to a market. Transparency created scarcity. Scarcity created leverage. Leverage accelerated licensing. What was once a theoretical discussion about the future of AI and content has become an active market, with publishers, AI companies, and technology providers all adapting to a new set of economic realities.

The market is now entering a new phase that demands new infrastructure. As the Internet becomes increasingly agentic, the underlying systems that support it must evolve to handle permissions, licensing, and commercial transactions at scale. Content owners and AI companies need more efficient ways to connect and exchange value. We believe these capabilities will converge into programmable, scalable mechanisms for content discovery and monetization – reducing friction while unlocking richer forms of value exchange.

Cloudflare’s role is to build the infrastructure and business intelligence, and contribute to the standards that allow the market to determine value more efficiently and help publishers and AI companies participate in a healthier, more dynamic content economy.

The Internet has always evolved. This evolution is faster and more consequential than most. But with the right infrastructure, the right incentives, and a commitment to transparency, we believe the agentic Internet can become more sustainable, more efficient, and better for everyone.

Methodology:
The data in this report is compiled from Cloudflare Radar and the Cloudflare Investor Day 2026 Presentation.

Cloudflare Radar is a hub showcasing global Internet traffic, attack, and technology trends and insights. Powered by data from Cloudflare’s global network, Radar was created to help anyone understand what is happening on the Internet from a security, performance and usage perspective.

Cloudflare’s unique understanding of the Internet comes from its global network — one of the world’s largest, spanning 330+ cities in 100+ countries — and aggregated and anonymized data from Cloudflare’s 1.1.1.1 public DNS Resolver, widely used as a fast and private way to browse the Internet. More than 20% of the web sits behind Cloudflare’s network.

Announcing the Monetization Gateway: charge for any resource behind Cloudflare via x402

2026-07-01 Rohin Lohe

Post Syndicated from Rohin Lohe original https://blog.cloudflare.com/monetization-gateway/

Today, we are announcing the Cloudflare Monetization Gateway, an engine that will give Cloudflare customers the ability to charge for any asset protected by Cloudflare: web pages, datasets, APIs, or MCP tools.

It will provide a single control plane to manage payment policies and access controls across your applications, while also protecting your origin from high payment volumes by handling payment verification and enforcement at the edge. At launch, payments will settle in stablecoins over x402, the open protocol we are building with a coalition of more than 25 industry leaders via the x402 Foundation.

The evolving business model of the web

For 30 years, the web has run on a simple economic bargain: trading content for human attention. That attention has been monetized through advertising, subscriptions, and e-commerce. This bargain funded the Internet as we know it.

But as agents become the dominant Internet users, the model is breaking. An agent does not look at ads or need to maintain a monthly subscription to all the tools it wants to access. It reads a page or consumes a data feed once, takes what it needs, and moves on. Across the web, AI crawlers already request content anywhere from a hundred to tens of thousands of times for every visitor they send back.

This reality demands a new model: usage-based pricing for everything. If attention and e-commerce are moving from websites to AI harnesses and AI-written software, then agents should pay for the inputs they need — training data, inference content, developer tooling, and API usage. The natural unit of payment for software is the request, the token, or the outcome, not the seat or the month. A few examples of what that could look like:

A few cents per web search, billed per call
$0.001 base fee plus a $0.01 per MB charge for an upload endpoint
$0.99 per resolved support escalation, paid only when the work succeeds
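The upload-endpoint example reduces to simple arithmetic. Here is a minimal sketch, assuming prices are tracked in integer micro-dollars (USDC, for instance, uses six decimal places), that a megabyte means 1,000,000 bytes, and that a partial-megabyte charge is rounded up to the next whole micro-dollar; the function names are hypothetical.

```python
from decimal import Decimal

MICRO_USD_PER_USD = 1_000_000  # one micro-dollar = $0.000001
BYTES_PER_MB = 1_000_000       # assumption: decimal megabytes


def upload_price_micro_usd(upload_bytes: int,
                           base_fee_micro: int = 1_000,       # $0.001 per call
                           per_mb_micro: int = 10_000) -> int:  # $0.01 per MB
    """Base fee plus a per-MB usage charge, rounded up to a whole micro-dollar."""
    # Ceiling division: -(-a // b) rounds up, so small uploads are never free
    usage_micro = -(-upload_bytes * per_mb_micro // BYTES_PER_MB)
    return base_fee_micro + usage_micro


def to_usd(micro: int) -> Decimal:
    """Convert integer micro-dollars to an exact decimal dollar amount."""
    return Decimal(micro) / Decimal(MICRO_USD_PER_USD)
```

Integer units avoid floating-point drift when thousands of sub-cent charges are summed. For example, a 2.5 MB upload costs 1,000 + 25,000 = 26,000 micro-dollars, or $0.026.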

This is the same shift behind paying creators when an answer engine uses their content — a fair exchange of value whenever content or a resource is used, priced on neutral rails built for the purpose. People often envision an agent buying high-priced assets like web domains, but most of what an agent pays for sits upstream of any checkout, and is priced far lower.

Some of the Internet already works this way. Cloud and APIs have been sold by the call and by the hour for years, but only to a known buyer: a user signs up, they are issued an API key, and they incur usage-based metered billing. Content mostly skipped payment and ran on advertising instead. These business models have never been able to serve unverified buyers for sub-cent transactions because the payment rails cost too much and took too long to settle. Below a certain price, collecting the payment cost more than the payment was worth.

Historically, usage-based billing was difficult to implement. Businesses needed to effectively become payments companies, running their own accounting to track internal usage in a robust and auditable way. Tracking this usage required significant overhauls of backend systems. Many instead chose per-seat pricing because it is simpler and frequently more profitable.

Agents flip this dynamic. A single agent can do the work of an entire team around the clock, which leaves a flat per-seat fee disconnected from actual consumption. At the same time, an agent can make thousands of micropayments without friction, whereas asking a person to approve each payment would be impossibly burdensome. Usage-based price points are where agents live and where stablecoin-based micropayments shine. That’s because stablecoins (such as Open USD and USDC) allow buyers to transfer tiny sums across the Internet, incurring negligible fees and settling in less than a second. This is not feasible with other payment rails today.

Here’s where we can help. Cloudflare has spent years building usage-based accounting for our own billing systems and for our customers’ analytics. We can dramatically simplify the implementation of usage-based billing for web-based assets thanks to our position as a proxy layer between buyers and sellers. With Cloudflare supporting usage-based billing, the evidence of payment can move into the request itself, and the payment-validation path and the request path merge.

And here’s the benefit to you: the metering, the payment exchange, and the settlement move off your origin. What stays with you is what matters — your rules, your prices, and your revenue. You will not need to onboard the buyer or stand up a billing system. You will write a rule and agentic buyers will pay for what they use.

A refresher on x402

Last year on Content Independence Day, we gave site owners one-click control over which AI crawlers could reach their content, and with Pay Per Crawl we let them charge crawlers for it. The Monetization Gateway is the next step: instead of only charging crawlers for content, you will be able to charge any caller for any resource, from an API to data to an MCP tool call, and you will not have to build the payment machinery yourself.

x402 is an open protocol that makes it possible to pay over HTTP, named for the 402 status code it finally puts to use. The x402 exchange is simple: a client requests a payment-gated resource. Instead of serving it, the server responds with 402 Payment Required and a small payload that states the price, the accepted asset, and where to pay. The client pays and repeats the request with proof of payment attached. A facilitator verifies, and the server returns the resource. It all happens inside ordinary HTTP requests and responses, with no redirect to a checkout page and no separate payment API to call. Settlement happens peer-to-peer, so any funds that a buyer sends to a seller are directly deposited to the seller’s wallet. We are designing the Monetization Gateway to keep payment overhead low and are aiming for sub-second payment settlement.
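The handshake can be simulated in a few dozen lines. This is a toy model, not the x402 wire format: the header name, the quote fields, and the `ToyFacilitator` class are hypothetical. Real x402 clients send a signed payment payload that the facilitator verifies and settles on-chain; here, payment collapses into a `settle` call that returns an opaque receipt.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentRequirements:
    """Simplified stand-in for the price quote a server returns with a 402."""
    price_micro_usd: int  # amount in millionths of a dollar
    asset: str            # e.g. a stablecoin ticker
    pay_to: str           # seller's wallet address (placeholder value below)


class ToyFacilitator:
    """Records settled payments and verifies each receipt exactly once."""

    def __init__(self) -> None:
        self._settled: dict = {}  # receipt -> (amount, pay_to)

    def settle(self, amount: int, pay_to: str) -> str:
        receipt = secrets.token_hex(8)  # opaque proof-of-payment token
        self._settled[receipt] = (amount, pay_to)
        return receipt

    def verify(self, receipt: str, quote: PaymentRequirements) -> bool:
        # pop() makes each receipt single-use, so a replayed proof is rejected
        entry = self._settled.pop(receipt, None)
        return (entry is not None
                and entry[0] >= quote.price_micro_usd
                and entry[1] == quote.pay_to)


QUOTE = PaymentRequirements(price_micro_usd=10_000, asset="USDC", pay_to="0xSELLER")
PROOF_HEADER = "X-Payment-Proof"  # hypothetical; x402 defines its own header names


def serve(headers: dict, facilitator: ToyFacilitator) -> tuple:
    receipt = headers.get(PROOF_HEADER)
    if receipt is None or not facilitator.verify(receipt, QUOTE):
        return 402, QUOTE            # Payment Required, with the price quote
    return 200, "premium content"    # paid: serve the resource


def fetch(facilitator: ToyFacilitator) -> tuple:
    status, body = serve({}, facilitator)                                # 1. ordinary request
    if status == 402:
        receipt = facilitator.settle(body.price_micro_usd, body.pay_to)  # 2. pay the quoted price
        status, body = serve({PROOF_HEADER: receipt}, facilitator)       # 3. retry with proof
    return status, body
```

Making each receipt single-use is a design choice in this sketch, so that one payment cannot be replayed to unlock the resource twice.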

^{x402 Payment Flow: AI Agent ↔ API Server ↔ Blockchain. Source: x402 README on GitHub}

Two properties make x402 a good fit for machine payments. The payment amounts can be small, down to fractions of a cent, because the protocol adds almost no overhead. And the buyer needs no account with the seller, because the payment itself is the credential. x402 is rail-agnostic, but it is a natural fit for stablecoins, which can settle in under a second for a fraction of a cent with zero chargebacks.

What the Monetization Gateway does

The Monetization Gateway will provide a flexible payment rules API that will allow you to express exactly when you want a caller to pay to access your digital resources.

Here’s how it will work. Tokens, APIs, MCP tool calls, and data already flow through Cloudflare’s proxy layer between buyers and sellers. You will decide, as precisely as you want, which of that traffic has to pay. And you will be able to enforce your decisions by writing expressions, similar to expressions that you already write for other Cloudflare rules, in a simple, dedicated product API. The Monetization Gateway will scale with Cloudflare’s global network across 330+ cities, which means that the x402 handshake will occur in close proximity to your buyer. This will reduce request latency and protect your origin.

A few examples of planned capabilities:

Charge for specific REST verbs: Require payment on calls to a specific route, for example $0.01 for every GET or POST request to /api/premium/*.
Variable pricing: Charge variable amounts for tasks of varying complexity, for example, image generation might charge any amount up to $2, depending on the compute used.
Charge only unauthenticated callers: Intercept HTTP 401 “Unauthorized” responses from your origin and return 402 “Payment Required” instead with pricing and payment instructions.
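The post says these rules will be written as expressions similar to other Cloudflare rules but does not show that syntax. As a stand-in, here is a minimal sketch in plain Python of the first and third examples: a hypothetical `PaymentRule` that prices GET and POST calls under `/api/premium/*` at $0.01 (10,000 micro-dollars), and a helper that turns an origin 401 into a 402 carrying a price. Neither is the Gateway's actual rule format.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class PaymentRule:
    methods: FrozenSet[str]  # HTTP verbs the rule applies to
    path_glob: str           # shell-style pattern; "*" also matches "/"
    price_micro_usd: int     # price per matching request


RULES = [PaymentRule(frozenset({"GET", "POST"}), "/api/premium/*", 10_000)]  # $0.01


def price_for(method: str, path: str, rules=RULES) -> Optional[int]:
    """Price of the first matching rule, or None if the request is free."""
    for rule in rules:
        if method.upper() in rule.methods and fnmatchcase(path, rule.path_glob):
            return rule.price_micro_usd
    return None


def rewrite_unauthorized(origin_status: int, price_micro_usd: int) -> tuple:
    """Turn an origin 401 into a 402 carrying a price; pass other statuses through."""
    if origin_status == 401:
        return 402, {"price_micro_usd": price_micro_usd}
    return origin_status, None
```

`fnmatchcase` was chosen for brevity; its `*` also matches `/`, so `/api/premium/a/b` is priced as well.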

When a request matches, the Monetization Gateway will verify payment before letting it through. You will be able to set these rules in the dashboard, or manage them as code through the Cloudflare API and Terraform, so a paid endpoint is just another part of your infrastructure config.

The Monetization Gateway will initially allow users to require buyers to pay for services and resources in stablecoins. Sellers will be able to use the stablecoins they accumulate for their own transactions or redeem the stablecoins for equivalent fiat currency in their bank account. Using the Monetization Gateway offers a way to increase the addressable market for your products. With the Gateway, agents can request your resource, be told the price, pay, and get the response. No signup, no API key, no prior relationship required. You will decide how much you need to know about that buyer, and you will have the flexibility to require agents to authenticate with Web Bot Auth and apply usage-based pricing against accounts they already hold.

Where we see this going

The Monetization Gateway will turn the request into a payment and give Cloudflare customers new revenue opportunities, but where this goes is far bigger.

An agent is software that acts on a user’s behalf, and agents are starting to act autonomously. Soon they will carry wallets and buy what they need without a person in the loop: a dataset, an API call, a tool, a block of compute. Some of those resources will be free, and some will require proof of who the agent is and who it acts for, through verified agent identity. Many will require both an identity and a payment, and Cloudflare is one of the few places that will be able to settle all of it inside a single request, by verifying the agent, applying the rule, and checking the payment before the origin ever sees the call. The agent becomes the primary buyer on the Internet, and the request becomes the transaction.
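The order of checks described above (verify the agent, apply the rule, check the payment, and only then forward to the origin) can be sketched as a single edge handler. Every callable here is a hypothetical placeholder: the sketch does not implement Web Bot Auth signature verification or x402 settlement, only the sequence in which a proxy would apply them.

```python
from typing import Callable, Optional, Tuple

Request = dict
Response = Tuple[int, object]


def handle_at_edge(
    request: Request,
    verify_identity: Callable[[Request], Optional[str]],
    policy: Callable[[Request, Optional[str]], Tuple[bool, Optional[int]]],
    verify_payment: Callable[[Request, int], bool],
    origin: Callable[[Request], Response],
) -> Response:
    # 1. Identify the agent (None if it presented no valid identity)
    agent = verify_identity(request)
    # 2. Apply the seller's rule: is identity required, and what does this cost?
    needs_identity, price = policy(request, agent)
    if needs_identity and agent is None:
        return 401, "verified agent identity required"
    # 3. Check payment whenever the rule sets a price
    if price is not None and not verify_payment(request, price):
        return 402, {"price_micro_usd": price}
    # 4. Only now is the call forwarded to the origin
    return origin(request)
```

Because the origin callable is reached only after every check passes, unidentified or unpaid calls never consume origin resources.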

There is an enormous amount of value moving across the Internet today that goes unmonetized or undermonetized, not because no one would pay for it, but because the tools to charge for it have never existed. Every useful API call, every answer, every tool invocation an agent makes has value, and almost none of it is paid for today. That is the opportunity in front of us, and it is what the Monetization Gateway will unlock.

This is what we are building toward: an agent-first Internet with Internet-scale settlement built in. Where the people who make something worth paying for get paid by the software that uses it, automatically. And where the smallest new API can reach the same buyers, on the same terms, as the largest company on the web, and the independent creator is paid by the large language models that use their work. That is the next business model of the Internet, and we are building to power it.

Sign up for our waitlist

The Monetization Gateway waitlist is open now for Cloudflare customers. If you’re interested in monetizing your web page, dataset, API, or MCP tool with usage-based pricing, please join our early access list.

Unmasking the crawls with Attribution Business Insights

2026-07-01 Jin-Hee Lee

Post Syndicated from Jin-Hee Lee original https://blog.cloudflare.com/attribution-business-insights/

Original content is the lifeblood of conversations and curiosities. Imagine a world without it: we could find a thousand ways to regurgitate the same material that’s already been created, but we would witness the decline of fresh ideas and arguments.

Website owners fuel the ecosystem of ideas, news, and interesting tidbits, but they face the increasingly complex challenge of managing traffic to their websites and being paid for their content. While some bot traffic is clearly malicious, it isn’t always obvious when a particular AI crawler is helping or harming your business. To answer this, site owners need granular, reliable data to differentiate between traffic that provides value, and traffic that strains resources while eroding the foundation of their business model: actual humans consuming their content.

At Cloudflare, we hold a core belief: website owners have the right to control access to their content. We want to help website owners maintain their high-quality content and regulate AI traffic.

To provide much-needed clarity and help website owners take control, we’re excited to announce the new Attribution Business Insights dashboard — designed with business decision-makers and publishers in mind.

The new economics of the Internet

For decades, the business model of the Internet relied on a straightforward, unspoken agreement: website owners allowed search engines to crawl their content and, in return, search engines sent readers back to their pages. This symbiotic relationship, where traditional search engines operated with a balanced “crawl-to-referral” ratio, generated the pageviews needed to sustain advertising, affiliate revenue, and subscriptions. Search index crawlers would scan your content a couple of times for each referral sent, so making your website available to crawlers had a clear pipeline to additional revenue. We can think of this as the SEO (Search Engine Optimization) era.

Today, the explosive rise of AI crawlers and agents has broken this contract, plunging the digital publishing industry into an unprecedented crisis. The Internet is risking a transition into a “zero-click” ecosystem where AI chatbots scrape original content to synthesize instant answers — completely bypassing the original sources. We’ve already seen a marked shift from the SEO-only world into an AEO (Answer Engine Optimization) world, and now conversations around GEO (Generative Engine Optimization) are taking center stage.

The imbalance of this new reality is made clear by the crawl-to-referral ratios we see across the Internet today. While traditional search engines had a more balanced ratio of crawls to legitimate visitors referred, major AI crawlers operate on a drastically different, extractive scale. Bots from leading AI companies have been observed with a wide range of crawl-to-referral ratios: around the time of our Content Independence Day in 2025, we noted ratios ranging from 118:1 to nearly 50,000:1. In other words, an AI crawler might have crawled your premium content tens of thousands of times just to send back a single visitor. This ratio is fundamentally unfair.

For publishers, this creates a double hit: first, they’re losing out on the crucial referral traffic, ad impressions, and direct audience relationships that fund content creation and journalism. Second, they’re forced to bear the rising infrastructure costs of hosting and serving content to automated bots that offer no commercial value in return. The era in which it makes sense to allow all crawlers in the hopes of being discovered is over.

Introducing Attribution Business Insights

We want website owners to have the facts — the cold, hard numbers to understand which bots are helping their business and which bots are harming it. We also want to make this analysis easier than ever, which is why we’ve designed Attribution Business Insights to cut the noise, focusing on the details that our customers have told us are most important.

Today, the Attribution Business Insights dashboard is available to all Cloudflare Bot Management customers. The new dashboard is designed to deliver a targeted view of bot traffic flowing to your website; unlike traditional analytics tools that may require extensive manual filtering, this dashboard provides you with key insights right away.

We set out to answer the most pressing questions for site owners today: How should you think about AI traffic on your websites? What is the value of different audiences — including humans, non-AI bots, and AI bots? And most importantly, what is your data being used for?

^{The new Attribution Business Insights dashboard view, which includes insights about bot traffic overall, a site-wide crawl-to-referral ratio, and the distribution of AI bot traffic vs. organic traffic.}

To answer these questions, the dashboard displays a powerful array of data and insights:

Bot traffic to content pages: View your overall bot vs. human traffic, as well as the volume of all bots successfully accessing content.
Crawl-to-referral ratios: See your site-wide crawl-to-referral ratio over the past 24 hours, seven days, or 30 days. You can also see crawl-to-referral ratios per bot operator (the company that owns one or more bots).
Top bots breakdown: A list of top bots by volume, including their country of origin, bandwidth they take up on your website, and whether you’re currently blocking or allowing them.
Updated classification based on crawler behavior: We go beyond a generic label of “AI Crawler” by classifying crawlers with our updated taxonomy, whether it’s Training (i.e., training the next version of an LLM chatbot), Search (i.e., refreshing databases for Retrieval-Augmented Generation), or Agent (i.e., used in agentic interaction to return answers to an end user).
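Behind a per-operator view like this is a simple aggregation: group crawl and referral events by operator over the chosen window, then divide crawls by referrals. Here is a minimal sketch, assuming a hypothetical event log in which each record already carries an operator name and, for crawls, one of the three purpose labels; the post does not describe the dashboard's actual data model. A crawler that sends no referrals is reported with an infinite ratio.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOWS = {"24h": timedelta(hours=24), "7d": timedelta(days=7), "30d": timedelta(days=30)}


@dataclass(frozen=True)
class Event:
    ts: datetime
    operator: str      # company that owns the bot (hypothetical field)
    kind: str          # "crawl" or "referral"
    purpose: str = ""  # for crawls: "Training", "Search", or "Agent"


def operator_report(events, now: datetime, window: str = "7d") -> dict:
    """Per-operator crawls, referrals, crawl-to-referral ratio, and purpose mix."""
    start = now - WINDOWS[window]
    crawls, referrals, purposes = Counter(), Counter(), {}
    for e in events:
        if not (start < e.ts <= now):
            continue  # outside the reporting window
        if e.kind == "crawl":
            crawls[e.operator] += 1
            purposes.setdefault(e.operator, Counter())[e.purpose] += 1
        elif e.kind == "referral":
            referrals[e.operator] += 1
    report = {}
    for op in set(crawls) | set(referrals):
        c, r = crawls[op], referrals[op]
        report[op] = {
            "crawls": c,
            "referrals": r,
            "ratio": c / r if r else float("inf"),
            "purposes": dict(purposes.get(op, {})),
        }
    return report
```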

From data to business strategy

You shouldn’t have to be a security expert to understand how AI crawlers affect your business. If website owners want to spend just a few minutes ingesting the high-level insights, they can walk away with a clear temperature check of the effectiveness of their content security policy.

For those who want to do a little more digging to understand how AI companies are making use of their content — or collect information to guide how they want their relationships with AI companies to develop — we show a more granular view organized by bot operator.

^{Breakdown of bot activity on a website, with important details for each bot such as type, crawl-to-referral ratio, and current action.}

By having a consolidated view of companies seeking to access content on your website, you can develop a better baseline of crawler activity. We want this data to equip our customers to step into any business conversation with the facts on their side. Tell Company1 that their crawl volume is twenty times that of Company4’s, and that Company4 is already compensating you for content. Revisit the way that Company2 licenses your content based on their recent activity. This new dashboard helps move business conversations forward.

How does this new layer of visibility tie into the existing tools you have to protect your website from abuse? In line with other features of Bot Management, the action step still happens in Security rules. To avoid adding noise to the control plane, Attribution Business Insights is intended to be a hub for thoughtful, filtered analytics, rather than another place to take action. This dashboard serves as a central source of information, allowing you to investigate before taking action in the same rule engine that governs other abuse mitigations. We also want to be loud and clear about inviting business decision-makers into this dashboard, acknowledging that conversations around AI traffic have a wider set of stakeholders than only security-specialized users.

What’s next

The Attribution Business Insights dashboard is the next critical step in providing website owners with the transparency and control they need to manage evolving AI bot threats, and more broadly, shape the new dynamics of the Internet. We’re already investigating the next iteration with close publishing partners to create a visibility plane that covers security from the perspective of the website owner with valuable, original content to share.

A sneak preview below shows a new view that dissects crawler activity per article to reveal the appetite that AI companies have for different pieces of content, different campaigns, and so on.

^{Breakdown of most popular articles, according to traffic volume. Shows key metrics such as AI bot traffic vs. other bot traffic vs. human traffic, both direct and from a referral.}

Visibility is the first piece, and there’s more to come to empower website owners to take control of their content in this new age. We encourage all customers of Cloudflare Bot Management — especially those driving business conversations — to access this today for a fresh take on analytics.