Measuring the Internet’s pulse: trending domains now on Cloudflare Radar

Post Syndicated from Sabina Zejnilovic original http://blog.cloudflare.com/radar-trending-domains/

In 2022, we launched the Radar Domain Rankings, with top lists of the most popular domains based on how people use the Internet globally. The lists are calculated using a machine learning model that uses aggregated 1.1.1.1 resolver data that is anonymized in accordance with our privacy commitments. While the top 100 list is updated daily for each location, typically the first results of that list are stable over time, with the big names such as Google, Facebook, Apple, Microsoft and TikTok leading. Additionally, these global big names appear for the majority of locations.

Today, we are improving our Domain Rankings page and adding Trending Domains lists. The new data shows which domains are currently experiencing an increase in popularity. While the top popular domains lists aim to show domains of broad appeal and of interest to many Internet users, the trending domains lists show domains that are generating a surge in interest.

When we started looking at the best way to generate a list of trending domains, we needed to answer the following questions:

  • What type of popularity changes do we want to capture?
  • What should we use as a baseline to calculate the change?
  • And how do we quantify it?

We soon realized that we needed two lists: one reflecting sudden increased interest related to a particular event or topic, showing spikes in popularity for domains that jump in the ranking from one day to the next, and another reflecting steady growth in popularity, showing domains that are increasing their user base over a longer period.

For this reason, we are launching both the Trending Today and Trending This Week top 10 lists to capture the two different types of popularity increase.

To select the baseline for calculating the increase in popularity, we analyzed the volatility of the Radar Domain Ranking list for different top list sizes. The advantage of starting with the Radar Ranking lists is that they already incorporate a good popularity metric that quantifies the estimated relative size of the user population that accesses a domain over some period of time. You can read more about how we define popularity in our “Goodbye, Alexa. Hello, Cloudflare Radar Domain Rankings” blog.

As expected, smaller lists were more stable: the percentage of domains in the top 100 that changed rank from one day to the next was much lower than the percentage that changed rank in the top 10,000. Hence, to have a dynamic daily list of trending domains, we had to look beyond the top 100 most popular domains.

However, we did not want to go all the way to the long tail of the list, as we already know that the ranks there are based on “significantly smaller and hence less reliable numbers” (see the paper "A Long Way to the Top: Significance, Structure, and Stability of Internet Top Lists"). Hence, we selected an appropriate list size for each location, based on the distribution of the number of DNS queries per domain. For example, for the Worldwide trending list we analyzed the top 20,000 most popular domains, for Brazil the top 10,000, for Angola the top 5,000, and for the Faroe Islands the top 500.

We then evaluated how much the domains change rank from one day to the next.

We saw that, on average, the biggest day-to-day changes in the top lists happen from Fridays to Saturdays and from Sundays to Mondays, so on Saturdays and Mondays the lists have the least overlap with the lists of the previous day. We also compared rank changes between corresponding weekdays, say from one Monday to the next, and saw that, on average, Monday rankings overlap more with the rankings of the previous Monday than with those of the preceding Sunday. From this we decided that, in order to capture which domains are trending due to the weekend effect, we needed to compare a domain's daily rank to its rank on the previous day(s), and not to its rank on the corresponding weekday.
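The two stability measures discussed above, list overlap and the share of domains that changed rank, can be illustrated with a minimal Python sketch. The function names and the sample domains are hypothetical, and this is not the code used for the analysis; it only shows one straightforward way to compute such percentages for two ordered top lists.

```python
from typing import Sequence


def overlap_percent(today: Sequence[str], yesterday: Sequence[str]) -> float:
    """Share of today's domains that also appeared in yesterday's list."""
    if not today:
        return 0.0
    shared = set(today) & set(yesterday)
    return 100.0 * len(shared) / len(set(today))


def changed_rank_percent(today: Sequence[str], yesterday: Sequence[str]) -> float:
    """Share of today's domains whose position differs from yesterday's.

    A domain that was absent yesterday counts as changed.
    """
    if not today:
        return 0.0
    previous_position = {domain: i for i, domain in enumerate(yesterday)}
    changed = sum(
        1 for i, domain in enumerate(today) if previous_position.get(domain) != i
    )
    return 100.0 * changed / len(today)


# Hypothetical four-entry top lists for two consecutive days.
sunday = ["a.example", "b.example", "c.example", "d.example"]
monday = ["a.example", "c.example", "b.example", "e.example"]

print(overlap_percent(monday, sunday))       # 75.0
print(changed_rank_percent(monday, sunday))  # 75.0
```

Running the same comparison for Monday against the previous Monday, instead of against Sunday, would give the weekday-to-weekday view described above.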

However, we also did not want to show as trending those domains that oscillate heavily in the rankings, jumping up and down from one day to the next and showing up as trending every few days. Hence, we could not simply compare the daily rank with the rank from the day before. Instead, as a compromise between capturing the most recent trends, including the weekend trends, and filtering out the domains whose ranking oscillates over a short period of time, we decided to compare the domain's daily rank with its best rank of the previous four days.

Then, to calculate the increase in popularity, we simply calculate the percentage change in the current rank compared to the best rank of the previous four days.
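As a rough illustration of this calculation, here is a minimal Python sketch. The function name, the choice to express the change relative to the best previous rank, and the sample ranks are all assumptions made for the example; the text above only states that the percentage change against the best rank of the previous four days is used.

```python
from typing import Optional, Sequence


def trending_today_score(
    current_rank: int, previous_ranks: Sequence[int]
) -> Optional[float]:
    """Percentage rank improvement versus the best rank of the previous days.

    Ranks are 1-based, so a smaller number means a more popular domain.
    Returns None when the domain did not improve on its recent best rank.
    """
    if not previous_ranks:
        return None
    best_previous = min(previous_ranks)  # best rank = smallest number
    if current_rank >= best_previous:
        return None  # no improvement, so not a trending candidate
    # Assumption: the change is expressed relative to the best previous rank.
    return 100.0 * (best_previous - current_rank) / best_previous


# Hypothetical ranks for the previous four days, followed by today's rank.
print(trending_today_score(150, [900, 1200, 600, 750]))  # 75.0
```

Because the comparison uses the best rank of the window rather than yesterday's rank, a domain that bounces between rank 600 and rank 1,200 every other day does not score as trending each time it returns to 600.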

For calculating the domains steadily growing over the week, we used a slightly different approach.

We want to highlight domains that keep improving their rank day by day, especially those that have been trending strongly in the most recent days. Therefore, we decided not to directly compare the current rank with the best rank during the previous week. Instead, we looked at the weighted average of the daily rank improvement, with more recent days given more weight, and compared it with the best rank of the previous six days.
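The weekly approach can be sketched in Python as follows. The weights are not specified above, so the linearly increasing weights used here are purely an assumption, as are the function name, the way the weighted average is normalized by the best previous rank, and the sample ranks.

```python
from typing import Sequence


def trending_week_score(ranks: Sequence[int]) -> float:
    """Weighted daily rank improvement relative to the best previous rank.

    `ranks` holds one rank per day, oldest first; the last entry is today.
    Ranks are 1-based, so a smaller number means a more popular domain.
    """
    if len(ranks) < 2:
        raise ValueError("need at least two days of ranks")
    # Positive value = the domain moved up (toward rank 1) on that day.
    improvements = [prev - cur for prev, cur in zip(ranks, ranks[1:])]
    # Assumed weights: 1 for the oldest step, increasing by 1 per day,
    # so the most recent day counts the most.
    weights = range(1, len(improvements) + 1)
    weighted_avg = sum(w * d for w, d in zip(weights, improvements)) / sum(weights)
    best_previous = min(ranks[:-1])  # best rank before today
    return 100.0 * weighted_avg / best_previous


# Hypothetical ranks for the previous six days plus today.
late_climber = [1000, 950, 900, 800, 700, 550, 400]
early_climber = [1000, 850, 700, 600, 500, 450, 400]

print(trending_week_score(late_climber))
print(trending_week_score(early_climber))
```

Both sample domains move from rank 1,000 to rank 400 over the week, but under these assumed weights the one whose largest gains came in the most recent days gets the higher score.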

What do these lists look like at the end? We compiled the lists for the eventful days of June 21 to 24.

On June 22, nba.com was trending in 28 locations, shown in the table below: the United States, as expected, but also Austria, Australia and Japan, to name a few, reflecting interest in the NBA Draft 2023.

Trending Today data from Friday, June 23, 2023:

Location Trending rank Domain
Albania 5 nba.com
Argentina 9 nba.com
Australia 1 nba.com
Austria 9 nba.com
Belgium 5 nba.com
Canada 5 nba.com
Chile 6 nba.com
Colombia 3 nba.com
Dominican Republic 5 nba.com
Greece 2 nba.com
Honduras 6 nba.com
Hong Kong 1 nba.com
India 7 nba.com
Indonesia 4 nba.com
Ireland 3 nba.com
Japan 9 nba.com
Mexico 2 nba.com
New Zealand 1 nba.com
Norway 1 nba.com
Philippines 1 nba.com
Poland 9 nba.com
Serbia 2 nba.com
South Korea 3 nba.com
Taiwan 1 nba.com
Thailand 1 nba.com
Ukraine 1 nba.com
United States 6 nba.com
Venezuela 4 nba.com

Two domains trending in multiple locations on Saturday, June 24, were rt.com, a Russian news site in English, and liveuamap.com, a site with an interactive map of Ukraine. These are probably effects of the events related to the Wagner group on June 23 and 24. Related to the same events, the domain jetphotos.com was trending on the same day in Russia, Norway and Albania.

Trending Today data from Saturday, June 24, 2023:

Location Trending rank Domain
Armenia 4 rt.com
Australia 5 rt.com
Belgium 2 rt.com
Bulgaria 9 rt.com
Canada 6 rt.com
Denmark 6 rt.com
Greece 6 rt.com
Italy 2 rt.com
Kazakhstan 8 rt.com
Lebanon 4 rt.com
Netherlands 8 rt.com
Papua New Guinea 9 rt.com
Singapore 2 rt.com
Spain 6 rt.com
Turkey 4 rt.com
United Kingdom 5 rt.com
United States 3 rt.com
Uzbekistan 2 rt.com

Other domains trending in various locations on Friday and Saturday were different Gaming and Video Streaming domains such as roblox.com, twitch.tv and callofduty.com, showing an increased interest in gaming activities as the weekend approaches.

Yet another interesting effect of the weekend was the presence of five weather forecast sites among the top 10 trending sites in Croatia on Friday, showing a preoccupation with summer weekend plans.

Trending Today in Croatia (data from Friday, June 23, 2023)

Trending rank Domain Category
1 lightningmaps.org Weather; Education
2 freemeteo.com.hr Weather
3 Vrijeme.hr (Croatian Meteorological and Hydrological Service) Politics, Advocacy, and Government-Related
4 arso.gov.si
5 rain-alarm.com Weather; News & Media
6 sorbs.net Information Security
7 neverin.hr Information Technology
8 meteo.hr (Croatian Meteorological and Hydrological Service) Business
9 gamespot.com Gaming; Video Streaming
10 grad.hr Business

These were all examples of daily trending domains, but what domains have steadily grown in popularity that week?

In multiple countries, travel sites such as booking.com, rentcars.com and amadeus.com were trending that week, as many people were making their summer vacation plans. Weather forecast sites, specifically the windy.com domain, were also trending the whole week in locations such as the Dominican Republic, Saint Lucia and Reunion, which was not surprising as the hurricane season had begun.

Trending This Week (week of June 17-23, 2023)

Dominican Republic Reunion Saint Lucia
cecomsa.com atera.com adition.com
blur.io sharethis.com windy.com
pxfuel.com windy.com bbc.co.uk
windy.com baidu.com ampproject.org
mihoyo.com inmobi.com aniview.com

Final words

Both the Trending Today and Trending This Week top 10 lists are available starting today on Radar and through the Radar API. Feel free to explore them and see what is trending on the Internet.

Visit Cloudflare Radar for additional insights on Internet disruptions, routing issues, Internet traffic trends, attacks, Internet quality, and more. Follow us on social media at @CloudflareRadar (Twitter), cloudflare.social/@radar (Mastodon), and radar.cloudflare.com (Bluesky), or contact us via e-mail.

Popular domains are domains of broad appeal based on how people use the Internet. Trending domains are domains that are generating a surge in interest.

Goodbye, Alexa. Hello, Cloudflare Radar Domain Rankings

Post Syndicated from Celso Martinho original https://blog.cloudflare.com/radar-domain-rankings/

The Internet is a living organism. Technology changes, shifts in human behavior, social events, intentional disruptions, and other occurrences change the Internet in unpredictable ways, even to the trained eye.

Cloudflare Radar has long been the place to visit for accessing data and getting unique insights into how people and organizations are using the Internet across the globe, as well as those unpredictable changes to the Internet.

One of the most popular features on Radar has always been the “Most Popular Domains,” with both global and country-level perspectives. Domain usage signals provide a proxy for user behavior over time and are a good representation of what people are doing on the Internet.

Today, we’re going one step further and launching a new dataset called Radar Domain Rankings (Beta). Domain Rankings is based on aggregated 1.1.1.1 resolver data that is anonymized in accordance with our privacy commitments. The dataset aims to identify the top most popular domains based on how people use the Internet globally, without tracking individuals’ Internet use.

There are a few reasons why we’re doing this now. One is obviously to improve our Radar features with better data and incorporate new learnings. But also, ranking lists are used all over the Internet in all sorts of systems. One of the most used and trusted sources of domain rankings was Alexa, but that service was recently deprecated. We believe we are in a good position to provide a strong alternative.

Let’s see how we built it.

Differences in domain names

Before we dig into the data science behind Domain Rankings, it’s important to understand what a domain and DNS are. Internet domain names are human-readable dot-separated letters, digits and hyphens that correspond to a network resource, like a server or a website. However, your computer and applications don’t know what to do with a domain name; they need IP addresses to send and receive information over the network. DNS is the system that converts, or resolves, a domain name into an IP address. Think of it as an Internet phonebook for domain names.

Note: This is a simplification. A new standard called Internationalized Domain Names, or IDN, allows using Unicode strings in domain names.

Each dot defines a new hierarchy level, reading right to left. Domains can have multiple levels of depth. The highest level corresponds to country code top-level domains (ccTLDs) like .uk, .fr or .pt, or generic top-level domains (gTLDs) like .com, .org, or .net. These are normally assigned to and managed by either country-level entities or administrative organizations operating a registry.

Then there are the second-level domains like cloudflare.com or google.com. These are normally purchased and registered by individuals or organizations, which are then free to create and manage as many hostnames and hierarchy levels as they want.

Unfortunately, however, there are exceptions. For instance, many countries use second-level domain registration. One such example is the United Kingdom, where commercial domains can only be registered under the .co.uk hierarchy. That’s why Google in the UK isn’t google.uk, but rather google.co.uk.

But that’s not all. Some countries use third-level domain registrations. One example is Japan, which offers Regional Domain registration under cities like *.aisai.aichi.jp.

Projects like the Public Suffix List are a good starting point for understanding the variations involved, and how they affect validations and assumptions in other systems, such as cookies in web browsers.

Domain Rankings takes some of this nuance into account to inform the definition of our current ruleset:

  • We boil everything down to second-level domains, such as cloudflare.com or google.com.
  • However, if the second level is .edu, .com, .org, .gov, .net, .co or .mil, then we use third-level domains.
  • We don’t distinguish between what we think is a website or an infrastructure system. A domain represents an Internet-available resource.
  • We will also semi-automate, curate and maintain a list of domains that map to popular platforms and services in the future. Example: fb.audio, fb.com, fb.watch, all map to a “facebook” platform.
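The ruleset above can be approximated with a short Python sketch. This is an illustration under stated assumptions, not the production logic: the function names are hypothetical, the exception set is taken directly from the second bullet, and the platform map contains only the three example domains mentioned in the last bullet.

```python
# Second-level labels named in the ruleset; for these, three labels are kept.
SECOND_LEVEL_EXCEPTIONS = {"edu", "com", "org", "gov", "net", "co", "mil"}

# Example platform mapping from the text; a real list would be curated.
PLATFORMS = {"fb.audio": "facebook", "fb.com": "facebook", "fb.watch": "facebook"}


def ranking_domain(hostname: str) -> str:
    """Reduce a hostname to the domain level used for ranking."""
    labels = hostname.strip().rstrip(".").lower().split(".")
    if len(labels) <= 2:
        return ".".join(labels)
    # Keep three labels when the second level is one of the exceptions
    # (for example google.co.uk), otherwise keep two (cloudflare.com).
    keep = 3 if labels[-2] in SECOND_LEVEL_EXCEPTIONS else 2
    return ".".join(labels[-keep:])


def platform_for(hostname: str) -> str:
    """Map a hostname to a platform name, falling back to its ranking domain."""
    domain = ranking_domain(hostname)
    return PLATFORMS.get(domain, domain)


print(ranking_domain("www.cloudflare.com"))  # cloudflare.com
print(ranking_domain("www.google.co.uk"))    # google.co.uk
print(platform_for("www.fb.watch"))          # facebook
```

A fixed exception set like this is simpler than consulting the full Public Suffix List, at the cost of missing cases such as the Japanese regional domains mentioned earlier.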

Defining popularity

Definitions are important. We established what we consider a domain, but what does domain popularity mean exactly? Our research showed that the volume of traffic generated to a given domain doesn’t really work as a proxy for what we perceive as popular. Instead, Domain Rankings looks at the size of the population of users that look up a domain per unit of time. The more people who are interested in a domain, the more popular it is.

Sounds pretty straightforward, right? Well, it’s not. Our databases don’t have cookies, IPs, or other tracking artifacts, and we strip information that leads to identifying an individual from all of our data, by design.

The good news, however, is that we do a very good job at identifying automated traffic (for instance, you can read about Bot Management and how we use Machine Learning to detect bots in HTTP traffic in our blog) and we found we could develop a reasonable proxy for the unique users metric without sacrificing privacy (using other data points that we store for a limited period of time, like the ASN and high-level geolocation information of the request or the Cloudflare data center that served it).

Domain Rankings’ popularity metric is best described as the estimated relative size of the user population that accesses a domain over some period of time.

Our approach

We announced 1.1.1.1, our privacy-first consumer DNS resolver, in 2018, and over the years it’s grown to become one of the top DNS services in the world. 1.1.1.1 is also part of a Research Agreement with APNIC, under which we collaborate with them on public research and DNS data insights.

The data we collect from it honors our privacy commitments, and is aggregated and stripped of any information that could lead to identifying or tracking users. We had a Big Four accounting firm conduct a privacy examination to determine whether the 1.1.1.1 resolver was effectively configured to meet our privacy commitments. You can read more about it in this blog, and the full report is publicly available on our compliance page.

Even without this personally identifying information, the resulting collection is vast and representative of Internet activity.

The 1.1.1.1 service is used in many ways. Regular (human) Internet users use it as their DNS resolver, either because they explicitly configured it in their devices, or their ISP did, or because they use WARP, or their browser uses 1.1.1.1 under the hood. However, servers and cloud infrastructure, IoT devices, home routers, and bots also use 1.1.1.1 extensively, which creates a lot of challenges for us when trying to identify human traffic.

We’ve been using DNS data to calculate the top and trending domains found on both the global and country pages on Cloudflare Radar. It’s been quite a learning experience trying to improve these lists. We have implemented aggregations, counts, filters, handling exceptions, and tried reducing noise, and yet they’re far from perfect. We felt that there had to be a better way.

We’ve spent the last six months building a variety of machine learning models to help us predict the rank of a domain.

Building the model was no easy feat. We experimented with multiple regression types first, to know exactly what the model was doing, and then more complex algorithms to get better performance. We played with different datasets, changed the population groups, variables (features), and combinations of variables, and used synthetic data.

After evaluation, one of our first conclusions was that building a model that could produce good results for the highest ranked domains and the long tail would be difficult.

The paper “A Long Way to the Top: Significance, Structure, and Stability of Internet Top Lists” describes this problem well: “The ranking of domains in the long tail should be based on significantly smaller and hence less reliable numbers.” Talking to our Research Team, who submitted the collaboration paper “Toppling Top Lists: Evaluating the Accuracy of Popular Website Lists” to IMC 2022, led us to the same conclusion: the most popular domains (like google.com and facebook.com) have feature values disproportionately higher than the lower-ranked domains.

Therefore, we selected the two models that performed best. One model was trained on the population with the highest feature values, uses more features, and is used to generate the ordered top 100 domain list. A second model was trained on a more general group of domains, uses fewer features, and is used to get the top one million most popular domains, which we then divide into ranking buckets.

These buckets are ranked, but each bucket’s contents are intentionally unordered. For example, the second bucket of 10,000 most popular domains includes the set of domains that rank from 10,001 to 20,000, but gives no further indication of the individual ranking of domains in that bucket. Given the size of some of these buckets and the window of time we use to populate them, they will inherently be exposed to more instability, too. We feel this is a good compromise between the described natural uncertainties of our long tail model and providing a reasonable idea of how close to the top a domain is.

Results

It’s important to mention there is no global view that can establish the perfect rank, and there’s no easy mechanism to confirm if a ranking is, ultimately, good. Data-driven results are always subject to some bias and skewing, related to the context of the organizations and systems that collect them. Sometimes all that can be done is to be transparent about potential sources of bias. The geographical distribution of customers and users, product characteristics, platform features, and behavioral diversity play an essential role in the final result. We are presenting the Cloudflare view, what we see.

Having said this, Cloudflare sits in a privileged position and handles a significant amount of Internet traffic. We have plenty of signals we can extract from our aggregated data, and believe that makes it possible to generate high quality domain rankings.

Domain Rankings are available today. You can head over to the Domains page and check them out:

  • Ordered list of the top 100 most popular domains globally and per country, based on our first model. Last 24 hours, updated daily.
  • Unordered global most popular domains datasets divided into buckets of the following sizes: 200, 500, 1,000, 2,000, 5,000, 10,000, 20,000, 50,000, 100,000, 200,000, 500,000, 1,000,000. Last 7 days, updated weekly.
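To make the bucket idea concrete, here is a small Python sketch that maps a rank position to the smallest published bucket that would contain it. It assumes the buckets are nested top-N sets with the sizes listed above; the function name is hypothetical, and the individual ranks inside a bucket are not published, so the input rank is only illustrative.

```python
from bisect import bisect_left
from typing import Optional

# Bucket sizes as listed above.
BUCKET_SIZES = [
    200, 500, 1_000, 2_000, 5_000, 10_000,
    20_000, 50_000, 100_000, 200_000, 500_000, 1_000_000,
]


def bucket_for_rank(rank: int) -> Optional[int]:
    """Return the smallest bucket size that contains a 1-based rank.

    Returns None for ranks beyond the largest bucket.
    """
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    index = bisect_left(BUCKET_SIZES, rank)  # first bucket size >= rank
    return BUCKET_SIZES[index] if index < len(BUCKET_SIZES) else None


print(bucket_for_rank(150))        # 200
print(bucket_for_rank(15_000))     # 20000
print(bucket_for_rank(2_000_000))  # None
```

Reporting only the bucket, rather than the position, mirrors the compromise described earlier: it says roughly how close to the top a domain is without claiming a precision the long-tail model cannot support.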

Next steps

We will keep improving Domain Rankings and monitoring the results. Anyone can access them on Cloudflare Radar, read the results, and download the CSV files.

Feel free to explore our Domain Rankings and share feedback with us.