Ensure availability of your data using cross-cluster replication with Amazon OpenSearch Service

Post Syndicated from Prashant Agrawal original https://aws.amazon.com/blogs/big-data/ensure-availability-of-your-data-using-cross-cluster-replication-with-amazon-opensearch-service/

Amazon OpenSearch Service is a fully managed service that you can use to deploy and operate OpenSearch and legacy Elasticsearch clusters, cost-effectively, at scale in the AWS Cloud. The service makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more by offering the latest versions of OpenSearch, support for 19 versions of Elasticsearch (1.5 to 7.10 versions), and visualization capabilities powered by OpenSearch Dashboards and Kibana (1.5 to 7.10 versions).

OpenSearch Service announced the support of cross-cluster replication on October 5, 2021. With cross-cluster replication for OpenSearch Service, you can replicate indices at low latency from one domain to another in the same or different AWS Regions without needing additional technologies. Cross-cluster replication provides sequential consistency while continuously copying data from the leader index to the follower index. Sequential consistency ensures the leader and the follower return the same result set after operations are applied on the indices in the same order. Cross-cluster replication is designed to minimize delivery lag between the leader and the follower index. Typical delivery times are less than a minute. You can continuously monitor the replication status via APIs. Additionally, if you have indices that follow an index pattern, you can create automatic follow rules and they will be automatically replicated.

In this post, we show you how to use these features to ensure availability of your data using cross-cluster replication with OpenSearch Service.

Benefits of cross-cluster replication

Cross-cluster replication is helpful for use cases regarding data proximity, disaster recovery, and multi-cluster patterns.

Data proximity helps reduce latency and response time by bringing the data closer to your user or application server. For example, you can replicate data from one Region, us-west-2 (leader), to multiple Regions across the globe acting as followers, eu-west-1, ap-south-1, ca-central-1, and so on, where the follower can poll the leader to sync new or updated data in the leader. In the following diagram, data is replicated from one production cluster in us-west-2 to multiple locally available clusters near the user or application.

In the case of disaster recovery, you can have one or more follower clusters in the same Region or different Regions, and as long as you have one active cluster, you can serve read requests to the users. In the following diagram, data is replicated from one production cluster to two different disaster recovery clusters.

As of today, cross-cluster replication supports active/active read and active/passive write, as shown in the following diagram.

With this implementation, you can keep serving read requests if your leader goes down, but what about writes? As of this writing, cross-cluster replication doesn’t support any kind of failover mechanism to make your follower the leader. In this scenario, you might need to do some extra housekeeping to make your follower domain become the leader and start accepting write requests. This post shows the steps to set up cross-cluster replication and minimize downtime by advancing your follower to be leader.

Set up cross-cluster replication

To set up cross-cluster replication, complete the following steps:

  1. Create two clusters across two Regions, for example leader-east (leader) and follower-west (follower).
    Cross-cluster replication works on a pull model, where the user creates an outbound connection at the follower domain, and the follower keeps polling the leader to sync with new or updated documents for an index.
  2. Go to the follower domain (follower-west) and create a request for an outbound connection. Specify the alias for this connection as follower-west.
  3. Go to the leader domain, locate the inbound connection, and approve the incoming connection from follower-west.
  4. Edit the security configuration and add the following access policy to allow ESCrossClusterGet in the leader domain, which is leader-east:
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "*"
          },
          "Action": "es:ESCrossClusterGet",
          "Resource": "arn:aws:es:us-east-2:xxx-accountidxx:domain/leader-east"
        }

  5. Create a leader index (on the leader domain), or ignore this step if you already have an index to replicate:
    PUT catalog

  6. Navigate to OpenSearch Dashboards for the follower-west domain.
  7. On the Dev Tools tab, run the following command (or use curl to connect directly):
    PUT _plugins/_replication/catalog-rep/_start
    {
      "leader_alias": "follower-west",
      "leader_index": "catalog",
      "use_roles": {
        "leader_cluster_role": "cross_cluster_replication_leader_full_access",
        "follower_cluster_role": "cross_cluster_replication_follower_full_access"
      }
    }

  8. Confirm the replication:
    GET _plugins/_replication/catalog-rep/_status

  9. Index some documents in the leader index; the following command indexes a document with the field id set to 1 into the catalog index:
    POST catalog/_doc
    {
      "id": "1"
    }

  10. Now go to the follower domain and confirm the documents are replicated by running the following search query against the follower index; the response (excerpted here) includes the document:
    GET catalog-rep/_search
    ...
        "hits" : [
          {
            "_index" : "catalog-rep",
            "_type" : "_doc",
            "_id" : "hg3YsYIBcxKtCcyhNyp4",
            "_score" : 1.0,
            "_source" : {
              "id" : "1"
            }
          }
        ]
    ...
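
The request bodies used in steps 4 and 7 can also be built programmatically. The following is a minimal Python sketch, not an official client: the helper names are hypothetical, nothing is sent over the network, and it assumes the role names are wrapped in the replication plugin's `use_roles` object.

```python
import json

LEADER_ROLE = "cross_cluster_replication_leader_full_access"
FOLLOWER_ROLE = "cross_cluster_replication_follower_full_access"


def leader_access_statement(leader_domain_arn: str) -> dict:
    """Policy statement for step 4 (hypothetical helper name)."""
    return {
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": "es:ESCrossClusterGet",
        "Resource": leader_domain_arn,
    }


def start_replication_body(connection_alias: str, leader_index: str) -> dict:
    """Body for PUT _plugins/_replication/<follower-index>/_start (step 7)."""
    return {
        "leader_alias": connection_alias,
        "leader_index": leader_index,
        "use_roles": {
            "leader_cluster_role": LEADER_ROLE,
            "follower_cluster_role": FOLLOWER_ROLE,
        },
    }


def to_json(body: dict) -> str:
    """Serialize a body so it can be pasted into Dev Tools or sent with curl."""
    return json.dumps(body, indent=2)
```

Calling `to_json(start_replication_body("follower-west", "catalog"))` yields a body you can paste under the `PUT` line in Dev Tools.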

Pause and stop the replication

When your replication is running, you can use these steps to pause and stop the replication.

You can use the following API to pause the replication, for example, while you debug an issue or reduce load on the leader. Make sure to add an empty body with the request.

POST _plugins/_replication/catalog-rep/_pause
{}

If you pause the replication, you must resume it within 12 hours. If you fail to resume it within 12 hours, you must stop replication, delete the follower index, and restart replication of the leader.
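
If you automate pausing, it can help to track when the pause started. The following minimal Python sketch encodes the 12-hour rule described above; the function name is hypothetical and the timestamps are assumed to be recorded by your own tooling, not returned by the service.

```python
from datetime import datetime, timedelta, timezone

# Limit taken from the text above: a paused replication must be resumed within 12 hours.
MAX_PAUSE = timedelta(hours=12)


def can_resume(paused_at: datetime, now: datetime) -> bool:
    """Return True while the pause is still inside the 12-hour window."""
    return now - paused_at < MAX_PAUSE


# Example: a pause recorded by your own tooling at 08:00 UTC.
paused_at = datetime(2022, 9, 28, 8, 0, tzinfo=timezone.utc)
still_ok = can_resume(paused_at, paused_at + timedelta(hours=11))
too_late = can_resume(paused_at, paused_at + timedelta(hours=13))
```

When `can_resume` returns False, the remaining option is the stop, delete, and restart sequence described above.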

Stopping the replication makes the follower index unfollow the leader and become a standard index. Use the following code to stop replication:

POST _plugins/_replication/catalog-rep/_stop
{}

Note that you can’t restart replication to this index after you stop it.


Auto-follow

You can define a set of replication rules against a single leader domain that automatically replicates indexes that match a specified pattern.

When an index on the leader domain matches one of the patterns (for example, logstash-*), a matching follower index is created on the follower domain. The following code is an example replication rule for auto-follow:

POST _plugins/_replication/_autofollow
{
  "leader_alias": "follower-west",
  "name": "rule-name",
  "pattern": "logstash-*",
  "use_roles": {
    "leader_cluster_role": "cross_cluster_replication_leader_full_access",
    "follower_cluster_role": "cross_cluster_replication_follower_full_access"
  }
}

Delete the replication rule to stop replicating new indexes that match the pattern:

DELETE _plugins/_replication/_autofollow
{
  "leader_alias": "follower-west",
  "name": "rule-name"
}
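
To preview which index names a pattern such as logstash-* would pick up, you can approximate the matching locally. This is only a sketch using Python's shell-style wildcards; it assumes the rule treats `*` as "any sequence of characters" and does not reproduce the plugin's own matching code.

```python
from fnmatch import fnmatchcase


def would_auto_follow(index_name: str, pattern: str) -> bool:
    """Approximate an auto-follow rule with a case-sensitive wildcard match."""
    return fnmatchcase(index_name, pattern)


# Hypothetical index names on the leader domain.
leader_indexes = ["logstash-2022.09.27", "logstash-2022.09.28", "catalog"]
matched = [name for name in leader_indexes if would_auto_follow(name, "logstash-*")]
```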

Monitor cross-cluster replication metrics

OpenSearch Service provides metrics to monitor cross-cluster replication that can help you know the status of the replication along with its performance. For example, ReplicationRate can help you understand the average rate of replication operations per second, and ReplicationNumSyncingIndices can help you know the number of indexes with the replication status SYNCING. For more details about all the metrics provided by OpenSearch Service for cross-cluster replication, refer to Cross-cluster replication metrics.

Recovering from failure

At this point, we have two OpenSearch Service domains running in two different Regions. Let’s consider a scenario in which some disastrous event happens in the Region with your leader domain and the leader goes down. At this point, you can still serve read traffic from the follower domain, but no additional updates are applied because the follower can’t read from the leader. In this scenario, you can use the following steps to advance your follower to be leader:

  1. Go to your follower domain and stop replication:
    POST _plugins/_replication/catalog-rep/_stop
    {}

    After replication stops on the follower domain, your follower index acts as a normal index.

  2. At this point, you can start sending write traffic to the follower.

This way, you can advance your follower domain to become leader and route your write traffic to the follower, which helps avoid the data loss for new sets of changes and updates.

Keep in mind that there is a small lag (less than a minute) between the leader-follower sync. Additionally, there could be a small amount of data loss in the follower domain for data that was indexed to the leader and not synced to the follower (especially when the leader went down and the follower didn’t have a chance to poll the changes and updates). For this scenario, you should have a mechanism in your ingest pipeline to replay the data to the follower when your leader goes down.
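
One possible shape for such a replay mechanism is a short-lived buffer of recent writes kept by the ingest pipeline. The following Python sketch is an illustration only: the class and field names are hypothetical, writes are assumed to be recorded in timestamp order, and the retention window should comfortably exceed the replication lag you observe.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    timestamp: float  # seconds since epoch when the write was sent to the leader
    doc_id: str
    body: dict


class ReplayBuffer:
    """Keeps the most recent writes so they can be re-sent to the follower."""

    def __init__(self, retention_seconds: float):
        self.retention_seconds = retention_seconds
        self._writes = deque()

    def record(self, write: Write) -> None:
        """Remember a write and drop entries older than the retention window."""
        self._writes.append(write)
        cutoff = write.timestamp - self.retention_seconds
        while self._writes and self._writes[0].timestamp < cutoff:
            self._writes.popleft()

    def writes_since(self, timestamp: float) -> list:
        """Return buffered writes at or after the given time, oldest first."""
        return [w for w in self._writes if w.timestamp >= timestamp]
```

After promoting the follower, the pipeline would re-index `writes_since(last_known_good_time)` into it; because the same documents may already have been replicated, the writes should use explicit document IDs so that a replay overwrites rather than duplicates.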

Now, what if the leader comes back online after a certain period of time? At this time, you can’t start the replication again from your follower to sync the delta to the leader. Even if you try to set up the replication from follower to leader, it will fail with an error. After you have used an index for a leader-follower connection, you can’t use the same index again to create a new replication. So, what do you do now?

In this scenario, you can use the following steps to set up a leader-follower connection in the opposite direction:

  1. Delete the index from the old leader.
  2. Set up cross-cluster replication in the opposite direction with your new leader (follower-west) and new follower (leader-east).
  3. Start the replication on the new follower (which was your old leader) and sync the data.

This runs the sync for all data again for that index, and may take time depending upon the size of the index because it will bootstrap the index and start the replication from scratch. Additionally, you will incur standard AWS data transfer costs for the data transferred with this replication. This way, you can advance your follower (follower-west) to be leader and make your leader (leader-east) the new follower.


In this post, we showed you how you can use cross-cluster replication to sync data between leader and follower indices. We also demonstrated how you can advance your follower to become leader in case your leader goes down. This can help you serve traffic in the event of any disaster scenarios.

If you have feedback about this post, submit your comments in the comments section. If you have questions about this post, start a new thread on the Amazon OpenSearch Service forum or contact AWS Support.

About the Author

Prashant Agrawal is a Search Specialist Solutions Architect with OpenSearch Service. He works closely with customers to help them migrate their workloads to the cloud and helps existing customers fine-tune their clusters to achieve better performance and save on cost. Before joining AWS, he helped various customers use OpenSearch and Elasticsearch for their search and log analytics use cases. When not working, you can find him traveling and exploring new places. In short, he likes doing Eat → Travel → Repeat.

Timestone: Netflix’s High-Throughput, Low-Latency Priority Queueing System with Built-in Support…

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/timestone-netflixs-high-throughput-low-latency-priority-queueing-system-with-built-in-support-1abf249ba95f

Timestone: Netflix’s High-Throughput, Low-Latency Priority Queueing System with Built-in Support for Non-Parallelizable Workloads

by Kostas Christidis


Timestone is a high-throughput, low-latency priority queueing system we built in-house to support the needs of Cosmos, our media encoding platform. Over the past 2.5 years, its usage has increased, and Timestone is now also the priority queueing engine backing Conductor, our general-purpose workflow orchestration engine, and BDP Scheduler, the scheduler for large-scale data pipelines. All in all, millions of critical workflows within Netflix now flow through Timestone on a daily basis.

Timestone clients can create queues, enqueue messages with user-defined deadlines and metadata, then dequeue these messages in an earliest-deadline-first (EDF) fashion. Filtering for EDF messages with criteria (e.g. “messages that belong to queue X and have metadata Y”) is also supported.
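
To make the access pattern concrete, here is a minimal in-memory Python sketch of earliest-deadline-first dequeueing with an optional metadata filter. It is not Timestone's implementation (which is backed by Redis and secondary indexes); the class and method names are hypothetical, and the filtered dequeue simply scans the heap.

```python
import heapq
from itertools import count


class EdfQueue:
    """Toy earliest-deadline-first queue with dequeue-by-filter."""

    def __init__(self):
        self._heap = []
        self._tiebreak = count()  # keeps ordering stable for equal deadlines

    def enqueue(self, deadline, payload, metadata=None):
        entry = (deadline, next(self._tiebreak), payload, metadata or {})
        heapq.heappush(self._heap, entry)

    def dequeue(self, filters=None):
        """Pop the earliest-deadline message whose metadata matches all filters."""
        filters = filters or {}
        skipped = []
        found = None
        while self._heap:
            entry = heapq.heappop(self._heap)
            if all(entry[3].get(key) == value for key, value in filters.items()):
                found = entry
                break
            skipped.append(entry)
        for entry in skipped:  # put back everything we looked past
            heapq.heappush(self._heap, entry)
        return None if found is None else found[2]
```

With deadlines 30, 10, and 20 enqueued in that order, an unfiltered `dequeue()` returns the message with deadline 10, while a filtered call returns the earliest message that carries the requested key-value pairs.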

One of the things that make Timestone different from other priority queues is its support for a construct we call exclusive queues — this is a means to mark chunks of work as non-parallelizable, without requiring any locking or coordination on the consumer side; everything is taken care of by the exclusive queue in the background. We explain the concept in detail in the sections that follow.

Why Timestone

When designing the successor to Reloaded — our media encoding system — back in 2018 (see “Background” section in The Netflix Cosmos Platform), we needed a priority queueing system that would provide queues between the three components in Cosmos (Figure 1):

  1. the API framework (Optimus),
  2. the forward chaining rule engine (Plato), and
  3. the serverless computing layer (Stratum)

Figure 1. A video encoding application built on top of Cosmos. Notice the three Cosmos subsystems: Optimus, an API layer mapping external requests to internal business models, Plato, a workflow layer for business rule modeling, and Stratum, the serverless layer for running stateless and computationally intensive functions. Source: The Netflix Cosmos Platform

Some of the key requirements this priority queueing system would need to satisfy:

1. A message can only be assigned to one worker at any given time. The work that tends to happen in Cosmos is resource-intensive, and can fan out to thousands of actions. Assume then, that there is replication lag between the replicas in our data store, and we present as dequeueable to worker B the message that was just dequeued by worker A via a different node. When we do that, we waste significant compute cycles. This requirement then throws eventually consistent solutions out of the window, and means we want linearizable consistency at the queue level.

2. Allow for non-parallelizable work.

Given that Plato is continuously polling all workflow queues for more work to execute —

While Plato is executing a workflow for a given project (a request for work on a given service) —

Then Plato should not be able to dequeue additional requests for work for that project on that workflow. Otherwise Plato’s inference engine will evaluate the workflow prematurely, and may move the workflow to an incorrect state.

There exists then, a certain type of work in Cosmos that should not be parallelizable, and the ask is for the queueing system to support this type of access pattern natively. This requirement gave birth to the exclusive queue concept. We explain how exclusive queues work in Timestone in the “Key Concepts” section.

3. Allow for dequeueing and queue depth querying using filters (metadata key-value pairs)

4. Allow for the automatic creation of a queue upon message ingestion

5. Render a message dequeueable within a second of ingestion

We built Timestone because we could not find an off-the-shelf solution that meets these requirements.

System Architecture

Timestone is a gRPC-based service. We use protocol buffers to define the interface of our service and the structure of our request and response messages. The system diagram for the application is shown in Figure 2.

Figure 2. Timestone system diagram. Arrows link all the components touched during a typical Timestone client-server interaction. Numbers in red indicate sequence steps. Identical numbers identify concurrent steps.

System of record

The system of record is a durable Redis cluster. Every write request (see Step 1 — note that this includes dequeue requests since they alter the state of the queue) that reaches the cluster (Step 2) is persisted to a transaction log before a response is sent back to the server (Step 3).

Inside the database, we represent each queue with a sorted set where we rank message IDs (see “Message” section) according to priority. We persist messages and queue configurations (see “Queues” section) in Redis as hashes. All data structures related to a queue — from the messages it contains to the in-memory secondary indexes needed to support dequeue-by-filter — are placed in the same Redis shard. We achieve this by having them share a common prefix, specific to the queue in question. We then codify this prefix as a Redis hash tag. Each message carries a payload (see “Message” section) that can weigh up to 32 KiB.
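
The co-location trick relies on how Redis Cluster assigns keys to hash slots: when a key contains a non-empty `{...}` hash tag, only the tag is hashed (CRC16, modulo 16384). The Python sketch below reimplements that rule to show that keys sharing a tag land in the same slot; the key names are hypothetical and are not Timestone's actual key layout.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Hash slot for a key, honoring a non-empty {hash tag} if one is present."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384


# Hypothetical keys for one queue: the sorted set, a message hash, and a filter index.
queue_keys = [
    "{queue:encodes}:messages",
    "{queue:encodes}:message:42",
    "{queue:encodes}:index:project=foo",
]
slots = {hash_slot(key) for key in queue_keys}
```

Because every key above shares the `{queue:encodes}` tag, `slots` contains a single value, which is what allows a multi-key Lua script to run against one shard.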

Almost all of the interactions between Timestone and Redis (see “Message States” section) are codified as Lua scripts. In most of these Lua scripts, we tend to update a number of data structures. Since Redis guarantees that each script is executed atomically, a successful script execution is guaranteed to leave the system in a consistent (in the ACID sense) state.

All API operations are queue-scoped. All API operations that modify state are idempotent.

Secondary indexes

For observability purposes, we capture information about incoming messages and their transition between states in two secondary indexes maintained on Elasticsearch.

When we get a write response from Redis, we concurrently (a) return the response to the client, and (b) convert this response into an event that we post to a Kafka cluster, as shown in Step 4. Two Flink jobs — one for each type of index we maintain — consume the events from the corresponding Kafka topics, and update the indexes in Elasticsearch.

One index (“current”) gives users a best-effort view into the current state of the system, while the other index (“historic”) gives users a best effort longitudinal view for messages, allowing them to trace the messages as they flow through Timestone, and answer questions such as time spent in a state, and number of processing errors. We maintain a version counter for each message; every write operation increments that counter. We rely on that version counter to order the events in the historic index.
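
As an illustration of how a version counter makes such questions answerable even when events arrive out of order, the following Python sketch orders a message's events by version and sums the time spent in each state. The event field names (`version`, `state`, `timestamp_ms`) are assumptions for the example, not the actual schema of the historic index.

```python
def time_in_states(events):
    """Return milliseconds spent in each state, ordering events by version."""
    ordered = sorted(events, key=lambda event: event["version"])
    durations = {}
    for current, following in zip(ordered, ordered[1:]):
        elapsed = following["timestamp_ms"] - current["timestamp_ms"]
        durations[current["state"]] = durations.get(current["state"], 0) + elapsed
    return durations


# Events as they might arrive from a stream, deliberately out of order.
events = [
    {"version": 2, "state": "running", "timestamp_ms": 1500},
    {"version": 1, "state": "pending", "timestamp_ms": 1000},
    {"version": 3, "state": "completed", "timestamp_ms": 4000},
]
durations = time_in_states(events)
```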

Events are stored in the Elasticsearch cluster for a finite number of days.

Current Usage at Netflix

The system is dequeue heavy. We see 30K dequeue requests per second (RPS) with a P99 latency of 45ms. In comparison, we see 1.2K enqueue RPS at 25ms P99 latency. We regularly see 5K RPS enqueue bursts at 85ms P99 latency.

15B messages have been enqueued to Timestone since the beginning of the year; these messages have been dequeued 400B times. Pending messages regularly reach 10M.

Usage is expected to double next year, as we migrate the rest of Reloaded, our legacy media encoding system, to Cosmos.

Key Concepts


Message

A message carries an opaque payload, a user-defined priority (see “Priority” section), an optional (mandatory for exclusive queues) set of metadata key-value pairs that can be used for filter-based dequeueing, and an optional invisibility duration.

Any message that is placed into a queue can be dequeued a finite number of times. We call these attempts; each dequeue invocation on a message decreases the attempts left on it.


Priority

The priority of a message is expressed as an integer value; the lower the value, the higher the priority. While an application is free to use whatever range they see fit, the norm is to use Unix timestamps in milliseconds (e.g. 1661990400000 for 9/1/2022 midnight UTC).

Figure 3. A snippet from the PriorityClass enum used by a streaming encoding pipeline in Cosmos. The values in parentheses indicate the offset in days.

It is also entirely up to the application to define its own priority levels. For instance, a streaming encoding pipeline within Cosmos uses mail priority classes, as shown in Figure 3. Messages belonging to the standard class use the time of enqueue as their priority, while all other classes have their priority values adjusted in multiples of ∼10 years. The priority is set at the workflow rule level, but can be overridden if the request carries a studio tag, such as DAY_OF_BROADCAST.
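
A small Python sketch of this scheme follows. The class names and the exact offsets are hypothetical (the real PriorityClass enum is only shown in Figure 3); the sketch only assumes that priorities are Unix timestamps in milliseconds and that non-standard classes are shifted by roughly ten years.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class PriorityClass(Enum):
    """Hypothetical classes; each value is an offset in days (about 10 years)."""
    RUSH = -3650
    STANDARD = 0
    ECONOMY = 3650


def priority_ms(enqueued_at: datetime, priority_class=PriorityClass.STANDARD) -> int:
    """Priority as a Unix timestamp in milliseconds; lower means more urgent."""
    shifted = enqueued_at + timedelta(days=priority_class.value)
    return (shifted - EPOCH) // timedelta(milliseconds=1)


midnight = datetime(2022, 9, 1, tzinfo=timezone.utc)
standard = priority_ms(midnight)
rush = priority_ms(midnight, PriorityClass.RUSH)
```

Here `standard` equals 1661990400000, the value quoted above, and `rush` sorts ahead of any standard message enqueued within the next ten years.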

Message States

Within a queue, a Timestone message belongs to one of six states (Figure 4):

  1. invisible
  2. pending
  3. running
  4. completed
  5. canceled
  6. errored

In general, a message can be enqueued with or without invisibility, which makes the message invisible or pending respectively. Invisible messages become pending when their invisibility window elapses.

A worker can dequeue the earliest-deadline-first pending message from a queue by specifying the amount of time (lease duration) they will be processing it for. This moves the message to the running state. Dequeueing messages in batch is also supported.

The same worker can then issue a complete call to Timestone within the allotted lease window to move the message to the completed state, or issue a lease extension call if they want to maintain control of the message. (A worker can also move a message, typically a running one, to the canceled state to signal it is no longer needed for processing.)

If none of these calls are issued on time, the message becomes dequeueable again, and this attempt on the message is spent. If there are no attempts left on the message, it is moved automatically to the errored state.

The terminal states (completed, errored, and canceled) are garbage-collected periodically by a background process.

Messages can move states either when a worker invokes an API operation, or when Timestone runs its background processes (Figure 4, marked in red — these run periodically). Figure 4 shows the complete state transition diagram.

Figure 4. State transition diagram for Timestone messages.
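
The transitions described in this section can be sketched as a small state machine. This Python version is a simplified illustration, not Timestone's code: the method names are hypothetical, leases and timers are left out, and it assumes that cancellation is allowed from any non-terminal state.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    INVISIBLE = "invisible"
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    CANCELED = "canceled"
    ERRORED = "errored"


TERMINAL = {State.COMPLETED, State.CANCELED, State.ERRORED}


@dataclass
class Message:
    attempts_left: int
    state: State = State.PENDING

    def _require(self, expected: State) -> None:
        if self.state is not expected:
            raise ValueError(f"expected {expected.value}, got {self.state.value}")

    def invisibility_elapsed(self) -> None:
        self._require(State.INVISIBLE)
        self.state = State.PENDING

    def dequeue(self) -> None:
        self._require(State.PENDING)
        self.attempts_left -= 1  # every dequeue spends one attempt
        self.state = State.RUNNING

    def complete(self) -> None:
        self._require(State.RUNNING)
        self.state = State.COMPLETED

    def cancel(self) -> None:
        if self.state in TERMINAL:
            raise ValueError("message is already in a terminal state")
        self.state = State.CANCELED

    def lease_expired(self) -> None:
        """Background transition: back to pending, or errored when out of attempts."""
        self._require(State.RUNNING)
        self.state = State.PENDING if self.attempts_left > 0 else State.ERRORED
```

A message created with two attempts can therefore be dequeued twice; if both leases expire without a complete call, the second expiry moves it to the errored state.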


Queues

All incoming messages are stored in queues. Within a queue, messages are sorted by their priority date. Timestone can host an arbitrary number of user-created queues, and offers a set of API operations for queue management, all revolving around a queue configuration object. Data we store in this object includes the queue’s type (see rest of section), the lease duration that applies to dequeued messages, the invisibility duration that applies to enqueued messages, the number of times a message can be dequeued, and whether enqueueing or dequeueing is temporarily blocked. Note that a message producer can override the default lease or invisibility duration by setting it at the message level during enqueue.

All queues in Timestone fall into two types: simple and exclusive.

When an exclusive queue is created, it is associated with a user-defined exclusivity key — for example project. All messages posted to that queue must carry this key in their metadata. For instance, a message with project=foo will be accepted into the queue; a message without the project key will not be. In this example, we call foo, the value that corresponds to the exclusivity key, the message’s exclusivity value.

The contract for exclusive queues is that at any point in time, there can be only up to one consumer per exclusivity value. Therefore, if the project-based exclusive queue in our example has two messages with the key-value pair project=foo in it, and one of them is already leased out to a worker, the other one is not dequeueable. This is depicted in Figure 5.

Figure 5. When worker_2 issues a dequeue call, they lease msg_2 instead of msg_1, even though msg_1 has a higher priority. That happens because the queue is exclusive, and the exclusivity value foo is already leased out.

In a simple queue no such contract applies, and there is no tight coupling with message metadata keys. A simple queue works as your typical priority queue, simply ordering messages in an earliest-deadline-first fashion.
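
The exclusive-queue contract can be illustrated with a toy in-memory queue. The Python sketch below is an assumption-laden model, not Timestone's Redis-backed implementation: names are hypothetical, leases never expire, and releasing an exclusivity value is done with an explicit `complete` call.

```python
import heapq
from itertools import count


class ExclusiveQueue:
    """Toy priority queue that leases at most one message per exclusivity value."""

    def __init__(self, exclusivity_key):
        self.exclusivity_key = exclusivity_key
        self._heap = []
        self._leased_values = set()
        self._tiebreak = count()

    def enqueue(self, priority, msg_id, metadata):
        if self.exclusivity_key not in metadata:
            raise ValueError(f"metadata must carry '{self.exclusivity_key}'")
        value = metadata[self.exclusivity_key]
        heapq.heappush(self._heap, (priority, next(self._tiebreak), msg_id, value))

    def dequeue(self):
        """Lease the highest-priority message whose exclusivity value is free."""
        skipped = []
        leased = None
        while self._heap:
            entry = heapq.heappop(self._heap)
            if entry[3] in self._leased_values:
                skipped.append(entry)  # same value already being worked on
                continue
            leased = entry
            break
        for entry in skipped:
            heapq.heappush(self._heap, entry)
        if leased is None:
            return None
        self._leased_values.add(leased[3])
        return leased[2]

    def complete(self, exclusivity_value):
        """Release the value so the next message carrying it becomes dequeueable."""
        self._leased_values.discard(exclusivity_value)
```

In a scenario like the one in Figure 5, a worker holding a project=foo message blocks msg_1 (also project=foo), so the next dequeue returns msg_2 with a different project value even though msg_1 has the higher priority.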

What We Are Working On

Some of the things we’re working on:

  1. As the usage of Timestone within Cosmos grows, so does the need to support a range of queue depth queries. To solve this, we are building a dedicated query service that uses a distinct query model.
  2. As noted above (see “System of record” section), a queue and its contents can only currently occupy one Redis shard. Hot queues however can grow big, especially when compute capacity is scarce. We want to support arbitrarily large queues, which has us building support for queue sharding.
  3. Messages can carry up to 4 key-value pairs. We currently use all of these key-value pairs to populate the secondary indexes used during dequeue-by-filter. This operation is exponentially complex both in terms of time and space (O(2^n)). We are switching to lexicographical ordering on sorted sets to drop the number of indexes by half, and handle metadata in a more cost-efficient manner.
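
To see where the exponential growth in item 3 comes from, consider a scheme in which every non-empty combination of a message's key-value pairs needs its own index entry so that any filter can be answered directly. That scheme is an assumption made for illustration (the post does not spell out the index layout), and the metadata below is hypothetical.

```python
from itertools import combinations


def filter_combinations(metadata: dict) -> list:
    """All non-empty subsets of the key-value pairs, as sorted tuples."""
    items = sorted(metadata.items())
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


# Hypothetical metadata with the maximum of 4 key-value pairs.
metadata = {"project": "foo", "region": "us", "stage": "encode", "tier": "gold"}
entries = filter_combinations(metadata)
```

With 4 pairs this produces 2^4 - 1 = 15 entries for a single message, and each additional pair would roughly double that number.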

We may be covering our work on the above in follow-up posts. If these kinds of problems sound interesting to you, and if you like the challenges of building distributed systems for the Netflix Content and Studio ecosystem at scale in general, you should consider joining us.


Poorna Reddy, Kostas Christidis, Aravindan Ramkumar, Surafel Korse, Jiaofen Xu, Anoop Panicker, and Kishore Banala have contributed to this project. We thank Charles Zhao, Olof Johansson, Frank San Miguel, Dmitry Vasilyev, Prudhvi Chaganti, and the rest of the Cosmos team for their constructive feedback while developing and operating Timestone.

Timestone: Netflix’s High-Throughput, Low-Latency Priority Queueing System with Built-in Support… was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Announcing the GNU Toolchain Infrastructure Project

Post Syndicated from corbet original https://lwn.net/Articles/909704/

The backers of the GNU Toolchain Infrastructure Project, which was the
subject of an intense discussion at the GNU
Tools Cauldron, have finally posted
their plans:

Linux Foundation IT services plans for the GNU Toolchain include
Git repositories, mailing lists, issue tracking, web sites, and
CI/CD, implemented with strong authentication, attestation, and
security posture. Utilizing the experience and infrastructure of
the LF IT team that is already used by the Linux kernel community
will provide the most effective solution and best experience for
the GNU Toolchain developer community.

What’s New in InsightVM and Nexpose: Q3 2022 in Review

Post Syndicated from Roshnee Mistry Shah original https://blog.rapid7.com/2022/09/28/whats-new-in-insightvm-and-nexpose-q3-2022-in-review/

Another quarter comes to a close! While we definitely had our share of summer fun, our team continued to invest in the product, releasing features and updates like recurring coverage for enterprise technologies, performance enhancements, and more. Let’s take a look at some of the key releases in InsightVM and Nexpose from Q3.

[InsightVM and Nexpose] Recurring coverage for VMware vCenter

Recurring coverage provides ongoing, automatic vulnerability coverage for popular enterprise technology and systems. We recently added VMware vCenter to our list.

VMware vCenter Server is a centralized management platform used to manage virtual machines, ESXi hosts, and dependent components from a single host. Last year, vCenter was a significant target for bad actors and became the subject of a number of zero-days. Rapid7 provided ad hoc coverage to protect you against the vulnerabilities. Now, recurring coverage ensures fast, comprehensive protection that provides offensive and defensive security against vCenter vulnerabilities as they arise.

[InsightVM and Nexpose] Tune Assistant

The Security Console in InsightVM and Nexpose contains components that benefit from performance tuning. Tune Assistant is a built-in feature that will calculate performance tuning values based on resources allocated to the Security Console server, then automatically apply those values.

Tuning is calculated and applied to all new consoles when the product first starts up, and customers experiencing performance issues on existing consoles can now easily increase their own resources. For more information, read our docs page on configuring maximum performance in an enterprise environment.

[InsightVM and Nexpose] Windows Server 2022 Support

We want to ensure InsightVM and Nexpose are supported on business-critical technologies and operating systems. We added Windows Server 2022, the latest operating system for servers from Microsoft, to our list. The Scan Engine and Security Console can be installed and will be supported by Rapid7 on Windows Server 2022. Learn more about the systems we support.

[InsightVM and Nexpose] Checks for notable vulnerabilities

With exploitation of major vulnerabilities in Mitel MiVoice Connect, multiple Confluence applications, and other popular solutions, the threat actors definitely did not take it easy this summer. InsightVM and Nexpose customers can assess their exposure to many of these CVEs with vulnerability checks, including:

  • Mitel MiVoice Connect Service Appliance | CVE-2022-29499: An onsite VoIP business phone system, MiVoice Connect had a data validation vulnerability, which arose from insufficient data validation for a diagnostic script. The vulnerability potentially allowed an unauthenticated remote attacker to send specially crafted requests to inject commands and achieve remote code execution. Learn more about the vulnerability and our response.
  • “Questions” add-on for Confluence Application | CVE-2022-26138: This vulnerability affected “Questions,” an add-on for the Confluence application. It was quickly exploited in the wild once the hardcoded password was released on social media. Learn more about the vulnerability and our response.
  • Multiple vulnerabilities in Zimbra Collaboration Suite: Zimbra, a business productivity suite, was affected by five different vulnerabilities, one of which was unpatched, and four of which were being actively and widely exploited in the wild by well-organized threat actors. Learn more about the vulnerability and our response.
      • CVE-2022-30333
      • CVE-2022-27924
      • CVE-2022-27925
      • CVE-2022-37042
      • CVE-2022-37393

We were hard at work this summer making improvements and increasing the level of protections against attackers for our customers. As we head into the fall and the fourth quarter of the year, you can bet we will continue to make InsightVM the best and most comprehensive risk management platform available. Stay tuned for more great things, and have a happy autumn.

ALP prototype ‘Les Droites’ is to be expected later this week (openSUSE News)

Post Syndicated from corbet original https://lwn.net/Articles/909687/

The openSUSE News site is looking
to the imminent preview release of the openSUSE ALP

As far as “Les Droites” goes, users can look forward to a SLE Micro
like HostOS with self-healing abilities contributing to our
OS-as-a-Service/ZeroTouch story. The Big Idea is that the user
focuses on the application rather than the underlying host, which
manages, heals, and self-optimizes itself. Both Salt
(pre-installed) and Ansible will be available to simplify further

Users can look forward to Full Disk Encryption (FDE) with TPM
support by default on x86_64. Another part of the deliverables are
numerous containerized system components including yast2, podman,
k3s, cockpit, Display Manager (GDM), and KVM. All of which users
can experiment with, which are simply referred to as Workloads.

Announcing Turnstile, a user-friendly, privacy-preserving alternative to CAPTCHA

Post Syndicated from Reid Tatoris original https://blog.cloudflare.com/turnstile-private-captcha-alternative/

Today, we’re announcing the open beta of Turnstile, an invisible alternative to CAPTCHA. Anyone, anywhere on the Internet, who wants to replace CAPTCHA on their site will be able to call a simple API, without having to be a Cloudflare customer or sending traffic through the Cloudflare global network. Sign up here for free.

There is no point in rehashing the fact that CAPTCHA provides a terrible user experience. It’s been discussed in detail before on this blog, and countless times elsewhere. The creator of the CAPTCHA has even publicly lamented that he “unwittingly created a system that was frittering away, in ten-second increments, millions of hours of a most precious resource: human brain cycles.” We hate it, you hate it, everyone hates it. Today we’re giving everyone a better option.

Turnstile is our smart CAPTCHA alternative. It automatically chooses from a rotating suite of non-intrusive browser challenges based on telemetry and client behavior exhibited during a session. We talked in an earlier post about how we’ve used our Managed Challenge system to reduce our use of CAPTCHA by 91%. Now anyone can take advantage of this same technology to stop using CAPTCHA on their own site.

UX isn’t the only big problem with CAPTCHA — so is privacy

While having to solve a CAPTCHA is a frustrating user experience, there is also a potential hidden tradeoff a website must make when using CAPTCHA. If you are a small site using CAPTCHA today, you essentially have one option: an 800 pound gorilla with 98% of the CAPTCHA market share. This tool is free to use, but in fact it has a privacy cost: you have to give your data to an ad sales company.

According to security researchers, one of the signals that Google uses to decide if you are malicious is whether you have a Google cookie in your browser, and if you have this cookie, Google will give you a higher score. Google says they don’t use this information for ad targeting, but at the end of the day, Google is an ad sales company. Meanwhile, at Cloudflare, we make money when customers choose us to protect their websites and make their services run better. It’s a simple, direct relationship that perfectly aligns our incentives.

Less data collection, more privacy, same security

In June, we announced an effort with Apple to use Private Access Tokens. Visitors using operating systems that support these tokens, including the upcoming versions of macOS or iOS, can now prove they’re human without completing a CAPTCHA or giving up personal data.

By collaborating with third parties like device manufacturers, who already have the data that would help us validate a device, we are able to abstract portions of the validation process, and confirm data without actually collecting, touching, or storing that data ourselves. Rather than interrogating a device directly, we ask the device vendor to do it for us.

Private Access Tokens are built directly into Turnstile. While Turnstile has to look at some session data (like headers, user agent, and browser characteristics) to validate users without challenging them, Private Access Tokens allow us to minimize data collection by asking Apple to validate the device for us. In addition, Turnstile never looks for cookies (like a login cookie), or uses cookies to collect or store information of any kind. Cloudflare has a long track record of investing in user privacy, which we will continue with Turnstile.

We are opening our CAPTCHA replacement to everyone

To improve the Internet for everyone, we decided to open up the technology that powers our Managed Challenge to everyone in beta as a standalone product called Turnstile.

Rather than try to unilaterally deprecate and replace CAPTCHA with a single alternative, we built a platform to test many alternatives and rotate new challenges in and out as they become more or less effective. With Turnstile, we adapt the actual challenge outcome to the individual visitor/browser. First we run a series of small non-interactive JavaScript challenges gathering more signals about the visitor/browser environment. Those challenges include proof-of-work, proof-of-space, probing for web APIs, and various other challenges for detecting browser-quirks and human behavior. As a result, we can fine-tune the difficulty of the challenge to the specific request.

Turnstile also includes machine learning models that detect common features of end visitors who were able to pass a challenge before. The computational hardness of those initial challenges may vary by visitor, but is targeted to run fast.

Swap out your existing CAPTCHA in a few minutes

You can take advantage of Turnstile and stop bothering your visitors with a CAPTCHA even without being on the Cloudflare network. While we make it as easy as possible to use our network, we don’t want this to be a barrier to improving privacy and user experience.

To switch from a CAPTCHA service, all you need to do is:

  1. Create a Cloudflare account, navigate to the `Turnstile` tab on the navigation bar, and get a sitekey and secret key.
  2. Copy our JavaScript from the dashboard and paste over your old CAPTCHA JavaScript.
  3. Update the server-side integration by replacing the old siteverify URL with ours.

There is more detail on the process below, including options you can configure, but that’s really it. We’re excited about the simplicity of making a change.

Deployment options and analytics

To use Turnstile, first create an account and get your site and secret keys.

Then, copy and paste our HTML snippet:

<script src="https://challenges.cloudflare.com/turnstile/v0/api.js" async defer></script>

Once the script is embedded, you can use implicit rendering. Here, the HTML is scanned for elements that have a cf-turnstile class:

<form action="/login" method="POST">
  <div class="cf-turnstile" data-sitekey="yourSiteKey"></div>
  <input type="submit">
</form>
Once a challenge has been solved, a token is injected into your form under the name cf-turnstile-response. This token can be used with our siteverify endpoint to validate a challenge response. A token can only be validated once; it cannot be redeemed a second time. The validation can be done on the server side or even in the cloud, for example using a simple Workers fetch (see a demo here):

async function handleRequest() {
    // ... Receive token
    let formData = new FormData();
    formData.append('secret', turnstileSecretKey);
    formData.append('response', receivedToken);

    // POST the secret key and the received token to the siteverify endpoint
    await fetch('https://challenges.cloudflare.com/turnstile/v0/siteverify',
        {
            body: formData,
            method: 'POST'
        });
    // ...
}
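
As a rough sketch of what a server might do with the siteverify response, the function below posts the token and inspects the JSON outcome. The function name `verifyTurnstileToken` and the injectable `fetchImpl` parameter are hypothetical, and two things are assumptions rather than anything stated above: that the endpoint accepts a form-encoded body (sent here via URLSearchParams instead of FormData), and that the response carries a boolean `success` field plus an `error-codes` array on failure.

```javascript
// Sketch: verify a Turnstile token server-side and inspect the JSON outcome.
// `fetchImpl` is injectable so the function can be exercised without network access.
async function verifyTurnstileToken(secretKey, token, fetchImpl = fetch) {
    // Form-encoded body with the same two fields used in the snippet above.
    const body = new URLSearchParams({ secret: secretKey, response: token });

    const res = await fetchImpl(
        'https://challenges.cloudflare.com/turnstile/v0/siteverify',
        { method: 'POST', body }
    );
    const outcome = await res.json();

    // Assumption: `success` is a boolean and `error-codes` lists failure reasons.
    return {
        ok: outcome.success === true,
        errors: outcome['error-codes'] || [],
    };
}
```

A handler would typically reject the form submission when `ok` is false, and should expect a second verification of the same token to fail because tokens are single-use.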

For more complex use cases, the challenge can be invoked explicitly via JavaScript:

<script>
    // Called once the Turnstile script has loaded
    window.turnstileCallbackFunction = function () {
        const turnstileOptions = {
            sitekey: 'yourSitekey',
            callback: function(token) {
                console.log(`Challenge Success: ${token}`);
            }
        };
        turnstile.render('#container', turnstileOptions);
    };
</script>
<div id="container"></div>

You can also create what we call ‘Actions’: custom labels that allow you to distinguish between different pages where you’re using Turnstile, like a login, checkout, or account creation page.

Once you’ve deployed Turnstile, you can go back to the dashboard and see analytics on where you have widgets deployed, how users are solving them, and view any defined actions.

Why are we giving this away for free?

While this is sometimes hard for people outside to believe, helping build a better Internet truly is our mission. This isn’t the first time we’ve built free tools that we think will make the Internet better, and it won’t be the last. It’s really important to us.

So whether or not you’re a Cloudflare customer today, if you’re using a CAPTCHA, try Turnstile for free, instead. You’ll make your users happier, and minimize the data you send to third parties.

Visit this page to sign up for the best invisible, privacy-first, CAPTCHA replacement and to retrieve your Turnstile beta sitekey.

We’ve shipped so many products the Cloudflare dashboard needed its own search engine

Post Syndicated from Emily Flannery original https://blog.cloudflare.com/quick-search-beta/

Today we’re proud to announce our first release of quick search for the Cloudflare dashboard, a beta version of our first ever cross-dashboard search tool to help you navigate our products and features. This first release is now available to a small percentage of our customers. Want to request early access? Let us know by filling out this form.

What we’re launching

We’re launching quick search to speed up common interactions with the Cloudflare dashboard. Our dashboard allows you to configure Cloudflare’s full suite of products and features, and quick search gives you a shortcut.

To get started, you can access the quick search tool from anywhere within the Cloudflare dashboard by clicking the magnifying glass button in the top navigation, or hitting Ctrl + K on Linux and Windows or ⌘ + K on Mac. (If you find yourself forgetting which key combination it is, just remember that it’s ⌘-K-wik or Ctrl-K-wik.) From there, enter a search term and then select from the results shown below.

Access quick search from the top navigation bar, or use keyboard shortcuts Ctrl + K on Linux and Windows or ⌘ + K on Mac.

Current supported functionality

What functionality will you have access to? Below you’ll learn about the three core capabilities of quick search that are included in this release, as well as helpful tips for using the tool.

Search for a page in the dashboard

Start typing in the name of the product you’re looking for, and we’ll load matching terms after each key press. You will see results for any dashboard page that currently exists in your sidebar navigation. Then, just click the desired result to navigate directly there.

Search for “page” and you’ll see results categorized into “website-only products” and “account-wide products.”
Search for “ddos” and you’ll see results categorized into “websites,” “website-only products” and “account-wide products.”

Search for website-only products

For our customers who manage a website or domain in Cloudflare, you have access to a multitude of Cloudflare products and features to enhance your website’s security, performance and reliability. Quick search can be used to easily find those products and features, regardless of where you currently are in the dashboard (even from within another website!).

You may easily search for your website by name to navigate to your website’s Overview page.

You may also navigate to the products and feature pages within your specific website(s). Note that you can perform a website-specific search from anywhere in your core dashboard using one of two different approaches, which are explained below.

First, you may search for your website by name, then navigate the search results from there.

Alternatively, you may search first for the product or feature you’re looking for, then filter down by your website.

Search for account-wide products

Many Cloudflare products and features are not tied directly to a website or domain that you have set up in Cloudflare, like Workers, R2, Magic Transit—not to mention their related sub-pages. Now, you may use quick search to more easily navigate to those sections of the dashboard.

Here’s an overview of what’s next on our quick search roadmap (and not yet supported today):

  • Search results do not currently return results of product- and feature-specific names or configurations, such as Worker names, specific DNS records, IP addresses, Firewall Rules.
  • Search results do not currently return results from within the Zero Trust dashboard.
  • Search results do not currently return results for Cloudflare content living outside the dashboard, like Support or Developer documentation.

We’d love to hear what you think. What would you like to see added next? Let us know using the feedback link found at the bottom of the search window.

Our vision for the future of the dashboard

We’re excited to launch quick search and to continue improving our dashboard experience for all customers. Over time, we’ll mature our search functionality to index any and all content you might be looking for — including search results for all product content, Support and Developer docs, extending search across accounts, caching your recent searches, and more.

Quick search is one of many important user experience improvements we are planning to tackle over the coming weeks, months and years. The dashboard is central to your Cloudflare experience, and we’re fully committed to making your experience delightful, useful, and easy. Stay tuned for an upcoming blog post outlining the vision for the Cloudflare dashboard, from our in-app home experience to our global navigation and beyond.

For now, keep your eye out for the little search icon that will help you in your day-to-day responsibilities in Cloudflare, and if you don’t see it yet, don’t worry—we can’t wait to ship it to you soon.

If you don’t yet see quick search in your Cloudflare dashboard, you can request early access by filling out this form.

Private by design: building privacy-preserving products with Cloudflare’s Privacy Edge

Post Syndicated from Mari Galicer original https://blog.cloudflare.com/privacy-edge-making-building-privacy-first-apps-easier/

When Cloudflare was founded, our value proposition had three pillars: more secure, more reliable, and more performant. Over time, we’ve realized that a better Internet is also a more private Internet, and we want to play a role in building it.

Users’ awareness of privacy and their expectations for it are higher than ever, but we believe that application developers and platforms shouldn’t have to start from scratch. We’re excited to introduce Privacy Edge – Code Auditability, Privacy Gateway, Privacy Proxy, and Cooperative Analytics – a suite of products that make it easy for site owners and developers to build privacy into their products, by default.

Building network-level privacy into the foundations of app infrastructure

As you’re browsing the web every day, information from the networks and apps you use can expose more information than you intend. When accumulated over time, identifiers like your IP address, cookies, browser and device characteristics create a unique profile that can be used to track your browsing activity. We don’t think this status quo is right for the Internet, or that consumers should have to understand the complex ecosystem of third-party trackers to maintain privacy. Instead, we’ve been working on technologies that encourage and enable website operators and app developers to build privacy into their products at the protocol level.

Getting privacy right is hard. We figured we’d start in the area we know best: building privacy into our network infrastructure. Like other work we’ve done in this space – offering free SSL certificates to make encrypted HTTP requests the norm, and launching 1.1.1.1, a privacy-respecting DNS resolver, for example – the products we’re announcing today are built upon the foundations of open Internet standards, many of which are co-authored by members of our Research Team.

Privacy Edge, the collection of products we’re announcing today, includes:

  • Privacy Gateway: A lightweight proxy that encrypts request data and forwards it through an IP-blinding relay
  • Code Auditability: A solution to verifying that code delivered in your browser hasn’t been tampered with
  • Privacy Proxy: A proxy that offers the protection of a VPN, built natively into application architecture
  • Cooperative Analytics: A multi-party computation approach to measurement and analytics based on an emerging distributed aggregation protocol.

Today’s announcement of Privacy Edge isn’t exhaustive. We’re continuing to explore, research and develop new privacy-enabling technologies, and we’re excited about all of them.

Privacy Gateway: IP address privacy for your users

There are situations in which applications only need to receive certain HTTP requests for app functionality, but linking that data with who or where it came from creates a privacy concern.

We recently partnered with Flo Health, a period tracking app, to solve exactly that privacy concern: for users that have turned on “Anonymous mode,” Flo encrypts and forwards traffic through Privacy Gateway so that the network-level request information (most importantly, users’ IP addresses) is replaced by that of the Cloudflare network.

How data is encapsulated, forwarded, and decapsulated in the Privacy Gateway system.

So how does it work? Privacy Gateway is based on Oblivious HTTP, an emerging IETF standard, and at a high level describes the following data flow:

  1. The client encapsulates an HTTP request using the public key of the customer’s gateway server, and sends it to the relay over a client<>relay HTTPS connection.
  2. The relay forwards the request to the server over its own relay<>gateway HTTPS connection.
  3. The gateway server decapsulates the request, forwarding it to the application server.
  4. The gateway server returns an encapsulated response to the relay, which then forwards the result to the client.

The novel feature Privacy Gateway implements from the OHTTP specification is that messages sent through the relay are encrypted (via HPKE) to the application server, so that the relay learns nothing of the application data beyond the source and destination of each message.

The end result is that the relay will know where the data request is coming from (i.e. users’ IP addresses) but not what it contains (i.e. contents of the request), and the application can see what the data contains but won’t know where it comes from. A win for end-user privacy.
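
To make the split of knowledge concrete, here is a toy simulation of the four steps above. It is only a sketch: RSA-OAEP from Node's `crypto` module stands in for HPKE, and the function names, the `relay.example` address, the request path, and the example IP are all hypothetical. It does not implement the Oblivious HTTP message format.

```javascript
const crypto = require('crypto');

// Stand-in for the gateway's key pair (assumption: RSA-OAEP replaces HPKE here).
const gatewayKeys = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });

// Step 1: the client encapsulates the request to the gateway's public key.
function clientEncapsulate(request, gatewayPublicKey) {
    return crypto.publicEncrypt(gatewayPublicKey, Buffer.from(JSON.stringify(request)));
}

// Step 2: the relay sees the client's IP but only an opaque blob,
// and forwards that blob under its own address.
function relayForward(clientIp, encapsulated, relayLog) {
    relayLog.push({ sawClientIp: clientIp, sawBytes: encapsulated.length });
    return { source: 'relay.example', encapsulated };
}

// Step 3: the gateway decapsulates; it sees the request but only the relay's address.
function gatewayDecapsulate(forwarded, gatewayPrivateKey) {
    const plaintext = crypto.privateDecrypt(gatewayPrivateKey, forwarded.encapsulated);
    return { source: forwarded.source, request: JSON.parse(plaintext.toString()) };
}

const relayLog = [];
const blob = clientEncapsulate({ method: 'GET', path: '/api/data' }, gatewayKeys.publicKey);
const atGateway = gatewayDecapsulate(
    relayForward('203.0.113.7', blob, relayLog),
    gatewayKeys.privateKey
);

console.log(relayLog[0]); // what the relay learned: client IP and a byte count
console.log(atGateway);   // what the gateway learned: relay address and the request
```

The response path (step 4) would mirror this in reverse and is omitted for brevity.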

Delivering verifiable and authentic code for privacy-critical applications

How can you ensure that the code (the JavaScript, CSS, or even HTML) delivered to a browser hasn’t been tampered with?

One way is to generate a hash (a consistent, unique, and shorter representation) of the code, and have two independent parties compare those hashes when delivered to the user’s browser.

Our Code Auditability service does exactly that, and our recent partnership with Meta deployed it at scale to WhatsApp Web. Installing their Code Verify browser extension lets users confirm that they are delivered the code they’re intended to run – free of tampering or corrupted files.

With WhatsApp Web:

  1. WhatsApp publishes the latest version of their JavaScript libraries to their servers, and the corresponding hash for that version to Cloudflare’s audit endpoint.
  2. A WhatsApp web client fetches the latest libraries from WhatsApp.
  3. The Code Verify browser extension subsequently fetches the hash for that version from Cloudflare over a separate, secure connection.
  4. Code Verify compares the “known good” hash from Cloudflare with the hash of the libraries it locally computed.

If the hashes match, as they should under almost any circumstance, the code is “verified” from the perspective of the extension. If the hashes don’t match, it indicates that the code running on the user’s browser is different from the code WhatsApp intended to run on all its users’ browsers.
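
The comparison in step 4 can be sketched in a few lines. This is an illustration only: the choice of SHA-256, the function names, and the sample script text are assumptions, not details of how Code Verify is actually implemented.

```javascript
const crypto = require('crypto');

// Hex-encoded SHA-256 digest of a script's source text (hash choice is an assumption).
function sha256Hex(source) {
    return crypto.createHash('sha256').update(source, 'utf8').digest('hex');
}

// Compare the locally computed hash with the "known good" hash that was
// published to the independent audit endpoint.
function verifyCode(deliveredSource, publishedHash) {
    return sha256Hex(deliveredSource) === publishedHash ? 'verified' : 'mismatch';
}

// Hypothetical release: the publisher hashes the script it intends to ship.
const released = 'console.log("hello");';
const publishedHash = sha256Hex(released);

console.log(verifyCode(released, publishedHash));               // untouched script
console.log(verifyCode(released + ' evil();', publishedHash));  // tampered script
```

The value of the scheme comes from fetching the script and the published hash from two independent parties, so that tampering would require compromising both.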

How Cloudflare and WhatsApp Web verify code shipped to users isn’t tampered with.

Right now, we call this “Code Auditability” and we see a ton of other potential use cases including password managers, email applications, certificate issuance – all technologies that are potentially targets of tampering or security threats because of the sensitive data they handle.

In the near term, we’re working with other app developers to co-design solutions that meet their needs for privacy-critical products. In the long term, we’re working on standardizing the approach, including building on existing Content Security Policy standards, or the Isolated Web Apps proposal, and even an approach towards building Code Auditability natively into the browser so that a browser extension (existing or new) isn’t required.

Privacy-preserving proxying – built into applications

What if applications could build the protection of a VPN into their products, by default?

Privacy Proxy is our platform to proxy traffic through Cloudflare using a combination of privacy protocols that make it much more difficult to track users’ web browsing activity over time. At a high level, the Privacy Proxy Platform encrypts browsing traffic, replaces a device’s IP address with one from the Cloudflare network, and then forwards it onto its destination.

System architecture for Privacy Proxy.

The Privacy Proxy platform consists of several pieces and protocols to make it work:

  1. Privacy API: a service that issues unique cryptographic tokens, later redeemed against the proxy service to ensure that only valid clients are able to connect to the service.
  2. Geolocated IP assignment: a service that assigns each connection a new Cloudflare IP address based on the client’s approximate location.
  3. Privacy Proxy: the HTTP CONNECT-based service running on Cloudflare’s network that handles the proxying of traffic. This service validates the privacy token passed by the client and enforces any double-spend prevention necessary for the token.
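
As a loose illustration of "validate the token, then prevent double spending", the sketch below issues single-use tokens and rejects any token presented twice. Everything in it is hypothetical: the function names, the in-memory `redeemed` set, and especially the use of an HMAC as the token, which is a simple stand-in and does not provide the unlinkability that real privacy-preserving token schemes are designed for.

```javascript
const crypto = require('crypto');

// Hypothetical secret shared by the token issuer and the proxy.
const issuerKey = crypto.randomBytes(32);

// Hypothetical issuer: hand out a random nonce plus a MAC over it.
function issueToken() {
    const nonce = crypto.randomBytes(16).toString('hex');
    const mac = crypto.createHmac('sha256', issuerKey).update(nonce).digest('hex');
    return { nonce, mac };
}

// Nonces that have already been spent (in-memory for illustration only).
const redeemed = new Set();

// Hypothetical proxy check: the token must be authentic and not yet spent.
function redeemToken(token) {
    const expected = crypto.createHmac('sha256', issuerKey).update(token.nonce).digest('hex');
    const authentic = token.mac.length === expected.length &&
        crypto.timingSafeEqual(Buffer.from(token.mac), Buffer.from(expected));
    if (!authentic) return 'invalid';
    if (redeemed.has(token.nonce)) return 'double-spend';
    redeemed.add(token.nonce);
    return 'accepted';
}

const token = issueToken();
console.log(redeemToken(token)); // first use
console.log(redeemToken(token)); // same token again
```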

We’re working on several partnerships to provide network-level protection for users’ browsing traffic, most recently with Apple for Private Relay. Private Relay’s design adds privacy to the traditional proxy design by adding an additional hop – an ingress proxy, operated by Apple – that separates handling users’ identities (i.e., whether they’re a valid iCloud+ user) from the proxying of traffic – the egress proxy, operated by Cloudflare.

Measurements and analytics without seeing individual inputs

What if you could calculate the results of a poll, without seeing individuals’ votes, or update inputs to a machine learning model that predicted COVID-19 exposure without seeing who was exposed?

It might seem like magic, but it’s actually just cryptography. Cooperative Analytics is a multi-party computation system for aggregating privacy-sensitive user measurements that doesn’t reveal individual inputs, based on the Distributed Aggregation Protocol (DAP).

How data flows through the Cooperative Analytics system.

At a high level, DAP takes the core concept behind MapReduce, which became a fundamental way to aggregate large amounts of data, and rethinks how it would work with privacy built in, so that each individual input provably cannot be mapped back to the original user.


  1. Measurements are first “secret shared,” or split into multiple pieces. For example, if a user’s input is the number 5, her input could be split into two shares of [10,-5].
  2. The input share pieces are then distributed between different, non-colluding servers for aggregation (in this example, simply summed up). Similar to Privacy Gateway or Privacy Proxy, no one party has all the information needed to reconstruct any user’s input.
  3. Depending on the use case, the servers will then communicate with one another in order to verify that the input is “valid” – so that no one can insert an input that throws off the entire results. The magic of multi-party computation is that the servers can perform this computation without learning anything about the input beyond its validity.
  4. Once enough input shares have been aggregated to ensure strong anonymity and a statistically significant sample size – each server sends its sum of the input shares to the overall consumer of this service to then compute the final result.

For simplicity, the above example talks about measurements as summed up numbers, but DAP describes algorithms for multiple different types of inputs: the most common string input, or a linear regression, for example.
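
The summing example can be sketched as follows. This is a minimal illustration of additive secret sharing with two aggregators, not an implementation of DAP: the function names and sample inputs are hypothetical, the validity check from step 3 is omitted, and `Math.random` is used only for readability where a real system would need a cryptographically secure source of randomness.

```javascript
// Split a measurement into two additive shares that sum back to the input,
// e.g. 5 -> [10, -5].
function splitIntoShares(value) {
    const mask = Math.floor(Math.random() * 2001) - 1000; // illustrative only, not a secure RNG
    return [mask, value - mask];
}

// Each aggregator only ever sees one share per user and keeps a running sum.
function aggregate(shares) {
    return shares.reduce((sum, share) => sum + share, 0);
}

const inputs = [5, 3, 8];                    // hypothetical user measurements
const shared = inputs.map(splitIntoShares);

const sumAtServerA = aggregate(shared.map(([a]) => a));
const sumAtServerB = aggregate(shared.map(([, b]) => b));

// The consumer combines the two per-server sums to get the total.
const total = sumAtServerA + sumAtServerB;
console.log(total); // 16
```

Neither per-server sum reveals anything useful on its own; only their combination yields the aggregate, which is why the servers must not collude.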

Early iterations of this system have been implemented by Apple and Google for COVID-19 exposure notifications, but there are many other potential use cases for a system like this: think sensitive browser telemetry, geolocation data – any situation where one has a question about a population of users, but doesn’t want to have to measure them directly.

Because this system requires different parties to operate separate aggregation servers, Cloudflare is working with several partners to act as one of the aggregation servers for DAP. We’re calling our implementation Daphne, and it’s built on top of Cloudflare Workers.

Privacy still requires trust

Part of what’s cool about these systems is that they distribute information — whether user data, network traffic, or both — amongst multiple parties.

While we think that products included in Privacy Edge are moving the Internet in the right direction, we understand that trust only goes so far. To that end, we’re trying to be as transparent as possible.

  • We’ve open sourced the code for Privacy Gateway’s server and DAP’s aggregation server, and all the standards work we’re doing is in public with the IETF.
  • We’re also working on detailed and accessible privacy notices for each product that describe exactly what kind of network data Cloudflare sees, doesn’t see, and how long we retain it for.
  • And, most importantly, we’re continuing to develop new protocols (like Oblivious HTTP) and technologies that don’t just require trust, but that can provably minimize the data observed or logged.

We’d love to see more folks get involved in the standards space, and we welcome feedback from privacy experts and potential customers on how we can improve the integrity of these systems.

We’re looking for collaborators

Privacy Edge products are currently in early access.

We’re looking for application developers who want to build more private user-facing apps with Privacy Gateway; browser and existing VPN vendors looking to improve network-level security for their users via Privacy Proxy; and anyone shipping sensitive software on the Internet that is looking to iterate with us on code auditability and web app signing.

If you’re interested in working with us on furthering privacy on the Internet, then please reach out, and we’ll be in touch!

Introducing Cloudflare’s free Botnet Threat Feed for service providers

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/botnet-threat-feed-for-isp/

We’re pleased to introduce Cloudflare’s free Botnet Threat Feed for Service Providers. This includes all types of service providers, ranging from hosting providers to ISPs and cloud compute providers.

This feed will give service providers threat intelligence on their own IP addresses that have participated in HTTP DDoS attacks as observed from the Cloudflare network — allowing them to crack down on abusers, take down botnet nodes, reduce their abuse-driven costs, and ultimately reduce the amount and force of DDoS attacks across the Internet. We’re giving away this feed for free as part of our mission to help build a better Internet.

Service providers that operate their own IP space can now sign up to the early access waiting list.

Cloudflare’s unique vantage point on DDoS attacks

Cloudflare provides services to millions of customers ranging from small businesses and individual developers to large enterprises, including 29% of Fortune 1000 companies. Today, about 20% of websites rely directly on Cloudflare’s services. This gives us a unique vantage point on tremendous amounts of DDoS attacks that target our customers.

DDoS attacks, by definition, are distributed. They originate from botnets of many sources — in some cases, from hundreds of thousands to millions of unique IP addresses. In the case of HTTP DDoS attacks, where the victims are flooded with HTTP requests, we know that the source IP addresses that we see are the real ones — they’re not spoofed (altered). We know this because to initiate an HTTP request a connection must be established between the client and server. Therefore, we can reliably identify the sources of the attacks to understand the origins of the attacks.

As we’ve seen in previous attacks, such as the 26 million request per second DDoS attack that was launched by the Mantis botnet, a significant portion originated from service providers such as French-based OVH (Autonomous System Number 16276), the Indonesian Telkomnet (ASN 7713), the US-based iboss (ASN 137922), the Libyan Ajeel (ASN 37284), and others.

Source service providers of a Mantis botnet attack

The service providers are not to blame. Their networks and infrastructure are abused by attackers to launch attacks. But it can be hard for service providers to identify the abusers. In some cases, we’ve seen as few as a single IP of a service provider participate in a DDoS attack consisting of thousands of bots, all scattered across many service providers. And so, the service providers usually only see a small fraction of the attack traffic leaving their network, and it can be hard to correlate it to malicious activity.

Even more so, in the case of HTTPS DDoS attacks, the service provider would only see encrypted gibberish leaving their network without any possibility to decrypt or understand if it is malicious or legitimate traffic. However, at Cloudflare, we see the entire attack and all of its sources, and can use that to help service providers stop the abusers and attacks.

Leveraging our unique vantage point, we go to great lengths to ensure that our threat intelligence includes actual attackers and not legitimate clients.

Partnering with service providers around the world to help build a better Internet

Since our previous experience mitigating Mantis botnet attacks, we’ve been working with providers around the world to help them crack down on abusers. We realized the potential and decided to double down on this effort. The result is that each service provider can subscribe to a feed of their own offending IPs, for free, so they can take action and take down the abused systems.
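To make the idea of a per-provider feed concrete, here is a minimal Python sketch of how attack source IPs could be grouped by the network that announces them. It is not Cloudflare’s implementation. The function name `group_offenders_by_asn` is hypothetical, and the prefix-to-ASN table uses reserved documentation address ranges rather than real allocations of the providers named above.

```python
import ipaddress
from collections import defaultdict

# Hypothetical prefix-to-ASN table. The prefixes are reserved documentation
# ranges (RFC 5737), NOT real allocations of these providers.
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 16276,
    ipaddress.ip_network("198.51.100.0/24"): 7713,
}


def group_offenders_by_asn(source_ips):
    """Group attack source IPs by the ASN whose prefix contains them."""
    feed = defaultdict(set)
    for raw in source_ips:
        ip = ipaddress.ip_address(raw)
        for prefix, asn in PREFIX_TO_ASN.items():
            if ip in prefix:
                feed[asn].add(str(ip))
                break  # first matching prefix wins in this simplified table
    # Deduplicated and sorted so each provider gets a stable list.
    return {asn: sorted(ips) for asn, ips in feed.items()}


attack_sources = ["192.0.2.10", "198.51.100.7", "192.0.2.10", "203.0.113.5"]
feed = group_offenders_by_asn(attack_sources)
```

Each key in `feed` corresponds to one provider, so a provider would only ever see the offending addresses inside its own IP space. Sources that match no known prefix are dropped in this sketch.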

Our mission at Cloudflare is to help build a better Internet — one that is safer, more performant, and more reliable for everyone. We believe that providing this threat intelligence will help us all move in that direction — cracking down on DDoS attackers and taking down malicious botnets.

If you are a service provider and operate your own IP space, you can now sign up to the early access waiting list.

Monitor your own network with free network flow analytics from Cloudflare

Post Syndicated from Chris Draper original https://blog.cloudflare.com/free-magic-network-monitoring/

As a network engineer or manager, answering questions about the traffic flowing across your infrastructure is a key part of your job. Cloudflare built Magic Network Monitoring (previously called Flow Based Monitoring) to give you better visibility into your network and to answer questions like, “What is my network’s peak traffic volume? What are the sources of that traffic? When does my network see that traffic?” Today, Cloudflare is excited to announce early access to a free version of Magic Network Monitoring that will be available to everyone. You can request early access by filling out this form.

Magic Network Monitoring now features a powerful analytics dashboard, self-serve configuration, and a step-by-step onboarding wizard. You’ll have access to a tool that helps you visualize your traffic and filter by packet characteristics including protocols, source IPs, destination IPs, ports, TCP flags, and router IP. Magic Network Monitoring also includes network traffic volume alerts for specific IP addresses or IP prefixes on your network.

Making Network Monitoring easy

Magic Network Monitoring allows customers to collect network analytics without installing a physical device like a network TAP (Test Access Point) or setting up overly complex remote monitoring systems. Our product works with any hardware that exports network flow data, and customers can quickly configure any router to send flow data to Cloudflare’s network. From there, our network flow analyzer will aggregate your traffic data and display it in Magic Network Monitoring analytics.

Analytics dashboard

In Magic Network Monitoring analytics, customers can take a deep dive into their network traffic data. You can filter traffic data by protocol, source IP, destination IP, TCP flags, and router IP. Customers can combine these filters together to answer questions like, “How much ICMP data was requested from my speed test server over the past 24 hours?” Visibility into traffic analytics is a key part of understanding your network’s operations and proactively improving your security. Let’s walk through some cases where Magic Network Monitoring analytics can answer your network visibility and security questions.
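As a rough illustration of what combining filters means, the following Python sketch answers the ICMP question above over a handful of in-memory flow records. The `FlowRecord` fields and the `bytes_matching` helper are assumptions made for this example; they do not describe the product’s data model or API, and the addresses are documentation ranges.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FlowRecord:
    """Simplified, hypothetical flow record."""
    timestamp: datetime
    protocol: str
    src_ip: str
    dst_ip: str
    num_bytes: int


def bytes_matching(flows, protocol, dst_ip, since):
    """Sum bytes for flows matching protocol, destination, and time window."""
    return sum(
        f.num_bytes
        for f in flows
        if f.protocol == protocol and f.dst_ip == dst_ip and f.timestamp >= since
    )


now = datetime(2022, 9, 1, 12, 0)
speed_test_server = "192.0.2.10"  # placeholder address
flows = [
    FlowRecord(now - timedelta(hours=1), "ICMP", "198.51.100.7", speed_test_server, 1200),
    FlowRecord(now - timedelta(hours=30), "ICMP", "198.51.100.7", speed_test_server, 900),
    FlowRecord(now - timedelta(hours=2), "TCP", "198.51.100.8", speed_test_server, 5000),
]

# ICMP bytes sent to the speed test server over the past 24 hours.
icmp_bytes = bytes_matching(flows, "ICMP", speed_test_server, now - timedelta(hours=24))
```

Only the first record passes all three filters: the second is older than 24 hours and the third is TCP.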

Create network volume alert thresholds per IP address or IP prefix

Magic Network Monitoring is incredibly flexible, and it can be customized to meet the needs of any network hobbyist or business. You can monitor your traffic volume trends over time via the analytics dashboard and build an understanding of your network’s traffic profile. After gathering historical network data, you can set custom volumetric threshold alerts for one IP prefix or a group of IP prefixes. As your network traffic changes over time, or your network expands, you can easily update your Magic Network Monitoring configuration to receive data from new routers or destinations within your network.
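The logic behind a per-prefix volumetric threshold can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how the product evaluates rules: the function names, the input shape (destination IP and byte count per sample), and the threshold values are all hypothetical.

```python
import ipaddress


def mbps(total_bytes, seconds):
    """Convert a byte count observed over a window into megabits per second."""
    return total_bytes * 8 / seconds / 1_000_000


def prefixes_over_threshold(samples, thresholds_mbps, window_seconds):
    """Return {prefix: observed_mbps} for prefixes exceeding their threshold.

    samples: iterable of (dst_ip, num_bytes) observed during the window.
    thresholds_mbps: {prefix_string: threshold in Mbps} (hypothetical config).
    """
    limits = {ipaddress.ip_network(p): t for p, t in thresholds_mbps.items()}
    totals = {prefix: 0 for prefix in limits}
    for dst_ip, num_bytes in samples:
        ip = ipaddress.ip_address(dst_ip)
        for prefix in limits:
            if ip in prefix:
                totals[prefix] += num_bytes
    alerts = {}
    for prefix, total in totals.items():
        rate = mbps(total, window_seconds)
        if rate > limits[prefix]:
            alerts[str(prefix)] = rate
    return alerts


thresholds = {"192.0.2.0/24": 550, "198.51.100.0/24": 200}
samples = [("192.0.2.10", 4_500_000_000), ("198.51.100.7", 750_000_000)]
alerts = prefixes_over_threshold(samples, thresholds, window_seconds=60)
```

In this example the first prefix averages 600 Mbps over the 60-second window and exceeds its 550 Mbps threshold, while the second averages 100 Mbps and stays below its limit.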

Monitoring a speed test server in a home lab

Let’s run through an example where you’re running a network home lab. You decide to use Magic Network Monitoring to track the volume of requests a speed test server you’re hosting receives and check for potential bad actors. Your goal is to identify when your speed test server experiences peak traffic, and the volume of that traffic. You set up Magic Network Monitoring and create a rule that analyzes all traffic destined for your speed test server’s IP address. After collecting data for seven days, the analytics dashboard shows that peak traffic occurs on weekdays in the morning, and that during this time, your traffic volume ranges from 450 – 550 Mbps.

As you’re checking over the analytics data, you also notice strange traffic spikes of 300 – 350 Mbps in the middle of the night that recur at the same time each night. As you investigate further, the analytics dashboard shows that these spikes all originate from the same IP prefix. You research some of the source IPs and find they’re associated with malicious activity. As a result, you update your firewall to block traffic from this problematic source.
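The two observations in this home lab example, the busiest time of day and a shared source prefix, reduce to simple aggregations. The Python sketch below shows one way to compute them. The helper names, the IPv4-only /24 grouping, and the sample values are assumptions for illustration and are not taken from the dashboard.

```python
import ipaddress
from collections import defaultdict


def peak_hour(samples):
    """Return the hour of day with the highest average rate.

    samples: iterable of (hour_of_day, mbps) measurements.
    """
    by_hour = defaultdict(list)
    for hour, rate in samples:
        by_hour[hour].append(rate)
    return max(by_hour, key=lambda h: sum(by_hour[h]) / len(by_hour[h]))


def top_source_prefix(source_ips, prefix_len=24):
    """Return the prefix (IPv4, assumed /24) that contains the most sources."""
    counts = defaultdict(int)
    for raw in source_ips:
        # strict=False masks the host bits, e.g. 203.0.113.5/24 -> 203.0.113.0/24
        net = ipaddress.ip_network(f"{raw}/{prefix_len}", strict=False)
        counts[str(net)] += 1
    return max(counts, key=counts.get)


measurements = [(9, 500), (9, 540), (2, 320), (14, 120)]
night_sources = ["203.0.113.5", "203.0.113.77", "198.51.100.7"]

busiest = peak_hour(measurements)
suspect_prefix = top_source_prefix(night_sources)
```

Here `busiest` is the morning hour and `suspect_prefix` is the /24 that two of the three night-time sources share, which is the prefix you would then research and block.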

Identifying a network layer DDoS attack

Magic Network Monitoring can also be leveraged to identify a variety of L3, L4, and L7 DDoS attacks. Let’s run through an example of how ACME Corp, a small business using Magic Network Monitoring, can identify a Ping (ICMP) Flood attack on their network. Ping Flood attacks aim to overwhelm the targeted network’s ability to respond to a high number of requests or overload the network connection with bogus traffic.

At the start of a Ping Flood attack, your server’s traffic volume will begin to ramp up. Magic Network Monitoring will analyze traffic across your network, and send an email, webhook, or PagerDuty alert once an unusual volume of traffic is identified. Your network and security team can respond to the volumetric alert by checking the data in Magic Network Monitoring analytics and identifying the attack type. In this case, they’ll notice the following traffic characteristics:

  1. Network traffic volume above your historical traffic averages
  2. An unusually large amount of ICMP traffic
  3. ICMP traffic coming from a specific set of source IPs

Now, your network security team has confirmed the traffic is malicious by identifying the attack type, and can begin taking steps to mitigate the attack.
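The three characteristics listed above can be expressed as simple checks over flow data. The Python sketch below is a heuristic written for this example, not the detection logic used by Magic Network Monitoring. The function name, the input tuples, and all limit values (`volume_factor`, `icmp_share_limit`, `source_share_limit`) are assumptions you would tune for your own network.

```python
from collections import Counter


def ping_flood_indicators(flows, baseline_bytes, volume_factor=2.0,
                          icmp_share_limit=0.5, top_n=2, source_share_limit=0.8):
    """Evaluate three Ping Flood indicators.

    flows: iterable of (protocol, src_ip, num_bytes) for the current window.
    baseline_bytes: historical average bytes for a window of the same length.
    """
    flows = list(flows)
    total = sum(num_bytes for _, _, num_bytes in flows)

    icmp_by_source = Counter()
    for protocol, src_ip, num_bytes in flows:
        if protocol == "ICMP":
            icmp_by_source[src_ip] += num_bytes
    icmp_total = sum(icmp_by_source.values())
    top_bytes = sum(b for _, b in icmp_by_source.most_common(top_n))

    return {
        # 1. Volume above the historical average
        "volume_above_baseline": total > volume_factor * baseline_bytes,
        # 2. Unusually large share of ICMP traffic
        "icmp_heavy": total > 0 and icmp_total / total > icmp_share_limit,
        # 3. ICMP traffic concentrated in a small set of source IPs
        "concentrated_sources": icmp_total > 0
        and top_bytes / icmp_total > source_share_limit,
    }


window = [
    ("ICMP", "198.51.100.7", 4000),
    ("ICMP", "198.51.100.8", 3500),
    ("TCP", "203.0.113.5", 500),
    ("ICMP", "203.0.113.9", 200),
]
indicators = ping_flood_indicators(window, baseline_bytes=2000)
```

When all three indicators are true, as in this sample window, the traffic matches the Ping Flood profile described above and is worth investigating in the analytics dashboard.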

Magic Network Monitoring and Magic Transit

If your business is impacted by DDoS attacks, Magic Network Monitoring will identify attacks, and Magic Transit can be used to mitigate those DDoS attacks. Magic Transit protects customers’ entire network from DDoS attacks by placing our network in front of theirs. You can use Magic Transit Always On to reduce latency and mitigate attacks all the time, or Magic Transit On Demand to protect your network during active attacks. With Magic Transit, you get DDoS protection, traffic acceleration, and other network functions delivered as a service from every Cloudflare data center. Magic Transit works by allowing Cloudflare to advertise customers’ IP prefixes to the Internet with BGP to route the customer’s traffic through our network for DDoS protection. If you’re interested in protecting your network with Magic Transit, you can visit the Magic Transit product page and request a demo today.

Sign up for early access and what’s next

The free version of Magic Network Monitoring (MNM) will be released in the next few weeks. You can request early access by filling out this form.

This is just the beginning for Magic Network Monitoring. In the future, you can look forward to features like advanced DDoS attack identification, network incident history and trends, and volumetric alert threshold recommendations.

Cold War Bugging of Soviet Facilities

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/09/cold-war-bugging-of-soviet-facilities.html

Found documents in Poland detail US spying operations against the former Soviet Union.

The file details a number of bugs found at Soviet diplomatic facilities in Washington, D.C., New York, and San Francisco, as well as in a Russian government-owned vacation compound, apartments used by Russia personnel, and even Russian diplomats’ cars. And the bugs were everywhere: encased in plaster in an apartment closet; behind electrical and television outlets; bored into concrete bricks and threaded into window frames; inside wooden beams and baseboards and stashed within a building’s foundation itself; surreptitiously attached to security cameras; wired into ceiling panels and walls; and secretly implanted into the backseat of cars and in their window panels, instrument panels, and dashboards. It’s an impressive, and impressively thorough, effort by U.S. counterspies.

We have long read about sophisticated Russian spying operations—bugging the Moscow embassy, bugging Selectric typewriters in the Moscow embassy, bugging the new Moscow embassy. These are the first details I’ve read about the US bugging the Russians’ embassy.

[$] Finding bugs with sanitizers

Post Syndicated from jake original https://lwn.net/Articles/909245/

Andrey Konovalov began his 2022 Linux
Security Summit Europe
(LSS EU) talk with a bold statement: “fuzzing is
useless”. As might be guessed, he qualified that assertion quickly by
adding “without dynamic bug detectors”. These bug detectors include
“sanitizers” of various sorts, such as the Kernel Address
Sanitizer (KASAN), but there are others. Konovalov looked in detail at KASAN
and gave an overview of the
sanitizer landscape along with some ideas of ways to push these bug
detectors further—to find even more kernel bugs.
