Comic for 2024.06.24 – Weightlifting

2024-06-25 Explosm.net

Post Syndicated from Explosm.net original https://explosm.net/comics/weightlifting

New Cyanide and Happiness Comic

How to use SES Mail Manager SMTP Relay action to deliver inbound email to Google Workspace and Microsoft 365

2024-06-25 Zip Zieper

Post Syndicated from Zip Zieper original https://aws.amazon.com/blogs/messaging-and-targeting/how-to-use-ses-mail-manager-smtp-relay-action-to-deliver-inbound-email-to-google-workspace-and-microsoft-365/

Introduction

Customers often ask us whether the Amazon Simple Email Service (SES) inbound capabilities they use with applications hosted on AWS infrastructure can also be used to process and automate employee email hosted on public services like Google Workspace and Microsoft 365. The answer has typically been “yes, but with some limitations”, because until now SES inbound did not support relaying messages for an existing domain. This limitation made it very difficult to fully manage email flows across hybrid email environments.

Such conversations led the SES team to create Amazon Simple Email Service (SES) Mail Manager, which offers a set of capabilities that simplify managing large volumes of email communications within an organization. Mail Manager’s rule set conditions and actions can optimize routing for improved delivery and communication flow, for both incoming and outgoing email. Mail Manager’s email security features can be augmented by optional add-ons from industry-leading, vetted third-party providers. Flexible archiving features help organizations meet stringent compliance and record-keeping requirements.

In this blog, we position Mail Manager as a central ingress gateway for a fictitious company, Nutrition.co, that is based on real-world AWS customers. We discuss the customer’s challenges and explain how to configure Mail Manager’s SMTP Relay action to intercept, archive, and then deliver emails destined for employees’ Gmail mailboxes hosted in Google Workspace and Exchange Online mailboxes hosted in Microsoft 365. Similar mail flows can be used to process, automate, and archive emails destined for the company’s AWS-hosted apps.

You can learn more about all of Mail Manager’s capabilities here.

Customer background and use case

Our fictitious company, Nutrition.co, is an online retail business with multiple employee departments, including administration, marketing, sales, and fulfillment. The company has acquired several smaller rivals that use both Google Workspace and Microsoft 365 to host their employee inboxes, and it plans to consolidate all users onto the same domain (for example, employee addresses ending in @nutrition.co). They also host several applications on Amazon Web Services (AWS) that use Amazon SES’s inbound capability to receive emails on a subdomain, customer-support.nutrition.co, with addresses such as orders@customer-support.nutrition.co and returns@customer-support.nutrition.co.

Nutrition.co is looking for a solution to unify all their email domain routing, security and archiving processes onto one centralized management system to simplify their email infrastructure. They want an approach that provides more flexibility to control which addresses and domains are used for apps and automation as well as employee mail. They also want to enhance email compliance and governance with a flexible solution for screening, processing and archiving inbound emails to both employees and applications, before delivering those emails to recipient inboxes on Google Workspace and Microsoft 365 and applications hosted on AWS.

The SES Mail Manager-based central ingress and egress gateway architecture we propose will allow Nutrition.co to manage their peer-to-peer and application-driven emails in one place, Amazon SES. It will simplify email security and management, and make it easy to unlock new cloud-enabled email use cases. The architecture can be modified to accommodate a wide variety of email infrastructure, including fully cloud-hosted, on-premises, and hybrid mailbox hosting environments.

What is an Inbound SMTP Gateway?

An Inbound SMTP Gateway is an SMTP server that accepts inbound email via an Open Ingress Point and then delivers those messages to another email environment’s inbound SMTP server. In the diagram below, Mail Manager is configured as an Inbound SMTP Gateway:

Figure 1: Diagram of the inbound gateway mail flow to a mailbox hosting environment

“Inbound email” refers to email traffic flows where the originator of the message can be either a trusted entity (for example, the UK division of Nutrition.co) or an untrusted one (for example, a Nutrition.co customer or vendor). To send an email, the originating email system looks up the recipient domain’s MX record in the global DNS system to determine the address of the recipient’s inbound mail server. Once a connection is established on port 25, the originating server delivers the message over SMTP, typically using STARTTLS for transport-level encryption. Inbound messages are typically authenticated using the SPF, DKIM, and DMARC industry-standard protocols, which help ensure the messages are coming from the legitimate sender’s domain.
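To make the MX lookup and delivery steps concrete, here is a minimal Python sketch of what an originating server does. It assumes the MX records have already been fetched and are passed in as data (the Python standard library has no MX resolver; a DNS library would normally supply them), and every hostname in it is a hypothetical placeholder.

```python
import smtplib
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass(frozen=True)
class MxRecord:
    preference: int  # lower value = tried first
    exchange: str    # hostname of an inbound mail server


def delivery_order(records):
    """Return MX hostnames in the order a sending server tries them.

    sorted() is stable, so equal preferences keep their input order here;
    real senders usually randomize among equal-preference hosts.
    """
    return [r.exchange for r in sorted(records, key=lambda r: r.preference)]


def deliver(records, msg: EmailMessage) -> str:
    """Try each MX host on port 25, upgrading to TLS with STARTTLS when offered."""
    for host in delivery_order(records):
        try:
            with smtplib.SMTP(host, 25, timeout=30) as smtp:
                smtp.ehlo()
                if smtp.has_extn("starttls"):
                    smtp.starttls()
                    smtp.ehlo()  # re-identify over the encrypted channel
                smtp.send_message(msg)
                return host
        except OSError:  # smtplib.SMTPException is a subclass of OSError
            continue  # fall back to the next MX host
    raise RuntimeError("no MX host accepted the message")
```

The STARTTLS upgrade here is opportunistic (used only when the server advertises it), which mirrors the "typically" in the paragraph above: SMTP between mail servers does not require TLS unless a policy such as MTA-STS says otherwise.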

An Inbound SMTP gateway can act on messages, for example to process and/or archive, before passing them along to the end recipient’s email server. To learn more about archiving emails in transit, visit this blog.

Configuring Mail Manager as an Inbound SMTP Gateway

Before we can configure Mail Manager as an Inbound Gateway for Nutrition.co’s Google Workspace and Microsoft 365 hosted mailboxes, we need to “allow-list” Mail Manager in Nutrition.co’s Google Workspace and Microsoft 365 settings. Allow-listing in this context refers to configuring the hosted mailbox environments such that Mail Manager is not identified as the source of messages, but rather as an SMTP relay.

This configuration is necessary because the messages being relayed through Mail Manager originate from both trusted and untrusted senders, so this mail flow will contain both wanted and, potentially, unwanted messages. Mail Manager is the intermediary, not the source, of any unwanted email that passes through its Open Ingress Point before being relayed to the destination mailbox environment.

If Mail Manager is not allow-listed, inbound email that is relayed through Mail Manager’s Open Ingress Point will fail SPF checks, because the IP addresses of the intermediary server are not authorized by the sending domain’s SPF policy. DMARC passes only when SPF or DKIM passes for a domain aligned with the From header, so once SPF fails, a relayed message will fail the domain’s DMARC policy unless it carries a valid, domain-aligned DKIM signature.
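The dependency between SPF, DKIM, and DMARC described above can be modeled in a few lines of Python. This is a simplified sketch, not a DMARC implementation: relaxed alignment is treated as "same organizational domain", and the organizational domain is approximated by the last two DNS labels, which is an assumption (real implementations use the Public Suffix List).

```python
def org_domain(domain):
    # Approximation: real DMARC finds the organizational domain with the
    # Public Suffix List; taking the last two labels is only a stand-in.
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def aligned(auth_domain, from_domain, strict=False):
    """Strict alignment: exact match. Relaxed: same organizational domain."""
    if strict:
        return auth_domain.lower().rstrip(".") == from_domain.lower().rstrip(".")
    return org_domain(auth_domain) == org_domain(from_domain)


def dmarc_pass(from_domain, spf_result, spf_domain, dkim_results):
    """DMARC passes if SPF or any DKIM signature passes AND is aligned.

    dkim_results is a list of (result, signing_domain) pairs.
    """
    spf_ok = spf_result == "pass" and aligned(spf_domain, from_domain)
    dkim_ok = any(res == "pass" and aligned(d, from_domain) for res, d in dkim_results)
    return spf_ok or dkim_ok
```

A relayed message whose SPF check fails at the relay, and which has no aligned DKIM signature, returns `False` here, which is the failure mode the allow-listing step is meant to avoid.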

Mailbox hosting environments and their anti-spam algorithms rely on SPF, DKIM, and DMARC for authenticating different inbound mail flow configurations before making an assessment about the message’s disposition. Properly authenticated messages, if not otherwise identified as unwanted by recipients and their security administrators, are delivered to inboxes. Messages that are not authenticated are more likely to be treated as spam. Messages from intermediary servers can sometimes be mistaken for spoofed or unwanted messages.

By allow-listing the egress IP addresses of the Mail Manager servers, Nutrition.co’s Google Workspace and Microsoft 365 hosting environments will be able to assess the correct SPF result when receiving inbound email from Mail Manager.
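The effect of allow-listing can be sketched as follows: the receiver skips connecting IPs that belong to trusted relays and evaluates SPF against the first untrusted IP, which is roughly what an inbound-gateway allow-list lets a mailbox provider do. The two IP addresses come from the Gmail header example in Step 4-a; the network ranges are illustrative placeholders, not the real SPF policy of gmail.com or the published Mail Manager egress ranges.

```python
import ipaddress

# Illustrative values only: a range standing in for the sender domain's SPF
# policy, and a hypothetical range standing in for Mail Manager egress IPs.
SENDER_SPF_NETWORKS = ["209.85.128.0/17"]
ALLOW_LISTED_RELAYS = ["206.55.129.0/24"]


def ip_in(ip, networks):
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(n) for n in networks)


def spf_result(hops, spf_networks, allow_listed):
    """Evaluate SPF against the first connecting IP that is not allow-listed.

    hops lists connecting IPs starting with the one that connected to the
    receiver, then the one that connected to that server, and so on.
    """
    for ip in hops:
        if ip_in(ip, allow_listed):
            continue  # trusted relay: look one hop further back
        return "pass" if ip_in(ip, spf_networks) else "fail"
    return "none"  # every hop was a trusted relay
```

Without the allow-list, the relay's address (206.55.129.47) is the one checked and SPF fails; with it, the original sender's address (209.85.216.51) is checked and SPF passes. This is also why the note below warns against putting shared relay IPs into your own SPF record instead.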

Note: Do not include Mail Manager’s IP addresses in the domain’s SPF policy. These IP addresses are shared by other Mail Manager customers, so including them in the domain’s SPF policy can introduce a security risk.

Note: It is also possible to use DKIM and ARC for allow-listing mail streams, but Gmail and Exchange Online both support IP allow-listing.

Note: Nutrition.co’s Google Workspace and Microsoft 365 hosting environments may still make a spam assessment about the messages under the context that Mail Manager is not the original sender, but this is not common.

Figure 2: Diagram of the SES Mail Manager architecture to accept inbound email via an open Ingress endpoint and configured with a Rule set condition to relay messages with the SMTP Relay action.

In the diagram above, the interaction points are as follows:

1. Email senders look in DNS to discover the MX record for example.com.
2. The value of the domain’s MX record is the A record of the Mail Manager Ingress endpoint. The Ingress endpoint is configured as an ‘open’ Ingress endpoint so that it can receive inbound email without requiring SMTP Auth
3. The Ingress endpoint traffic policy is configured to allow and deny traffic
4. The Rule Set conditions determine which messages are to be relayed
5. The SMTP Relay action relays messages for recipients that are SES verified identities
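The five interaction points can be modeled as a small pipeline. This is a conceptual Python sketch, not the Mail Manager API: the traffic policy is reduced to a default action plus deny-listed sender CIDRs, rules match on a recipient suffix, and every matching rule contributes its actions (our simplifying assumption about rule evaluation). The rule and action names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class TrafficPolicy:
    """Step 3: a default action plus optional deny statements (sender CIDRs)."""
    default_action: str = "ALLOW"
    deny_networks: list = field(default_factory=list)

    def evaluate(self, sender_ip):
        addr = ipaddress.ip_address(sender_ip)
        if any(addr in ipaddress.ip_network(n) for n in self.deny_networks):
            return "DENY"
        return self.default_action


# Step 4: rules of (recipient suffix, actions); names are hypothetical.
RULES = [
    ("@nutrition.co", ["archive", "relay:google-workspace", "relay:microsoft-365"]),
]


def actions_for(recipient, rules, default=("archive",)):
    """Collect actions from every matching rule; fall back to a default action."""
    rcpt = recipient.lower()
    matched = [a for suffix, actions in rules if rcpt.endswith(suffix) for a in actions]
    return matched or list(default)


def process(sender_ip, recipients, policy, rules):
    """Return {recipient: actions}, or None if the traffic policy denies the sender."""
    if policy.evaluate(sender_ip) == "DENY":
        return None
    return {rcpt: actions_for(rcpt, rules) for rcpt in recipients}
```

For example, with `TrafficPolicy(deny_networks=["192.0.2.0/24"])`, a connection from 192.0.2.10 is rejected outright, while a message from 198.51.100.7 to an @nutrition.co address is archived and then relayed to both hypothetical mailbox-host relays.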

Configuring Mail Manager as an Inbound SMTP Gateway

Prerequisites

Access to the administrative console for Nutrition.co’s Google Workspace and Microsoft 365 hosted mailboxes
Access to the DNS zone hosting the MX records for Nutrition.co’s domains

Step 1: Allow-list the regional Mail Manager IP addresses in Nutrition.co’s Google Workspace and Microsoft 365 environments, and create the Mail Manager relay action(s) in the SES console.

If you do not configure the allow-list in Nutrition.co’s Google Workspace and Microsoft 365 environments, those mailbox providers may reject the emails relayed from your Mail Manager environment as spam or send them to junk.

Step 1-a: Follow the instructions to allow-list Mail Manager to relay email to Nutrition.co’s Google Workspace and Microsoft 365 environments.

Step 1-b: Create an SMTP relay for your mailbox hosting environment

* See Creating an SMTP relay in the SES console

Figure 3: Screenshot of an SMTP Relay rule action configured for Microsoft 365 Exchange Online inbound receiving

Figure 4: Screenshot of an SMTP Relay rule action configured for Google Workspace Gmail inbound receiving

Because Nutrition.co hosts email in both Google Workspace and Microsoft 365, we must create SMTP Relay actions for both.

Step 2: In SES console, verify Nutrition.co’s email domain, which is nutrition.co

SES needs to prove that Nutrition.co owns the domain of each of the recipient addresses before it will begin relaying inbound email. If you cannot verify ownership of the recipient email destinations, SES will not relay messages.

Follow the instructions to verify Nutrition.co’s SES domain identity for the recipient email addresses within Nutrition.co’s Google Workspace and Microsoft 365 environments. (*note that subdomains such as customer-support.nutrition.co inherit verification from the parent domain*).
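If you want to pre-check which recipient addresses a verified domain identity would cover, the subdomain rule stated above can be modeled as a check on whole DNS labels. The helper name and example addresses are hypothetical; SES performs its own verification checks, and this sketch only mirrors the rule as described here.

```python
def is_covered(recipient, verified_domains):
    """True if the recipient's domain is a verified domain or a subdomain of one."""
    domain = recipient.rsplit("@", 1)[-1].lower().rstrip(".")
    for parent in verified_domains:
        parent = parent.lower().rstrip(".")
        # Compare whole labels so "evilnutrition.co" does not match "nutrition.co".
        if domain == parent or domain.endswith("." + parent):
            return True
    return False
```

Matching on `"." + parent` rather than a plain string suffix is the important design choice: a lookalike domain that merely ends with the same characters is not a subdomain.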

See Creating a domain identity

Figure 5: Screenshot of a successfully verified domain in the SES console.

Step 3: Configure Mail Manager with an Open Ingress Point and Rule Set Action to relay inbound email to the mailbox hosting environment.

Step 3-a: See Create a Traffic Policy to accept inbound email from the internet.

Default action: Allow
(Optional) Add Policy statements, depending on your requirements. Choose the action to be taken when the filter conditions are met: Deny
- While Nutrition.co does not want to apply additional security via the SMTP Relay gateway, Mail Manager supports both native capabilities and optional add-on subscriptions to 3rd party tools from vetted industry leaders such as Spamhaus and Abusix.

Figure 6: Screenshot of a traffic policy for accepting all email from the internet

Step 3-b: Follow the instructions for creating rule sets and rules in the SES console.

Select the SMTP Relay that you created in Step 1-b and enable the **Preserve Mail From** option.
- The ‘Preserve Mail From’ setting is necessary so that the mailbox provider can be configured to make the correct assessment of the message’s SPF policy evaluation, assuming that the allow-list configuration in Step 1 is complete.
Add any conditions and exceptions for each rule, depending on your needs.
- You may want to create a condition for the SMTP Relay rule so that only messages destined for recipients within your domain are relayed to the appropriate SMTP Relay action, and choose a different action for the recipients who are not hosted in your environment, such as the Archive action.
- If you have both Google Workspace and Microsoft 365 configured as SMTP Relay destinations, you may combine the SMTP Relay actions in a single rule if the conditions are the same, or create them as separate rules if the conditions need to be different
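The console configuration above can also be expressed as an API request. The sketch below only builds a request body and sends nothing; the field names (`Rules`, `Conditions`, `StringExpression`, `Relay`, `MailFrom`, `Archive`) reflect our reading of the Mail Manager CreateRuleSet API and should be checked against the current boto3 `mailmanager` reference before use. The rule set name, relay IDs, and archive ID are hypothetical placeholders.

```python
import json


def build_rule_set_request(rule_set_name, recipient_suffix, relay_ids, archive_id=None):
    """Build a CreateRuleSet-style request: archive first, then relay to each host."""
    actions = []
    if archive_id:
        actions.append({"Archive": {"TargetArchive": archive_id}})
    for relay_id in relay_ids:
        # MailFrom=PRESERVE corresponds to the console's "Preserve Mail From" option.
        actions.append({"Relay": {"Relay": relay_id, "MailFrom": "PRESERVE"}})
    return {
        "RuleSetName": rule_set_name,
        "Rules": [{
            "Name": "relay-to-mailbox-hosts",
            # Only relay mail for recipients in our own domain.
            "Conditions": [{
                "StringExpression": {
                    "Evaluate": {"Attribute": "RECIPIENT"},
                    "Operator": "ENDS_WITH",
                    "Values": [recipient_suffix],
                }
            }],
            "Actions": actions,
        }],
    }


if __name__ == "__main__":
    req = build_rule_set_request(
        "nutrition-co-inbound", "@nutrition.co",
        ["relay-id-google-example", "relay-id-m365-example"],
        archive_id="archive-id-example",
    )
    print(json.dumps(req, indent=2))
    # With credentials configured, this body could be passed to the
    # boto3 "mailmanager" client's create_rule_set(**req) call.
```

Putting both relay actions in one rule matches the "same conditions" case described above; if the two hosting environments need different conditions, build two entries in `Rules` instead.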

Figure 7: A Mail Manager rule configured with an SMTP Relay action for Google Workspace and another SMTP Relay action for Microsoft 365

Step 3-c: Follow the documentation for Creating an Ingress Point.

The Mail Manager Ingress point needs to be ‘Open’ for this use case because, for inbound mail flows, internet mail senders need to connect to port 25 and send without SMTP authentication.

Type: Open
Traffic policy: Choose the traffic policy that you created in Step 3-a
Rule set: Choose the rule set that you created in Step 3-b

After saving the ingress endpoint settings, you should see something similar in the console.

Figure 8: Screenshot of an ‘open’ Mail Manager Ingress endpoint configured with a rule set and traffic policy

Step 4: Verify your configuration and change your domain’s MX record

Once you have finished configuring Mail Manager with an Inbound Gateway configuration you will have:

An Open ingress point that does not require authentication and has an open traffic policy to allow messages from the internet.
A Rule set with SMTP Relay actions that will relay inbound messages to Google Workspace and/or Microsoft 365.

Step 4-a: Test your configuration

Ingress point: You can test that the Ingress endpoint receives email by using an SMTP capable client application, such as “openssl s_client” from a host that allows for outbound port 25 connections to the A Record of your Open Ingress Point (many ISPs and cloud infrastructure providers block port 25 by default to stop the proliferation of spam on the internet). If you get a “250 OK” response from the SMTP transaction, the Ingress point is configured correctly.
Rule set: You can test your Rule set by sending a message to your Ingress endpoint with a recipient whose domain is both a verified domain and a domain hosted by your mailbox environment. You may want to add the Archive and/or Write to S3 rule actions to occur prior to SMTP Relay. This enables you to view message headers and diagnose issues that may occur during the SMTP relay to the mailbox hosting environments.
Final delivery: You can test the entire mail flow by looking at the received messages in your mailbox hosting environment.
- How to look at received messages in a mailbox hosting environment:
  - Google Workspace – From within the Gmail interface, find the message and open the message menu options.
    - Choose “Show original”.
    - (The screenshot above shows the Gmail “Show original” message headers. The Mail From address, which also appears as the Return-Path header and as the envelope-from value in other headers, is preserved within the @gmail.com domain, and Gmail’s SPF assessment correctly attributes the message to 209.85.216.51 even though the message was relayed through 206.55.129.47. Because the 209.x.x.x address is in the SPF policy for gmail.com, the message passes SPF thanks to the allow-list configuration.)
  - Microsoft 365 – From within the Outlook on the Web interface, find the message and open the message menu options.
    - Choose “View message details”. You will see message headers similar to the Gmail example above.
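The checks in Step 4-a can also be scripted. As an alternative to `openssl s_client`, the sketch below uses Python’s `smtplib` to connect to the ingress point on port 25, upgrade with STARTTLS, and send a test message; the ingress hostname and addresses you pass in are your own, and those in the example are hypothetical. A second helper pulls the spf, dkim, and dmarc verdicts out of `Authentication-Results` headers copied from “Show original” or “View message details”; its parsing is deliberately simplistic and assumes the common `method=result` layout.

```python
import smtplib
from email import message_from_string
from email.message import EmailMessage


def build_test_message(sender, recipient):
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Mail Manager ingress point test"
    msg.set_content("Test message sent to the Open Ingress Point.")
    return msg


def send_test_message(ingress_host, msg, port=25):
    """Send msg to the ingress point over STARTTLS.

    smtplib raises an exception if the server rejects the transaction, so a
    normal return means the message was accepted (250 after DATA). The return
    value maps any individually refused recipients to their error codes.
    """
    with smtplib.SMTP(ingress_host, port, timeout=30) as smtp:
        smtp.starttls()  # sends EHLO first; raises if STARTTLS is not offered
        return smtp.send_message(msg)


def auth_summary(raw_message):
    """Return the first spf/dkim/dmarc verdicts found in Authentication-Results."""
    msg = message_from_string(raw_message)
    verdicts = {}
    for header in msg.get_all("Authentication-Results", []):
        for clause in str(header).split(";"):
            clause = clause.strip()
            for method in ("spf", "dkim", "dmarc"):
                if clause.startswith(method + "="):
                    words = clause.split("=", 1)[1].split()
                    if words:
                        verdicts.setdefault(method, words[0])
    return verdicts
```

A typical run would be `send_test_message("<your ingress point A record>", build_test_message("tester@example.com", "jane@nutrition.co"))` from a host that permits outbound port 25, followed by pasting the delivered message’s raw headers into `auth_summary`.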

Step 4-b: Change the MX record for your domain.

Note: We recommend using a new subdomain so that you can test this mail flow configuration for a period of time prior to changing the MX record for the primary domain that is actively being used by end users and applications.

Once you have finished testing, you can change the MX record for the domain. The value of the MX record should be the **A Record** of the Open Ingress point along with the priority value.
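If the zone is hosted in Amazon Route 53, the MX change can be scripted. This sketch builds the `ChangeBatch` argument for the boto3 `route53` client’s `change_resource_record_sets` call; the record name follows the test-subdomain recommendation above, and the ingress hostname and hosted zone ID are hypothetical placeholders.

```python
def mx_change_batch(record_name, ingress_a_record, priority=10, ttl=300):
    """Build a Route 53 ChangeBatch that points a name's MX record at the ingress point."""
    return {
        "Comment": "Route inbound mail through the Mail Manager Open Ingress Point",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record set, or replace it if it exists
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "MX",
                    "TTL": ttl,
                    # MX value format: "<priority> <mail server hostname>"
                    "ResourceRecords": [{"Value": f"{priority} {ingress_a_record}"}],
                },
            }
        ],
    }


if __name__ == "__main__":
    batch = mx_change_batch("test.nutrition.co.", "ingress-a-record.example.invalid")
    print(batch["Changes"][0]["ResourceRecordSet"]["ResourceRecords"])
    # With credentials and the hosted zone ID, the change could be applied with:
    #   boto3.client("route53").change_resource_record_sets(
    #       HostedZoneId="Z_EXAMPLE", ChangeBatch=batch)
```

Note that `UPSERT` replaces every existing MX value for that name, so the previous mail servers stop receiving mail directly once the change propagates; a short TTL during testing makes rolling back faster.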

Figure 13: A screenshot of an MX record configured in Amazon Route 53

Conclusion

In this blog post, we’ve explored how to use SES Mail Manager’s SMTP Relay action to simplify the handling of inbound email for organizations that use a mix of email hosting environments, specifically Google Workspace and Microsoft 365. By configuring Mail Manager as an inbound SMTP gateway, our fictitious customer, Nutrition.co, was able to centralize the management of their email flows, enhance security through features like traffic policies and rule sets, and ensure compliance through flexible archiving.

The key steps involved setting up allow-listing in the Google Workspace and Microsoft 365 environments, creating SMTP relay configurations in Mail Manager, and updating Nutrition.co’s MX record to point to the Mail Manager ingress endpoint. This allowed Nutrition.co to seamlessly route inbound emails destined for both their cloud-hosted employee mailboxes and their AWS-hosted applications, processing and archiving the messages before final delivery.

The flexibility of Mail Manager’s SMTP Relay action makes it a powerful tool for organizations looking to unify their email infrastructure, especially in hybrid environments. By acting as a centralized ingress and egress gateway, Mail Manager can help streamline email management, improve security, and unlock new cloud-enabled email use cases. As email continues to be a critical communication channel, solutions like Mail Manager will become increasingly important for businesses looking to maximize the value of their email ecosystem.

Please visit AWS re:Post to ask and find answers to questions about SES Mail Manager. Talk with your AWS account team if you are interested in exploring Mail Manager in more depth.


About the Authors

Jesse Thompson is an Email Deliverability Manager with the Amazon Simple Email Service team. His background is in enterprise development and operations, with a focus on email abuse mitigation and encouragement of authenticity practices with open standard protocols. Jesse’s favorite activity outside of technology is recreational curling.

Alexey Kurbatsky

Alexey is a Senior Software Development Engineer at AWS, specializing in building distributed and scalable services. Outside of work, he enjoys exploring nature through hiking as well as playing guitar.

Zip

Zip is a Sr. Specialist Solutions Architect at AWS, working with Amazon Pinpoint and Simple Email Service and WorkMail. Outside of work he enjoys time with his family, cooking, mountain biking, boating, learning and beach plogging.

Email Archiving with Mail Manager: Why To Archive In Transit vs At The Mailbox

2024-06-25 Zip Zieper

Post Syndicated from Zip Zieper original https://aws.amazon.com/blogs/messaging-and-targeting/email-archiving-with-mail-manager-why-to-archive-in-transit-vs-at-the-mailbox/

When designing Amazon Simple Email Service’s (SES) Mail Manager, we often heard from customers about the “PST-file problem” inherent in user-side, mailbox-based archiving. This occurs when, for a variety of reasons, end users decide to archive their emails to local PST files or other local storage. These PST files are fragile and easily corrupted. Furthermore, they are subject to the backup practices of individual workstations. Lastly, PST files are readily portable and can be easily copied and moved outside the visibility of the email system and your IT and IP controls.

We developed Amazon Simple Email Service (SES) Mail Manager archiving features in response to this problem, and based on additional customer feedback: the need for consistent email retention behaviors, for all email. Customers also wanted the flexibility to determine which messages to archive, where to put them, and how long to retain those messages.

To make the feature applicable to the widest set of use cases, we designed Mail Manager to be able to archive any email traversing the SES service, not just those that have already been delivered to a user’s mailbox. This added flexibility ensures organizations can maintain a complete record of exactly those email communications they wish to preserve. Rather than require external tools to search and export Mail Manager’s archives, we built these functions directly into the SES console.

In fact, the entire Mail Manager archiving solution is fully managed by SES within the customer’s Mail Manager account, reducing the operational overhead traditionally associated with email archiving and compliance.

Figure 1 – Mail Manager Archiving

At the core of the SES Mail Manager archiving solution is the ability to capture and retain any message, regardless of its source or destination, as it flows through the service’s rules engine. This design approach ensures that every email message traversing Mail Manager can be subject to archiving and retention policies, rather than requiring organizations to manage different systems and tools for mail flowing through mail servers, internal relays and other email infrastructure. The result is a unified, comprehensive compliance solution that provides visibility and control over an organization’s email archiving.

SES also published a detailed overview of the Archiving feature, which is available here: Archiving and sending to final SMTP server.

Archiving on its own isn’t an innovation; it’s an email primitive – an essential capability that can be used to enable other, more complex solutions. Historically, retention of email was configured as a function of your on-premises mail server, where your mailboxes themselves were resident. Personally-authored emails were considered the high-value material to retain, and adding archiving as a function of mailbox configurations was the simplest approach.

In practice, we find that the mail captured at the mailbox server, or end user’s inbox, represents only a fraction of the mail a typical enterprise generates. As organizations grow, the number of applications generating Application To Person (A2P) messages tends to increase dramatically. Similarly, as corporate environments become more complex, SaaS-based solutions that are external to the primary email infrastructure, along with workflow-management systems, often use email to update employees. Much of that mail eludes archiving because it bypasses individual user mailboxes.

The SES strategy for archiving is to capture mail from anywhere, to anywhere, as long as it transits an ingress endpoint as part of your Mail Manager configuration. You have two choices: you can write those messages directly to an S3 bucket you control and then ingest them into any other tool you like, or you can send messages into a managed archive within Mail Manager and gain access to search, export, and configurable retention features. By default, SES configures retention for 6 months, but it’s adjustable up to permanent retention for customers who require it.

Mail Manager’s archiving feature captures any message which matches your rule, or all messages traversing any ingress endpoint. You can choose to write all messages to or from your senior leadership team into one archive, or you can organize by other envelope metadata. The rules operate the same way whether the message is A2P or Person to Person (P2P), ensuring uniform policies and retention options.

With Mail Manager’s managed archives, you pay for each gigabyte ingested, indexed, and available for search, and a separate storage fee for each gigabyte retained every month. Note that the storage fee includes both the raw content of the messages, and the size of the computed index required for search and export functions.
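Because ingestion is billed once but storage is billed every month on everything still retained (raw content plus index), storage cost grows until the retention window fills and then levels off. The sketch below models that arithmetic for a steady monthly volume; the prices and the index-overhead fraction are hypothetical inputs, not actual SES pricing, and retention is simplified to whole months.

```python
def archive_cost_schedule(monthly_ingest_gb, months, retention_months,
                          ingest_price_per_gb, storage_price_per_gb_month,
                          index_overhead=0.0):
    """Estimate monthly managed-archive cost for a steady ingest rate.

    Each month's ingest is billed once; storage is billed on everything still
    inside the retention window, inflated by index_overhead (0.2 would mean
    the search index adds 20% on top of raw message size).
    """
    costs = []
    for month in range(1, months + 1):
        retained_gb = monthly_ingest_gb * min(month, retention_months) * (1 + index_overhead)
        costs.append(monthly_ingest_gb * ingest_price_per_gb
                     + retained_gb * storage_price_per_gb_month)
    return costs
```

For example, with 10 GB ingested per month, a 6-month retention window, and hypothetical prices of 1.0 per GB ingested and 0.5 per GB-month stored, the monthly cost climbs from 15.0 to 40.0 over the first six months and then stays at 40.0.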

For messages you write to your S3 buckets, you also have the option to invoke an S3 trigger that calls an AWS Lambda function to drive various automation workflows. Regulated industries might want to write all messages to S3 to take advantage of the Amazon S3 Glacier storage classes for very long-term storage.
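A Lambda function triggered by those S3 writes might start like the sketch below. It reads bucket and key names from the standard S3 event notification payload (object keys arrive URL-encoded, hence `unquote_plus`), fetches each message, and extracts a few headers. What the function does with the summaries is left open, and the bucket and key names in any example are hypothetical.

```python
import email
from email import policy
from urllib.parse import unquote_plus


def object_refs(event):
    """Yield (bucket, key) pairs from an S3 event notification payload."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def summarize(raw_bytes):
    """Parse a raw RFC 5322 message and return a few headers as plain strings."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    return {h.lower(): str(msg.get(h, "")) for h in ("From", "To", "Subject")}


def lambda_handler(event, context):
    import boto3  # included in the AWS Lambda Python runtime

    s3 = boto3.client("s3")
    summaries = []
    for bucket, key in object_refs(event):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        summaries.append(summarize(body))
    return summaries
```

Keeping the event parsing and message parsing in separate functions means they can be tested locally without AWS credentials; only `lambda_handler` touches S3.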

You can even split your workload between Mail Manager’s managed archive, for emails you are likely to need readily discoverable, and the Write to S3 option, for content which you don’t expect to ever need to search with granularity, but still needs to be archived to “check the box” for your retention policy. In fact, AWS encourages such a builder-oriented approach, because it rewards thoughtful decisions and resource utilization, and conforms to the broad goal of consumption-based pricing, which Mail Manager embraces fully at every step.
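The split described above, with a searchable managed archive for high-value mail and Write to S3 for everything else, comes down to a routing decision on envelope metadata. The sketch below models that decision only; the leadership addresses and destination labels are hypothetical, and in Mail Manager the same choice would be expressed as rule conditions and actions.

```python
# Hypothetical leadership addresses and destination labels.
LEADERSHIP = {"ceo@nutrition.co", "cfo@nutrition.co"}


def archive_destination(mail_from, recipients, leadership=LEADERSHIP):
    """Pick an archive destination from envelope metadata.

    Mail to or from the leadership team goes to a searchable managed archive;
    everything else is written to S3 for low-cost retention.
    """
    envelope = {mail_from.lower(), *(r.lower() for r in recipients)}
    if envelope & leadership:
        return "managed-archive:leadership"
    return "write-to-s3:bulk-retention"
```

Because the decision uses only envelope addresses, it treats A2P and P2P mail identically, which is the uniformity the rules engine is described as providing.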

Figure 2 - Rule Set with conditions for archiving

Mail Manager provides a more comprehensive, resilient archiving approach that increases both the overall scope of mail that can be captured, and the fidelity of the archived data. You don’t need any special adapters or plugins to capture mail from any source. All email that comes through your Mail Manager Ingress Endpoint can be archived.

Figure 3 – Create archive

Why not try Mail Manager today and experience the benefits of a centralized, scalable email archiving solution? You’ll pay only for the data you ingest and retain each month, without the fragility and visibility issues of user-managed archives. Visit the SES website to start your free trial of Mail Manager and take control of your organization’s critical email records. To start with Mail Manager, visit https://aws.amazon.com/ses/, click on Mail Manager, and set up your first workload today.

If you have any questions or need further guidance, feel free to reach out to us via the SES Forums or in the comments section of this blog post. We’re here to help you navigate the evolving email landscape and unlock the full potential of your Amazon SES investment.

About the Authors

Toby Weir-Jones

Toby is a Principal Product Manager for Amazon SES and WorkMail. He joined AWS in January 2021 and has significant experience in both business and consumer information security products and services. His focus on email solutions at SES is all about tackling a product that everyone uses and finding ways to bring innovation and improved performance to one of the most ubiquitous IT tools.

Brett Ezell

Brett is an Amazon Pinpoint and Amazon Simple Email Service Specialist Solutions Architect at AWS. As a Navy veteran, he joined AWS in 2020 through an AWS technical military apprenticeship program. When he isn’t deep diving into solutions for customer challenges, Brett spends his time collecting vinyl, attending live music, and training at the gym. An admitted comic book nerd, he feeds his addiction every Wednesday by combing through his local shop for new books.

Zip

Configure a custom domain name for your Amazon MSK cluster

2024-06-24 Subham Rakshit

Post Syndicated from Subham Rakshit original https://aws.amazon.com/blogs/big-data/configure-a-custom-domain-name-for-your-amazon-msk-cluster/

Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that enables you to build and run applications that use Apache Kafka to process streaming data. It runs open-source versions of Apache Kafka. This means existing applications, tooling, and plugins from partners and the Apache Kafka community are supported without requiring changes to application code.

Customers use Amazon MSK for real-time data sharing with their end customers, who could be internal teams or third parties. These end customers manage Kafka clients, which are deployed in AWS, other managed cloud providers, or on premises. When migrating from self-managed Kafka to Amazon MSK or moving clients between MSK clusters, customers want to avoid reconfiguring Kafka clients to use a different Domain Name System (DNS) name. Therefore, it’s important to have a custom domain name for the MSK cluster that the clients can communicate with. A custom domain name also makes the disaster recovery (DR) process less complicated, because clients don’t need to change the MSK bootstrap address when a new cluster is created or when client connections need to be redirected to a DR AWS Region.

MSK clusters use AWS-generated DNS names that are unique for each cluster, containing the broker ID, MSK cluster name, two service-generated subdomains, and the AWS Region, ending with amazonaws.com. The following figure illustrates this naming format.

MSK brokers use the same DNS name for the certificates used for Transport Layer Security (TLS) connections. The DNS name used by clients with TLS encrypted authentication mechanisms must match the primary Common Name (CN), or Subject Alternative Name (SAN) of the certificate presented by the MSK broker, to avoid hostname validation errors.
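The hostname-validation rule can be illustrated with a small Python helper. Python’s `ssl` module already performs this check when `check_hostname` is enabled, so the helper below is only a simplified model (exact names plus a single leading wildcard label), and the certificate name in the example is a hypothetical placeholder rather than the actual contents of an MSK broker certificate.

```python
def hostname_matches(hostname, cert_names):
    """Return True if hostname matches one of the certificate's DNS names.

    Simplified rules: exact (case-insensitive) match, or a leading '*.'
    wildcard that stands for exactly one left-most label.
    """
    host = hostname.lower().rstrip(".")
    for name in cert_names:
        name = name.lower().rstrip(".")
        if name.startswith("*."):
            first, dot, rest = host.partition(".")
            if first and dot and rest == name[2:]:
                return True
        elif host == name:
            return True
    return False
```

A custom name such as `b-1.kafka.example.com` fails against a certificate issued for the AWS-generated names, which is exactly the hostname validation error the solution below works around by terminating client TLS at an NLB that presents a certificate for the custom domain.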

The solution discussed in this post provides a way for you to use a custom domain name for clients to connect to their MSK clusters when using SASL/SCRAM (Simple Authentication and Security Layer/Salted Challenge Response Authentication Mechanism) authentication only.

Solution overview

Network Load Balancers (NLBs) are a popular addition to the Amazon MSK architecture, along with AWS PrivateLink as a way to expose connectivity to an MSK cluster from other virtual private clouds (VPCs). For more details, see How Goldman Sachs builds cross-account connectivity to their Amazon MSK clusters with AWS PrivateLink. In this post, we run through how to use an NLB to enable the use of a custom domain name with Amazon MSK when using SASL/SCRAM authentication.

The following diagram shows all components used by the solution.

SASL/SCRAM uses TLS to encrypt the Kafka protocol traffic between the client and Kafka broker. To use a custom domain name, the client needs to be presented with a server certificate matching that custom domain name. As of this writing, it isn’t possible to modify the certificate used by the MSK brokers, so this solution uses an NLB to sit between the client and MSK brokers.

An NLB works at the connection layer (Layer 4) and routes TCP or UDP traffic. It doesn’t validate the application data being sent; it simply forwards the Kafka protocol traffic. The NLB can use a TLS listener: a certificate imported into AWS Certificate Manager (ACM) is associated with the listener, enabling TLS negotiation between the client and the NLB. The NLB performs a separate TLS negotiation between itself and the MSK brokers. This NLB-to-target TLS negotiation works exactly the same whether certificates are signed by a public or a private Certificate Authority (CA).

For the client to resolve DNS queries for the custom domain, an Amazon Route 53 private hosted zone is used to host the DNS records, and is associated with the client’s VPC to enable DNS resolution from the Route 53 VPC resolver.

Kafka listeners and advertised listeners

Kafka listeners (listeners) are the lists of addresses that Kafka binds to for listening. A Kafka listener is composed of a hostname or IP, port, and protocol: <protocol>://<hostname>:<port>.

The Kafka client uses the bootstrap address to connect to one of the brokers in the cluster and issue a metadata request. The broker returns a metadata response containing the address information for each broker that the client needs in order to talk to those brokers. Advertised listeners (advertised.listeners) is a broker configuration option that tells Kafka clients which addresses to use to connect to the brokers. By default, an advertised listener is not set. After it’s set, Kafka clients will use the advertised listener instead of listeners to obtain the connection information for brokers.

When Amazon MSK multi-VPC private connectivity is enabled, AWS sets the advertised.listeners configuration option to include the Amazon MSK multi-VPC DNS alias.

MSK brokers use the listener configuration to tell clients the DNS names to use to connect to the individual brokers for each authentication type enabled. Therefore, when clients are directed to use the custom domain name, you need to set a custom advertised listener for SASL/SCRAM authentication protocol. Advertised listeners are unique to each broker; the cluster won’t start if multiple brokers have the same advertised listener address.
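The listener format and the per-broker uniqueness rule can be sketched in Python. The `b-<id>.<domain>` naming scheme, the listener name, and the port in the example are hypothetical; on MSK you would take the listener names from the cluster’s existing configuration rather than from this sketch.

```python
def parse_listener(listener):
    """Split '<protocol>://<hostname>:<port>' into (protocol, hostname, port)."""
    protocol, sep, rest = listener.partition("://")
    host, colon, port = rest.rpartition(":")
    if not sep or not colon or not host:
        raise ValueError(f"not a listener address: {listener!r}")
    return protocol, host, int(port)


def custom_advertised_listeners(broker_ids, domain, listener_name, port):
    """Map each broker ID to a per-broker hostname under the custom domain.

    The 'b-<id>.<domain>' naming scheme is a hypothetical convention.
    """
    return {b: f"{listener_name}://b-{b}.{domain}:{port}" for b in broker_ids}


def check_unique(advertised):
    """Reject configs where two brokers advertise the same host:port."""
    seen = {}
    for broker, listener in advertised.items():
        _, host, port = parse_listener(listener)
        key = (host.lower(), port)
        if key in seen:
            raise ValueError(f"brokers {seen[key]} and {broker} both advertise {host}:{port}")
        seen[key] = broker
```

Running `check_unique` before applying a configuration catches the mistake of pointing every broker at a single shared hostname, which the text notes would stop the cluster from starting.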

Kafka bootstrap process and setup options

A Kafka client uses the bootstrap addresses to get the metadata from the MSK cluster, which in response provides the broker hostname and port (the listeners information by default or the advertised listener if it’s configured) that the client needs to connect to for subsequent requests. Using this information, the client connects to the appropriate broker for the topic or partition that it needs to send to or fetch from. The following diagram shows the default bootstrap and topic or partition connectivity between a Kafka client and MSK broker.

You have two options when using a custom domain name with Amazon MSK.

Option 1: Only a bootstrap connection through an NLB

You can use a custom domain name only for the bootstrap connection, where the advertised listeners are not set, so the client is directed to the default AWS cluster DNS name. This option is beneficial when the Kafka client has direct network connectivity to both the NLB and the MSK broker’s Elastic Network Interface (ENI). The following diagram illustrates this setup.

No changes are required to the MSK brokers, and the Kafka client has the custom domain set as the bootstrap address. The Kafka client uses the custom domain bootstrap address to send a metadata request to the NLB. The NLB forwards the Kafka protocol traffic received from the Kafka client to a healthy MSK broker's ENI. That broker responds with metadata in which only listeners is set, containing the default MSK cluster DNS name for each broker. The Kafka client then uses the default MSK cluster DNS name for the appropriate broker and connects directly to that broker's ENI.

Option 2: All connections through an NLB

Alternatively, you can use a custom domain name for the bootstrap and the brokers, where the custom domain name for each broker is set in the advertised listeners configuration. You need to use this option when Kafka clients don't have direct network connectivity to the MSK brokers' ENIs, for example, when Kafka clients need to use an NLB, AWS PrivateLink, or Amazon MSK multi-VPC endpoints to connect to an MSK cluster. The following diagram illustrates this setup.

The advertised listeners are set to use the custom domain name, and the Kafka client has the custom domain set as the bootstrap address. The Kafka client uses the custom domain bootstrap address to send a metadata request to the NLB. The NLB forwards the Kafka protocol traffic received from the Kafka client to a healthy MSK broker's ENI. That broker responds with metadata in which advertised listeners is set. The Kafka client then uses the custom domain name for the appropriate broker, which directs the connection to the NLB on the port assigned to that broker. The NLB forwards the Kafka protocol traffic to that broker.

Network Load Balancer

The following diagram illustrates the NLB port and target configuration. A TLS listener on port 9000 is used for bootstrap connections, with all MSK brokers set as targets; its target group uses the TLS protocol with target port 9096. A separate TLS listener port represents each broker in the MSK cluster. In this post, the MSK cluster has three brokers, so TLS 9001 represents broker 1, TLS 9002 represents broker 2, and TLS 9003 represents broker 3.

For all TLS listeners on the NLB, a single imported certificate with the domain name bootstrap.example.com is attached. bootstrap.example.com is used as the Common Name (CN) so that the certificate is valid for the bootstrap address, and Subject Alternative Names (SANs) are set for all broker DNS names. If the certificate is issued by a private CA, clients need to import the root and intermediate CA certificates into their trust store. If the certificate is issued by a public CA, the issuing root CA certificate is typically already in the default trust store.

The following table shows the required NLB configuration.

NLB Listener Type	NLB Listener Port	Certificate	NLB Target Type	NLB Targets
TLS	9000	bootstrap.example.com	TLS	All Broker ENIs
TLS	9001	bootstrap.example.com	TLS	Broker 1
TLS	9002	bootstrap.example.com	TLS	Broker 2
TLS	9003	bootstrap.example.com	TLS	Broker 3
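The port layout in this table follows a simple pattern: port 9000 fronts every broker, port 9000 + N fronts broker N only, and every target group forwards to the SASL/SCRAM port 9096. The following Bash sketch, built around a hypothetical `nlb_layout` function, prints that layout for any broker count, which can help when planning listeners for a larger cluster. It only prints a plan; it doesn't create any resources.

```bash
#!/usr/bin/env bash
# Hypothetical planning helper: print the NLB listener layout used in this post
# for N brokers. Port 9000 fronts all brokers; port 9000+N fronts broker N only.
BASE_PORT=9000     # bootstrap listener port
TARGET_PORT=9096   # MSK SASL/SCRAM broker port used by every target group

nlb_layout() {
  local brokers=$1 n
  printf 'TLS\t%d\tall brokers\t%d\n' "$BASE_PORT" "$TARGET_PORT"
  for ((n = 1; n <= brokers; n++)); do
    printf 'TLS\t%d\tbroker %d\t%d\n' "$((BASE_PORT + n))" "$n" "$TARGET_PORT"
  done
}

nlb_layout 3
```

Each output line gives the listener protocol, listener port, targets, and target port.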

Domain Name System

For this post, a Route 53 private hosted zone is used to host the DNS records for the custom domain, in this case example.com. The private hosted zone is associated with the Amazon MSK VPC to enable DNS resolution for the client, which is launched in the same VPC. If your client is in a different VPC than the MSK cluster, you need to associate the private hosted zone with that client's VPC.

The Route 53 private hosted zone is not a required part of the solution. The most crucial part is that the client can perform DNS resolution against the custom domain and get the required responses. You can instead use your organization's existing DNS, a Route 53 public hosted zone, a Route 53 Resolver inbound endpoint to resolve Route 53 private hosted zones from outside of AWS, or an alternative DNS solution.

The following figure shows the DNS records used by the client to resolve to the NLB. We use bootstrap for the initial client connection, and use b-1, b-2, and b-3 to reference each broker’s name.

The following table lists the DNS records required for a three-broker MSK cluster when using a Route 53 private or public hosted zone.

Record	Record Type	Value
bootstrap	A	NLB Alias
b-1	A	NLB Alias
b-2	A	NLB Alias
b-3	A	NLB Alias

The following table lists the DNS records required for a three-broker MSK cluster when using other DNS solutions.

Record	Record Type	Value
bootstrap	CNAME	NLB DNS name (for example, name-id.elb.region.amazonaws.com)
b-1	CNAME	NLB DNS name
b-2	CNAME	NLB DNS name
b-3	CNAME	NLB DNS name

In the following sections, we go through the steps to configure a custom domain name for your MSK cluster and clients connecting with the custom domain.

Prerequisites

To deploy the solution, you need the following prerequisites:

An AWS account
Appropriate AWS Identity and Access Management (IAM) permissions to deploy AWS CloudFormation stack resources

Launch the CloudFormation template

Complete the following steps to deploy the CloudFormation template:

Choose Launch Stack.

Enter msk-custom-domain as the stack name.
For MSKClientUserName, enter the user name of the secret used for SASL/SCRAM authentication with Amazon MSK.
For MSKClientUserPassword, enter the password of the secret used for SASL/SCRAM authentication with Amazon MSK.

The CloudFormation template will deploy the following resources:

VPC, private subnets, and security groups
MSK cluster
Kafka client Amazon Elastic Compute Cloud (Amazon EC2) instance
An AWS Secrets Manager secret with the provided user name and password, associated with the MSK cluster

Set up the EC2 instance

Complete the following steps to configure your EC2 instance:

On the Amazon EC2 console, connect to the instance msk-custom-domain-KafkaClientInstance1 using Session Manager, a capability of AWS Systems Manager.
Switch to ec2-user:
```
sudo su - ec2-user 
cd
```

Run the following commands to configure the SASL/SCRAM client properties, create Kafka access control lists (ACLs), and create a topic named customer:

. ./cloudformation_outputs.sh 
aws configure set region $REGION 
export BS=$(aws kafka get-bootstrap-brokers --cluster-arn ${MSKClusterArn} | jq -r '.BootstrapBrokerStringSaslScram') 
export ZOOKEEPER=$(aws kafka describe-cluster --cluster-arn $MSKClusterArn | jq -r '.ClusterInfo.ZookeeperConnectString')
./configure_sasl_scram_properties_and_kafka_acl.sh

Create a certificate

For this post, we use a self-signed certificate. However, we recommend using either a public certificate or a certificate signed by your organization's public key infrastructure (PKI).

If you're using AWS Private Certificate Authority for your PKI, refer to Creating a private CA for instructions to create and install a private CA.

Use the OpenSSL command to create a self-signed certificate. Modify the following command, adding your country code, state, city, and company:

SSLCONFIG="[req]
prompt = no
distinguished_name = req_distinguished_name
x509_extensions = v3_ca

[req_distinguished_name]
C = <<Country_Code>>
ST = <<State>>
L = <<City>>
O = <<Company>>
OU = 
emailAddress = 
CN = bootstrap.example.com

[v3_ca]
basicConstraints = CA:FALSE
keyUsage = digitalSignature, keyEncipherment
subjectAltName = @alternate_names

[alternate_names]
DNS.1 = bootstrap.example.com
DNS.2 = b-1.example.com
DNS.3 = b-2.example.com
DNS.4 = b-3.example.com
"

openssl req -x509 -newkey rsa:2048 -days 365 -nodes \
    -config <(echo "$SSLCONFIG") \
    -keyout msk-custom-domain-pvt-key.pem \
    -out msk-custom-domain-certificate.pem
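The [alternate_names] section above lists one SAN per broker, so it has to grow with the cluster. As a sketch, assuming the same b-N.example.com naming used in this post, the following hypothetical `alt_names` function generates that section for any number of brokers; you could paste its output into SSLCONFIG in place of the hard-coded entries.

```bash
#!/usr/bin/env bash
# Hypothetical helper: generate the [alternate_names] section of the OpenSSL
# config with one SAN for bootstrap plus one per broker (b-1 ... b-N).
DOMAIN=example.com  # custom domain used in this post

alt_names() {
  local brokers=$1 n
  echo "[alternate_names]"
  echo "DNS.1 = bootstrap.${DOMAIN}"
  for ((n = 1; n <= brokers; n++)); do
    echo "DNS.$((n + 1)) = b-${n}.${DOMAIN}"
  done
}

alt_names 3
```

For three brokers, the output is identical to the [alternate_names] section in the configuration above.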

You can check the created certificate using the following command:

openssl x509 -text -noout -in msk-custom-domain-certificate.pem

Import the certificate to ACM

To use the self-signed certificate for the solution, you need to import the certificate to ACM:

export CertificateARN=$(aws acm import-certificate --certificate fileb://msk-custom-domain-certificate.pem --private-key fileb://msk-custom-domain-pvt-key.pem | jq -r '.CertificateArn')

echo $CertificateARN

After it’s imported, you can see the certificate in ACM.

Import the certificate to the Kafka client trust store

For the client to validate the server SSL certificate during the TLS handshake, you need to import the self-signed certificate to the client’s trust store.

Run the following command to use the JVM trust store to create your client trust store:

cp /usr/lib/jvm/jre-1.8.0-openjdk/lib/security/cacerts /home/ec2-user/kafka.client.truststore.jks 
chmod 700 kafka.client.truststore.jks

Import the self-signed certificate into the trust store by using the following command. When prompted for the keystore password, enter changeit (the default password of the JVM trust store).

/usr/lib/jvm/jre-1.8.0-openjdk/bin/keytool -import \
	-trustcacerts \
	-noprompt \
	-alias msk-cert \
	-file msk-custom-domain-certificate.pem \
	-keystore kafka.client.truststore.jks

Add the trust store location to the properties file used by the Kafka clients so that they can validate the server certificate:
```
echo 'ssl.truststore.location=/home/ec2-user/kafka.client.truststore.jks' >> /home/ec2-user/kafka/config/client_sasl.properties
```

Set up DNS resolution for clients within the VPC

To set up DNS resolution for clients, create a private hosted zone for the domain and associate the hosted zone with the VPC where the client is deployed:

aws route53 create-hosted-zone \
--name example.com \
--caller-reference "msk-custom-domain" \
--hosted-zone-config Comment="Private Hosted Zone for MSK",PrivateZone=true \
--vpc VPCRegion=$REGION,VPCId=$MSKVPCId

export HostedZoneId=$(aws route53 list-hosted-zones-by-vpc --vpc-id $MSKVPCId --vpc-region $REGION | jq -r '.HostedZoneSummaries[0].HostedZoneId')

Create EC2 target groups

Target groups route requests to individual registered targets, such as EC2 instances, using the protocol and port number that you specify. You can register a target with multiple target groups and you can register multiple targets to one target group.

For this post, you need four target groups: one for each broker instance and one that will point to all the brokers and will be used by clients for Amazon MSK connection bootstrapping.

Each target group forwards traffic to port 9096 (SASL/SCRAM authentication) and is associated with the Amazon MSK VPC:

aws elbv2 create-target-group \
    --name b-all-bootstrap \
    --protocol TLS \
    --port 9096 \
    --target-type ip \
    --vpc-id $MSKVPCId
    
aws elbv2 create-target-group \
    --name b-1 \
    --protocol TLS \
    --port 9096 \
    --target-type ip \
    --vpc-id $MSKVPCId
    
aws elbv2 create-target-group \
    --name b-2 \
    --protocol TLS \
    --port 9096 \
    --target-type ip \
    --vpc-id $MSKVPCId
    
aws elbv2 create-target-group \
    --name b-3 \
    --protocol TLS \
    --port 9096 \
    --target-type ip \
    --vpc-id $MSKVPCId

Register target groups with MSK broker IPs

You need to associate each target group with the broker instance (target) in the MSK cluster so that the traffic going through the target group can be routed to the individual broker instance.

Complete the following steps:

Get the MSK broker hostnames:

echo $BS

This should show the brokers that are part of the bootstrap address. The hostname of broker 1 looks like the following:

b-1.mskcustomdomaincluster.xxxxx.yy.kafka.region.amazonaws.com

To get the hostname of other brokers in the cluster, replace b-1 with values like b-2, b-3, and so on. For example, if you have six brokers in the cluster, you will have six broker hostnames starting with b-1 to b-6.
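If you'd rather not edit hostnames by hand, the broker names can also be read from $BS itself, which is a comma-separated list of host:port pairs. The following Bash sketch (the `broker_hosts` function is hypothetical) prints one hostname per line; if a broker you need isn't in the string, derive its hostname with the b-N substitution described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the hostnames in a bootstrap string of
# comma-separated host:port pairs, such as the value of $BS.
broker_hosts() {
  local -a entries
  local entry
  IFS=',' read -ra entries <<< "$1"
  for entry in "${entries[@]}"; do
    echo "${entry%:*}"   # strip the trailing :port
  done
}

# Example with placeholder hostnames in the format shown above.
broker_hosts "b-1.mskcustomdomaincluster.xxxxx.yy.kafka.region.amazonaws.com:9096,b-2.mskcustomdomaincluster.xxxxx.yy.kafka.region.amazonaws.com:9096"
```

On the EC2 instance, `broker_hosts "$BS"` lists the brokers in your bootstrap string, and you can then resolve each name with nslookup.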

To get the IP address of the individual brokers, use the nslookup command:

nslookup b-1.mskcustomdomaincluster.xxxxx.yy.kafka.region.amazonaws.com
Server: 172.16.0.2
Address: 172.16.0.2#53

Non-authoritative answer:
Name: b-1.mskcustomdomaincluster.xxxxx.yy.kafka.region.amazonaws.com
Address: 172.16.1.225

Modify the following commands with the IP addresses of each broker to create an environment variable that will be used later:

export B1=<<b-1_IP_Address>> 
export B2=<<b-2_IP_Address>> 
export B3=<<b-3_IP_Address>>

Next, you need to register the broker IP with the target group. For broker b-1, you will register the IP address with target group b-1.

Provide the target group name b-1 to get the target group ARN. Then register the broker IP address with the target group.

export TARGET_GROUP_B_1_ARN=$(aws elbv2 describe-target-groups --names b-1 | jq -r '.TargetGroups[0].TargetGroupArn')

aws elbv2 register-targets \
--target-group-arn ${TARGET_GROUP_B_1_ARN} \
--targets Id=$B1

Repeat these steps for brokers b-2 and b-3: obtain each broker's IP address from its hostname and register it with the corresponding target group.

For b-2:
export TARGET_GROUP_B_2_ARN=$(aws elbv2 describe-target-groups --names b-2 | jq -r '.TargetGroups[0].TargetGroupArn')

aws elbv2 register-targets \
    --target-group-arn ${TARGET_GROUP_B_2_ARN} \
    --targets Id=$B2

For b-3:
export TARGET_GROUP_B_3_ARN=$(aws elbv2 describe-target-groups --names b-3 | jq -r '.TargetGroups[0].TargetGroupArn')

aws elbv2 register-targets \
    --target-group-arn ${TARGET_GROUP_B_3_ARN} \
    --targets Id=$B3

Also, you need to register all three broker IP addresses with the target group b-all-bootstrap. This target group will be used for routing the traffic for the Amazon MSK client connection bootstrap process.

export TARGET_GROUP_B_ALL_ARN=$(aws elbv2 describe-target-groups --names b-all-bootstrap | jq -r '.TargetGroups[0].TargetGroupArn')

aws elbv2 register-targets \
--target-group-arn ${TARGET_GROUP_B_ALL_ARN} \
--targets Id=$B1 Id=$B2 Id=$B3
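The per-broker registration steps differ only in the broker number, so they can be collapsed into a loop. The following Bash function is a minimal sketch and an alternative to the individual commands, not part of the original walkthrough: `register_broker_targets` is a hypothetical name, it assumes the B1, B2, ... variables and the b-N and b-all-bootstrap target groups from the previous steps, and it uses the AWS CLI `--query` and `--output text` options instead of jq to extract each target group ARN.

```bash
#!/usr/bin/env bash
# Hypothetical helper: register each broker IP with its own target group
# (b-1 ... b-N) and all broker IPs with b-all-bootstrap.
# Assumes B1..BN hold the broker IP addresses, as exported above.
register_broker_targets() {
  local brokers=${1:-3} n ip_var arn
  local -a all_targets=()
  for ((n = 1; n <= brokers; n++)); do
    ip_var="B${n}"   # name of the variable holding broker n's IP
    arn=$(aws elbv2 describe-target-groups --names "b-${n}" \
      --query 'TargetGroups[0].TargetGroupArn' --output text)
    aws elbv2 register-targets --target-group-arn "$arn" --targets "Id=${!ip_var}"
    all_targets+=("Id=${!ip_var}")
  done
  arn=$(aws elbv2 describe-target-groups --names b-all-bootstrap \
    --query 'TargetGroups[0].TargetGroupArn' --output text)
  aws elbv2 register-targets --target-group-arn "$arn" --targets "${all_targets[@]}"
}
```

Running `register_broker_targets 3` performs the same registrations as the individual commands above.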

Set up NLB listeners

Now that you have the target groups created and certificate imported, you’re ready to create the NLB and listeners.

Create the NLB with the following code:

aws elbv2 create-load-balancer \
--name msk-nlb-internal \
--scheme internal \
--type network \
--subnets $MSKVPCPrivateSubnet1 $MSKVPCPrivateSubnet2 $MSKVPCPrivateSubnet3 \
--security-groups $NLBSecurityGroupId

export NLB_ARN=$(aws elbv2 describe-load-balancers --names msk-nlb-internal | jq -r '.LoadBalancers[0].LoadBalancerArn')

Next, you configure the listeners that will be used by the clients to communicate with the MSK cluster. You need to create four listeners, one for each target group for ports 9000–9003. The following table lists the listener configurations.

Protocol	Port	Certificate	NLB Target Type	NLB Targets
TLS	9000	bootstrap.example.com	TLS	b-all-bootstrap
TLS	9001	bootstrap.example.com	TLS	b-1
TLS	9002	bootstrap.example.com	TLS	b-2
TLS	9003	bootstrap.example.com	TLS	b-3

Use the following code for port 9000:

aws elbv2 create-listener \
--load-balancer-arn $NLB_ARN \
--protocol TLS \
--port 9000 \
--certificates CertificateArn=$CertificateARN \
--ssl-policy ELBSecurityPolicy-TLS13-1-2-2021-06 \
--default-actions Type=forward,TargetGroupArn=$TARGET_GROUP_B_ALL_ARN

Use the following code for port 9001:

aws elbv2 create-listener \
--load-balancer-arn $NLB_ARN \
--protocol TLS \
--port 9001 \
--certificates CertificateArn=$CertificateARN \
--ssl-policy ELBSecurityPolicy-TLS13-1-2-2021-06 \
--default-actions Type=forward,TargetGroupArn=$TARGET_GROUP_B_1_ARN

Use the following code for port 9002:

aws elbv2 create-listener \
--load-balancer-arn $NLB_ARN \
--protocol TLS \
--port 9002 \
--certificates CertificateArn=$CertificateARN \
--ssl-policy ELBSecurityPolicy-TLS13-1-2-2021-06 \
--default-actions Type=forward,TargetGroupArn=$TARGET_GROUP_B_2_ARN

Use the following code for port 9003:

aws elbv2 create-listener \
--load-balancer-arn $NLB_ARN \
--protocol TLS \
--port 9003 \
--certificates CertificateArn=$CertificateARN \
--ssl-policy ELBSecurityPolicy-TLS13-1-2-2021-06 \
--default-actions Type=forward,TargetGroupArn=$TARGET_GROUP_B_3_ARN
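The four create-listener commands differ only in the port and target group, so they can also be generated in a loop. The following Bash sketch uses hypothetical helpers that aren't part of the original walkthrough: `listener_args` prints the arguments for one listener (one per line, so each value stays intact), and `create_all_listeners` runs `aws elbv2 create-listener` for each port and target group pair in the table above. It assumes NLB_ARN, CertificateARN, and the four TARGET_GROUP_*_ARN variables from the earlier steps.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the arguments for one `aws elbv2 create-listener`
# call, one per line. Assumes NLB_ARN and CertificateARN were exported earlier.
listener_args() {
  local port=$1 target_group_arn=$2
  printf '%s\n' \
    --load-balancer-arn "$NLB_ARN" \
    --protocol TLS \
    --port "$port" \
    --certificates "CertificateArn=$CertificateARN" \
    --ssl-policy ELBSecurityPolicy-TLS13-1-2-2021-06 \
    --default-actions "Type=forward,TargetGroupArn=$target_group_arn"
}

# Create the bootstrap listener (9000) and one listener per broker (9001-9003).
create_all_listeners() {
  local -a ports=(9000 9001 9002 9003)
  local -a arns=("$TARGET_GROUP_B_ALL_ARN" "$TARGET_GROUP_B_1_ARN" \
                 "$TARGET_GROUP_B_2_ARN" "$TARGET_GROUP_B_3_ARN")
  local -a args
  local i
  for i in "${!ports[@]}"; do
    mapfile -t args < <(listener_args "${ports[$i]}" "${arns[$i]}")
    aws elbv2 create-listener "${args[@]}"
  done
}
```

Calling `create_all_listeners` is equivalent to running the four commands above.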

Enable cross-zone load balancing

By default, cross-zone load balancing is disabled on NLBs. When disabled, each load balancer node distributes traffic only to healthy targets in its own Availability Zone. For example, requests that come into the load balancer node in Availability Zone A will only be forwarded to a healthy target in Availability Zone A. If the only healthy target or the only registered target associated with an NLB listener is in a different Availability Zone than the load balancer node receiving the traffic, the traffic is dropped.

Because the bootstrap listener's target group has brokers registered across multiple Availability Zones, Route 53 responds to DNS queries against the NLB DNS name with the IP addresses of the NLB ENIs in every Availability Zone that has healthy targets.

When the Kafka client connects to a broker through that broker's listener on the NLB, there will be a noticeable delay before the broker responds, because the client tries each IP address returned by Route 53 in turn, and connections that land on NLB nodes in Availability Zones other than the broker's are dropped.

Enabling cross-zone load balancing distributes the traffic across the registered targets in all Availability Zones.

aws elbv2 modify-load-balancer-attributes --load-balancer-arn $NLB_ARN --attributes Key=load_balancing.cross_zone.enabled,Value=true

Create DNS A records in a private hosted zone

Create DNS A records to route the traffic to the NLB. The following table lists the records.

Record	Record Type	Value
bootstrap	A	NLB Alias
b-1	A	NLB Alias
b-2	A	NLB Alias
b-3	A	NLB Alias

Alias record types will be used, so you need the NLB’s DNS name and hosted zone ID:

export NLB_DNS=$(aws elbv2 describe-load-balancers --names msk-nlb-internal | jq -r '.LoadBalancers[0].DNSName')

export NLB_ZoneId=$(aws elbv2 describe-load-balancers --names msk-nlb-internal | jq -r '.LoadBalancers[0].CanonicalHostedZoneId')

Create the bootstrap record, and then repeat this command to create the b-1, b-2, and b-3 records, modifying the Name field:

aws route53 change-resource-record-sets \
--hosted-zone-id $HostedZoneId \
--change-batch file://<(cat << EOF
{
   "Comment": "Create bootstrap record",
   "Changes": [{
      "Action": "CREATE",
      "ResourceRecordSet": {
         "Name": "bootstrap.example.com",
         "Type": "A",
         "AliasTarget": {
            "HostedZoneId": "$NLB_ZoneId",
            "DNSName": "$NLB_DNS",
            "EvaluateTargetHealth": true
         }
      }
   }]
}
EOF
)
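Because the same change batch is repeated for bootstrap, b-1, b-2, and b-3 (and again with DELETE in the cleanup section), it can be generated by a function. This Bash sketch assumes the NLB_DNS, NLB_ZoneId, and HostedZoneId variables set earlier; `alias_change_batch` and `change_all_records` are hypothetical helpers, and they pass the JSON inline to --change-batch rather than through a file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a Route 53 change batch for an alias A record that
# points <name> at the NLB. Assumes NLB_ZoneId and NLB_DNS were exported above.
alias_change_batch() {
  local action=$1 name=$2
  cat <<EOF
{
  "Comment": "${action} ${name} record",
  "Changes": [{
    "Action": "${action}",
    "ResourceRecordSet": {
      "Name": "${name}",
      "Type": "A",
      "AliasTarget": {
        "HostedZoneId": "${NLB_ZoneId}",
        "DNSName": "${NLB_DNS}",
        "EvaluateTargetHealth": true
      }
    }
  }]
}
EOF
}

# Hypothetical helper: apply one action (CREATE or DELETE) to all four records.
change_all_records() {
  local action=$1 record
  for record in bootstrap b-1 b-2 b-3; do
    aws route53 change-resource-record-sets \
      --hosted-zone-id "$HostedZoneId" \
      --change-batch "$(alias_change_batch "$action" "${record}.example.com")"
  done
}
```

`change_all_records CREATE` creates all four records, and `change_all_records DELETE` removes them during cleanup.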

Optionally, to reduce cross-zone data charges, you can point b-1, b-2, and b-3 at the IP address of the NLB ENI that is in the same Availability Zone as each broker. For example, if b-2 uses an IP address in subnet 172.16.2.0/24, which is in Availability Zone A, use the IP address of the NLB ENI in Availability Zone A as the value of the b-2 DNS record.

At this point, the setup supports using a custom domain name for bootstrap connectivity only (Option 1). If all Kafka traffic needs to go through the NLB (Option 2), continue with the next section to set up advertised listeners; otherwise, skip ahead to Test the setup.

Configure the advertised listener in the MSK cluster

To get the listener details for broker 1, you provide entity-type as brokers and entity-name as 1 for the broker ID:

/home/ec2-user/kafka/bin/kafka-configs.sh --bootstrap-server $BS \
--entity-type brokers \
--entity-name 1 \
--command-config ~/kafka/config/client_sasl.properties \
--all \
--describe | grep 'listeners=CLIENT_SASL_SCRAM'

You will get an output like the following:

listeners=CLIENT_SASL_SCRAM://b-1.mskcustomdomaincluster.XXXX.yy.kafka.region.amazonaws.com:9096,CLIENT_SECURE://b-1.mskcustomdomaincluster.XXXX.yy.kafka.region.amazonaws.com:9094,REPLICATION://b-1.mskcustomdomaincluster.XXXX.yy.kafka.region.amazonaws.com:9093,REPLICATION_SECURE://b-1.mskcustomdomaincluster.XXXX.yy.kafka.region.amazonaws.com:9095 sensitive=false synonyms={STATIC_BROKER_CONFIG:listeners=CLIENT_SASL_SCRAM://b-1.mskcustomdomaincluster.XXXX.yy.kafka.region.amazonaws.com:9096,CLIENT_SECURE://b-1.mskcustomdomaincluster.XXXX.yy.kafka.region.amazonaws.com:9094,REPLICATION://b-1.mskcustomdomaincluster.XXXX.yy.kafka.region.amazonaws.com:9093,REPLICATION_SECURE://b-1.mskcustomdomaincluster.XXXX.yy.kafka.region.amazonaws.com:9095}

Going forward, clients will connect through the custom domain name. Therefore, you need to configure the advertised listeners to the custom domain hostname and port. For this, you need to copy the listener details and change the CLIENT_SASL_SCRAM listener to b-1.example.com:9001.

While you’re configuring the advertised listener, you also need to preserve the information about other listener types in the advertised listener because inter-broker communications also use the addresses in the advertised listener.

Based on our configuration, the advertised listener for broker 1 will look like the following code, with everything after sensitive=false removed:

CLIENT_SASL_SCRAM://b-1.example.com:9001,REPLICATION://b-1-internal.mskcustomdomaincluster.xxxxxx.yy.kafka.region.amazonaws.com:9093,REPLICATION_SECURE://b-1-internal.mskcustomdomaincluster.xxxxxx.yy.kafka.region.amazonaws.com:9095
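Building this value by hand for every broker is error-prone. The following Bash function is a minimal sketch that derives it from the describe output shown above: it rewrites CLIENT_SASL_SCRAM to b-N.example.com on port 9000 + N and keeps the REPLICATION and REPLICATION_SECURE entries unchanged, mirroring the broker 1 example. The function name `advertised_listeners` is hypothetical; if your cluster needs other listeners preserved, extend the case patterns.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive an advertised.listeners value for broker <n> from
# its listeners line as printed by kafka-configs.sh --describe.
# CLIENT_SASL_SCRAM becomes b-<n>.<domain>:<9000+n>; REPLICATION and
# REPLICATION_SECURE are kept as-is; any other entries are dropped.
advertised_listeners() {
  local listeners=$1 broker=$2 domain=${3:-example.com}
  local -a entries out
  local entry
  listeners=${listeners#*listeners=}     # drop the leading "listeners="
  listeners=${listeners%% sensitive=*}   # drop " sensitive=... synonyms=..."
  IFS=',' read -ra entries <<< "$listeners"
  out=("CLIENT_SASL_SCRAM://b-${broker}.${domain}:$((9000 + broker))")
  for entry in "${entries[@]}"; do
    entry=${entry// /}                   # remove any stray spaces
    case $entry in
      REPLICATION://*|REPLICATION_SECURE://*) out+=("$entry") ;;
    esac
  done
  local IFS=','
  echo "${out[*]}"                       # join the entries with commas
}
```

For example, with the describe output for a broker stored in `line`, you could pass `--add-config "advertised.listeners=[$(advertised_listeners "$line" 1)]"` to the kafka-configs.sh command below.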

Modify the following command as follows:

<<BROKER_NUMBER>> – Set to the broker ID being changed (for example, 1 for broker 1)
<<PORT_NUMBER>> – Set to the port number corresponding to broker ID (for example, 9001 for broker 1)
<<REPLICATION_DNS_NAME>> – Set to the DNS name for REPLICATION
<<REPLICATION_SECURE_DNS_NAME>> – Set to the DNS name for REPLICATION_SECURE

/home/ec2-user/kafka/bin/kafka-configs.sh --alter \
--bootstrap-server $BS \
--entity-type brokers \
--entity-name <<BROKER_NUMBER>> \
--command-config ~/kafka/config/client_sasl.properties \
--add-config advertised.listeners=[CLIENT_SASL_SCRAM://b-<<BROKER_NUMBER>>.example.com:<<PORT_NUMBER>>,REPLICATION://<<REPLICATION_DNS_NAME>>:9093,REPLICATION_SECURE://<<REPLICATION_SECURE_DNS_NAME>>:9095]

The command should look something like the following example:

/home/ec2-user/kafka/bin/kafka-configs.sh --alter \
--bootstrap-server $BS \
--entity-type brokers \
--entity-name 1 \
--command-config ~/kafka/config/client_sasl.properties \
--add-config advertised.listeners=[CLIENT_SASL_SCRAM://b-1.example.com:9001,REPLICATION://b-1-internal.mskcustomdomaincluster.xxxxxx.yy.kafka.region.amazonaws.com:9093,REPLICATION_SECURE://b-1-internal.mskcustomdomaincluster.xxxxxx.yy.kafka.region.amazonaws.com:9095]

Run the command to add the advertised listener for broker 1.

Repeat these steps for the other brokers: get each broker's listener details and configure its advertised.listeners.

Test the setup

Set the bootstrap address to the custom domain. This is the A record created in the private hosted zone.

export BS=bootstrap.example.com:9000

List the MSK topics using the custom domain bootstrap address:

/home/ec2-user/kafka/bin/kafka-topics.sh --list \
--bootstrap-server $BS \
--command-config=/home/ec2-user/kafka/config/client_sasl.properties

You should see the topic customer.

Clean up

To stop incurring costs, it’s recommended to manually delete the private hosted zone, NLB, target groups, and imported certificate in ACM. Also, delete the CloudFormation stack to remove any resources provisioned by CloudFormation.

Use the following code to manually delete the aforementioned resources:

aws route53 change-resource-record-sets \
  --hosted-zone-id $HostedZoneId \
  --change-batch file://<(cat << EOF
{
  "Changes": [
    {
      "Action": "DELETE",
      "ResourceRecordSet": {
        "Name": "bootstrap.example.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "$NLB_ZoneId",
          "DNSName": "$NLB_DNS",
          "EvaluateTargetHealth": true
        }
      }
    }
  ]
}
EOF
)
    
aws route53 change-resource-record-sets \
  --hosted-zone-id $HostedZoneId \
  --change-batch file://<(cat << EOF
{
  "Changes": [
    {
      "Action": "DELETE",
      "ResourceRecordSet": {
        "Name": "b-1.example.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "$NLB_ZoneId",
          "DNSName": "$NLB_DNS",
          "EvaluateTargetHealth": true
        }
      }
    }
  ]
}
EOF
)
    
aws route53 change-resource-record-sets \
  --hosted-zone-id $HostedZoneId \
  --change-batch file://<(cat << EOF
{
  "Changes": [
    {
      "Action": "DELETE",
      "ResourceRecordSet": {
        "Name": "b-2.example.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "$NLB_ZoneId",
          "DNSName": "$NLB_DNS",
          "EvaluateTargetHealth": true
        }
      }
    }
  ]
}
EOF
)
    
aws route53 change-resource-record-sets \
  --hosted-zone-id $HostedZoneId \
  --change-batch file://<(cat << EOF
{
  "Changes": [
    {
      "Action": "DELETE",
      "ResourceRecordSet": {
        "Name": "b-3.example.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "$NLB_ZoneId",
          "DNSName": "$NLB_DNS",
          "EvaluateTargetHealth": true
        }
      }
    }
  ]
}
EOF
)
    
aws route53 delete-hosted-zone --id $HostedZoneId
aws elbv2 delete-load-balancer --load-balancer-arn $NLB_ARN
aws elbv2 delete-target-group --target-group-arn $TARGET_GROUP_B_ALL_ARN
aws elbv2 delete-target-group --target-group-arn $TARGET_GROUP_B_1_ARN
aws elbv2 delete-target-group --target-group-arn $TARGET_GROUP_B_2_ARN
aws elbv2 delete-target-group --target-group-arn $TARGET_GROUP_B_3_ARN

Wait for the NLB deletion to complete, which can take up to 5 minutes, before deleting the imported certificate; ACM can't delete a certificate that is still in use:

aws acm delete-certificate --certificate-arn $CertificateARN

Now you can delete the CloudFormation stack.

Summary

This post explains how you can use an NLB, Route 53, and the advertised listener configuration option in Amazon MSK to support custom domain names with MSK clusters when using SASL/SCRAM authentication. You can use this solution to keep your existing Kafka bootstrap DNS name and reduce or remove the need to change client applications because of a migration, recovery process, or multi-cluster high availability. You can also use this solution to have the MSK bootstrap and broker names under your custom domain, enabling you to bring the DNS name in line with your naming convention (for example, msk.prod.example.com).

Try the solution out for yourself, and leave your questions and feedback in the comments section.

About the Authors

Subham Rakshit is a Senior Streaming Solutions Architect for Analytics at AWS based in the UK. He works with customers to design and build streaming architectures so they can get value from analyzing their streaming data. His two little daughters keep him occupied most of the time outside work, and he loves solving jigsaw puzzles with them. Connect with him on LinkedIn.

Mark Taylor is a Senior Technical Account Manager at Amazon Web Services, working with enterprise customers to implement best practices, optimize AWS usage, and address business challenges. Prior to joining AWS, Mark spent over 16 years in networking roles across industries, including healthcare, government, education, and payments. Mark lives in Folkestone, England, with his wife and two dogs. Outside of work, he enjoys watching and playing football, watching movies, playing board games, and traveling.

Helping keep customers safe with leaked password notification

2024-06-24 Garrett Galow

Post Syndicated from Garrett Galow original https://blog.cloudflare.com/helping-keep-customers-safe-with-leaked-password-notification

Password reuse is a real problem. When people use the same password across multiple services, it creates a risk that a breach of one service will give attackers access to a different, apparently unrelated, service. Attackers know people reuse passwords and build giant lists of known passwords and known usernames or email addresses.

If you got to the end of that paragraph and realized you’ve reused the same password multiple places, stop reading and go change those passwords. We’ll wait.

To help protect Cloudflare customers who have used a password attackers know about, we are releasing a feature to improve the security of the Cloudflare dashboard for all our customers by automatically checking whether their Cloudflare user password has appeared in an attacker’s list. Cloudflare will securely check a customer’s password against threat intelligence sources that monitor data breaches in other services.

If a customer logs in to Cloudflare with a password that was leaked in a breach elsewhere on the Internet, Cloudflare will alert them and ask them to choose a new password.

For some customers, the news that their password was known to hackers will come as a surprise – no one wants to intentionally use passwords that they know have been leaked elsewhere. To help customers avoid being locked out when they urgently need to use their Cloudflare dashboard, the leaked password check will provide a warning to the customer for the first three login attempts. After those three attempts, Cloudflare will require that the customer reset their password.

Resetting a leaked password is just the first step in Internet account security. The best way to protect your Cloudflare account, or any account, is to add two-factor authentication, such as using a hardware security key or an authenticator application, or to rely on a single sign-on integration. Cloudflare makes it easy for any user to add two-factor authentication security to their account through app-based codes, hardware keys, or passkeys. Cloudflare account Super Administrators can also require that all members enable two-factor authentication.

Whether or not a user has been impacted in a data breach, we encourage everyone to add two-factor authentication security to their Cloudflare account.

How do credentials leak?

Each time you authenticate to a service on the Internet with a username and password, that service can take a range of steps to protect your credentials.

More secure providers hash the passwords. Hashing uses a one-way cryptographic algorithm to convert the password into a fixed-length string of characters that looks random. Some platforms layer on additional safeguards such as a salt, a random value added to each password before hashing so that two identical passwords do not produce identical hashes.
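To make the salting idea concrete, the following Bash sketch hashes the same password with two different random salts and gets two different digests. It is illustrative only and not how Cloudflare or any particular provider stores passwords: SHA-256 through sha256sum stands in for brevity, whereas real password storage should use a deliberately slow function such as bcrypt, scrypt, or Argon2. The `salted_hash` and `random_salt` functions are hypothetical.

```bash
#!/usr/bin/env bash
# Illustration of salting only: hash salt+password so that equal passwords with
# different salts produce different digests. SHA-256 is used for brevity; real
# password storage should use a slow password-hashing function.
salted_hash() {
  local salt=$1 password=$2
  printf '%s%s' "$salt" "$password" | sha256sum | cut -d' ' -f1
}

random_salt() {
  od -An -N16 -tx1 /dev/urandom | tr -d ' \n'   # 16 random bytes as hex
}

salt_a=$(random_salt)
salt_b=$(random_salt)

# Same password, two salts: the two printed digests differ.
salted_hash "$salt_a" "hunter2"
salted_hash "$salt_b" "hunter2"
```

The provider stores the salt alongside the digest so that it can recompute the same hash at login.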

These protections, combined with rate limits on login attempts, help prevent brute force attacks. However, even when providers adopt these best practices, users can still become victims of determined attackers who gain access to breached password databases. Attackers can collect compromised email-password pairs to gain access to user accounts elsewhere as part of targeted attacks.

When vendors discover these kinds of compromised accounts, in many cases they will quickly force a password reset. However, resetting a leaked password in one application can still leave you vulnerable in other applications if you reused that password in other places and do not change your credentials everywhere.

That kind of password reuse means that an attacker can steal your credentials from one Internet service and try them against dozens of other popular destinations to see where you reuse the same password.

These so-called credential stuffing attacks have become more prevalent as breaches pile up. Attackers can sit on large troves of credentials for months or years, waiting to sell them to another bad actor or to use them in targeted attacks. For customers who want to protect their end customers’ accounts from these and other attacks, Cloudflare offers solutions such as bot management, exposed credential checks, and rate limiting.

How can customers protect against the impact of leaked credentials?

If every password you use is unique, then a data breach at one vendor will be limited to just that particular system. For that reason, we encourage users to adopt a password manager that can create and remember unique passwords for each service that you use. Thankfully, the most popular operating systems now include password managers by default, and multiplatform third-party options also make it easy for users to adopt this practice.
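The "create unique passwords" part of a password manager can be approximated in a few lines. This sketch uses Python's `secrets` module, which is intended for security-sensitive randomness; the alphabet, the 20-character default length, and the service names are illustrative assumptions, not recommendations from the post.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation

def generate_password(length: int = 20) -> str:
    """Return a password drawn uniformly at random from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

# One independent password per service, so a breach at one service exposes nothing else.
vault = {site: generate_password() for site in ("example-mail", "example-bank", "example-forum")}
for site, password in vault.items():
    print(site, len(password))
```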

However, unique passwords are still vulnerable to phishing attacks. The best way to protect any Internet account is to add two-factor authentication (2FA). Two-factor authentication provides a comprehensive defense against credential stuffing attacks. When using two-factor authentication to log in, you must use your password and, for example, a one-time code from an app or a tap on a physical hardware key. The password alone is not enough to access your account.
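To make the "one-time code from an app" concrete: most authenticator apps implement TOTP (RFC 6238), which derives a short code from a shared secret and the current time. The following is a minimal sketch using HMAC-SHA1, a 30-second step, and the dynamic truncation from RFC 4226. It illustrates the standard, not how Cloudflare's own 2FA is implemented, which the post does not describe.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def totp(secret: bytes, for_time: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for a Unix timestamp."""
    if for_time is None:
        for_time = time.time()
    counter = int(for_time // step)  # number of whole time steps since the Unix epoch
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226, section 5.3)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 Appendix B test vector: ASCII secret "12345678901234567890" at T = 59 seconds.
print(totp(b"12345678901234567890", for_time=59, digits=8))  # 94287082
```

Because the server and the app share only the secret, a stolen password alone cannot reproduce the current code.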

Adopting two-factor authentication, specifically hardware keys, has been shown to eliminate 99.9% of account takeovers, since an attacker must obtain your second factor in addition to your password. In the case of hardware keys, they need to physically obtain the key.

How does Cloudflare check for leaked credentials?

When a user attempts to log in to Cloudflare, we check whether the password used has been leaked in a known data breach of another service or application on the Internet. We maintain data on breaches of Internet services that we can search against. Rather than comparing plaintext passwords, we compare a hash of the submitted password (a scrambled version that can’t easily be reversed) against hashes of compromised passwords found in these breach lists. Besides the security benefit of hashes, this allows fast lookups, much faster than plaintext searches, so we can perform these checks without adding significant latency to the login process.
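A minimal sketch of the lookup described above, assuming the breach corpus has already been reduced to a set of unsalted SHA-256 hex digests. The hash function, the in-memory set, and the names `LEAKED_HASHES` and `is_leaked` are illustrative; the post does not say how Cloudflare stores or indexes its breach data. Set membership is a hash-table lookup, so each check costs roughly constant time regardless of corpus size, and no email address is involved in the match.

```python
import hashlib

def sha256_hex(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical breach corpus. Only digests are kept; the sample strings exist just to build the demo set.
LEAKED_HASHES = {sha256_hex(p) for p in ("password123", "qwerty", "letmein")}

def is_leaked(password: str) -> bool:
    """Hash the submitted password and check it against the set of leaked digests."""
    return sha256_hex(password) in LEAKED_HASHES

print(is_leaked("qwerty"))
print(is_leaked("V7#q-unique-to-this-site"))
```

Note that this only works if the corpus hashes use the same unsalted scheme as the lookup; per-user salts would make such a cross-service comparison impossible.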

Because of the potential impact of a compromised Cloudflare account, we opt for the more secure approach of disallowing leaked credentials regardless of whether they were associated with the specific user’s email. Unfortunately, data breaches are likely to continue. Being proactive reduces the risk that a new breach containing a user’s email and password pair leads to a compromised account before that breach data becomes available to us.

If we detect a match, the user will be prompted with the following warning. We will also send an email notification with instructions on what to do and a unique link in order to reset the password.

At this point, the user will still be able to log in to their Cloudflare account. We strongly encourage users to reset their password immediately. However, we know that in some cases you need to reach the Cloudflare dashboard immediately or do not have convenient access to the email used for the account. Cloudflare will allow two additional login attempts to succeed with the same compromised password before forcing the user to reset their password.
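The grace period described above behaves like a small per-account counter. This sketch reads the text as: the login that first detects the leak succeeds with a warning, two more logins with the same password also succeed, and the next one forces a reset. The post does not spell out the exact counting or where the state lives, so `LeakedPasswordPolicy`, `GRACE_LOGINS`, and the returned strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

GRACE_LOGINS = 2  # additional logins allowed after the one that first detects the leak

@dataclass
class LeakedPasswordPolicy:
    """Hypothetical per-account tracker for logins that use a known-leaked password."""
    leaked_logins: Dict[str, int] = field(default_factory=dict)

    def on_login(self, account: str, password_leaked: bool) -> str:
        if not password_leaked:
            return "allow"
        seen = self.leaked_logins.get(account, 0)  # earlier logins with this leaked password
        self.leaked_logins[account] = seen + 1
        if seen <= GRACE_LOGINS:
            # First detection plus the grace logins: warn and (elsewhere) email a reset link.
            return "allow_with_warning"
        return "require_reset"

    def on_password_reset(self, account: str) -> None:
        self.leaked_logins.pop(account, None)  # a new password clears the counter

policy = LeakedPasswordPolicy()
print([policy.on_login("alice", password_leaked=True) for _ in range(4)])
# ['allow_with_warning', 'allow_with_warning', 'allow_with_warning', 'require_reset']
```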

To reset a password in the Cloudflare dashboard, navigate to the Authentication page in My Profile. From there, select Change Password and enter both your current password (to authenticate) and a new, non-compromised password. Alternatively, users whose password was leaked receive an email at login with a unique link to reset it.

What’s next?

Forcing users to reset compromised credentials helps prevent attacks from spreading on the Internet, but it’s just a small piece of improved account security. We know that adding the next step, a second authentication factor, can be cumbersome. We have committed to CISA’s Secure by Design Pledge, which includes working to increase 2FA adoption across the industry. We will share our plans for implementing the pledge by mid-2025.

Adding multifactor authentication to every one of your accounts on the Internet can still be a chore, no matter how much the experience is improved. It’s much easier if you can set it up on one account and use that account to authenticate into other services and applications – a single sign-on (SSO) flow. Right now, our SSO feature is limited to enterprise accounts, and we plan to change that. We will allow users to access Cloudflare through other providers like Google, GitHub, and more. Reducing how many unique password and 2FA combinations users have to keep track of helps reduce the likelihood of being impacted by future password breaches.

ASRock Rack 1U24E1S-GENOA/2L2T Review A Single Socket E1.S Monster

2024-06-24 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/asrock-rack-1u24e1s-genoa-2l2t-review-e1-s-amd-epyc-broadcom-micron-kioxia-ssd-edsff/

In our ASRock Rack 1U24E1S-GENOA/2L2T review, we see how this 1U AMD EPYC 9004 server uses unique cooling concepts and 24x E1.S SSD bays.

The post ASRock Rack 1U24E1S-GENOA/2L2T Review A Single Socket E1.S Monster appeared first on ServeTheHome.

Min: sched_ext: scheduler architecture and interfaces

2024-06-24 corbet

Post Syndicated from corbet original https://lwn.net/Articles/979532/

Changwoo Min has posted an
introduction to writing custom schedulers with sched_ext.

In a particular situation, when each scheduling policy needs its
specific action, the core kernel scheduler calls an operation
defined in struct sched_class. For example, when the core
kernel scheduler needs to select a task to be scheduled, it calls
the sched_class.pick_next_task(rq) callback of a concrete
scheduling policy. When a task becomes runnable, the core kernel
scheduler calls sched_class.enqueue(rq, p, flags) so the
concrete scheduling policy enqueues task p to run queue
rq. When a task’s runtime state needs to be updated, the
core kernel scheduler calls sched_class.update_curr(rq).

Who Really Protests, and Why?

2024-06-24 The Atlantic

Post Syndicated from The Atlantic original https://www.youtube.com/watch?v=XhpXRU9UEcY

The Truth About Immigration and Public Opinion

2024-06-24 The Atlantic

Post Syndicated from The Atlantic original https://www.youtube.com/watch?v=dfBFjS8wI2A

The Airport Lounge Arms Race

2024-06-24 The Atlantic

Post Syndicated from The Atlantic original https://www.youtube.com/watch?v=R7yYCXl_6l4

AWS Weekly Roundup: Claude 3.5 Sonnet in Amazon Bedrock, CodeCatalyst updates, SageMaker with MLflow, and more (June 24, 2024)

2024-06-24 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-claude-3-5-sonnet-in-amazon-bedrock-codecatalyst-updates-sagemaker-with-mlflow-and-more-june-24-2024/

This week, I had the opportunity to try the new Anthropic Claude 3.5 Sonnet model in Amazon Bedrock just before it launched, and I was really impressed by its speed and accuracy! It was also the week of AWS Summit Japan; here’s a nice picture of the busy AWS Community stage.

Last week’s launches
With many new capabilities, from recommendations on the size of your Amazon Relational Database Service (Amazon RDS) databases to new built-in transformations in AWS Glue, here’s what got my attention:

Amazon Bedrock – Now supports Anthropic’s Claude 3.5 Sonnet and compressed embeddings from Cohere Embed.

AWS CodeArtifact – With support for Rust packages with Cargo, developers can now store and access their Rust libraries (known as crates).

Amazon CodeCatalyst – Many updates from this unified software development service. You can now assign issues in CodeCatalyst to Amazon Q and direct it to work with source code hosted in GitHub Cloud and Bitbucket Cloud and ask Amazon Q to analyze issues and recommend granular tasks. These tasks can then be individually assigned to users or to Amazon Q itself. You can now also use Amazon Q to help pick the best blueprint for your needs. You can now securely store, publish, and share Maven, Python, and NuGet packages. You can also link an issue to other issues. This allows customers to link issues in CodeCatalyst as blocked by, duplicate of, related to, or blocks another issue. You can now configure a single CodeBuild webhook at organization or enterprise level to receive events from all repositories in your organizations, instead of creating webhooks for each individual repository. Finally, you can now add a default IAM role to an environment.

Amazon EC2 – C7g and R7g instances (powered by AWS Graviton3 processors) are now available in Europe (Milan), Asia Pacific (Hong Kong), and South America (São Paulo) Regions. C7i-flex instances are now available in US East (Ohio) Region.

AWS Compute Optimizer – Now provides rightsizing recommendations for Amazon RDS for MySQL and Amazon RDS for PostgreSQL. More info in this Cloud Financial Management blog post.

Amazon OpenSearch Service – With JSON Web Token (JWT) authentication and authorization, it’s now easier to integrate identity providers and isolate tenants in a multi-tenant application.

Amazon SageMaker – Now helps you manage machine learning (ML) experiments and the entire ML lifecycle with a fully managed MLflow capability.

AWS Glue – The serverless data integration service now offers 13 new built-in transforms: flag duplicates in column, format phone number, format case, fill with mode, flag duplicate rows, remove duplicates, month name, is even, cryptographic hash, decrypt, encrypt, int to IP, and IP to int.

Amazon MWAA – Amazon Managed Workflows for Apache Airflow (MWAA) now supports custom domain names for the Airflow web server, allowing you to use private web servers with load balancers, custom DNS entries, or proxies to point users to a user-friendly web address.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional projects, blog posts, and news items that you might find interesting:

AWS re:Inforce 2024 re:Cap – A summary of our annual, immersive, cloud-security learning event by my colleague Wojtek.

Three ways Amazon Q Developer agent for code transformation accelerates Java upgrades – This post offers interesting details on how Amazon Q Developer handles major version upgrades of popular frameworks, replaces deprecated API calls on your behalf, and explains its code changes.

Five ways Amazon Q simplifies AWS CloudFormation development – For template code generation, querying CloudFormation resource requirements, explaining existing template code, understanding deployment options and issues, and querying CloudFormation documentation.

Improving air quality with generative AI – A nice solution that uses artificial intelligence (AI) to standardize air quality data, addressing the air quality data integration problem of low-cost sensors.

Deploy a Slack gateway for Amazon Bedrock – A solution bringing the power of generative AI directly into your Slack workspace.

An agent-based simulation of Amazon’s inbound supply chain – Simulating the entire US inbound supply chain, including the “first-mile” of distribution and tracking the movement of hundreds of millions of individual products through the network.

AWS CloudFormation Linter (cfn-lint) v1 – This upgrade is particularly significant because it switches from the CloudFormation spec to CloudFormation registry resource provider schemas.

A practical approach to using generative AI in the SDLC – Learn how an AI assistant like Amazon Q Developer helps my colleague Jenna figure out what to build and how to build it.

AWS open source news and updates – My colleague Ricardo writes about open source projects, tools, and events from the AWS Community. Check out Ricardo’s page for the latest updates.

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:

AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. This week, you can join the AWS Summit in Washington, DC, June 26–27. Learn here about future AWS Summit events happening in your area.

AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world. This week there are AWS Community Days in Switzerland (June 27), Sri Lanka (June 27), and the Gen AI Edition in Ahmedabad, India (June 29).

Browse all upcoming AWS-led in-person and virtual events and developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Danilo

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

[$] The GhostBSD in the machine

2024-06-24 jzb

Post Syndicated from jzb original https://lwn.net/Articles/978837/

GhostBSD is a
desktop-oriented operating system based on FreeBSD and the MATE Desktop Environment. The
goal of the project is to lower the barrier to entry of using FreeBSD
on a desktop or laptop system, and it largely succeeds at this. While it has a few rough edges
that make it hard to recommend for the average desktop user, it is
a fine choice for users who want a desktop with FreeBSD underpinnings
such as the Z File System (ZFS) and the Ports (source) and Packages (binary) software collections.

Security updates for Monday

2024-06-24 jake

Post Syndicated from jake original https://lwn.net/Articles/979520/

Security updates have been issued by AlmaLinux (ipa and libreswan), Debian (netty), Fedora (python-PyMySQL, tomcat, and webkitgtk), Gentoo (Flatpak, GLib, JHead, LZ4, and RDoc), Mageia (thunderbird), Oracle (nghttp2 and thunderbird), Red Hat (dnsmasq, libreswan, pki-core, and python3.11), Slackware (emacs), SUSE (gnome-settings-daemon, libarchive, qpdf, vte, and wget), and Ubuntu (libhibernate3-java).

Emacs 29.4 released

2024-06-24 corbet

Post Syndicated from corbet original https://lwn.net/Articles/979491/

Version
29.4 of the Emacs editor has been released. This is “an emergency
bugfix release” fixing a vulnerability that can cause
the editor to execute arbitrary shell code in Org mode. Anybody who runs Emacs on
untrusted files — including those using Gnus or one of the Emacs mail modes
— should be looking to update. For those who cannot update, a pair of
messages from Russ
Allbery and Florian Weimer
investigates how to disable the Org-mode evaluation, a task that is
seemingly more complicated than it should be.

Using machine learning to detect bot attacks that leverage residential proxies

2024-06-24 Bob AminAzad

Post Syndicated from Bob AminAzad original https://blog.cloudflare.com/residential-proxy-bot-detection-using-machine-learning

Bots using residential proxies are a major source of frustration for security engineers trying to fight online abuse. These engineers often see a similar pattern of abuse when well-funded, modern botnets target their applications. Advanced bots bypass country blocks, ASN blocks, and rate-limiting. Every time, the bot operator moves to a new IP address space until they blend in perfectly with the “good” traffic, mimicking real users’ behavior and request patterns. Our new Bot Management machine learning model (v8) identifies residential proxy abuse without resorting to IP blocking, which can cause false positives for legitimate users.

Background

One of the main sources of Cloudflare’s bot score is our bot detection machine learning model which analyzes, on average, over 46 million HTTP requests per second in real time. Since our first Bot Management ML model was released in 2019, we have continuously evolved and improved the model. Nowadays, our models leverage features based on request fingerprints, behavioral signals, and global statistics and trends that we see across our network.

Each iteration of the model focuses on certain areas of improvement. This process starts with a rigorous R&D phase to identify the emerging patterns of bot attacks by reviewing feedback from our customers and reports of missed attacks. In v8, we mainly focused on two areas of abuse. First, we analyzed the campaigns that leverage residential IP proxies, which are proxies on residential networks commonly used to launch widely distributed attacks against high profile targets. In addition to that, we improved model accuracy for detecting attacks that originate from cloud providers.

Residential IP proxies

Proxies allow attackers to hide their identity and distribute their attack. Moreover, IP address rotation allows attackers to directly bypass traditional defenses such as IP reputation and IP rate limiting. Knowing this, defenders use a plethora of signals to identify malicious use of proxies. In the simplest case, IP reputation signals (e.g., data center IP addresses, known open proxies) can lead to the detection of such distributed attacks.

However, in the past few years, bot operators have started favoring proxies operating in residential network IP address space. By using residential IP proxies, attackers can masquerade as legitimate users by sending their traffic through residential networks. Nowadays, residential IP proxies are offered by companies that give attackers access to large pools of IP addresses. Residential proxy providers claim to offer 30 to 100 million IPs belonging to residential and mobile networks across the world. Most commonly, these IPs are sourced by partnering with free VPN providers and by embedding proxy SDKs in popular browser extensions and mobile applications. This allows residential proxy providers to gain a foothold on victims’ devices and abuse their residential network connections.

Figure 1: Architecture of a residential proxy network

Figure 1 depicts the architecture of a residential proxy. By subscribing to these services, attackers gain access to an authenticated proxy gateway address, commonly using the HTTPS or SOCKS5 proxy protocol. Some residential proxy providers allow their users to select the country or region for the proxy exit nodes. Users can also choose to keep the same IP address throughout their session or rotate to a new one for each outgoing request. Residential proxy providers then identify active exit nodes on their network (on devices that they control within residential networks across the world) and route the proxied traffic through them.

The large pool of IP addresses and the diversity of networks pose a challenge to traditional bot defense mechanisms that rely on IP reputation and rate limiting. Moreover, the diversity of IPs enables attackers to rotate through them indefinitely, which shrinks the window of opportunity for bot detection systems to detect and stop the attacks. An effective defense against residential proxy attacks should either detect this type of bot traffic from single-request features, to stop the attack immediately, or identify unique fingerprints of the browsing agent, to track and mitigate the bot traffic regardless of the IP source. Overly broad blocking actions, such as IP block-listing, would by definition block legitimate traffic from any residential network where at least one device is acting as a residential proxy node.

ML model training

At its heart, our model is built using a chain of modules that work together. Initially, we fetch and prepare training and validation datasets from our ClickHouse data storage. We use datasets with high confidence labels as part of our training. For model validation, we use datasets consisting of missed attacks reported by our customers, known sources of bot traffic (e.g., verified bots), and high confidence detections from other bot management modules (e.g., heuristics engine). We orchestrate these steps using Apache Airflow, which enables us to customize each stage of the ML model training and define the interdependencies of our training, validation, and reporting modules in the form of directed acyclic graphs (DAGs).
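Airflow expresses a pipeline as a DAG of tasks. Without depending on Airflow itself, the ordering constraint can be sketched with the standard-library `graphlib.TopologicalSorter` (Python 3.9+). The stage names are hypothetical, loosely following the fetch, train, validate, and report steps described in this section.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on (hypothetical stage names).
pipeline = {
    "fetch_train_dataset": set(),
    "fetch_validation_datasets": set(),
    "train_model": {"fetch_train_dataset"},
    "validate_model": {"train_model", "fetch_validation_datasets"},
    "report": {"validate_model"},
}

# static_order() yields every stage after all of its dependencies.
order = list(TopologicalSorter(pipeline).static_order())
print(order)
```

The two fetch stages have no dependency on each other, so an orchestrator is free to run them in parallel; only the edges in the graph constrain ordering.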

The first step of training a new model is fetching labeled training data from our data store. Under the hood, our dataset definitions are SQL queries that materialize data from our ClickHouse cluster, where we store feature values and calculate aggregates from the traffic on our network. Figure 2 depicts these steps as train and validation dataset fetch operations. Introducing a new dataset can be as straightforward as writing a SQL query that filters the desired subset of requests.
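To illustrate "dataset definitions are SQL queries", this sketch uses an in-memory SQLite database as a stand-in for ClickHouse. The `requests` table, its columns, the time windows, and the label values are all hypothetical; the point is that adding a dataset amounts to adding another named query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests (ts INTEGER, ip TEXT, http_version TEXT, latency_ms REAL, label TEXT)"
)
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?, ?, ?)",
    [
        (1, "198.51.100.7", "HTTP/1.1", 40.0, "human"),
        (2, "203.0.113.9", "HTTP/2", 310.0, "bot"),
        (3, "192.0.2.44", "HTTP/2", 55.0, "human"),
        (4, "203.0.113.10", "HTTP/1.1", 290.0, "bot"),
        (5, "192.0.2.80", "HTTP/2", 275.0, "bot"),
    ],
)

# Each dataset definition is a named query over stored features and labels.
DATASETS = {
    "train": "SELECT latency_ms, http_version, label FROM requests "
             "WHERE ts BETWEEN 1 AND 2 ORDER BY ts",
    "validation_http2": "SELECT latency_ms, http_version, label FROM requests "
                        "WHERE ts BETWEEN 3 AND 5 AND http_version = 'HTTP/2' ORDER BY ts",
}

def materialize(name: str) -> list:
    """Run a dataset's query and return its rows."""
    return conn.execute(DATASETS[name]).fetchall()

print(materialize("validation_http2"))
```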

Figure 2: Airflow DAG for model training and validation

After fetching the datasets, we train our CatBoost model and tune its hyperparameters. During evaluation, we compare the performance of the newly trained model against the current default version running for our customers. To capture the intricate patterns in subsets of our data, we split certain validation datasets into smaller slivers called specializations. For instance, we use the detections made by our heuristics engine and managed rulesets as ground truth for bot traffic. To ensure that larger sources of traffic (large ASNs, different HTTP versions, etc.) do not mask our visibility into patterns for the rest of the traffic, we define specializations for these sources of traffic. As a result, improvements in accuracy of the new model can be evaluated for common patterns (e.g., HTTP/1.1 and HTTP/2) as well as less common ones. Our model training DAG provides a breakdown report of the accuracy, score distribution, feature importance, and SHAP explainers for each validation dataset and its specializations.
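A sketch of the per-slice ("specialization") breakdown, assuming each validation record carries a slice key, a ground-truth label, and the model's prediction. The records are made up purely to show how a large slice can hide a weak smaller one in the overall number.

```python
from collections import defaultdict

# (slice key, true label, predicted label): hypothetical validation records
records = (
    [("HTTP/2", "bot", "bot")] * 90
    + [("HTTP/2", "human", "human")] * 5
    + [("HTTP/1.1", "bot", "human")] * 3
    + [("HTTP/1.1", "bot", "bot")] * 2
)

def accuracy(rows) -> float:
    return sum(true == pred for _, true, pred in rows) / len(rows)

by_slice = defaultdict(list)
for row in records:
    by_slice[row[0]].append(row)

print(f"overall: {accuracy(records):.2f}")
for key, rows in sorted(by_slice.items()):
    print(f"{key}: {accuracy(rows):.2f} (n={len(rows)})")
```

Here the overall accuracy is 0.97, while the small HTTP/1.1 slice scores only 0.40; without the per-slice report, that gap would be invisible.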

Once we are happy with the validation results and model accuracy, we evaluate our model against a checklist of steps to ensure the correctness and validity of our model. We start by ensuring that our results and observations are reproducible over multiple non-overlapping training and validation time ranges. Moreover, we check for the following factors:

Check for the distribution of feature values to identify irregularities such as missing or skewed values.
Check for overlaps between training and validation datasets and feature values.
Verify the diversity of training data and the balance between labels and datasets.
Evaluate performance changes in the accuracy of the model on validation datasets based on their order of importance.
Check for model overfitting by evaluating the feature importance and SHAP explainers.
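Two of the checks above, non-overlapping training and validation time ranges and no shared records between the datasets, are easy to automate. This sketch assumes each example carries a request identifier and a timestamp and that validation data should come strictly after training data; `Example` and `check_no_leakage` are hypothetical names.

```python
from typing import List, NamedTuple

class Example(NamedTuple):
    request_id: str
    ts: int  # Unix seconds

def check_no_leakage(train: List[Example], validation: List[Example]) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if max(e.ts for e in train) >= min(e.ts for e in validation):
        problems.append("time ranges overlap: validation must start after training ends")
    shared = {e.request_id for e in train} & {e.request_id for e in validation}
    if shared:
        problems.append(f"{len(shared)} request(s) appear in both datasets")
    return problems

train = [Example("r1", 100), Example("r2", 200)]
print(check_no_leakage(train, [Example("r3", 300)]))  # []
print(check_no_leakage(train, [Example("r2", 150)]))  # both problems reported
```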

After the model passes the readiness checks, we deploy it in shadow mode. We can observe the behavior of the model on live traffic in log-only mode (i.e., without affecting the bot score). After gaining confidence in the model’s performance on live traffic, we start onboarding beta customers, and gradually switch the model to active mode all while closely monitoring the real-world performance of our new model.
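Shadow mode can be pictured as scoring each request with both models while returning only the active model's score. In this sketch the two models are placeholder functions, and the 1 to 99 score scale (lower meaning more bot-like) is an assumption made for illustration; the key property is that the shadow model's output, or even its failure, never changes the result.

```python
import logging
from typing import Callable, Dict

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shadow")

Model = Callable[[Dict], int]

def score_request(request: Dict, active: Model, shadow: Model) -> int:
    """Return the active model's score; the shadow model's score is only logged."""
    active_score = active(request)
    try:
        shadow_score = shadow(request)
        log.info("path=%s active=%d shadow=%d", request.get("path"), active_score, shadow_score)
    except Exception:
        log.exception("shadow model failed")  # a shadow failure must never change the response
    return active_score

def current_model(request: Dict) -> int:
    return 80  # placeholder for the model customers currently get

def candidate_model(request: Dict) -> int:
    return 5 if request.get("latency_ms", 0) > 250 else 90  # placeholder for the new model

print(score_request({"path": "/login", "latency_ms": 300}, current_model, candidate_model))  # 80
```

Comparing the logged active and shadow scores over live traffic is what builds confidence before the candidate is switched to active mode.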

ML features for bot detection

Each of our models uses a set of features to make inferences about the incoming requests. We compute our features based on single request properties (single request features) and patterns from multiple requests (i.e., inter-request features). We can categorize these features into the following groups:

Global features: inter-request features that are computed based on global aggregates for different types of fingerprints and traffic sources (e.g., for an ASN) seen across our global network. Given the relatively lower cardinality of these features, we can scalably calculate global aggregates for each of them.
High cardinality features: inter-request features focused on fine-grained aggregate data from local traffic patterns and behaviors (e.g., for an individual IP address).
Single request features: features derived from each individual request (e.g., user agent).
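A rough sketch of the three feature families over a small batch of hypothetical requests: a single-request feature read from one request, a high-cardinality aggregate keyed by individual IP, and a lower-cardinality aggregate keyed by ASN. The field names and the specific features are illustrative, not the actual feature set computed by BLISS.

```python
from collections import Counter

batch = [
    {"ip": "203.0.113.9", "asn": 64500, "user_agent": "Mozilla/5.0"},
    {"ip": "203.0.113.9", "asn": 64500, "user_agent": "Mozilla/5.0"},
    {"ip": "198.51.100.7", "asn": 64501, "user_agent": "curl/8.5.0"},
]

requests_per_ip = Counter(r["ip"] for r in batch)     # high-cardinality, fine-grained aggregate
requests_per_asn = Counter(r["asn"] for r in batch)   # lower-cardinality, global-style aggregate

def features(req: dict) -> dict:
    return {
        "ua_is_curl": req["user_agent"].startswith("curl/"),  # single-request feature
        "ip_request_count": requests_per_ip[req["ip"]],        # inter-request, per IP
        "asn_request_count": requests_per_asn[req["asn"]],     # inter-request, per ASN
    }

print(features(batch[2]))
```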

Our Bot Management system (named BLISS) is responsible for fetching and computing these feature values and making them available on our servers for inference by active versions of our ML models.

Detecting residential proxies using network and behavioral signals

Attacks originating from residential IP addresses are commonly characterized by a spike in the overall traffic towards sensitive endpoints on the target websites from a large number of residential ASNs. Our approach for detecting residential IP proxies is twofold. First, we compare direct vs proxied requests and look for network-level discrepancies. Revisiting Figure 1, we notice that a request routed through residential proxies (red dotted line) has to traverse multiple hops before reaching the target, which affects the network latency of the request.

Based on this observation alone, we are able to characterize residential proxy traffic with a high true positive rate (i.e., all residential proxy requests have high network latency). While we were able to replicate this in our lab environment, we quickly realized that at the scale of the Internet, we run into numerous exceptions with false positive detections (i.e., non-residential proxy traffic with high latency). For instance, countries and regions that predominantly use satellite Internet would exhibit a high network latency for the majority of their requests due to the use of performance enhancing proxies.
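The trade-off described above can be reproduced with a toy latency threshold: it flags every proxied request but also flags satellite users whose traffic is slow for unrelated reasons. The latency values, the 250 ms threshold, and the labels are invented for illustration.

```python
# (round-trip latency in ms, true label): hypothetical observations
observations = [
    (320, "proxy"), (410, "proxy"), (290, "proxy"),
    (35, "direct"), (60, "direct"),
    (600, "direct-satellite"), (650, "direct-satellite"),
]
THRESHOLD_MS = 250

flagged = [(ms, label) for ms, label in observations if ms > THRESHOLD_MS]
proxies = [o for o in observations if o[1] == "proxy"]
true_pos = sum(1 for _, label in flagged if label == "proxy")
false_pos = len(flagged) - true_pos

print(f"true positive rate: {true_pos / len(proxies):.2f}")  # 1.00
print(f"false positives: {false_pos}")                        # 2
```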

Realizing that relying solely on the network characteristics of connections is inadequate for detecting residential proxies, given the diversity of connections on the Internet, we switched our focus to the behavior of residential IPs. To that end, we observe that IP addresses from residential proxies exhibit distinct behavior during periods of peak activity. While this observation singles out IPs that are highly active during their peak periods, the sheer size of residential IP pools means that it is not uncommon to observe only a small number of requests from most residential proxy IPs.

These periods of inactivity can be attributed to the temporary nature of residential proxy exit nodes. For instance, when the client software (i.e., a browser or mobile application) that runs an exit node is closed, the node leaves the residential proxy network. One way to filter out periods of inactivity is to increase the monitoring time and penalize each IP address that exhibits residential proxy behavior for a period of time. This block-listing approach, however, has certain limitations. Most importantly, by relying only on IP-based behavioral signals, we would block traffic from legitimate users who may unknowingly run mobile applications or browser extensions that turn their devices into proxies. This is even more harmful on mobile networks, where many users share IP addresses behind carrier-grade NAT (CGNAT). Figure 3 demonstrates this by comparing the share of direct vs proxied requests that we received from active residential proxy IPs over a 24-hour period. Overall, we see that 4 out of 5 requests from these networks belong to direct and benign connections from residential devices.

Figure 3: Percentage of direct vs proxied requests from residential proxy IPs.

Using this insight, we combined behavioral and latency-based features along with new datasets to train a new machine learning model that detects residential proxy traffic on a per-request basis. This scheme allows us to block residential proxy traffic while allowing benign residential users to visit Cloudflare-protected websites from the same residential network.
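This sketch contrasts the two strategies on a hypothetical mix of requests from one shared residential IP where, as in Figure 3, most requests are direct. Blocking the IP drops every request; a per-request classifier drops only the proxied one. The classifier here is a hand-written stand-in rule, not the v8 model, and `burst_rate` is a hypothetical behavioral signal.

```python
from dataclasses import dataclass

@dataclass
class Request:
    ip: str
    proxied: bool       # ground truth, unknown to a real system
    latency_ms: float
    burst_rate: float   # hypothetical behavioral signal (requests per minute from this client)

SHARED_IP = "203.0.113.50"
traffic = [Request(SHARED_IP, False, 40, 2) for _ in range(4)] + [Request(SHARED_IP, True, 310, 120)]

def ip_blocklist(req: Request) -> bool:
    return req.ip == SHARED_IP  # blocks everyone behind the IP

def per_request_classifier(req: Request) -> bool:
    # Stand-in for a learned model combining latency and behavioral features.
    return req.latency_ms > 250 and req.burst_rate > 60

for name, rule in (("IP block-list", ip_blocklist), ("per-request", per_request_classifier)):
    blocked = [r for r in traffic if rule(r)]
    benign_blocked = sum(not r.proxied for r in blocked)
    print(f"{name}: blocked {len(blocked)}, of which benign {benign_blocked}")
```

With this toy data, the block-list blocks all 5 requests (4 of them benign), while the per-request rule blocks only the single proxied request.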

Detection results and case studies

We started testing v8 in shadow mode in March 2024. Every hour, v8 is classifying more than 17 million unique IPs that participate in residential proxy attacks. Figure 4 shows the geographic distribution of IPs with residential proxy activity belonging to more than 45 thousand ASNs in 237 countries/regions. Among the most commonly requested endpoints from residential proxies, we observe patterns of account takeover attempts, such as requests to /login, /auth/login, and /api/login.

Figure 4: Countries and regions with residential network activity. Marker size is proportional to the number of IPs with residential proxy activity.

Furthermore, we see significant improvements when evaluating our new machine learning model on previously missed attacks reported by our customers. In one case, v8 was able to correctly classify 95% of requests from distributed residential proxy attacks targeting the voucher redemption endpoint of the customer’s website. In another case, our new model detected a previously missed content scraping attack, as evidenced by the increased detections during the traffic spikes depicted in Figure 5. We are continuing to monitor the behavior of residential proxy attacks in the wild and work with our customers to ensure that we can provide robust detection against these distributed attacks.

Figure 5: Spikes in bot requests from residential proxies detected by ML v8

Improving detection for bots from cloud providers

In addition to residential IP proxies, bot operators commonly use cloud providers to host and run bot scripts that attack our customers. To combat these attacks, we improved our ground truth labels for cloud provider attacks in our latest ML training datasets. Early results show that v8 detects 20% more bots from cloud providers, with up to 70% more bots detected on zones that are marked as under attack. We further plan to expand the list of cloud providers that v8 detects as part of our ongoing updates.

Check out ML v8

For existing Bot Management customers we recommend toggling “Auto-update machine learning model” to instantly gain the benefits of ML v8 and its residential proxy detection, and to stay up to date with our future ML model updates. If you’re not a Cloudflare Bot Management customer, contact our sales team to try out Bot Management.