Tag Archives: DNS filtering

Introducing Cloudflare One Intel

Post Syndicated from Malavika Balachandran Tadeusz original https://blog.cloudflare.com/cloudflare-one-intel/

Earlier this week, we announced Cloudflare One, a single platform for networking and security management. Cloudflare One extends the speed, reliability, and security we’ve brought to Internet properties and applications over the last decade to make the Internet the new enterprise WAN.

Underpinning Cloudflare One is Cloudflare’s global network – today, our network spans more than 200 cities worldwide and is within milliseconds of nearly everyone connected to the Internet. Our network handles, on average, 18 million HTTP requests and 6 million DNS requests per second. With 1 billion unique IP addresses connecting to the Cloudflare network each day, we have one of the broadest views on Internet activity worldwide.

We see a large diversity of Internet traffic across our entire product suite. Every day, we block 72 billion cyberthreats. This visibility provides us with a unique position to understand and mitigate Internet threats, and enables us to see new threats and malware before anyone else.

At the beginning of this month, as part of our 10th Birthday Week, we launched Cloudflare Radar, which shares high-level trends with the general public based on our network’s aggregate data. The same data that powers that view of the Internet also gives us the ability to create new insights to keep your team safer.

Today, we are excited to announce the next phase of network and threat intelligence at Cloudflare: the launch of Cloudflare One Intel. Cloudflare One Intel streamlines network and security operations by converting the data we can gather on our network into actionable insights.

The challenge with traditional security operations

Most enterprises use a large array of point solutions to ensure that the corporate network remains fast, available and secure. Security teams typically aggregate logs from these point solutions into their SIEM and create custom alerts for incident detection.

Once an incident has been detected, security teams will quickly respond with remediating actions to prevent data loss, such as removing a compromised device’s access controls or adding a malicious hostname or URL to a block list.

Along with incident remediation, security teams will conduct an investigation of the incident to uncover more details about the attacker. Pivoting across historical DNS records, SSL certificate fingerprints, malware samples, and other indicators of compromise, security researchers will try to uncover more details about an attacker. Linked indicators then get fed back onto block lists in point solutions to prevent subsequent attacks.

However, there are several challenges with traditional incident detection and response. Security operations teams are often overwhelmed by the plethora of logs and alerts. With threat intelligence, SIEMs, and control planes all in different platforms, incident detection, remediation and forensics can be slow, arduous, and expensive.

Improving Incident Response with Cloudflare One

We want to make network and security operations as streamlined as possible. Cloudflare One Intel helps network and security teams detect and respond to incidents more efficiently. That means bringing together insights from your network activity, global Internet intelligence, and automated remediation in a single platform.

As part of our mission to help security teams detect and block emerging security threats more efficiently, we are releasing two features within Cloudflare Gateway: DNS tunneling detection and domain insights.

What is DNS Tunneling?

DNS tunneling is the misuse of the Domain Name System (DNS) protocol to encode another protocol’s data into a series of DNS queries and response messages. DNS tunneling is often used to circumvent a corporate firewall. For example, DNS tunneling might be used to visit a website that is blocked on the corporate firewall, distribute malware from a command & control server, or exfiltrate sensitive data.

DNS tunneling isn’t only used for malicious activities. One of the most common uses of DNS tunneling is by antivirus software, which will often use DNS tunneling to look up file signatures.

Blocking DNS tunneling using Cloudflare Gateway

Starting today, customers using Cloudflare Gateway can block hostnames associated with DNS tunneling using the “DNS Tunneling” filter in Gateway’s DNS filtering policies. This feature is available to all Gateway users at no additional cost.

You can begin using the filter by navigating to the Policies section of the Gateway product and selecting the “Security Threats” tab. Once you check the “DNS Tunneling” box, Gateway will automatically block any requests made by your organization’s users to domains on this list. Should you want to manually override any specific domains, you can use the “Domain Override” feature to remove the block policy on a specific domain.
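
If you want to confirm the policy from a client device, one option is to resolve a test name through your organization's Gateway DoH endpoint. The command below is a sketch only: the endpoint subdomain is account-specific (the one shown is made up), dig needs DoH support (BIND 9.18 or later), and the exact answer Gateway returns for a blocked hostname depends on your block-page settings.

# Hypothetical Gateway DoH endpoint -- substitute the one listed for your DNS
# location in the Gateway dashboard. A hostname on the "DNS Tunneling" list
# should not resolve to its real address once the filter is enabled.
dig +https @your-location.cloudflare-gateway.com suspicious-tunnel.example.net A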

We previously included known malicious DNS tunnels in our “Anonymizer” category within Gateway’s security threat categories. We are now pulling that into its own category so that customers can have more granular visibility into threats on their network. Further, we are expanding the filter beyond known malicious DNS tunnels to include newly emerging threats, so that customers can block these threats as soon as we see them on our network.

How we use machine learning to detect DNS tunneling

Using machine learning, Cloudflare detects anomalous DNS request patterns and flags these requests as suspected DNS tunneling. Our model analyzes requests and detects anomalous behavior every five minutes.

Once a set of requests is flagged, we add the associated hostname to our “DNS Tunneling” category. We do not add hostnames of commonly allowed DNS tunnels to this list, such as those used by antivirus software.

Our model blocks hostnames associated with DNS tunneling seen not only on your network, but across the entire Cloudflare network. Processing over 500 billion DNS queries each day, we have unique insight into global DNS traffic patterns.
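
Cloudflare hasn't published the internals of this model, but to give a rough sense of the kind of signal such a system can key on, here is a deliberately simplified, hypothetical heuristic in shell: it scans a resolver query log and flags registered domains that receive many distinct, unusually long left-most labels. The log file name, the thresholds, and the whole approach are assumptions for illustration only, not Cloudflare's method.

# Hypothetical illustration only -- not Cloudflare's model. Assumes a file
# "queries.log" with one fully-qualified query name per line (no trailing dot).
# Flag domains with more than 100 distinct left-most labels averaging over 30 characters.
awk -F. '
NF >= 3 {
  domain = $(NF-1) "." $NF          # crude "registered domain" = last two labels
  label  = $1                       # left-most label often carries the tunneled payload
  key    = domain SUBSEP label
  if (!(key in seen)) { seen[key] = 1; count[domain]++; len[domain] += length(label) }
}
END {
  for (d in count)
    if (count[d] > 100 && len[d] / count[d] > 30)
      print d, count[d], len[d] / count[d]
}' queries.log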

Adding transparency to security

Cloudflare’s unique insight into global Internet traffic is what powers the intelligence behind Cloudflare One. DNS tunneling detection is one example of how we use aggregated data from our network to improve Internet security for everyone. But, until now, that has been opaque to users.

Security teams investigating the threats that impact their organization need more transparency. Cloudflare One Intel consolidates the information we have about the potentially harmful sites and properties that can target your organization.

Starting today, with a single click, administrators reviewing logs in Cloudflare Gateway can get a comprehensive breakdown of any site being allowed or blocked.

In this expanded view, you can now click the “View Domain Insights” button, which will take you to the Cloudflare Radar Domain Insights page for the requested hostname. This feature is available to all Gateway users at no additional cost.

What’s Next

These new features are just the beginning of Cloudflare One Intel. Over the coming weeks and months, we’ll be rolling out more features across the Cloudflare One platform that will make our Internet intelligence more accessible and actionable. Stay tuned for premium features available in Cloudflare Radar for Cloudflare Gateway customers.

Get started now

Cloudflare Radar is available to everyone for free – you can check it out here and start exploring our Internet intelligence.

To protect your team from threats on the Internet that use DNS tunneling, sign up for a Cloudflare Gateway account and use the Security filter setting to block DNS tunneling attempts. DNS-based security and content filtering is available for free on every Gateway plan.

Resolve internal hostnames with Cloudflare for Teams

Post Syndicated from Sam Rhea original https://blog.cloudflare.com/redirect-users-to-new-destinations-with-cloudflare-for-teams/

Phishing attacks begin like any other visit to a site on the Internet. A user opens a suspicious link from an email, and their DNS resolver looks up the hostname, then connects the user to the origin.

Cloudflare Gateway’s secure DNS blocks threats like this by checking every hostname query against a constantly-evolving list of known threats on the Internet. Instead of sending the user to the malicious host, Gateway stops the site from resolving. The user sees a “blocked domain” page instead of the malicious site itself.

As teams migrate to SaaS applications and zero-trust solutions, they rely more on the public Internet to do their jobs. Gateway’s security works like a bouncer, keeping users safe as they navigate the Internet. However, some organizations still need to send traffic to internal destinations for testing or as a way to make the migration more seamless.

Starting today, you can use Cloudflare Gateway to direct end user traffic to a different IP than the one they originally requested. Administrators can build rules to override the address that would be returned by a resolver and send traffic to a specified alternative.

Like the security features of Cloudflare Gateway, the redirect function is available in every one of Cloudflare’s data centers in 200 cities around the world, so you can block bad traffic and steer internal traffic without compromising performance.

What is Cloudflare Gateway?

Cloudflare Gateway is one-half of Cloudflare for Teams, Cloudflare’s platform for securing users, devices, and data. With Cloudflare for Teams, our global network becomes your team’s network, replacing on-premise appliances and security subscriptions with a single solution delivered closer to your users – wherever they work.

As part of that platform, Cloudflare Gateway blocks threats on the public Internet from becoming incidents inside of your organization. Gateway’s first release added DNS security filtering and content blocking to the world’s fastest DNS resolver, Cloudflare’s 1.1.1.1.

Deployment takes less than 5 minutes. Teams can secure entire office networks and segment traffic reports by location. For distributed organizations, Gateway can be deployed via MDM on networks that support IPv6, or by using a dedicated IPv4 address as part of a Cloudflare enterprise account.

With secure DNS filtering, administrators can click a single button to block known threats, like sources of malware or phishing sites. Policies can be extended to block specific categories, like gambling sites or social media. When users request a filtered site, Gateway stops the DNS query from resolving and prevents the device from connecting to a malicious destination or hostname with blocked material.

Traffic bound for internal destinations

As users connect to SaaS applications, Cloudflare Gateway can keep those teams secure from threats on the public Internet.

In parallel, teams can move applications that previously lived on a private network to a zero-trust model with Cloudflare Access. Rather than trusting anyone on a private network, Access checks for identity any time someone attempts to reach the application.

Together, Cloudflare for Teams keeps users safe and makes internal applications just as easy to use as SaaS tools. Making it easier to migrate to that model also reduces user friction. Domain overrides can smooth that transition from internal networks to a fully cloud-delivered model.

With Gateway’s domain override feature, administrators can choose certain hostnames that still run on the private network and send traffic to the local IPs with the same resolver that secures Internet-bound traffic. End users can continue to connect to those resources without disruption. Once ready, those tools can be secured with Cloudflare Access to remove the reliance on a private network altogether.

Cloudflare Gateway can help reduce user confusion and IT overhead with split-horizon setups where some traffic routes to the Internet and other requests need to stay on the same network. Administrators can build policies to route traffic bound for hostnames, even ones that exist publicly, to internal IP addresses that a user can reach if they are on the same local network.
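
To make the behavior concrete, the sketch below compares public resolution of a hostname with the answer returned through Gateway after an override rule maps that hostname to a private address. The hostname, the addresses, and the DoH subdomain are all placeholders, and dig needs DoH support (BIND 9.18 or later).

# Public answer, as any resolver on the Internet would return it (example value)
dig +short wiki.internal-tools.example.com A
# 203.0.113.10

# Answer through your Gateway DoH endpoint after an override rule points the
# hostname at an internal address reachable only on the office network
dig +https +short @your-location.cloudflare-gateway.com wiki.internal-tools.example.com A
# 10.0.5.20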

How does it work?

When administrators configure an override policy, Cloudflare Gateway pushes that information to the edge of our network. The rule becomes part of the Gateway enforcement flow for that organization’s account. Explicit override policies are enforced first, before allow or block rules.

When a user makes a request to the original destination, that request arrives at a Gateway IP address where Cloudflare’s network checks the source IP to determine which policies to enforce. Gateway determines that the request has an override rule and returns the preconfigured IP address.

Gateway’s DNS override feature is supported in deployments that use Cloudflare’s IPv4 or IPv6 addresses, as well as DNS over HTTPS.

What’s next?

The domain override feature is available to all Cloudflare for Teams customers today at no additional cost. You can begin building override rules by navigating to the Policies section of the Gateway product and selecting the “Custom” tab. Administrators can configure up to 1,000 custom rules.

To help organizations in their transition to remote work, Cloudflare has made our Teams platform free for any organization through September 1. You can set up an account at dash.teams.cloudflare.com now.

Need help getting started? You can request a dedicated onboarding session at no charge.

How to add DNS filtering to your NAT instance with Squid

Post Syndicated from Nicolas Malaval original https://aws.amazon.com/blogs/security/how-to-add-dns-filtering-to-your-nat-instance-with-squid/

September 23, 2020: The squid configuration file in this blog post and associated YAML template have been updated.

September 4, 2019: We’ve updated this blog post, initially published on January 26, 2016. Major changes include: support of Amazon Linux 2, no longer having to compile Squid 3.5, and a high availability version of the solution across two availability zones.

Amazon Virtual Private Cloud (Amazon VPC) enables you to launch AWS resources on a virtual private network that you’ve defined. In an Amazon VPC, many people use network address translation (NAT) instances and NAT gateways to enable instances in a private subnet to initiate outbound traffic to the Internet, while preventing the instances from receiving inbound traffic initiated by someone on the Internet.

For security and compliance purposes, you might have to filter the requests initiated by these instances (also known as “egress filtering”). Using iptables rules, you could restrict outbound traffic with your NAT instance based on a predefined destination port or IP address. However, you might need to enforce more complex security policies, such as allowing requests to AWS endpoints only, or blocking fraudulent websites, which you can’t easily achieve by using iptables rules.
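
For comparison, this is roughly what port-based egress filtering looks like with iptables on a NAT instance; it can constrain destination ports or addresses, but it has no notion of domain names. The private subnet CIDR below is an assumption for the example.

# On a hypothetical NAT instance: only forward traffic from the private subnet
# (10.0.1.0/24) to ports 80 and 443, and drop everything else it tries to send out.
sudo iptables -A FORWARD -s 10.0.1.0/24 -p tcp -m multiport --dports 80,443 -j ACCEPT
sudo iptables -A FORWARD -s 10.0.1.0/24 -j DROP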

In this post, I discuss and give an example of how to use Squid, a leading open-source proxy, to implement a “transparent proxy” that can restrict both HTTP and HTTPS outbound traffic to a given set of Internet domains, while being fully transparent for instances in the private subnet.

The solution architecture

In this section, I present the architecture of the high availability NAT solution and explain how to configure Squid to filter traffic transparently. Later in this post, I’ll provide instructions about how to implement and test the solution.

The following diagram illustrates how the components in this process interact with each other. Squid Instance 1 intercepts HTTP/S requests sent by instances in Private Subnet 1, including the Testing Instance. Squid Instance 1 then initiates a connection with the destination host on behalf of the Testing Instance, which goes through the Internet gateway. This solution spans two Availability Zones, with Squid Instance 2 intercepting requests sent from the other Availability Zone. Note that you may adapt the solution to span additional Availability Zones.

Figure 1: The solution spans two Availability Zones

Intercepting and filtering traffic

In each availability zone, the route table associated to the private subnet sends the outbound traffic to the Squid instance (see Route Tables for a NAT Device). Squid intercepts the requested domain, then applies the following filtering policy:

  • For HTTP requests, Squid retrieves the host header field included in all HTTP/1.1 request messages. This specifies the Internet host being requested.
  • For HTTPS requests, the HTTP traffic is encapsulated in a TLS connection between the instance in the private subnet and the remote host. Squid cannot retrieve the host header field because the header is encrypted. A feature called SslBump would allow Squid to decrypt the traffic, but this would not be transparent for the client because the certificate would be considered invalid in most cases. The feature I use instead, called SslPeekAndSplice, retrieves the Server Name Indication (SNI) from the TLS initiation. The SNI contains the requested Internet host. As a result, Squid can make filtering decisions without decrypting the HTTPS traffic.

Note 1: Some older client-side software stacks do not support SNI. Here are the minimum versions of some important stacks and programming languages that support SNI: Python 2.7.9 and 3.2, Java 7 JSSE, wget 1.14, OpenSSL 0.9.8j, cURL 7.18.1

Note 2: TLS 1.3 introduced an optional extension that allows the client to encrypt the SNI, which may prevent Squid from intercepting the requested domain.
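
The CloudFormation template provided later in this post configures this routing for you, but if you were wiring it up manually, it amounts to two steps per Availability Zone: disable the source/destination check on the Squid instance (as for any NAT instance) and point the private subnet’s default route at it. The resource IDs below are placeholders.

# The Squid instance forwards traffic it did not originate, so the EC2
# source/destination check must be disabled
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --no-source-dest-check

# Default route of the private subnet's route table sends Internet-bound
# traffic to the Squid instance
aws ec2 create-route --route-table-id rtb-0123456789abcdef0 \
    --destination-cidr-block 0.0.0.0/0 --instance-id i-0123456789abcdef0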

The SslPeekAndSplice feature was introduced in Squid 3.5 and is implemented in the same Squid module as SslBump. To enable this module, Squid requires that you provide a certificate, though it will not be used to decode HTTPS traffic. The solution creates a certificate using OpenSSL.


# Create a working directory for the dummy certificate
mkdir /etc/squid/ssl
cd /etc/squid/ssl
# Generate a private key and a self-signed certificate valid for 10 years
openssl genrsa -out squid.key 4096
openssl req -new -key squid.key -out squid.csr -subj "/C=XX/ST=XX/L=squid/O=squid/CN=squid"
openssl x509 -req -days 3650 -in squid.csr -signkey squid.key -out squid.crt
# Bundle the key and certificate into the file referenced by squid.conf
cat squid.key squid.crt >> squid.pem

The following code shows the Squid configuration file. For HTTPS traffic, note the ssl_bump directives instructing Squid to “peek” (retrieve the SNI) and then “splice” (become a TCP tunnel without decoding) or “terminate” the connection depending on the requested host.


visible_hostname squid 
cache deny all 

# Log format and rotation 
logformat squid %ts.%03tu %6tr %>a %Ss/%03>Hs %<st %rm %ru %ssl::>sni %Sh/%<a %mt 
logfile_rotate 10 
debug_options rotate=10 

# Handle HTTP requests 
http_port 3128 
http_port 3129 intercept 

# Handle HTTPS requests 
https_port 3130 cert=/etc/squid/ssl/squid.pem ssl-bump intercept 
acl SSL_port port 443 
http_access allow SSL_port 
acl step1 at_step SslBump1 
acl step2 at_step SslBump2 
acl step3 at_step SslBump3 
ssl_bump peek step1 all 

# Deny requests to proxy instance metadata 
acl instance_metadata dst 169.254.169.254 
http_access deny instance_metadata 

# Filter HTTP requests based on the whitelist 
acl allowed_http_sites dstdomain "/etc/squid/whitelist.txt" 
http_access allow allowed_http_sites 

# Filter HTTPS requests based on the whitelist 
acl allowed_https_sites ssl::server_name "/etc/squid/whitelist.txt" 
ssl_bump peek step2 allowed_https_sites 
ssl_bump splice step3 allowed_https_sites 
ssl_bump terminate step2 all 

http_access deny all       

The text file located at /etc/squid/whitelist.txt contains the list of whitelisted domains, with one domain per line. In this blog post, I’ll show you how to configure Squid to allow requests to *.amazonaws.com, which corresponds to AWS endpoints. Note that you can restrict access to a specific set of AWS services that you’ve defined (see Regions and Endpoints for a detailed list of endpoints), or you can set your own list of domains.
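
For example, a whitelist.txt that allows only AWS endpoints can contain a single line; the leading dot tells Squid’s domain-based ACLs (dstdomain and ssl::server_name) to match the domain and all of its subdomains.

.amazonaws.com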

Note: An alternate approach is to use VPC endpoints to privately connect your VPC to supported AWS services without requiring access over the Internet (see VPC Endpoints). Some supported AWS services allow you to create a policy that controls the use of the endpoint to access AWS resources (see VPC Endpoint Policies, and VPC Endpoints for a list of supported services).

You may have noticed that Squid listens on port 3129 for HTTP traffic and 3130 for HTTPS. Because Squid cannot directly listen on ports 80 and 443, you have to redirect the incoming requests from instances in the private subnets to the Squid ports using iptables. You do not have to enable IP forwarding or add any FORWARD rule, as you would do with a standard NAT instance.


sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 3129
sudo iptables -t nat -A PREROUTING -p tcp --dport 443 -j REDIRECT --to-port 3130       

The solution stores the files squid.conf and whitelist.txt in an Amazon Simple Storage Service (Amazon S3) bucket and runs the following script every minute on the Squid instances to download and update the Squid configuration from S3. This makes it easy to maintain the Squid configuration from a central location. Note that it first validates the files with squid -k parse and then reloads the configuration with squid -k reconfigure if no error was found.


        # Keep a copy of the current configuration, then pull the latest files from S3
        cp /etc/squid/* /etc/squid/old/
        aws s3 sync s3://<s3-bucket> /etc/squid
        # Validate the new configuration and reload Squid; on any error, restore the backup
        squid -k parse && squid -k reconfigure || (cp /etc/squid/old/* /etc/squid/; exit 1)
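
Assuming the script above is saved on the instance as /usr/local/bin/update-squid-config.sh (a hypothetical path), the once-a-minute schedule is simply a cron entry:

# /etc/cron.d/squid-config (hypothetical): refresh the Squid configuration every minute
* * * * * root /usr/local/bin/update-squid-config.sh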

The solution then uses the CloudWatch Agent on the Squid instances to collect and store Squid logs in Amazon CloudWatch Logs. The log group /filtering-nat-instance/cache.log contains the error and debug messages that Squid generates and /filtering-nat-instance/access.log contains the access logs.

An access log record is a space-delimited string that has the following format:

<time> <response_time> <client_ip> <status_code> <size> <method> <url> <sni> <remote_host> <mime>

The following list describes the fields of an access log record:

  • time: Request time, in seconds since the epoch
  • response_time: Response time, in milliseconds
  • client_ip: Client source IP address
  • status_code: Squid request status and HTTP response code sent to the client. For example, an HTTP request to a disallowed domain logs TCP_DENIED/403, and an HTTPS request to a whitelisted domain logs TCP_TUNNEL/200
  • size: Total size of the response sent to the client
  • method: Request method, like GET or POST
  • url: Request URL received from the client. Logged for HTTP requests only
  • sni: Domain name intercepted in the SNI. Logged for HTTPS requests only
  • remote_host: Squid hierarchy status and remote host IP address
  • mime: MIME content type. Logged for HTTP requests only

The following are some examples of access log records:


1563718817.184 14 10.0.0.28 TCP_DENIED/403 3822 GET http://example.com/ - HIER_NONE/- text/html
1563718821.573 7 10.0.0.28 TAG_NONE/200 0 CONNECT 172.217.7.227:443 example.com HIER_NONE/- -
1563718872.923 32 10.0.0.28 TCP_TUNNEL/200 22927 CONNECT 52.216.187.19:443 calculator.s3.amazonaws.com ORIGINAL_DST/52.216.187.19 -
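
Because the access log is shipped to CloudWatch Logs, you can also pull blocked requests out with the AWS CLI instead of the console. The command below is a sketch; the log group name matches the one described above, and the pattern is a plain-text match on the Squid status.

# List recent requests that Squid denied, such as hosts that are not whitelisted
aws logs filter-log-events \
    --log-group-name /filtering-nat-instance/access.log \
    --filter-pattern '"TCP_DENIED"' \
    --query 'events[].message' --output text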

Designing a high availability solution

The Squid instances introduce a single point of failure for the private subnets. If a Squid instance fails, the instances in its associated private subnet cannot send outbound traffic anymore. The following diagram illustrates the architecture that I propose to address this situation within an Availability Zone.

Figure 2: The architecture for handling a Squid instance failure within an Availability Zone

Each Squid instance is launched in an Amazon EC2 Auto Scaling group that has a minimum size and a maximum size of one instance. A shell script is run at startup to configure the instances. That includes installing and configuring Squid (see Running Commands on Your Linux Instance at Launch).

The solution uses the CloudWatch Agent and its procstat plugin to collect the CPU usage of the Squid process every 10 seconds. For each Squid instance, the solution creates a CloudWatch alarm that watches this custom metric and goes to an ALARM state when a data point is missing. This can happen, for example, when Squid crashes or the Squid instance fails. Note that for my use case, I consider watching the Squid process a sufficient approach to determining the health status of a Squid instance, although it cannot detect eventual cases of the Squid process being alive but unable to forward traffic. As a workaround, you can use an end-to-end monitoring approach, like using witness instances in the private subnets to send test requests at regular intervals and collect the custom metric.
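
For reference, the procstat portion of the CloudWatch agent configuration looks roughly like the snippet below, written here as a shell heredoc. The file path, process pattern, and metric name are approximations; check the CloudWatch agent documentation for the exact keys supported by your agent version.

# Approximate agent configuration: report the Squid process's CPU usage
# every 10 seconds as a custom metric
sudo tee /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json > /dev/null <<'EOF'
{
  "metrics": {
    "metrics_collected": {
      "procstat": [
        {
          "pattern": "squid",
          "measurement": ["cpu_usage"],
          "metrics_collection_interval": 10
        }
      ]
    }
  }
}
EOF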

When an alarm goes to ALARM state, CloudWatch sends a notification to an Amazon Simple Notification Service (SNS) topic which then triggers an AWS Lambda function. The Lambda function marks the Squid instance as unhealthy in its Auto Scaling group, retrieves the list of healthy Squid instances based on the state of other CloudWatch alarms, and updates the route tables that currently route traffic to the unhealthy Squid instance to instead route traffic to the first available healthy Squid instance. While the Auto Scaling group automatically replaces the unhealthy Squid instance, private instances can send outbound traffic through the Squid instance in the other Availability Zone.

When the CloudWatch agent starts collecting the custom metric again on the replacement Squid instance, the alarm reverts to OK state. Similarly, CloudWatch sends a notification to the SNS topic, which then triggers the Lambda function. The Lambda function completes the lifecycle action (see Amazon EC2 Auto Scaling Lifecycle Hooks) to indicate that the replacement instance is ready to serve traffic, and updates the route table associated to the private subnet in the same availability zone to route traffic to the replacement instance.
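
The Lambda function ships with the CloudFormation template discussed below, but the operations it performs map onto a handful of API calls. The AWS CLI sketch below outlines that flow with placeholder IDs and names; it is not the function’s actual code.

# 1. Mark the failed Squid instance unhealthy so its Auto Scaling group replaces it
aws autoscaling set-instance-health --instance-id i-0aaaaaaaaaaaaaaaa --health-status Unhealthy

# 2. Repoint the affected private route table at a healthy Squid instance in the other AZ
aws ec2 replace-route --route-table-id rtb-0bbbbbbbbbbbbbbbb \
    --destination-cidr-block 0.0.0.0/0 --instance-id i-0cccccccccccccccc

# 3. Once the replacement instance is healthy, complete its lifecycle hook and
#    move the default route back to it
aws autoscaling complete-lifecycle-action --lifecycle-hook-name squid-launch-hook \
    --auto-scaling-group-name squid-asg-az1 --lifecycle-action-result CONTINUE \
    --instance-id i-0dddddddddddddddd
aws ec2 replace-route --route-table-id rtb-0bbbbbbbbbbbbbbbb \
    --destination-cidr-block 0.0.0.0/0 --instance-id i-0dddddddddddddddd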

Implementing and testing the solution

Now that you understand the architecture behind this solution, you can follow the instructions in this section to implement and test the solution in your AWS account.

Implementing the solution

First, you’ll use AWS CloudFormation to provision the required resources. Select the Launch Stack button below to open the CloudFormation console and create a stack from the template. Then, follow the on-screen instructions.

Select this image to open a link that starts building the CloudFormation stack

CloudFormation will create the following resources:

  • An Amazon Virtual Private Cloud (Amazon VPC) with an internet gateway attached.
  • Two public subnets and two private subnets on the Amazon VPC.
  • Three route tables. The first route table is associated to the public subnets to make them publicly accessible. The other two route tables are associated to the private subnets.
  • An S3 bucket to store the Squid configuration files, and two Lambda-based custom resources to add the files squid.conf and whitelist.txt to this bucket.
  • An IAM role to grant the Squid instances permissions to read from the S3 bucket and use the CloudWatch agent.
  • A security group to allow HTTP and HTTPS traffic from instances in the private subnets.
  • A launch configuration to specify the template of Squid instances. That includes commands to run at startup for automating the initial configuration.
  • Two Auto Scaling groups that use this launch configuration to launch the Squid instances.
  • A Lambda function to redirect the outbound traffic and recover a Squid instance when it fails.
  • Two CloudWatch alarms to watch the custom metric sent by Squid instances and trigger the Lambda function when the health status of Squid instances changes.
  • An EC2 instance in the first private subnet to test the solution, and an IAM role to grant this instance permissions to use the SSM agent. Session Manager, which I introduce in the next paragraph, uses this SSM agent (see Working with SSM Agent)

Testing the solution

After the stack creation has completed (it can take up to 10 minutes), connect to the Testing Instance using Session Manager, a capability of AWS Systems Manager that lets you manage instances through an interactive shell without the need to open an SSH port:

  1. Open the AWS Systems Manager console.
  2. In the navigation pane, choose Session Manager.
  3. Choose Start Session.
  4. For Target instances, choose the option button to the left of Testing Instance.
  5. Choose Start Session.

Note: Session Manager makes calls to several AWS endpoints (see Working with SSM Agent). If you prefer to restrict access to a defined set of AWS services, make sure to whitelist the associated domains.

After the connection is made, you can test the solution with the following commands. Only the last three requests should return a valid response, because Squid allows traffic to *.amazonaws.com only.


curl http://www.amazon.com
curl https://www.amazon.com
curl http://calculator.s3.amazonaws.com/index.html
curl https://calculator.s3.amazonaws.com/index.html
aws ec2 describe-regions --region us-east-1         

To find the requests you just made in the access logs, here’s how to browse the Squid logs in Amazon CloudWatch Logs:

  1. Open the Amazon CloudWatch console.
  2. In the navigation pane, choose Logs.
  3. For Log Groups, choose the log group /filtering-nat-instance/access.log.
  4. Choose Search Log Group to view and search log records.

To test how the solution behaves when a Squid instance fails, you can terminate one of the Squid instances manually in the Amazon EC2 console. Then, watch the CloudWatch alarm change its state in the Amazon CloudWatch console, or watch the solution change the default route of the impacted route table in the Amazon VPC console.

You can now delete the CloudFormation stack to clean up the resources that were just created.

Discussion: Transparent or forward proxy?

The solution that I describe in this blog is fully transparent for instances in the private subnets, which means that instances don’t need to be aware of the proxy and can make requests as if they were behind a standard NAT instance. An alternate solution is to deploy a forward proxy in your Amazon VPC and configure instances in private subnets to use it (see the blog post How to set up an outbound VPC proxy with domain whitelisting and content filtering for an example). In this section, I discuss some of the differences between the two solutions.

Supportability

A major drawback with forward proxies is that the proxy must be explicitly configured on every instance within the private subnets. For example, you can configure the HTTP_PROXY and HTTPS_PROXY environment variables on Linux instances, but some applications or services, like yum, require their own proxy configuration, or don’t support proxy usage. Note also that some AWS services and features, like Amazon EMR or Amazon SageMaker notebook instances, don’t support using a forward proxy at the time of this post. However, with TLS 1.3, a forward proxy is the only option to restrict outbound traffic if the SNI is encrypted.

Scalability

Deploying a forward proxy on AWS usually consists of a load balancer distributing traffic to a set of proxy instances launched in an Auto Scaling group. Proxy instances can be launched or terminated dynamically depending on the demand (also known as “horizontal scaling”). With the transparent proxy solution in this post, each route table can route traffic to a single instance at a time, and changing the type of the instance is the only way to increase or decrease the capacity (also known as “vertical scaling”).

The solution I present in this post does not dynamically adapt the instance type of the Squid instances based on the demand. However, you might consider a mechanism in which the traffic from a private subnet is temporarily redirected through another Availability Zone while the Squid instance is being relaunched by Auto Scaling with a smaller or larger instance type.

Mutualization

Deploying a centralized proxy solution and using it across multiple VPCs is a way of reducing cost and operational complexity.

With a forward proxy, instances in private subnets send IP packets to the proxy load balancer. Therefore, sharing a forward proxy across multiple VPCs only requires connectivity between the “instance VPCs” and a proxy VPC that has VPC Peering or equivalent capabilities.

With a transparent proxy, instances in private subnets send IP packets to the remote host. VPC Peering does not support transitive routing (see Unsupported VPC Peering Configurations) and cannot be used to share a transparent proxy across multiple VPCs. However, you can now use an AWS Transit Gateway that acts as a network transit hub to share a transparent proxy across multiple VPCs. I give an example in the next section.

Sharing the solution across multiple VPCs using AWS Transit Gateway

In this section, I give an example of how to share a transparent proxy across multiple VPCs using AWS Transit Gateway. The architecture is illustrated in the following diagram. For the sake of simplicity, the diagram does not include Availability Zones.

Figure 3: The architecture for a transparent proxy across multiple VPCs using AWS Transit Gateway

Here’s how instances in the private subnet of “VPC App” can make requests via the shared transparent proxy in “VPC Shared”:

  1. When instances in VPC App make HTTP/S requests, the network packets they send have the public IP address of the remote host as the destination address. These packets are forwarded to the transit gateway, based on the route table associated to the private subnet.
  2. The transit gateway receives the packets and forwards them to VPC Shared, based on the default route of the transit gateway route table.
  3. Note that the transit gateway attachment resides in the transit gateway subnet. When the packets arrive in VPC Shared, they are forwarded to the Squid instance because the next destination has been determined based on the route table associated to the transit gateway subnet.
  4. The Squid instance makes requests on behalf of the source instance (“Instances” in the schema). Then, it sends the response to the source instance. The packets that it emits have the IP address of the source instance as the destination address and are forwarded to the transit gateway according to the route table associated to the public subnet.
  5. The transit gateway receives and forwards the response packets to VPC App.
  6. Finally, the response reaches the source instance.
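
In CLI terms, the forward path described above reduces to three routes, plus a return route for the response traffic. The IDs and the VPC App CIDR below are placeholders, and the transit gateway, attachment, and route-table creation steps are omitted for brevity.

# 1. Private subnet of VPC App: send Internet-bound traffic to the transit gateway
aws ec2 create-route --route-table-id rtb-0aaaaaaaaaaaaaaaa \
    --destination-cidr-block 0.0.0.0/0 --transit-gateway-id tgw-0bbbbbbbbbbbbbbbb

# 2. Transit gateway route table: default route to the VPC Shared attachment
aws ec2 create-transit-gateway-route --transit-gateway-route-table-id tgw-rtb-0cccccccccccccccc \
    --destination-cidr-block 0.0.0.0/0 --transit-gateway-attachment-id tgw-attach-0dddddddddddddddd

# 3. Transit gateway subnet in VPC Shared: forward the packets to the Squid instance
aws ec2 create-route --route-table-id rtb-0eeeeeeeeeeeeeeee \
    --destination-cidr-block 0.0.0.0/0 --instance-id i-0ffffffffffffffff

# 4. Public subnet of VPC Shared: return traffic for VPC App (10.1.0.0/16, an assumed
#    CIDR) goes back through the transit gateway
aws ec2 create-route --route-table-id rtb-0gggggggggggggggg \
    --destination-cidr-block 10.1.0.0/16 --transit-gateway-id tgw-0bbbbbbbbbbbbbbbb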

In a high availability deployment, you could have one transit gateway subnet per Availability Zone that sends traffic to the Squid instance that resides in the same Availability Zone, or to the Squid instance in another Availability Zone if the instance in the same Availability Zone fails.

You could also use AWS Transit Gateway to implement a transparent proxy solution that scales horizontally. This allows you to add or remove proxy instances based on the demand, instead of changing the instance type. With this approach, you must deploy a fleet of proxy instances – launched by an Auto Scaling group, for example – and establish a VPN connection between each instance and the transit gateway. The proxy instances need to support ECMP (“Equal Cost Multipath routing”; see Transit Gateways) to equally spread the outbound traffic between instances. I don’t describe this alternative architecture further in this blog post.

Conclusion

In this post, I’ve shown how you can use Squid to implement a high availability solution that filters outgoing traffic to the Internet and helps meet your security and compliance needs, while being fully transparent for the back-end instances in your VPC. I’ve also discussed the key differences between transparent proxies and forward proxies. Finally, I gave an example of how to share a transparent proxy solution across multiple VPCs using AWS Transit Gateway.

If you have any questions or suggestions, please leave a comment below or on the Amazon VPC forum.

If you have feedback about this blog post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Nicolas Malaval

Nicolas is a Solution Architect for Amazon Web Services. He lives in Paris and helps our healthcare customers in France adopt cloud technology and innovate with AWS. Before that, he spent three years as a Consultant for AWS Professional Services, working with enterprise customers.