Biesheuvel: Mitigating kernel risks on 32-bit ARM

Post Syndicated from original https://lwn.net/Articles/885912/

Ard Biesheuvel writes
about 32-bit Arm systems
on the Google Security Blog, with a focus on
why these processors are still in use and what is being done to increase
their security at the kernel level.

Preventing stack overflows from corrupting unrelated memory
contents is the goal of VMAP_STACK, which we are enabling
for 32-bit ARM
as well. When VMAP_STACK is enabled, kernel mode
stacks are allocated from the kernel heap as before, but mapped
into a different part of the kernel’s address space, and surrounded
by guard regions, which are guaranteed to be kept
unpopulated. Given that accesses to such unpopulated regions will
trigger an exception, the kernel’s memory management layer can step
in and terminate the program as soon as a stack overflow occurs,
and prevent it from causing memory corruption.

What is cryptographic computing? A conversation with two AWS experts

Post Syndicated from Supriya Anand original https://aws.amazon.com/blogs/security/a-conversation-about-cryptographic-computing-at-aws/

Joan Feigenbaum
Joan Feigenbaum
Amazon Scholar, AWS Cryptography
Bill Horne
Bill Horne
Principal Product Manager, AWS Cryptography

AWS Cryptography tools and services use a wide range of encryption and storage technologies that can help customers protect their data both at rest and in transit. In some instances, customers also require protection of their data even while it is in use. To address these needs, Amazon Web Services (AWS) is developing new techniques for cryptographic computing, a set of technologies that allow computations to be performed on encrypted data, so that sensitive data is never exposed. This foundation is used to help protect the privacy and intellectual property of data owners, data users, and other parties involved in machine learning activities.

We recently spoke to Bill Horne, Principal Product Manager in AWS Cryptography, and Joan Feigenbaum, Amazon Scholar in AWS Cryptography, about their experiences with cryptographic computing, why it’s such an important topic, and how AWS is addressing it.

Tell me about yourselves: what made you decide to work in cryptographic computing? And, why did you come to AWS to do cryptographic computing?

Joan: I’m a computer science professor at Yale and an Amazon Scholar. I started graduate school at Stanford in Computer Science in the fall of 1981. Before that, I was an undergraduate math major at Harvard. Almost from the beginning, I have been interested in what has now come to be called cryptographic computing. During the fall of 1982, Andrew Yao, who was my PhD advisor, published a paper entitled “Protocols for Secure Computation,” which introduced the millionaire’s problem: Two millionaires want to run a protocol at the end of which they will know which one of them has more millions, but not know exactly how many millions the other one has. If you dig deeper, you’ll find a few antecedents, but that’s the paper that’s usually credited with launching the field of cryptographic computing. Over the course of my 40 years as a computer scientist, I’ve worked in many different areas of computer science research, but I’ve always come back to cryptographic computing, because it’s absolutely fascinating and has many practical applications.

Bill: I originally got my PhD in Machine Learning in 1993, but I switched over to security in the late 1990s. I’ve spent most of my career in industrial research laboratories, where I was always interested in how to bring technology out of the lab and get it into real products. There’s a lot of interest from customers right now around cryptographic computing, and so I think that we’re at a really interesting point in time, where this could take off in the next few years. Being a part of something like this is really exciting.

What exactly is cryptographic computing?

Bill: Cryptographic computing is not a single thing. Rather, it is a methodology for protecting data in use—a set of techniques for doing computation over sensitive data without revealing that data to other parties. For example, if you are a financial services company, you might want to work with other financial services companies to develop machine learning models for credit card fraud detection. You might need to use sensitive data about your customers as training data for your models, but you don’t want to share your customer data in plaintext form with the other companies, and vice versa. Cryptographic computing gives organizations a way to train models collaboratively without exposing plaintext data about their customers to each other, or even to an intermediate third party such as a cloud provider like AWS.

Why is it challenging to protect data in use? How does cryptographic computing help with this challenge?

Bill: Protecting data-at-rest and data-in-transit using cryptography is very well understood.

Protecting data-in-use is a little trickier. When we say we are protecting data-in-use, we mean protecting it while we are doing computation on it. One way to do that is with other types of security mechanisms besides encryption. Specifically, we can use isolation and access control mechanisms to tightly control who or what can gain access to those computations. The level of control can vary greatly from standard virtual machine isolation, all the way down to isolated, hardened, and constrained enclaves backed by a combination of software and specialized hardware. The data is decrypted and processed within the enclave, and is inaccessible to any external code and processes. AWS offers Nitro Enclaves, which is a very tightly controlled environment that uses this kind of approach.

Cryptographic computing offers a completely different approach to protecting data-in-use. Instead of using isolation and access control, data is always cryptographically protected, and the processing happens directly on the protected data. The hardware doing the computation doesn’t even have access to the cryptographic keys used to encrypt the data, so it is computationally intractable for that hardware, any software running on that hardware, or any person who has access to that hardware to learn anything about your data. In fact, you arguably don’t even need isolation and access control if you are using cryptographic computing, since nothing can be learned by viewing the computation.

What are some cryptographic computing techniques and how do they work?

Bill: Two applicable fundamental cryptographic computing techniques are homomorphic encryption and secure multi-party computation. Homomorphic encryption allows for computation on encrypted data. Basically, the idea is that there are special cryptosystems that support basic mathematical operations like addition and multiplication which work on encrypted data. From those simple operations, you can form complex circuits to implement any function you want.

Secure multi-party computation is a very different paradigm. In secure multi-party computation, you have two or more parties who want to jointly compute some function, but they don’t want to reveal their data to each other. An example might be that you have a list of customers and I have a list of customers, and we want to find out what customers we have in common without revealing anything else about our data to each other, in order to protect customer privacy. That’s a special kind of multi-party computation called private set intersection (PSI).

Joan: To add some detail to what Bill said, homomorphic encryption was heavily influenced by a 2009 breakthrough by Craig Gentry, who is now a Research Fellow at the Algorand Foundation. If a customer has dataset X, needs f(X), and is willing to reveal X to the server, he uploads X and has the cloud service compute Y= f(X) and return Y. If he wants (or is required by law or policy) to hide X from the cloud provider, he homomorphically encrypts X on the client side to get X’, uploads it, receives an encrypted result Y’, and homomorphically decrypts Y’ (again on the client side) to get Y. The confidential data, the result, and the cryptographic keys all remain on the client side.

In secure multi-party computation, there are n ≥ 2 parties that have datasets X1, X2, …, Xn, and they wish to compute Y=f(X1, X2, …, Xn). No party wants to reveal to the others anything about his own data that isn’t implied by the result Y. They execute an n-party protocol in which they exchange messages and perform local computations; at the end, all parties know the result, but none has obtained additional information about the others’ inputs or the intermediate results of the (often multi-round) distributed computation. Multi-party computation might use encryption, but often it uses other data-hiding techniques such as secret sharing.

Cryptographic computing seems to be appearing in the popular technical press a lot right now and AWS is leading work in this area. Why is this a hot topic right now?

Joan: There’s strong motivation to deploy this stuff now, because cloud computing has become a big part of our tech economy and a big part of our information infrastructure. Parties that might have previously managed compute environments on-premises where data privacy is easier to reason about are now choosing third-party cloud providers to provide this compute environment. Data privacy is harder to reason about in the cloud, so they’re looking for techniques where they don’t have to completely rely on their cloud provider for data privacy. There’s a tremendous amount of confidential data—in health care, medical research, finance, government, education, and so on—data which organizations want to use in the cloud to take advantage of state-of-the-art computational techniques that are hard to implement in-house. That’s exactly what cryptographic computing is intended for: using data without revealing it.

Bill: Data privacy has become one the most important issues in security. There is clearly a lot of regulatory pressure right now to protect the privacy of individuals. But progressive companies are actually trying to go above and beyond what they are legally required to do. Cryptographic computing offers customers a compelling set of new tools for being able to protect data throughout its lifecycle without exposing it to unauthorized parties.

Also, there’s a lot of hype right now about homomorphic encryption that’s driving a lot of interest in the popular tech press. But I don’t think people fully understand its power, applicability, or limitations. We’re starting to see homomorphic encryption being used in practice for some small-scale applications, but we are just at the beginning of what homomorphic encryption can offer. AWS is actively exploring ideas and finding new opportunities to solve customer problems with this technology.

Can you talk about the research that’s been done at AWS in cryptographic computing?

Joan: We researched and published on a novel use of homomorphic encryption applied to a popular machine learning algorithm called XGBoost. You have an XGBoost model that has been trained in the standard way, and a large set of users that want to query that model. We developed PPXGBoost inference (where the “PP” stands for privacy preserving). Each user stores a personalized, encrypted version of the model on a remote server, and then submits encrypted queries to that server. The user receives encrypted inferences, which are decrypted and stored on a personal device. For example, imagine a healthcare application, where over time the device uses these inferences to build up a health profile that is stored locally. Note that the user never reveals any personal health data to the server, because the submitted queries are all encrypted.

There’s another application our colleague Eric Crockett, Sr. Applied Scientist, published a paper about. It deals with a standard machine-learning technique called logistic regression. Crockett developed HELR, an application that trains logistic-regression models on homomorphically encrypted data.

Both papers are available on the AWS Cryptographic Computing webpage. The HELR code and PPXGBoost code are available there as well. You can download that code, experiment with it, and use it in your applications.

What are you working on right now that you’re excited about?

Bill: We’ve been talking with a lot of internal and external customers about their data protection problems, and have identified a number of areas where cryptographic computing offers solutions. We see a lot of interest in collaborative data analysis using secure multi-party computation. Customers want to jointly compute all sorts of functions and perform analytics without revealing their data to each other. We see interest in everything from simple comparisons of data sets through jointly training machine learning models.

Joan: To add to what Bill said: We’re exploring two use cases in which cryptographic computing (in particular, secure multi-party computation and homomorphic encryption) can be applied to help solve customers’ security and privacy challenges at scale. The first use case is privacy-preserving federated learning, and the second is private set intersection (PSI).

Federated learning makes it possible to take advantage of machine learning while minimizing the need to collect user data. Imagine you have a server and a large set of clients. The server has constructed a model and pushed it out to the clients for use on local devices; one typical use case is voice recognition. As clients use the model, they make personalized updates that improve it. Some of the local improvements made locally in my environment could also be relevant in millions of other users’ environments. The server gathers up all these local improvements and aggregates them into one improvement to the global model; then the next time it pushes out a new model to existing and new clients, it has an improved model to push out. To accomplish privacy-preserving federated learning, one uses cryptographic computing techniques to ensure that individual users’ local improvements are never revealed to the server or to other users in the process of computing a global improvement.

Using PSI, two or more AWS customers who have related datasets can compute the intersection of their datasets—that is, the data elements that they all have in common—while hiding crucial information about the data elements that are not common to all of them. PSI is a key enabler in several business use cases that we have heard about from customers, including data enrichment, advertising, and healthcare.

This post is meant to introduce some of the cryptographic computing and novel use cases AWS is exploring. If you are serious about exploring this approach, we encourage you to reach out to us and discuss what problems you are trying to solve and whether cryptographic computing can help you. Learn more and get in touch with us at our Cryptographic Computing webpage or send us an email at [email protected]

Want more AWS Security news? Follow us on Twitter.

Author

Supriya Anand

Supriya is a Senior Digital Strategist at AWS, focused on marketing, encryption, and emerging areas of cybersecurity. She has worked to drive large scale marketing and content initiatives forward in a variety of regulated industries. She is passionate about helping customers learn best practices to secure their AWS cloud environment so they can innovate faster on behalf of their business.

Author

Maddie Bacon

Maddie (she/her) is a technical writer for AWS Security with a passion for creating meaningful content. She previously worked as a security reporter and editor at TechTarget and has a BA in Mathematics. In her spare time, she enjoys reading, traveling, and all things Harry Potter.

Intel acquires Linutronix

Post Syndicated from original https://lwn.net/Articles/885903/

Intel has announced
the acquisition of Linutronix.

Linutronix is comprised of a team of highly qualified and motivated
employees with a wealth of experience and involvement in the
ongoing development of Linux. Led by CEO Heinz Egger and CTO Thomas
Gleixner, Linutronix is the architect of PREEMPT_RT (Real Time) and
the leading technology provider for industrial Linux. Gleixner has
been the principal maintainer of x86 architecture in the Linux
kernel since 2008.


The plan is evidently to continue to run Linutronix as an independent
company rather than absorbing it into Intel.

For Health Insurance Companies, Web Apps Can Be an Open Wound

Post Syndicated from Paul Prudhomme original https://blog.rapid7.com/2022/02/23/for-health-insurance-companies-web-apps-can-be-an-open-wound/

For Health Insurance Companies, Web Apps Can Be an Open Wound

At IntSights, a Rapid7 company, our goal is to ensure organizations everywhere understand the threats facing them in today’s cyber landscape. With this in mind, we took a focused look at the insurance industry — a highly targeted vertical due to the amount of valuable data these organizations hold. We’ve collected our findings in the “2022 Insurance Industry Cyber Threat Landscape Report,” which you can read in full right now.

As part of this research, we reviewed threats specific to each vertical in the insurance industry. Healthcare insurance providers, in particular, have large targets on their backs. Criminals often aim to breach healthcare providers to gain access to various personal health information (PHI), which can include everything from sensitive patient health information to healthcare insurance policy details. Once this data falls into the wrong hands, it can be used to conduct fraud and exploit patients in a variety of ways.

This being the case, health insurance providers need to lock down their security perimeter as much as possible, and there’s one broader problem affecting the industry we want to highlight here: web app security. Security bugs or misconfigurations of public-facing customer web applications can often be overlooked, and they are major areas of concern because they serve as entry points for bad actors.

Let’s explore why these vulnerabilities are dangerous, how they happen in the first place, and what healthcare insurance providers can do to mitigate these threats and protect their policyholders.

Web apps as an entry point

Public-facing web applications are commonly used in the insurance industry to gather information about an individual or an organization. This data is often leveraged to generate a quote estimate for the type of insurance policy the person or company is looking for. While this can be a helpful way to personalize the customer experience and attract more customers to your business by showcasing competitive rates, it can also inadvertently expose inputted information if the app is misconfigured.

Take, for example, a vulnerability discovered in home and pet insurance provider Lemonade’s website. By simply clicking on public search results, a person could access and edit customers’ accounts without providing any credentials. From there, a bad actor could steal personally identifiable data and exploit it with barely any hassle at all.

The shocking part of this incident is that Lemonade spokespeople claimed this website flaw was “by design.” During the setup of the website, the team responsible likely didn’t realize anyone could log in, access a customer’s account, and even download a copy of that individual’s insurance policy. Since then, the indexed search results have stopped working, but it just goes to show how a simple oversight like that can be an open door for bad actors, who usually have to search much harder to find and exploit a vulnerability.

This is why health insurance companies should pay extra attention to how their public-facing apps, websites, and portals are configured. With the treasure trove of PHI they store — including everything from COVID-19 vaccination records to insurance policy details that list patient Social Security numbers, birthdates, and even Medicare or Medicaid coverage — healthcare organizations are prime targets for hackers eager to conduct insurance fraud, and a misconfiguration could give them easy access to this data.

How do misconfigurations happen?

Misconfigurations can happen at any level of an application stack, from the application server to the network services and beyond. As such, bad actors will try to exploit any misconfigurations in your stack by looking for unpatched flaws, unused pages, unprotected files or directories, and even dummy accounts that can get them into a system and open up access to data from within. This can lead to a complete system compromise and should be taken seriously.

But how do misconfigurations happen in the first place? Here are a few of the most common security misconfigurations in web apps:

  • Exposing too much information: If an attacker discovers what type of software you’re using for a public-facing web application, it will be much easier for them to search for and find vulnerabilities. There are some clever ways they go about learning this information; for example, they may be able to tell from an error message what type of back end you’re using. Anything that reveals stack traces or exposes information about what systems you’re using needs to be taken care of.
  • Default settings: When deploying new software, it usually comes out of the box with all functionality activated. However, every extra functionality is just another point of entry that you need to lock down. Never leave all default settings on, and make sure to change default accounts and passwords for everything, from admin consoles to hardware.
  • A lack of permissions: When your user permissions or account security settings are not strict, attackers may be able to access an account and run commands in the operating system. In the Lemonade example, for instance, anyone that found the account pages through search could log into the accounts without inputting user credentials.
  • Outdated software: Updating and patching software regularly is required to shore up any security vulnerabilities. This is even more critical for public-facing applications, as bad actors will often run down a list of known vulnerabilities to exploit a system. If the software isn’t up to date, it could leave a wide-open hole in your defenses.

How to resolve and prevent configuration issues in web apps

For healthcare security and health IT teams looking to find, fix, and prevent configuration issues in web apps, here are a few ways you can start:

  • Establish secure installation processes. A repeatable hardening process will help you deploy new software faster and easier in the future. This process, once outlined, should then be configured identically across your environments and automated to minimize effort.
  • Do not install unused features and frameworks. When first setting up your application, don’t deploy with the default settings. Review every feature, functionality, and framework, and remove any you do not want or plan to use. This will help you launch with a minimal platform that will be easier to harden.
  • Implement strict permissions. Ensure that different credentials are used in each environment, from development to production. Default user accounts and passwords should always be changed as soon as possible, and you will want to implement strict requirements for credentials.
  • Review and update configurations regularly. You might think you’re done once you’ve deployed your app, but you should always come back to review and update configurations on a consistent basis. Scan for errors, apply patches, and verify the effectiveness of your configurations and settings in all environments for maximum protection.
  • Generate a software bill of materials (SBOM) and cross-reference it against vulnerabilities often. It’s important to know every component comprised within a piece of software. You can easily generate an SBOM with a variety of open-source and commercially available third-party applications, and once you have it in hand, regularly cross-reference the components in it against known vulnerability lists.

Cyber threat intelligence can also help, as it can inform your health IT team about any threats facing your web app security. For example, threat intelligence can reveal what bad actors hope to acquire from your web apps and the methods they may try to use to obtain it. When you gather key information like this, you can tailor your defenses appropriately.

By leveraging robust cyber threat intelligence solutions and performing rigorous testing and scrutiny of public-facing web applications and other infrastructure, health insurance organizations and their healthcare security teams can better protect their environments and avoid inadvertently exposing customer data.

To learn more about the threats facing the insurance industry today — and some recommendations to protect against them — read the full research report here: “2022 Insurance Industry Cyber Threat Landscape Report.”

Additional reading:

Security updates for Wednesday

Post Syndicated from original https://lwn.net/Articles/885885/

Security updates have been issued by Debian (expat), Fedora (php and vim), Mageia (cpanminus, expat, htmldoc, nodejs, polkit, util-linux, and varnish), Red Hat (389-ds-base, curl, kernel, kernel-rt, openldap, python-pillow, rpm, sysstat, and unbound), Scientific Linux (389-ds-base, kernel, openldap, and python-pillow), and Ubuntu (cyrus-sasl2, linux-oem-5.14, and php7.0).

Making protocols post-quantum

Post Syndicated from Thom Wiggers original https://blog.cloudflare.com/making-protocols-post-quantum/

Making protocols post-quantum

Making protocols post-quantum

Ever since the (public) invention of cryptography based on mathematical trap-doors by Whitfield Diffie, Martin Hellman, and Ralph Merkle, the world has had key agreement and signature schemes based on discrete logarithms. Rivest, Shamir, and Adleman invented integer factorization-based signature and encryption schemes a few years later. The core idea, that has perhaps changed the world in ways that are hard to comprehend, is that of public key cryptography. We can give you a piece of information that is completely public (the public key), known to all our adversaries, and yet we can still securely communicate as long as we do not reveal our piece of extra information (the private key). With the private key, we can then efficiently solve mathematical problems that, without the secret information, would be practically unsolvable.

In later decades, there were advancements in our understanding of integer factorization that required us to bump up the key sizes for finite-field based schemes. The cryptographic community largely solved that problem by figuring out how to base the same schemes on elliptic curves. The world has since then grown accustomed to having algorithms where public keys, secret keys, and signatures are just a handful of bytes and the speed is measured in the tens of microseconds. This allowed cryptography to be bolted onto previously insecure protocols such as HTTP or DNS without much overhead in either time or the data transmitted. We previously wrote about how TLS loves small signatures; similar things can probably be said for a lot of present-day protocols.

But this blog has “post-quantum” in the title; quantum computers are likely to make our cryptographic lives significantly harder by undermining many of the assurances we previously took for granted. The old schemes are no longer secure because new algorithms can efficiently solve their particular mathematical trapdoors. We, together with everyone on the Internet, will need to swap them out. There are whole suites of quantum-resistant replacement algorithms; however, right now it seems that we need to choose between “fast” and “small”. The new alternatives also do not always have the same properties that we have based some protocols on.

Making protocols post-quantum
Fast or small: Cloudflare previously experimented with NTRU-HRSS (a fast key exchange scheme with large public keys and ciphertexts) and SIKE (a scheme with very small public keys and ciphertexts, but much slower algorithm operations).

In this blog post, we will discuss how one might upgrade cryptographic protocols to make them secure against quantum computers. We will focus on the cryptography that they use and see what the challenges are in making them secure against quantum computers. We will show how trade-offs might motivate completely new protocols for some applications. We will use TLS here as a stand-in for other protocols, as it is one of the most deployed protocols.

Making TLS post-quantum

TLS, from SSL and HTTPS fame, gets discussed a lot. We keep it brief here. TLS 1.3 consists of an Ephemeral Elliptic curve Diffie-Hellman (ECDH) key exchange which is authenticated by a digital signature that’s verified by using a public key that’s provided by the server in a certificate. We know that this public key is the right one because the certificate contains another signature by the issuer of the certificate and our client has a repository of valid issuer (“certificate authority”) public keys that it can use to verify the authenticity of the server’s certificate.

In principle, TLS can become post-quantum straightforwardly: we just write “PQ” in front of the algorithms. We replace ECDH key exchange by post-quantum (PQ) key exchange provided by a post-quantum Key Encapsulation Mechanism (KEM). For the signatures on the handshake, we just use a post-quantum signature scheme instead of an elliptic curve or RSA signature scheme. No big changes to the actual “arrows” of the protocol necessary, which is super convenient because we don’t need to revisit our security proofs. Mission accomplished, cake for everyone, right?

Making protocols post-quantum
Upgrading the cryptography in TLS seems as easy as scribbling in “PQ-”.

Key exchange

Of course, it’s not so simple. There are nine different suites of post-quantum key exchange algorithms still in the running in round three of the NIST Post-Quantum standardization project: Kyber, SABER, NTRU, and Classic McEliece (the “finalists”); and SIKE, BIKE, FrodoKEM, HQC, and NTRU Prime (“alternates”). These schemes have wildly different characteristics. This means that for step one, replacing the key exchange by post-quantum key exchange, we need to understand the differences between these schemes and decide which one fits best in TLS. Because we’re doing ephemeral key exchange, we consider the size of the public key and the ciphertext since they need to be transmitted for every handshake. We also consider the “speed” of the key generation, encapsulation, and decapsulation operations, because these will affect how many servers we will need to handle these connections.

Table 1: Post-Quantum KEMs at security level comparable with AES128. Sizes in bytes.

Scheme Transmission size (pk+ct) Speed of operations
Finalists
Kyber512 1,632 Very fast
NTRU-HPS-2048-509 1,398 Very fast
SABER (LightSABER) 1,408 Very fast
Classic McEliece 261,248 Very slow
Alternate candidates
SIKEp434 676 Slow
NTRU Prime (ntrulpr) 1,922 Very fast
NTRU Prime (sntru) 1,891 Fast
BIKE 5,892 Slow
HQC 6,730 Reasonable
FrodoKEM 19,336 Reasonable

Fortunately, once we make this table the landscape for KEMs that are suitable for use in TLS quickly becomes clear. We will have to sacrifice an additional 1,400 – 2,000 bytes, assuming SIKE’s slow runtime is a bigger penalty to the connection establishment (see our previous work here). So we can choose one of the lattice-based finalists (Kyber, SABER or NTRU) and call it a day.1

Signature schemes

For our post-quantum signature scheme, we can draw a similar table. In TLS, we generally care about the sizes of public keys and signatures. In terms of runtime, we care about signing and verification times, as key generation is only done once for each certificate, offline. The round three candidates for signature schemes are: Dilithium, Falcon, Rainbow (the three finalists), and SPHINCS+, Picnic, and GeMSS.

Table 2: Post-Quantum signature schemes at security level comparable with AES128 (or smallest parameter set). Sizes in bytes.

Scheme Public key size Signature size Speed of operations
Finalists
Dilithium2 1,312 2,420 Very fast
Falcon-512 897 690 Fast if you have the right hardware
Rainbow-I-CZ 103,648 66 Fast
Alternate Candidates
SPHINCS+-128f 32 17,088 Slow
SPHINCS+-128s 32 7,856 Very slow
GeMSS-128 352,188 33 Very slow
Picnic3 35 14,612 Very slow

There are many signatures in a TLS handshake. Aside from the handshake signature that the server creates to authenticate the handshake (with public key in the server certificate), there are signatures on the certificate chain (with public keys for intermediate certificates), as well as OCSP Stapling (1) and Certificate Transparency (2) signatures (without public keys).

This means that if we used Dilithium for all of these, we require 17KB of public keys and signatures. Falcon is looking very attractive here, only requiring 6KB, but it might not run fast enough on embedded devices that don’t have special hardware to accelerate 64-bit floating point computations in constant time. SPHINCS+, GeMSS, or Rainbow each have significant deployment challenges, so it seems that there is no one-scheme-fits-all solution.

Picking and choosing specific algorithms for particular use cases, such as using a scheme with short signatures for root certificates, OCSP Stapling, and CT might help to alleviate the problems a bit. We might use Rainbow for the CA root, OCSP staples, and CT logs, which would mean we only need 66 bytes for each signature. It is very nice that Rainbow signatures are only very slightly larger than 64-byte ed25519 elliptic curve signatures, and they are significantly smaller than 256-byte RSA-2048 signatures. This gives us a lot of space to absorb the impact of the larger handshake signatures required. For intermediate certificates, where both the public key and the signature are transmitted, we might use Falcon because it’s nice and small, and the client only needs to do signature verification.

Using KEMs for authentication

In the pre-quantum world, key exchange and signature schemes used to be roughly equivalent in terms of work required or bytes transmitted. As we saw in the previous section, this doesn’t hold up in the post-quantum world. This means that this might be a good opportunity to also investigate alternatives to the classic “signed key exchange” model. Deploying significant changes to an existing protocol might be harder than just swapping out primitives, but we might gain better characteristics. We will look at such a proposed redesign for TLS here.

The idea is to use key exchange not just for confidentiality, but also for authentication. This uses the following idea: what a signature in a protocol like TLS is actually proving is that the person signing has possession of the secret key that corresponds to the public key that’s in the certificate. But we can also do this with a key exchange key by showing you can derive the same shared secret (if you want to prove this explicitly, you might do so by computing a Message Authentication Code using the established shared secret).

This isn’t new; many modern cryptographic protocols, such as the Signal messaging protocol, have used such mechanisms. They offer privacy benefits like (offline) deniability. But now we might also use this to obtain a faster or “smaller” protocol.

However, this does not come for free. Because authentication via key exchange (via KEM at least) inherently requires two participants to exchange keys, we need to send more messages. In TLS, this means that the server that wants to authenticate first needs to give the client their public key. The client obviously can not encapsulate a shared secret to a key he does not know.

We also still need to verify signatures on the certificate chain and the signatures for OCSP stapling and Certificate Transparency are still necessary. Because we need to do “offline” verification for those elements of the handshake, it is hard to get rid of those signatures. So we will still need to carefully look at those signatures and pick an algorithm that fits there.

   Client                                  Server
 ClientHello         -------->
                     <--------         ServerHello
                                             <...>
                     <--------       <Certificate>  ^
 <KEMEncapsulation>                                 | Auth
 {Finished}          -------->                      |
 [Application Data]  -------->                      |
                     <--------          {Finished}  v
 [Application Data]  <------->  [Application Data]

<msg>: encrypted w/ keys derived from ephemeral KEX (HS)
{msg}: encrypted w/ keys derived from HS+KEM (AHS)
[msg]: encrypted w/ traffic keys derived from AHS (MS)

Authentication via KEM in TLS from the AuthKEM proposal

Authentication via KEM in TLS from the AuthKEM proposal

If we put the necessary arrows to authenticate via KEM into TLS it looks something like Figure 2. This is actually a fully-fledged proposal for an alternative to the usual TLS handshake. The academic proposal KEMTLS was published at the ACM CCS conference in 2020; a proposal to integrate this into TLS 1.3 is described in the draft-celi-wiggers-tls-authkem draft RFC.

What this proposal illustrates is that the transition to post-quantum cryptography might motivate, or even require, us to have a brand-new look at what the desired characteristics of our protocol are and what properties we need, like what budget we have for round-trips versus the budget for data transmitted. We might even pick up some properties, like deniability, along the way. For some protocols this is somewhat easy, like TLS; in other protocols there isn’t even a clear idea of where to start (DNSSEC has very tight limits).

Conclusions

We should not wait until NIST has finished standardizing post-quantum key exchange and signature schemes before thinking about whether our protocols are ready for the post-quantum world. For our current protocols, we should investigate how the proposed post-quantum key exchange and signature schemes can be fitted in. At the same time, we might use this opportunity for careful protocol redesigns, especially if the constraints are so tight that it is not easy to fit in post-quantum cryptography. Cloudflare is participating in the IETF and working with partners in both academia and the industry to investigate the impact of post-quantum cryptography and make the transition as easy as possible.

If you want to be a part of the future of cryptography on the Internet, either as an academic or an engineer, be sure to check out our academic outreach or jobs pages.

Reference

…..

1Of course, it’s not so simple. The performance measurements were done on a beefy Macbook, using AVX2 intrinsics. For stuff like IoT (yes, your smart washing machine will also need to go post-quantum) or a smart card you probably want to add another few columns to this table before making a choice, such as code size, side channel considerations, power consumption, and execution time on your platform.

Route leaks and confirmation biases

Post Syndicated from Maximilian Wilhelm original https://blog.cloudflare.com/route-leaks-and-confirmation-biases/

Route leaks and confirmation biases

Route leaks and confirmation biases

This is not what I imagined my first blog article would look like, but here we go.

On February 1, 2022, a configuration error on one of our routers caused a route leak of up to 2,000 Internet prefixes to one of our Internet transit providers. This leak lasted for 32 seconds and at a later time 7 seconds. We did not see any traffic spikes or drops in our network and did not see any customer impact because of this error, but this may have caused an impact to external parties, and we are sorry for the mistake.

Route leaks and confirmation biases

Timeline

All timestamps are UTC.

As part of our efforts to build the best network, we regularly update our Internet transit and peering links throughout our network. On February 1, 2022, we had a “hot-cut” scheduled with one of our Internet transit providers to simultaneously update router configurations on Cloudflare and ISP routers to migrate one of our existing Internet transit links in Newark to a link with more capacity. Doing a “hot-cut” means that both parties will change cabling and configuration at the same time, usually while being on a conference call, to reduce downtime and impact on the network. The migration started off-peak at 10:45 (05:45 local time) with our network engineer entering the bridge call with our data center engineers and remote hands on site as well as operators from the ISP.

At 11:17, we connected the new fiber link and established the BGP sessions to the ISP successfully. We had BGP filters in place on our end to not accept and send any prefixes, so we could evaluate the connection and settings without any impact on our network and services.

As the connection between our router and the ISP — like most Internet connections — was realized over a fiber link, the first item to check are the “light levels” of that link. This shows the strength of the optical signal received by our router from the ISP router and can indicate a bad connection when it’s too low. Low light levels are likely caused by unclean fiber ends or not fully seated connectors, but may also indicate a defective optical transceiver which connects the fiber link to the router – all of which can degrade service quality.

The next item on the checklist is interface errors, which will occur when a network device receives incorrect or malformed network packets, which would also indicate a bad connection and would likely lead to a degradation in service quality, too.

As light levels were good, and we observed no errors on the link, we deemed it ready for production and removed the BGP reject filters at 11:22.

This immediately triggered the maximum prefix-limit protection the ISP had configured on the BGP session and shut down the session, preventing further impact. The maximum prefix-limit is a safeguard in BGP to prevent the spread of route leaks and to protect the Internet. The limit is usually set just a little higher than the expected number of Internet prefixes from a peer to leave some headroom for growth but also catch configuration errors fast. The configured value was just 40 prefixes short of the number of prefixes we were advertising at that site, so this was considered the reason for the session to be shut down. After checking back internally, we asked the ISP to raise the prefix-limit, which they did.

The BGP session was reestablished at 12:08 and immediately shut down again. The problem was identified and fixed at 12:14.

10:45: Start of scheduled maintenance

11:17: New link was connected and BGP sessions went up (filters still in place)

11:22: Link was deemed ready for production and filters removed

11:23: BGP sessions were torn down by ISP router due to configured prefix-limit

12:08: ISP configures higher prefix-limits, BGP sessions briefly come up again and are shut down

12:14: Issue identified and configuration updated

What happened and what we’re doing about it

The outage occurred while migrating one of our Internet transits to a link with more capacity. Once the new link and a BGP session had been established, and the link deemed error-free, our network engineering team followed the peer-reviewed deployment plan. The team removed the filters from the BGP sessions, which prevented the Cloudflare router from accepting and sending prefixes via BGP.

Due to an oversight in the deployment plan, which had been peer-reviewed before without noticing this issue, no BGP filters to only export prefixes of Cloudflare and our customers were added. A peer review on the internal chat did not notice this either, so the network engineer performing this change went ahead.

ewr02# show |compare                                     
[edit protocols bgp group 4-ORANGE-TRANSIT]
-  import REJECT-ALL;
-  export REJECT-ALL;
[edit protocols bgp group 6-ORANGE-TRANSIT]
-  import REJECT-ALL;
-  export REJECT-ALL;

The change resulted in our router sending all known prefixes to the ISP router, which shut down the session as the number of prefixes received exceeded the maximum prefix-limit configured.

As the configured values for the maximum prefix-limits turned out to be rather low for the number of prefixes on our network, this didn’t come as a surprise to our network engineering team and no investigation into why the BGP session went down was started. The prefix-limit being too low seemed to be a perfectly valid reason.

We asked the ISP to increase the prefix-limit, which they did after they received approval on their side. Once the prefix-limit had been increased and the previously shutdown BGP sessions reset, the sessions were reestablished but were shut down immediately as the maximum prefix-limit was triggered again. This is when our network engineer started questioning whether there was another issue at fault and found and corrected the configuration error previously overlooked.

We made the following change in response to this event: we introduced an implicit reject policy for BGP sessions which will take effect if no import/export policy is configured for a specific BGP neighbor or neighbor group. This change has been deployed.

BGP security & preventing route-leaks — what’s in the cards?

Route leaks aren’t new, and they keep happening. The industry has come up with many approaches to limit the impact or even prevent route-leaks. Policies and filters are used to control which prefixes should be exported to or imported from a given peer. RPKI can help to make sure only allowed prefixes are accepted from a peer and a maximum prefix-limit can act as a last line of defense when everything else fails.

BGP policies and filters are commonly used to ensure only explicitly allowed prefixes are sent out to BGP peers, usually only allowing prefixes owned by the entity operating the network and its customers. They can also be used to tweak some knobs (BGP local-pref, MED, AS path prepend, etc.) to influence routing decisions and balance traffic across links. This is what the policies we have in place for our peers and transits do. As explained above, the maximum prefix-limit is intended to tear down BGP sessions if more prefixes are being sent or received than to be expected. We have talked about RPKI before, it’s the required cryptographic upgrade to BGP routing, and we still are on our path to securing Internet Routing.

To improve the overall stability of the Internet even more, in 2017, a new Internet standard was proposed, which adds another layer of protection into the mix: RFC8212 defines Default External BGP (EBGP) Route Propagation Behavior without Policies which pretty much tackles the exact issues we were facing.

This RFC updates the BGP-4 standard (RFC4271) which defines how BGP works and what vendors are expected to implement. On the Juniper operating system, JunOS, this can be activated by setting defaults ebgp no-policy reject-always on the protocols bgp hierarchy level starting with Junos OS Release 20.3R1.

If you are running an older version of JunOS, a similar effect can be achieved by defining a REJECT-ALL policy and setting this as import/export policy on the protocols bgp hierarchy level. Note that this will also affect iBGP sessions, which the solution above will have no impact on.

policy-statement REJECT-ALL {
  then reject;
}

protocol bgp {
  import REJECT-ALL;
  export REJECT-ALL;
}

Conclusion

We are sorry for leaking routes of prefixes which did not belong to Cloudflare or our customers and to network engineers who got paged as a result of this.

We have processes in place to make sure that changes to our infrastructure are reviewed before being executed, so potential issues can be spotted before they reach production. In this case, the review process failed to catch this configuration error. In response, we will increase our efforts to further our network automation, to fully derive the device configuration from an intended state.

While this configuration error was caused by human error, it could have been detected and mitigated significantly faster if the confirmation bias did not kick in, making the operator think the observed behavior was to be expected. This error underlines the importance of our existing efforts on training our people to be aware of biases we have in our life. This also serves as a great example on how confirmation bias can influence and impact our work and that we should question our conclusions (early).

It also shows how important protocols like RPKI are. Route leaks are something even experienced network operators can cause accidentally, and technical solutions are needed to reduce the impact of leaks whether they are intentional or the result of an error.

Bypassing Apple’s AirTag Security

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/02/bypassing-apples-airtag-security.html

A Berlin-based company has developed an AirTag clone that bypasses Apple’s anti-stalker security systems. Source code for these AirTag clones is available online.

So now we have several problems with the system. Apple’s anti-stalker security only works with iPhones. (Apple wrote an Android app that can detect AirTags, but how many people are going to download it?) And now non-AirTags can piggyback on Apple’s system without triggering the alarms.

Apple didn’t think this through nearly as well as it claims to have. I think the general problem is one that I have written about before: designers just don’t have intimate threats in mind when building these systems.

Building TypeScript projects with AWS SAM CLI

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/building-typescript-projects-with-aws-sam-cli/

This post written by Dan Fox, Principal Specialist Solutions Architect and Roman Boiko, Senior Specialist Solutions Architect

The AWS Serverless Application Model (AWS SAM) CLI provides developers with a local tool for managing serverless applications on AWS. This command line tool allows developers to initialize and configure applications, build and test locally, and deploy to the AWS Cloud. Developers can also use AWS SAM from IDEs like Visual Studio Code, JetBrains, or WebStorm. TypeScript is a superset of JavaScript and adds static typing, which reduces errors during development and runtime.

On February 22, 2022 we announced the beta of AWS SAM CLI support for TypeScript. These improvements simplify TypeScript application development by allowing you to build and deploy serverless TypeScript projects using AWS SAM CLI commands. To install the latest version of the AWS SAM CLI, refer to the installation section of the AWS SAM page.

In this post, I initialize a TypeScript project using an AWS SAM template. Then I build a TypeScript project using the AWS SAM CLI. Next, I use AWS SAM Accelerate to speed up the development and test iteration cycles for your TypeScript project. Last, I measure the impact of bundling, tree shaking, and minification on deployment package size.

Initializing a TypeScript template

This walkthrough requires:

AWS SAM now provides the capability to create a sample TypeScript project using a template. Since this feature is still in preview, you can enable this by one of the following methods:

  1. Use env variable `SAM_CLI_BETA_ESBUILD=1`
  2. Add the following parameters to your samconfig.toml
    [default.build.parameters]
    beta_features = true
    [default.sync.parameters]
    beta_features = true
  3. Use the --beta-features option with sam build and sam sync. I use this approach in the following examples.
  4. Choose option ‘y’ when CLI prompts you about using beta features.

To create a new project:

  1. Run – sam init
  2. In the wizard, select the following options:
    1. AWS Quick Start Templates
    2. Hello World Example
    3. nodejs14.x – TypeScript
    4. Zip
    5. Keep the name of the application as sam-app
sam init wizard steps

sam init wizard steps

Open the created project in a text editor. In the root, you see a README.MD file with the project description and a template.yaml. This is the specification that defines the serverless application.

In the hello-world folder is an app.ts file written in TypeScript. This project also includes a unit test in Jest and sample configurations for ESLint, Prettier, and TypeScript compilers.

Project structure

Project structure

Building and deploying a TypeScript project

Previously, to use TypeScript with AWS SAM CLI, you needed custom steps. These transform the TypeScript project into a JavaScript project before running the build.

Today, you can use the sam build command to transpile code from TypeScript to JavaScript. This bundles local dependencies and symlinks, and minifies files to reduce asset size.

AWS SAM uses the popular open source bundler esbuild to perform these tasks. This does not perform type checking but you may use the tsc CLI to perform this task. Once you have built the TypeScript project, use the sam deploy command to deploy to the AWS Cloud.
The following shows how this works.

  1. Navigate to the root of sam-app.
  2. Run sam build. This command uses esbuild to transpile and package app.ts.

    sam build wizard

    sam build wizard

  3. Customize the esbuild properties by editing the Metadata section in the template.yaml file.

    Esbuild configuration

    Esbuild configuration

  4. After a successful build, run sam deploy --guided to deploy the application to your AWS account.
  5. Accept all the default values in the wizard, except this question:
    HelloWorldFunction may not have authorization defined, Is this okay? [y/N]: y

    sam deploy wizard

    sam deploy wizard

  6. After successful deployment, test that the function is working by querying the API Gateway endpoint displayed in the Outputs section.

    sam deploy output

    sam deploy output

Using AWS SAM Accelerate with TypeScript

AWS SAM Accelerate is a set of features that reduces development and test cycle latency by enabling you to test code quickly against AWS services in the cloud. AWS SAM Accelerate released beta support for TypeScript. Use the template from the last example to use SAM Accelerate with TypeScript.

Use AWS SAM Accelerate to build and deploy your code upon changes.

  1. Run sam sync --stack-name sam-app --watch.
  2. Open your browser with the API Gateway endpoint from the Outputs section.
  3. Update the handler function in app.ts file to:
    export const lambdaHandler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
        let response: APIGatewayProxyResult;
        try {
            response = {
                statusCode: 200,
                body: JSON.stringify({
                    message: 'hello SAM',
                }),
            };
        } catch (err) {
            console.log(err);
            response = {
                statusCode: 500,
                body: JSON.stringify({
                    message: 'some error happened',
                }),
            };
        }
    
        return response;
    };
  4. Save changes. AWS SAM automatically rebuilds and syncs the application code to the cloud.

    AWS SAM Accelerate output

    AWS SAM Accelerate output

  5. Refresh the browser to see the updated message.

Deployment package size optimizations

One additional benefit of the TypeScript build process is that it reduces your deployment package size through bundling, tree shaking, and minification. The bundling process removes dependency files not referenced in the control flow. Tree shaking is the term used for unused code elimination. It is a compiler optimization that removes unreachable code within files.

Minification reduces file size by removing white space, rewriting syntax to be more compact, and renaming local variables to be shorter. The sam build process performs bundling and tree shaking by default. Configure minification, a feature typically used in production environments, within the Metadata section of the template.yaml file.

Measure the impact of these optimizations by the reduced deployment package size. For example, measure the before and after size of an application, which includes the AWS SDK for JavaScript v3 S3 Client as a dependency.

To begin, change the package.json file to include the @aws-sdk/client-s3 as a dependency:

  1. From the application root, cd into the hello-world directory.
  2. Run the command:
    npm install @aws-sdk/client-s3
  3. Delete all the devDependencies except for esbuild to get a more accurate comparison

    package.json contents

    package.json contents

  4. Run the following command to build your dependency library:
    npm install
  5. From the application root, run the following command to measure the size of the application directory contents:
    du -sh hello-world
    The current application is approximately 50 MB.
  6. Turn on minification by setting the Minify value to true in the template.yaml file

    Metadata section of template.yaml

    Metadata section of template.yaml

  7. Now run the following command to build your project using bundling, tree shaking, and minification.
    sam build
  8. Your deployment package is now built in the .aws_sam directory. You can measure the size of the package with the following command:
    du -sh .aws-sam

The new package size is approximately 2.8 MB. That represents a 94% reduction in uncompressed application size.

Conclusion

This post reviews several new features that can improve the development experience for TypeScript developers. I show how to create a sample TypeScript project using sam init. I build and deploy a TypeScript project using the AWS SAM CLI. I show how to use AWS SAM Accelerate with your TypeScript project. Last, I measure the impact of bundling, tree shaking, and minification on a sample project. We invite the serverless community to help improve AWS SAM. AWS SAM is an open source project and you can contribute to the repository here.

For more serverless content, visit Serverless Land.

[$] Python support for regular expressions

Post Syndicated from original https://lwn.net/Articles/885682/

Regular
expressions
are a common feature of computer languages, especially
higher-level languages like Ruby, Perl, Python, and others, for doing
fairly sophisticated text-pattern matching. Some languages, including
Perl,
incorporate regular expressions into the language itself,
while others have classes or libraries that come with the language
installation. Python’s standard library has the re module,
which provides facilities for working with regular expressions; as a recent
discussion on the python-ideas mailing shows, though, that module has
somewhat fallen by the wayside in recent times.

AWS achieves FedRAMP P-ATO for 15 services in the AWS US East/West and AWS GovCloud (US) Regions

Post Syndicated from Alexis Robinson original https://aws.amazon.com/blogs/security/aws-achieves-fedramp-p-ato-for-15-services-in-the-aws-us-east-west-and-aws-govcloud-us-regions/

AWS is pleased to announce that 15 additional AWS services have achieved Provisional Authority to Operate (P-ATO) from the Federal Risk and Authorization Management Program (FedRAMP) Joint Authorization Board (JAB).

AWS is continually expanding the scope of our compliance programs to help customers use authorized services for sensitive and regulated workloads. AWS now offers 111 AWS services authorized in the AWS US East/West Regions under FedRAMP Moderate Authorization, and 91 services authorized in the AWS GovCloud (US) Regions under FedRAMP High Authorization.

Figure 1. Newly authorized services list

Figure 1. Newly authorized services list

Descriptions of AWS Services now in FedRAMP P-ATO

These additional AWS services now provide the following capabilities for the U.S. federal government and customers with regulated workloads:

  • Amazon Detective simplifies analyzing, investigating, and quickly identifying the root cause of potential security issues or suspicious activities. Amazon Detective automatically collects log data from your AWS resources, and uses machine learning, statistical analysis, and graph theory to build a linked set of data enabling you to easily conduct faster and more efficient security investigations.
  • Amazon FSx for Lustre provides fully managed shared storage with the scalability and performance of the popular Lustre file system.
  • Amazon FSx for Windows File Server provides fully managed shared storage built on Windows Server, and delivers a wide range of data access, data management, and administrative capabilities.
  • Amazon Kendra is an intelligent search service powered by machine learning (ML).
  • Amazon Keyspaces (for Apache Cassandra) is a scalable, highly available, and managed Apache Cassandra-compatible database service.
  • Amazon Lex is an AWS service for building conversational interfaces into applications using voice and text.
  • Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS.
  • Amazon MQ is a managed message broker service for Apache ActiveMQ and RabbitMQ that simplifies setting up and operating message brokers on AWS.
  • AWS CloudHSM is a cloud-based hardware security module (HSM) that lets you generate and use your own encryption keys on the AWS Cloud.
  • AWS Cloud Map is a cloud resource discovery service. With Cloud Map, you can define custom names for your application resources, and CloudMap maintains the updated location of these dynamically changing resources.
  • AWS Glue DataBrew is a new visual data preparation tool that lets data analysts and data scientists quickly clean and normalize data to prepare it for analytics and machine learning.
  • AWS Outposts (hardware excluded) is a fully managed service that extends AWS infrastructure, services, APIs, and tools to customer premises. By providing local access to AWS managed infrastructure, AWS Outposts enables you to build and run applications on premises using the same programming interfaces used in AWS Regions, while using local compute and storage resources for lower latency and local data processing needs.
  • AWS Resource Groups grants you the ability to organize your AWS resources, managing and automating tasks for large numbers of resources at the same time.
  • AWS Snowmobile is an Exabyte-scale data transfer service used to move extremely large amounts of data to AWS. You can transfer up to 100PB per Snowmobile, a 45-foot long ruggedized shipping container, pulled by a semi-trailer truck. After an initial assessment, a Snowmobile will be transported to your data center and AWS personnel will configure it so it can be accessed as a network storage target. After you load your data, the Snowmobile is driven back to an AWS regional data center, where AWS imports the data into Amazon Simple Storage Service (Amazon S3).
  • AWS Transfer Family securely scales your recurring business-to-business file transfers to Amazon S3 and Amazon Elastic File System (Amazon EFS) using SFTP, FTPS, and FTP protocols.

The following services are now listed on the FedRAMP Marketplace and the AWS Services in Scope by Compliance Program page.

Service authorizations by Region

Service FedRAMP Moderate in AWS US East/West FedRAMP High in AWS GovCloud (US)
Amazon Detective
Amazon FSx for Lustre
Amazon FSx for Windows File Server
Amazon Kendra
Amazon Keyspaces (for Apache Cassandra)
Amazon Lex
Amazon Macie
Amazon MQ
AWS CloudHSM
AWS Cloud Map
AWS Glue DataBrew
AWS Outposts
AWS Resource Groups
AWS Snowmobile
AWS Transfer Family

To learn what other public sector customers are doing on AWS, see our Government, Education, and Nonprofits Case Studies and Customer Success Stories. Stay tuned for future updates on our Services in Scope by Compliance Program page. Let us know how this post will help your mission by reaching out to your AWS Account Team. Lastly, if you have feedback about this blog post, let us know in the Comments section.

Want more AWS Security news? Follow us on Twitter.

Author

Alexis Robinson

Alexis is the Head of the U.S. Government Security and Compliance Program for AWS. For over 10 years, she has served federal government clients advising on security best practices and conducting cyber and financial assessments. She currently supports the security of the AWS internal environment including cloud services applicable to AWS East/West and AWS GovCloud (US) Regions.

Fine-tune and optimize AWS WAF Bot Control mitigation capability

Post Syndicated from Dmitriy Novikov original https://aws.amazon.com/blogs/security/fine-tune-and-optimize-aws-waf-bot-control-mitigation-capability/

Introduction

A few years ago at Sydney Summit, I had an excellent question from one of our attendees. She asked me to help her design a cost-effective, reliable, and not overcomplicated solution for protection against simple bots for her web-facing resources on Amazon Web Services (AWS). I remember the occasion because with the release of AWS WAF Bot Control, I can now address the question with an elegant solution. The Bot Control feature now makes this a matter of switching it on to start filtering out common and pervasive bots that generate over 50 percent of the traffic against typical web applications.

Reduce Unwanted Traffic on Your Website with New AWS WAF Bot Control introduced AWS WAF Bot Control and some of its capabilities. That blog post covers everything you need to know about where to start and what elements it uses for configuration and protection. This post unpacks closely-related functionalities, and shares key considerations, best practices, and how to customize for common use cases. Use cases covered include:

  • Limiting the crawling rate of a bot leveraging labels and AWS WAF response headers
  • Enabling Bot Control only for certain parts of your application with scope down statements
  • Prioritizing verified bots or allowing only specific ones using labels
  • Inserting custom headers into requests from certain bots based on their labels

Key elements of AWS WAF Bot Control fine-tuning

Before moving on to precise configuration of the bot mitigation capability, it is important to understand the components that go into the process.

Labels

Although labels aren’t unique to Bot Control, the feature takes advantage of them, and many configurations use labels as the main input. A label is a string value that is applied to a request based on matching a rule statement. One way of thinking about them is as tags that belong to the specific request. The request acquires them after being processed by a rule statement, and can be used as identification of similar requests in all subsequent rules within the same web ACL. Labels enable you to act on a group of requests that meets specific criteria. That’s because the subsequent rules in the same web ACL have access to the generated labels and can match against them.

Labels go beyond just a mechanism for matching a rule. Labels are independent of a rule’s action, as they can be generated for Block, Allow, and Count. That opens up opportunities to filter or construct queries against records in AWS WAF logs based on labels, and so implement sophisticated analytics.

A label is a string made up of a prefix, optional namespace, and a name delimited by a colon. For example: prefix:[namespace:]name. The prefix is automatically added by AWS WAF.

AWS WAF Bot Control includes various labels and namespaces:

  • bot:category: Type of bot. For example, search_engine, content_fetcher
  • bot:name: Name of a specific bot (if available). For example, scrapy, mauibot, crawler4j
  • bot:verified: Verified bots are generally safe for web applications. For example, googlebot and linkedin. Bot Control performs validation to confirm that such bots come from the source that they claim, using the bot confirmation detection logic described later in this section.

    By default, verified bots are not blocked by Bot Control, but you can use a label to block them with a custom rule.

  • signal: attributes of the request indicate a bot activity. For example, non_browser_user_agent, automated_browser

These labels are added through managed bot detection logic, and Bot Control uses them to perform the following:

Known bot categorization: Comparing the request user-agent to known bots to categorize and allow customers to block by category. Bots are categorized by their function, such as scrapers, search engines, social media.

Bot confirmation: Most respectable bots provide a way to validate beyond the user-agent, typically by doing a reverse DNS lookup of the IP address to confirm the validity of domain and host names. These automatic checks will help you to ensure that only legitimate bots are allowed, and provide a signal to flag requests to downstream systems for bot detection.

Header validation: Request headers validation is performed against a series of checks to look for missing headers, malformed headers, or invalid headers.

Browser signature matching: TLS handshake data and request headers can be deconstructed and partially recombined to create a browser signature that identifies browser and OS combinations. This signature can be validated against the user-agent to confirm they match, and checked against lists of known-good browser known-bad browser signatures.

Below are a few examples of labels that Bot Control has. You can obtain the full list by calling the DescribeManagedRuleGroup API.

awswaf:managed:aws:bot-control:bot:category:search_engine
awswaf:managed:aws:bot-control:bot:name:scrapy
awswaf:managed:aws:bot-control:bot:verified
awswaf:managed:aws:bot-control:signal:non_browser_user_agent

Best practice to start with Bot Control

Although Bot Control can be enabled and start protecting your web resources with the default Block action, you can switch all rules in the rule group into a Count action at the beginning. This accomplishes the following:

  • Avoids false positives with requests that might match one of the rules in Bot Control but still be a valid bot for your resource.
  • Allows you to accumulate enough data points in the form of labels and actions on requests with them, if some of the requests matched rules in Bot Control. That enables you to make informed decisions on constructing rules for each desired bot or category and when switching them into a default action is appropriate.

Labels can be looked up in Amazon CloudWatch metrics and AWS WAF logs, and as soon as you have them, you can start planning whether exceptions or any custom rules are needed to cater for a specific scenario. This blog post explores examples of such use cases in the Common use cases sections below.

Additionally, as AWS WAF processes rules in sequential order, you should consider where the Bot Control rule group is located in your web ACL. To filter out requests that you confidently consider unwanted, you can place AWS Managed Rules rule groups—such as the Amazon IP reputation list—before the Bot Control rule group in the evaluation order. This decreases the number of requests processed by Bot Control, and makes it more cost effective. Simultaneously, Bot Control should be early enough in the rules to:

  • Enable label generation for downstream rules. That also provides higher visibility as a side benefit.
  • Decrease false positives by not blocking desired bots before they reach Bot Control.

AWS WAF Bot Control fine-tuning wouldn’t be complete and configurable without a set of recently released features and capabilities of AWS WAF. Let’s unpack them.

How to work with labels in CloudWatch metrics and AWS WAF logs

Generated labels generate CloudWatch metrics and are placed into AWS WAF logs. It enables you to see what bots and categories hit your website, and the labels associated with them that you can use for fine tuning.

CloudWatch metrics are generated with the following dimensions and metrics.

  • Region dimension is available for all Regions except Amazon CloudFront. When web ACL is associated with CloudFront, metrics are in the Northern Virginia Region.
  • WebACL dimension is the name of the WebACL
  • Namespace is the fully qualified namespace, including the prefix
  • LabelValue is the label name
  • Action is the terminating action (for example, Allow, Block, Count)

AWS WAF includes a shortcut to associated CloudWatch metrics at the top of the Overview page, as shown in Figure 1.

Figure 1: Title and description of the chart in AWS WAF with a shortcut to CloudWatch

Figure 1: Title and description of the chart in AWS WAF with a shortcut to CloudWatch

Alternatively, you can find them in the WAFV2 service category of the CloudWatch Metrics section.

CloudWatch displays generated labels and the volume across dates and times, so you can evaluate and make informed decisions to structure the rules or address false positives. Figure 2 illustrates what labels were generated for requests from bots that hit my website. This example configured only a couple of explicit Allow actions, so most of them were blocked. The top section of the figure 2 shows the load from two selected labels.

Figure 2: WAFV2 CloudWatch metrics for generated Label Namespaces

Figure 2: WAFV2 CloudWatch metrics for generated Label Namespaces

In AWS WAF logs, generated labels are included in an array under the field labels. Figure 3 shows an example request with the labels array at the bottom.

Figure 3: An example of an AWS WAF log record

Figure 3: An example of an AWS WAF log record

This example shows three labels generated for the same request. Uptimerobot follows the monitoring category label, and combining these two labels is useful to provide flexibility for configurations based on them. You can use the whole category, or be laser-focused using the label of the specific bot. You will see how and why that matters later in this blog post. The third label, non_browser_user_agent, is a signal of forwarded requests that have extra headers. For protection from bots in conjunction with labels, you can construct extra scanning in your application for certain requests.

Scope-down statements

Given that Bot Control is a premium feature and is a paid AWS Managed Rules, the ability to keep your costs in control is crucial. The scope-down statement allows you to optimize for cost by filtering out any traffic that doesn’t require inspection by Bot Control.

To address this goal, you can use scope down statements that can be applied to two broad scenarios.

You can exclude certain parts of your resource from scanning by Bot Control. Think of parts of your web site that you don’t mind being accessed by bots, typically that would be static content, such as images and CSS files. Leaving protection on everything else, such as APIs and login pages. You can also exclude IP ranges that can be considered safe from bot management. For example, traffic that’s known to come from your organization or viewers that belong to your partners or customers.

Alternatively, you can look at this from a different angle, and only apply bot management to a small section of your resources. For example, you can use Bot Control to protect a login page, or certain sensitive APIs, leaving everything else outside of your bot management.

With all of these tools in our toolkit let’s put them into perspective and dive deep into use cases and scenarios.

Common use cases for AWS WAF Bot Control fine-tuning

There are several methods for fine tuning Bot Control to better meet your needs. In this section, you’ll see some of the methods you can use.

Limit the crawling rate

In some cases, it is necessary to allow bots access to your websites. A good example is search engine bots, that crawl the web and create an index. If optimization for search engines is important for your business, but you notice excessive load from too many requests hitting your web resource, you might face a dilemma of how to slow crawlers down without unnecessarily blocking them. You can solve this with a combination of Bot Control detection logic and a rate-based rule with a response status code and header to communicate your intention back to crawlers. Most crawlers that are deemed useful have a built-in mechanism to decrease their crawl rate when you detect and respond to increased load.

To customize bot mitigation and set the crawl rate below limits that might negatively affect your web resource

  1. In the AWS WAF console, select Web ACLs from the left menu. Open your web ACL or follow the steps to create a web ACL.
  2. Choose the Rules tab and select Add rules. Select Add managed rule groups and proceed with the following settings:
    1. In the AWS managed rule groups section, select the switch Add to web ACL to enable Bot Control in the web ACL. This also gives you labels that you can use in other rules later in the evaluation process inside the web ACL.
    2. Select Add rules and choose Save
  3. In the same web ACL, select Add rules menu and select Add my own rules and rule groups.
  4. Using the provided Rule builder, configure the following settings:
    1. Enter a preferred name for the rule and select Rate-based rule.
    2. Enter a preferred rate limit for the rule. For example, 500.

      Note: The rate limit is the maximum number of requests allowed from a single IP address in a five-minute period.

    3. Select Only consider requests that match the criteria in a rule statement to enable the scope-down statement to narrow the scope of the requests that the rule evaluates.
    4. Under the Inspect menu, select Has a label to focus only on certain types of bots.
    5. In the Match key field, enter one of the following labels to match based on broad categories, such as verified bots or all bots identified as scraping as illustrated on Figure 4:

      awswaf:managed:aws:bot-control:bot:verified
      awswaf:managed:aws:bot-control:bot:category:scraping_framework

    6. Alternatively, you can narrow down to a specific bot using its label:

      awswaf:managed:aws:bot-control:bot:name:Googlebot

      Figure 4: Label match rule statement in a rule builder with a specific match key

      Figure 4: Label match rule statement in a rule builder with a specific match key

  5. In the Action section, configure the following settings:
    1. Select Custom response to enable it.
    2. Enter 429 as the Response code to indicate and communicate back to the bot that it has sent too many requests in a given amount of time.
    3. Select Add new custom header and enter Retry-After in the Key field and a value in seconds for the Value field. The value indicates how many seconds a bot must wait before making a new request.
  6. Select Add rule.
  7. It’s important to place the rule after the Bot Control rule group inside your web ACL, so that the label is available in this custom rule.
    1. In the Set rule priority section, check that the new rate-based rule is under the existing Bot Control rule set and if not, choose the newly created rule and select Move up or Move down until the rule is located after it.
    2. Select Save.
Figure 5: AWS WAF rule action with a custom response code

Figure 5: AWS WAF rule action with a custom response code

With the preceding configuration, Bot Control sets required labels, which you then use in the scope-down statement in a rate-based rule to not only establish a ceiling of how many requests you will allow from specific bots, but also communicate to bots when their crawling rate is too high. If they don’t respect the response and lower their rate, the rule will temporarily block them, protecting your web resource from being overwhelmed.

Note: If you use a category label, such as scraping_framework, all bots that have that label will be counted by your rate-based rule. To avoid unintentional blocking of bots that use the same label, you can either narrow down to a specific bot with a precise bot:name: label, or select a higher rate limit to allow a greater margin for the aggregate.

Enable Bot Control only for certain parts of your application

As mentioned earlier, excluding parts of your web resource from Bot Control protection is a mechanism to reduce the cost of running the feature by focusing only on a subset of the requests reaching a resource. There are a few common scenarios that take advantage of this approach.

To run Bot Control only on dynamic parts of your traffic

  1. In the AWS WAF console, select Web ACLs from the left menu. Open a web ACL that you have, or follow the steps to create a web ACL.
  2. Choose the Rules tab and select Add rules. Then select Add managed rule groups to proceed with the following settings:
    1. In the AWS managed rule groups section, select Add to web ACL to enable Bot Control in the web ACL.
    2. Select Edit.
  3. Select Scope-down statement – optional and select Enable Scope-down statement.
  4. In If a request, select doesn’t match the statement (NOT).
  5. In the Statement section, configure the following settings:
    1. Choose URI path in the Inspect field.
    2. For the Match type, choose Starts with string.
    3. Depending on the structure of your resource, you can enter a whole URI string—such as images/—in the String to match field. The string will be excluded from Bot Control evaluation.
    Figure 6: A scope-down statement to match based on a string that a URI path starts with

    Figure 6: A scope-down statement to match based on a string that a URI path starts with

  6. Select Save rule.

An alternative to using string matching

As an alternative to a string match type, you can use a regex pattern set. If you don’t have a regex pattern set, create one using the following guide.

Note: This pattern matches most common file extensions associated with static files for typical web resources. You can customize the pattern set if you have different file types.

  1. Follow steps 1-4 of the previous procedure.
  2. In the Statement section, configure the following settings:
    1. Choose URI path in the Inspect field.
    2. For the Match type, choose Matches pattern from regex pattern set and select your created set in the Regex pattern set. as illustrated in Figure 7.
    3. In Regex pattern set, enter the pattern
      (?i)\.(jpe?g|gif|png|svg|ico|css|js|woff2?)$

      Figure 7: A scope-down statement to match based on a regex pattern set as part of a URI path

      Figure 7: A scope-down statement to match based on a regex pattern set as part of a URI path

To run Bot Control only on the most sensitive parts of your application.

Another option is to exclude almost everything, by only enabling the Bot Control on the most sensitive part of your application. For example, a login page.

Note: The actual URI path depends on the structure of your application.

  1. Inside the Scope-down statement, in the If a request menu, select matches the statement.
  2. In the Statement section:
    1. In the Inspect field, select URI path.
    2. For the Match type, select Contains string.
    3. In the String to match field, enter the string you want to match. For example, login as shown in the Figure 8.
  3. Choose Save rule.
    Figure 8: A scope-down statement to match based on a string within a URI path

    Figure 8: A scope-down statement to match based on a string within a URI path

To exclude more than one part of your application from Bot Control.

If you have more than one part to exclude, you can use an OR logical statement to list each part in a scope-down statement.

  1. Inside the Scope-down statement, in the If a request menu, select matches at least one of the statements (OR).
  2. In the Statement 1 section, configure the following settings:
    1. Choose URI path in the Inspect field.
    2. For the Match type choose Contains string.
    3. In the String to match field enter a preferred value. For example, login.
  3. In the Statement 2 section, configure the following settings:
    1. Choose URI path in the Inspect field.
    2. For the Match type choose Starts with string.
    3. In the String to match field enter a preferred URI value. For example, payment/.
  4. Select Save rule.

Figure 9 builds on the previous example of an exact string match by adding an OR statement to protect an API named payment.

Figure 9: A scope-down statement with OR logic for more sophisticated matching

Figure 9: A scope-down statement with OR logic for more sophisticated matching

Note: The visual editor on the console supports up to five statements. To add more, edit the JSON representation of the rule on the console or use the APIs.

Prioritize verified bots that you don’t want to block

Since verified bots aren’t blocked by default, in most cases there is no need to apply extra logic to allow them through. However, there are scenarios where other AWS WAF rules might match some aspects of requests from verified bots and block them. That can hurt some metrics for SEO, or prevent links from your website from properly propagating and displaying in social media resources. If this is important for your business, then you might want to ensure you protect verified bots by explicitly allowing them in AWS WAF.

To prioritize the verified bots category

  1. In the AWS WAF menu, select Web ACLs from the left menu. Open a web ACL that you have, or follow the steps to create a web ACL. The next steps assume you already have a Bot Control rule group enabled inside the web ACL.
  2. In the web ACL, select Add rules, and then select Add my own rules and rule groups.
  3. Using the provided Rule builder, configure the following settings:
    1. Enter a name for the rule in the Name field.
    2. Under the Inspect menu, select Has a label.
    3. In the Match key field, enter the following label to match based on the label that each verified bot has:

      awswaf:managed:aws:bot-control:bot:verified

    4. In the Action section, select Allow to confirm the action on a request match
  4. Select Add rule. It’s important to place the rule after the Bot Control rule group inside your web ACL, so that the bot:verified label is available in this custom rule. To complete this, configure the following steps:
    1. In the Set rule priority section, check that the rule you just created is listed immediately after the existing Bot Control rule set. If it’s not, choose the newly created rule and select Move up or Move down until the rule is located immediately after the existing Bot Control rule set.
    2. Select Save.
Figure 10: Label match rule statement in a Rule builder with a specific match key

Figure 10: Label match rule statement in a Rule builder with a specific match key

Allow a specific bot

Labels also enable you to single out the bot you don’t want to block from the category that is blocked. One of the common examples are third-party bots that perform monitoring of your web resources.

Let’s take a look at a scenario where UptimeRobot is used to allow a specific bot. The bot falls into a category that’s being blocked by default—bot:category:monitoring. You can either exclude the whole category, which can have a wider impact on resource than you want, or allow only UptimeRobot.

To explicitly allow a specific bot

  1. Analyze CloudWatch metrics or AWS WAF logs to find the bot that is being blocked and its associated labels. Unless you want to allow the whole category, the label you would be looking for is bot:name: The example that follows is based on the label awswaf:managed:aws:bot-control:bot:name:uptimerobot.

    From the logs, you can also verify which category the bot belongs to, which is useful for configuring Scope-down statements.

  2. In the AWS WAF console, select Web ACLs from the left menu. Open a web ACL that you have, or follow the steps to create a web ACL. For the next steps, it’s assumed that you already have a Bot Control rule group enabled inside the webACL.
  3. Open the Bot Control rule set in the list inside your web ACL and choose Edit
  4. From the list of Rules find CategoryMonitoring and set to Count. This will prevent the default block action of the category.
  5. Select Scope-down statement – optional and select Scope-down statement. Then configure the following settings:
    1. Inside the Scope-down statement, in the If a request menu, choose matches all the statements (AND). This will allow you to construct the complex logic necessary to block the category but allow a specified bot.
    2. In the Statement 1 section under the Inspect menu select Has a label.
    3. In the Match key field, enter the label of the broad category that you set to count in step number 4. In this example, it is monitoring. This configuration will keep other bots from the category blocked:

      awswaf:managed:aws:bot-control:bot:category:monitoring

    4. In the Statement 2 section, select Negate statement results to allow you to exclude a specific bot.
    5. Under the Inspect menu, select Has a label.
    6. In the Match key field, enter the label that will uniquely identify the bot you want to explicitly allow. In this example, it’s uptimerobot with the following label:

      awswaf:managed:aws:bot-control:bot:name:uptimerobot

  6. Choose Save rule.
Figure 11: Label match rule statement with AND logic to single out a specific bot name from a category

Figure 11: Label match rule statement with AND logic to single out a specific bot name from a category

Note: This approach is the best practice for analyzing and, if necessary, addressing false positives situations. You can apply exclusion to any bot, or multiple bots, based on the unique bot:name: label.

Insert custom headers into requests from certain bots

There are situations when you want to further process or analyze certain requests. or implement logic that is provided by systems in the downstream. In such cases, you can use AWS WAF Bot Control to categorize the requests. Applications later in the process can then apply the intended logic on either a broad group of requests, such as all bots within a category, or as narrow as a certain bot.

To insert a custom header

  1. In the AWS WAF console, select Web ACLs from the left menu. Open a web ACL that you have, or follow the steps to create a web ACL. The next steps assume that you already have Bot Control rule group enabled inside the webACL.
  2. Open the Bot Control rule set in the list inside your web ACL and choose Edit.
  3. From the list of Rules set the targeted category to Count.
  4. Choose Save rule.
  5. In the same web ACL, choose the Add rules menu and select Add my own rules and rule groups.
  6. Using the provided Rule builder, configure the following settings:
    1. Enter a name for the rule in the Name field.
    2. Under the Inspect menu, select Has a label.
    3. In the Match key field, enter the label to match either a targeted category or a bot. This example uses the security category label:
      awswaf:managed:aws:bot-control:bot:category:security
    4. In the Action section, select Count
    5. Open Custom request – optional and select Add new custom header
    6. Enter values in the Key and Value fields that correspond to the inserted custom header key-value pair that you want to use in downstream systems. The example in Figure 12 shows this configuration.
    7. Choose Add rule.

    AWS WAF prefixes your custom header names with x-amzn-waf- when it inserts them, so when you add abc-category, your downstream system sees it as x-amzn-waf-abc-category.

Figure 12: AWS WAF rule action with a custom header inserted by the service

Figure 12: AWS WAF rule action with a custom header inserted by the service

The custom rule located after Bot Control now inserts the header into any request that it labeled as coming from bots within the security category. Then the security appliance that is after AWS WAF acts on the requests based on the header, and processes them accordingly.

This implementation can serve other scenarios. For example, using your custom headers to communicate to your Origin to append headers that will explicitly prevent caching certain content. That makes bots always get it from the Origin. Inserted headers are accessible within AWS Lambda@Edge functions and CloudFront Functions, this opens up advanced processing scenarios.

Conclusion

This post describes the primary building blocks for using Bot Control, and how you can combine and customize them to address different scenarios. It’s not an exhaustive list of the use cases that Bot Control can be fine-tuned for, but hopefully the examples provided here inspire and provide you with ideas for other implementations.

If you already have AWS WAF associated with any of your web-facing resources, you can view current bot traffic estimates for your applications based on a sample of requests currently processed by the service. Visit the AWS WAF console to view the bot overview dashboard. That’s a good starting point to consider implementing learnings from this blog to improve your bot protection.

It is early days for the feature, and it will keep gaining more capabilities, stay tuned!

 
If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on AWS WAF re:Post or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Dmitriy Novikov

Dmitriy Novikov

In his role as Senior Solutions Architect at Amazon Web Services, Dmitriy supports AWS customers to utilize emerging technologies for business value generation. He’s a technology enthusiast who gets a charge out of finding innovative solutions to complex security challenges. He enjoys sharing his learnings on architecture and best practices in blogs, whitepapers and public speaking events. Outside work, Dmitriy has a passion for reading and triathlon.

The collective thoughts of the interwebz