Architecture Monthly Magazine: Open Source

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/aws-architecture-monthly-magazine-open-source/

According to the Open Source Initiative, the term “open source” was created at a strategy session held in 1998 in Palo Alto, California, shortly after the announcement of the release of the Netscape source code. Stakeholders at that session realized that this announcement created an opportunity to educate and advocate for the superiority of an open development process.

We’ve witnessed big changes in open source in the past 22 years, and this year-end issue of Architecture Monthly looks at a few trends, including the shift from businesses only consuming open source to contributing to it. You’ll also learn how AWS open source projects are one of the ways we’re making technology less cost-prohibitive and more accessible to everyone.

In this month’s Open Source issue

  • Ask an Expert: Ricardo Sueiras, Principal Advocate for Open Source at AWS
  • Blog: How Amazon Retail Systems Run Machine Learning Predictions with Apache Spark using Deep Java Library
  • Case Study: Absa Transforms IT and Fosters Tech Talent Using AWS
  • Reference Architecture: Running WordPress on AWS
  • Blog: Simplifying Serverless Best Practices with Lambda Powertools
  • Whitepaper: Modernizing the Amazon Database Infrastructure: Migrating from Oracle to AWS
  • Quick Start: Magento on AWS
  • Related Videos: Wix, Viber, UC Santa Cruz, & Redfin

Download the magazine

How to access the magazine

We hope you’re enjoying Architecture Monthly, and we’d like to hear from you—leave us a star rating and comment on the Amazon Kindle Newsstand page or contact us anytime at [email protected].

[$] Changed-block tracking and differential backups in QEMU

Post Syndicated from original https://lwn.net/Articles/837053/rss

The block layer of QEMU, the open-source machine emulator and virtualizer, forms the backbone of many storage virtualization features: the QEMU Copy-On-Write (QCOW2) disk-image file format, disk image chains, point-in-time snapshots, backups, and more. At the recently concluded 2020 KVM Forum virtual event, Eric Blake gave a talk on the current work in QEMU and libvirt to make differential backups more powerful. As the name implies, “differential backups” address the efficiency problems of full disk backups: space usage and speed of backup creation.

Security updates for Tuesday

Post Syndicated from original https://lwn.net/Articles/837538/rss

Security updates have been issued by Debian (libdatetime-timezone-perl, openldap, pacemaker, and restic), Fedora (libmediainfo, mediainfo, mingw-python3, and seamonkey), Gentoo (libexif), openSUSE (raptor), Oracle (kernel and microcode_ctl), Scientific Linux (firefox), SUSE (kernel-firmware, postgresql, postgresql96, postgresql10 and postgresql12, and raptor), and Ubuntu (openldap and postgresql-10, postgresql-12, postgresql-9.5).

Anchoring Trust: A Hardware Secure Boot Story

Post Syndicated from Derek Chamorro original https://blog.cloudflare.com/anchoring-trust-a-hardware-secure-boot-story/


As a security company, we pride ourselves on finding innovative ways to protect our platform to, in turn, protect the data of our customers. Part of this approach is implementing progressive methods in protecting our hardware at scale. While we have blogged about how we address security threats from application to memory, the attacks on hardware, as well as firmware, have increased substantially. The data cataloged in the National Vulnerability Database (NVD) has shown the frequency of hardware and firmware-level vulnerabilities rising year after year.

Technologies like secure boot, common in desktops and laptops, have been ported over to the server industry as a method to combat firmware-level attacks and protect a device’s boot integrity. These technologies require that you create a trust ‘anchor’, an authoritative entity for which trust is assumed and not derived. A common trust anchor is the system Basic Input/Output System (BIOS) or the Unified Extensible Firmware Interface (UEFI) firmware.

While this ensures that the device boots only signed firmware and operating system bootloaders, does it protect the entire boot process? What protects the BIOS/UEFI firmware from attacks?

The Boot Process

Before we discuss how we secure our boot process, we will first go over how we boot our machines.

[Figure: the boot process]

The above image shows the following sequence of events:

  • After powering on the system (through a baseboard management controller (BMC) or physically pushing a button on the system), the system unconditionally executes the UEFI firmware residing on a flash chip on the motherboard.
  • UEFI performs some hardware and peripheral initialization and executes the Preboot Execution Environment (PXE) code, which is a small program that boots an image over the network and usually resides on a flash chip on the network card.
  • PXE sets up the network card, and downloads and executes a small bootloader program through an open source boot firmware called iPXE.
  • iPXE loads a script that automates a sequence of commands for the bootloader to know how to boot a specific operating system (sometimes several of them). In our case, it loads our Linux kernel, initrd (this contains device drivers which are not directly compiled into the kernel), and a standard Linux root filesystem. After loading these components, the bootloader executes and hands off the control to the kernel.
  • Finally, the Linux kernel loads any additional drivers it needs and starts applications and services.

UEFI Secure Boot

Our UEFI secure boot process is fairly straightforward, albeit customized for our environments. After loading the UEFI firmware from the bootloader, an initialization script defines the following variables:

Platform Key (PK): It serves as the cryptographic root of trust for secure boot, giving capabilities to manipulate and/or validate the other components of the secure boot framework.

Trusted Database (DB): Contains a signed (by platform key) list of hashes of all PCI option ROMs, as well as a public key, which is used to verify the signature of the bootloader and the kernel on boot.

These variables are respectively the master platform public key, which is used to sign all other resources, and an allow list database, containing other certificates, binary file hashes, etc. In default secure boot scenarios, Microsoft keys are used by default. At Cloudflare we use our own, which makes us the root of trust for UEFI:

[Figure: Cloudflare keys as the UEFI secure boot root of trust]
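
One quick way to confirm from userspace that UEFI Secure Boot is actually being enforced on a booted Linux machine is to read the SecureBoot EFI variable. The following is a minimal sketch, not part of Cloudflare’s tooling, assuming efivarfs is mounted at its usual location; the first four bytes of an efivarfs file hold the variable attributes, and the payload follows.

    from pathlib import Path

    # SecureBoot lives under the EFI global variable namespace GUID.
    SECURE_BOOT = Path(
        "/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
    )

    def secure_boot_enabled() -> bool:
        """Return True if the firmware reports Secure Boot as enabled."""
        raw = SECURE_BOOT.read_bytes()
        payload = raw[4:]  # skip the 4-byte attribute header
        return bool(payload) and payload[0] == 1

    if __name__ == "__main__":
        print("Secure Boot enabled:", secure_boot_enabled())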

But, by setting our trust anchor in the UEFI firmware, what attack vectors still exist?

UEFI Attacks

As stated previously, firmware and hardware attacks are on the rise. It is clear from the figure below that firmware-related vulnerabilities have increased significantly over the last 10 years, especially since 2017, when the hacker community started attacking the firmware on different platforms:

[Figure: firmware-related vulnerabilities reported per year]

This upward trend, coupled with recent malware findings in UEFI, shows that trusting firmware is becoming increasingly problematic.

By tainting the UEFI firmware image, you poison the entire boot trust chain. The ability to trust firmware integrity is important beyond secure boot. For example, if you can’t trust the firmware not to be compromised, you can’t trust things like trusted platform module (TPM) measurements to be accurate, because the firmware itself is responsible for taking those measurements (a TPM is not an inline security mechanism; it depends on the firmware to interact and cooperate with it). Compromised firmware may be crafted to extend measurements that are accepted by a remote attestor but that don’t represent what’s actually being loaded locally, which calls the entire measured boot and remote attestation procedure into question.


If we can’t trust firmware, then hardware becomes our last line of defense.

Hardware Root of Trust

Early this year, we made a series of blog posts on why we chose AMD EPYC processors for our Gen X servers. With security in mind, we started turning on features that were available to us and set forth the plan of using AMD silicon as a Hardware Root of Trust (HRoT).

Platform Secure Boot (PSB) is AMD’s implementation of hardware-rooted boot integrity. Why is it better than a UEFI firmware-based root of trust? Because it is intended to assert, by a root of trust anchored in the hardware, the integrity and authenticity of the System ROM image before it can execute. It does so by performing the following actions:

  • Authenticates the first block of BIOS/UEFI prior to releasing x86 CPUs from reset.
  • Authenticates the System Read-Only Memory (ROM) contents on each boot, not just during updates.
  • Moves the UEFI Secure Boot trust chain to immutable hardware.

This is accomplished by the AMD Platform Security Processor (PSP), an ARM Cortex-A5 microcontroller that is an immutable part of the system on chip (SoC). The PSB consists of two components:

On-chip Boot ROM

  • Embeds a SHA384 hash of an AMD root signing key
  • Verifies and then loads the off-chip PSP bootloader located in the boot flash

Off-chip Bootloader

  • Locates the PSP directory table that allows the PSP to find and load various images
  • Authenticates first block of BIOS/UEFI code
  • Releases CPUs after successful authentication
  1. The PSP secures the On-chip Boot ROM code, loads the off-chip PSP firmware into PSP static random access memory (SRAM) after authenticating the firmware, and passes control to it.
  2. The Off-chip Bootloader (BL) loads and specifies applications in a specific order (whether or not the system goes into a debug state and then a secure EFI application binary interface to the BL)
  3. The system continues initialization through each bootloader stage.
  4. If each stage passes, then the UEFI image is loaded and the x86 cores are released.

Now that we know the booting steps, let’s build an image.

Build Process

Public Key Infrastructure

Before the image gets built, a public key infrastructure (PKI) is created to generate the key pairs involved for signing and validating signatures:

[Figure: PKI used for signing the BIOS image]

Our original design manufacturer (ODM), as a trust extension, creates a key pair (public and private) that is used to sign the first segment of the BIOS (with the private key) and to validate that segment on boot (with the public key).

On AMD’s side, the AMD root signing private key is used to sign and certify the public key created by the ODM. That certification is validated against AMD’s root signing public key, which is stored as a hash value in the SPI-ROM (the signature scheme is RSASSA-PSS with a 4096-bit key, using SHA-384 for both the message digest and mask generation).

Private keys (both AMD and ODM) are stored in hardware security modules.

Because of the way the PKI mechanisms are built, the system cannot be compromised if only one of the keys is leaked. This is an important piece of the trust hierarchy that is used for image signing.

Certificate Signing Request

Once the PKI is established, a BIOS signing key pair is created, together with a certificate signing request (CSR). Creating the CSR uses standard subject name fields that many are familiar with:

  • countryName
  • stateOrProvinceName
  • localityName
  • organizationName

In addition to the fields above, the CSR will contain a serialNumber field, a 32-bit integer value represented in ASCII HEX format that encodes the following values:

  • PLATFORM_VENDOR_ID: An 8-bit integer value assigned by AMD for each ODM.
  • PLATFORM_MODEL_ID: A 4-bit integer value assigned to a platform by the ODM.
  • BIOS_KEY_REVISION_ID: A 4-bit key revision set by the ODM, encoded as a unary counter value.
  • DISABLE_SECURE_DEBUG: Fuse bit that controls whether the secure debug unlock feature is permanently disabled.
  • DISABLE_AMD_BIOS_KEY_USE: Fuse bit that controls whether a BIOS signed by an AMD key (with vendor ID == 0) is permitted to boot on a CPU with a non-zero vendor ID.
  • DISABLE_BIOS_KEY_ANTI_ROLLBACK: Fuse bit that controls whether BIOS key anti-rollback feature is enabled.

Remember these values, as we’ll show how we use them in a bit. The DISABLE values are optional, but recommended depending on your security posture and comfort level.

Upon processing the CSR, AMD provides the public part of the BIOS signing key, signed and certified by the AMD root signing key, as an RSA Public Key Token (.stkn) file.
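
To make the serialNumber encoding concrete, here is a rough sketch of how those fields could be packed into a 32-bit value and carried in a CSR, using the Python cryptography library. The bit positions and widths shown are illustrative assumptions rather than AMD’s documented layout, and the subject values are placeholders.

    from cryptography import x509
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa
    from cryptography.x509.oid import NameOID

    def pack_serial_number(vendor_id, model_id, key_rev,
                           disable_secure_debug, disable_amd_bios_key_use,
                           disable_anti_rollback):
        """Pack the fields into a 32-bit value, returned as ASCII hex.

        Bit positions here are assumptions for illustration only; consult the
        AMD PSB documentation for the authoritative layout.
        """
        value = (vendor_id & 0xFF)                    # 8-bit PLATFORM_VENDOR_ID
        value |= (model_id & 0xF) << 8                # 4-bit PLATFORM_MODEL_ID
        value |= (key_rev & 0xF) << 12                # 4-bit BIOS_KEY_REVISION_ID
        value |= int(disable_secure_debug) << 16      # DISABLE_SECURE_DEBUG
        value |= int(disable_amd_bios_key_use) << 17  # DISABLE_AMD_BIOS_KEY_USE
        value |= int(disable_anti_rollback) << 18     # DISABLE_BIOS_KEY_ANTI_ROLLBACK
        return f"{value:08X}"

    # Build a CSR whose subject carries the packed value in its serialNumber field.
    key = rsa.generate_private_key(public_exponent=65537, key_size=4096)
    csr = (
        x509.CertificateSigningRequestBuilder()
        .subject_name(x509.Name([
            x509.NameAttribute(NameOID.COUNTRY_NAME, "US"),
            x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, "Texas"),
            x509.NameAttribute(NameOID.LOCALITY_NAME, "Austin"),
            x509.NameAttribute(NameOID.ORGANIZATION_NAME, "Example ODM"),
            x509.NameAttribute(NameOID.SERIAL_NUMBER,
                               pack_serial_number(0x42, 0x1, 0x1, True, True, True)),
        ]))
        .sign(key, hashes.SHA384())
    )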

Putting It All Together

The following is a step-by-step illustration of how signed UEFI firmware is built:

[Figure: building the signed UEFI firmware image]
  1. The ODM submits their public key used for signing Cloudflare images to AMD.
  2. AMD signs this key using their RSA private key and passes it back to ODM.
  3. The AMD public key and the signed ODM public key are part of the final BIOS SPI image.
  4. The BIOS source code is compiled and various BIOS components (PEI Volume, Driver eXecution Environment (DXE) volume, NVRAM storage, etc.) are built as usual.
  5. The PSP directory and BIOS directory are built next. These directory tables point to the location of the various firmware entities.
  6. The ODM builds the signed BIOS Root of Trust Measurement (RTM) signature based on the blob of the BIOS PEI volume concatenated with the BIOS Directory header, and generates the digital signature of this using the private portion of the ODM signing key. The SPI location for the signed BIOS RTM code is finally updated with this signature blob.
  7. Finally, the BIOS binaries, PSP directory, BIOS directory and various firmware binaries are combined to build the SPI BIOS image.

Enabling Platform Secure Boot

Platform Secure Boot is enabled at boot time with a PSB-ready firmware image. PSB is configured using a region of one-time programmable (OTP) fuses, specified for the customer. OTP fuses are on-chip non-volatile memory (NVM) that permits data to be written to memory only once. There is NO way to roll the fused CPU back to an unfused one.

Enabling PSB in the field will go through two steps: fusing and validating.

  • Fusing: Fuse the values assigned in the serialNumber field that was generated in the CSR
  • Validating: Validate the fused values and the status code registers

If validation is successful, the BIOS RTM signature is validated using the ODM BIOS signing key, PSB-specific registers (MP0_C2P_MSG_37 and MP0_C2P_MSG_38) are updated with the PSB status and fuse values, and the x86 cores are released.

If validation fails, the registers above are updated with the PSB error status and fuse values, and the x86 cores stay in a locked state.

Let’s Boot!

With a signed image in hand, we are ready to enable PSB on a machine. We chose to deploy this on a few machines that had an updated, unsigned AMI UEFI firmware image, in this case version 2.16. We use a couple of different firmware update tools, so, after a quick script, we ran an update to change the firmware version from 2.16 to 2.18C (the signed image):

$ sudo ./UpdateAll.sh
Bin file name is ****.218C

BEGIN

+---------------------------------------------------------------------------+
|                 AMI Firmware Update Utility v5.11.03.1778                 |      
|                 Copyright (C)2018 American Megatrends Inc.                |                       
|                         All Rights Reserved.                              |
+---------------------------------------------------------------------------+
Reading flash ............... done
FFS checksums ......... ok
Check RomLayout ........ ok.
Erasing Boot Block .......... done
Updating Boot Block ......... done
Verifying Boot Block ........ done
Erasing Main Block .......... done
Updating Main Block ......... done
Verifying Main Block ........ done
Erasing NVRAM Block ......... done
Updating NVRAM Block ........ done
Verifying NVRAM Block ....... done
Erasing NCB Block ........... done
Updating NCB Block .......... done
Verifying NCB Block ......... done

Process completed.

After the update completed, we rebooted:

[Image: system rebooting after the firmware update]

After a successful install, we validated that the image was correct via the sysfs information provided in the dmidecode output:

[Image: dmidecode output showing the signed firmware version]

Testing

With a signed image installed, we wanted to test that it worked, meaning: what if an unauthorized user installed their own firmware image? We did this by downgrading the image back to an unsigned image, 2.16. In theory, the machine shouldn’t boot as the x86 cores should stay in a locked state. After downgrading, we rebooted and got the following:

[Image: console after attempting to boot the unsigned image]

This isn’t a powered down machine, but the result of booting with an unsigned image.


Flashing back to a signed image is done by running the same flashing utility through the BMC, so we weren’t bricked. Nonetheless, the results were successful.

Naming Convention

Our standard UEFI firmware image names are alphanumeric, making it difficult to distinguish by name between a signed and an unsigned image (for example, v2.16A vs. v2.18C). There isn’t a remote attestation capability (yet) to probe the PSB status registers or to store these values by means of a signature (e.g., a TPM quote). As we transitioned to PSB, we wanted to make signed images easier to identify by adding a specific suffix, -sig, that we could query in userspace and surface via Prometheus. Changing the file name alone wouldn’t do it, so we had to make the following changes to reflect a new naming convention for signed images:

  • Update filename
  • Update BIOS version for setup menu
  • Update post message
  • Update SMBIOS type 0 (BIOS version string identifier)

Signed images now have a -sig suffix:

~$ sudo dmidecode -t0
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.3.0 present.
# SMBIOS implementations newer than version 3.2.0 are not
# fully supported by this version of dmidecode.

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
	Vendor: American Megatrends Inc.
	Version: V2.20-sig
	Release Date: 09/29/2020
	Address: 0xF0000
	Runtime Size: 64 kB
	ROM Size: 16 MB
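
Because the suffix is carried in SMBIOS, it can be read straight from sysfs and turned into a metric without any vendor tooling. The sketch below is a hypothetical exporter, not Cloudflare’s actual one: it reads the BIOS version string and publishes a gauge indicating whether a signed image is running.

    import time
    from pathlib import Path

    from prometheus_client import Gauge, start_http_server

    BIOS_VERSION_PATH = Path("/sys/class/dmi/id/bios_version")

    bios_signed = Gauge(
        "bios_signed_image",
        "1 if the running BIOS version carries the -sig suffix, else 0",
        ["version"],
    )

    def scrape():
        version = BIOS_VERSION_PATH.read_text().strip()  # e.g. "V2.20-sig"
        bios_signed.labels(version=version).set(1 if version.endswith("-sig") else 0)

    if __name__ == "__main__":
        start_http_server(9102)  # port chosen arbitrarily for this sketch
        while True:
            scrape()
            time.sleep(60)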

Conclusion

Finding weaknesses in firmware is a challenge that many attackers have taken on. Attacks that physically manipulate the firmware used for performing hardware initialization during the booting process can invalidate many of the common secure boot features that are considered industry standard. By implementing a hardware root of trust that is used for code signing critical boot entities, your hardware becomes a ‘first line of defense’ in ensuring that your server hardware and software integrity can derive trust through cryptographic means.

What’s Next?

While this post discussed our current, AMD-based hardware platform, how will this affect our future hardware generations? One of the benefits of working with diverse vendors like AMD and Ampere (ARM) is that we can ensure they are baking in our desired platform security by default (which we’ll speak about in a future post), making our hardware security outlook that much brighter 😀.

Q&A with NASA engineers behind Raspberry Pi–powered ISS Mimic

Post Syndicated from NASA Engineers original https://www.raspberrypi.org/blog/qa-with-nasa-engineers-behind-raspberry-pi-powered-iss-mimic/

Did you see the coolest International Space Station (ISS) on Earth on the blog last week? ISS Mimic is powered by Raspberry Pi, mirrors exactly what the real ISS is doing in orbit, and was built by NASA engineers to make the ISS feel more real for Earth-bound STEAM enthusiasts.

Here’s (most of) the team behind ISS Mimic

The team launched ISS Mimic in celebration of 20 years of continuous human presence in space on the ISS. And they’ve been getting lots of questions since we posted about their creation, so we asked them back to fill you in with a quick Q&A.

And here are newbies Dallas and Estefannie (Estefannie made the ISS Mimic video)

1. Since this is NASA-related, “MIMIC” must be an acronym, right?

Yes, we forced one: “Mechatronic Instantiated Model, Interactively Controlled”

2. What’s your subtitle? 

“The second-most complicated International Space Station ever made”. We also like “1/100th scale for 1/100,000,000th cost”

3. Wait, are US tax dollars paying for you to make this?

No, it’s a volunteer project, but we do get lots of support. It’s done on our own time and money — though many NASA types and others have kicked in to help buy materials. 

ISS Mimic, filmed by YouTube’s finest Estefannie Explains it All

4. So you have supporters?

Yes — mostly other organisations that we have teamed up with. We partner with a non-profit makerspace near NASA, Creatorspace, for tools, materials, and outreach. And an awesome local 3D printer manufacturer, re:3D, has joined us and printed our (large) solar panels for free, and is helping to refine our models. They are also working towards making a kit of parts for sale for those who don’t have a printer or the time to print all the pieces, with a discount for educators.

Particularly helpful has been Space Center Houston (NASA’s visitor center), who invited us to present to the public and at an educator conference (pre-COVID), and allowed us to spend a full day filming in their beautiful facility. Our earliest supporter was Boeing, who we‘ve worked with to facilitate outreach to educators and students from the start.

The real International Space Station (ISS) in orbit

5. How long have you been working on this?

5 years — a looong time. We spent much effort early on to establish the scale and feasibility and test the capabilities of 3D printing. We maintained a hard push to keep the materials cost down and reduce build time/complexity for busy educators. We always knew we’d use Raspberry Pi for the brain, but were looking for less costly options for the mechatronics. We’d still like to cut the cost down a lot to make the project more attainable for lower-income schools and individuals.

6. Have you done any outreach so far?

All of the support has allowed us to take our prototype to schools and STEM events locally. But we really want this to be built around the world to reach those who don’t have much connection to space exploration and hands-on STEM. The big build is probably most suitable for teens and adults, while the alternative builds (in-work) would be much more approachable for younger students.

‘ISS Mimic’ on display

7. So, is this just for schools?

No, not at all. Our focus is to make it viable for schools/educators — in cost and build complexity — but we want any space nerd to be able to build their own and help drive the design.

8. Biggest challenge?

Gravity. And time to work on the project… and trying to keep the cost down.

9. What about a Lunar Gateway or Habitat version of ISS Mimic?

It’s on our radar! Another build that’s screaming to be made is hacking the LEGO ISS model (released this year) to rotate its joints and light LEDs.

Raspberry Pi on the real ISS

There are two Raspberry Pi computers aboard the real ISS right now! And even better, young people have the chance to write Python code that will run on them — IN SPACE — as part of the European Astro Pi Challenge.

Tell the young space enthusiast in your life about Astro Pi to inspire them to try coding! All the info lives at astro-pi.org.

The post Q&A with NASA engineers behind Raspberry Pi–powered ISS Mimic appeared first on Raspberry Pi.

Managing your AWS Outposts capacity using Amazon CloudWatch and AWS Lambda

Post Syndicated from Shubha Kumbadakone original https://aws.amazon.com/blogs/compute/managing-your-aws-outposts-capacity-using-amazon-cloudwatch-and-aws-lambda/

This post is authored by Carlos Castro, AWS Principal Solutions Architect and Kevin Wang, AWS Associate Solutions Architect

Customers are excited about using AWS Outposts to bring services and capabilities available on AWS to their on-premises environments for workloads that require low-latency, local data processing, or local data storage. As part of this, the responsibility for managing capacity shifts from AWS to the customer. Customers can purchase AWS Outposts configurations from a broad selection of instance types based on the use cases they plan to run on AWS Outposts. When additional capacity is needed, customers can expand their AWS Outposts. When operating in an AWS Region, customers don’t have to worry about redundant capacity. However, with AWS Outposts, customers must ensure that there is sufficient compute and storage capacity to restore services in the event of a hardware or software failure. This allows customers to meet the business continuity and disaster recovery (BCDR) requirements for the workloads that run on their Outpost.

This post shows how you can combine Region-based AWS services for a common management task, ensuring there is available capacity to support an N+M high availability model, where N is the required capacity and M is the spare capacity designed into the solution. I monitor the Amazon EC2 and Amazon EBS capacity on an Outpost and show how to automate governance to ensure that there are available resources to recover from failures. With this, customers can create a “hot-spare” failover mechanism.

Overview of the solution

I use integrations between services in the AWS Outposts’ home Region with the services deployed on the Outpost to accomplish the goal. AWS Outposts provides available and utilized capacity metrics in Amazon CloudWatch. It is a best practice for administrators to create Amazon CloudWatch Events to monitor capacity. For this use case, events trigger fan-out actions to inform (via Amazon Simple Notification Service) and act (via an AWS Lambda function) on the condition. The action from an event can include triggering a workflow to stop low-priority resources, restricting the use of remaining capacity to a subset of the administrators or sending a notification to the appropriate team to plan adding incremental capacity. In this case, I act on the event by updating the permissions policy associated with administrators in AWS Identity and Access Management (IAM). If the threshold has been exceeded, I remove the capability to create more resources by updating an AWS IAM policy. This action allows me to reserve that capacity in the event of a failure. Once the threshold normalizes, I return permissions to the administrators. AWS administrators have the flexibility of assigning the IAM policies you control with the solution to selected IAM users, groups, or roles. For example, application system administrators can be assigned the policies with dynamic resource creation permissions while AWS Outposts infrastructure administrators maintain full access to creating resources.

The following image represents this flow.

Solution Architecture Diagram and Workflow

Tutorial

Now, you are ready to set up this environment. First, create resources using the AWS Management Console and AWS CloudFormation. You can also use the AWS Command Line Interface (CLI) or your preferred infrastructure as code framework.

The suggested sequence of steps is as follows:

  1. Create AWS IAM policies
  2. Launch AWS CloudFormation stack
  3. Review created resources
  4. Validate your work

Prerequisites

For this walkthrough, you must have the following prerequisites:

Before you deploy the solution’s resources, clone the solution’s GitHub repository. Once you’ve cloned the repository, you must upload a copy of the compressed AWS Lambda code artifacts (.zip files) to the top-level folder of an S3 bucket in your account.

Create AWS IAM policies

This approach updates AWS IAM policies to reflect the current capacity available in your AWS Outpost. This enables users to create resources (such as EC2 instances) when there is capacity available and denies them the ability to do so when the capacity threshold has been exceeded. Since this solution automates  remediation via AWS Lambda, you must create a policy that allows the function to make changes in IAM. These permissions are restricted to creating and deleting policy versions. Please review the IAM service documentation for additional information or more in-depth examples.

To create an IAM policy

  1. Log in to AWS IAM console.
  2. Select Policies from the left side navigation tree and click the Create Policy button.
  3. Navigate to the JSON tab and replace the contents with the code below.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": [
                    "arn:aws:ec2:*:*:subnet/",
                    "arn:aws:ec2:*:*:network-interface/*",
                    "arn:aws:ec2:*:*:instance/*",
                    "arn:aws:ec2:*:*:volume/*",
                    "arn:aws:ec2:*::image/ami-*",
                    "arn:aws:ec2:*:*:key-pair/*",
                    "arn:aws:ec2:*:*:security-group/*"
                ]
            }
        ]
     }
    
  4. Select Review Policy.
  5. Provide a name for your policy, e.g., outposts-ec2-capacity and select Create Policy.
  6. Record the Amazon Resource Name (ARN) for the created policy in your text editor of choice.
  7. Repeat steps 2-6 to create a capacity policy for EBS, e.g., outposts-ebs-capacity using the JSON code below:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "ec2:CreateVolume",
                "Resource": [
                    "arn:aws:ec2:*:*:subnet/",
                    "arn:aws:ec2:*:*:network-interface/*",
                    "arn:aws:ec2:*:*:instance/*",
                    "arn:aws:ec2:*:*:volume/*",
                    "arn:aws:ec2:*::image/ami-*",
                    "arn:aws:ec2:*:*:key-pair/*",
                    "arn:aws:ec2:*:*:security-group/*"
                ]
            }
        ]
     }
    
  8. Assign the created policies to a role mapped to your Outposts users.

Now, you have the correct IAM policies in place, and you can move on to launch an AWS CloudFormation stack.

Launch AWS CloudFormation stack

Create an AWS CloudFormation stack using the outposts-manage-capacity.yml template that is included in your local clone of the repository. For illustration purposes, the template creates resources to monitor a subset of the Amazon EC2 instance types supported on AWS Outposts. If you have a different configuration, adjust the template by modifying the CloudWatch alarm resources accordingly (Type: AWS::CloudWatch::Alarm).
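
If you prefer to script the deployment rather than use the console, a hedged boto3 sketch could look like the following. The two parameter names come from the thresholds described below (ComputeThreshold, StorageThreshold); the values and stack name are placeholders, and your copy of the template may define additional parameters, such as the S3 location of the Lambda artifacts.

    import boto3

    cfn = boto3.client("cloudformation")

    # Launch the stack from the template included in the repository clone.
    with open("outposts-manage-capacity.yml") as f:
        template_body = f.read()

    cfn.create_stack(
        StackName="outposts-manage-capacity",
        TemplateBody=template_body,
        Parameters=[
            {"ParameterKey": "ComputeThreshold", "ParameterValue": "80"},
            {"ParameterKey": "StorageThreshold", "ParameterValue": "80"},
            # Add any other parameters your copy of the template defines.
        ],
        Capabilities=["CAPABILITY_IAM"],  # required if the template creates IAM resources
    )

    # Wait for stack creation to finish before reviewing the created resources.
    cfn.get_waiter("stack_create_complete").wait(StackName="outposts-manage-capacity")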

Provide all necessary parameters using the following descriptions:

Review created resources

Now that you have launched your AWS CloudFormation stack, you are ready to review the resources you created. In order to automate the monitoring of our capacity, I use Amazon CloudWatch alarms to monitor the default metrics published by AWS Outposts and trigger an action when a particular condition has been met. Review the metric alarms created by navigating to the Amazon CloudWatch console. There are alarms for sample instance types in your Outposts. There is also an alarm for EBS capacity. The thresholds for these alarms are based on the parameters (ComputeThreshold, StorageThreshold) provided to your AWS CloudFormation stack.

I use an Amazon CloudWatch Event rule to fan-out notifications about capacity events in our AWS Outposts. Review the event rule created by navigating to the Amazon CloudWatch console. The rule triggers an action to send a message to an SNS topic to inform administrators via email. Review the created topics by navigating to the Amazon SNS console. After your email subscription is created, you must confirm it using the confirmation email message delivered to the account provided.

You then automate remediation by creating a second action in the event rule to trigger the AWS Lambda function. Remediation is based on the current capacity conditions and the state of our CloudWatch alarms. These functions use the boto3 Python library to update the version of our previously created AWS IAM policy and reflect the correct privilege for our administrators based on the system state. Review the created AWS Lambda functions using the console. Navigate to the function’s Configuration tab to see that Amazon CloudWatch Events are configured as triggers, as shown in the figure below.

Lambda Function’s Configuration tab to check if CloudWatch events are configured as a trigger
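
The remediation logic itself is small. The following is a simplified sketch of what such a handler could look like, not the function from the repository: it swaps the default version of the capacity policy between a restrictive document and an effectively inert baseline, pruning old versions to stay under IAM’s five-version limit. The policy ARN is read from a hypothetical environment variable.

    import json
    import os

    import boto3

    iam = boto3.client("iam")
    POLICY_ARN = os.environ["CAPACITY_POLICY_ARN"]  # hypothetical environment variable

    DENY_LAUNCH = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "ec2:RunInstances",
            "Resource": "*",
        }],
    }

    BASELINE = {
        "Version": "2012-10-17",
        "Statement": [{
            # Effectively inert placeholder: denies launches only into a subnet
            # that should never exist. Substitute whatever baseline you want.
            "Effect": "Deny",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:*:*:subnet/subnet-00000000000000000",
        }],
    }

    def set_policy(document):
        """Make `document` the default version of the capacity policy."""
        # IAM keeps at most five versions per policy, so prune non-default
        # versions before creating a new one.
        for version in iam.list_policy_versions(PolicyArn=POLICY_ARN)["Versions"]:
            if not version["IsDefaultVersion"]:
                iam.delete_policy_version(PolicyArn=POLICY_ARN,
                                          VersionId=version["VersionId"])
        iam.create_policy_version(PolicyArn=POLICY_ARN,
                                  PolicyDocument=json.dumps(document),
                                  SetAsDefault=True)

    def handler(event, context):
        # Assumes the rule forwards a CloudWatch alarm state-change event.
        state = event["detail"]["state"]["value"]
        set_policy(DENY_LAUNCH if state == "ALARM" else BASELINE)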

Validate your work

To validate your work, assign the IAM policies created for Amazon EC2 and Amazon EBS to the IAM users or IAM roles assigned to your AWS Outposts administrators. Once assigned, use one of these IAM principals to create enough EC2 or EBS resources on your AWS Outpost to breach the thresholds you previously established. This depends on your AWS Outposts configuration and the capacity it is populated with. When capacity is exceeded, your administrators should receive an error similar to the following image when attempting to create resources, in addition to an email notification of the condition. You can also simulate this condition by using the SetAlarmState API through the SDK or CLI to modify the state of your target alarms.
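
For example, a short sketch of forcing one of the solution’s alarms into the ALARM state for testing; the alarm name shown is hypothetical, so use the names created by your stack.

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Force the alarm into ALARM so the event rule and Lambda remediation fire
    # without actually consuming Outpost capacity. The alarm returns to its
    # real state at the next metric evaluation.
    cloudwatch.set_alarm_state(
        AlarmName="outposts-c5-large-capacity-alarm",  # hypothetical alarm name
        StateValue="ALARM",
        StateReason="Testing the capacity remediation workflow",
    )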

This error should be removed once the capacity returns to the normal state. Please note that AWS Outposts CloudWatch metrics have a default resolution of five minutes, so allow time for a new metric period to register before expecting the changes to take effect.

Cleaning up

To avoid incurring future charges, delete the resources created during this blog by deleting the AWS CloudFormation stack.

Conclusion

One of the benefits of AWS Outposts is a truly consistent hybrid experience. This post showed how you can use the same management and monitoring services in the AWS Cloud across on-premises and in-Region AWS environments. This simplifies the responsibility to manage capacity to meet your workload requirements. For additional information on the benefits of AWS Outposts, visit its product detail page.

Investigate VPC flow with Amazon Detective

Post Syndicated from Ross Warren original https://aws.amazon.com/blogs/security/investigate-vpc-flow-with-amazon-detective/

Many Amazon Web Services (AWS) customers need enhanced insight into IP network flow. Traditionally, cost, the complexity of collection, and the time required for analysis have led to incomplete investigations of network flows. Having good telemetry is paramount, and VPC Flow Logs are a very important part of a robust centralized logging architecture. The information that VPC Flow Logs provide is frequently used by security analysts to determine the scope of security issues, to validate that network access rules are working as expected, and to help analysts investigate issues and diagnose network behaviors. Flow logs capture information about the IP traffic going to and from EC2 interfaces in a VPC. Each record describes aspects of the traffic flow, such as where it originated and where it was sent, what network ports were used, and how many bytes were sent.
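
To make that concrete, a default-format flow log record is a single space-separated line. The sketch below parses one record with made-up values; the field order follows the default (version 2) format.

    # Default (version 2) VPC Flow Logs fields, in order.
    FIELDS = [
        "version", "account_id", "interface_id", "srcaddr", "dstaddr",
        "srcport", "dstport", "protocol", "packets", "bytes",
        "start", "end", "action", "log_status",
    ]

    # A sample record with made-up values.
    record = ("2 123456789012 eni-0123456789abcdef0 198.51.100.254 172.16.0.7 "
              "51422 22 6 20 4249 1605012345 1605012405 ACCEPT OK")

    flow = dict(zip(FIELDS, record.split()))
    print(flow["srcaddr"], "->", flow["dstaddr"],
          "port", flow["dstport"], flow["action"], flow["bytes"], "bytes")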

Amazon Detective now enables you to interactively examine the details of the virtual private cloud (VPC) network flows of your Amazon Elastic Compute Cloud (Amazon EC2) instances. Amazon Detective makes it easy to analyze, investigate, and quickly identify the root cause of potential security issues or suspicious activities. Detective automatically collects VPC flow logs from your monitored accounts, aggregates them by EC2 instance, and presents visual summaries and analytics about these network flows. Detective doesn’t require VPC Flow Logs to be configured and doesn’t impact existing flow log collection.

In this blog post, I describe how to use the new VPC flow feature in Detective to investigate an UnauthorizedAccess:EC2/TorClient finding from Amazon GuardDuty. Amazon GuardDuty is a threat detection service that continuously monitors for malicious activity and unauthorized behavior to protect your AWS accounts, workloads, and data stored in Amazon S3. GuardDuty documentation states that this alert can indicate unauthorized access to your AWS resources with the intent of hiding the unauthorized user’s true identity. I’ll demonstrate how to use Amazon Detective to investigate an instance that was flagged by Amazon GuardDuty to determine whether it is compromised or not.
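
If you’d rather pull the same finding programmatically than click through the console, a minimal boto3 sketch might look like this; the detector ID is a placeholder.

    import boto3

    guardduty = boto3.client("guardduty")
    detector_id = "12abc34d567e8fa901bc2d34e56789f0"  # placeholder detector ID

    # List UnauthorizedAccess:EC2/TorClient findings for this detector.
    finding_ids = guardduty.list_findings(
        DetectorId=detector_id,
        FindingCriteria={
            "Criterion": {
                "type": {"Eq": ["UnauthorizedAccess:EC2/TorClient"]},
            }
        },
    )["FindingIds"]

    if finding_ids:
        findings = guardduty.get_findings(DetectorId=detector_id,
                                          FindingIds=finding_ids)["Findings"]
        for finding in findings:
            instance_id = finding["Resource"]["InstanceDetails"]["InstanceId"]
            print(finding["Id"], finding["Severity"], instance_id)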

Starting the investigation in GuardDuty

In my GuardDuty console, I’m going to select the UnauthorizedAccess:EC2/TorClient finding shown in Figure 1, choose the Actions menu, and select Investigate.
 

Figure 1: Investigating from the GuardDuty console

This opens a new browser tab and launches the Amazon Detective console, where I’m presented with the profile page for this finding, shown in Figure 2. You must have Detective enabled to pivot between a GuardDuty finding and Detective. Detective provides profile pages for supported GuardDuty findings and AWS resources (for example, IP address, EC2 instance, user, and role) that include information and data visualizations that summarize observed behaviors and give guidance for interpreting them. Profiles help analysts to determine whether the finding is of genuine concern or a false positive. For resources, profiles provide supporting details for an investigation into a finding or for a general hunt for suspicious activity.
 

Figure 2: Finding profile panel

In this case, the profile page for this GuardDuty UnauthorizedAccess:EC2/TorClient finding provides contextual and behavioral data about the EC2 instance on which GuardDuty has noted the issue. As I dive into this finding, I’m going to be asking questions that help assess whether the instance was in fact accessed unintentionally, such as, “What IP port or network service was in use at that time?,” “Were any large data transfers involved?,” “Was the traffic allowed by my security groups?” Profile pages in Detective organize content that helps security analysts investigate GuardDuty findings, examine unexpected network behavior, and identify other AWS resources that might be affected by a potential security issue.

I begin scrolling down the page and notice the Findings associated with EC2 instance i-9999999999999999 panel. Detective displays related findings to provide analysts with additional evidence and context about potentially related issues. The finding I’m investigating is listed there, as well as an Unusual Behaviors/VM/Behavior:EC2-NetworkPortUnusual finding. GuardDuty builds a baseline on your network traffic and will generate findings where there is traffic outside the calculated normal. While we might not investigate every instance of anomalous traffic, having these alerts correlated by Detective provides context for validating the issue. Keeping this in mind as I scroll down, at the bottom of this profile page, I find the Overall VPC flow volume panel. If you choose the Info link next to the panel title, you can see helpful tips that describe how to use the visualizations and provide ideas for questions to ask within your investigation. These info links are available throughout Detective. Check them out!

Investigating VPC flow in Detective

In this investigation, I’m very curious about the two large spikes in inbound traffic that I see in the Overall VPC flow volume panel, which seem to be visually associated with some unusual outbound traffic spikes. It’s most likely that these outbound spikes are related to the Unusual Behaviors/VM/Behavior:EC2-NetworkPortUnusual finding I mentioned earlier. To start the investigation, I choose the display details for scope time button, shown circled at the bottom of Figure 2. This expands the VPC Flow Details, shown in Figure 3.
 

Figure 3: Our first look at VPC Flow Details

We now can see that each entry displays the volume of inbound traffic, the volume of outbound traffic, and whether the access request was accepted or rejected. Detective provides annotations on the VPC flows to help guide your investigation. These From finding annotations make it clear which flows and resources were involved in the finding. In this case, we can easily see (in Figure 3) the three IP addresses at the top of the list that triggered this GuardDuty finding.

I’m first going to focus on the spikes in traffic that are above the baseline. When I click on one of the spikes in the graph, the time window for the VPC flow activity now matches the dates of these spikes I’m investigating.

If I choose the Inbound Traffic column header, shown in Figure 4, I can find the flows that contributed to the spike during this time window.
 

Figure 4: Inbound traffic spikes

Note that the two large inbound spikes aren’t associated with the IP address from the UnauthorizedAccess:EC2/TorClient finding, based on the Detective annotation From finding. Let’s check the outbound traffic. If I do a quick sort of the table based on the outbound traffic column, as shown in Figure 5, we can also see the outbound spikes, and it isn’t immediately evident whether the spikes are associated with this finding. I could continue to investigate the spikes (because they are a visual anomaly), or focus just on the VPC flow traffic that GuardDuty and Detective have labeled as associated with this TOR finding.
 

Figure 5: Outbound traffic spikes

Let’s focus on the outbound and inbound spikes and see if we can determine what’s happening. The inbound spikes are on port 443, typically an HTTPS port, or a secure web connection. The outbound spikes are on port 22 (ssh), but go to IP addresses that look to be internal based on their addresses of 172.16.x.x. The port 443 traffic might indicate a web server that’s open to the internet and receiving traffic. With further investigation, we can determine if this idea is valid, and continue hunting for potentially malicious traffic.

A good next step would be to investigate the two specific IP addresses to rule out their involvement in the finding. I can do this by right-clicking on either of the external IP addresses and opening a new tab, where I can focus on investigating these two specific IP addresses. I would take this line of investigation to possibly rule out the involvement of these IP addresses in this finding, determine if they regularly communicate with my resources, find out what instance(s) they’re related to, and see if there are other findings associated with these instances or IP addresses. This deeper investigation is outside the scope of this blog post, but it’s something you should be doing in your own environment.

IP addresses in AWS are ephemeral in nature. The unique identifier in VPC flow logs is the Instance ID. At the time of this investigation, 172.16.0.7 is assigned to the instance related to this finding, so let’s continue to take a look at the internal 172.16.0.7 IP address with 218 MB outbound traffic on port 22. I choose 172.16.0.7, and Detective opens up the profile page for this specific IP address, as shown in Figure 6. Here we see some interesting correlations: two other GuardDuty findings related to SSH brute-force attacks. These could be related to our outbound port 22 spikes, because they’re certainly in the window of time we’re investigating.
 

Figure 6: IP address profile panel

As part of a deeper investigation, you would investigate the SSH brute-force findings for 198.51.100.254 and 203.0.113.83, but for now I’m interested in what this IP address is involved in. Detective easily associates the 172.16.0.7 IP address with the instance that was assigned the IP during the scope time. I scroll down to the bottom of the profile page for 172.16.0.7 and investigate the i-9999999999999999 instance by choosing the instance name.

Filtering VPC flow activity

In Detective, as the investigator, we are now looking at an instance profile panel similar to the one in Figure 2. Since we’re interested in VPC flow details, I’m going to scroll down and select display details for scope time.

To focus on specific activity, I can filter the activity details by the following values:

  • IP address
  • Local or remote port
  • Direction
  • Protocol
  • Whether the request was accepted or rejected

I’m going to filter these VPC flow details and just look at port 22 (sshd) inbound traffic. I select the Filter check box and select Local Port and 22, as shown in Figure 7. Detective fills in all the available ports for you, making it easy to complete this filter.
 

Figure 7: Port 22 traffic

The activity details show a few IP addresses related to port 22, and we’re still following the large outbound spikes of traffic. It’s outside the scope of this blog post, but now it would be time to start looking at your security groups and network access control lists (ACLs) and determine why port 22 is open to the internet and sending all this traffic.

Understanding traffic behavior

As an investigator, I now have a good picture of the traffic related to the initial finding, and by diving deeper we’re able to discover other interesting traffic during the same timeframe. While we may not always determine “who has done it,” the goal should be to improve our understanding of the behavior of our environment and gather important technical evidence. Detective helps you identify and investigate anomalies to give you insight into your environment. If we were to continue our investigation into the finding, here are some actions we can take within Detective.

Investigate VPC findings with Detective:

  • Perform ports and utilization analysis
    • Identify service and ephemeral ports
    • Determine whether traffic was accepted or rejected based on security groups and NACL configurations
    • Investigate possible reconnaissance traffic by exploring the significant amount of rejected traffic
  • Correlate EC2 instances to TCP/IP ports and IPs
  • Analyze traffic spikes and anomalies
  • Discover traffic patterns and make behavioral correlations

Explore EC2 instance behavior with Detective:

  • Directional Traffic Analysis
  • Investigate possible data exfiltration events by digging into large transfers
  • Enumerate distinct IP connections and sort and filter by protocol, amount of traffic, and traffic direction
  • Gather data related to a spike in port count from a single IP address (potential brute force) or multiple IP addresses (distributed denial of service (DDoS))

Additional forensics steps to consider

  • Snapshot EC2 Volumes
  • Memory dump of EC2 instance
  • Isolate EC2 instance
  • Review your authentication strategy and assess whether the chosen authentication method is sufficient to protect your asset

Summary

Without requiring you to set up infrastructure or spend time configuring log ingestion, Detective collects, organizes, and presents relevant data for your threat analysis and investigations. Security and operations teams will find this new capability helpful for simplifying EC2 traffic analysis, validating security group permissions, and diagnosing EC2 instance behavior. Detective does the heavy lifting of storing and analyzing VPC flow data so you can focus on quickly answering your investigative questions. VPC network flow details are available now in all Detective-supported Regions and are included as part of your service subscription.

To get started, you can enable a 30-day free trial of Amazon Detective. See the AWS Regional Services page for all the regions where Detective is available. To learn more, visit the Amazon Detective product page.

Are you a visual learner? Check out Amazon Detective Overview and Demonstration. This video helps you learn how and when to use Amazon Detective to improve the security of your AWS resources.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Ross Warren

Ross Warren is a Solution Architect at AWS based in Northern Virginia. Prior to his work at AWS, Ross’ areas of focus included cyber threat hunting and security operations. Ross has worked at a handful of startups and has enjoyed the transition to AWS because he can continue to build solutions for customers on today’s most innovative platform.

Author

Jim Miller

Jim is a Solution Architect at AWS based in Connecticut. Jim has worked within cyber security his entire career with areas of focus including cyber security architecture and incident response. At AWS he loves building secure solutions for customers to enable teams to build and innovate with confidence.

Round 2 post-quantum TLS is now supported in AWS KMS

Post Syndicated from Alex Weibel original https://aws.amazon.com/blogs/security/round-2-post-quantum-tls-is-now-supported-in-aws-kms/

AWS Key Management Service (AWS KMS) now supports three new hybrid post-quantum key exchange algorithms for the Transport Layer Security (TLS) 1.2 encryption protocol that’s used when connecting to AWS KMS API endpoints. These new hybrid post-quantum algorithms combine the proven security of a classical key exchange with the potential quantum-safe properties of new post-quantum key exchanges undergoing evaluation for standardization. The fastest of these algorithms adds approximately 0.3 milliseconds of overhead compared to a classical TLS handshake. The new post-quantum key exchange algorithms added are Round 2 versions of Kyber, Bit Flipping Key Encapsulation (BIKE), and Supersingular Isogeny Key Encapsulation (SIKE). Each organization has submitted their algorithms to the National Institute of Standards and Technology (NIST) as part of NIST’s post-quantum cryptography standardization process. This process spans several rounds of evaluation over multiple years, and is likely to continue beyond 2021.

In our previous hybrid post-quantum TLS blog post, we announced that AWS KMS had launched hybrid post-quantum TLS 1.2 with Round 1 versions of BIKE and SIKE. The Round 1 post-quantum algorithms are still supported by AWS KMS, but at a lower priority than the Round 2 algorithms. You can choose to upgrade your client to enable negotiation of Round 2 algorithms.

Why post-quantum TLS is important

A large-scale quantum computer would be able to break the current public-key cryptography that’s used for key exchange in classical TLS connections. While a large-scale quantum computer isn’t available today, it’s still important to think about and plan for your long-term security needs. TLS traffic using classical algorithms recorded today could be decrypted by a large-scale quantum computer in the future. If you’re developing applications that rely on the long-term confidentiality of data passed over a TLS connection, you should consider a plan to migrate to post-quantum cryptography before the lifespan of the sensitivity of your data would be susceptible to an unauthorized user with a large-scale quantum computer. As an example, this means that if you believe that a large-scale quantum computer is 25 years away, and your data must be secure for 20 years, you should migrate to post-quantum schemes within the next 5 years. AWS is working to prepare for this future, and we want you to be prepared too.
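
That guidance is a simple back-of-the-envelope check (sometimes called Mosca’s inequality): if the time your data must remain confidential plus the time a migration takes exceeds the time until a large-scale quantum computer exists, you need to start now. A tiny sketch with the numbers from the example above:

    years_until_quantum_computer = 25   # your own estimate
    data_confidentiality_years = 20     # how long the data must stay secret

    # Latest point by which the migration must be complete.
    migration_window_years = years_until_quantum_computer - data_confidentiality_years
    print(f"Complete the migration to post-quantum schemes within "
          f"{migration_window_years} years")
    # -> Complete the migration to post-quantum schemes within 5 years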

We’re offering this feature now instead of waiting for standardization efforts to be complete so you have a way to measure the potential performance impact on your applications. Offering this feature now also gives you the protection afforded by the proposed post-quantum schemes today. While we believe that the use of this feature raises the already high security bar for connecting to AWS KMS endpoints, these new cipher suites will impact bandwidth utilization and latency. Using these new algorithms could also create connection failures for intermediate systems that proxy TLS connections. We’d like to get feedback from you on the effectiveness of our implementation or any issues found so we can improve it over time.

Hybrid post-quantum TLS 1.2

Hybrid post-quantum TLS is a feature that provides the security protections of both the classical and post-quantum key exchange algorithms in a single TLS handshake. Figure 1 shows the differences in the connection secret derivation process between classical and hybrid post-quantum TLS 1.2. Hybrid post-quantum TLS 1.2 has three major differences from classical TLS 1.2:

  • The negotiated post-quantum key is appended to the ECDHE key before being used as the hash-based message authentication code (HMAC) key.
  • The text hybrid in its ASCII representation is prepended to the beginning of the HMAC message.
  • The entire client key exchange message from the TLS handshake is appended to the end of the HMAC message.

Figure 1: Differences in the connection secret derivation process between classical and hybrid post-quantum TLS 1.2
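
The three differences listed above can be sketched loosely in code. This is not the s2n implementation or the full TLS 1.2 PRF (which iterates an HMAC), just a single HMAC invocation showing how the inputs are rearranged; the argument names are placeholders.

    import hashlib
    import hmac

    def hybrid_prf_block(ecdhe_secret: bytes, pq_shared_secret: bytes,
                         label_and_seed: bytes, client_key_exchange: bytes) -> bytes:
        # 1. The post-quantum key is appended to the ECDHE key to form the HMAC key.
        hmac_key = ecdhe_secret + pq_shared_secret
        # 2. The ASCII text "hybrid" is prepended to the HMAC message.
        # 3. The entire client key exchange message is appended to the HMAC message.
        hmac_msg = b"hybrid" + label_and_seed + client_key_exchange
        return hmac.new(hmac_key, hmac_msg, hashlib.sha384).digest()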

Some background on post-quantum TLS

Today, all requests to AWS KMS use TLS with key exchange algorithms that provide perfect forward secrecy and use one of the following classical schemes:

While existing FFDHE and ECDHE schemes use perfect forward secrecy to protect against the compromise of the server’s long-term secret key, these schemes don’t protect against large-scale quantum computers. In the future, a sufficiently capable large-scale quantum computer could run Shor’s Algorithm to recover the TLS session key of a recorded classical session, and thereby gain access to the data inside. Using a post-quantum key exchange algorithm during the TLS handshake protects against attacks from a large-scale quantum computer.

The possibility of large-scale quantum computing has spurred the development of new quantum-resistant cryptographic algorithms. NIST has started the process of standardizing post-quantum key encapsulation mechanisms (KEMs). A KEM is a type of key exchange that’s used to establish a shared symmetric key. AWS has chosen three NIST KEM submissions to adopt in our post-quantum efforts:

Hybrid mode ensures that the negotiated key is as strong as the weakest key agreement scheme. If one of the schemes is broken, the communications remain confidential. The Internet Engineering Task Force (IETF) Hybrid Post-Quantum Key Encapsulation Methods for Transport Layer Security 1.2 draft describes how to combine post-quantum KEMs with ECDHE to create new cipher suites for TLS 1.2.

These cipher suites use a hybrid key exchange that performs two independent key exchanges during the TLS handshake. The key exchange then cryptographically combines the keys from each into a single TLS session key. This strategy combines the proven security of a classical key exchange with the potential quantum-safe properties of new post-quantum key exchanges being analyzed by NIST.

The effect of hybrid post-quantum TLS on performance

Post-quantum cipher suites have a different performance profile and bandwidth usage from traditional cipher suites. AWS has measured bandwidth and latency across 2,000 TLS handshakes between an Amazon Elastic Compute Cloud (Amazon EC2) C5n.4xlarge client and the public AWS KMS endpoint, which were both in the us-west-2 Region. Your own performance characteristics might differ, and will depend on your environment, including your:

  • Hardware–CPU speed and number of cores.
  • Existing workloads–how often you call AWS KMS and what other work your application performs.
  • Network–location and capacity.

The following graphs and table show latency measurements performed by AWS for all newly supported Round 2 post-quantum algorithms, in addition to the classical ECDHE key exchange algorithm currently used by most customers.

Figure 2 shows the latency differences of all hybrid post-quantum algorithms compared with classical ECDHE alone, and shows that compared to ECDHE alone, SIKE adds approximately 101 milliseconds of overhead, BIKE adds approximately 9.5 milliseconds of overhead, and Kyber adds approximately 0.3 milliseconds of overhead.
 

Figure 2: TLS handshake latency at varying percentiles for four key exchange algorithms

Figure 3 shows the latency differences between ECDHE with Kyber, and ECDHE alone. The addition of Kyber adds approximately 0.3 milliseconds of overhead.
 

Figure 3: TLS handshake latency at varying percentiles, with only top two performing key exchange algorithms

The following table shows the total amount of data (in bytes) needed to complete the TLS handshake for each cipher suite, the average latency, and latency at varying percentiles. All measurements were gathered from 2,000 TLS handshakes. The time was measured on the client from the start of the handshake until the handshake was completed, and includes all network transfer time. All connections used RSA authentication with a 2048-bit key, and ECDHE used the secp256r1 curve. All hybrid post-quantum tests used the NIST Round 2 versions. The Kyber test used the Kyber-512 parameter, the BIKE test used the BIKE-1 Level 1 parameter, and the SIKE test used the SIKEp434 parameter.

Item               Bandwidth (bytes)   Total handshakes   Average (ms)   p0 (ms)   p50 (ms)   p90 (ms)   p99 (ms)
ECDHE (classic)    3,574               2,000              3.08           2.07      3.02       3.95       4.71
ECDHE + Kyber R2   5,898               2,000              3.36           2.38      3.17       4.28       5.35
ECDHE + BIKE R2    12,456              2,000              14.91          11.59     14.16      18.27      23.58
ECDHE + SIKE R2    4,628               2,000              112.40         103.22    108.87     126.80     146.56

By default, the AWS SDK client performs a TLS handshake once to set up a new TLS connection, and then reuses that TLS connection for multiple requests. This means that the increased cost of a hybrid post-quantum TLS handshake is amortized over multiple requests sent over the TLS connection. For example, if a single connection serves 1,000 requests, even the roughly 109 ms of extra average handshake time measured for SIKE in the table above works out to about 0.1 ms per request. You should take this amortization into account when evaluating the overall additional cost of using post-quantum algorithms; otherwise performance data could be skewed.
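As a sketch of that amortization, the same client shown in the setup steps later in this post can simply be reused for every request; only the first call on a connection pays the hybrid handshake cost. The loop below is illustrative, not a benchmark.

// Reuse one client, and therefore its pooled TLS connections, across many requests.
SdkAsyncHttpClient pqHttpClient = AwsCrtAsyncHttpClient.builder()
        .tlsCipherPreference(TLS_CIPHER_PREF_KMS_PQ_TLSv1_0_2020_07)
        .build();

KmsAsyncClient kms = KmsAsyncClient.builder()
        .httpClient(pqHttpClient)
        .build();

for (int i = 0; i < 1_000; i++) {
    // Subsequent calls reuse the already-established connection, so the extra
    // handshake latency is spread across all 1,000 requests.
    kms.listKeys().get();
}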

AWS KMS has chosen Kyber Round 2 as its highest-priority post-quantum algorithm, with BIKE Round 2 and SIKE Round 2 next in priority order. This is because Kyber’s performance is closest to the classical ECDHE performance that most AWS KMS customers use today and are accustomed to.

How to use hybrid post-quantum cipher suites

To use the post-quantum cipher suites with AWS KMS, you need the preview release of the AWS Common Runtime (CRT) HTTP client for the AWS SDK for Java 2.x. Also, you will need to configure the AWS CRT HTTP client to use the s2n post-quantum hybrid cipher suites. Post-quantum TLS for AWS KMS is available in all AWS Regions except for AWS GovCloud (US-East), AWS GovCloud (US-West), AWS China (Beijing) Region operated by Beijing Sinnet Technology Co. Ltd (“Sinnet”), and AWS China (Ningxia) Region operated by Ningxia Western Cloud Data Technology Co. Ltd. (“NWCD”). Since NIST has not yet standardized post-quantum cryptography, connections that require Federal Information Processing Standards (FIPS) compliance cannot use the hybrid key exchange. For example, kms.<region>.amazonaws.com supports the use of post-quantum cipher suites, while kms-fips.<region>.amazonaws.com does not.

  1. If you’re using the AWS SDK for Java 2.x, you must add the preview release of the AWS Common Runtime client to your Maven dependencies.
    <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>aws-crt-client</artifactId>
        <version>2.14.13-PREVIEW</version>
    </dependency>
    

  2. You then must configure the new SDK and cipher suite in the existing initialization code of your application:
    // Imports assumed for this snippet (AWS SDK for Java 2.x with the AWS CRT HTTP client);
    // exact package locations may differ between preview releases.
    // import software.amazon.awssdk.http.async.SdkAsyncHttpClient;
    // import software.amazon.awssdk.http.crt.AwsCrtAsyncHttpClient;
    // import software.amazon.awssdk.services.kms.KmsAsyncClient;
    // import software.amazon.awssdk.services.kms.model.ListKeysResponse;
    // import static software.amazon.awssdk.crt.io.TlsCipherPreference.TLS_CIPHER_PREF_KMS_PQ_TLSv1_0_2020_07;

    // Fail fast if this platform does not support the hybrid post-quantum ciphers.
    if (!TLS_CIPHER_PREF_KMS_PQ_TLSv1_0_2020_07.isSupported()) {
        throw new RuntimeException("Post Quantum Ciphers not supported on this Platform");
    }

    // Build an async HTTP client that prefers the hybrid post-quantum cipher suites.
    SdkAsyncHttpClient awsCrtHttpClient = AwsCrtAsyncHttpClient.builder()
              .tlsCipherPreference(TLS_CIPHER_PREF_KMS_PQ_TLSv1_0_2020_07)
              .build();

    // Route AWS KMS calls through the post-quantum-capable client.
    KmsAsyncClient kms = KmsAsyncClient.builder()
             .httpClient(awsCrtHttpClient)
             .build();

    // Any call, such as ListKeys, now negotiates a hybrid post-quantum TLS connection.
    ListKeysResponse response = kms.listKeys().get();
    

Now, all connections made to AWS KMS in supported Regions will use the new hybrid post-quantum cipher suites! To see a complete example of everything set up, check out the example application here.

Things to try

Here are some ideas about how to use this post-quantum-enabled client:

  • Run load tests and benchmarks. These new cipher suites perform differently than traditional key exchange algorithms. You might need to adjust your connection timeouts to allow for the longer handshake times or, if you’re running inside an AWS Lambda function, extend the execution timeout setting.
  • Try connecting from different locations. Depending on the network path your request takes, you might discover that intermediate hosts, proxies, or firewalls with deep packet inspection (DPI) block the request. This could be due to the new cipher suites in the ClientHello or the larger key exchange messages. If this is the case, you might need to work with your security team or IT administrators to update the relevant configuration to unblock the new TLS cipher suites. We’d like to hear from you about how your infrastructure interacts with this new variant of TLS traffic. If you have questions or feedback, please start a new thread on the AWS KMS discussion forum.

Conclusion

In this blog post, I announced support for Round 2 hybrid post-quantum algorithms in AWS KMS, and showed you how to begin experimenting with hybrid post-quantum key exchange algorithms for TLS when connecting to AWS KMS endpoints.

More info

If you’d like to learn more about post-quantum cryptography, check out:

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Alex Weibel

Alex is a Senior Software Engineer on the AWS Crypto Algorithms team. He’s one of the maintainers for Amazon’s TLS library s2n. Previously, Alex worked on TLS termination and request proxying for S3 and the Elastic Load Balancing Service, developing new features for customers. Alex holds a Bachelor of Science degree in Computer Science from the University of Texas at Austin.

[$] A realtime developer’s checklist

Post Syndicated from original https://lwn.net/Articles/837019/rss

Realtime application development under Linux requires care to make sure that the critical realtime tasks do not suffer interference from other applications and the rest of the system. During the Embedded Linux Conference (ELC) 2020, John Ogness presented a checklist (slides [PDF]) for realtime developers, with practical recipes to follow. There are a lot of tools and features available for realtime developers, even on systems without the RT_PREEMPT patches applied.

Workers KV – free to try, with increased limits!

Post Syndicated from Greg McKeon original https://blog.cloudflare.com/workers-kv-free-tier/

Workers KV - free to try, with increased limits!

In May 2019, we launched Workers KV, letting developers store key-value data and make that data globally accessible from Workers running in Cloudflare’s over 200 data centers.

Today, we’re announcing a Free Tier for Workers KV that opens up global, low-latency data storage to every developer on the Workers platform. Additionally, to expand Workers KV’s use cases even further, we’re also raising the maximum value size from 10 MB to 25 MB. You can now write an application that serves larger static files or JSON blobs directly from KV.

Together with our announcement of the Durable Objects limited beta last month, the Workers platform continues to move toward providing storage solutions for applications that are globally deployed as easily as an application running in a single data center today.

What are the new free tier limits?

The free tier includes 100,000 read operations and 1,000 each of write, list and delete operations per day, resetting daily at UTC 00:00, with a maximum total storage size of 1 GB. Operations that exceed these limits will fail with an error.

Additional KV usage costs $0.50 per million read operations, $5.00 per million list, write, and delete operations, and $0.50 per GB of stored data. For example, 10 million reads and 2 million writes beyond the free tier in a month would cost $5.00 for the reads and $10.00 for the writes, or $15.00 in total.

We intentionally chose these limits to prioritize use cases where KV works well – infrequently written data that may be frequently read around the globe.

What is the new KV value size limit?

We’re raising the value size limit in Workers KV from 10 MB to 25 MB. Users frequently store static assets in Workers KV to then be served by Workers code. To make it as easy as possible to deploy your entire site on Workers, we’re raising the value size limit to handle even larger assets.

Since Workers Sites hosts your site from Workers KV, the increased size limit also means Workers Sites assets can now be as large as 25 MB.

How does Workers KV work?

Workers KV stores key-value pairs and caches hot keys in Cloudflare’s data centers around the world. When a request hits a Worker that uses KV, it retrieves the KV pair from Cloudflare’s local cache with low latency if the pair has been accessed recently.

While some programs running on the Workers platform are stateless, it is often necessary to distribute files or configuration data to running Workers. Workers KV allows you to persist data and access it across multiple Workers calls.

For example, let’s say I wanted to serve a static text file from Cloudflare’s edge. I could provision my own object storage, host it on my own domain, and put that domain behind Cloudflare.

With Workers KV, however, that reduces to a few simple steps. First, I create a KV namespace for my Worker with Wrangler.

wrangler kv:namespace create "BUCKET"

Then, in my wrangler.toml, I add my new namespace id to associate it with my Worker.

kv_namespaces = [
  {binding = "BUCKET", id = "<insert-id-here>"}
]

I can upload a new text file from the command line using Wrangler:

$ wrangler kv:key put --binding=BUCKET "my-file" value.txt --path

And then serve that file from my Workers script with low latency from any of Cloudflare’s points of presence around the globe!

addEventListener('fetch', event => {
  event.respondWith(handleEvent(event))
})

async function handleEvent(event) {
  // Read the value stored under "my-file" from the bound KV namespace.
  let txt = await BUCKET.get("my-file")
  return new Response(txt, {
    headers: {
      "content-type": "text/plain"
    }
  })
}

Beyond file hosting, Workers users have built many other types of applications with Workers KV:

  • Mass redirects – handle billions of HTTP redirects.
  • Access control rules – validate user requests to your API.
  • Translation keys – dynamically localize your web pages.
  • Configuration data – manage who can access your origin.

While Workers KV provides low latency access across the globe, it may not return the most up-to-date data if updates to keys are happening more than once a minute or from multiple data centers simultaneously. For use cases that cannot tolerate stale data, Durable Objects is a better solution.

Get started with Workers KV today, for free

The free tier and increased limits are live now!

You can get started with Workers and Workers KV in the Cloudflare dash. For an example of how to use Workers KV, check out the tutorial in the Workers documentation.

Snowflake: Running Millions of Simulation Tests with Amazon EKS

Post Syndicated from Keith Joelner original https://aws.amazon.com/blogs/architecture/snowflake-running-millions-of-simulation-tests-with-amazon-eks/

This post was co-written with Brian Nutt, Senior Software Engineer and Kao Makino, Principal Performance Engineer, both at Snowflake.

Transactional databases are a key component of any production system. Maintaining data integrity while rows are read and written at a massive scale is a major technical challenge for these types of databases. To ensure their stability, it’s necessary to test many different scenarios and configurations. Simulating as many of these as possible allows engineers to quickly catch defects and build resilience. But the Holy Grail is to accomplish this at scale and within a timeframe that allows your developers to iterate quickly.

Snowflake has been using and advancing FoundationDB (FDB), an open-source, ACID-compliant, distributed key-value store, since 2014. FDB, running on Amazon Elastic Compute Cloud (EC2) and Amazon Elastic Block Store (EBS), has proven to be extremely reliable and is a key part of Snowflake’s cloud services layer architecture. To support its development process of creating high-quality and stable software, Snowflake developed Project Joshua, an internal system that leverages Amazon Elastic Kubernetes Service (EKS), Amazon Elastic Container Registry (ECR), Amazon EC2 Spot Instances, and AWS PrivateLink to run over one hundred thousand validation and regression tests an hour.

About Snowflake

Snowflake is a single, integrated data platform delivered as a service. Built from the ground up for the cloud, Snowflake’s unique multi-cluster shared data architecture delivers the performance, scale, elasticity, and concurrency that today’s organizations require. It features storage, compute, and global services layers that are physically separated but logically integrated. Data workloads scale independently from one another, making it an ideal platform for data warehousing, data lakes, data engineering, data science, modern data sharing, and developing data applications.

Snowflake architecture

Developing a simulation-based testing and validation framework

Snowflake’s cloud services layer is composed of a collection of services that manage virtual warehouses, query optimization, and transactions. This layer relies on rich metadata stored in FDB.

Prior to the creation of the simulation framework, Project Joshua, FDB developers ran tests on their laptops and were limited by the number they could run. Additionally, there was a scheduled nightly job for running further tests.

Joshua at Snowflake

Amazon EKS as the foundation

Snowflake’s platform team decided to use Kubernetes to build Project Joshua. Their focus was on helping engineers run their workloads instead of spending cycles on the management of the control plane. They turned to Amazon EKS to achieve their scalability needs. This was a crucial success criterion for Project Joshua since at any point in time there could be hundreds of nodes running in the cluster. Snowflake utilizes the Kubernetes Cluster Autoscaler to dynamically scale worker nodes in minutes to support a tests-based queue of Joshua’s requests.

With the integration of Amazon EKS and Amazon Virtual Private Cloud (Amazon VPC), Snowflake is able to control access to the required resources. For example: the database that serves Joshua’s test queues is external to the EKS cluster. By using the Amazon VPC CNI plugin, each pod receives an IP address in the VPC and Snowflake can control access to the test queue via security groups.

To achieve its desired performance, Snowflake created its own custom pod scaler, which responds more quickly to changes than scheduling pods based on a custom metric would; a sketch of this pattern appears after the list below.

  • The agent scaler is responsible for monitoring a test queue in the coordination database (which, coincidentally, is also FDB) to schedule Joshua agents. The agent scaler communicates directly with Amazon EKS using the Kubernetes API to schedule tests in parallel.
  • Joshua agents (one agent per pod) are responsible for pulling tests from the test queue, executing, and reporting results. Tests are run one at a time within the EKS Cluster until the test queue is drained.
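The listing below is a rough sketch of the agent-scaler pattern using the open-source fabric8 Kubernetes client. It is purely illustrative: Snowflake's actual scaler is an internal component, and the queue-depth lookup, namespace, and deployment name used here are hypothetical.

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class AgentScalerSketch {
    public static void main(String[] args) throws InterruptedException {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            while (true) {
                int queuedTests = readQueueDepth();             // hypothetical: read the FDB-backed test queue
                int desiredAgents = Math.min(queuedTests, 500); // cap the number of agent pods

                // Scale the agent deployment to match the queue depth.
                client.apps().deployments()
                        .inNamespace("joshua")                  // hypothetical namespace
                        .withName("joshua-agent")               // hypothetical deployment name
                        .scale(desiredAgents);

                Thread.sleep(5_000);                            // poll every few seconds
            }
        }
    }

    // Placeholder for the coordination-database lookup.
    static int readQueueDepth() {
        return 0;
    }
}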

Achieving scale and cost savings with Amazon EC2 Spot

A Spot Fleet is a collection, or fleet, of Amazon EC2 Spot Instances that Joshua uses to make the infrastructure more reliable and cost-effective. Spot Fleet is used to reduce the cost of worker nodes by running a variety of instance types.

With Spot Fleet, Snowflake requests a combination of different instance types to help ensure that demand gets fulfilled. These options make the fleet more tolerant of surges in demand for particular instance types. If a surge occurs, it does not significantly affect tasks, since Joshua is agnostic to the instance type and can fall back to a different one and still be available.

When requesting Spot capacity, Snowflake uses the capacity-optimized allocation strategy to automatically launch Spot Instances into the most available pools, using real-time capacity data to predict which pools those are. This lets Snowflake quickly shift to whichever instance types are most available in the Spot market, instead of spending time contending for the cheapest instances, even at the cost of a potentially higher price.
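To show what the capacity-optimized allocation strategy looks like in practice, here is a hedged sketch of a Spot Fleet request using the AWS SDK for Java 2.x. The launch template name, fleet role ARN, and target capacity are placeholders; this is not Snowflake's actual configuration.

import software.amazon.awssdk.services.ec2.Ec2Client;
import software.amazon.awssdk.services.ec2.model.AllocationStrategy;
import software.amazon.awssdk.services.ec2.model.FleetLaunchTemplateSpecification;
import software.amazon.awssdk.services.ec2.model.LaunchTemplateConfig;
import software.amazon.awssdk.services.ec2.model.RequestSpotFleetRequest;
import software.amazon.awssdk.services.ec2.model.SpotFleetRequestConfigData;

public class SpotFleetSketch {
    public static void main(String[] args) {
        try (Ec2Client ec2 = Ec2Client.create()) {
            SpotFleetRequestConfigData config = SpotFleetRequestConfigData.builder()
                    .allocationStrategy(AllocationStrategy.CAPACITY_OPTIMIZED)      // favor the most available pools
                    .targetCapacity(100)                                            // placeholder capacity
                    .iamFleetRole("arn:aws:iam::123456789012:role/spot-fleet-role") // placeholder role ARN
                    .launchTemplateConfigs(LaunchTemplateConfig.builder()
                            .launchTemplateSpecification(FleetLaunchTemplateSpecification.builder()
                                    .launchTemplateName("joshua-worker")            // placeholder launch template
                                    .version("$Latest")
                                    .build())
                            .build())
                    .build();

            // Submit the fleet request; EC2 then launches Spot Instances from the chosen pools.
            ec2.requestSpotFleet(RequestSpotFleetRequest.builder()
                    .spotFleetRequestConfig(config)
                    .build());
        }
    }
}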

Overcoming hurdles

Snowflake’s usage of a public container registry posed a scalability challenge. When starting hundreds of worker nodes, each node needs to pull images from the public registry. This can lead to a potential rate limiting issue when all outbound traffic goes through a NAT gateway.

For example, consider 1,000 nodes pulling a 10 GB image. Each pull requires each node to download the image across the public internet, which introduces latency, reliability concerns, and increased costs due to the additional time spent downloading an image for each test. Container registries can also become unavailable or may rate-limit download requests. Lastly, because images are pulled over the public internet, other services in the cluster can experience pulling issues as well.

For anything more than a minimal workload, a local container registry is needed. If an image is first pulled from the public registry and then pushed to a local registry (cache), it only needs to be pulled once from the public registry; all worker nodes then benefit from a local pull. That’s why Snowflake decided to replicate images to ECR, a fully managed Docker container registry, providing a reliable local registry to store images. An additional benefit of the local registry is that it’s not exclusive to Joshua; all platform components required for Snowflake clusters can be cached in the local ECR registry. For additional security and performance, Snowflake uses AWS PrivateLink to keep all network traffic from ECR to the worker nodes within the AWS network. This also resolved rate-limiting issues caused by unauthenticated pulls from the public registry, unblocking other cluster nodes from pulling critical images for operation.

Conclusion

Project Joshua lets Snowflake’s developers test more scenarios without having to worry about managing the infrastructure. Snowflake’s engineers can schedule thousands of test simulations and configurations to catch bugs faster. FDB is a key component of the Snowflake stack, and Project Joshua helps make FDB more stable and resilient. Additionally, Amazon EC2 Spot has provided non-trivial cost savings to Snowflake compared with running On-Demand Instances or buying Reserved Instances.

If you want to know more about how Snowflake built its high performance data warehouse as a Service on AWS, watch the This is My Architecture video below.

Security updates for Monday

Post Syndicated from original https://lwn.net/Articles/837431/rss

Security updates have been issued by Debian (libdatetime-timezone-perl and libvncserver), Fedora (chromium, kernel, kernel-headers, kernel-tools, krb5, libexif, libxml2, and thunderbird), Gentoo (chromium, libmaxminddb, and mit-krb5), Mageia (arpwatch, bluez, chromium-browser-stable, firefox and thunderbird, golang, java-1.8.0-op, kdeconnect-kde, kleopatra, libexif, lilypond, microcode, packagekit, ruby, and tpm2-tss), openSUSE (chromium, firefox, ImageMagick, kernel, openldap2, python-waitress, SDL, u-boot, ucode-intel, and zeromq), Oracle (fence-agents, firefox, freetype, kernel, python, python3, and thunderbird), Red Hat (rh-postgresql10-postgresql, rh-postgresql12-postgresql, and virt:8.2 and virt-devel:8.2), Slackware (seamonkey), and SUSE (firefox, gdm, kernel, and kernel-firmware).

On Blockchain Voting

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2020/11/on-blockchain-voting.html

Blockchain voting is a spectacularly dumb idea for a whole bunch of reasons. I have generally quoted Matt Blaze:

Why is blockchain voting a dumb idea? Glad you asked.

For starters:

  • It doesn’t solve any problems civil elections actually have.
  • It’s basically incompatible with “software independence”, considered an essential property.
  • It can make ballot secrecy difficult or impossible.

I’ve also quoted this XKCD cartoon.

But now I have this excellent paper from MIT researchers:

“Going from Bad to Worse: From Internet Voting to Blockchain Voting”
Sunoo Park, Michael Specter, Neha Narula, and Ronald L. Rivest

Abstract: Voters are understandably concerned about election security. News reports of possible election interference by foreign powers, of unauthorized voting, of voter disenfranchisement, and of technological failures call into question the integrity of elections worldwide. This article examines the suggestions that “voting over the Internet” or “voting on the blockchain” would increase election security, and finds such claims to be wanting and misleading. While current election systems are far from perfect, Internet- and blockchain-based voting would greatly increase the risk of undetectable, nation-scale election failures. Online voting may seem appealing: voting from a computer or smart phone may seem convenient and accessible. However, studies have been inconclusive, showing that online voting may have little to no effect on turnout in practice, and it may even increase disenfranchisement. More importantly: given the current state of computer security, any turnout increase derived from Internet- or blockchain-based voting would come at the cost of losing meaningful assurance that votes have been counted as they were cast, and not undetectably altered or discarded. This state of affairs will continue as long as standard tactics such as malware, zero days, and denial-of-service attacks continue to be effective. This article analyzes and systematizes prior research on the security risks of online and electronic voting, and shows that not only do these risks persist in blockchain-based voting systems, but blockchains may introduce additional problems for voting systems. Finally, we suggest questions for critically assessing security risks of new voting system proposals.

You may have heard of Voatz, which uses blockchain for voting. It’s an insecure mess. And this is my general essay on blockchain. Short summary: it’s completely useless.

Kernel prepatch 5.10-rc4

Post Syndicated from original https://lwn.net/Articles/837347/rss

The 5.10-rc4 kernel prepatch is out for testing. “All looks good, and nothing makes me go ‘uhhuh, 5.10 looks iffy’. So go test, let’s get this all solid and calmed down, and this will hopefully be one of those regular boring releases even if it’s certainly not been on the smaller side…”
