Tag Archives: Uncategorized

How to mount Linux volume and keep mount point consistency

Post Syndicated from limillan original https://aws.amazon.com/blogs/compute/how-to-mount-linux-volume-and-keep-mount-point-consistency/

This post is written by: Leonardo Azize Martins, Cloud Infrastructure Architect, Professional Services

Customers often use Amazon Elastic Compute Cloud (Amazon EC2) Linux-based instances with many Amazon Elastic Block Store (Amazon EBS) volumes attached. The device name can vary depending on several factors, such as virtualization type, instance type, or operating system. Because the device name can change, you shouldn’t rely on it to mount volumes. Otherwise, a volume may end up mounted on a different mount point just because the device name changed, or the mount may fail because the naming pattern changed and the expected device name no longer exists.

Customers who want to utilize the latest instance family usually change the instance type when a new one is available. The device name can differ between instance families, such as T2 and T3: T2 uses /dev/sd[a-z], while T3 uses /dev/nvme[0-26]n1. If you mount a device called /dev/sdc on a T2 instance and then change the instance family to T3, the same device is no longer called /dev/sdc, and the mount fails.

Amazon EBS volumes are exposed as NVMe block devices on instances built on the AWS Nitro System. The block device driver can assign NVMe device names in a different order than what you specified for the volumes in the block device mapping. In this situation, a device that should be mounted on /data could end up being mounted on /logs.

On Linux, you can use the fstab file to mount devices using kernel name descriptors (the traditional way), file system labels, or the file system UUID. Kernel name descriptors aren’t persistent and can change each boot. Therefore, they shouldn’t be used in configuration files.

UUID is a mechanism for giving each filesystem a unique identifier. These identifiers are generated by filesystem utilities (mkfs.*) when the device is formatted, and they’re designed so that collisions are unlikely. All GNU/Linux filesystems (including swap and the LUKS headers of raw encrypted devices) support UUIDs.

As UUID is a filesystem attribute, it can also be used with Logical Volume Manager (LVM) and Linux software RAID (mdadm).
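For example, you can see which UUID belongs to which filesystem with either of the following standard commands (shown here as a quick reference; the output will differ on your instance):

blkid
lsblk -o NAME,FSTYPE,UUID,MOUNTPOINT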

Depending on the fstab file configuration, you may find that you can’t access your instance, which requires you to follow a rescue process to fix issues. This is the case if you configure the fstab file with the device name and change the instance type.

This post shows how to mount Linux volumes and keep mount points preserved when the instance type is changed or the instance is rebooted.

Overview of solution

When you create an instance, you specify a block device mapping. That doesn’t mean that the Linux device has the same name, or is discovered in the same order, as specified in the instance mapping. This situation becomes more evident with applications that require many volumes.

Using UUID to mount volumes lets you mitigate future issues when you stop and start your instance or change the instance type.

Figure 1: EC2 instance block device mapping

Walkthrough

You will create one instance with three volumes: one root volume and two data volumes. We use Amazon Linux 2 in this post. On the initial instance type, volumes have a specific device name format. Later, you will change the instance type, and the new instance type will use another device name format.

Follow these steps:

  • Create an instance with three volumes
  • Create the filesystem on the volumes and mount them
  • Change the instance type

Prerequisites

For this walkthrough, you should have the following prerequisites:

Create instance

Create one instance with three volumes: a root volume and two data volumes. Use Launch Instance Wizard with the following details.

Launch Amazon Linux 2 instance

  1. On Step 1, choose Amazon Linux 2 AMI (HVM), SSD Volume Type.
  2. On Step 2, choose t2.micro.
  3. On Step 3, choose Next.
  4. On Step 4, add two new volumes, device /dev/sdb 10 GiB and device /dev/sdc 12 GiB.

Figure 2: Launch instances, add storage

Create filesystem and mount

Connect to your instance using EC2 Instance Connect or any other method you are comfortable with. Mount the devices using UUIDs instead of device names. Run the following instructions as the root user.

Format and mount the device

  1. Run the following command to confirm that you have three disks:
    $ lsblk
  2. Format the disks as XFS by running the following commands:
    mkfs.xfs /dev/xvdb
    mkfs.xfs /dev/xvdc
  3. Create the mount points by running the following commands:
    mkdir /mnt/disk1
    mkdir /mnt/disk2
  4. Add the mount instructions to the fstab file by running the following commands:
    echo "$(blkid /dev/xvdb | awk '{print $2}') /mnt/disk1 xfs defaults,noatime" | tee -a /etc/fstab
    echo "$(blkid /dev/xvdc | awk '{print $2}') /mnt/disk2 xfs defaults,noatime" | tee -a /etc/fstab
  5. Mount the volumes and create a dummy file on each by running the following commands:
    mount -a
    touch /mnt/disk1/file1.txt
    touch /mnt/disk2/file2.txt

You will have an fstab file like the following:

cat /etc/fstab
UUID=7b355c6b-f82b-4810-94b9-4f3af651f629     /           xfs    defaults,noatime  1   1
UUID="2c160dd6-586c-4258-84bb-c79933b9ae02" /mnt/disk1 xfs defaults,noatime
UUID="3e7a0c28-7cf1-40dc-82a1-2f5cfb53f9a4" /mnt/disk2 xfs defaults,noatime

Change instance type

Change the instance type from t2.micro to t3.micro.

Change the Amazon Linux 2 instance type

  1. Stop the instance.
  2. Change the instance type to t3.micro.
  3. Start the instance.
  4. Connect to your instance using EC2 Instance Connect.
  5. Check the device names by running the following command:
    lsblk
  6. List the files by running the following command:
    ls -l /mnt/*

Note that the device names have changed from xvd* to nvme*. All of the devices are mounted without any issue and at the correct mount points.

Cleaning up

To avoid incurring future charges, delete the instance and all of the volumes that you created in this post.

The other side

The UUID is an attribute of the filesystem that was generated when you formatted your device. Therefore, it follows the device even when you create an AMI or a snapshot, so you don’t need to worry about a restore process: an instance restore proceeds smoothly. However, be careful if you restore a snapshot of a volume and attach the restored volume to the same instance, as you will end up with two volumes that use the same UUID. If you try to mount the restored volume on the same instance, the mount fails and you will find a message like the following in the /var/log/messages file:

kernel: XFS (xvdf1): Filesystem has duplicate UUID f6646b81-f2a6-46ca-9f3d-c746cf015379 - can't mount

You must be even more careful if you attach a volume created from a snapshot of the root volume and restart your instance. Since both volumes have the same UUID, you may find that a volume other than the one attached to /dev/xvda or /dev/sda has become the root volume of your instance. See the following example for details. Note that both volumes have the same UUID, but the one mounted on / is /dev/xvdf1, not /dev/xvda1, which is the real root volume for this instance.

$ blkid
/dev/xvda1: LABEL="/" UUID="f6646b81-f2a6-46ca-9f3d-c746cf015379" TYPE="xfs" PARTLABEL="Linux" PARTUUID="79fae994-3708-4293-bb29-4d069d1c786b"
/dev/xvdf1: LABEL="/" UUID="f6646b81-f2a6-46ca-9f3d-c746cf015379" TYPE="xfs" PARTLABEL="Linux" PARTUUID="79fae994-3708-4293-bb29-4d069d1c786b"
$ lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   8G  0 disk
└─xvda1 202:1    0   8G  0 part
xvdf    202:80   0   8G  0 disk
└─xvdf1 202:81   0   8G  0 part /
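If you do need to mount the restored volume alongside the original on the same instance, there are two common ways around the duplicate UUID on XFS. This is a sketch that assumes the restored partition is /dev/xvdf1 and that it is not currently mounted:

# Option 1: mount once while ignoring the duplicate UUID (XFS-specific option)
mkdir -p /mnt/restore
mount -o nouuid /dev/xvdf1 /mnt/restore

# Option 2: generate a new UUID for the restored filesystem, then mount it by that UUID
# (if xfs_admin reports a dirty log, mount and unmount the volume once with nouuid first)
xfs_admin -U generate /dev/xvdf1
blkid /dev/xvdf1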

Conclusion

In this post, we covered how to use UUIDs to mount Linux devices via the fstab file. This keeps each mount point on the correct device and lets you change the instance type without modifying the fstab file. You can also use UUIDs with LVM and Linux software RAID (mdadm). Because the UUID is an attribute of the filesystem, it stays the same even after a backup and restore, a snapshot, or a clone. To learn more, check out our block device mappings and device names on Linux instances documentation.

Expanding Your EC2 Possibilities By Utilizing the CPU Launch Options

Post Syndicated from limillan original https://aws.amazon.com/blogs/compute/expanding-your-ec2-possibilities-by-utilizing-the-cpu-launch-options/

This post is written by: Matthew Brunton, Senior Solutions Architect – WWPS

To ensure our customers have the appropriate machines available for their workloads, AWS offers a wide range of hardware options, including hundreds of instance types, that help customers achieve the best price performance. In some specialized circumstances, our customers need an even wider range of options, or more flexibility. This could be driven by a desire to optimize licensing costs, or by a need for more hardware configuration options. Some high performance workloads can also improve when simultaneous multithreading is turned off. In the AWS Well-Architected Framework – High Performance Computing Lens we have the following recommendation: “Unless an application has been tested with hyperthreading enabled, it is recommended that hyperthreading be disabled”. With these factors in mind, AWS offers the ability to configure some aspects of the CPU configuration of launched instances.

Our larger instance types, which have more cores and offer multithreaded cores, translate to a larger number of potential combinations. The valid combinations of cores and threads per core can be found here. To make use of the CPU options for both cores and threads per core, consider instance types that have multiple CPUs and/or cores.

You can specify numerous CPU options for some of our larger instances via the console, command line interface, or the API. Moreover, you can remove CPUs from the launch configuration, or deactivate threading within CPUs that have multiple threads per core. The Amazon Elastic Compute Cloud (Amazon EC2) FAQs for the Optimize CPUs feature can be found here. Be aware that this feature is only available during instance launch and cannot be modified after launch. The launch options persist after you reboot, stop, or start an instance.

You can easily determine how many CPUs and threads a machine has. There are numerous ways to see this via the AWS Management Console and the AWS Command Line Interface (CLI).

Within the AWS Management Console, under the ‘Instance Details’ section, opening up the ‘Host and placement group’ item reveals the number of vCPUs that your machine has.

Figure 1: Console details showing number of vCPUs

This information is also available using the AWS CLI as follows:

aws ec2 describe-instances --region us-east-1 --filters "Name=instance-type,Values='c6i.*large'"

...
    "Instances": [
        {
            "Monitoring": {
                "State": "disabled"
            },
            "PrivateDnsName": "ip-172-31-44-121.ec2.internal",
            "PrivateIpAddress": "172.31.44.121",
            "State": {
                "Code": 16,
                "Name": "running"
            },
            "EbsOptimized": false,
            "LaunchTime": "2021-11-22T01:46:58+00:00",
            "ProductCodes": [],
            "VpcId": "vpc-7f7f1502",
            "CpuOptions": {
                "CoreCount": 32,
                "ThreadsPerCore": 1
            },
            "StateTransitionReason": "",
            ...
        }
    ]
...

Figure 2: ec2 describe-instances cli output
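If you only need the CPU configuration rather than the full instance description, a JMESPath --query expression trims the output. The instance ID below is a placeholder:

aws ec2 describe-instances --region us-east-1 \
  --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[].Instances[].CpuOptions'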

A handy AWS CLI command, describe-instance-types, shows the list of valid cores, the possible threads per core, and the default values for vCPUs and cores. These appear in the ‘DefaultVCpus’, ‘DefaultCores’, ‘ValidCores’, and ‘ValidThreadsPerCore’ items of the returned JSON, as follows.

aws ec2 describe-instance-types --region us-east-1 --filters "Name=instance-type,Values='c6i.*'"
{
    "InstanceTypes": [
        {
            "InstanceType": "c6i.4xlarge",
            "CurrentGeneration": true,
            "FreeTierEligible": false,
            "SupportedUsageClasses": [
                "on-demand",
                "spot"
            ],
            "SupportedRootDeviceTypes": [
                "ebs"
            ],
            "SupportedVirtualizationTypes": [
                "hvm"
            ],
            "BareMetal": false,
            "Hypervisor": "nitro",
            "ProcessorInfo": {
                "SupportedArchitectures": [
                    "x86_64"
                ],
                "SustainedClockSpeedInGhz": 3.5
            },
            "VCpuInfo": { "DefaultVCpus": 16, "DefaultCores": 8, "DefaultThreadsPerCore": 2, "ValidCores": [ 2, 4, 6, 8 ], "ValidThreadsPerCore": [ 1, 2 ] },
            "MemoryInfo": {
                "SizeInMiB": 32768
            },

Figure 3: ec2 describe-instance-types cli output

To launch one or multiple instances with specific CPU options using the AWS CLI, use the run-instances command.

The following is the shorthand syntax:

aws ec2 run-instances --image-id xxx --instance-type xxx --cpu-options "CoreCount=xx,ThreadsPerCore=xx" --key-name xxx --region xxx

For example, the following command launches a 32-core machine with only 1 thread per core instead of the standard 2 threads per core:

aws ec2 run-instances --image-id ami-0c2b8ca1dad447f8a --instance-type c6i.16xlarge \
--cpu-options "CoreCount=32,ThreadsPerCore=1" --key-name MyKeyPair --region xxx

If you are using the CPU options parameters in CloudFormation templates, then the following applies:

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-instance-cpuoptions.html

The following is an example of the YAML syntax for specifying the CPU configuration.

Resources:
  CustomEC2Instance:
    Type: "AWS::EC2::Instance"
    Properties:
      InstanceType: xxx
      ImageId: xxx
      CpuOptions:
        CoreCount: xx
        ThreadsPerCore: x

As can be seen in the following table, there are a number of valid CPU core options, as well as the option to set one or two threads per core, for certain instance types. This significantly opens up the number of combinations and permutations available to meet your specific workload needs. In the case of the c6i.16xlarge instance listed in the following table, there are 32 possible CPU core and threading combinations available to customers.

Instance type: c6i.16xlarge
Default vCPUs: 64
Default CPU cores: 32
Default threads per core: 2
Valid CPU cores: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32
Valid threads per core: 1, 2

Figure 4: Valid Core and Thread Launch Options

Note that you still pay the same amount for the EC2 instance, even if you deactivate some of the cores or threads.

Conclusion

This approach lets customers customize EC2 hardware options even further, ensuring a wider range of CPU/memory combinations over and above the already extensive AWS instance options. This lets customers finely tune hardware for their exact application requirements, whether that is running high performance workloads or saving money where license restrictions mean a specific CPU configuration is beneficial.

We have run through various scenarios in this post, which detailed how to launch instances with alternate CPU configurations, easily check the current configuration of your running instances via our API and console, and how to configure the options in CloudFormation templates.

We love to see our customers maximizing the flexibility of the AWS platform to deliver outstanding results. Have a look at some of your High Performance workloads and give the threading options a try, or take a deep dive into any of your more expensive licenses to see if you could benefit from an alternate CPU configuration. In order to get started, check out our detailed developer documentation for the optimize CPU options.

The EARN IT Act Is Back

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/02/the-earn-it-act-is-back.html

Senators have reintroduced the EARN IT Act, requiring social media companies (among others) to administer a massive surveillance operation on their users:

A group of lawmakers led by Sen. Richard Blumenthal (D-CT) and Sen. Lindsey Graham (R-SC) have re-introduced the EARN IT Act, an incredibly unpopular bill from 2020 that was dropped in the face of overwhelming opposition. Let’s be clear: the new EARN IT Act would pave the way for a massive new surveillance system, run by private companies, that would roll back some of the most important privacy and security features in technology used by people around the globe. It’s a framework for private actors to scan every message sent online and report violations to law enforcement. And it might not stop there. The EARN IT Act could ensure that anything hosted online — backups, websites, cloud photos, and more — is scanned.

Slashdot thread.

Interview with the Head of the NSA’s Research Directorate

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/02/interview-with-the-head-of-the-nsas-research-directorate.html

MIT Technology Review published an interview with Gil Herrera, the new head of the NSA’s Research Directorate. There’s a lot of talk about quantum computing, monitoring 5G networks, and the problems of big data:

The math department, often in conjunction with the computer science department, helps tackle one of NSA’s most interesting problems: big data. Despite public reckoning over mass surveillance, NSA famously faces the challenge of collecting such extreme quantities of data that, on top of legal and ethical problems, it can be nearly impossible to sift through all of it to find everything of value. NSA views the kind of “vast access and collection” that it talks about internally as both an achievement and its own set of problems. The field of data science aims to solve them.

“Everyone thinks their data is the messiest in the world, and mine maybe is because it’s taken from people who don’t want us to have it, frankly,” said Herrera’s immediate predecessor at the NSA, the computer scientist Deborah Frincke, during a 2017 talk at Stanford. “The adversary does not speak clearly in English with nice statements into a mic and, if we can’t understand it, send us a clearer statement.”

Making sense of vast stores of unclear, often stolen data in hundreds of languages and even more technical formats remains one of the directorate’s enduring tasks.

Finding Vulnerabilities in Open Source Projects

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/02/finding-vulnerabilities-in-open-source-projects.html

The Open Source Security Foundation announced $10 million in funding from a pool of tech and financial companies, including $5 million from Microsoft and Google, to find vulnerabilities in open source projects:

The “Alpha” side will emphasize vulnerability testing by hand in the most popular open-source projects, developing close working relationships with a handful of the top 200 projects for testing each year. “Omega” will look more at the broader landscape of open source, running automated testing on the top 10,000.

This is an excellent idea. This code ends up in all sorts of critical applications.

Log4j would be a prototypical vulnerability that the Alpha team might look for – an unknown problem in a high-impact project that automated tools would not be able to pick up before a human discovered it. The goal is not to use the personnel engaged with Alpha to replicate dependency analysis, for example.

Me on App Store Monopolies and Security

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/02/me-on-app-store-monopolies-and-security.html

There are two bills working their way through Congress that would force companies like Apple to allow competitive app stores. Apple hates this, since it would break its monopoly, and it’s making a variety of security arguments to bolster its argument. I have written a rebuttal:

I would like to address some of the unfounded security concerns raised about these bills. It’s simply not true that this legislation puts user privacy and security at risk. In fact, it’s fairer to say that this legislation puts those companies’ extractive business-models at risk. Their claims about risks to privacy and security are both false and disingenuous, and motivated by their own self-interest and not the public interest. App store monopolies cannot protect users from every risk, and they frequently prevent the distribution of important tools that actually enhance security. Furthermore, the alleged risks of third-party app stores and “side-loading” apps pale in comparison to their benefits. These bills will encourage competition, prevent monopolist extortion, and guarantee users a new right to digital self-determination.

Matt Stoller has also written about this.

EDITED TO ADD (2/13): Here are the two bills.

Mocking service integrations with AWS Step Functions Local

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/mocking-service-integrations-with-aws-step-functions-local/

This post is written by Sam Dengler, Principal Specialist Solutions Architect, and Dhiraj Mahapatro, Senior Specialist Solutions Architect.

AWS Step Functions now supports over 200 AWS service integrations via AWS SDK integration. Developers want to build and test control flow logic for workflows using branching logic, error handling, and retries. This allows for precise workflow execution with deterministic results. Additionally, developers use Step Functions’ input and output processing features to transform data as it enters and exits tasks.

Developers can test their state machines locally using Step Functions Local before deploying them to an AWS account. However, state machines that use service integrations like AWS Lambda, Amazon SQS, or Amazon SNS require Step Functions Local to perform calls to AWS service endpoints. Often, developers want to test the control and data flow of their state machine executions in isolation, without any dependency on service integration availability.

Today, AWS is releasing Mocked Service Integrations for Step Functions Local. This allows developers to define sample outputs from AWS service integrations. You can combine them into test case scenarios to validate workflow control and data flow definitions. You can find the code used in this post in the Step Functions examples GitHub repository.

Sales lead generation sample workflow

In this example, new sales leads are created in a customer relationship management system. This triggers the sample workflow execution using input data, which provides information about the contact.

Using the sales lead data, the workflow first validates the contact’s identity and address. If valid, it uses Step Functions’ AWS SDK integration for Amazon Comprehend to call the DetectSentiment API. It uses the sales lead’s comments as input for sentiment analysis.

If the comments have a positive sentiment, it adds the sales leads information to a DynamoDB table for follow-up. The event is published to Amazon EventBridge to notify subscribers.

If the sales lead data is invalid or a negative sentiment is detected, it publishes events to EventBridge for notification. No record is added to the Amazon DynamoDB table. The following Step Functions Workflow Studio diagram shows the control logic:

The full workflow definition is available in the code repository. Note the workflow task names in the diagram, such as DetectSentiment, which are important when defining the mocked responses.

Sentiment analysis test case

In this example, you test a scenario in which:

  1. The identity and address are successfully validated using a Lambda function.
  2. A positive sentiment is detected using the Comprehend.DetectSentiment API after three retries.
  3. A contact item is written to a DynamoDB table successfully.
  4. An event is published to an EventBridge event bus successfully.

The execution path for this test scenario is shown in the following diagram (the red and green numbers have been added). 0 represents the first execution; 1, 2, and 3 represent the max retry attempts (MaxAttempts), in case of an InternalServerException.

Mocked response configuration

To use service integration mocking, create a mock configuration file with sections specifying mock AWS service responses. These are grouped into test cases that can be activated when executing state machines locally. The following example provides code snippets and the full mock configuration is available in the code repository.

To mock a successful Lambda function invocation, define a mock response that conforms to the Lambda.Invoke API response elements. Associate it with the first request attempt:

"CheckIdentityLambdaMockedSuccess": {
  "0": {
    "Return": {
      "StatusCode": 200,
      "Payload": {
        "statusCode": 200,
        "body": "{\"approved\":true,\"message\":\"identity validation passed\"
}"
      }
    }
  }
}

To mock the DetectSentiment retry behavior, define failure and success mock responses that conform to the Comprehend.DetectSentiment API call. Associate the failure mocks with the first three request attempts, and associate the successful mock with the fourth attempt:

"DetectSentimentRetryOnErrorWithSuccess": {
  "0-2": {
    "Throw": {
      "Error": "InternalServerException",
      "Cause": "Server Exception while calling DetectSentiment API in Comprehend Service"
    }
  },
  "3": {
    "Return": {
      "Sentiment": "POSITIVE",
      "SentimentScore": {
        "Mixed": 0.00012647535,
        "Negative": 0.00008031699,
        "Neutral": 0.0051454515,
        "Positive": 0.9946478
      }
    }
  }
}

Note that Step Functions Local does not validate the structure of the mocked responses. Ensure that your mocked responses conform to actual responses before testing. To review the structure of service responses, either perform the actual service calls using Step Functions or view the documentation for those services.

Next, associate the mocked responses to a test case identifier:

"RetryOnServiceExceptionTest": {
  "Check Identity": "CheckIdentityLambdaMockedSuccess",
  "Check Address": "CheckAddressLambdaMockedSuccess",
  "DetectSentiment": "DetectSentimentRetryOnErrorWithSuccess",
  "Add to FollowUp": "AddToFollowUpSuccess",
  "CustomerAddedToFollowup": "CustomerAddedToFollowupSuccess"
}
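These snippets all live in a single mock configuration file with two top-level sections: StateMachines, which maps state machine names to test cases, and MockedResponses, which holds the mock definitions. Based on the snippets in this post, the overall shape is roughly the following (abbreviated; treat it as a sketch rather than the complete file):

{
  "StateMachines": {
    "LeadGenerationStateMachine": {
      "TestCases": {
        "RetryOnServiceExceptionTest": {
          "DetectSentiment": "DetectSentimentRetryOnErrorWithSuccess"
        }
      }
    }
  },
  "MockedResponses": {
    "DetectSentimentRetryOnErrorWithSuccess": {
      "0-2": { "Throw": { "Error": "InternalServerException", "Cause": "..." } },
      "3": { "Return": { "Sentiment": "POSITIVE" } }
    }
  }
}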

With the test case and mock responses configured, you can use them for testing with Step Functions Local.

Test case execution using Step Functions Local

The Step Functions Developer Guide describes the steps used to set up Step Functions Local on your workstation and create a state machine.
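If you run Step Functions Local as a Docker image, the mock configuration file is typically made available through a bind mount and referenced with the SFN_MOCK_CONFIG environment variable. A sketch, with the host path as a placeholder:

docker run -p 8083:8083 \
  --mount type=bind,readonly,source=/path/to/MockConfigFile.json,destination=/home/StepFunctionsLocal/MockConfigFile.json \
  -e SFN_MOCK_CONFIG="/home/StepFunctionsLocal/MockConfigFile.json" \
  amazon/aws-stepfunctions-local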

After these steps are complete, you can run a workflow locally using the start-execution AWS CLI command. Activate the mocked responses by appending a pound sign and the test case identifier to the state machine ARN:

aws stepfunctions start-execution \
  --endpoint-url http://localhost:8083 \
  --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:LeadGenerationStateMachine#RetryOnServiceExceptionTest \
  --input file://events/sfn_valid_input.json

Test case validation

To validate that the workflow executed correctly in the test case, examine the state machine execution events using the StepFunctions.GetExecutionHistory API. This ensures that the correct states are used. There are a variety of validation tools available. This post shows how to achieve this with the AWS CLI filtering feature, using JMESPath syntax.

In this test case, you validate the TaskFailed and TaskSucceeded events match the retry definition for the DetectSentiment task, which specifies three retries. Use the following AWS CLI command to get the execution history and filter on the execution events:

aws stepfunctions get-execution-history \
  --endpoint-url http://localhost:8083 \
  --execution-arn <ExecutionArn> \
  --query 'events[?(type==`TaskFailed` && contains(taskFailedEventDetails.cause, `Server Exception while calling DetectSentiment API in Comprehend Service`)) || (type==`TaskSucceeded` && taskSucceededEventDetails.resource==`comprehend:detectSentiment`)]'

The results include matching events:

{
  "timestamp": "2022-01-13T17:24:32.276000-05:00",
  "type": "TaskFailed",
  "id": 19,
  "previousEventId": 18,
  "taskFailedEventDetails": {
    "error": "InternalServerException",
    "cause": "Server Exception while calling DetectSentiment API in Comprehend Service"
  }
}

These results should be compared to the test acceptance criteria to verify the execution behavior. Test cases, acceptance criteria, and validation expressions vary by customer and use case. These techniques are flexible to accommodate various happy path and error scenarios. To explore additional sample test cases and examples, visit the example code repository.

Conclusion

This post introduces a new, robust way to test AWS Step Functions state machines in isolation. With mocking, developers get more control over the scenarios that a state machine must handle, enabling assertions on multiple behaviors. Testing a state machine with mocks can also be part of the software release process. Asserting on behaviors like error handling, branching, parallel, and dynamic parallel (Map state) execution helps test the entire state machine’s behavior. For any new behavior in the state machine, such as a new type of exception from a state, you can add a mock and a corresponding test.

See the Step Functions Developer Guide for more information on service mocking with Step Functions Local. The sample application covers basic scenarios of testing a state machine. You can use a similar approach for complex scenarios including other Step Functions flows, like map and wait.

For more serverless learning resources, visit Serverless Land.

Using the circuit breaker pattern with AWS Step Functions and Amazon DynamoDB

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/using-the-circuit-breaker-pattern-with-aws-step-functions-and-amazon-dynamodb/

This post is written by Anitha Deenadayalan, Developer Specialist SA, DevAx

Modern applications use microservices as an architectural and organizational approach to software development, where the application comprises small independent services that communicate over well-defined APIs.

When multiple microservices collaborate to handle requests, one or more services may become unavailable or exhibit high latency. Microservices communicate through remote procedure calls, and transient errors in network connectivity can always cause failures.

During synchronous execution, cascading timeouts or failures can degrade the performance of the entire application and cause a poor user experience. When complex applications use microservices, an outage in one microservice can lead to application failure. This post shows how to use the circuit breaker design pattern to help with graceful service degradation.

Introducing circuit breakers

Michael Nygard popularized the circuit breaker pattern in his book, Release It. This design pattern can prevent a caller service from retrying another callee service call that has previously caused repeated timeouts or failures. It can also detect when the callee service is functional again.

The fallacies of distributed computing are a set of assertions made by Peter Deutsch and others at Sun Microsystems. They state that programmers new to distributed applications invariably make false assumptions. Assuming a reliable network, zero latency, and unlimited bandwidth results in software applications written with minimal error handling for network errors.

During a network outage, applications may indefinitely wait for a reply and continually consume application resources. Failure to retry the operations when the network becomes available can also lead to application degradation. If API calls to a database or an external service time-out due to network issues, repeated calls with no circuit breaker can affect cost and performance.

The circuit breaker pattern

In the circuit breaker pattern, a circuit breaker object routes the calls from the caller to the callee. For example, in an ecommerce application, the order service can call the payment service to collect payments. When there are no failures, the order service routes all calls to the payment service through the circuit breaker:

Circuit breaker with no failures

If the payment service times out, the circuit breaker can detect the timeout and track the failure. If the timeouts exceed a specified threshold, the application opens the circuit:

Circuit breaker with payment service failure

Once the circuit is open, the circuit breaker object does not route the calls to the payment service. It returns an immediate failure when the order service calls the payment service:

Circuit breaker stops routing to payment service

The circuit breaker object periodically tries to see if the calls to the payment service are successful:

Circuit breaker retries payment service

When the call to payment service succeeds, the circuit is closed, and all further calls are routed to the payment service again:

Circuit breaker with working payment service again

Architecture overview

This example uses AWS Step Functions, AWS Lambda, and Amazon DynamoDB to implement the circuit breaker pattern:

Circuit breaker architecture

The Step Functions workflow provides circuit breaker capabilities. When a service wants to call another service, it starts the workflow with the name of the callee service.

The workflow gets the circuit status from the CircuitStatus DynamoDB table, which stores the currently degraded services. If the CircuitStatus table contains a record for the called service, then the circuit is open. The Step Functions workflow returns an immediate failure and exits with a FAIL state.

If the CircuitStatus table does not contain an item for the called service, then the service is operational. The ExecuteLambda step in the state machine definition invokes the Lambda function whose name is sent through a parameter value. If the call succeeds, the Step Functions workflow exits with a SUCCESS state.

The items in the DynamoDB table have the following attributes:

DynamoDB items list

If the service call fails or a timeout occurs, the application retries with exponential backoff for a defined number of times. If the service call fails after the retries, the workflow inserts a record in the CircuitStatus table for the service with the CircuitStatus as OPEN, and the workflow exits with a FAIL state. Subsequent calls to the same service return an immediate failure as long as the circuit is open.

I enter the item with an associated time-to-live (TTL) value so that connections are eventually retried; the item expires at the defined TTL time. DynamoDB’s time to live (TTL) feature lets you define a per-item timestamp to determine when an item is no longer needed. Shortly after the date and time of the specified timestamp, DynamoDB deletes the item from your table without consuming write throughput.

For example, if you set the TTL value to 60 seconds to check a service status after a minute, DynamoDB deletes the item from the table after 60 seconds. The workflow invokes the service to check for availability when a new call comes in after the item has expired.
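As a rough illustration of how such an item could be written with a TTL, the following AWS CLI sketch opens the circuit for a service for about 60 seconds. The ServiceName key name is an assumption for this example; the table name, the CircuitStatus value, and the ExpireTimeStamp attribute come from the walkthrough:

# Hypothetical example: mark a service as degraded, expiring in ~60 seconds
EXPIRE=$(( $(date +%s) + 60 ))
aws dynamodb put-item \
  --table-name CircuitStatus \
  --item "{\"ServiceName\": {\"S\": \"TestService\"}, \"CircuitStatus\": {\"S\": \"OPEN\"}, \"ExpireTimeStamp\": {\"N\": \"$EXPIRE\"}}"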

Circuit breaker Step Function

Prerequisites

For this walkthrough, you need:

Setting up the environment

Use the .NET Core 3.1 code in the GitHub repository and the AWS SAM template to create the AWS resources for this walkthrough. These include IAM roles, a DynamoDB table, the Step Functions workflow, and Lambda functions.

  1. You need an AWS access key ID and secret access key to configure the AWS Command Line Interface (AWS CLI). To learn more about configuring the AWS CLI, follow these instructions.
  2. Clone the repo:
    git clone https://github.com/aws-samples/circuit-breaker-netcore-blog
  3. After cloning, this is the folder structure:

    Project file structure

Deploy using Serverless Application Model (AWS SAM)

The AWS Serverless Application Model (AWS SAM) CLI provides developers with a local tool for managing serverless applications on AWS.

  1. The sam build command processes your AWS SAM template file, application code, and applicable language-specific files and dependencies. The command copies build artifacts in the format and location expected for subsequent steps in your workflow. Run these commands to process the template file:
    cd circuit-breaker
    sam build
  2. After you build the application, deploy it using the sam deploy command. AWS SAM deploys the application to AWS and displays the output in the terminal.
    sam deploy --guided

    Output from sam deploy

  3. You can also view the output on the AWS CloudFormation console page.

    Output in CloudFormation console

  4. The Step Functions workflow provides the circuit-breaker function. Refer to the circuitbreaker.asl.json file in the statemachine folder for the state machine definition in the Amazon States Language (ASL).

To deploy with the CDK, refer to the GitHub page.

Running the service through the circuit breaker

To provide circuit breaker capabilities to the Lambda microservice, you must send the name or function ARN of the Lambda function to the Step Functions workflow:

{
  "TargetLambda": "<Name or ARN of the Lambda function>"
}
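For example, you could start the workflow from the AWS CLI as follows; the state machine ARN is a placeholder for the one created by the stack:

aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:<region>:<account-id>:stateMachine:<circuit-breaker-state-machine-name> \
  --input '{"TargetLambda": "<Name or ARN of the Lambda function>"}'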

Successful run

To simulate a successful run, use the provided HelloWorld Lambda function by passing the name or ARN of the Lambda function that the stack created. Your input appears as follows:

{
  "TargetLambda": "circuit-breaker-stack-HelloWorldFunction-pP1HNkJGugQz"
}

During the successful run, the Get Circuit Status step checks the circuit status against the DynamoDB table. If the circuit is CLOSED, indicated by zero records for that service in the DynamoDB table, the Execute Lambda step runs the Lambda function and the workflow exits successfully.

Step Function with closed circuit

Service timeout

To simulate a timeout, use the TestCircuitBreaker Lambda function by passing the name or ARN of the Lambda function the stack has created. Your input appears as:

{
  "TargetLambda": "circuit-breaker-stack-TestCircuitBreakerFunction-mKeyyJq4BjQ7"
}

Again, the Get Circuit Status step in the workflow checks the circuit status against the DynamoDB table. The circuit is CLOSED during the first pass, so the Execute Lambda step runs the Lambda function, and the invocation times out.

The workflow retries based on the retry count and the exponential backoff values, and finally returns a timeout error. It then runs the Update Circuit Status step, where a record is inserted in the DynamoDB table for that service, with a predefined time-to-live value specified in the TTL attribute ExpireTimeStamp.

Step Function with open circuit

Repeat timeout

As long as there is an item for the service in the DynamoDB table, the circuit breaker workflow returns an immediate failure to the calling service. When you re-execute the call to the Step Functions workflow for the TestCircuitBreaker Lambda function within 20 seconds, the circuit is still open. The workflow immediately fails, ensuring the stability of the overall application performance.

Step Function workflow immediately fails until retry

The item in the DynamoDB table expires after 20 seconds, and the workflow retries the service again. This time, the workflow retries with exponential backoffs, and if it succeeds, the workflow exits successfully.

Cleaning up

To avoid incurring additional charges, clean up all the created resources. Run the following command from a terminal window. This command deletes the created resources that are part of this example.

sam delete --stack-name circuit-breaker-stack --region <region name>

Conclusion

This post showed how to implement the circuit breaker pattern using Step Functions, Lambda, DynamoDB, and .NET Core 3.1. This pattern can help prevent system degradation caused by service failures or timeouts. Step Functions and the TTL feature of DynamoDB can make it easier to implement circuit breaker capabilities.

To learn more about developing microservices on AWS, refer to the whitepaper on microservices. To learn more about serverless and AWS SAM, visit the Sessions with SAM series and find more resources at Serverless Land.

Twelve-Year-Old Linux Vulnerability Discovered and Patched

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/01/twelve-year-old-linux-vulnerability-discovered-and-patched.html

It’s a privilege escalation vulnerability:

Linux users on Tuesday got a major dose of bad news — a 12-year-old vulnerability in a system tool called Polkit gives attackers unfettered root privileges on machines running most major distributions of the open source operating system.

Previously called PolicyKit, Polkit manages system-wide privileges in Unix-like OSes. It provides a mechanism for nonprivileged processes to safely interact with privileged processes. It also allows users to execute commands with high privileges by using a component called pkexec, followed by the command.

It was discovered in October, and disclosed last week — after most Linux distributions issued patches. Of course, there’s lots of Linux out there that never gets patched, so expect this to be exploited in the wild for a long time.

Of course, this vulnerability doesn’t give attackers access to the system. They have to get that some other way. But if they get access, this vulnerability gives them root privileges.

Friday Squid Blogging: Cephalopods Thirty Million Years Older Than Previously Thought

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/01/friday-squid-blogging-cephalopods-thirty-million-years-older-than-previously-thought.html

New fossils from Newfoundland push the origins of cephalopods to 522 million years ago.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Read my blog posting guidelines here.

Tracking Secret German Organizations with Apple AirTags

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/01/tracking-secret-german-organizations-with-apple-airtags.html

A German activist is trying to track down a secret government intelligence agency. One of her research techniques is to mail Apple AirTags to see where they actually end up:

Wittmann says that everyone she spoke to denied being part of this intelligence agency. But what she describes as a “good indicator,” would be if she could prove that the postal address for this “federal authority” actually leads to the intelligence service’s apparent offices.

“To understand where mail ends up,” she writes (in translation), “[you can do] a lot of manual research. Or you can simply send a small device that regularly transmits its current position (a so-called AirTag) and see where it lands.”

She sent a parcel with an AirTag and watched through Apple’s Find My system as it was delivered via the Berlin sorting center to a sorting office in Cologne-Ehrenfeld. It then appeared at the Office for the Protection of the Constitution in Cologne.

So an AirTag addressed to a telecommunications authority based in one part of Germany ends up in the offices of an intelligence agency based in another part of the country.

Wittmann’s research is also now detailed in the German Wikipedia entry for the federal telecommunications service. It recounts how following her original discovery in December 2021, subsequent government press conferences have denied that there is such a federal telecommunications service at all.

Here’s the original Medium post, in German.

In a similar story, someone used an AirTag to track her furniture as a moving company lied about its whereabouts.

New DeadBolt Ransomware Targets NAS Devices

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/01/new-deadbolt-ransomware-targets-nat-devices.html

There’s a new ransomware strain that targets NAS devices made by QNAP:

The attacks started today, January 25th, with QNAP devices suddenly finding their files encrypted and file names appended with a .deadbolt file extension.

Instead of creating ransom notes in each folder on the device, the QNAP device’s login page is hijacked to display a screen stating, “WARNING: Your files have been locked by DeadBolt”….

[…]

BleepingComputer is aware of at least fifteen victims of the new DeadBolt ransomware attack, with no specific region being targeted.

As with all ransomware attacks against QNAP devices, the DeadBolt attacks only affect devices accessible to the Internet.

As the threat actors claim the attack is conducted through a zero-day vulnerability, it is strongly advised that all QNAP users disconnect their devices from the Internet and place them behind a firewall.

Merck Wins Insurance Lawsuit re NotPetya Attack

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/01/merck-wins-insurance-lawsuit-re-notpetya-attack.html

The insurance company Ace American has to pay for the losses:

On 6th December 2021, the New Jersey Superior Court granted partial summary judgment (attached) in favour of Merck and International Indemnity, declaring that the War or Hostile Acts exclusion was inapplicable to the dispute.

Merck suffered US$1.4 billion in business interruption losses from the Notpetya cyber attack of 2017 which were claimed against “all risks” property re/insurance policies providing coverage for losses resulting from destruction or corruption of computer data and software.

The parties disputed whether the Notpetya malware which affected Merck’s computers in 2017 was an instrument of the Russian government, so that the War or Hostile Acts exclusion would apply to the loss.

The Court noted that Merck was a sophisticated and knowledgeable party, but there was no indication that the exclusion had been negotiated since it was in standard language. The Court, therefore, applied, under New Jersey law, the doctrine of construction of insurance contracts that gives prevalence to the reasonable expectations of the insured, even in exceptional circumstances when the literal meaning of the policy is plain.

Merck argued that the attack was not “an official state action,” which I’m surprised wasn’t successfully disputed.

Slashdot thread.

Linux-Targeted Malware Increased by 35%

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/01/linux-targeted-malware-increased-by-35.html

Crowdstrike is reporting that malware targeting Linux has increased considerably in 2021:

Malware targeting Linux systems increased by 35% in 2021 compared to 2020.

XorDDoS, Mirai and Mozi malware families accounted for over 22% of Linux-targeted threats observed by CrowdStrike in 2021.

Ten times more Mozi malware samples were observed in 2021 compared to 2020.

Lots of details in the report.

News article:

The Crowdstrike findings aren’t surprising as they confirm an ongoing trend that emerged in previous years.

For example, an Intezer report analyzing 2020 stats found that Linux malware families increased by 40% in 2020 compared to the previous year.

In the first six months of 2020, a steep rise of 500% in Golang malware was recorded, showing that malware authors were looking for ways to make their code run on multiple platforms.

This programming, and by extension, targeting trend, has already been confirmed in early 2022 cases and is likely to continue unabated.

Slashdot thread.

China’s Olympics App Is Horribly Insecure

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/01/chinas-olympics-app-is-horribly-insecure.html

China is mandating that athletes download and use a health and travel app when they attend the Winter Olympics next month. Citizen Lab examined the app and found it riddled with security holes.

Key Findings:

  • MY2022, an app mandated for use by all attendees of the 2022 Olympic Games in Beijing, has a simple but devastating flaw where encryption protecting users’ voice audio and file transfers can be trivially sidestepped. Health customs forms which transmit passport details, demographic information, and medical and travel history are also vulnerable. Server responses can also be spoofed, allowing an attacker to display fake instructions to users.
  • MY2022 is fairly straightforward about the types of data it collects from users in its public-facing documents. However, as the app collects a range of highly sensitive medical information, it is unclear with whom or which organization(s) it shares this information.
  • MY2022 includes features that allow users to report “politically sensitive” content. The app also includes a censorship keyword list, which, while presently inactive, targets a variety of political topics including domestic issues such as Xinjiang and Tibet as well as references to Chinese government agencies.
  • While the vendor did not respond to our security disclosure, we find that the app’s security deficits may not only violate Google’s Unwanted Software Policy and Apple’s App Store guidelines but also China’s own laws and national standards pertaining to privacy protection, providing potential avenues for future redress.

News article:

It’s not clear whether the security flaws were intentional or not, but the report speculated that proper encryption might interfere with some of China’s ubiquitous online surveillance tools, especially systems that allow local authorities to snoop on phones using public wireless networks or internet cafes. Still, the researchers added that the flaws were probably unintentional, because the government will already be receiving data from the app, so there wouldn’t be a need to intercept the data as it was being transferred.

[…]

The app also included a list of 2,422 political keywords, described within the code as “illegalwords.txt,” that worked as a keyword censorship list, according to Citizen Lab. The researchers said the list appeared to be a latent function that the app’s chat and file transfer function was not actively using.

The US government has already advised athletes to leave their personal phones and laptops home and bring burners.

San Francisco Police Illegally Spying on Protesters

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/01/san-francisco-police-illegally-spying-on-protesters.html

Last summer, the San Francisco police illegally used surveillance cameras at the George Floyd protests. The EFF is suing the police:

This surveillance invaded the privacy of protesters, targeted people of color, and chills and deters participation and organizing for future protests. The SFPD also violated San Francisco’s new Surveillance Technology Ordinance. It prohibits city agencies like the SFPD from acquiring, borrowing, or using surveillance technology, without prior approval from the city’s Board of Supervisors, following an open process that includes public participation. Here, the SFPD went through no such process before spying on protesters with this network of surveillance cameras.

It feels like a pretty easy case. There’s a law, and the SF police didn’t follow it.

Tech billionaire Chris Larsen is on the side of the police. He thinks that the surveillance is a good thing, and wrote an op-ed defending it.

I wouldn’t be writing about this at all except that Chris is a board member of EPIC, and used his EPIC affiliation in the op-ed to bolster his own credentials. (Bizarrely, he linked to an EPIC page that directly contradicts his position.) In his op-ed, he mischaracterized the EFF’s actions and the facts of the lawsuit. It’s a mess.

The plaintiffs in the lawsuit wrote a good rebuttal to Larsen’s piece. And this week, EPIC published what is effectively its own rebuttal:

One of the fundamental principles that underlies EPIC’s work (and the work of many other groups) on surveillance oversight is that individuals should have the power to decide whether surveillance tools are used in their communities and to impose limits on their use. We have fought for years to shed light on the development, procurement, and deployment of such technologies and have worked to ensure that they are subject to independent oversight through hearings, legal challenges, petitions, and other public forums. The CCOPS model, which was developed by ACLU affiliates and other coalition partners in California and implemented through the San Francisco ordinance, is a powerful mechanism to enable public oversight of dangerous surveillance tools. The access, retention, and use policies put in place by the neighborhood business associations operating these networks provide necessary, but not sufficient, protections against abuse. Strict oversight is essential to promote both privacy and community safety, which includes freedom from arbitrary police action and the freedom to assemble.

So far, EPIC has not done anything about Larsen still being on its board. (Others have criticized them for keeping him on.) I don’t know if I have an opinion on this. Larsen has done good work on financial privacy regulations, which is a good thing. But he seems to be funding all these surveillance cameras in San Francisco, which is really bad.

Are Fake COVID Testing Sites Harvesting Data?

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/01/are-fake-covid-testing-sites-harvesting-data.html

Over the past few weeks, I’ve seen a bunch of writing about what seems to be fake COVID-19 testing sites. They take your name and info, and do a nose swab, but you never get test results. Speculation centered around data harvesting, but that didn’t make sense because it was far too labor intensive for that and — sorry to break it to you — your data isn’t worth all that much.

It seems to be multilevel marketing fraud instead:

The Center for COVID Control is a management company to Doctors Clinical Laboratory. It provides tests and testing supplies, software, personal protective equipment and marketing services — online and printed — to testing sites, said a person who was formerly associated with the Center for COVID Control. Some of the sites are owned independently but operate in partnership with the chain under its name and with its guidance.

[…]

Doctors Clinical Lab, the lab Center for COVID Control uses to process tests, makes money by billing patients’ insurance companies or seeking reimbursement from the federal government for testing. Insurance statements reviewed by Block Club show the lab has, in multiple instances, billed insurance companies $325 for a PCR test, $50 for a rapid test, $50 for collecting a person’s sample and $80 for a “supplemental fee.”

In turn, the testing sites are paid for providing samples to the lab to be processed, said a person formerly associated with the Center for COVID Control.

In a January video talking to testing site operators, Syed said the Center for COVID Control will no longer provide them with PCR tests, but it will continue supplying them with rapid tests at a cost of $5 per test. The companies will keep making money for the rapid tests they collect, he said.

“You guys will continue making the $28.50 you’re making for the rapid test,” Syed said in the video.

Read the article for the messy details. Or take a job and see for yourself.

UK Government to Launch PR Campaign Undermining End-to-End Encryption

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/01/uk-government-to-launch-pr-campaign-undermining-end-to-end-encryption.html

Rolling Stone is reporting that the UK government has hired the M&C Saatchi advertising agency to launch an anti-encryption advertising campaign. Presumably they’ll lean heavily on the “think of the children!” rhetoric we’re seeing in this current wave of the crypto wars. The technical eavesdropping mechanisms have shifted to client-side scanning, which won’t actually help — but since that’s not really the point, it’s not argued on its merits.

An Examination of the Bug Bounty Marketplace

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/01/an-examination-of-the-bug-bounty-marketplace.html

Here’s a fascinating report: “Bounty Everything: Hackers and the Making of the Global Bug Marketplace.” From a summary:

…researchers Ryan Ellis and Yuan Stevens provide a window into the working lives of hackers who participate in “bug bounty” programs — programs that hire hackers to discover and report bugs or other vulnerabilities in their systems. This report illuminates the risks and insecurities for hackers as gig workers, and how bounty programs rely on vulnerable workers to fix their vulnerable systems.

Ellis and Stevens’s research offers a historical overview of bounty programs and an analysis of contemporary bug bounty platforms — the new intermediaries that now structure the vast majority of bounty work. The report draws directly from interviews with hackers, who recount that bounty programs seem willing to integrate a diverse workforce in their practices, but only on terms that deny them the job security and access enjoyed by core security workforces. These inequities go far beyond the difference experienced by temporary and permanent employees at companies such as Google and Apple, contend the authors. The global bug bounty workforce is doing piecework — they are paid for each bug, and the conditions under which a bug is paid vary greatly from one company to the next.