AWS CloudFormation allows developers and systems administrators to easily create and manage a collection of related AWS resources (called a CloudFormation stack) by provisioning and updating them in an orderly and predictable way. CloudFormation users can now deploy and manage AWS Batch resources in exactly the same way that they manage the rest of their AWS infrastructure.
This post highlights the native resources supported in CloudFormation and demonstrates how to create AWS Batch compute environments using CloudFormation. All of the per-region sample CloudFormation templates related to this post can be found on the CloudFormation sample template site. The Ohio (us-east-2) Region is used as the example region for the remainder of this post.
AWS Batch Resources
AWS Batch is a managed service that helps you efficiently run batch computing workloads on the AWS Cloud. Users submit jobs to job queues, specifying the application to be run and their jobs’ CPU and memory requirements. AWS Batch is responsible for launching the appropriate quantity and types of instances needed to run your jobs.
AWS Batch removes the undifferentiated heavy lifting of configuring and managing compute infrastructure, allowing you to instead focus on your applications and users. This is demonstrated in the How AWS Batch Works video.
AWS Batch manages the following resources:
Job definitions
Job queues
Compute environments
A job definition specifies how jobs are to be run—for example, which Docker image to use for your job, how many vCPUs and how much memory is required, the IAM role to be used, and more.
Jobs are submitted to job queues where they reside until they can be scheduled to run on Amazon EC2 instances within a compute environment. An AWS account can have multiple job queues, each with varying priority. This gives you the ability to closely align the consumption of compute resources with your organizational requirements.
Compute environments provision and manage your EC2 instances and other compute resources that are used to run your AWS Batch jobs. Job queues are mapped to one or more compute environments, and a given environment can also be mapped to one or more job queues. This many-to-many relationship is defined by the compute environment order and job queue priority properties.
The following diagram shows a general overview of how the AWS Batch resources interact.
CloudFormation stack creation and updates
Upon the creation of your stack, an AWS Batch job definition is registered using your CloudFormation template. If a job definition with the same name has already been registered, a new revision is created. On stack updates, any changes to your job definition specifications in the CloudFormation template result in a new revision of that job definition and a deregistration of the previous job definition revision. Stack deletions only result in the deregistration of your job definition, as AWS Batch does not delete job definitions.
At stack creation, a job queue is created using the template. Any changes to your job queue properties within the stack result in a call to the UpdateJobQueue API action. Similarly, stack deletions result in the deletion of your job queues.
CloudFormation creates an AWS Batch compute environment using the properties specified in your template. Stack updates result in updates to your compute environment where possible. If you need to change a parameter that is not supported by the UpdateComputeEnvironment API action, stack updates result in the deletion and re-creation of your compute environment. Upon stack deletion, your compute environment is disabled, and then deleted.
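Under the covers, these stack operations map to AWS Batch API actions. The following boto3 calls are a sketch of the updates described above, not CloudFormation’s internal implementation; the resource names are hypothetical placeholders.

import boto3

batch = boto3.client('batch', region_name='us-east-2')

# Updatable job queue properties map to the UpdateJobQueue API action
batch.update_job_queue(
    jobQueue='MyJobQueue',  # hypothetical queue name
    priority=10,
)

# Updatable compute environment properties map to UpdateComputeEnvironment;
# properties that this action does not support force resource replacement
batch.update_compute_environment(
    computeEnvironment='MyComputeEnvironment',  # hypothetical name
    computeResources={'minvCpus': 0, 'maxvCpus': 64, 'desiredvCpus': 0},
)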
All naming conventions specified by CloudFormation should be followed—especially in the case of resource replacement—or you run the risk of failed stack changes. For example, all AWS Batch resource property names must be capitalized, and resource names must be changed in the case of resource replacement, as is the case in any CloudFormation stack.
If you do not provide values for ComputeEnvironmentName, JobQueueName, or JobDefinitionName in your template, a pseudo-random name is generated for you using the logical ID that you gave the resource in CloudFormation.
Launching a “Hello World” example stack
Here’s a familiar “Hello World” example of a CloudFormation stack with AWS Batch resources.
This example registers a simple job definition, a job queue that can accept job submissions, and a compute environment that contains the compute resources used to execute your job. The stack template also creates additional AWS resources that are required by AWS Batch:
An IAM service role that gives AWS Batch permissions to take the required actions on your behalf
An IAM ECS instance role
A VPC
A VPC subnet (though I’ve provided a general template, I suggest that this be a private subnet)
A security group
This stack can easily be deployed in the CloudFormation console, but I provide CLI commands that complete the stack creation for you. Use the Launch stack button or run the following command:
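The exact command references the per-region sample template. As a sketch, an equivalent launch using boto3 might look like the following, where the template URL is a placeholder for the sample template location:

import boto3

cfn = boto3.client('cloudformation', region_name='us-east-2')

cfn.create_stack(
    StackName='hello-world-batch-stack',
    # Placeholder: substitute the per-region sample template URL
    TemplateURL='https://s3.us-east-2.amazonaws.com/example-bucket/batch-hello-world.yaml',
    # Required because the template creates IAM roles
    Capabilities=['CAPABILITY_IAM'],
)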
You can monitor the creation of the resources in your CloudFormation stack in the CloudFormation console, on the Events tab:
Confirm the successful creation of your stack by observing a CREATE_COMPLETE status. At this point, you should also be able to view the new resource ARNs on the Outputs tab:
After your stack is successfully created, everything that you need to submit a “hello-world” job is complete.
Make sure to use the correct job definition name and revision number. You can find the exact Amazon Resource Name (ARN) on the CloudFormation stack Outputs tab; a pseudo-random resource name is generated for your AWS Batch resources. If you already have an existing hello-world job definition, make sure that you run the command with the job definition revision created by your new CloudFormation stack, as shown in the stack outputs.
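As a sketch, the job submission with boto3 looks like this (the ARNs below are placeholders, so copy the real values from your stack outputs):

import boto3

batch = boto3.client('batch', region_name='us-east-2')

batch.submit_job(
    jobName='hello-world',
    # Placeholder ARNs; use the values from the CloudFormation Outputs tab
    jobQueue='arn:aws:batch:us-east-2:123456789012:job-queue/HelloWorldJobQueue',
    jobDefinition='arn:aws:batch:us-east-2:123456789012:job-definition/hello-world:1',
)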
You can monitor the successful execution of the job in the AWS Batch console under Jobs:
When you are done using this stack and want to delete the resources, run the following command. CloudFormation deregisters the job definition, and deletes the job queue, compute environment, and the rest of the resources in the stack template.
aws --region us-east-2 cloudformation delete-stack --stack-name hello-world-batch-stack
Now that you know the basics of AWS Batch resources, here’s a more complex example.
High- and low-priority job queues with On-Demand and Spot compute environments
This CloudFormation stack creates two job queues with varying priority and two compute environments. You have one On-Demand compute environment and one Spot compute environment with a Spot price at 40% of On-Demand.
The first job queue is higher priority and feeds jobs to both compute environments, while the lower priority job queue only submits jobs for execution to the Spot compute environment.
There are two job definitions: one for the high-priority job queue and one for the low-priority job queue. Each job is submitted to a job queue based on its job definition. For example, jobs submitted with an important-production-application job definition are sent to the high-priority job queue, while jobs submitted with a test-application job definition are sent to the low-priority job queue.
This example registers both job definitions and creates your compute environments and job queues. It also creates the VPC, subnet, security group, IAM service role for AWS Batch, ECS instance role, and an IAM Spot Fleet role. Use the Launch stack button or run the following command:
As with any CloudFormation stack, you can update resources for your application’s specific needs. AWS CloudFormation Designer is a graphic tool for creating, viewing, and modifying CloudFormation templates.
Any changes to resource properties that require replacement result in the creation of a new resource to reflect the change, and the deletion of the obsolete resource. Changes to immutable compute environment or job queue properties result in replacement. Changes to updatable properties update the existing resource. Any changes to job definitions (beyond the name) result in the registration of a new revision of the existing job definition, followed by the deregistration of the previous revision.
Finally, run the following command to delete the CloudFormation stack containing your AWS Batch resources:
aws --region us-east-2 cloudformation delete-stack --stack-name high-low-priority-batch-stack
Conclusion
In this post, I detailed the steps to create, update with and without replacement, and delete your AWS Batch resources using CloudFormation templates as part of CloudFormation stacks with other AWS service resources. For more information, see the following topics:
As I have discussed in the past, sophisticated AWS customers invariably control multiple AWS accounts. Some of these are the results of acquisitions or a holdover from bottom-up, departmental adoption of cloud computing. Others create multiple accounts in order to isolate developers, projects, or departments from each other. We strongly endorse this as a best practice, and back it up with cross-account features in many AWS services, as well as AWS Organizations for policy-based management that spans accounts. Many of these customers also make great use of AWS Config and use Config Rules (both their own and those supplied by Config) to check their AWS resources for compliance.
Aggregate Across Accounts and Regions
Today we are making Config Rules even more useful by adding the ability to aggregate the compliance data produced by their rules across multiple AWS accounts and/or Regions. The aggregated data can then be viewed in a single dashboard, making this a great way to improve governance and compliance. Even better, the aggregation and dashboard are available at no charge to all AWS Config users!
I’ll show you how to set this up in a moment. First, let’s define a couple of terms:
Aggregator – This is a new Config resource. It identifies the sources (accounts and regions) of the compliance data to be aggregated. Multiple aggregators can be used simultaneously, giving you the ability to fine-tune your governance and compliance model.
Aggregator account – This is an AWS account that owns one or more aggregators.
Source account – This is an AWS account that has compliance data to be aggregated.
Aggregated view – A dashboard that shows compliant and non-compliant rules for an aggregator.
Here’s how it all fits together:
Setting up Aggregation
Let’s set up aggregation for some AWS Config data! The first steps take place in the aggregator account. I open the Config Console, find the Aggregated View section and click Aggregators:
I review the list of Aggregators, and click Add aggregator to make a new one:
I grant AWS Config permission to replicate data from the source accounts and enter a name for my aggregator (MyAgg):
Next, I select the source accounts. I have three options here: I can manually add the account IDs, upload a file that contains a comma-separated list, or add all of the accounts in my AWS Organization:
I click on Add source accounts to manually add one account, enter the ID, and click Add source accounts:
Next, I choose the regions of interest, with the option to select current regions as well as future ones, then click Save to move ahead:
The next step takes place in the source account, within the Config Console. An Authorization request appears:
And I confirm it:
You can use CloudFormation StackSets to enable authorization programmatically across all source accounts. Also note that the authorization step is not needed if you choose to aggregate all accounts in your AWS Organization.
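If you prefer to script the setup rather than click through the console, a minimal boto3 sketch of both halves follows; the account IDs and Regions are placeholders:

import boto3

# In the aggregator account: create the aggregator
config = boto3.client('config', region_name='us-east-1')
config.put_configuration_aggregator(
    ConfigurationAggregatorName='MyAgg',
    AccountAggregationSources=[{
        'AccountIds': ['111111111111'],           # placeholder source account
        'AwsRegions': ['us-east-1', 'us-west-2'],
    }],
)

# In each source account: authorize the aggregator account and Region
# (not needed when aggregating an entire AWS Organization)
source = boto3.client('config', region_name='us-east-1')
source.put_aggregation_authorization(
    AuthorizedAccountId='222222222222',           # placeholder aggregator account
    AuthorizedAwsRegion='us-east-1',
)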
Compliance data from the source account begins to flow to the aggregator account and becomes visible in the Console, generally within 2-5 minutes:
As you can see, I have a multitude of filtering options! I can focus the view on a particular region or account, and I can see which rules or accounts have the most issues to address. For example, I can see all of the buckets that do not have server-side encryption enabled:
I can also look at the overall compliance situation for an account, seeing both compliant and non-compliant resources:
Things to Know
This new feature is available in the US East (N. Virginia), US East (Ohio), US West (Oregon), US West (N. California), EU (Ireland), EU (Frankfurt), Asia Pacific (Tokyo), Asia Pacific (Sydney), and Asia Pacific (Singapore) Regions, and you can start using it today at no charge. You pay for the use of Config and Config Rules as usual.
The multi-account, multi-region data aggregation capability in AWS Config allows you to view the compliance status of your accounts from a central account. It assumes that you have already enabled Config and Config Rules across your accounts (you can use CloudFormation StackSets to distribute and deploy your Config Rules across multiple accounts).
Amazon EC2 Spot Instances are spare compute capacity in the AWS Cloud available to you at steep discounts compared to On-Demand prices. The only difference between On-Demand Instances and Spot Instances is that Spot Instances can be interrupted by Amazon EC2 with two minutes of notification when EC2 needs the capacity back.
Customers have been taking advantage of Spot Instance interruption notices available via the instance metadata service since January 2015 to orchestrate their workloads seamlessly around any potential interruptions. Examples include saving the state of a job, detaching from a load balancer, or draining containers. Needless to say, the two-minute Spot Instance interruption notice is a powerful tool when using Spot Instances.
In January 2018, the Spot Instance interruption notice also became available as an event in Amazon CloudWatch Events. This allows targets such as AWS Lambda functions or Amazon SNS topics to process Spot Instance interruption notices by creating a CloudWatch Events rule to monitor for the notice.
In this post, I walk through an example use case for taking advantage of Spot Instance interruption notices in CloudWatch Events to automatically deregister Spot Instances from an Elastic Load Balancing Application Load Balancer.
When any of the Spot Instances receives an interruption notice, Spot Fleet sends the event to CloudWatch Events. The CloudWatch Events rule then notifies both targets, the Lambda function and SNS topic. The Lambda function detaches the Spot Instance from the Application Load Balancer target group, taking advantage of nearly a full two minutes of connection draining before the instance is interrupted. The SNS topic also receives a message, and is provided as an example for the reader to use as an exercise.
To complete this walkthrough, you need the AWS CLI installed and configured, as well as the ability to launch CloudFormation stacks.
Launch the stack
Go ahead and launch the CloudFormation stack. You can check it out from GitHub, or grab the template directly. In this post, I use the stack name “spot-spin-cwe”, but feel free to use any name you like. Just remember to change it in the instructions.
Here are the details of the architecture being launched by the stack.
IAM permissions
Give permissions to a few components in the architecture:
The Lambda function
The CloudWatch Events rule
The Spot Fleet
The Lambda function needs basic Lambda function execution permissions so that it can write logs to CloudWatch Logs. You can use the AWS managed policy for this. It also needs to describe EC2 tags as well as deregister targets within Elastic Load Balancing. You can create a custom policy for these.
Finally, Spot Fleet needs permissions to request Spot Instances, tag, and register targets in Elastic Load Balancing. You can tap into an AWS managed policy for this.
Because you are taking advantage of the two-minute Spot Instance notice, you can tune the Elastic Load Balancing target group deregistration timeout delay to match. When a target is deregistered from the target group, it is put into connection draining mode for the length of the timeout delay: 120 seconds to equal the two-minute notice.
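The deregistration delay is a target group attribute, so a sketch of that tuning step with boto3 (placeholder ARN) looks like this:

import boto3

elbv2 = boto3.client('elbv2')

# Match connection draining to the two-minute interruption notice
elbv2.modify_target_group_attributes(
    TargetGroupArn='arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/example/1234567890abcdef',  # placeholder
    Attributes=[{'Key': 'deregistration_delay.timeout_seconds', 'Value': '120'}],
)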
To capture the Spot Instance interruption notice being published to CloudWatch Events, create a rule with two targets: the Lambda function and the SNS topic.
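The CloudFormation stack creates this rule for you; as a sketch of what it amounts to, with a hypothetical rule name and placeholder target ARNs:

import boto3
import json

events = boto3.client('events')

# Match the interruption warnings that EC2 publishes to CloudWatch Events
events.put_rule(
    Name='spot-interruption-rule',  # hypothetical rule name
    EventPattern=json.dumps({
        'source': ['aws.ec2'],
        'detail-type': ['EC2 Spot Instance Interruption Warning'],
    }),
)

# Fan the event out to both targets (placeholder ARNs)
events.put_targets(
    Rule='spot-interruption-rule',
    Targets=[
        {'Id': 'lambda-target', 'Arn': 'arn:aws:lambda:us-west-2:123456789012:function:spot-drain'},
        {'Id': 'sns-target', 'Arn': 'arn:aws:sns:us-west-2:123456789012:spot-notices'},
    ],
)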
The Lambda function does the heavy lifting for you. The details of the CloudWatch event are published to the Lambda function, which then uses boto3 to make a couple of AWS API calls. The first call is to describe the EC2 tags for the Spot Instance, filtering on a tag key of “loadBalancerTargetGroup”. If this tag is found, the instance is then deregistered from the target group ARN stored as the value of the tag.
import boto3

def handler(event, context):
    instanceId = event['detail']['instance-id']
    instanceAction = event['detail']['instance-action']
    try:
        ec2client = boto3.client('ec2')
        # Look up the target group ARN stored as a tag on the interrupted
        # instance: two separate filters match the instance AND the tag key
        describeTags = ec2client.describe_tags(Filters=[
            {'Name': 'resource-id', 'Values': [instanceId]},
            {'Name': 'key', 'Values': ['loadBalancerTargetGroup']},
        ])
    except:
        print("No action being taken. Unable to describe tags for instance id:", instanceId)
        return
    try:
        elbv2client = boto3.client('elbv2')
        # Deregister the instance so the target group begins connection draining
        deregisterTargets = elbv2client.deregister_targets(
            TargetGroupArn=describeTags['Tags'][0]['Value'],
            Targets=[{'Id': instanceId}],
        )
    except:
        print("No action being taken. Unable to deregister targets for instance id:", instanceId)
        return
    print("Detaching instance from target:")
    print(instanceId, describeTags['Tags'][0]['Value'], deregisterTargets, sep=",")
    return
SNS topic
Finally, you’ve created an SNS topic as an example target. For example, you could subscribe an email address to the SNS topic in order to receive email notifications when a Spot Instance interruption notice is received.
To proceed to creating your Spot Fleet request, use some of the resources that the CloudFormation stack created to populate the Spot Fleet request launch configuration. You can find the values in the output values of the CloudFormation stack:
You can confirm that the Spot Fleet request was fulfilled by checking that ActivityStatus is “fulfilled”, or by checking that FulfilledCapacity is greater than or equal to TargetCapacity, while describing the request:
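A boto3 sketch of that check (placeholder request ID):

import boto3

ec2 = boto3.client('ec2')

resp = ec2.describe_spot_fleet_requests(
    SpotFleetRequestIds=['sfr-12345678-1234-1234-1234-123456789012']  # placeholder
)
cfg = resp['SpotFleetRequestConfigs'][0]
print(cfg['ActivityStatus'])  # expect "fulfilled"
print(cfg['SpotFleetRequestConfig']['FulfilledCapacity'],
      cfg['SpotFleetRequestConfig']['TargetCapacity'])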
In order to test, you can take advantage of the fact that any interruption action that Spot Fleet takes on a Spot Instance results in a Spot Instance interruption notice being provided. Therefore, you can simply decrease the target size of your Spot Fleet from 2 to 1. The instance that is interrupted receives the interruption notice:
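One way to trigger that, sketched with boto3 (placeholder request ID):

import boto3

ec2 = boto3.client('ec2')

# Scaling the fleet in from 2 to 1 interrupts one instance, which then
# receives the two-minute interruption notice
ec2.modify_spot_fleet_request(
    SpotFleetRequestId='sfr-12345678-1234-1234-1234-123456789012',  # placeholder
    TargetCapacity=1,
)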
As soon as the interruption notice is published to CloudWatch Events, the Lambda function triggers and detaches the instance from the target group, effectively putting the instance in a draining state.
In conclusion, Amazon EC2 Spot Instance interruption notices are an extremely powerful tool when taking advantage of Amazon EC2 Spot Instances in your workloads, for tasks such as saving state, draining connections, and much more. I’d love to hear how you are using them in your own environment!
Chad Schmutzer Solutions Architect
Chad Schmutzer is a Solutions Architect at Amazon Web Services based in Pasadena, CA. As an extension of the Amazon EC2 Spot Instances team, Chad helps customers significantly reduce the cost of running their applications, growing their compute capacity and throughput without increasing budget, and enabling new types of cloud computing applications.
Can you believe it’s already the month of March? With some great new Tech Talks available this month, there’s no better time to grow your knowledge about AWS services and solutions.
AWS Online Tech Talks
March 2018 – Schedule
Below is the full schedule for the live, online technical sessions being held during the month of March. Make sure to register ahead of time so you won’t miss out on these free talks conducted by AWS subject matter experts.
Analytics
March 28, 2018 | 11:00 AM – 12:00 PM PT – Deep Dive on Amazon Athena (300) – Dive deep into the most common Amazon Athena use cases, including working with other AWS services.
Compute
March 26, 2018 | 01:00 PM – 01:45 PM PT – High Performance Computing in the Cloud (200) – Learn how AWS is enabling faster time to results and higher ROI when it comes to solving the big problems in science, engineering and business with high performance computing in the cloud.
March 27, 2018 | 01:00 PM – 01:45 PM PT – Introduction to Hybrid Cloud on AWS (200) – Learn how AWS is building the industry’s broadest capabilities for Hybrid Cloud deployments.
March 28, 2018 | 01:00 PM – 02:00 PM PT – Media Processing Workflows at High Velocity and Scale using AI and ML (200) – Hear how AWS customers have improved media supply chains using AI in areas such as metadata tagging (Rekognition and Comprehend), translations, transcriptions, and cloud services (Elemental).
Mobile
March 22, 2018 | 09:00 AM – 09:45 AM PT – New Mobile CLI and Console Experience (200) – Learn how AWS Mobile Services has introduced a new CLI and streamlined console experience in order to simplify and speed up the development of mobile applications with innovative AWS features and back-end functionality.
Networking
March 28, 2018 | 09:00 AM – 09:45 AM PT – Deep Dive on New AWS Networking Features (300) – Learn how AWS PrivateLink, Direct Connect gateway, and new features with Elastic Load Balancers (ELB) come together to meet the needs of a modern enterprise.
March 29, 2018 | 09:00 AM – 09:45 AM PT – Navigating GDPR Compliance on AWS (300) – Get a walkthrough of potential General Data Protection Regulation (GDPR) obligations and see how the AWS cloud offers services and features that are consistent with GDPR considerations in the ramp-up to the May 25th, 2018 enforcement date.
Storage
March 27, 2018 | 11:00 AM – 11:45 AM PT – Enterprise Applications with Amazon EFS (300) – Join us for a technical deep dive on Amazon EFS, where you’ll learn tips and tricks for integrating your enterprise applications with Amazon EFS.
March 29, 2018 | 11:00 AM – 11:45 AM PT – Transforming Data Lakes with Amazon S3 Select & Amazon Glacier Select (300) – Join us for a webinar where we’ll demonstrate how Amazon S3 Select can increase analytics query performance up to 400%, and Amazon Glacier Select makes it practical to extend queries to archive storage, significantly reducing data lake storage costs.
Apache Cassandra is a commonly used, high performance NoSQL database. AWS customers that currently maintain Cassandra on-premises may want to take advantage of the scalability, reliability, security, and economic benefits of running Cassandra on Amazon EC2.
Amazon EC2 and Amazon Elastic Block Store (Amazon EBS) provide secure, resizable compute capacity and storage in the AWS Cloud. When combined, you can deploy Cassandra, allowing you to scale capacity according to your requirements. Given the number of possible deployment topologies, it’s not always trivial to select the strategy most appropriate for your use case.
In this post, we outline three Cassandra deployment options, as well as provide guidance about determining the best practices for your use case in the following areas:
Cassandra resource overview
Deployment considerations
Storage options
Networking
High availability and resiliency
Maintenance
Security
Before we jump into best practices for running Cassandra on AWS, we should mention that we have many customers who decided to use DynamoDB instead of managing their own Cassandra cluster. DynamoDB is fully managed, serverless, and provides multi-master cross-region replication, encryption at rest, and managed backup and restore. Integration with AWS Identity and Access Management (IAM) enables DynamoDB customers to implement fine-grained access control for their data security needs.
Several customers who have been using large Cassandra clusters for many years have moved to DynamoDB to eliminate the complications of administering Cassandra clusters and maintaining high availability and durability themselves. Gumgum.com is one customer who migrated to DynamoDB and observed significant savings. For more information, see Moving to Amazon DynamoDB from Hosted Cassandra: A Leap Towards 60% Cost Saving per Year.
AWS provides options, so you’re covered whether you want to run your own NoSQL Cassandra database, or move to a fully managed, serverless DynamoDB database.
Cassandra resource overview
Here’s a short introduction to standard Cassandra resources and how they are implemented with AWS infrastructure. If you’re already familiar with Cassandra or AWS deployments, this can serve as a refresher.
Cluster
Cassandra: A single Cassandra deployment. This typically consists of multiple physical locations, keyspaces, and physical servers.
AWS: A logical deployment construct in AWS that maps to an AWS CloudFormation StackSet, which consists of one or many CloudFormation stacks to deploy Cassandra.
Datacenter
Cassandra: A group of nodes configured as a single replication group.
AWS: A logical deployment construct in AWS. A datacenter is deployed with a single CloudFormation stack consisting of Amazon EC2 instances, networking, storage, and security resources.
Rack
Cassandra: A collection of servers. A datacenter consists of at least one rack. Cassandra tries to place the replicas on different racks.
AWS: A single Availability Zone.
Server/node
Cassandra: A physical or virtual machine running Cassandra software.
AWS: An EC2 instance.
Token
Cassandra: Conceptually, the data managed by a cluster is represented as a ring. The ring is then divided into ranges equal to the number of nodes, with each node responsible for one or more ranges of the data. Each node is assigned a token, which is essentially a random number from the range. The token value determines the node’s position in the ring and its range of data.
AWS: Managed within Cassandra.
Virtual node (vnode)
Cassandra: Responsible for storing a range of data. Each vnode receives one token in the ring. Each node (by default) holds 256 tokens, which are uniformly distributed across all servers in the Cassandra datacenter.
AWS: Managed within Cassandra.
Replication factor
Cassandra: The total number of replicas across the cluster.
AWS: Managed within Cassandra.
Deployment considerations
One of the many benefits of deploying Cassandra on Amazon EC2 is that you can automate many deployment tasks. In addition, AWS includes services, such as CloudFormation, that allow you to describe and provision all your infrastructure resources in your cloud environment.
We recommend orchestrating each Cassandra ring with one CloudFormation template. If you are deploying in multiple AWS Regions, you can use a CloudFormation StackSet to manage those stacks. All the maintenance actions (scaling, upgrading, and backing up) should be scripted with an AWS SDK. These may live as standalone AWS Lambda functions that can be invoked on demand during maintenance.
You can get started by following the Cassandra Quick Start deployment guide. Keep in mind that this guide does not address the requirements to operate a production deployment and should be used only for learning more about Cassandra.
Deployment patterns
In this section, we discuss various deployment options available for Cassandra in Amazon EC2. A successful deployment starts with thoughtful consideration of these options. Consider the amount of data, network environment, throughput, and availability.
Single AWS Region, 3 Availability Zones
Active-active, multi-Region
Active-standby, multi-Region
Single region, 3 Availability Zones
In this pattern, you deploy the Cassandra cluster in one AWS Region and three Availability Zones. There is only one ring in the cluster. By using EC2 instances in three zones, you ensure that the replicas are distributed uniformly in all zones.
To ensure the even distribution of data across all Availability Zones, we recommend that you distribute the EC2 instances evenly in all three Availability Zones. The number of EC2 instances in the cluster is a multiple of three (the replication factor).
This pattern is suitable in situations where the application is deployed in one Region or where deployments in different Regions should be constrained to the same Region because of data privacy or other legal requirements.
Pros:
● Highly available, can sustain failure of one Availability Zone.
● Simple deployment.
Cons:
● Does not protect in a situation when many of the resources in a Region are experiencing intermittent failure.
Active-active, multi-Region
In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.
We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.
This pattern is most suitable when the applications using the Cassandra cluster are deployed in more than one Region.
Pros:
● No data loss during failover.
● Highly available, can sustain when many of the resources in a Region are experiencing intermittent failures.
● Read/write traffic can be localized to the Region closest to the user for lower latency and higher performance.
Cons:
● High operational overhead.
● The second Region effectively doubles the cost.
Active-standby, multi-Region
In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.
However, the second Region does not receive traffic from the applications. It only functions as a secondary location for disaster recovery reasons. If the primary Region is not available, the second Region receives traffic.
We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.
This pattern is most suitable when the applications using the Cassandra cluster require low recovery point objective (RPO) and recovery time objective (RTO).
Pros:
● No data loss during failover.
● Highly available, can sustain failure or partitioning of one whole Region.
Cons:
● High operational overhead.
● High latency for writes for eventual consistency.
● The second Region effectively doubles the cost.
Storage options
In on-premises deployments, Cassandra uses local disks to store data. There are two storage options for EC2 instances:
Ephemeral storage (instance store)
Amazon Elastic Block Store (Amazon EBS)
Your choice of storage is closely related to the type of workload supported by the Cassandra cluster. Instance store works best for most general purpose Cassandra deployments. However, in certain read-heavy clusters, Amazon EBS is a better choice.
The choice of instance type is generally driven by the type of storage:
If ephemeral storage is required for your application, a storage-optimized (I3) instance is the best option.
If your workload requires Amazon EBS, it is best to go with compute-optimized (C5) instances.
Burstable instance types (T2) don’t offer good performance for Cassandra deployments.
Instance store
Ephemeral storage is local to the EC2 instance. It may provide high input/output operations per second (IOPS) based on the instance type. An SSD-based instance store can support up to 3.3M IOPS on I3 instances. This high performance makes it an ideal choice for transactional or write-intensive applications such as Cassandra.
In general, instance storage is recommended for transactional, large, and medium-size Cassandra clusters. For a large cluster, read/write traffic is distributed across a higher number of nodes, so the loss of one node has less of an impact. However, for smaller clusters, a quick recovery for the failed node is important.
As an example, for a cluster with 100 nodes, the loss of 1 node means a 3.33% loss of capacity (with a replication factor of 3). Similarly, for a cluster with 10 nodes, the loss of 1 node means a 33% loss of capacity (with a replication factor of 3).
IOPS (translates to higher query performance)
Ephemeral storage: Up to 3.3M on I3.
Amazon EBS: 80K per instance; 10K per gp2 volume; 32K per io1 volume.
Comments: Higher IOPS results in higher query performance on each host. However, Cassandra scales well horizontally; in general, we recommend scaling horizontally first, then scaling vertically to mitigate specific issues. Note: 3.3M IOPS is observed with 100% random reads with a 4-KB block size on Amazon Linux.
AWS instance types
Ephemeral storage: Storage optimized, I3.
Amazon EBS: Compute optimized, C5.
Comments: Being able to choose between different instance types is an advantage in terms of CPU, memory, etc., for horizontal and vertical scaling.
Backup/recovery
Ephemeral storage: Custom.
Amazon EBS: Basic building blocks are available from AWS.
Comments: Amazon EBS offers a distinct advantage here. It is a small engineering effort to establish a backup/restore strategy: a) in case of an instance failure, the EBS volumes from the failing instance are attached to a new instance; b) in case of an EBS volume failure, the data is restored by creating a new EBS volume from the last snapshot.
Amazon EBS
EBS volumes offer higher resiliency, and IOPS can be configured based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. EBS volumes can support up to 32K IOPS per volume and up to 80K IOPS per instance in RAID configuration. They have an annualized failure rate (AFR) of 0.1–0.2%, which makes EBS volumes 20 times more reliable than typical commodity disk drives.
The primary advantage of using Amazon EBS in a Cassandra deployment is that it reduces data-transfer traffic significantly when a node fails or must be replaced. The replacement node joins the cluster much faster. However, Amazon EBS could be more expensive, depending on your data storage needs.
Cassandra has built-in fault tolerance by replicating data to partitions across a configurable number of nodes. It can not only withstand node failures but if a node fails, it can also recover by copying data from other replicas into a new node. Depending on your application, this could mean copying tens of gigabytes of data. This adds additional delay to the recovery process, increases network traffic, and could possibly impact the performance of the Cassandra cluster during recovery.
Data stored on Amazon EBS is persisted in case of an instance failure or termination. The node’s data stored on an EBS volume remains intact and the EBS volume can be mounted to a new EC2 instance. Most of the replicated data for the replacement node is already available in the EBS volume and won’t need to be copied over the network from another node. Only the changes made after the original node failed need to be transferred across the network. That makes this process much faster.
EBS volumes are snapshotted periodically. So, if a volume fails, a new volume can be created from the last known good snapshot and be attached to a new instance. This is faster than creating a new volume and copying all the data to it.
Most Cassandra deployments use a replication factor of three. However, Amazon EBS does its own replication under the covers for fault tolerance. In practice, EBS volumes are about 20 times more reliable than typical disk drives. So, it is possible to go with a replication factor of two. This not only saves cost, but also enables deployments in a region that has two Availability Zones.
EBS volumes are recommended in case of read-heavy, small clusters (fewer nodes) that require storage of a large amount of data. Keep in mind that the Amazon EBS provisioned IOPS could get expensive. General purpose EBS volumes work best when sized for required performance.
Networking
If your cluster is expected to receive high read/write traffic, select an instance type that offers 10 Gb/s performance. As an example, i3.8xlarge and c5.9xlarge both offer 10 Gb/s networking performance. A smaller instance type in the same family leads to relatively lower networking throughput.
Cassandra generates a universally unique identifier (UUID) for each node, based on the instance’s IP address. This UUID is used for distributing vnodes on the ring.
In an AWS deployment, an IP address is assigned automatically when an EC2 instance is created. If a node is replaced, the new instance receives a new IP address; the data distribution changes and the whole ring has to be rebalanced. This is not desirable.
To preserve the assigned IP address, use a secondary elastic network interface with a fixed IP address. Before swapping an EC2 instance with a new one, detach the secondary network interface from the old instance and attach it to the new one. This way, the UUID remains same and there is no change in the way that data is distributed in the cluster.
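A sketch of that swap with boto3 follows; the ENI and instance IDs are placeholders, and in practice you would wait for the interface to become available between the two calls:

import boto3

ec2 = boto3.client('ec2')

# Find the current attachment of the secondary ENI
eni = ec2.describe_network_interfaces(
    NetworkInterfaceIds=['eni-0123456789abcdef0']  # placeholder
)
attachment_id = eni['NetworkInterfaces'][0]['Attachment']['AttachmentId']

# Detach from the old instance, then attach to the replacement at device index 1
ec2.detach_network_interface(AttachmentId=attachment_id)
ec2.attach_network_interface(
    NetworkInterfaceId='eni-0123456789abcdef0',
    InstanceId='i-0fedcba9876543210',  # placeholder replacement instance
    DeviceIndex=1,
)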
If you are deploying in more than one region, you can connect the two VPCs in two regions using cross-region VPC peering.
High availability and resiliency
Cassandra is designed to be fault-tolerant and highly available during multiple node failures. In the patterns described earlier in this post, you deploy Cassandra to three Availability Zones with a replication factor of three. Even though it limits the AWS Region choices to the Regions with three or more Availability Zones, it offers protection for the cases of one-zone failure and network partitioning within a single Region. The multi-Region deployments described earlier in this post protect when many of the resources in a Region are experiencing intermittent failure.
Resiliency is ensured through infrastructure automation. The deployment patterns all require a quick replacement of the failing nodes. In the case of a regionwide failure, when you deploy with the multi-Region option, traffic can be directed to the other active Region while the infrastructure is recovering in the failing Region. In the case of unforeseen data corruption, the standby cluster can be restored with point-in-time backups stored in Amazon S3.
Maintenance
In this section, we look at ways to ensure that your Cassandra cluster is healthy:
Scaling
Upgrades
Backup and restore
Scaling
Cassandra is horizontally scaled by adding more instances to the ring. We recommend doubling the number of nodes in a cluster to scale up in one scale operation. This leaves the data homogeneously distributed across Availability Zones. Similarly, when scaling down, it’s best to halve the number of instances to keep the data homogeneously distributed.
Cassandra is vertically scaled by increasing the compute power of each node. Larger instance types have proportionally bigger memory. Use deployment automation to swap instances for bigger instances without downtime or data loss.
Upgrades
All three types of upgrades (Cassandra, operating system patching, and instance type changes) follow the same rolling upgrade pattern.
In this process, you start with a new EC2 instance and install software and patches on it. Thereafter, remove one node from the ring. For more information, see Cassandra cluster Rolling upgrade. Then, you detach the secondary network interface from one of the EC2 instances in the ring and attach it to the new EC2 instance. Restart the Cassandra service and wait for it to sync. Repeat this process for all nodes in the cluster.
Backup and restore
Your backup and restore strategy is dependent on the type of storage used in the deployment. Cassandra supports snapshots and incremental backups. When using instance store, a file-based backup tool works best. Customers use rsync or other third-party products to copy data backups from the instance to long-term storage. For more information, see Backing up and restoring data in the DataStax documentation. This process has to be repeated for all instances in the cluster for a complete backup. These backup files are copied back to new instances to restore. We recommend using S3 to durably store backup files for long-term storage.
For Amazon EBS based deployments, you can enable automated snapshots of EBS volumes to back up volumes. New EBS volumes can be easily created from these snapshots for restoration.
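A sketch of the snapshot-and-restore cycle with boto3 (placeholder IDs; a production script would wait for the snapshot to complete before restoring):

import boto3

ec2 = boto3.client('ec2')

# Back up: snapshot the node's data volume
snap = ec2.create_snapshot(
    VolumeId='vol-0123456789abcdef0',  # placeholder
    Description='cassandra-data-backup',
)

# Restore: create a new volume from the snapshot in the replacement
# node's Availability Zone, then attach it
vol = ec2.create_volume(SnapshotId=snap['SnapshotId'], AvailabilityZone='us-east-1a')
ec2.attach_volume(
    VolumeId=vol['VolumeId'],
    InstanceId='i-0fedcba9876543210',  # placeholder replacement instance
    Device='/dev/xvdf',
)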
Security
We recommend that you think about security in all aspects of deployment. The first step is to ensure that the data is encrypted at rest and in transit. The second step is to restrict access to unauthorized users. For more information about security, see the Cassandra documentation.
Encryption at rest
Encryption at rest can be achieved by using EBS volumes with encryption enabled. Amazon EBS uses AWS KMS for encryption. For more information, see Amazon EBS Encryption.
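For example, a data volume can be created with encryption enabled at the API level; this sketch omits KmsKeyId, which falls back to the account’s default aws/ebs key:

import boto3

ec2 = boto3.client('ec2')

ec2.create_volume(
    AvailabilityZone='us-east-1a',
    Size=1000,            # GiB; placeholder, size for the node's data
    VolumeType='gp2',
    Encrypted=True,       # encrypted at rest via AWS KMS
)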
Instance store–based deployments require using an encrypted file system or an AWS partner solution. If you are using DataStax Enterprise, it supports transparent data encryption.
Encryption in transit
Cassandra uses Transport Layer Security (TLS) for client and internode communications.
Authentication
The security mechanism is pluggable, which means that you can easily swap out one authentication method for another. You can also provide your own method of authenticating to Cassandra, such as a Kerberos ticket, or store passwords in a different location, such as an LDAP directory.
Authorization
The authorizer that’s plugged in by default is org.apache.cassandra.auth.AllowAllAuthorizer. Cassandra also provides a role-based access control (RBAC) capability, which allows you to create roles and assign permissions to these roles.
Conclusion
In this post, we discussed several patterns for running Cassandra in the AWS Cloud. This post describes how you can manage Cassandra databases running on Amazon EC2. AWS also provides managed offerings for a number of databases. To learn more, see Purpose-built databases for all your application needs.
If you have questions or suggestions, please comment below.
Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.
Provanshu Dey is a Senior IoT Consultant with AWS Professional Services. He works on highly scalable and reliable IoT, data and machine learning solutions with our customers. In his spare time, he enjoys spending time with his family and tinkering with electronics & gadgets.
With the introduction of AWS Organizations and AWS CloudFormation StackSets, you can create and manage standard AWS Identity and Access Management (IAM) roles, customer managed policies, and federated identity providers across a set of accounts in your organization. This tech talk goes through the details of setting up a CloudFormation StackSet in your master account, and creating stack instances in each account that set up roles and policies in each account in an organizational unit (OU). We also discuss how to update the stacks and how to integrate StackSets into your account creation process. This tech talk is a follow-up to a set of blog posts on the AWS Security Blog published during the summer of 2017.
The third annual Prime Day set another round of records for global orders, topping Black Friday and Cyber Monday, making it the biggest day in Amazon retail history. Over the course of the 30-hour event, tens of millions of Prime members purchased things like Echo Dots, Fire tablets, programmable pressure cookers, espresso machines, rechargeable batteries, and much more! July 11th also set a record for the number of new Prime memberships, as people signed up in order to take advantage of hundreds of thousands of deals. Amazon customers shopped online and made heavy use of the Amazon App, with mobile orders more than doubling from last Prime Day.
Powered by AWS
Last year I told you about How AWS Powered Amazon’s Biggest Day Ever, and shared what the team had learned with regard to preparation, automation, monitoring, and thinking big. All of those lessons still apply and you can read that post to learn more. Preparation for this year’s Prime Day (which started just days after Prime Day 2016 wrapped up) started by collecting and sharing best practices and identifying areas for improvement, proceeding to implementation and stress testing as the big day approached. Two of the best practices involve auditing and GameDay:
Auditing – This is a formal way for us to track preparations, identify risks, and to track progress against our objectives. Each team must respond to a series of detailed technical and operational questions that are designed to help them determine their readiness. On the technical side, questions could revolve around time to recovery after a database failure, including the all-important check of the TTL (time to live) for the CNAME. Operational questions address schedules for on-call personnel, points of contact, and ownership of services & instances.
GameDay – This practice (which I believe originated with former Amazonian Jesse Robbins), is intended to validate all of the capacity planning & preparation and to verify that all of the necessary operational practices are in place and work as expected. It introduces simulated failures and helps to train the team to identify and quickly resolve issues, building muscle memory in the process. It also tests failover and recovery capabilities, and can expose latent defects that are lurking under the covers. GameDays help teams to understand scaling drivers (page views, orders, and so forth) and gives them an opportunity to test their scaling practices. To learn more, read Resilience Engineering: Learning to Embrace Failure or watch the video: GameDay: Creating Resiliency Through Destruction.
Prime Day 2017 Metrics
So, how did we do this year?
The AWS teams checked their dashboards and log files, and were happy to share their metrics with me. Here are a few of the most interesting ones:
Block Storage – Use of Amazon Elastic Block Store (EBS) grew by 40% year-over-year, with aggregate data transfer jumping to 52 petabytes (a 50% increase) for the day and total I/O requests rising to 835 million (a 30% increase). The team told me that they loved the elasticity of EBS, and that they were able to ramp down on capacity after Prime Day concluded instead of being stuck with it.
NoSQL Database – Amazon DynamoDB requests from Alexa, the Amazon.com sites, and the Amazon fulfillment centers totaled 3.34 trillion, peaking at 12.9 million per second. According to the team, the extreme scale, consistent performance, and high availability of DynamoDB let them meet the needs of Prime Day without breaking a sweat.
Stack Creation – Nearly 31,000 AWS CloudFormation stacks were created for Prime Day in order to bring additional AWS resources on line.
API Usage – AWS CloudTrail processed over 50 billion events and tracked more than 419 billion calls to various AWS APIs, all in support of Prime Day.
Configuration Tracking – AWS Config generated over 14 million Configuration items for AWS resources.
You Can Do It
Running an event that is as large, complex, and mission-critical as Prime Day takes a lot of planning. If you have an event of this type in mind, please take a look at our new Infrastructure Event Readiness white paper. Inside, you will learn how to design and provision your applications to smoothly handle planned scaling events such as product launches or seasonal traffic spikes, with sections on automation, resiliency, cost optimization, event management, and more.
Many AWS customers are using the power of AWS CloudFormation to customize complex infrastructures. At the same time, they are moving towards self-service for their expanding customer bases. How can complex infrastructure be provisioned on-demand while minimizing customer use of the AWS Management Console?
Let’s say AnyCompany uses AWS services to process sensitive datasets owned by its customers. These customers need to be able to provision their own processing infrastructure on demand. However, AnyCompany doesn’t want the customers to access AWS CloudFormation directly, or see how customer data is processed. How can AnyCompany create resources with CloudFormation templates without exposing proprietary processing methods?
You can use Amazon API Gateway and AWS Lambda to provide complex, on-demand, data-processing infrastructure to users who have no need to access the AWS Management Console. In this post, we walk through the setup and configuration of working code examples that you (and your customers) can use to provision on-demand infrastructure. This post’s solution combines principles of immutable computing with a method to provide granular access to powerful capabilities available in the AWS Cloud. In this post, you create immutability for an Amazon EC2 instance by provisioning it without an SSH key or any other access mechanism. The instance can’t be altered after it’s launched.
Solution architecture
API Gateway simplifies the creation, management, and deployment of APIs. Integration of API Gateway with Lambda, the AWS serverless compute service, allows for further interaction with the larger family of AWS services. In this post, the Lambda function provisions on-demand CloudFormation infrastructure stacks for our example service’s primary business function: the calculation of pi.
Two CloudFormation templates are used to provision infrastructure stacks:
The primary template
The business-function template
The primary CloudFormation template creates the base infrastructure that is shown in the following diagram.
The primary template creates a stack of resources:
API Gateway resources
The associated AWS Identity and Access Management (IAM) security roles
An Amazon S3 bucket for the results of the data processed by the business-function template
A Lambda function specifically for the purpose of creating multiple iterations of a second repeatable infrastructure. This repeatable infrastructure, contained within the second CloudFormation template, is referred to as the business-function template.
The API created by the primary template allows system users to pass selected parameters to the business-function template, such as cost-center tags, a unique name, and data-processing parameters. The ability to pass parameters allows the business-function infrastructure to adapt to the specific needs of each request. In this post’s example—calculating the first 15,000 digits of pi—the number of digits calculated can be specified in the request initiated by the customer. For your own business function templates, this could be the location of input data or an email address to which results are sent.
The example business-function template provisions an immutable EC2 instance that calculates a specified number of digits of pi and uploads the results to the S3 bucket. The business-function stack then self-terminates, reducing any potential extra costs.
This architecture allows you, at multiple points, to control access, processing parameters, and workflow.
In the earlier diagram, (1) all access to the AWS Management Console and API for your account’s infrastructure is segregated through the use of API Gateway. Then, (2) IAM role–based restrictions are in place for API Gateway to call the Lambda function. The Lambda function serves to selectively add, subtract, or alter input parameters from external users. The code for this function is embedded in the primary CloudFormation template. You can modify the code to add specific tags for billing information or call other AWS services such as Amazon DynamoDB or Amazon RDS to keep a record of which infrastructure is instantiated by which API users.
In our example template, the Lambda function is populated to pass only the required parameters for the business-function template’s calculation of pi. The use of the function restricts the ability of customers to inject unexpected or undesirable parameters. The solution then (3) uses a Lambda execution role to control the Lambda function’s access. Finally, (4) the Lambda function initiates the CloudFormation business-function template by using another IAM role that is restricted to instantiating only the resources required.
The use of an intermediate Lambda function here allows for heavy customization and filtering of the input requests from the customer-facing API Gateway.
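To make the idea concrete, here is a minimal sketch of such a filtering function; the parameter names, template URL, and role ARN are hypothetical, and the function embedded in the post’s primary template may differ:

import json
import boto3

cfn = boto3.client('cloudformation')

# Only pass through parameters that are explicitly allowed (hypothetical names)
ALLOWED = {'NumberOfDigits', 'CostCenter'}

def handler(event, context):
    requested = json.loads(event.get('body') or '{}')
    params = [{'ParameterKey': k, 'ParameterValue': str(v)}
              for k, v in requested.items() if k in ALLOWED]
    resp = cfn.create_stack(
        StackName='pi-run-' + context.aws_request_id[:8],
        TemplateURL='https://s3-us-west-2.amazonaws.com/mys3bucket/businessFunctionTemplate.yaml',
        Parameters=params,
        Capabilities=['CAPABILITY_IAM'],
        # Restricted role so only approved resources can be instantiated
        RoleARN='arn:aws:iam::123456789012:role/restricted-cfn-role',  # placeholder
    )
    return {'statusCode': 200, 'body': json.dumps(resp['StackId'])}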
Solution Features
By using this API Gateway–based solution, you can benefit in a number of ways:
User requests are decoupled from the infrastructure that fulfills those requests.
This post’s solution removes the need for users to have access to the AWS Management Console. Users can accomplish their infrastructure-backed requests on demand without having any direct access to the console. Additionally, processing methods and infrastructure can be switched in and out without any change to the external appearance of your service. For example, in this post, I could have a containerized solution such as Amazon EC2 Container Service (Amazon ECS) process my pi-calculation service without any visibility to users.
Proprietary processing methods and infrastructure are obscured from users.
The solution in this post can protect the proprietary secrets of your company’s data processing service by obscuring which processing methods you are using and which infrastructure performs the processing. The solution also isolates users from viewing each other’s operations. Without having console access, users cannot determine which running processing stacks are initiated by other users.
On-demand infrastructure in specific configurations is created without allowing users to provision arbitrary infrastructure.
The solution in this post complements the functionality of the AWS Service Catalog. Beyond allowing only preapproved infrastructure and preventing unapproved resources from executing outside your company’s policies, this API Gateway–driven method delivers a specific, processed result. For ease of troubleshooting and integration with DevOps workflows, you can standardize, version, and deploy complicated CloudFormation stacks for networking, compute, storage, and big data processing.
Simplification of the user experience.
This solution also simplifies the user experience. Commands that request a specific result are limited to a single, familiar REST API interface call that delivers only that result.
Cost savings through self-terminating infrastructure.
The architecture of this solution is augmented through the use of self-terminating CloudFormation stacks. Stacks can spin up expensive infrastructure to perform data processing on demand, and self-terminate as soon as the task has completed. This can save you infrastructure costs associated with idle resources.
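One way to implement the self-termination (an assumption about the approach, not a copy of the post’s template) is for the instance’s last step to delete its own stack:

import boto3

def self_terminate(stack_name, region='us-west-2'):
    # Called by the instance after the results are uploaded to S3.
    # Deleting the stack tears down the instance itself along with every
    # other resource the business-function template created.
    cfn = boto3.client('cloudformation', region_name=region)
    cfn.delete_stack(StackName=stack_name)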
Walkthrough: Deploy the solution
This post’s example templates create the base infrastructure and a business-function process. The business-function process can instantiate a single EC2 instance to calculate a specified number of digits of pi and deposit the results in an S3 bucket. As noted previously, this example includes built-in logic to automatically self-terminate the business-function template’s CloudFormation stacks after instantiation.
The following steps walk through how to provision this solution for creating on-demand business-function CloudFormation stacks through API Gateway. You can modify the business-function template code for your specialized infrastructure needs. However, you must upload the modified template to your own S3 bucket before completing Step 1 because the business-function template location is required for the primary template’s parameters.
Both templates are designed for use in the us-west-2 region and may require minor modifications to function properly in other regions.
Costs
API Gateway costs are based on the number of API calls requested, and the single call here leads to a near-zero cost. For current pricing, see Amazon API Gateway Pricing. The associated S3 costs and Lambda costs for this post also are negligible because you are not transferring much data or spending a significant amount of time processing serverless code. The most expensive part of this post’s solution is the on-demand EC2 instance cost invoked in each business-function template instantiation. This cost is approximately $0.012 per hour for each run with the default t2.micro EC2 instance. In total, running the solution as presented in this post should cost you less than $0.10 in order to test multiple calculations of pi.
1. Deploy the CloudFormation template
First, deploy the primary CloudFormation template for API Gateway and other resources. This template creates IAM policies, IAM roles, API Gateway resources, a new S3 bucket, and a Lambda function.
You can deploy the primary template and launch the primary stack in the us-west-2 region by selecting the following button. Aside from Stack name, the only required parameters are the exact location of the business-function template in S3 and a unique S3 bucket name as a location for template results.
In the CloudFormation console, on the Specify Details page, type a stack name.
For NewS3BucketName, type a unique name.
For S3CFTLocation, leave the default value. If you adapt the primary template for your own customized business-function template, be sure to include the entire URI of the S3 location (for example, https://s3-us-west-2.amazonaws.com/mys3bucket/businessFunctionTemplate.yaml).
On the Options page, no changes are required, but you can specify additional tags if desired.
On the Capabilities page, acknowledge that CloudFormation might create IAM resources, as shown in the following screenshot. To finish creating the stack, choose Create.
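If you prefer the AWS CLI over the console, a roughly equivalent call is sketched below. The parameter names NewS3BucketName and S3CFTLocation come from this post's template; the stack name, bucket name, and template URL are placeholders:

```
# Deploy the primary template from the CLI (names and URLs are placeholders).
aws cloudformation create-stack \
  --region us-west-2 \
  --stack-name api-driven-processing \
  --template-url https://s3-us-west-2.amazonaws.com/mys3bucket/primaryTemplate.yaml \
  --parameters \
      ParameterKey=NewS3BucketName,ParameterValue=my-unique-results-bucket \
      ParameterKey=S3CFTLocation,ParameterValue=https://s3-us-west-2.amazonaws.com/mys3bucket/businessFunctionTemplate.yaml \
  --capabilities CAPABILITY_IAM
```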
2. Create an API key for the API Gateway
Though you could modify the example primary template for public use, this example restricts who can execute API calls by requiring an API key. After you create the API key, you must associate it with the usage plan created by the primary template.
In the API Gateway console, choose API Keys, and then create a new API key. For Name, type a name for the key. For API Key, leave the default Auto Generate.
(Optional) For Description, type a value.
Choose Save.
Choose Add to Usage Plan and type WrappingApiUsagePlan. This usage plan defaults your Stage to PROD because it is the only configured stage.
To add the key to the usage plan, choose the green check icon. Your key should resemble the following screenshot.
Choose Show and note the API key value for the next step.
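You can also script these console steps with the AWS CLI. The following sketch assumes the usage plan name WrappingApiUsagePlan from the primary template; the key name is a placeholder:

```
# Create an API key and capture its ID.
KEY_ID=$(aws apigateway create-api-key --name my-processing-key --enabled \
  --query 'id' --output text --region us-west-2)

# Look up the usage plan created by the primary template.
PLAN_ID=$(aws apigateway get-usage-plans \
  --query "items[?name=='WrappingApiUsagePlan'].id" --output text --region us-west-2)

# Associate the key with the usage plan.
aws apigateway create-usage-plan-key \
  --usage-plan-id "$PLAN_ID" --key-id "$KEY_ID" --key-type API_KEY --region us-west-2

# Retrieve the key value for the x-api-key header in the next step.
aws apigateway get-api-key --api-key "$KEY_ID" --include-value \
  --query 'value' --output text --region us-west-2
```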
3. Initiate a REST API query
Now, initiate a REST API query against the API that you created. In the example curl command below, replace the placeholder API key specified in the x-api-key HTTP header with your own API key.
You also need to replace the example URI path with your API's specific URI. Create the full URI by concatenating the base URI listed in the CloudFormation output of the primary template stack with the primary template's API call path, /PROD/PerRunExecute. The full URI for your specific API call should closely resemble the example URI (https://abcdefghi.execute-api.us-west-2.amazonaws.com/PROD/PerRunExecute) in the following example curl command.
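The following is a representative sketch of that command. The API key and endpoint are placeholders, and the JSON body's parameter name (NumberOfDigits) is an assumption that must match what the Lambda function in your primary template expects:

```
# POST query to instantiate the business-function stack (placeholders throughout;
# the body's parameter name is hypothetical).
curl -X POST \
  -H "x-api-key: aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789" \
  -H "Content-Type: application/json" \
  -d '{"NumberOfDigits": "15000"}' \
  https://abcdefghi.execute-api.us-west-2.amazonaws.com/PROD/PerRunExecute
```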
When you run the curl command, it executes a POST query to the REST API to instantiate the business-function template and calculate the first 15,000 digits of pi. A successful API query returns information on the curl request as well as CloudFormation API output similar to the following, indicating an “HTTPStatusCode” of 200.
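The exact payload depends on what the Lambda function returns. If it relays the CloudFormation CreateStack response, the relevant portion has roughly this shape (all values are placeholders):

```
{
  "StackId": "arn:aws:cloudformation:us-west-2:123456789012:stack/PiCalculation1500000000/...",
  "ResponseMetadata": {
    "RequestId": "...",
    "HTTPStatusCode": 200
  }
}
```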
4. Examine the results
After the EC2 instance completes its processing and uploads the results to the S3 bucket created by the primary template, you can examine the output. Using the configured t2.micro EC2 instance, instantiation and calculation of the first 15,000 digits of pi takes approximately 5–10 minutes. You can monitor progress in the EC2 console (where you see a new EC2 instance running) or the CloudFormation console (where you see the provisioned stack reach the CREATE_COMPLETE status).
After the first 15,000 digits of pi have been uploaded to the S3 bucket, the EC2 instance self-terminates the stack. You can then download the results file from the S3 bucket, as shown below. The naming convention for pi calculating runs is “PiCalculation” concatenated with an epoch timestamp from the time of execution.
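For example, you could list and download the results from the CLI; the bucket name is a placeholder, and the file name follows the convention described above:

```
# List the results and download one file (names are placeholders).
aws s3 ls s3://my-unique-results-bucket/
aws s3 cp s3://my-unique-results-bucket/PiCalculation1500000000 ./
```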
This file contains the instance ID that performed the processing, a time stamp, and the first 15,000 digits of pi. The output should look similar to the following screenshot.
5. Clean up
To help minimize your costs, clean up any infrastructure that resulted from following the solution in this post. Remember that you first have to detach the API key that you created from the usage plan in order to delete the primary CloudFormation stack. You can do this by clicking the X for associated usage plans for your API key, as shown in the following screenshot.
You also must empty the S3 results bucket of all files; CloudFormation cannot delete the bucket along with the primary stack if it still contains objects. A possible CLI sequence for the cleanup follows.
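This sketch uses placeholder names and the IDs captured in the earlier steps:

```
# Detach the API key from the usage plan (IDs from the previous steps).
aws apigateway delete-usage-plan-key \
  --usage-plan-id "$PLAN_ID" --key-id "$KEY_ID" --region us-west-2

# Empty the results bucket, then delete the primary stack.
aws s3 rm s3://my-unique-results-bucket/ --recursive
aws cloudformation delete-stack --stack-name api-driven-processing --region us-west-2
```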
This solution can be adapted to any CloudFormation template encompassing a business function for your service.
Adapt to your services
To adapt this example to your own business function:
1. Write a CloudFormation template that implements your new business function.
2. In the primary template, customize the Lambda resource code for “PerRunCloudFormationExecutionLambda” to accept and pass on the specific parameters needed by your business-function template, and save the modified primary template with a new name.
3. Upload the custom business-function CloudFormation template developed in step 1 to an S3 bucket with permissions that allow the CloudFormation service to access the template objects. For more information about S3 access control, see Managing Access Permissions to Your Amazon S3 Resources.
4. Start your new stack with the modified primary CloudFormation template, passing it the S3 location of your new business-function template as a parameter. You can reuse the same API Gateway usage plan created previously, if desired.
5. Test by posting an API Gateway query with the parameters needed by your new business function, using your new stack's API endpoint and key in place of the ones shown earlier (see the sketch after this list).
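As a sketch of steps 3 and 5 from the CLI (stack creation in step 4 mirrors the create-stack example in step 1 of the walkthrough; all names, the endpoint, and the request parameter below are placeholders):

```
# Step 3: upload the custom business-function template.
aws s3 cp myBusinessFunctionTemplate.yaml s3://mys3bucket/ --region us-west-2

# Step 5: test with a POST query against the new stack's endpoint
# (the parameter name is hypothetical and must match your customized Lambda code).
curl -X POST \
  -H "x-api-key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"MyParameter": "some-value"}' \
  https://<your-api-id>.execute-api.us-west-2.amazonaws.com/PROD/PerRunExecute
```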
Conclusion
By using API Gateway, you can control complex, on-demand CloudFormation stacks that create AWS infrastructure. When you combine this functionality with self-terminating templates, you can realize significant cost savings. API Gateway also allows for standardization of infrastructure. It can enable your users to instantiate complex and costly architectures on demand and only for as long as your users need them.
Though the solution in this post assumes that the business function is to calculate the first 15,000 digits of pi (a less-than-profitable venture these days), you can adapt the solution to deliver significant cost savings. For example, instead of a single EC2 instance, the business-function template could instantiate Amazon Redshift (a petabyte-scale data warehouse), Amazon EMR, or any complex processing infrastructure. As with all on-demand AWS services, you pay for the infrastructure only when it is running.
If you have comments about any of this content, submit them in the “Comments” section below. If you have questions about implementing this solution, start a new thread on the API Gateway forum.
AWS CloudFormation helps AWS customers implement an Infrastructure as Code model. Instead of setting up their environments and applications by hand, they build a template and use it to create all of the necessary resources, collectively known as a CloudFormation stack. This model removes opportunities for manual error, increases efficiency, and ensures consistent configurations over time.
Today I would like to tell you about a new feature that makes CloudFormation even more useful. This feature is designed to help you address the challenges that you face when you use Infrastructure as Code in situations that include multiple AWS accounts and/or AWS Regions. As a quick review:
Accounts – As I have told you in the past, many organizations use a multitude of AWS accounts, often using AWS Organizations to arrange the accounts into a hierarchy and to group them into Organizational Units, or OUs (read AWS Organizations – Policy-Based Management for Multiple AWS Accounts to learn more). Our customers use multiple accounts for business units, applications, and developers. They often create separate accounts for development, testing, staging, and production on a per-application basis.
Regions – Customers also make great use of the large (and ever-growing) set of AWS Regions. They build global applications that span two or more regions, implement sophisticated multi-region disaster recovery models, replicate S3, Aurora, PostgreSQL, and MySQL data in real time, and choose locations for storage and processing of sensitive data in accord with national and regional regulations.
This expansion into multiple accounts and regions comes with some new challenges with respect to governance and consistency. Our customers tell us that they want to make sure that each new account is set up in accord with their internal standards. Among other things, they want to set up IAM users and roles, VPCs and VPC subnets, security groups, Config Rules, logging, and AWS Lambda functions in a consistent and reliable way.
Introducing StackSets
In order to address these important customer needs, we are launching AWS CloudFormation StackSets today. You can now define an AWS resource configuration in a CloudFormation template and then roll it out across multiple AWS accounts and/or Regions with a couple of clicks. You can use this to set up a baseline level of AWS functionality that addresses the cross-account and cross-region scenarios that I listed above. Once you have set this up, you can easily expand coverage to additional accounts and regions.
This feature always works on a cross-account basis. The master account owns one or more StackSets and controls deployment to one or more target accounts. The master account must include an assumable IAM role and the target accounts must delegate trust to this role. To learn how to do this, read Prerequisites in the StackSet Documentation.
Each StackSet references a CloudFormation template and contains lists of accounts and regions. All operations apply to the cross-product of the accounts and regions in the StackSet. If the StackSet references three accounts (A1, A2, and A3) and four regions (R1, R2, R3, and R4), there are twelve targets:
Region R1: Accounts A1, A2, and A3.
Region R2: Accounts A1, A2, and A3.
Region R3: Accounts A1, A2, and A3.
Region R4: Accounts A1, A2, and A3.
Deploying a template initiates the creation of a CloudFormation stack in each account/region pair. Templates are deployed to one region at a time (you control the order) and to multiple accounts within each region (you control the amount of parallelism). You can also set an error threshold that terminates the deployment if stack creation fails too many times.
You can use your existing CloudFormation templates (taking care to make sure that they are ready to work across accounts and regions), create new ones, or use one of our sample templates. We are launching with support for the AWS partition (all public regions except those in China), and expect to expand to the others before too long.
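Although the walkthrough below uses the console, the same flow is available from the AWS CLI. A minimal sketch, with placeholder names, accounts, and regions (matching the twelve-target example above):

```
# Create a stack set from a template (names are placeholders).
aws cloudformation create-stack-set \
  --stack-set-name baseline-config \
  --template-body file://encrypted-volumes-rule.yaml

# Deploy stack instances across the account/region cross-product:
# regions in the given order, one account at a time per region,
# halting a region if any stack creation fails.
aws cloudformation create-stack-instances \
  --stack-set-name baseline-config \
  --accounts 111111111111 222222222222 333333333333 \
  --regions us-east-1 us-west-2 eu-west-1 ap-southeast-1 \
  --operation-preferences '{"RegionOrder":["us-east-1","us-west-2","eu-west-1","ap-southeast-1"],"MaxConcurrentCount":1,"FailureToleranceCount":0}'
```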
Using the Console, I start by clicking on Create StackSet. I can use my own template or one of the samples. I’ll use the last sample (Add config rule encrypted volumes):
I click on View template to learn more about the template and the rule:
I give my StackSet a name. The template that I selected accepts an optional parameter, and I can enter it at this time:
Next, I choose the accounts and regions. I can enter account numbers directly, reference an AWS organizational unit, or upload a list of account numbers:
I can set up the regions and control the deployment order:
I can also set the deployment options. Once I am done I click on Next to proceed:
I can add tags to my StackSet. They will be applied to the AWS resources created during the deployment:
The deployment begins, and I can track the status from the Console:
I can open up the Stacks section to see each stack. Initially, the status of each stack is OUTDATED, indicating that the template has yet to be deployed to the stack; this will change to CURRENT after a successful deployment. If a stack cannot be deleted, the status will change to INOPERABLE.
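Outside the console, something like the following should report each stack instance and its status (the stack set name is a placeholder):

```
# List every stack instance with its account, region, and status.
aws cloudformation list-stack-instances \
  --stack-set-name baseline-config \
  --query 'Summaries[].[Account,Region,Status]' --output table
```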
After my initial deployment, I can click on Manage StackSet to add additional accounts, regions, or both, to create additional stacks:
Now Available
This new feature is available now and you can start using it today at no extra charge (you pay only for the AWS resources created on your behalf).
Today, we released a new security whitepaper: Use AWS WAF to Mitigate OWASP’s Top 10 Web Application Vulnerabilities. This whitepaper describes how you can use AWS WAF, a web application firewall, to address the top application security flaws as named by the Open Web Application Security Project (OWASP). Using AWS WAF, you can write rules to match patterns of exploitation attempts in HTTP requests and block requests from reaching your web servers. This whitepaper discusses manifestations of these security vulnerabilities, AWS WAF–based mitigation strategies, and other AWS services or solutions that can help address these threats.
– Vlad