Data is the soul of an application. As containers make it easier to package and deploy applications faster, testing plays an even more important role in the reliable delivery of software. Given that all applications have data, development teams want a way to reliably control, move, and test using real application data or, at times, obfuscated data.
For many teams, moving application data through a CI/CD pipeline, while honoring compliance and maintaining separation of concerns, has been a manual task that doesn’t scale. At best, it is limited to a few applications, and is not portable across environments. The goal should be to make running and testing stateful containers (think databases and message buses, where operations are tracked) as easy as stateless ones (such as web front ends, where they often are not).
Why is state important in testing scenarios? One reason is that many bugs manifest only when code is tested against real data. For example, we might simply want to test a database schema upgrade, but a small synthetic dataset does not exercise the finer corner cases in complex business logic. If we want true end-to-end testing, we need to be able to easily manage our data or state.
In this blog post, we define a CI/CD pipeline reference architecture that can automate data movement between applications. We also provide the steps to follow to configure the CI/CD pipeline.
Stateful Pipelines: Need for Portable Volumes
As part of continuous integration, testing, and deployment, a team may need to reproduce a bug found in production against a staging setup. Here, the hosting environment consists of a cluster with Kubernetes as the scheduler and Portworx for persistent volumes. The testing workflow is then automated by AWS CodeCommit, AWS CodePipeline, and AWS CodeBuild.
Portworx offers Kubernetes storage that can be used to make persistent volumes portable between AWS environments and pipelines. The addition of Portworx to the AWS Developer Tools continuous deployment for Kubernetes reference architecture adds persistent storage and storage orchestration to a Kubernetes cluster. The example uses MongoDB as the demonstration of a stateful application. In practice, the workflow applies to any containerized application such as Cassandra, MySQL, Kafka, and Elasticsearch.
Using the reference architecture, a developer calls CodePipeline to trigger a snapshot of the running production MongoDB database. Portworx then creates a block-based, writable snapshot of the MongoDB volume. Meanwhile, the production MongoDB database continues serving end users and is uninterrupted.
Without the Portworx integrations, a manual process would require an application-level backup of the database instance that is outside of the CI/CD process. For larger databases, this could take hours and impact production. The use of block-based snapshots follows best practices for resilient and non-disruptive backups.
As part of the workflow, CodePipeline deploys a new MongoDB instance for staging onto the Kubernetes cluster and mounts the second Portworx volume that has the data from production. CodePipeline triggers the snapshot of a Portworx volume through an AWS Lambda function, as shown here.
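As a rough illustration of that Lambda function, the following sketch uses the Kubernetes API to request a snapshot through the VolumeSnapshot resource that Portworx supports via STORK. It assumes the function is packaged with the kubernetes Python client and cluster credentials; the namespace, PVC, and snapshot names are illustrative, not taken from the reference architecture.

import os
from kubernetes import client, config

def lambda_handler(event, context):
    # Authenticate against the cluster with a kubeconfig bundled into the
    # function package (illustrative; credential handling will vary).
    config.load_kube_config(config_file=os.environ["KUBECONFIG_PATH"])
    api = client.CustomObjectsApi()
    # Ask the snapshot controller for a point-in-time copy of the
    # production MongoDB volume.
    snapshot = {
        "apiVersion": "volumesnapshot.external-storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": "mongo-prod-snap", "namespace": "default"},
        "spec": {"persistentVolumeClaimName": "px-mongo-pvc"},
    }
    api.create_namespaced_custom_object(
        group="volumesnapshot.external-storage.k8s.io",
        version="v1",
        namespace="default",
        plural="volumesnapshots",
        body=snapshot,
    )
    return {"status": "snapshot requested"}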
AWS Developer Tools with Kubernetes: Integrated Workflow with Portworx
In the following workflow, a developer is testing changes to a containerized application that calls on MongoDB. The tests are performed against a staging instance of MongoDB. The same workflow applies if changes were on the server side. The original production deployment is scheduled as a Kubernetes deployment object and uses Portworx as the storage for the persistent volume.
The continuous deployment pipeline runs as follows:
Developers integrate bug fix changes into a main development branch that gets merged into a CodeCommit master branch.
Amazon CloudWatch triggers the pipeline when code is merged into a master branch of an AWS CodeCommit repository.
AWS CodePipeline sends the new revision to AWS CodeBuild, which builds a Docker container image with the build ID.
AWS CodeBuild pushes the new Docker container image tagged with the build ID to an Amazon ECR registry.
Kubernetes downloads the new container (for the database client) from Amazon ECR and deploys the application (as a pod) and staging MongoDB instance (as a deployment object).
AWS CodePipeline, through a Lambda function, calls Portworx to snapshot the production MongoDB and deploy a staging instance of MongoDB:
Portworx provides a snapshot of the production instance as the persistent storage of the staging MongoDB.
The MongoDB instance mounts the snapshot.
At this point, the staging setup mimics a production environment. Teams can run integration and full end-to-end tests, using partner tooling, without impacting production workloads. The full pipeline is shown here.
This reference architecture showcases how development teams can easily move data between production and staging for the purposes of testing. Instead of taking application-specific manual steps, all operations in this CodePipeline architecture are automated and tracked as part of the CI/CD process.
This integrated experience is part of making stateful containers as easy as stateless ones. With AWS CodePipeline for the CI/CD process, developers can easily deploy stateful containers onto a Kubernetes cluster with Portworx storage and automate data movement within their process.
The reference architecture and code are available on GitHub:
AWS Config enables continuous monitoring of your AWS resources, making it simple to assess, audit, and record resource configurations and changes. AWS Config does this through the use of rules that define the desired configuration state of your AWS resources. AWS Config provides a number of AWS managed rules that address a wide range of security concerns such as checking if you encrypted your Amazon Elastic Block Store (Amazon EBS) volumes, tagged your resources appropriately, and enabled multi-factor authentication (MFA) for root accounts. You can also create custom rules to codify your compliance requirements through the use of AWS Lambda functions.
In this post, we’ll show you how to use AWS Config to monitor your Amazon Simple Storage Service (S3) bucket ACLs and policies for violations that allow public read or public write access. If AWS Config finds a policy violation, it triggers an Amazon CloudWatch Events rule, which in turn invokes an AWS Lambda function that either corrects the S3 bucket ACL, or notifies you via Amazon Simple Notification Service (Amazon SNS) that the policy is in violation and allows public read or public write access. We’ll show you how to do this in five main steps.
Enable AWS Config to monitor Amazon S3 bucket ACLs and policies for compliance violations.
Create an IAM Role and Policy that grants a Lambda function permissions to read S3 bucket policies and send alerts through SNS.
Create and configure a CloudWatch Events rule that triggers the Lambda function when AWS Config detects an S3 bucket ACL or policy violation.
Create a Lambda function that uses the IAM role to review S3 bucket ACLs and policies, correct the ACLs, and notify your team of out-of-compliance policies.
Verify the monitoring solution.
Note: This post assumes your compliance policies require that the buckets you monitor not allow public read or write access. If you have intentionally open buckets serving static content, for example, you can use this post as a jumping-off point for a solution tailored to your needs.
At the end of this post, we provide an AWS CloudFormation template that implements the solution outlined. The template enables you to deploy the solution in multiple regions quickly.
Important: The use of some of the resources deployed, including those deployed using the provided CloudFormation template, will incur costs as long as they are in use. AWS Config Rules incur costs in each region they are active.
Here’s an architecture diagram of what we’ll implement:
Figure 1: Architecture diagram
Step 1: Enable AWS Config and Amazon S3 Bucket monitoring
The following steps demonstrate how to set up AWS Config to monitor Amazon S3 buckets.
In the AWS Management Console, open the AWS Config console. If this is your first time using AWS Config, select Get started. If you’ve already used AWS Config, select Settings.
In the Settings page, under Resource types to record, clear the All resources checkbox. In the Specific types list, select Bucket under S3.
Figure 2: The Settings dialog box showing the “Specific types” list
Choose the Amazon S3 bucket for storing configuration history and snapshots. We’ll create a new Amazon S3 bucket.
Figure 3: Creating an S3 bucket
If you prefer to use an existing Amazon S3 bucket in your account, select the Choose a bucket from your account radio button and, using the dropdown, select an existing bucket.
Figure 4: Selecting an existing S3 bucket
Under Amazon SNS topic, check the box next to Stream configuration changes and notifications to an Amazon SNS topic, and then select the radio button to Create a topic.
Alternatively, you can choose a topic that you have previously created and subscribed to.
Figure 5: Selecting a topic that you’ve previously created and subscribed to
If you created a new SNS topic you need to subscribe to it to receive notifications. We’ll cover this in a later step.
Under AWS Config role, choose Create a role (unless you already have a role you want to use). We’re using the auto-suggested role name.
Figure 6: Creating a role
Configure Amazon S3 bucket monitoring rules:
On the AWS Config rules page, search for S3 and choose the s3-bucket-public-read-prohibited and s3-bucket-public-write-prohibited rules, then select Next.
Figure 7: AWS Config rules dialog
On the Review page, select Confirm. AWS Config is now analyzing your Amazon S3 buckets, capturing their current configurations, and evaluating the configurations against the rules we selected.
If you created a new Amazon SNS topic, open the Amazon SNS Management Console and locate the topic you created:
Figure 8: Amazon SNS topic list
Copy the ARN of the topic (the string that begins with arn:) because you’ll need it in a later step.
Select the checkbox next to the topic, and then, under the Actions menu, select Subscribe to topic.
Select Email as the protocol, enter your email address, and then select Create subscription.
After several minutes, you’ll receive an email asking you to confirm your subscription for notifications for this topic. Select the link to confirm the subscription.
Step 2: Create a Role for Lambda
Our Lambda function will need permissions that enable it to inspect and modify Amazon S3 bucket ACLs and policies, log to CloudWatch Logs, and publish to an Amazon SNS topic. We’ll now set up a custom AWS Identity and Access Management (IAM) policy and role to support these actions and assign them to the Lambda function we’ll create in the next section.
In the AWS Management Console, under Services, select IAM to access the IAM Console.
Create a policy with the following permissions, or copy the following policy:
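The exact policy from the original post isn’t reproduced here; a minimal policy along these lines covers the permissions just described. The policy name is illustrative, and in practice you should scope the sns:Publish resource to the ARN of your topic.

import boto3
import json

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {   # read bucket ACLs and policies, and overwrite ACLs if needed
            "Effect": "Allow",
            "Action": ["s3:GetBucketAcl", "s3:GetBucketPolicy", "s3:PutBucketAcl"],
            "Resource": "*",
        },
        {   # alert the team through the SNS topic created earlier
            "Effect": "Allow",
            "Action": "sns:Publish",
            "Resource": "arn:aws:sns:*:*:*",
        },
        {   # standard Lambda logging permissions
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "arn:aws:logs:*:*:*",
        },
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="S3OpenBucketResponderPolicy",  # illustrative name
    PolicyDocument=json.dumps(policy_document),
)

Attach this policy to a new role with a trust relationship that allows lambda.amazonaws.com to assume it.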
Step 3: Create and Configure a CloudWatch Events Rule
Next, create a CloudWatch Events rule whose event pattern matches the events generated by AWS Config when it checks an Amazon S3 bucket for public accessibility.
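The exact pattern from the original post isn’t shown here, but a pattern along the following lines, matching NON_COMPLIANT evaluations of the two managed S3 rules, fits the Lambda function used later. You can enter it in the console or, equivalently, create the rule with a call like this sketch:

import boto3
import json

event_pattern = {
    "source": ["aws.config"],
    "detail": {
        "requestParameters": {
            "evaluations": {"complianceType": ["NON_COMPLIANT"]}
        },
        "additionalEventData": {
            "managedRuleIdentifier": [
                "S3_BUCKET_PUBLIC_READ_PROHIBITED",
                "S3_BUCKET_PUBLIC_WRITE_PROHIBITED",
            ]
        },
    },
}

events = boto3.client("events")
events.put_rule(
    Name="AWSConfigFoundOpenBucket",  # the rule name used later in this post
    EventPattern=json.dumps(event_pattern),
)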
We’ll add a Lambda target later. For now, select your Amazon SNS topic created earlier, and then select Configure details.
Figure 9: The “Create rule” dialog
Give your rule a name and description. For this example, we’ll name ours AWSConfigFoundOpenBucket.
Click Create rule.
Step 4: Create a Lambda Function
In this section, we’ll create a new Lambda function to examine an Amazon S3 bucket’s ACL and bucket policy. If the bucket ACL is found to allow public access, the Lambda function overwrites it to be private. If a bucket policy is found, the Lambda function creates an SNS message, puts the policy in the message body, and publishes it to the Amazon SNS topic we created. Bucket policies can be complex, and overwriting your policy may cause unexpected loss of access, so this Lambda function doesn’t attempt to alter your policy in any way.
Get the ARN of the Amazon SNS topic created earlier.
In the AWS Management Console, under Services, select Lambda to go to the Lambda Console.
From the Dashboard, select Create Function. Or, if you were taken directly to the Functions page, select the Create Function button in the upper-right.
On the Create function page:
Choose Author from scratch.
Provide a name for the function. We’re using AWSConfigOpenAccessResponder.
The Lambda function we’ve written is Python 3.6 compatible, so in the Runtime dropdown list, select Python 3.6.
Under Role, select Choose an existing role. Select the role you created in the previous section, and then select Create function.
Figure 10: The “Create function” dialog
We’ll now add a CloudWatch Event based on the rule we created earlier.
In the Add triggers section, select CloudWatch Events. A CloudWatch Events box should appear connected to the left side of the Lambda Function and have a note that states Configuration required.
Figure 11: CloudWatch Events in the “Add triggers” section
From the Rule dropdown box, choose the rule you created earlier, and then select Add.
Scroll up to the Designer section and select the name of your Lambda function.
Delete the default code and paste in the following code:
import boto3
import json
import os
from botocore.exceptions import ClientError

ACL_RD_WARNING = "The S3 bucket ACL allows public read access."
PLCY_RD_WARNING = "The S3 bucket policy allows public read access."
ACL_WRT_WARNING = "The S3 bucket ACL allows public write access."
PLCY_WRT_WARNING = "The S3 bucket policy allows public write access."
RD_COMBO_WARNING = ACL_RD_WARNING + PLCY_RD_WARNING
WRT_COMBO_WARNING = ACL_WRT_WARNING + PLCY_WRT_WARNING

def policyNotifier(bucketName, s3client):
    try:
        bucketPolicy = s3client.get_bucket_policy(Bucket=bucketName)
        # notify that the bucket policy may need to be reviewed due to security concerns
        sns = boto3.client('sns')
        subject = "Potential compliance violation in " + bucketName + " bucket policy"
        message = "Potential bucket policy compliance violation. Please review: " + json.dumps(bucketPolicy['Policy'])
        # send SNS message with warning and bucket policy
        response = sns.publish(
            TopicArn=os.environ['TOPIC_ARN'],
            Subject=subject,
            Message=message
        )
    except ClientError as e:
        # error caught due to no bucket policy
        print("No bucket policy found; no alert sent.")

def lambda_handler(event, context):
    # instantiate Amazon S3 client
    s3 = boto3.client('s3')
    # pull the first evaluation out of the AWS Config event
    resource = list(event['detail']['requestParameters']['evaluations'])[0]
    bucketName = resource['complianceResourceId']
    complianceFailure = resource['annotation']
    if complianceFailure == ACL_RD_WARNING or complianceFailure == ACL_WRT_WARNING:
        # a public ACL is overwritten in place
        s3.put_bucket_acl(Bucket=bucketName, ACL='private')
    elif complianceFailure == PLCY_RD_WARNING or complianceFailure == PLCY_WRT_WARNING:
        # bucket policies are never modified; notify the team instead
        policyNotifier(bucketName, s3)
    elif complianceFailure == RD_COMBO_WARNING or complianceFailure == WRT_COMBO_WARNING:
        # both the ACL and the policy allow public access
        s3.put_bucket_acl(Bucket=bucketName, ACL='private')
        policyNotifier(bucketName, s3)
    return 0  # done
Scroll down to the Environment variables section. This code uses an environment variable to store the Amazon SNS topic ARN.
For the key, enter TOPIC_ARN.
For the value, enter the ARN of the Amazon SNS topic created earlier.
Under Execution role, select Choose an existing role, and then select the role created earlier from the dropdown.
Leave everything else as-is, and then, at the top, select Save.
Step 5: Verify it Works
We now have the Lambda function, an Amazon SNS topic, AWS Config watching our Amazon S3 buckets, and a CloudWatch Rule to trigger the Lambda function if a bucket is found to be non-compliant. Let’s test them to make sure they work.
We have an Amazon S3 bucket, myconfigtestbucket, that’s been created in the region monitored by AWS Config, as well as the associated Lambda function. This bucket has no public read or write access set in an ACL or a policy, so it’s compliant.
Figure 12: The “Config Dashboard”
Let’s change the bucket’s ACL to allow public listing of objects:
Figure 13: Screen shot of “Permissions” tab showing Everyone granted list access
After saving, the bucket now has public access. After several minutes, the AWS Config Dashboard notes that there is one non-compliant resource:
Figure 14: The “Config Dashboard” shown with a non-compliant resource
In the Amazon S3 Console, we can see that the bucket no longer has public listing of objects enabled after the invocation of the Lambda function triggered by the CloudWatch Rule created earlier.
Figure 15: The “Permissions” tab showing list access no longer allowed
Notice that the AWS Config Dashboard now shows that there are no longer any non-compliant resources:
Figure 16: The “Config Dashboard” showing zero non-compliant resources
Now, let’s try out the Amazon S3 bucket policy check by configuring a bucket policy that allows list access:
Figure 17: A bucket policy that allows list access
A few minutes after setting this bucket policy on the myconfigtestbucket bucket, AWS Config recognizes the bucket is no longer compliant. Because this is a bucket policy rather than an ACL, we publish a notification to the SNS topic we created earlier that lets us know about the potential policy violation:
Figure 18: Notification about potential policy violation
Knowing that the policy allows open listing of the bucket, we can now modify or delete the policy, after which AWS Config will recognize that the resource is compliant.
In this post, we demonstrated how you can use AWS Config to monitor for Amazon S3 buckets with open read and write access ACLs and policies. We also showed how to use Amazon CloudWatch, Amazon SNS, and Lambda to overwrite a public bucket ACL, or to alert you should a bucket have a suspicious policy. You can use the CloudFormation template to deploy this solution in multiple regions quickly. With this approach, you will be able to easily identify and secure open Amazon S3 bucket ACLs and policies. Once you have deployed this solution to multiple regions you can aggregate the results using an AWS Config aggregator. See this post to learn more.
If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the AWS Config forum or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
This blog was contributed by Rucha Nene, Sr. Product Manager for Amazon EBS
AWS customers use tags to track ownership of resources, implement compliance protocols, control access to resources via IAM policies, and drive their cost accounting processes. Last year, we made tagging for Amazon EC2 instances and Amazon EBS volumes easier by adding the ability to tag these resources upon creation. We are now extending this capability to EBS snapshots.
Earlier, you could tag your EBS snapshots only after the resource had been created, and you sometimes ended up with EBS snapshots in an untagged state if tagging failed. You also could not control the actions that users and groups could take over specific snapshots, or enforce tighter security policies.
To address these issues, we are making tagging for EBS snapshots more flexible and giving customers more control over EBS snapshots by introducing two new capabilities:
Tag on creation for EBS snapshots – You can now specify tags for EBS snapshots as part of the API call that creates the resource or via the Amazon EC2 Console when creating an EBS snapshot.
Resource-level permission and enforced tag usage – The CreateSnapshot, DeleteSnapshot, and ModifySnapshotAttribute API actions now support IAM resource-level permissions. You can now write IAM policies that mandate the use of specific tags when taking actions on EBS snapshots.
Tag on creation
You can now specify tags for EBS snapshots as part of the API call that creates the resources. The resource creation and the tagging are performed atomically; both must succeed for the CreateSnapshot operation to succeed. You no longer need to build tagging scripts that run after EBS snapshots have been created.
Here’s how you specify tags when you create an EBS snapshot, using the console:
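The console screenshot isn’t reproduced here; the equivalent API call, sketched with an illustrative volume ID and tag values, looks like this:

import boto3

ec2 = boto3.client("ec2")
ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",  # illustrative volume ID
    Description="Nightly backup",
    TagSpecifications=[{
        "ResourceType": "snapshot",
        "Tags": [
            {"Key": "costcenter", "Value": "115"},
            {"Key": "User", "Value": "alice"},
        ],
    }],
)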
Resource-level permission and enforced tag usage
CreateSnapshot, DeleteSnapshot, and ModifySnapshotAttribute now support resource-level permissions, which allow you to exercise more control over EBS snapshots. You can write IAM policies that give you precise control over access to resources and let you specify which users are able to create snapshots for a given set of volumes. You can also enforce the use of specific tags to help track resources and achieve more accurate cost allocation reporting.
For example, here’s a statement that requires that the costcenter tag (with a value of “115”) be present on the volume from which snapshots are being created. It requires that this tag be applied to all newly created snapshots. In addition, it requires that the created snapshots are tagged with User:username for the customer.
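The exact statement from the post isn’t reproduced here; statements to this effect could look like the following sketch, where the account ID, region, and tag values are illustrative:

create_snapshot_statements = [
    {   # snapshots may be taken only of volumes tagged costcenter=115
        "Effect": "Allow",
        "Action": "ec2:CreateSnapshot",
        "Resource": "arn:aws:ec2:us-east-1:123456789012:volume/*",
        "Condition": {"StringEquals": {"ec2:ResourceTag/costcenter": "115"}},
    },
    {   # the new snapshot must be created with a User tag naming the caller
        "Effect": "Allow",
        "Action": "ec2:CreateSnapshot",
        "Resource": "arn:aws:ec2:us-east-1:123456789012:snapshot/*",
        "Condition": {"StringEquals": {"aws:RequestTag/User": "${aws:username}"}},
    },
    {   # tagging on creation also requires ec2:CreateTags on the snapshot
        "Effect": "Allow",
        "Action": "ec2:CreateTags",
        "Resource": "arn:aws:ec2:us-east-1:123456789012:snapshot/*",
    },
]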
To implement stronger compliance and security policies, you could also restrict access to DeleteSnapshot, if the resource is not tagged with the user’s name. Here’s a statement that allows the deletion of a snapshot only if the snapshot is tagged with User:username for the customer.
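Again as an illustrative sketch, such a statement could look like this:

delete_snapshot_statement = {
    # deletion is allowed only when the snapshot carries the caller's name
    "Effect": "Allow",
    "Action": "ec2:DeleteSnapshot",
    "Resource": "arn:aws:ec2:us-east-1:123456789012:snapshot/*",
    "Condition": {"StringEquals": {"ec2:ResourceTag/User": "${aws:username}"}},
}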
In the news, Boeing (an aircraft maker) has been “targeted by a WannaCry virus attack”. Phrased this way, it’s implausible. There are no new attacks targeting people with WannaCry. There is either no WannaCry, or it’s simply a continuation of the attack from a year ago.
It’s possible what happened is that an anti-virus product called a new virus “WannaCry”. Virus families are often related, and sometimes a distant relative gets called the same thing. I know this from watching the way various anti-virus products label my own software, which isn’t a virus, but which virus writers often include with their own stuff. The Lazarus group, which is believed to be responsible for WannaCry, has whole virus families like this. Thus, just because an AV product claims you are infected with WannaCry doesn’t mean it’s the same thing that everyone else is calling WannaCry.
Famously, WannaCry was the first virus/ransomware/worm that used the NSA ETERNALBLUE exploit. Other viruses have since added the exploit, and of course, hackers use it when attacking systems. It may be that a network intrusion detection system detected ETERNALBLUE, which people then assumed was due to WannaCry. It may actually have been an nPetya infection instead (nPetya was the second major virus/worm/ransomware to use the exploit).
Or it could be the real WannaCry, but it’s probably not a new “attack” that “targets” Boeing. Instead, it’s likely a continuation from WannaCry’s first appearance. WannaCry is a worm, which means it spreads automatically after it was launched, for years, without anybody in control. Infected machines still exist, unnoticed by their owners, attacking random machines on the Internet. If you plug in an unpatched computer onto the raw Internet, without the benefit of a firewall, it’ll get infected within an hour.
However, the Boeing manufacturing systems that were infected were not on the Internet, so what happened? The narrative from the news stories implies some nefarious hacker activity that “targeted” Boeing, but that’s unlikely.
We now have over 15 years of experience with network worms getting into strange places disconnected and even “air gapped” from the Internet. The most common reason is laptops. Somebody takes their laptop to some place like an airport WiFi network and gets infected. They put their laptop to sleep, then wake it again when they reach their destination, and plug it into the manufacturing network. At this point, the virus spreads and infects everything. This is especially the case with maintenance/support engineers, who often have specialized software they use to control manufacturing machines, for which they have a reason to connect to the local network even if it doesn’t have useful access to the Internet. A single engineer may act as a sort of Typhoid Mary, going from customer to customer, infecting each in turn whenever they open their laptop.
Another cause for infection is virtual machines. A common practice is to take “snapshots” of live machines and save them to backups. Should the virtual machine crash, instead of rebooting it, it’s simply restored from the backed up running image. If that backup image is infected, then bringing it out of sleep will allow the worm to start spreading.
Jake Williams claims he’s seen three other manufacturing networks infected with WannaCry. Why does manufacturing seem more susceptible? The reason appears to be the “killswitch” that stops WannaCry from running elsewhere. The killswitch uses a DNS lookup, stopping itself if it can resolve a certain domain. Manufacturing networks are largely disconnected from the Internet enough that such DNS lookups don’t work, so the domain can’t be found, so the killswitch doesn’t work. Thus, manufacturing systems are no more likely to get infected, but the lack of killswitch means the virus will continue to run, attacking more systems instead of immediately killing itself.
One solution to this would be to set up sinkhole DNS servers on the network that resolve all unknown DNS queries to a single server that logs all requests. This is trivially set up with most DNS servers. The logs will quickly identify problems on the network, as well as any hacker or virus activity. The side effect is that it would make this killswitch kill WannaCry. WannaCry isn’t sufficient reason to set up sinkhole servers, of course, but it’s something I’ve found generally useful in the past.
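As a toy illustration of the idea (a real deployment would use the catch-all feature of an ordinary DNS server rather than hand-rolled code, and the sinkhole address here is made up), a sinkhole that logs every lookup and answers every A query with a fixed address fits in a few lines of Python:

import socket

SINKHOLE_IP = "10.0.0.53"  # assumption: address of the internal logging host

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 53))  # port 53 requires elevated privileges
while True:
    data, addr = sock.recvfrom(512)
    # walk the label-encoded query name that follows the 12-byte DNS header
    labels, i = [], 12
    while data[i] != 0:
        n = data[i]
        labels.append(data[i + 1:i + 1 + n].decode())
        i += 1 + n
    print(addr[0], ".".join(labels))  # log who asked for what
    # answer every query: echo the ID and question, set response flags,
    # and append a single A record pointing at the sinkhole
    resp = data[:2] + b"\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00" + data[12:i + 5]
    resp += b"\xc0\x0c\x00\x01\x00\x01\x00\x00\x00\x3c\x00\x04"
    resp += socket.inet_aton(SINKHOLE_IP)
    sock.sendto(resp, addr)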
Conclusion
Something obviously happened to the Boeing plant, but the narrative is all wrong. Words like “targeted attack” imply things that likely didn’t happen. Facts are so loose in cybersecurity that it may not have even been WannaCry.
The real story is that the original WannaCry is still out there, still trying to spread. Simply put a computer on the raw Internet (without a firewall) and you’ll get attacked. That, somehow, isn’t news. Instead, what’s news is whenever that continued infection hits somewhere famous, like Boeing, even though (as Boeing claims) it had no important effect.
Backblaze’s mission is making cloud storage astonishingly easy and affordable. That guides our focus — making our customers’ data more usable. Today, we’re pleased to introduce a trial of the B2 Snapshot Return Refund program. B2 customers have long been able to create a Snapshot of their data and order a hard drive with that data sent via FedEx anywhere in the world. Starting today, if the customer sends the drive back to Backblaze within 30 days, they will get a full refund. This new feature is available automatically for B2 customers when they order a Snapshot. There are no extra buttons to push or boxes to check — just send back the drive within 30 days and we’ll refund your money. To put it simply, we are offering the cloud storage industry’s only refundable rapid data egress service.
You Shouldn’t be Afraid to Use Your Own Data
Last week, we cut the price of B2 downloads in half — from 2¢ per GB to 1¢ per GB. That 50% reduction makes B2’s download price 1/5 that of Amazon’s S3 (with B2 storage pricing already 1/4 that of S3). The price reduction and today’s introduction of the B2 Snapshot Return Refund program are deliberate moves to eliminate the industry’s biggest barrier to entry — the cost of using data stored in the cloud. Storage vendors who make it expensive to restore, or place time lag impediments to access, are reducing the usefulness of your data. We believe this is antithetical to encouraging the use of the cloud in the first place.
There are many ways B2 customers can benefit from using the B2 Snapshot Return Refund program. Here is a typical scenario.
Media and Entertainment Workflow Based Snapshots
Businesses in the Media and Entertainment (M&E) industry tend to have large quantities of digital media, and the amount of data will continue to increase in the coming years with more 4K and 8K cameras coming into regular use. When an organization needs to deliver or share that data, they typically have to manually download data from their internal storage system, copy it onto a thumb drive or hard drive, or perhaps create an LTO tape. Once that is done, they take their storage device, label it, and mail it to their customer. Not only is this practice costly, time consuming, and potentially insecure, it doesn’t scale well with larger amounts of data.
With just a few clicks, you can easily distribute or share your digital media if it is stored in the B2 Cloud. Here’s how the process works:
Log in to your Backblaze B2 account.
Navigate to the bucket where the data is located.
Select the files, or the entire bucket, you wish to send and create a “Snapshot.”
Once the Snapshot is complete you have choices:
Download the Snapshot and pay $0.01/GB for the download
Have Backblaze copy the Snapshot to an external hard drive and FedEx it anywhere in the world. This stores up to 3.5 TB and costs $189.00. Return the hard drive to Backblaze within 30 days and you’ll get your $189.00 back.
Have Backblaze copy the Snapshot to a flash drive and FedEx it anywhere in the world. This stores up to 110 GB and costs $99.00. FedEx shipping to the specified location is included. Return the flash drive to Backblaze within 30 days and you’ll get your $99.00 back.
You can always keep the hard drive or flash drive and Backblaze, of course, will keep your money.
Each drive containing a Snapshot is encrypted. The encryption key can be found in your Backblaze B2 account after you log in. The FedEx tracking number is there as well. When the hard drive arrives at its destination you can provide the encryption key to the recipient and they’ll be able to access the files. Note that the encryption key must be entered each time the hard drive is started, so the data remains protected even if the hard drive is returned to Backblaze.
The B2 Snapshot Return Refund program supports Snapshots as large as 3.5 terabytes. That means you can send about 50 hours of 4K video to a client or partner by selecting the hard drive option. If you select the flash drive option, a Snapshot can be up to 110 gigabytes, which is about 1 hour and 45 minutes of 4K video.
While the example uses an M&E workflow, any workflow requiring the exchange or distribution of large amounts of data across distinct geographies will benefit from this service.
This is a Trial Program
Backblaze fully intends to offer the B2 Snapshot Return Refund Program for a long time. That said, there is no program like this in the industry and so we want to put some guardrails on it to ensure we can offer a sustainable program for all. Thus, the “fine print”:
Minimum Snapshot Size — a Snapshot must be greater than 10 GB to qualify for this program. Why? You can download a 10 GB Snapshot in a few minutes. Why pay us to do the same thing and have it take a couple of days??
The 30 Day Clock — The clock starts on the day the drive is marked as delivered to you by FedEx and the clock ends on the date postmarked on the package we receive. If that’s 30 days or less, your refund will be granted.
5 Drive Refunds Per Year — We are initially setting a limit of 5 drive refunds per B2 account per year. By placing a cap on the number of drive refunds per year, we are able to provide a service that is responsive to our entire client base. We expect to change or remove this limit once we have enough data to understand the demand and can make sure we are staffed properly.
It is Your Data — Use It
Our industry has a habit of charging little to store data and then usurious amounts to get it back. There are certainly real costs involved in data retrieval. We outlined them in our post on the Cost of Cloud Storage. The industry rates charged for data retrieval are clearly strategic moves to try and lock customers in. To us, that runs counter to trying to do our part to make data useful and our customers’ lives easier. That viewpoint drives our efforts behind lowering our download pricing and the creation of this program.
We hope you enjoy the B2 Snapshot Return Refund program. If you have a moment, please tell us in the comments below how you might use it!
This post was written in partnership with Intuit to share learnings, best practices, and recommendations for running an Apache Kafka cluster on AWS. Thanks to Vaishak Suresh and his colleagues at Intuit for their contribution and support.
Intuit, in their own words: Intuit, a leading enterprise customer for AWS, is a creator of business and financial management solutions. For more information on how Intuit partners with AWS, see our previous blog post, Real-time Stream Processing Using Apache Spark Streaming and Apache Kafka on AWS. Apache Kafka is an open-source, distributed streaming platform that enables you to build real-time streaming applications.
The best practices described in this post are based on our experience in running and operating large-scale Kafka clusters on AWS for more than two years. Our intent for this post is to help AWS customers who are currently running Kafka on AWS, and also customers who are considering migrating on-premises Kafka deployments to AWS.
Running your Kafka deployment on Amazon EC2 provides a high performance, scalable solution for ingesting streaming data. AWS offers many different instance types and storage option combinations for Kafka deployments. However, given the number of possible deployment topologies, it’s not always trivial to select the most appropriate strategy suitable for your use case.
In this blog post, we cover the following aspects of running Kafka clusters on AWS:
Deployment considerations and patterns
Backup and restore
Note: While implementing Kafka clusters in a production environment, make sure also to consider factors like your number of messages, message size, monitoring, failure handling, and any operational issues.
Deployment considerations and patterns
In this section, we discuss various deployment options available for Kafka on AWS, along with pros and cons of each option. A successful deployment starts with thoughtful consideration of these options. Considering availability, consistency, and operational overhead of the deployment helps when choosing the right option.
Single AWS Region, Three Availability Zones, All Active
One typical deployment pattern (all active) is in a single AWS Region with three Availability Zones (AZs). One Kafka cluster is deployed in each AZ along with Apache ZooKeeper and Kafka producer and consumer instances as shown in the illustration following.
In this pattern, the Kafka cluster deployment looks like this:
Kafka producers and Kafka cluster are deployed on each AZ.
Data is distributed evenly across the three Kafka clusters by using Elastic Load Balancing.
Kafka consumers aggregate data from all three Kafka clusters.
Kafka cluster failover occurs this way:
Mark down all Kafka producers
Debug and restack Kafka
Restart Kafka producers
Following are the pros and cons of this pattern.
Pros:
Can sustain the failure of two AZs
No message loss during failover
Cons:
Very high operational overhead:
All changes need to be deployed three times, once for each Kafka cluster
Maintaining and monitoring three Kafka clusters
Maintaining and monitoring three consumer clusters
A restart is required for patching and upgrading brokers in a Kafka cluster. In this approach, a rolling upgrade is done separately for each cluster.
Single Region, Three Availability Zones, Active-Standby
Another typical deployment pattern (active-standby) is in a single AWS Region with a single Kafka cluster and Kafka brokers and Zookeepers distributed across three AZs. Another similar Kafka cluster acts as a standby as shown in the illustration following. You can use Kafka mirroring with MirrorMaker to replicate messages between any two clusters.
In this pattern, the Kafka cluster deployment looks like this:
Kafka producers are deployed on all three AZs.
Only one Kafka cluster is deployed across three AZs (active).
ZooKeeper instances are deployed on each AZ.
Brokers are spread evenly across all three AZs.
Kafka consumers can be deployed across all three AZs.
Standby Kafka producers and a Multi-AZ Kafka cluster are part of the deployment.
Kafka cluster failover occurs this way:
Switch traffic to the standby Kafka producers cluster and standby Kafka cluster.
Restart consumers to consume from standby Kafka cluster.
Following are the pros and cons of this pattern.
Pros:
Less operational overhead when compared to the first option
Only one Kafka cluster to manage and consume data from
Can handle single AZ failures without activating a standby Kafka cluster
Cons:
Added latency due to cross-AZ data transfer among Kafka brokers
For Kafka versions before 0.10, replicas for topic partitions have to be assigned so they’re distributed to the brokers on different AZs (rack-awareness)
The cluster can become unavailable in case of a network glitch, where ZooKeeper does not see Kafka brokers
Possibility of in-transit message loss during failover
Intuit recommends using a single Kafka cluster in one AWS Region, with brokers distributed across three AZs (single region, three AZs). This approach offers stronger fault tolerance, because a failed AZ won’t cause Kafka downtime.
There are two options for file storage in Amazon EC2: ephemeral instance storage and Amazon EBS.
Ephemeral storage is local to the Amazon EC2 instance and can provide high IOPS based on the instance type. Amazon EBS volumes, on the other hand, offer higher resiliency, and you can configure IOPS based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. Your choice of storage is closely related to the type of workload supported by your Kafka cluster.
Kafka provides built-in fault tolerance by replicating data partitions across a configurable number of instances. If a broker fails, you can recover it by fetching all the data from other brokers in the cluster that host the other replicas. Depending on the size of the data transfer, the recovery can affect network traffic and, in turn, the cluster’s performance.
In brief, an instance store delivers higher raw I/O performance, while EBS reduces the time and network traffic needed to recover a failed broker.
Instance storage is recommended for large and medium-sized Kafka clusters. In a large cluster, read/write traffic is distributed across a high number of brokers, so the loss of one broker has less of an impact. In smaller clusters, however, quick recovery of a failed node matters more, and rebuilding a failed broker from its peers takes longer and consumes proportionally more network traffic.
Storage-optimized instances like h1, i3, and d2 are an ideal choice for distributed applications like Kafka.
The primary advantage of using EBS in a Kafka deployment is that it significantly reduces data-transfer traffic when a broker fails or must be replaced. The replacement broker joins the cluster much faster.
Data stored on EBS is persisted in case of an instance failure or termination. The broker’s data stored on an EBS volume remains intact, and you can mount the EBS volume to a new EC2 instance. Most of the replicated data for the replacement broker is already available in the EBS volume and need not be copied over the network from another broker. Only the changes made after the original broker failure need to be transferred across the network. That makes this process much faster.
Intuit chose EBS because of their frequent instance restacking requirements and also other benefits provided by EBS.
Generally, Kafka deployments use a replication factor of three. EBS offers replication within the service, so Intuit chose a replication factor of two instead of three.
The choice of instance types is generally driven by the type of storage required for your streaming applications on a Kafka cluster. If your application requires ephemeral storage, h1, i3, and d2 instances are your best option.
The network plays a very important role in a distributed system like Kafka. A fast and reliable network ensures that nodes can communicate with each other easily. The available network throughput controls the maximum amount of traffic that Kafka can handle. Network throughput, combined with disk storage, is often the governing factor for cluster sizing.
If you expect your cluster to receive high read/write traffic, select an instance type that offers 10-Gb/s performance.
In addition, choose an option that keeps interbroker network traffic on the private subnet while still allowing clients to connect to the brokers; communication between brokers and clients uses the same network interface and port. For more details, see the documentation about IP addressing for EC2 instances.
If you are deploying in more than one AWS Region, you can connect the two VPCs in the two AWS Regions using cross-region VPC peering. However, be aware of the networking costs associated with cross-AZ deployments.
Kafka has a history of not being backward compatible, but its support of backward compatibility is getting better. During a Kafka upgrade, you should keep your producer and consumer clients on a version equal to or lower than the version you are upgrading from. After the upgrade is finished, you can start using a new protocol version and any new features it supports. There are three upgrade approaches available, discussed following.
Rolling or in-place upgrade
In a rolling or in-place upgrade scenario, upgrade one Kafka broker at a time. Take into consideration the recommendations for doing rolling restarts to avoid downtime for end users.
Downtime upgrade
If you can afford the downtime, you can take your entire cluster down, upgrade each Kafka broker, and then restart the cluster.
Blue/green upgrade
Intuit followed the blue/green deployment model for their workloads, as described following.
If you can afford to create a separate Kafka cluster and upgrade it, we highly recommend the blue/green upgrade scenario. In this scenario, we recommend that you keep your clusters up-to-date with the latest Kafka version. For additional details on Kafka version upgrades or more details, see the Kafka upgrade documentation.
The following illustration shows a blue/green upgrade.
In this scenario, the upgrade plan works like this:
Create a new Kafka cluster on AWS.
Create a new Kafka producers stack to point to the new Kafka cluster.
Create topics on the new Kafka cluster.
Test the green deployment end to end (sanity check).
Using Amazon Route 53, point the Kafka producers stack on AWS to the new green Kafka environment that you have created (a sketch of this DNS switch follows the roll-back plan).
The roll-back plan works like this:
Switch Amazon Route 53 to the old Kafka producers stack on AWS to point to the old Kafka environment.
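As a sketch, the Route 53 cut-over (and, with the record value swapped back, the roll-back) can be performed with a call like the following; the hosted zone ID, record name, and load balancer address are hypothetical:

import boto3

r53 = boto3.client("route53")
r53.change_resource_record_sets(
    HostedZoneId="Z123EXAMPLE",  # hypothetical hosted zone
    ChangeBatch={
        "Comment": "Cut producers over to the green Kafka cluster",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "kafka-producers.example.internal.",
                "Type": "CNAME",
                "TTL": 60,
                "ResourceRecords": [
                    {"Value": "green-producers-elb.us-east-1.elb.amazonaws.com"}
                ],
            },
        }],
    },
)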
You can tune Kafka performance in multiple dimensions. Following are some best practices for performance tuning.
These are some general performance tuning techniques:
If throughput is less than network capacity, try the following:
Add more threads
Increase batch size
Add more producer instances
Add more partitions
To improve latency when acks=-1, increase your num.replica.fetchers value.
For cross-AZ data transfer, tune your buffer settings for sockets and for OS TCP.
Make sure that num.io.threads is greater than the number of disks dedicated for Kafka.
Adjust num.network.threads based on the number of producers plus the number of consumers plus the replication factor.
Your message size affects your network bandwidth. To get higher performance from a Kafka cluster, select an instance type that offers 10 Gb/s performance.
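As a rough producer-side illustration of the batching advice above, here is a sketch using the kafka-python client; the broker addresses, topic name, and specific values are illustrative starting points rather than recommendations:

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092"],
    acks="all",              # equivalent to acks=-1: wait for all in-sync replicas
    batch_size=256 * 1024,   # larger batches mean fewer, bigger requests
    linger_ms=50,            # wait up to 50 ms for a batch to fill
    compression_type="gzip"  # trade CPU for network bandwidth
)
for i in range(100000):
    producer.send("events", b"payload-%d" % i)
producer.flush()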
For Java and JVM tuning, try the following:
Minimize GC pauses by using the Oracle JDK, which uses the new G1 garbage-first collector.
Try to keep the Kafka heap size below 4 GB.
Knowing whether a Kafka cluster is working correctly in a production environment is critical. Sometimes, just knowing that the cluster is up is enough, but Kafka applications have many moving parts to monitor. In fact, it can easily become confusing to understand what’s important to watch and what you can set aside. Items to monitor range from simple metrics about the overall rate of traffic, to producers, consumers, brokers, controller, ZooKeeper, topics, partitions, messages, and so on.
For monitoring, Intuit used several tools, including New Relic, Wavefront, Amazon CloudWatch, and AWS CloudTrail. Our recommended monitoring approach follows.
For system metrics, we recommend that you monitor:
File handle usage
Disk I/O performance
For producers, we recommend that you monitor:
For consumers, we recommend that you monitor:
Like most distributed systems, Kafka provides the mechanisms to transfer data with relatively high security across the components involved. Depending on your setup, security might involve different services such as encryption, Kerberos, Transport Layer Security (TLS) certificates, and advanced access control list (ACL) setup in brokers and ZooKeeper. The following tells you more about the Intuit approach. For details on Kafka security not covered in this section, see the Kafka documentation.
Kafka uses TLS for client and internode communications.
Authentication of connections to brokers, whether from clients (producers and consumers), other brokers, or tools, uses either Secure Sockets Layer (SSL) or Simple Authentication and Security Layer (SASL).
Kafka supports Kerberos authentication. If you already have a Kerberos server, you can add Kafka to your current configuration.
In Kafka, authorization is pluggable and integration with external authorization services is supported.
Backup and restore
The type of storage used in your deployment dictates your backup and restore strategy.
The best way to back up a Kafka cluster based on instance storage is to set up a second cluster and replicate messages using MirrorMaker. Kafka’s mirroring feature makes it possible to maintain a replica of an existing Kafka cluster. Depending on your setup and requirements, your backup cluster might be in the same AWS Region as your main cluster or in a different one.
For EBS-based deployments, you can enable automatic snapshots of EBS volumes to back up volumes. You can easily create new EBS volumes from these snapshots to restore. We recommend storing backup files in Amazon S3.
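A minimal sketch of such a snapshot job follows, assuming the broker data volumes carry a tag they can be found by; the tag key and value are illustrative:

import boto3

ec2 = boto3.client("ec2")

# find the data volumes attached to the Kafka brokers by tag
volumes = ec2.describe_volumes(
    Filters=[{"Name": "tag:cluster", "Values": ["kafka-prod"]}]
)["Volumes"]

for vol in volumes:
    # snapshot each broker volume
    ec2.create_snapshot(
        VolumeId=vol["VolumeId"],
        Description="kafka-prod nightly backup",
    )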
For more information on how to back up in Kafka, see the Kafka documentation.
In this post, we discussed several patterns for running Kafka in the AWS Cloud. AWS also provides an alternative managed solution with Amazon Kinesis Data Streams: there are no servers to manage or scaling cliffs to worry about, you can scale the size of your streaming pipeline in seconds without downtime, data replication across Availability Zones is automatic, and you benefit from security out of the box. Kinesis Data Streams is tightly integrated with a wide variety of AWS services, such as Lambda, Amazon Redshift, and Amazon Elasticsearch Service, and it supports open-source frameworks such as Storm, Spark, and Flink. To move data between the two systems, you may refer to the Kafka-Kinesis Connector.
If you have questions or suggestions, please comment below.
Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.
As leading-edge rolling distributions go, OpenSUSE Tumbleweed is relatively stable, but it is still true that some snapshots are better than others. Jimmy Berry has announced the creation of a web site tracking the quality of each day’s snapshot. “By utilizing a variety of sources of feedback pertaining to snapshots a stability score is estimated. The goal is to err on the side of caution and to allow users to avoid troublesome releases.”
Apache Cassandra is a commonly used, high performance NoSQL database. AWS customers that currently maintain Cassandra on-premises may want to take advantage of the scalability, reliability, security, and economic benefits of running Cassandra on Amazon EC2.
Amazon EC2 and Amazon Elastic Block Store (Amazon EBS) provide secure, resizable compute capacity and storage in the AWS Cloud. Combined, they let you deploy Cassandra and scale capacity according to your requirements. Given the number of possible deployment topologies, it’s not always trivial to select the most appropriate strategy suitable for your use case.
In this post, we outline three Cassandra deployment options, as well as provide guidance about determining the best practices for your use case in the following areas:
Cassandra resource overview
High availability and resiliency
Before we jump into best practices for running Cassandra on AWS, we should mention that we have many customers who decided to use DynamoDB instead of managing their own Cassandra cluster. DynamoDB is fully managed, serverless, and provides multi-master cross-region replication, encryption at rest, and managed backup and restore. Integration with AWS Identity and Access Management (IAM) enables DynamoDB customers to implement fine-grained access control for their data security needs.
AWS provides options, so you’re covered whether you want to run your own NoSQL Cassandra database, or move to a fully managed, serverless DynamoDB database.
Cassandra resource overview
Here’s a short introduction to standard Cassandra resources and how they are implemented with AWS infrastructure. If you’re already familiar with Cassandra or AWS deployments, this can serve as a refresher.
Cluster
A single Cassandra deployment. This typically consists of multiple physical locations, keyspaces, and physical servers.
In AWS, a cluster is a logical deployment construct that maps to an AWS CloudFormation StackSet, which consists of one or many CloudFormation stacks to deploy Cassandra.
Datacenter
A group of nodes configured as a single replication group.
In AWS, a datacenter is a logical deployment construct, deployed with a single CloudFormation stack consisting of Amazon EC2 instances, networking, storage, and security resources.
Rack
A collection of servers. A datacenter consists of at least one rack. Cassandra tries to place the replicas on different racks.
In AWS, a rack maps to a single Availability Zone.
Node
A physical or virtual machine running Cassandra software.
In AWS, a node maps to an EC2 instance.
Token
Conceptually, the data managed by a cluster is represented as a ring. The ring is divided into ranges equal to the number of nodes, with each node responsible for one or more ranges of the data. Each node is assigned a token, which is essentially a random number from the range. The token value determines the node’s position in the ring and its range of data.
Tokens are managed within Cassandra.
Virtual node (vnode)
Responsible for storing a range of data. Each vnode receives one token in the ring. A cluster (by default) consists of 256 tokens, which are uniformly distributed across all servers in the Cassandra datacenter.
Vnodes are managed within Cassandra.
Replication factor
The total number of replicas across the cluster.
Managed within Cassandra.
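To make the token concept concrete, here is a toy sketch of ownership lookup on a ring; it is purely illustrative (real Cassandra uses the Murmur3 partitioner and, with vnodes, 256 tokens per server by default):

import bisect
import hashlib

RING_SIZE = 2**16
# each node is assigned a token, i.e., a position on the ring
tokens = sorted([(1200, "node-a"), (21000, "node-b"), (52000, "node-c")])
positions = [t for t, _ in tokens]

def owner(partition_key):
    # hash the key onto the ring; the first node at or past that point owns it
    h = int(hashlib.md5(partition_key.encode()).hexdigest(), 16) % RING_SIZE
    idx = bisect.bisect_left(positions, h) % len(tokens)
    return tokens[idx][1]

print(owner("user:42"))  # prints the node that owns this partition key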
One of the many benefits of deploying Cassandra on Amazon EC2 is that you can automate many deployment tasks. In addition, AWS includes services, such as CloudFormation, that allow you to describe and provision all your infrastructure resources in your cloud environment.
We recommend orchestrating each Cassandra ring with one CloudFormation template. If you are deploying in multiple AWS Regions, you can use a CloudFormation StackSet to manage those stacks. All the maintenance actions (scaling, upgrading, and backing up) should be scripted with an AWS SDK. These may live as standalone AWS Lambda functions that can be invoked on demand during maintenance.
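For instance, a maintenance function might create or update the stack for one ring with a call along these lines; the stack name, template URL, and parameters are hypothetical:

import boto3

cfn = boto3.client("cloudformation")
cfn.create_stack(
    StackName="cassandra-ring-us-east-1",  # hypothetical stack name
    TemplateURL="https://s3.amazonaws.com/my-bucket/cassandra-datacenter.yaml",
    Parameters=[
        {"ParameterKey": "NodeCount", "ParameterValue": "6"},
        {"ParameterKey": "InstanceType", "ParameterValue": "i3.4xlarge"},
    ],
    Capabilities=["CAPABILITY_IAM"],  # assuming the template creates IAM resources
)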
You can get started by following the Cassandra Quick Start deployment guide. Keep in mind that this guide does not address the requirements to operate a production deployment and should be used only for learning more about Cassandra.
In this section, we discuss various deployment options available for Cassandra in Amazon EC2. A successful deployment starts with thoughtful consideration of these options. Consider the amount of data, network environment, throughput, and availability.
Single AWS Region, 3 Availability Zones
In this pattern, you deploy the Cassandra cluster in one AWS Region and three Availability Zones. There is only one ring in the cluster. By using EC2 instances in three zones, you ensure that the replicas are distributed uniformly in all zones.
To ensure the even distribution of data across all Availability Zones, we recommend that you distribute the EC2 instances evenly in all three Availability Zones. The number of EC2 instances in the cluster is a multiple of three (the replication factor).
This pattern is suitable in situations where the application is deployed in one Region or where deployments in different Regions should be constrained to the same Region because of data privacy or other legal requirements.
Pros:
● Highly available, can sustain failure of one Availability Zone.
● Simple deployment.
Cons:
● Does not protect in a situation when many of the resources in a Region are experiencing intermittent failure.
Two AWS Regions, Active-Active
In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between the two rings.
We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.
This pattern is most suitable when the applications using the Cassandra cluster are deployed in more than one Region.
Pros:
● No data loss during failover.
● Highly available, can sustain a situation where many of the resources in a Region experience intermittent failures.
● Read/write traffic can be localized to the closest Region for the user, for lower latency and higher performance.
Cons:
● High operational overhead.
● The second Region effectively doubles the cost.
Two AWS Regions, Active-Standby
In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between the two rings.
However, the second Region does not receive traffic from the applications. It only functions as a secondary location for disaster recovery reasons. If the primary Region is not available, the second Region receives traffic.
We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.
This pattern is most suitable when the applications using the Cassandra cluster require low recovery point objective (RPO) and recovery time objective (RTO).
Pros:
● No data loss during failover.
● Highly available, can sustain failure or partitioning of one whole Region.
Cons:
● High operational overhead.
● High latency for writes for eventual consistency.
● The second Region effectively doubles the cost.
In on-premises deployments, Cassandra uses local disks to store data. For EC2 instances, there are two storage options: instance store (ephemeral storage) and Amazon EBS.
Your choice of storage is closely related to the type of workload supported by the Cassandra cluster. Instance store works best for most general purpose Cassandra deployments. However, in certain read-heavy clusters, Amazon EBS is a better choice.
The choice of instance type is generally driven by the type of storage:
If ephemeral storage is required for your application, a storage-optimized (I3) instance is the best option.
If your workload requires Amazon EBS, it is best to go with compute-optimized (C5) instances.
Burstable instance types (T2) don’t offer good performance for Cassandra deployments.
Ephemeral storage is local to the EC2 instance. It may provide high input/output operations per second (IOPS) based on the instance type. An SSD-based instance store can support up to 3.3M IOPS on I3 instances. This high performance makes it an ideal choice for transactional or write-intensive applications such as Cassandra.
In general, instance storage is recommended for transactional, large, and medium-size Cassandra clusters. For a large cluster, read/write traffic is distributed across a higher number of nodes, so the loss of one node has less of an impact. However, for smaller clusters, a quick recovery for the failed node is important.
As an example, for a cluster with 100 nodes, the loss of 1 node is a 3.33% capacity loss (with a replication factor of 3). Similarly, for a cluster with 10 nodes, the loss of 1 node is a 33% capacity loss (with a replication factor of 3).
IOPS: An instance store can deliver up to 3.3M IOPS on I3 instances, which translates to higher query performance on each host. However, Cassandra scales well horizontally; in general, we recommend scaling horizontally first, and then scaling vertically to mitigate specific issues. (Note: 3.3M IOPS is observed with 100% random reads with a 4-KB block size on Amazon Linux.)
AWS instance types: Storage-optimized (I3) instances pair with the instance store option, and compute-optimized (C5) instances pair with Amazon EBS. Being able to choose between different instance types is an advantage in terms of CPU, memory, and so on, for horizontal and vertical scaling.
Backup/restore: The basic building blocks are available from AWS, and Amazon EBS offers a distinct advantage here; it is a small engineering effort to establish a backup/restore strategy:
a) In case of an instance failure, the EBS volumes from the failing instance are attached to a new instance.
b) In case of an EBS volume failure, the data is restored by creating a new EBS volume from the last snapshot.
EBS volumes offer higher resiliency, and IOPS can be provisioned based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. They can support up to 32K IOPS per volume and up to 80K IOPS per instance in a RAID configuration, and they have an annualized failure rate (AFR) of 0.1–0.2%, which makes EBS volumes 20 times more reliable than typical commodity disk drives.
The primary advantage of using Amazon EBS in a Cassandra deployment is that it reduces data-transfer traffic significantly when a node fails or must be replaced. The replacement node joins the cluster much faster. However, Amazon EBS could be more expensive, depending on your data storage needs.
Cassandra has built-in fault tolerance by replicating data to partitions across a configurable number of nodes. It can not only withstand node failures but if a node fails, it can also recover by copying data from other replicas into a new node. Depending on your application, this could mean copying tens of gigabytes of data. This adds additional delay to the recovery process, increases network traffic, and could possibly impact the performance of the Cassandra cluster during recovery.
Data stored on Amazon EBS is persisted in case of an instance failure or termination. The node’s data stored on an EBS volume remains intact and the EBS volume can be mounted to a new EC2 instance. Most of the replicated data for the replacement node is already available in the EBS volume and won’t need to be copied over the network from another node. Only the changes made after the original node failed need to be transferred across the network. That makes this process much faster.
EBS volumes can be snapshotted periodically. So, if a volume fails, a new volume can be created from the last known good snapshot and attached to a new instance. This is faster than creating a new volume and copying all the data to it.
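Both recovery paths can be scripted. The following boto3 sketch shows the two cases described above; all resource IDs, the Availability Zone, and the device name are placeholders.

```python
# Hedged sketch of the two EBS recovery paths; IDs are placeholders.
import boto3

ec2 = boto3.client("ec2")

# a) Instance failure: move the surviving data volume to a replacement node.
ec2.detach_volume(VolumeId="vol-0123456789abcdef0", Force=True)
ec2.get_waiter("volume_available").wait(VolumeIds=["vol-0123456789abcdef0"])
ec2.attach_volume(
    VolumeId="vol-0123456789abcdef0",
    InstanceId="i-0fedcba9876543210",  # the replacement instance
    Device="/dev/sdf",
)

# b) Volume failure: rebuild the volume from the last known good snapshot.
new_vol = ec2.create_volume(
    SnapshotId="snap-0123456789abcdef0",
    AvailabilityZone="us-east-1a",
    VolumeType="gp2",
)
ec2.get_waiter("volume_available").wait(VolumeIds=[new_vol["VolumeId"]])
ec2.attach_volume(
    VolumeId=new_vol["VolumeId"],
    InstanceId="i-0fedcba9876543210",
    Device="/dev/sdf",
)
```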
Most Cassandra deployments use a replication factor of three. However, Amazon EBS does its own replication under the covers for fault tolerance. In practice, EBS volumes are about 20 times more reliable than typical disk drives, so it is possible to go with a replication factor of two. This not only saves cost, but also enables deployments in a Region that has only two Availability Zones.
EBS volumes are recommended for read-heavy, small clusters (fewer nodes) that require storage of a large amount of data. Keep in mind that Amazon EBS provisioned IOPS can get expensive; general purpose EBS volumes work best when sized for the required performance.
If your cluster is expected to receive high read/write traffic, select an instance type that offers 10 Gb/s networking performance. As an example, i3.8xlarge and c5.9xlarge both offer 10 Gb/s networking performance. A smaller instance type in the same family leads to relatively lower networking throughput.
Cassandra generates a universally unique identifier (UUID) for each node, based on the IP address of the instance. This UUID is used for distributing vnodes on the ring.
In an AWS deployment, an IP address is assigned automatically when an EC2 instance is created, so replacing an instance gives the node a new IP address. With the new IP address, the data distribution changes and the whole ring has to be rebalanced. This is not desirable.
To preserve the assigned IP address, use a secondary elastic network interface (ENI) with a fixed IP address. Before swapping an EC2 instance with a new one, detach the secondary network interface from the old instance and attach it to the new one. This way, the UUID remains the same and there is no change in the way that data is distributed in the cluster.
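The ENI swap itself is a pair of API calls. Here is a hedged boto3 sketch; the interface and instance IDs are placeholders.

```python
# Hedged sketch: move a secondary ENI (and its fixed private IP) from a
# failed node to its replacement, so the node's identity is preserved.
import boto3

ec2 = boto3.client("ec2")

eni = ec2.describe_network_interfaces(
    NetworkInterfaceIds=["eni-0123456789abcdef0"]
)["NetworkInterfaces"][0]

# Detach from the old instance and wait until the ENI is free.
ec2.detach_network_interface(AttachmentId=eni["Attachment"]["AttachmentId"])
ec2.get_waiter("network_interface_available").wait(
    NetworkInterfaceIds=[eni["NetworkInterfaceId"]]
)

# Attach to the replacement instance as the secondary interface.
ec2.attach_network_interface(
    NetworkInterfaceId=eni["NetworkInterfaceId"],
    InstanceId="i-0fedcba9876543210",
    DeviceIndex=1,
)
```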
If you are deploying in more than one Region, you can connect the VPCs in the two Regions using cross-Region VPC peering.
High availability and resiliency
Cassandra is designed to be fault-tolerant and highly available during multiple node failures. In the patterns described earlier in this post, you deploy Cassandra to three Availability Zones with a replication factor of three. Even though it limits the AWS Region choices to the Regions with three or more Availability Zones, it offers protection for the cases of one-zone failure and network partitioning within a single Region. The multi-Region deployments described earlier in this post protect against cases in which many of the resources in a Region are experiencing intermittent failure.
Resiliency is ensured through infrastructure automation. The deployment patterns all require a quick replacement of the failing nodes. In the case of a regionwide failure, when you deploy with the multi-Region option, traffic can be directed to the other active Region while the infrastructure is recovering in the failing Region. In the case of unforeseen data corruption, the standby cluster can be restored with point-in-time backups stored in Amazon S3.
In this section, we look at ways to ensure that your Cassandra cluster is healthy: scaling, upgrades, and backup and restore.
Cassandra is horizontally scaled by adding more instances to the ring. We recommend doubling the number of nodes in a cluster to scale up in one scale operation. This leaves the data homogeneously distributed across Availability Zones. Similarly, when scaling down, it’s best to halve the number of instances to keep the data homogeneously distributed.
Cassandra is vertically scaled by increasing the compute power of each node. Larger instance types have proportionally bigger memory. Use deployment automation to swap instances for bigger instances without downtime or data loss.
All three types of upgrades (Cassandra, operating system patching, and instance type changes) follow the same rolling upgrade pattern.
In this process, you start with a new EC2 instance and install software and patches on it. Thereafter, remove one node from the ring. For more information, see Cassandra cluster Rolling upgrade. Then, you detach the secondary network interface from one of the EC2 instances in the ring and attach it to the new EC2 instance. Restart the Cassandra service and wait for it to sync. Repeat this process for all nodes in the cluster.
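One iteration of that loop could be scripted as follows. This sketch assumes the nodes run the Systems Manager agent; the instance IDs and service commands are illustrative, and the ENI swap is the one sketched earlier.

```python
# Hedged sketch of one rolling-upgrade step via SSM Run Command.
import boto3

ssm = boto3.client("ssm")

# Drain the node that is leaving the ring, then stop Cassandra on it.
ssm.send_command(
    InstanceIds=["i-0123456789abcdef0"],  # the old node
    DocumentName="AWS-RunShellScript",
    Parameters={"commands": ["nodetool drain", "sudo service cassandra stop"]},
)

# ...swap the secondary ENI to the new, patched instance (see the earlier
# sketch), then start Cassandra there and wait for it to sync.
ssm.send_command(
    InstanceIds=["i-0fedcba9876543210"],  # the new node
    DocumentName="AWS-RunShellScript",
    Parameters={"commands": ["sudo service cassandra start"]},
)
```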
Backup and restore
Your backup and restore strategy depends on the type of storage used in the deployment. Cassandra supports snapshots and incremental backups. When using instance store, a file-based backup tool works best. Customers use rsync or other third-party products to copy data backups from the instance to long-term storage. For more information, see Backing up and restoring data in the DataStax documentation. This process has to be repeated for all instances in the cluster for a complete backup. To restore, these backup files are copied back to new instances. We recommend using Amazon S3 to durably store backup files for the long term.
For Amazon EBS based deployments, you can enable automated snapshots of EBS volumes to back up volumes. New EBS volumes can be easily created from these snapshots for restoration.
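As a sketch, a scheduled job could snapshot every data volume in the cluster with boto3; the tag filter below is a hypothetical convention for marking Cassandra data volumes.

```python
# Hedged sketch: snapshot all tagged Cassandra data volumes.
import boto3

ec2 = boto3.client("ec2")

volumes = ec2.describe_volumes(
    Filters=[{"Name": "tag:app", "Values": ["cassandra"]}]  # hypothetical tag
)["Volumes"]

for vol in volumes:
    ec2.create_snapshot(
        VolumeId=vol["VolumeId"],
        Description="cassandra scheduled backup",
    )
```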
We recommend that you think about security in all aspects of deployment. The first step is to ensure that the data is encrypted at rest and in transit. The second step is to restrict access to unauthorized users. For more information about security, see the Cassandra documentation.
Encryption at rest
Encryption at rest can be achieved by using EBS volumes with encryption enabled. Amazon EBS uses AWS KMS for encryption. For more information, see Amazon EBS Encryption.
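For example, creating an encrypted data volume with boto3 might look like the following sketch; the key alias is a placeholder, and omitting KmsKeyId falls back to the default aws/ebs key.

```python
# Hedged sketch: a KMS-encrypted EBS data volume.
import boto3

ec2 = boto3.client("ec2")

ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=500,                           # GiB
    VolumeType="gp2",
    Encrypted=True,
    KmsKeyId="alias/my-cassandra-key",  # placeholder key alias
)
```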
Instance store–based deployments require using an encrypted file system or an AWS partner solution. If you are using DataStax Enterprise, it supports transparent data encryption.
Encryption in transit
Cassandra uses Transport Layer Security (TLS) for client and internode communications.
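On the client side, enabling TLS with the DataStax Python driver can look like the following sketch; the contact point and CA path are placeholders, and older driver versions take an ssl_options dictionary instead of ssl_context.

```python
# Hedged sketch: a TLS-enabled client connection.
import ssl
from cassandra.cluster import Cluster

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.load_verify_locations("/etc/cassandra/ca.pem")  # cluster CA certificate
ctx.check_hostname = False  # only if your node certificates lack SANs

cluster = Cluster(["10.0.0.10"], ssl_context=ctx)
session = cluster.connect()
```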
The security mechanism is pluggable, which means that you can easily swap out one authentication method for another. You can also provide your own method of authenticating to Cassandra, such as a Kerberos ticket, or if you want to store passwords in a different location, such as an LDAP directory.
The authorizer that’s plugged in by default is org.apache.cassandra.auth.AllowAllAuthorizer, which performs no permission checking. Cassandra also provides a role-based access control (RBAC) capability, which allows you to create roles and assign permissions to these roles.
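With authentication and authorization enabled (for example, PasswordAuthenticator and CassandraAuthorizer in cassandra.yaml), roles and grants are plain CQL. The credentials, role, and keyspace names below are illustrative.

```python
# Hedged sketch: creating a read-only role with Cassandra RBAC.
from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

auth = PlainTextAuthProvider(username="admin", password="admin-password")
session = Cluster(["10.0.0.10"], auth_provider=auth).connect()

session.execute(
    "CREATE ROLE IF NOT EXISTS analyst WITH PASSWORD = 'secret' AND LOGIN = true"
)
session.execute("GRANT SELECT ON KEYSPACE app_data TO analyst")
```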
In this post, we discussed several patterns for running Cassandra in the AWS Cloud. This post describes how you can manage Cassandra databases running on Amazon EC2. AWS also provides managed offerings for a number of databases. To learn more, see Purpose-built databases for all your application needs.
If you have questions or suggestions, please comment below.
Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable big data, machine learning, artificial intelligence, and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies, such as advanced edge computing and machine learning at the edge. In his spare time, he enjoys spending time with his family.
Provanshu Dey is a Senior IoT Consultant with AWS Professional Services. He works on highly scalable and reliable IoT, data, and machine learning solutions with our customers. In his spare time, he enjoys spending time with his family and tinkering with electronics and gadgets.
The XFS filesystem has been in the kernel for fifteen years and was used in production on IRIX systems for five years before that. But it might just be time to teach that “old dog” of a filesystem some new tricks, Dave Chinner said, at the beginning of his linux.conf.au 2018 presentation. There are a number of features that XFS lacks when compared to more modern filesystems, such as snapshots and subvolumes; but he has been thinking—and writing code—on a path to get them into XFS.
The Amazon RDS team launched nearly 80 features in 2017. Some of them were covered in this blog, others on the AWS Database Blog, and the rest in What’s New or Forum posts. To wrap up my week, I thought it would be worthwhile to give you an organized recap. So here we go!
“I want my company to innovate, but I am not convinced we can execute successfully.” Far too many times I have heard this fear expressed by senior executives that I have met at different points in my career. In fact, a recent study published by PricewaterhouseCoopers found that while 93% of executives depend on innovation to drive growth, more than half are challenged to take innovative ideas to market quickly in a scalable way.
Many customers are struggling with how to drive enterprise innovation, so I was thrilled to share the stage at AWS re:Invent this past week with several senior executives who have successfully broken this mold to drive amazing enterprise innovation. In particular, I want to thank Parag Karnik from Johnson & Johnson, Bill Rothe from Hess Corporation, Dave Williams from Just Eat, and Olga Lagunova from Pitney Bowes for sharing their stories of innovation, creativity, and solid execution.
Among the many new announcements from AWS this past week, I am particularly excited about the following newly-launched AWS products and programs that I announced at re:Invent to drive new innovations by our enterprise customers:
AI: New Deep Learning Amazon Machine Image (AMI) on EC2 Windows As I shared at re:Invent, customers such as Infor are already successfully leveraging artificial intelligence tools on AWS to deliver tailored, industry-specific applications to their customers. We want to help more of our Windows developers get started quickly and easily with AI, leveraging machine learning based tools with popular deep learning frameworks, such as Apache MXNet, TensorFlow, and Caffe2. To enable this, I announced at re:Invent that AWS now offers a new Deep Learning AMI for Microsoft Windows. The AMI is tailored to facilitate large-scale training of deep-learning models, and enables quick and easy setup of Windows Server-based compute resources for machine learning applications.
IoT: Visualize and Analyze SQL and IoT Data Forecasts show as many as 31 billion IoT devices by 2020. AWS wants every Windows customer to take advantage of the data available from their devices. Pitney Bowes, for example, now has more than 130,000 IoT devices streaming data to AWS. Using machine learning, Pitney Bowes enriches and analyzes data to enhance their customer experience, improve efficiencies, and create new data products. AWS IoT Analytics can now be leveraged to run analytics on IoT data and get insights that help you make better and more accurate decisions for IoT applications and machine learning use cases. AWS IoT Analytics can automatically enrich IoT device data with contextual metadata such as your SQL Server transactional data.
New Capabilities for .NET Developers on AWS In addition to all of the enhancements we’ve introduced to deliver a first class experience to Windows developers on AWS, we announced that we are including .NET Core 2.0 support in AWS Lambda and AWS CodeBuild, which will be available for broader use early next year. .NET Core 2.0 packs a number of new features such as Razor pages, better compatibility with .NET framework, more than double the number of APIs compared to the previous versions, and much more. With this announcement, you will be able to take advantage of all latest .NET Core features on Lambda and CodeBuild for building modern serverless and DevOps centric solutions.
License optimization for BYOL AWS provides you a wide variety of instance types and families that best meet your workload needs. If you are using software licensed by the number of vCPUs, you want the ability to further tweak vCPU count to optimize license spend. I announced the upcoming ability to optimize CPUs for EC2, giving you greater control over your EC2 instances on two fronts:
You can specify a custom number of vCPUs when launching new instances to save on vCPU-based licensing costs (for example, SQL Server licensing spend).
You can disable Hyper-Threading Technology for workloads that perform well with single-threaded CPUs, like some high-performance computing (HPC) applications.
Using these capabilities, customers who bring their own license (BYOL) will be able to optimize their license usage and save on the license costs.
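Once the capability is available, launching a license-optimized instance could look like this boto3 sketch; the AMI, instance type, and counts are placeholders.

```python
# Hedged sketch: trim vCPUs and disable Hyper-Threading at launch.
import boto3

ec2 = boto3.client("ec2")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="r4.4xlarge",
    MinCount=1,
    MaxCount=1,
    CpuOptions={
        "CoreCount": 4,       # fewer cores to reduce per-vCPU license spend
        "ThreadsPerCore": 1,  # single-threaded cores for HPC-style workloads
    },
)
```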
Server Migration Service for Hyper-V Virtual Machines As Bill Rothe from Hess Corporation shared at re:Invent, Hess has successfully migrated a wide range of workloads to the cloud, including SQL Server, SharePoint, SAP HANA, and many others. AWS Server Migration Service (SMS) now supports Hyper-V virtual machine (VM) migration, in order to further support enterprise migrations like these. AWS Server Migration Service will enable you to more easily coordinate large-scale server migrations from on-premises Hyper-V environments to AWS. AWS Server Migration Service allows you to automate, schedule, and track incremental replications of live server volumes. The replicated volumes are encrypted in transit and saved as a new Amazon Machine Image (AMI), which can be launched as an EC2 instance on AWS.
Microsoft Premier Support for AWS End-Customers I was pleased to announce that Microsoft and AWS have developed new areas of support integration to help ensure a great customer experience. Microsoft Premier Support is on board to help AWS assist end customers. AWS Support engineers can escalate directly to Microsoft Support on behalf of AWS customers running Microsoft workloads.
Best Practice Tools: HIPAA Compliance and Digital Innovation Workshop In November, we updated our HIPAA-focused white paper, outlining how you can use AWS to create HIPAA-compliant applications. In the first quarter of next year, we will publish a HIPAA Implementation Guide that expands on our HIPAA Quick Start to enable you to follow strict security, compliance, and risk management controls for common healthcare use cases. I was also pleased to award a Digital Innovation Workshop to one of our customers in my re:Invent session, and look forward to seeing more customers take advantage of this workshop.
AWS: The Continuous Innovation Cloud A common thread we see across customers is that continuous innovation from AWS enables their ongoing reinvention. Continuous innovation means that you are always getting a newer, better offering every single day. Sometimes it is in the form of brand new services and capabilities, and sometimes it is happening invisibly, under the covers where your environment just keeps getting better. I invite you to learn more about how you can accelerate your innovation journey with recently launched AWS services and AWS best practices. If you are migrating Windows workloads, speak with your AWS sales representative or an AWS Microsoft Workloads Competency Partner to learn how you can leverage our re:Think for Windows program for credits to start your migration.
Glenn Gore here, Chief Architect for AWS. I was in Las Vegas last week — with 43K others — for re:Invent 2017. I checked in to the Architecture blog here and here with my take on what was interesting about some of the bigger announcements from a cloud-architecture perspective.
In the excitement of so many new services being launched, we sometimes overlook feature updates that, while perhaps not as exciting as Amazon DeepLens, have significant impact on how you architect and develop solutions on AWS.
Amazon DynamoDB is used by more than 100,000 customers around the world, handling over a trillion requests every day. From the start, DynamoDB has offered high availability by natively spanning multiple Availability Zones within an AWS Region. As more customers started building and deploying truly-global applications, there was a need to replicate a DynamoDB table to multiple AWS Regions, allowing for read/write operations to occur in any region where the table was replicated. This update is important for providing a globally-consistent view of information — as users may transition from one region to another — or for providing additional levels of availability, allowing for failover between AWS Regions without loss of information.
There are some interesting concurrency-design aspects you need to be aware of and ensure you can handle correctly. For example, we support the “last writer wins” reconciliation where eventual consistency is being used and an application updates the same item in different AWS Regions at the same time. If you require strongly-consistent reads/writes, then you must perform all of your reads/writes in the same AWS Region. The details behind this can be found in the DynamoDB documentation. Providing a globally-distributed, replicated DynamoDB table simplifies many different use cases and allows the replication logic, which may have been pushed up into the application layer, to be simplified back down into the data layer.
The other big update for DynamoDB is that you can now back up your DynamoDB table on demand with no impact to performance. One of the features I really like is that when you trigger a backup, it is available instantly, regardless of the size of the table. Behind the scenes, we use snapshots and change logs to ensure a consistent backup. While backup is instant, restoring the table could take some time depending on its size, ranging from minutes to hours for very large tables.
This feature is super important for those of you who work in regulated industries that often have strict requirements around data retention and backups of data, which sometimes limited the use of DynamoDB or required complex workarounds to implement some sort of backup feature in the past. This often incurred significant, additional costs due to increased read transactions on their DynamoDB tables.
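Programmatically, an on-demand backup and a restore are each a single call with boto3; the table and backup names below are placeholders. Note that a restore always creates a new table.

```python
# Hedged sketch: DynamoDB on-demand backup and restore.
import boto3

ddb = boto3.client("dynamodb")

backup = ddb.create_backup(TableName="orders", BackupName="orders-2017-12-01")

ddb.restore_table_from_backup(
    TargetTableName="orders-restored",
    BackupArn=backup["BackupDetails"]["BackupArn"],
)
```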
Amazon Simple Storage Service (Amazon S3) was our first-released AWS service over 11 years ago, and it proved the simplicity and scalability of true API-driven architectures in the cloud. Today, Amazon S3 stores trillions of objects, with transactional requests per second reaching into the millions! Dealing with data as objects opened up an incredibly diverse array of use cases ranging from libraries of static images, game binary downloads, and application log data, to massive data lakes used for big data analytics and business intelligence. With Amazon S3, when you accessed your data in an object, you effectively had to write/read the object as a whole or use the range feature to retrieve a part of the object — if possible — in your individual use case.
Now, with Amazon S3 Select, you can use an SQL-like query language to work with delimited text and JSON files, as well as with GZIP-compressed files. We don’t support encryption during the preview of Amazon S3 Select.
Amazon S3 Select provides two major benefits: faster performance and lower running costs.
Serverless Lambda functions, where every millisecond matters when you are being charged, will benefit greatly from Amazon S3 Select as data retrieval and processing of your Lambda function will experience significant speedups and cost reductions. For example, we have seen 2x speed improvement and 80% cost reduction with the Serverless MapReduce code.
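As a sketch, an Amazon S3 Select call over a GZIP-compressed CSV file looks like the following with boto3; the bucket, key, and query are placeholders, and the API surface may shift slightly during the preview.

```python
# Hedged sketch: pull only matching rows out of a compressed CSV object.
import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="my-logs",
    Key="2017/12/app.csv.gz",
    ExpressionType="SQL",
    Expression="SELECT s._1, s._3 FROM S3Object s WHERE s._3 > '400'",
    InputSerialization={"CSV": {}, "CompressionType": "GZIP"},
    OutputSerialization={"CSV": {}},
)

# The response payload is an event stream; Records events carry the bytes.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())
```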
Other AWS services such as Amazon Athena, Amazon Redshift, and Amazon EMR will support Amazon S3 Select as well as partner offerings including Cloudera and Hortonworks. If you are using Amazon Glacier for longer-term data archival, you will be able to use Amazon Glacier Select to retrieve a subset of your content from within Amazon Glacier.
As the volume of data that can be stored within Amazon S3 and Amazon Glacier continues to scale on a daily basis, we will continue to innovate and develop improved and optimized services that will allow you to work with these magnificently-large data sets while reducing your costs (retrieval and processing). I believe this will also allow you to simplify the transformation and storage of incoming data into Amazon S3 in basic, semi-structured formats as a single copy vs. some of the duplication and reformatting of data sometimes required to do upfront optimizations for downstream processing. Amazon S3 Select largely removes the need for this upfront optimization and instead allows you to store data once and process it based on your individual Amazon S3 Select query per application or transaction need.
Thanks for reading!
Glenn contemplating why CSV format is still relevant in 2017 (Italy).
I was recently reading an article on ReadWrite.com titled “IoT devices go forth and multiply, to increase 200% by 2021“, and while the article noted the benefit for consumers and the industry of this growth, two things in the article stuck with me. The first was the specific statement that read “researchers warned that the proliferation of IoT technology will create a new bevvy of challenges. Particularly troublesome will be IoT deployments at scale for both end-users and providers.” Not only was that sentence a mouthful, but it really addressed some of the challenges that come with building solutions and deploying in this exciting new technology area. The second sentiment in the article that stayed with me was that security issues could grow.
So the article got me thinking: how can we create these cool IoT solutions using low-cost, efficient microcontrollers with a secure operating system that can easily connect to the cloud? Luckily, the answer came to me by way of an exciting new open-source based offering from AWS that I am happy to announce to you all today. Let’s all welcome Amazon FreeRTOS to the technology stage.
Amazon FreeRTOS is an IoT microcontroller operating system that simplifies development, security, deployment, and maintenance of microcontroller-based edge devices. Amazon FreeRTOS extends the FreeRTOS kernel, a popular real-time operating system, with libraries that enable local and cloud connectivity, security, and (coming soon) over-the-air updates.
So what are some of the great benefits of this new exciting offering, you ask? They are as follows:
Easily create solutions for low-power connected devices: provides a common operating system (OS) and libraries that make the development of common IoT capabilities easy for devices; for example, over-the-air (OTA) updates (coming soon) and device configuration.
Secure data and device connections: devices run only trusted software, using the Code Signing service. Amazon FreeRTOS provides a secure connection to AWS using TLS, as well as the ability to securely store keys and sensitive data on the device.
Extensive ecosystem: contains an extensive hardware and technology ecosystem that allows you to choose from a variety of qualified chipsets, including those from Texas Instruments, Microchip, NXP Semiconductors, and STMicroelectronics.
Cloud or local connections: devices can connect directly to the AWS Cloud or via AWS Greengrass.
What’s cool is that it is easy to get started.
The Amazon FreeRTOS console allows you to select and download the software that you need for your solution.
There is a Qualification Program that helps assure you that the microcontroller you choose will run Amazon FreeRTOS consistently across several hardware options.
Finally, the Amazon FreeRTOS kernel is open source and freely available on GitHub for download.
But I couldn’t leave you without at least showing you a few snapshots of the Amazon FreeRTOS Console.
Within the Amazon FreeRTOS Console, I can select a predefined software configuration that I would like to use.
If I want to have a more customized software configuration, Amazon FreeRTOS allows you to customize a solution that is targeted for your use by adding or removing libraries.
Thanks for checking out the new Amazon FreeRTOS offering. To learn more go to the Amazon FreeRTOS product page or review the information provided about this exciting IoT device targeted operating system in the AWS documentation.
Can’t wait to see what great new IoT systems will be enabled and created with it! Happy coding.
You’ve probably heard about GDPR. The new European data protection regulation that applies practically to everyone. Especially if you are working in a big company, it’s most likely that there’s already a process for getting your systems in compliance with the regulation.
The regulation is basically a law that must be followed in all European countries (but also applies to non-EU companies that have users in the EU). In this particular case, it applies to companies that are not registered in Europe but have European customers. So that’s most companies. I will not go into yet another “12 facts about GDPR” or “7 myths about GDPR” post/whitepaper, as they are often aimed at managers or legal people. Instead, I’ll focus on what GDPR means for developers.
Why am I qualified to do that? A few reasons: I was an advisor to the deputy prime minister of an EU country, and because of that I’ve both been exposed to and myself written some legislation. I’m familiar with the “legalese” and how the regulatory framework operates in general. I’m also a privacy advocate and I’ve been writing about GDPR-related stuff in the past, i.e. “before it was cool” (protecting sensitive data, the right to be forgotten). And finally, I’m currently working on a project that (among other things) aims to help with covering some GDPR aspects.
I’ll try to be a bit more comprehensive this time and cover as many aspects of the regulation that concern developers as I can. And while developers will mostly be concerned about how the systems they are working on have to change, it’s not unlikely that a less informed manager storms in in late spring, realizing GDPR is going to be in force tomorrow, asking “what should we do to get our system/website compliant”.
The rights of the user/client (referred to as the “data subject” in the regulation) that I think are relevant for developers are: the right to erasure (the right to be forgotten/deleted from the system), the right to restriction of processing (you still keep the data, but mark it as “restricted” and don’t touch it without further consent by the user), the right to data portability (the user should be able to get a machine-readable export of all their data), the right to rectification (the ability to get personal data fixed), the right to be informed (getting human-readable information, rather than long terms and conditions), and the right of access (the user should be able to see all the data you have about them).
Additionally, the relevant basic principles are: data minimization (one should not collect more data than necessary), integrity and confidentiality (all security measures to protect data that you can think of + measures to guarantee that the data has not been inappropriately modified).
Even further, the regulation requires certain processes to be in place within an organization (of more than 250 employees or if a significant amount of data is processed), and those include keeping a record of all types of processing activities carried out, including transfers to processors (3rd parties), which includes cloud service providers. None of the other requirements of the regulation have an exception depending on the organization size, so “I’m small, GDPR does not concern me” is a myth.
It is important to know what “personal data” is. Basically, it’s every piece of data that can be used to uniquely identify a person or data that is about an already identified person. It’s data that the user has explicitly provided, but also data that you have collected about them from either 3rd parties or based on their activities on the site (what they’ve been looking at, what they’ve purchased, etc.)
Having said that, I’ll list a number of features that will have to be implemented and some hints on how to do that, followed by some do’s and don’ts.
“Forget me” – you should have a method that takes a userId and deletes all personal data about that user (in case it has been collected on the basis of consent, and not due to contract enforcement or legal obligation). It is actually useful for integration tests to have that feature (to clean up after the test), but it may be hard to implement depending on the data model. In a regular data model, deleting a record may be easy, but some foreign keys may be violated. That means you have two options – either make sure you allow nullable foreign keys (for example an order usually has a reference to the user that made it, but when the user requests their data to be deleted, you can set the userId to null), or make sure you delete all related data (e.g. via cascades). This may not be desirable, e.g. if the order is used to track available quantities or for accounting purposes. It’s a bit trickier for event-sourcing data models, or in extreme cases, ones that include some sort of blockchain/hash chain/tamper-evident data structure. With event sourcing you should be able to remove a past event and re-generate intermediate snapshots. For blockchain-like structures – be careful what you put in there and avoid putting personal data of users. There is an option to use a chameleon hash function, but that’s suboptimal. Overall, you must constantly think of how you can delete the personal data. And “our data model doesn’t allow it” isn’t an excuse.
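A minimal sketch of the nullable-foreign-key approach, using SQLite for brevity (the table and column names are illustrative):

```python
# Hedged sketch: "forget me" that keeps orders but severs the link.
import sqlite3

def forget_user(conn: sqlite3.Connection, user_id: int) -> None:
    cur = conn.cursor()
    # Orders stay for accounting; the reference to the person is removed.
    cur.execute("UPDATE orders SET user_id = NULL WHERE user_id = ?", (user_id,))
    # Data collected purely on the basis of consent is deleted outright.
    cur.execute("DELETE FROM user_preferences WHERE user_id = ?", (user_id,))
    cur.execute("DELETE FROM users WHERE id = ?", (user_id,))
    conn.commit()
```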
Notify 3rd parties for erasure – deleting things from your system may be one thing, but you are also obligated to inform all third parties that you have pushed that data to. So if you have sent personal data to, say, Salesforce, Hubspot, twitter, or any cloud service provider, you should call an API of theirs that allows for the deletion of personal data. If you are such a provider, obviously, your “forget me” endpoint should be exposed. Calling the 3rd party APIs to remove data is not the full story, though. You also have to make sure the information does not appear in search results. Now, that’s tricky, as Google doesn’t have an API for removal, only a manual process. Fortunately, it’s only about public profile pages that are crawlable by Google (and other search engines, okay…), but you still have to take measures. Ideally, you should make the personal data page return a 404 HTTP status, so that it can be removed.
Restrict processing – in your admin panel where there’s a list of users, there should be a button “restrict processing”. The user settings page should also have that button. When clicked (after reading the appropriate information), it should mark the profile as restricted. That means it should no longer be visible to the backoffice staff, or publicly. You can implement that with a simple “restricted” flag in the users table and a few if-clauses here and there.
Export data – there should be another button – “export data”. When clicked, the user should receive all the data that you hold about them. What exactly that data is depends on the particular use case. Usually it’s at least the data that you delete with the “forget me” functionality, but it may include additional data (e.g. the orders the user has made may not be deleted, but should be included in the dump). The structure of the dump is not strictly defined, but my recommendation would be to reuse schema.org definitions as much as possible, for either JSON or XML. If the data is simple enough, a CSV/XLS export would also be fine. Sometimes data export can take a long time, so the button can trigger a background process, which would then notify the user via email when their data is ready (twitter, for example, does that already – you can request all your tweets and you get them after a while).
Allow users to edit their profile – this seems an obvious rule, but it isn’t always followed. Users must be able to fix all data about them, including data that you have collected from other sources (e.g. using a “login with facebook” you may have fetched their name and address). Rule of thumb – all the fields in your “users” table should be editable via the UI. Technically, rectification can be done via a manual support process, but that’s normally more expensive for a business than just having the form to do it. There is one other scenario, however, when you’ve obtained the data from other sources (i.e. the user hasn’t provided their details to you directly). In that case there should still be a page where they can identify somehow (via email and/or SMS confirmation) and get access to the data about them.
Consent checkboxes – this is in my opinion the biggest change that the regulation brings. “I accept the terms and conditions” would no longer be sufficient to claim that the user has given their consent for processing their data. So, for each particular processing activity there should be a separate checkbox on the registration (or user profile) screen. You should keep these consent checkboxes in separate columns in the database, and let the users withdraw their consent (by unchecking these checkboxes from their profile page – see the previous point). Ideally, these checkboxes should come directly from the register of processing activities (if you keep one). Note that the checkboxes should not be preselected, as this does not count as “consent”.
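One way to persist per-activity consent as separate columns (SQLite syntax; the activity names are examples and should come from your register of processing activities):

```python
# Hedged sketch: consent flags as explicit columns, unchecked by default.
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS user_consent (
        user_id            INTEGER PRIMARY KEY,
        consent_newsletter INTEGER NOT NULL DEFAULT 0,
        consent_analytics  INTEGER NOT NULL DEFAULT 0,
        consent_profiling  INTEGER NOT NULL DEFAULT 0,
        updated_at         TEXT
    )
""")
```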
Re-request consent – if the consent users have given was not clear (e.g. if they simply agreed to terms & conditions), you’d have to re-obtain that consent. So prepare a functionality for mass-emailing your users to ask them to go to their profile page and check all the checkboxes for the personal data processing activities that you have.
“See all my data” – this is very similar to the “Export” button, except data should be displayed in the regular UI of the application rather than an XML/JSON format. For example, Google Maps shows you your location history – all the places that you’ve been to. It is a good implementation of the right to access. (Though Google is very far from perfect when privacy is concerned)
Age checks – you should ask for the user’s age, and if the user is a child (below 16), you should ask for parent permission. There’s no clear way how to do that, but my suggestion is to introduce a flow where the child should specify the email of a parent, who can then confirm. Obviously, children will just cheat with their birthdate, or provide a fake parent email, but you will most likely have done your job according to the regulation (this is one of the “wishful thinking” aspects of the regulation).
Encrypt the data in transit. That means that communication between your application layer and your database (or your message queue, or whatever component you have) should be over TLS. The certificates could be self-signed (and possibly pinned), or you could have an internal CA. Different databases have different configurations; just google “X encrypted connections”. Some databases need gossiping among the nodes; that should also be configured to use encryption.
Encrypt the data at rest – this again depends on the database (some offer table-level encryption), but can also be done on machine-level. E.g. using LUKS. The private key can be stored in your infrastructure, or in some cloud service like AWS KMS.
Encrypt your backups – kind of obvious
Implement pseudonymisation – the most obvious use-case is when you want to use production data for the test/staging servers. You should change the personal data to some “pseudonym”, so that the people cannot be identified. When you push data for machine learning purposes (to third parties or not), you can also do that. Technically, that could mean that your User object can have a “pseudonymize” method which applies hash+salt/bcrypt/PBKDF2 for some of the data that can be used to identify a person
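A sketch of such a pseudonymize step with PBKDF2 from the Python standard library; which fields to transform and the iteration count are choices you make per system.

```python
# Hedged sketch: pseudonymize an identifying field before export.
import hashlib
import os

def pseudonymize(value: str, salt: bytes) -> str:
    digest = hashlib.pbkdf2_hmac("sha256", value.encode(), salt, 100_000)
    return digest.hex()

salt = os.urandom(16)  # store the salt separately from the exported data
print(pseudonymize("jane.doe@example.com", salt))
```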
Protect data integrity – this is a very broad thing, and could simply mean “have authentication mechanisms for modifying data”. But you can do something more, even as simple as a checksum, or a more complicated solution (like the one I’m working on). It depends on the stakes, on the way data is accessed, on the particular system, etc. The checksum can be in the form of a hash of all the data in a given database record, which should be updated each time the record is updated through the application. It isn’t a strong guarantee, but it is at least something.
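For the simple checksum variant, hashing a canonical serialization of the record on every write is enough to detect accidental or naive modification:

```python
# Hedged sketch: a per-record integrity checksum.
import hashlib
import json

def record_checksum(record: dict) -> str:
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

row = {"id": 42, "name": "Jane Doe", "email": "jane.doe@example.com"}
checksum = record_checksum(row)  # persist alongside the record; verify on read
```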
Have your GDPR register of processing activities in something other than Excel – Article 30 says that you should keep a record of all the types of activities that you use personal data for. That sounds like bureaucracy, but it may be useful – you will be able to link certain aspects of your application with that register (e.g. the consent checkboxes, or your audit trail records). It wouldn’t take much time to implement a simple register, but the business requirements for that should come from whoever is responsible for the GDPR compliance. But you can advise them that having it in Excel won’t make it easy for you as a developer (imagine having to fetch the Excel file internally, so that you can parse it and implement a feature). Such a register could be a microservice/small application deployed separately in your infrastructure.
Log access to personal data – every read operation on a personal data record should be logged, so that you know who accessed what and for what purpose.
Register all API consumers – you shouldn’t allow anonymous API access to personal data. I’d say you should request the organization name and contact person for each API user upon registration, and add those to the data processing register. Note: some have treated article 30 as a requirement to keep an audit log. I don’t think it is saying that – instead it requires 250+ companies to keep a register of the types of processing activities (i.e. what you use the data for). There are other articles in the regulation that imply that keeping an audit log is a best practice (for protecting the integrity of the data as well as to make sure it hasn’t been processed without a valid reason)
Finally, some “don’ts”.
Don’t use data for purposes that the user hasn’t agreed with – that’s supposed to be the spirit of the regulation. If you want to expose a new API to a new type of clients, or you want to use the data for some machine learning, or you decide to add ads to your site based on users’ behaviour, or sell your database to a 3rd party – think twice. I would imagine your register of processing activities could have a button to send notification emails to users to ask them for permission when a new processing activity is added (or if you use a 3rd party register, it should probably give you an API). So upon adding a new processing activity (and adding that to your register), mass email all users from whom you’d like consent.
Don’t log personal data – getting rid of the personal data from log files (especially if they are shipped to a 3rd party service) can be tedious or even impossible. So log just identifiers if needed. And make sure old log files are cleaned up, just in case.
Don’t put fields on the registration/profile form that you don’t need – it’s always tempting to just throw as many fields as the usability person/designer agrees on, but unless you absolutely need the data for delivering your service, you shouldn’t collect it. Names you should probably always collect, but unless you are delivering something, a home address or phone is unnecessary.
Don’t assume 3rd parties are compliant – you are responsible if there’s a data breach in one of the 3rd parties (e.g. “processors”) to which you send personal data. So before you send data via an API to another service, make sure they have at least a basic level of data protection. If they don’t, raise a flag with management.
Don’t assume having ISO XXX makes you compliant – information security standards and even personal data standards are a good start and they will probably cover 70% of what the regulation requires, but they are not sufficient – most of the things listed above are not covered in any of those standards.
Overall, the purpose of the regulation is to make you take conscious decisions when processing personal data. It imposes best practices in a legal way. If you follow the above advice and design your data model, storage, data flow, and API calls with data protection in mind, then you shouldn’t worry about the huge fines that the regulation prescribes – they are for extreme cases, like Equifax for example. Regulators (data protection authorities) will most likely have some checklists into which you’d have to somehow fit, but if you follow best practices, that shouldn’t be an issue.
I think all of the above features can be implemented in a few weeks by a small team. Be suspicious when a big vendor offers you a generic plug-and-play “GDPR compliance” solution. GDPR is not just about the technical aspects listed above – it does have organizational/process implications. But also be suspicious if a consultant claims GDPR is complicated. It’s not – it relies on a few basic principles that are in fact best practices anyway. Just don’t ignore them.
The newly announced openSUSE “Tumbleweed snapshots” feature is an attempt to make rolling distributions a little easier for those who don’t want to stay on the leading edge all the time. In essence, it keeps a snapshot of the state of the distribution at regular intervals and enables users to install applications from their particular snapshot. That allows the installation of new applications without the need to drag in everything else that may have changed since the system as a whole was updated. “Tumbleweed Snapshots provides the best of both worlds, the latest packages when you want them and the one package you need in the middle of working on a project.”
Tag your EC2 instances so that EBS Snapshot Scheduler backs up your instances when you want them backed up.
In addition to making sure your EC2 instances have all the available operating system patches applied on a regular schedule, you should take snapshots of the EBS storage volumes attached to your EC2 instances. Taking regular snapshots allows you to restore your data to a previous state quickly and cost effectively. With Amazon EBS snapshots, you pay only for the actual data you store, and snapshots save only the data that has changed since the previous snapshot, which minimizes your cost. You will use EBS Snapshot Scheduler to make regular snapshots of your EC2 instance. EBS Snapshot Scheduler takes advantage of other AWS services including CloudFormation, Amazon DynamoDB, and AWS Lambda to make backing up your EBS volumes simple.
Determine the schedule
As a best practice, you should back up your data frequently during the hours when your data changes the most. This reduces the amount of data you lose if you have to restore from a snapshot. For the purposes of this blog post, the data for my instances changes the most between the business hours of 9:00 A.M. and 5:00 P.M. Pacific Time. During these hours, I will take snapshots hourly to minimize data loss.
In addition to backing up frequently, another best practice is to establish a strategy for retention. This will vary based on how you need to use the snapshots. If you have compliance requirements to be able to restore for auditing, your needs may be different than if you are able to detect data corruption within three hours and simply need to restore to something that limits data loss to five hours. EBS Snapshot Scheduler enables you to specify the retention period for your snapshots. For this post, I only need to keep snapshots for recent business days. To account for weekends, I will set my retention period to three days, which is down from the default of 15 days when deploying EBS Snapshot Scheduler.
Deploy EBS Snapshot Scheduler
In Step 1 of Part 1 of this post, I showed how to configure an EC2 for Windows Server 2012 R2 instance with an EBS volume. You will use EBS Snapshot Scheduler to take eight snapshots each weekday of your EC2 instance’s EBS volumes:
Navigate to the EBS Snapshot Scheduler deployment page and choose Launch Solution. This takes you to the CloudFormation console in your account. The Specify an Amazon S3 template URL option is already selected and prefilled. Choose Next on the Select Template page.
On the Specify Details page, retain all default parameters except for AutoSnapshotDeletion. Set AutoSnapshotDeletion to Yes to ensure that old snapshots are periodically deleted. The default retention period is 15 days (you will specify a shorter value on your instance in the next subsection).
Choose Next twice to move to the Review step, and start deployment by choosing the I acknowledge that AWS CloudFormation might create IAM resources check box and then choosing Create.
Tag your EC2 instances
EBS Snapshot Scheduler takes a few minutes to deploy. While waiting for its deployment, you can start to tag your instance to define its schedule. EBS Snapshot Scheduler reads tag values and looks for four possible custom parameters in the following order:
<snapshot time> – Time in 24-hour format with no colon.
<retention days> – The number of days (a positive integer) to retain the snapshot before deletion, if set to automatically delete snapshots.
<time zone> – The time zone of the times specified in <snapshot time>.
<active day(s)> – all, weekdays, or mon, tue, wed, thu, fri, sat, and/or sun.
Because you want hourly backups on weekdays between 9:00 A.M. and 5:00 P.M. Pacific Time, you need to configure eight tags (one for each snapshot time), with values that follow the four-parameter format described above. You will add these eight tags to your EC2 instance, as in the following sketch.
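The exact tag keys come from the EBS Snapshot Scheduler documentation; the keys below are illustrative, with values following the <snapshot time>;<retention days>;<time zone>;<active day(s)> format described above.

```python
# Hedged sketch: eight hourly snapshot tags, 3-day retention, weekdays.
import boto3

ec2 = boto3.client("ec2")

tags = [
    {
        "Key": f"scheduler:ebs-snapshot:{hour:02d}00",  # hypothetical key
        "Value": f"{hour:02d}00;3;pst;weekdays",
    }
    for hour in range(9, 17)  # snapshot times 0900 through 1600
]

ec2.create_tags(Resources=["i-0123456789abcdef0"], Tags=tags)
```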
Next, you will add these tags to your instance in the console. If you want to tag multiple instances at once, you can use Tag Editor instead. To add the tags to your EC2 instance:
Navigate to your EC2 instance in the EC2 console and choose Tags in the navigation pane.
Choose Add/Edit Tags and then choose Create Tag to add all the tags specified in the preceding table.
Confirm you have added the tags by choosing Save. After adding these tags, navigate to your EC2 instance in the EC2 console. Your EC2 instance should look similar to the following screenshot.
After waiting a couple of hours, you can see snapshots beginning to populate on the Snapshots page of the EC2 console.
To check if EBS Snapshot Scheduler is active, you can check the CloudWatch rule that runs the Lambda function. If the clock icon shown in the following screenshot is green, the scheduler is active. If the clock icon is gray, the rule is disabled and does not run. You can enable or disable the rule by selecting it, choosing Actions, and choosing Enable or Disable. This also allows you to temporarily disable EBS Snapshot Scheduler.
You can also monitor when EBS Snapshot Scheduler has run by choosing the name of the CloudWatch rule as shown in the previous screenshot and choosing Show metrics for the rule.
In this section, I show you how to use Amazon Inspector to scan your EC2 instance for common vulnerabilities and exposures (CVEs) and how to set up Amazon SNS notifications. To do this, I will show you how to:
Install the Amazon Inspector agent by using EC2 Run Command.
Set up notifications using Amazon SNS to notify you of any findings.
Define an Amazon Inspector target and template to define what assessment to perform on your EC2 instance.
Schedule Amazon Inspector assessment runs to assess your EC2 instance on a regular interval.
Amazon Inspector can help you scan your EC2 instance using prebuilt rules packages, which are built and maintained by AWS. These prebuilt rules packages tell Amazon Inspector what to scan for on the EC2 instances you select. Amazon Inspector provides the following prebuilt packages for Microsoft Windows Server 2012 R2:
Common Vulnerabilities and Exposures
Center for Internet Security Benchmarks
Runtime Behavior Analysis
In this post, I’m focused on how to make sure you keep your EC2 instances patched, backed up, and inspected for common vulnerabilities and exposures (CVEs). As a result, I will focus on how to use the CVE rules package and use your instance tags to identify the instances on which to run the CVE rules. If your EC2 instance is fully patched using Systems Manager, as described earlier, you should not have any findings with the CVE rules package. Regardless, as a best practice I recommend that you use Amazon Inspector as an additional layer for identifying any unexpected failures. This involves using Amazon CloudWatch to set up weekly Amazon Inspector scans, and configuring Amazon Inspector to notify you of any findings through SNS topics. By acting on the notifications you receive, you can respond quickly to any CVEs on any of your EC2 instances to help ensure that malware using known CVEs does not affect your EC2 instances. In a previous blog post, Eric Fitzgerald showed how to remediate Amazon Inspector security findings automatically.
Install the Amazon Inspector agent
To install the Amazon Inspector agent, you will use EC2 Run Command, which allows you to run any command on any of your EC2 instances that have the Systems Manager agent with an attached IAM role that allows access to Systems Manager.
Choose Run Command under Systems Manager Services in the navigation pane of the EC2 console. Then choose Run a command.
To install the Amazon Inspector agent, you will use an AWS managed and provided command document that downloads and installs the agent for you on the selected EC2 instance. Choose AmazonInspector-ManageAWSAgent. To choose the target EC2 instance where this command will be run, use the tag you previously assigned to your EC2 instance, Patch Group, with a value of Windows Servers. For this example, set the concurrent installations to 1 and tell Systems Manager to stop after 5 errors.
Retain the default values for all other settings on the Run a command page and choose Run. Back on the Run Command page, you can see if the command that installed the Amazon Inspector agent executed successfully on all selected EC2 instances.
Set up notifications using Amazon SNS
Now that you have installed the Amazon Inspector agent, you will set up an SNS topic that will notify you of any findings after an Amazon Inspector run.
Choose Create topic, name your topic (only alphanumeric characters, hyphens, and underscores are allowed) and give it a display name to ensure you know what this topic does (I’ve named mine Inspector). Choose Create topic.
To allow Amazon Inspector to publish messages to your new topic, choose Other topic actions and choose Edit topic policy.
For Allow these users to publish messages to this topic and Allow these users to subscribe to this topic, choose Only these AWS users. Type the following ARN for the US East (N. Virginia) Region in which you are deploying the solution in this post: arn:aws:iam::316112463485:root. This is the ARN of Amazon Inspector itself. For the ARNs of Amazon Inspector in other AWS Regions, see Setting Up an SNS Topic for Amazon Inspector Notifications (Console). Amazon Resource Names (ARNs) uniquely identify AWS resources across all of AWS.
To receive notifications from Amazon Inspector, subscribe to your new topic by choosing Create subscription and adding your email address. After confirming your subscription by clicking the link in the email, the topic should display your email address as a subscriber. Later, you will configure the Amazon Inspector template to publish to this topic.
Define an Amazon Inspector target and template
Now that you have set up the notification topic by which Amazon Inspector can notify you of findings, you can create an Amazon Inspector target and template. A target defines which EC2 instances are in scope for Amazon Inspector. A template defines which packages to run, for how long, and on which target.
To create an Amazon Inspector target:
Navigate to the Amazon Inspector console and choose Get started. At the time of writing this blog post, Amazon Inspector is available in the US East (N. Virginia), US West (N. California), US West (Oregon), EU (Ireland), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Sydney), and Asia Pacific (Tokyo) Regions.
For Amazon Inspector to be able to collect the necessary data from your EC2 instance, you must create an IAM service role for Amazon Inspector. Amazon Inspector can create this role for you if you choose Choose or create role and confirm the role creation by choosing Allow.
Amazon Inspector also asks you to tag your EC2 instance and install the Amazon Inspector agent. You already performed these steps in Part 1 of this post, so you can proceed by choosing Next. To define the Amazon Inspector target, choose the previously used Patch Group tag with a Value of Windows Servers. This is the same tag that you used to define the targets for patching. Then choose Next.
Now, define your Amazon Inspector template, and choose a name and the package you want to run. For this post, use the Common Vulnerabilities and Exposures package and choose the default duration of 1 hour. As you can see, the package has a version number, so always select the latest version of the rules package if multiple versions are available.
Configure Amazon Inspector to publish to your SNS topic when findings are reported. You can also choose to receive a notification of a started run, a finished run, or changes in the state of a run. For this blog post, you want to receive notifications if there are any findings. To start, choose Assessment Templates from the Amazon Inspector console and choose your newly created Amazon Inspector assessment template. Choose the icon below SNS topics (see the following screenshot).
A pop-up appears in which you can choose the previously created topic and the events about which you want SNS to notify you (choose Finding reported).
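The console steps above can also be scripted. The following boto3 sketch creates the resource group, target, and template; the rules package ARN varies by Region and is a placeholder here.

```python
# Hedged sketch: Amazon Inspector target and template via the API.
import boto3

inspector = boto3.client("inspector")

group = inspector.create_resource_group(
    resourceGroupTags=[{"key": "Patch Group", "value": "Windows Servers"}]
)
target = inspector.create_assessment_target(
    assessmentTargetName="windows-servers",
    resourceGroupArn=group["resourceGroupArn"],
)
inspector.create_assessment_template(
    assessmentTargetArn=target["assessmentTargetArn"],
    assessmentTemplateName="cve-weekly",
    durationInSeconds=3600,  # the default 1-hour duration
    rulesPackageArns=[
        "arn:aws:inspector:us-east-1:316112463485:rulespackage/0-EXAMPLE"
    ],
)
```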
Schedule Amazon Inspector assessment runs
The last step in using Amazon Inspector to assess for CVEs is to schedule the Amazon Inspector template to run using Amazon CloudWatch Events. This makes sure that Amazon Inspector assesses your EC2 instance on a regular basis. To do this, you need the Amazon Inspector template ARN, which you can find under Assessment templates in the Amazon Inspector console. CloudWatch Events can run your Amazon Inspector assessment at an interval you define using a Cron-based schedule. Cron is a well-known scheduling agent that is widely used on UNIX-like operating systems; CloudWatch Events uses a six-field Cron syntax (minutes, hours, day of month, month, day of week, and year).
All scheduled events use a UTC time zone, and the minimum precision for schedules is one minute. For more information about scheduling CloudWatch Events, see Schedule Expressions for Rules.
To create the rule, navigate to the CloudWatch console and choose Create rule under Events. On the next page, specify whether you want to invoke your rule based on an event pattern or a schedule. For this blog post, select a schedule based on a Cron expression.
You can schedule the Amazon Inspector assessment for any time you want, or use the Cron expression shown in the following screenshot, which runs the assessment every Sunday at 10:00 P.M. GMT.
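For reference, every Sunday at 10:00 P.M. GMT corresponds to the expression cron(0 22 ? * SUN *).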
Choose Add target and choose Inspector assessment template from the drop-down menu. Paste the ARN of the Amazon Inspector template you previously created in the Amazon Inspector console in the Assessment template box and choose Create a new role for this specific resource. This new role is necessary so that CloudWatch Events has the necessary permissions to start the Amazon Inspector assessment. CloudWatch Events will automatically create the new role and grant the minimum set of permissions needed to run the Amazon Inspector assessment. To proceed, choose Configure details.
Next, give your rule a name and a description. I suggest using a name that describes what the rule does, as shown in the following screenshot.
Finish the wizard by choosing Create rule. The rule should appear in the Events – Rules section of the CloudWatch console.
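If you prefer to create the rule programmatically, here is a minimal boto3 sketch equivalent to the wizard steps above; the rule name, assessment template ARN, account ID, and role name are hypothetical placeholders.

```python
import boto3

events = boto3.client("events", region_name="us-east-1")

# Rule that fires every Sunday at 10:00 P.M. UTC
events.put_rule(
    Name="WeeklyInspectorAssessment",
    ScheduleExpression="cron(0 22 ? * SUN *)",
    Description="Starts the Amazon Inspector assessment run weekly",
)

# Point the rule at the Inspector assessment template; the role must
# allow CloudWatch Events to start the assessment run
events.put_targets(
    Rule="WeeklyInspectorAssessment",
    Targets=[{
        "Id": "inspector-assessment-template",
        "Arn": "arn:aws:inspector:us-east-1:111122223333:target/0-EXAMPLE/template/0-EXAMPLE",
        "RoleArn": "arn:aws:iam::111122223333:role/CloudWatchEventsInspectorRole",
    }],
)
```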
To confirm that your CloudWatch Events rule works, wait for the next time the rule is scheduled to run. For testing purposes, you can choose your CloudWatch Events rule and choose Edit to change the schedule so that it runs sooner.
Now navigate to the Amazon Inspector console to confirm the launch of your first assessment run. The Start time column shows you the time each assessment started, and the Status column shows the status of your assessment. In the following screenshot, you can see Amazon Inspector is busy Collecting data from the selected assessment targets.
You have concluded the last step of this blog post by setting up a regular scan of your EC2 instance with Amazon Inspector and a notification that will let you know if your EC2 instance is vulnerable to any known CVEs. In a previous Security Blog post, Eric Fitzgerald explained How to Remediate Amazon Inspector Security Findings Automatically. Although that blog post is for Linux-based EC2 instances, the post shows that you can learn about Amazon Inspector findings in other ways than email alerts.
In this two-part blog post, I showed how to make sure you keep your EC2 instances up to date with patching, how to back up your instances with snapshots, and how to monitor your instances for CVEs. Collectively these measures help to protect your instances against common attack vectors that attempt to exploit known vulnerabilities. In Part 1, I showed how to configure your EC2 instances to make it easy to use Systems Manager, EBS Snapshot Scheduler, and Amazon Inspector. I also showed how to use Systems Manager to schedule automatic patches to keep your instances current in a timely fashion. In Part 2, I showed you how to take regular snapshots of your data by using EBS Snapshot Scheduler and how to use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any common vulnerabilities and exposures (CVEs).
Most malware tries to compromise your systems by using a known vulnerability that the maker of the operating system has already patched. To help prevent malware from affecting your systems, two security best practices are to apply all operating system patches to your systems and actively monitor your systems for missing patches. In case you do need to recover from a malware attack, you should make regular backups of your data.
In today’s blog post (Part 1 of a two-part post), I show how to keep your Amazon EC2 instances that run Microsoft Windows up to date with the latest security patches by using Amazon EC2 Systems Manager. Tomorrow in Part 2, I show how to take regular snapshots of your data by using Amazon EBS Snapshot Scheduler and how to use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any common vulnerabilities and exposures (CVEs).
Later on, you will assign tasks to a maintenance window to patch your instances with Systems Manager. To do this, the AWS Identity and Access Management (IAM) user you are using for this post must have the iam:PassRole permission. This permission allows the IAM user to pass a role to an AWS service; in this example, when you assign a task to a maintenance window, IAM passes your credentials to Systems Manager. This safeguard ensures that the user cannot use the creation of tasks to elevate their IAM privileges, because their own IAM privileges limit which tasks they can run against an EC2 instance. You should also authorize your IAM user to use EC2, Amazon Inspector, Amazon CloudWatch, and Systems Manager. You can achieve this by attaching the following AWS managed policies to the IAM user you are using for this example: AmazonInspectorFullAccess, AmazonEC2FullAccess, and AmazonSSMFullAccess.
The following diagram illustrates the components of this solution’s architecture.
For this blog post, the Microsoft Windows EC2 instances are Amazon EC2 for Microsoft Windows Server 2012 R2 instances with attached Amazon Elastic Block Store (Amazon EBS) volumes, running in your VPC. These instances may be standalone Windows instances running your Windows workloads, or they may be joined to an Active Directory domain. For domain-joined instances, you can use Active Directory running on an EC2 for Windows instance, or you can use AWS Directory Service for Microsoft Active Directory.
Amazon EC2 Systems Manager is a scalable tool for remote management of your EC2 instances. You will use the Systems Manager Run Command to install the Amazon Inspector agent. The agent enables EC2 instances to communicate with the Amazon Inspector service and run assessments, which I explain in detail later in this blog post. You also will create a Systems Manager association to keep your EC2 instances up to date with the latest security patches.
EBS Snapshot Scheduler is a prebuilt AWS solution that you deploy in your AWS account to take automated snapshots at regular intervals; you will use it to schedule regular snapshots of your Amazon EBS volumes. With Amazon EBS snapshots, you pay only for the actual data you store: snapshots save only the data that has changed since the previous snapshot, which minimizes your cost.
You will use Amazon Inspector to run security assessments on your EC2 for Windows Server instance. In this post, I show how to assess if your EC2 for Windows Server instance is vulnerable to any of the more than 50,000 CVEs registered with Amazon Inspector.
In today’s and tomorrow’s posts, I show you how to:
Launch an EC2 instance with an IAM role, Amazon EBS volume, and tags that Systems Manager and Amazon Inspector will use.
Configure Systems Manager to install the Amazon Inspector agent and patch your EC2 instances.
Take regular snapshots of your data by using EBS Snapshot Scheduler.
Use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any common vulnerabilities and exposures (CVEs).
Step 1: Launch an EC2 instance
In this section, I show you how to launch your EC2 instances so that you can use Systems Manager with the instances and use instance tags with EBS Snapshot Scheduler to automate snapshots. This requires three things:
Create an IAM role for Systems Manager before launching your EC2 instance.
Launch your EC2 instance with Amazon EBS and the IAM role for Systems Manager.
Add tags to instances so that you can automate policies for which instances you take snapshots of and when.
Create an IAM role for Systems Manager
Before launching your EC2 instance, I recommend that you first create an IAM role for Systems Manager, which you will use to update the EC2 instance you will launch. AWS provides a preconfigured managed policy for this role, called AmazonEC2RoleforSSM.
Sign in to the IAM console and choose Roles in the navigation pane. Choose Create new role.
In the role-creation workflow, choose AWS service > EC2 > EC2 to create a role for an EC2 instance.
Choose the AmazonEC2RoleforSSM policy to attach it to the new role you are creating.
Give the role a meaningful name (I chose EC2SSM) and description, and choose Create role.
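As an alternative to the console steps above, a minimal boto3 sketch of the same role creation might look like this; only the role name EC2SSM comes from the example above.

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets EC2 instances assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="EC2SSM",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description="Lets Systems Manager manage this EC2 instance",
)

# Attach the AWS managed policy mentioned above
iam.attach_role_policy(
    RoleName="EC2SSM",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM",
)
```

Note that to attach the role to an instance you also need an instance profile (create_instance_profile and add_role_to_instance_profile); the console creates one for you automatically.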
Launch your EC2 instance
To follow along, you need an EC2 instance that is running Microsoft Windows Server 2012 R2 and that has an Amazon EBS volume attached. You can use any existing instance you may have or create a new instance.
When launching your new EC2 instance, be sure that:
The operating system is Microsoft Windows Server 2012 R2.
You attach at least one Amazon EBS volume to the EC2 instance.
The final step of configuring your EC2 instances is to add tags. You will use these tags to configure Systems Manager in Step 2 of this blog post and to configure Amazon Inspector in Part 2. For this example, I add a tag key, Patch Group, and set the value to Windows Servers. I could have other groups of EC2 instances that I treat differently by having the same tag key but a different tag value. For example, I might have a collection of other servers with the Patch Group tag key with a value of IAS Servers.
Note: You must wait a few minutes until the EC2 instance becomes available before you can proceed to the next section.
At this point, you now have at least one EC2 instance you can use to configure Systems Manager, use EBS Snapshot Scheduler, and use Amazon Inspector.
Note: If you have a large number of EC2 instances to tag, you may want to use the EC2 CreateTags API rather than manually applying tags to each instance.
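For example, a minimal boto3 sketch of that API call; the instance ID is a placeholder.

```python
import boto3

ec2 = boto3.client("ec2")

# Apply the Patch Group tag to one or more instances in a single call
ec2.create_tags(
    Resources=["i-0123456789abcdef0"],  # placeholder instance ID
    Tags=[{"Key": "Patch Group", "Value": "Windows Servers"}],
)
```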
Step 2: Configure Systems Manager
In this section, I show you how to use Systems Manager to apply operating system patches to your EC2 instances, and how to manage patch compliance.
To start, I will provide some background information about Systems Manager. Then, I will cover how to:
Create the Systems Manager IAM role so that Systems Manager is able to perform patch operations.
Associate a Systems Manager patch baseline with your instance to define which patches Systems Manager should apply.
Define a maintenance window to make sure Systems Manager patches your instance when you tell it to.
Monitor patch compliance to verify the patch state of your instances.
Systems Manager is a collection of capabilities that helps you automate management tasks for AWS-hosted instances on EC2 and your on-premises servers. In this post, I use Systems Manager for two purposes: to run remote commands and apply operating system patches. To learn about the full capabilities of Systems Manager, see What Is Amazon EC2 Systems Manager?
Patch management is an important measure to prevent malware from infecting your systems. Most malware attacks look for vulnerabilities that are publicly known and in most cases already patched by the maker of the operating system. These publicly known vulnerabilities are well documented and therefore easier for an attacker to exploit than a vulnerability that still has to be discovered.
Patches for such vulnerabilities are available through Systems Manager within hours after Microsoft releases them. There are two prerequisites for using Systems Manager to apply operating system patches: you must attach the IAM role you created in the previous section, EC2SSM, to your EC2 instance, and you must install the Systems Manager agent on your EC2 instance. If you have used a recent Microsoft Windows Server 2012 R2 AMI published by AWS, the Systems Manager agent is already installed; you can confirm this by logging in to the EC2 instance and looking for Amazon SSM Agent under Programs and Features in Windows. To install the Systems Manager agent on an instance that does not have it preinstalled, or to use the agent on your on-premises servers, see the documentation about installing the Systems Manager agent. If you forgot to attach the newly created role when launching your EC2 instance, or if you want to attach the role to already running EC2 instances, see Attach an AWS IAM Role to an Existing Amazon EC2 Instance by Using the AWS CLI or use the AWS Management Console.
To make sure your EC2 instance receives operating system patches from Systems Manager, you will use the default patch baseline provided and maintained by AWS, and you will define a maintenance window so that you control when your EC2 instances receive patches. For the maintenance window to be able to run any tasks, you must also create a new role for Systems Manager. This role differs from the one you created earlier: it will be assumed by the Systems Manager service rather than by EC2. Earlier, we created the EC2SSM role with the AmazonEC2RoleforSSM policy, which allows the Systems Manager agent on our instance to communicate with the Systems Manager service. Here, we need a new role with the AmazonSSMMaintenanceWindowRole policy so that the Systems Manager service can execute commands on our instance.
Create the Systems Manager IAM role
To create the new IAM role for Systems Manager, follow the same procedure as in the previous section, but when choosing a policy, select AmazonSSMMaintenanceWindowRole instead of the previously selected AmazonEC2RoleforSSM policy.
Finish the wizard and give your new role a recognizable name. For example, I named my role MaintenanceWindowRole.
By default, only EC2 instances can assume this new role. You must update the trust policy to enable Systems Manager to assume this role.
To update the trust policy associated with this new role:
Navigate to the IAM console and choose Roles in the navigation pane.
Choose MaintenanceWindowRole and choose the Trust relationships tab. Then choose Edit trust relationship.
Update the policy document by copying the following policy and pasting it in the Policy Document box. Note that the ssm.amazonaws.com service has been added to the list of Principals that are allowed to assume this role.
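A sketch of the resulting trust policy, reconstructed on the assumption that the role should remain assumable by EC2 as well:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "ec2.amazonaws.com",
          "ssm.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

After pasting the policy, choose Update Trust Policy.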
Associate a Systems Manager patch baseline with your instance
Next, you are going to associate a Systems Manager patch baseline with your EC2 instance. A patch baseline defines which patches Systems Manager should apply. You will use the default patch baseline that AWS manages and maintains. Before you can associate the patch baseline with your instance, though, you must determine if Systems Manager recognizes your EC2 instance.
Navigate to the EC2 console, scroll down to Systems Manager Shared Resources in the navigation pane, and choose Managed Instances. Your new EC2 instance should be available there. If your instance is missing from the list, verify the following:
Go to the EC2 console and verify your instance is running.
Select your instance and confirm you attached the Systems Manager IAM role, EC2SSM.
Make sure that you deployed a NAT gateway in your public subnet, as shown in the diagram at the start of this post, so that the Systems Manager agent can connect to the Systems Manager internet endpoint.
Now that you have confirmed that Systems Manager can manage your EC2 instance, it is time to associate the AWS maintained patch baseline with your EC2 instance:
Choose Patch Baselines under Systems Manager Services in the navigation pane of the EC2 console.
Choose the default patch baseline as highlighted in the following screenshot, and choose Modify Patch Groups in the Actions drop-down.
In the Patch group box, enter the same value you entered under the Patch Group tag of your EC2 instance in “Step 1: Launch an EC2 instance.” In this example, the value is Windows Servers. Choose the check mark icon next to the patch group and choose Close.
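The equivalent calls in a boto3 sketch, assuming you want the AWS-provided default Windows baseline:

```python
import boto3

ssm = boto3.client("ssm")

# Look up the AWS-provided default patch baseline for Windows
baseline_id = ssm.get_default_patch_baseline(
    OperatingSystem="WINDOWS"
)["BaselineId"]

# Associate it with the same patch group used to tag the instance
ssm.register_patch_baseline_for_patch_group(
    BaselineId=baseline_id,
    PatchGroup="Windows Servers",
)
```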
Define a maintenance window
Now that you have successfully set up a role and have associated a patch baseline with your EC2 instance, you will define a maintenance window so that you can control when your EC2 instances should receive patches. By creating multiple maintenance windows and assigning them to different patch groups, you can make sure your EC2 instances do not all reboot at the same time. The Patch Group resource tag you defined earlier will determine to which patch group an instance belongs.
To define a maintenance window:
Navigate to the EC2 console, scroll down to Systems Manager Shared Resources in the navigation pane, and choose Maintenance Windows. Choose Create a Maintenance Window.
Select the Cron schedule builder to define the schedule for the maintenance window. In the example in the following screenshot, the maintenance window will start every Saturday at 10:00 P.M. UTC.
To specify when your maintenance window will end, specify the duration. In this example, the four-hour maintenance window will end on the following Sunday morning at 2:00 A.M. UTC (in other words, four hours after it started).
Systems Manager completes tasks that are already in process even if the maintenance window ends; the cutoff you specify prevents new tasks from starting near the end of the window. In my example, I prevent new tasks from starting within one hour of the end of the maintenance window because I estimated that my patch operations might take longer than one hour to complete. Confirm the creation of the maintenance window by choosing Create maintenance window.
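A boto3 sketch of the same maintenance window; the name is a placeholder, and the schedule, duration, and cutoff match the example above.

```python
import boto3

ssm = boto3.client("ssm")

window = ssm.create_maintenance_window(
    Name="SaturdayNightPatching",      # placeholder name
    Schedule="cron(0 22 ? * SAT *)",   # every Saturday at 10:00 P.M. UTC
    Duration=4,                        # window length in hours
    Cutoff=1,                          # stop starting new tasks 1 hour before the end
    AllowUnassociatedTargets=False,
)
window_id = window["WindowId"]
```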
After creating the maintenance window, you must register the EC2 instance to the maintenance window so that Systems Manager knows which EC2 instance it should patch in this maintenance window. To do so, choose Register new targets on the Targets tab of your newly created maintenance window. You can register your targets by using the same Patch Group tag you used before to associate the EC2 instance with the AWS-provided patch baseline.
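Continuing the sketch above, target registration by tag might look like this:

```python
# Register instances by the same Patch Group tag used earlier
target = ssm.register_target_with_maintenance_window(
    WindowId=window_id,
    ResourceType="INSTANCE",
    Targets=[{"Key": "tag:Patch Group", "Values": ["Windows Servers"]}],
)
window_target_id = target["WindowTargetId"]
```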
Assign a task to the maintenance window that will install the operating system patches on your EC2 instance:
Open Maintenance Windows in the EC2 console, select your previously created maintenance window, choose the Tasks tab, and choose Register run command task from the Register new task drop-down.
Choose the AWS-RunPatchBaseline document from the list of available documents.
For Role, choose the role you created previously (called MaintenanceWindowRole).
For Execute on, specify how many EC2 instances Systems Manager should patch at the same time. If you have a large number of EC2 instances and want to patch all of them within the defined window, make sure this number is not too low. For example, if you have 1,000 EC2 instances, a 4-hour maintenance window, and patching takes up to 2 hours per batch, the window fits two batches, so make this number at least 500.
For Stop after, specify the number of errors after which Systems Manager should stop.
For Operation, choose Install so that Systems Manager installs the patches rather than only scanning for them. A sketch of the equivalent API call follows this list.
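Continuing the same boto3 sketch, the task registration might look like the following; the service role ARN and error threshold are placeholders.

```python
ssm.register_task_with_maintenance_window(
    WindowId=window_id,
    TaskType="RUN_COMMAND",
    TaskArn="AWS-RunPatchBaseline",
    ServiceRoleArn="arn:aws:iam::111122223333:role/MaintenanceWindowRole",
    Targets=[{"Key": "WindowTargetIds", "Values": [window_target_id]}],
    MaxConcurrency="500",  # instances to patch in parallel
    MaxErrors="10",        # stop after this many errors (placeholder)
    TaskParameters={"Operation": {"Values": ["Install"]}},
)
```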
Now, you must wait for the maintenance window to run at least once according to the schedule you defined earlier. If you don’t want to wait, you can adjust the schedule to run sooner by choosing Edit maintenance window on the Maintenance Windows page of Systems Manager. After the maintenance window has run, select your maintenance window on the same page to check the status of the maintenance tasks Systems Manager performed.
Monitor patch compliance
You also can see the overall patch compliance of all EC2 instances that are part of defined patch groups by choosing Patch Compliance under Systems Manager Services in the navigation pane of the EC2 console. You can filter by Patch Group to see how many EC2 instances within the selected patch group are up to date, how many EC2 instances are missing updates, and how many EC2 instances are in an error state.
In this section, you have set everything up for patch management on your instance. Now you know how to patch your EC2 instance in a controlled manner and how to check if your EC2 instance is compliant with the patch baseline you have defined. Of course, I recommend that you apply these steps to all EC2 instances you manage.
In Part 1 of this blog post, I have shown how to configure EC2 instances for use with Systems Manager, EBS Snapshot Scheduler, and Amazon Inspector. I also have shown how to use Systems Manager to keep your Microsoft Windows–based EC2 instances up to date. In Part 2 of this blog post tomorrow, I will show how to take regular snapshots of your data by using EBS Snapshot Scheduler and how to use Amazon Inspector to check if your EC2 instances running Microsoft Windows contain any CVEs.
Our Health Customer Stories page lists just a few of the many customers that are building and running healthcare and life sciences applications that run on AWS. Customers like Verge Health, Care Cloud, and Orion Health trust AWS with Protected Health Information (PHI) and Personally Identifying Information (PII) as part of their efforts to comply with HIPAA and HITECH.
Sixteen More Services
In my last HIPAA Eligibility Update, I shared the news that we added eight additional services to our list of HIPAA eligible services. Today I am happy to let you know that we have added another sixteen services to the list, bringing the total up to 46. Here are the newest additions, along with some short descriptions and links to some of my blog posts to jog your memory:
Today, AWS introduced AWS Directory Service for Microsoft Active Directory (Standard Edition), also known as AWS Microsoft AD (Standard Edition), a managed Microsoft Active Directory (AD) that is performance-optimized for small and midsize businesses. AWS Microsoft AD (Standard Edition) offers you a highly available and cost-effective primary directory in the AWS Cloud that you can use to manage users, groups, and computers. It enables you to join Amazon EC2 instances to your domain easily and supports many AWS and third-party applications and services. It also can support most of the common use cases of small and midsize businesses. When you use AWS Microsoft AD (Standard Edition) as your primary directory, you can manage access and provide single sign-on (SSO) to cloud applications such as Microsoft Office 365. If you have an existing Microsoft AD directory, you can also use AWS Microsoft AD (Standard Edition) as a resource forest that contains primarily computers and groups, allowing you to migrate your AD-aware applications to the AWS Cloud while using existing on-premises AD credentials.
In this blog post, I help you get started by answering three main questions about AWS Microsoft AD (Standard Edition):
What do I get?
How can I use it?
What are the key features?
After answering these questions, I show how you can get started with creating and using your own AWS Microsoft AD (Standard Edition) directory.
1. What do I get?
When you create an AWS Microsoft AD (Standard Edition) directory, AWS deploys two Microsoft AD domain controllers powered by Microsoft Windows Server 2012 R2 in your Amazon Virtual Private Cloud (VPC). To help deliver high availability, the domain controllers run in different Availability Zones in the AWS Region of your choice.
As a managed service, AWS Microsoft AD (Standard Edition) configures directory replication, automates daily snapshots, and handles all patching and software updates. In addition, AWS Microsoft AD (Standard Edition) monitors and automatically recovers domain controllers in the event of a failure.
AWS Microsoft AD (Standard Edition) has been optimized as a primary directory for small and midsize businesses with the capacity to support approximately 5,000 employees. With 1 GB of directory object storage, AWS Microsoft AD (Standard Edition) has the capacity to store 30,000 or more total directory objects (users, groups, and computers). AWS Microsoft AD (Standard Edition) also gives you the option to add domain controllers to meet the specific performance demands of your applications. You also can use AWS Microsoft AD (Standard Edition) as a resource forest with a trust relationship to your on-premises directory.
2. How can I use it?
With AWS Microsoft AD (Standard Edition), you can share a single directory for multiple use cases. For example, you can share a directory to authenticate and authorize access for .NET applications, Amazon RDS for SQL Server with Windows Authentication enabled, and Amazon Chime for messaging and video conferencing.
The following diagram shows some of the use cases for your AWS Microsoft AD (Standard Edition) directory, including the ability to grant your users access to external cloud applications and allow your on-premises AD users to manage and have access to resources in the AWS Cloud.
Use case 1: Sign in to AWS applications and services with AD credentials
You can enable multiple AWS applications and services such as the AWS Management Console, Amazon WorkSpaces, and Amazon RDS for SQL Server to use your AWS Microsoft AD (Standard Edition) directory. When you enable an AWS application or service in your directory, your users can access the application or service with their AD credentials.
For example, you can enable your users to sign in to the AWS Management Console with their AD credentials. To do this, you enable the AWS Management Console as an application in your directory, and then assign your AD users and groups to IAM roles. When your users sign in to the AWS Management Console, they assume an IAM role to manage AWS resources. This makes it easy for you to grant your users access to the AWS Management Console without needing to configure and manage a separate SAML infrastructure.
Use case 2: Sign in to your instances with AD credentials
In addition, your users can sign in to your instances with their AD credentials. This eliminates the need to use individual instance credentials or distribute private key (PEM) files, and makes it easier for you to instantly grant or revoke access by using the AD user administration tools you already use.
Use case 3: Provide directory services to your AD-aware workloads
You can also use AWS Microsoft AD (Standard Edition) as a standalone directory in the cloud for AD-aware workloads, such as the .NET applications and Amazon RDS for SQL Server instances mentioned earlier, without deploying and operating your own domain controllers.
Use case 4: SSO to Office 365 and other cloud applications
You can use AWS Microsoft AD (Standard Edition) to provide SSO for cloud applications. You can use Azure AD Connect to synchronize your users into Azure AD, and then use Active Directory Federation Services (AD FS) so that your users can access Microsoft Office 365 and other SAML 2.0 cloud applications by using their AD credentials.
Use case 5: Extend your on-premises AD to the AWS Cloud
If you already have an AD infrastructure and want to use it when migrating AD-aware workloads to the AWS Cloud, AWS Microsoft AD (Standard Edition) can help. You can use AD trusts to connect AWS Microsoft AD (Standard Edition) to your existing AD. This means your users can access AD-aware and AWS applications with their on-premises AD credentials, without needing you to synchronize users, groups, or passwords.
For example, your users can sign in to the AWS Management Console and Amazon WorkSpaces by using their existing AD user names and passwords. Also, when you use AD-aware applications such as SharePoint with AWS Microsoft AD (Standard Edition), your logged-in Windows users can access these applications without needing to enter credentials again.
3. What are the key features?
AWS Microsoft AD (Standard Edition) includes the features detailed in this section.
Extend your AD schema
With AWS Microsoft AD, you can run customized AD-integrated applications that require changes to your directory schema, which defines the structures of your directory. The schema is composed of object classes such as user objects, which contain attributes such as user names. AWS Microsoft AD lets you extend the schema by adding new AD attributes or object classes that are not present in the core AD attributes and classes.
Apply user-specific password policies
With user-specific password policies, you can apply specific restrictions and account lockout policies to different types of users in your AWS Microsoft AD (Standard Edition) domain. For example, you can enforce strong passwords and frequent password change policies for administrators, and use less-restrictive policies with moderate account lockout policies for general users.
Add domain controllers
You can increase the performance and redundancy of your directory by adding domain controllers. This can help improve application performance by enabling directory clients to load-balance their requests across a larger number of domain controllers.
Encrypt directory traffic
You can use AWS Microsoft AD (Standard Edition) to encrypt Lightweight Directory Access Protocol (LDAP) communication between your applications and your directory. By enabling LDAP over Secure Sockets Layer (SSL)/Transport Layer Security (TLS), also called LDAPS, you encrypt your LDAP communications end to end. This helps you to protect sensitive information you keep in your directory when it is accessed over untrusted networks.
Improve the security of signing in to AWS services by using multi-factor authentication (MFA)
You can improve the security of signing in to AWS services, such as Amazon WorkSpaces and Amazon QuickSight, by enabling MFA in your AWS Microsoft AD (Standard Edition) directory. With MFA, your users must enter a one-time passcode (OTP) in addition to their AD user names and passwords to access AWS applications and services you enable in AWS Microsoft AD (Standard Edition).
In this blog post, I explained what AWS Microsoft AD (Standard Edition) is and how you can use it. With a single directory, you can address many use cases for your business, making it easier to migrate and run your AD-aware workloads in the AWS Cloud, provide access to AWS applications and services, and connect to other cloud applications. To learn more about AWS Microsoft AD, see the Directory Service home page.
If you have comments about this post, submit them in the “Comments” section below. If you have questions about this blog post, start a new thread on the Directory Service forum.
Today, we are launching tag-based cost allocation for Amazon Simple Queue Service (SQS). You can now assign tags to your queues and use them to manage your costs at any desired level: application, application stage (for a loosely coupled application that communicates via queues), project, department, or developer. After you have tagged your queues, you can use the AWS Tag Editor to search for queues that have tags of interest.
Here’s how I would add three tags (app, stage, and department) to one of my queues in the SQS console:
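You can also tag queues programmatically; here is a minimal boto3 sketch, with a hypothetical queue name and tag values.

```python
import boto3

sqs = boto3.client("sqs")

# Look up the queue URL (queue name is a placeholder)
queue_url = sqs.get_queue_url(QueueName="MyQueue")["QueueUrl"]

# Assign the three cost-allocation tags to the queue
sqs.tag_queue(
    QueueUrl=queue_url,
    Tags={"app": "MyApp", "stage": "Production", "department": "Engineering"},
)
```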