The EU’s General Data Protection Regulation (GDPR) describes data processor and data controller roles, and some customers and AWS Partner Network (APN) partners are asking how this affects the long-established AWS Shared Responsibility Model. I wanted to take some time to help folks understand shared responsibilities for us and for our customers in the context of the GDPR.
How does the AWS Shared Responsibility Model change under the GDPR? The short answer: it doesn’t. AWS is responsible for securing the underlying infrastructure that supports the cloud and the services provided, while customers and APN partners, acting either as data controllers or data processors, are responsible for any personal data they put in the cloud. The shared responsibility model illustrates the various responsibilities of AWS and our customers and APN partners, and the same separation of responsibility applies under the GDPR.
AWS responsibilities as a data processor
The GDPR does introduce specific regulation and responsibilities regarding data controllers and processors. When any AWS customer uses our services to process personal data, the controller is usually the AWS customer (and sometimes it is the AWS customer’s customer). However, in all of these cases, AWS is always the data processor in relation to this activity. This is because the customer is directing the processing of data through its interaction with the AWS service controls, and AWS is only executing customer directions. As a data processor, AWS is responsible for protecting the global infrastructure that runs all of our services. Controllers using AWS maintain control over data hosted on this infrastructure, including the security configuration controls for handling end-user content and personal data. Protecting this infrastructure is our number one priority, and we invest heavily in third-party auditors to test our security controls and make their findings available to customers through AWS Artifact. Our ISO 27018 report is a good example, as it assesses security controls that focus specifically on the protection of personal data.
AWS has an increased responsibility for our managed services. Examples of managed services include Amazon DynamoDB, Amazon RDS, Amazon Redshift, Amazon Elastic MapReduce, and Amazon WorkSpaces. These services provide the scalability and flexibility of cloud-based resources with less operational overhead because we handle basic security tasks like guest operating system (OS) and database patching, firewall configuration, and disaster recovery. For most managed services, you only configure logical access controls and protect account credentials, while maintaining control and responsibility of any personal data.
Customer and APN partner responsibilities as data controllers — and how AWS Services can help
Our customers can act as data controllers or data processors within their AWS environment. As a data controller, the services you use may determine how you configure those services to help meet your GDPR compliance needs. For example, AWS Services that are classified as Infrastructure as a Service (IaaS), such as Amazon EC2, Amazon VPC, and Amazon S3, are under your control and require you to perform all routine security configuration and management that would be necessary no matter where the servers were located. With Amazon EC2 instances, you are responsible for managing: guest OS (including updates and security patches), application software or utilities installed on the instances, and the configuration of the AWS-provided firewall (called a security group).
To help you realize data protection by design principles under the GDPR when using our infrastructure, we recommend that you protect AWS account credentials and set up individual user accounts with AWS Identity and Access Management (IAM) so that each user is given only the permissions necessary to fulfill their job duties. We also recommend using multi-factor authentication (MFA) with each account, requiring the use of SSL/TLS to communicate with AWS resources, setting up API and user activity logging with AWS CloudTrail, and using AWS encryption solutions, along with all default security controls within AWS services. You can also use advanced managed security services, such as Amazon Macie, which assists in discovering and securing personal data stored in Amazon S3.
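As a quick illustration of those recommendations, here is a hedged AWS CLI sketch that creates a least-privilege IAM user and turns on account activity logging; the user, policy, trail, and bucket names are placeholders, and the managed policy you attach should reflect the user’s actual job function:
# Create an IAM user and grant only the access the job requires (example: read-only access to S3)
$ aws iam create-user --user-name data-analyst
$ aws iam attach-user-policy --user-name data-analyst --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
# Record API and user activity with CloudTrail (the S3 bucket must already exist and allow CloudTrail to write to it)
$ aws cloudtrail create-trail --name account-activity-trail --s3-bucket-name example-cloudtrail-bucket
$ aws cloudtrail start-logging --name account-activity-trail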
For more information, you can download the AWS Security Best Practices whitepaper or visit the AWS Security Resources or GDPR Center webpages. In addition to our solutions and services, AWS APN partners can provide hundreds of tools and features to help you meet your security objectives, ranging from network security and configuration management to access control and data encryption.
AWS CloudFormation allows developers and systems administrators to easily create and manage a collection of related AWS resources (called a CloudFormation stack) by provisioning and updating them in an orderly and predictable way. CloudFormation users can now deploy and manage AWS Batch resources in exactly the same way that they manage the rest of their AWS infrastructure.
This post highlights the native resources supported in CloudFormation and demonstrates how to create AWS Batch compute environments using CloudFormation. All of the per-region sample CloudFormation templates related to this post can be found on the CloudFormation sample templates site. The Ohio (us-east-2) Region is used as the example region for the remainder of this post.
AWS Batch Resources
AWS Batch is a managed service that helps you efficiently run batch computing workloads on the AWS Cloud. Users submit jobs to job queues, specifying the application to be run and their jobs’ CPU and memory requirements. AWS Batch is responsible for launching the appropriate quantity and types of instances needed to run your jobs.
AWS Batch removes the undifferentiated heavy lifting of configuring and managing compute infrastructure, allowing you to instead focus on your applications and users. This is demonstrated in the How AWS Batch Works video.
AWS Batch manages the following resources:
Job definitions
Job queues
Compute environments
A job definition specifies how jobs are to be run—for example, which Docker image to use for your job, how many vCPUs and how much memory is required, the IAM role to be used, and more.
Jobs are submitted to job queues where they reside until they can be scheduled to run on Amazon EC2 instances within a compute environment. An AWS account can have multiple job queues, each with varying priority. This gives you the ability to closely align the consumption of compute resources with your organizational requirements.
Compute environments provision and manage your EC2 instances and other compute resources that are used to run your AWS Batch jobs. Job queues are mapped to one or more compute environments, and a given environment can also be mapped to one or more job queues. This many-to-many relationship is defined by the compute environment order and job queue priority properties.
The following diagram shows a general overview of how the AWS Batch resources interact.
CloudFormation stack creation and updates
Upon the creation of your stack, an AWS Batch job definition is registered using your CloudFormation template. If a job definition with the same name has already been registered, a new revision is created. On stack updates, any changes to your job definition specifications in the CloudFormation template result in a new revision of that job definition and a deregistration of the previous job definition revision. Stack deletions only result in the deregistration of your job definition, as AWS Batch does not delete job definitions.
At stack creation, a job queue is created using the template. Any changes to your job queue properties within the stack result in a call to the UpdateJobQueue API action. Similarly, stack deletions result in the deletion of your job queues.
CloudFormation creates an AWS Batch compute environment using the properties specified in your template. Stack updates result in updates to your compute environment where possible. If you need to change a parameter that is not supported by the UpdateComputeEnvironment API action, stack updates result in the deletion and re-creation of your compute environment. Upon stack deletion, your compute environment is disabled, and then deleted.
All naming conventions specified by CloudFormation should be followed, especially in the case of resource replacement, or you run the risk of failed stack changes. For example, all AWS Batch resource property names must be capitalized, and resource names must be changed in the case of resource replacement, as is the case in any CloudFormation stack.
If you do not provide values for ComputeEnvironmentName, JobQueueName, or JobDefinitionName in your template, a pseudo-random name is generated for you using the logical ID that you gave the resource in CloudFormation.
Launching a “Hello World” example stack
Here’s a familiar “Hello World” example of a CloudFormation stack with AWS Batch resources.
This example registers a simple job definition, a job queue that can accept job submissions, and a compute environment that contains the compute resources used to execute your job. The stack template also creates additional AWS resources that are required by AWS Batch:
An IAM service role that gives AWS Batch permissions to take the required actions on your behalf
An IAM ECS instance role
A VPC
A VPC subnet (though I’ve provided a general template, I suggest that this be a private subnet)
A security group
This stack can easily be deployed in the CloudFormation console, but I provide CLI commands that complete the stack creation for you. Use the Launch stack button or run the following command:
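A sketch of that command, assuming you have saved the sample template locally as batch-hello-world.yaml (the stack and file names here are placeholders, not the official sample names); because the template creates IAM roles, the IAM capability flag is required:
$ aws cloudformation create-stack \
    --region us-east-2 \
    --stack-name batch-hello-world \
    --template-body file://batch-hello-world.yaml \
    --capabilities CAPABILITY_IAM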
You can monitor the creation of the resources in your CloudFormation stack in the CloudFormation console, on the Events tab:
Confirm the successful creation of your stack by observing a CREATE_COMPLETE status. At this point, you should also be able to view the new resource ARNs on the Outputs tab:
After your stack is successfully created, everything that you need to submit a “hello-world” job is complete.
Make sure to use the correct job definition name and revision number. You can find the exact Amazon Resource Name (ARN) on the CloudFormation stack Outputs tab; a pseudo-random resource name is generated for your AWS Batch resources. If you already have an existing hello-world job definition, make sure that you run the command with the job definition revision created by your new CloudFormation stack, shown in the stack outputs.
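A sketch of the submission with the AWS CLI; substitute the job queue name and the job definition name and revision from your stack outputs (the values below are placeholders):
$ aws batch submit-job \
    --region us-east-2 \
    --job-name hello-world \
    --job-queue <JobQueueNameFromOutputs> \
    --job-definition <JobDefinitionNameFromOutputs>:1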
You can monitor the successful execution of the job in the AWS Batch console under Jobs:
When you are done using this stack and want to delete the resources, run the following command. CloudFormation deregisters the job definition, and deletes the job queue, compute environment, and the rest of the resources in the stack template.
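A sketch of the delete command, assuming the placeholder stack name used earlier:
$ aws cloudformation delete-stack --region us-east-2 --stack-name batch-hello-world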
Now that you know the basics of AWS Batch resources, here’s a more complex example.
High- and low-priority job queues with On-Demand and Spot compute environments
This CloudFormation stack creates two job queues with varying priority and two compute environments. You have one On-Demand compute environment and one Spot compute environment with a Spot price at 40% of On-Demand.
The first job queue is higher priority and feeds jobs to both compute environments, while the lower priority job queue only submits jobs for execution to the Spot compute environment.
There are also two job definitions, one associated with each job queue. Jobs submitted with a given job definition are submitted to the corresponding job queue: for example, jobs submitted with an important-production-application job definition are submitted to the high-priority job queue, while jobs submitted with a test-application job definition are submitted to the low-priority job queue.
This example registers both job definitions and creates your compute environments and job queues. It also creates the VPC, subnet, security group, IAM service role for AWS Batch, ECS instance role, and an IAM Spot Fleet role. Use the Launch stack button or run the following command:
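As before, a sketch assuming the template has been saved locally under a placeholder name; because this template also creates IAM and Spot Fleet roles, the IAM capability flag is required:
$ aws cloudformation create-stack \
    --region us-east-2 \
    --stack-name batch-high-low-priority \
    --template-body file://batch-high-low-priority.yaml \
    --capabilities CAPABILITY_IAM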
As with any CloudFormation stack, you can update resources for your application’s specific needs. AWS CloudFormation Designer is a graphic tool for creating, viewing, and modifying CloudFormation templates.
Any change to a resource property that requires replacement results in the creation of a new resource to reflect the change and the deletion of the obsolete resource. Changes to immutable compute environment or job queue properties result in replacement; changes to updateable properties update the existing resource. Any changes to job definitions (beyond the name) result in the registration of a new revision of the existing job definition, followed by the deregistration of the previous revision.
Finally, run the following command to delete the CloudFormation stack containing your AWS Batch resources:
In this post, I detailed the steps to create, update with and without replacement, and delete your AWS Batch resources using CloudFormation templates as part of CloudFormation stacks with other AWS service resources. For more information, see the following topics:
AWS CloudHSM provides fully managed, single-tenant hardware security modules (HSMs) in the AWS Cloud. A CloudHSM cluster contains one or more HSMs. Multiple HSMs support higher throughput levels for cryptographic operations and provide redundancy. For clusters with multiple HSMs, the CloudHSM service supports server-side automated synchronization of keys and policies. Users, however, are synchronized from the client side, and the synchronization is driven by configuration files which must be refreshed when the cluster size changes. If you do not refresh the configuration files, your CloudHSM user configurations could become unsynchronized and affect the ability of your CloudHSM cluster to respond consistently to cryptographic operations.
In this blog post, I’ll provide a general overview of a CloudHSM architecture, discuss the cluster synchronization process, build a CloudHSM environment, show how the cluster users can become unsynchronized, and then restore user synchronization to bring your cluster back to a consistent state to meet your needs for consistency and redundancy.
CloudHSM Architectural Overview
When you provision an HSM instance in CloudHSM, the HSM instance provides an elastic network interface (ENI) in your Amazon VPC while the HSM itself resides in a separate VPC managed by AWS CloudHSM. Your applications use the CloudHSM cluster ID to add or remove HSMs from the cluster and the ENI(s) of the HSM instance(s) to access the HSM instances.
You configure your cluster and its HSM instances using CloudHSM client software you deploy on Amazon EC2 instances in your VPC. You only need one such EC2 instance to manage a CloudHSM cluster, but it’s common to deploy additional EC2 instances in other availability zones to provide for client redundancy. Your applications communicate with the HSM instances using the client daemon. You manage and configure the cluster with command line tools including cloudhsm_mgmt_util, key_mgmt_util, and configure. An example of a CloudHSM architecture appears below.
Figure 1: A 3-Node CloudHSM architecture
The diagram shows a three-node CloudHSM cluster deployed in the us-west-2 (Oregon) region with three Amazon EC2 instances with the CloudHSM software. The client in Availability Zone 2 is communicating with the cluster through the elastic network interfaces in each availability zone.
CloudHSM Synchronization Process
Having discussed the architecture of AWS CloudHSM, let’s turn our attention to the matter of cluster synchronization. There are three events that require synchronization: cluster expansion, key management operations, and user management operations. Let’s look at each of these in more detail.
Cluster Expansion
When you add an HSM to an existing cluster, AWS CloudHSM clones all users, keys, and policies from another HSM in the cluster. No additional steps are required on your part.
Key Management Operations
Key management with the key_mgmt_util tool uses the CloudHSM client to communicate with the HSM cluster. Additionally, a fallback, HSM-based synchronization protocol keeps keys in sync.
User Management
You perform user management tasks, such as adding users or changing passwords, using the cloudhsm_mgmt_util tool. This tool communicates directly with the HSMs, bypassing the client daemon. cloudhsm_mgmt_util uses its own configuration files to determine the HSMs that it should connect to within the cluster. These configuration files aren’t updated dynamically when HSM instances are added. To prevent user synchronization errors, you must update the configuration files before running cloudhsm_mgmt_util. You must also not add new HSM instances to the cluster while you’re using the tool. This helps ensure that no HSM instances are accidentally left out of user updates that would in turn result in user synchronization problems.
Again, these safeguards are only necessary when using cloudhsm_mgmt_util. For all other applications and utilities using CloudHSM, the client daemon automatically reconfigures itself as you add and remove HSM instances from your cluster. In the remainder of this post, I will build a CloudHSM infrastructure as shown in the above diagram. I’ll then show you how users on your CloudHSM instances can become unsynchronized, and how to restore proper synchronization.
Prerequisites and Assumptions
You’ll need to have an AWS account that allows you to provision Amazon VPCs, Amazon EC2 instances, and CloudHSMs.
I’ll use the us-west-2 (Oregon) region, but you can use any region that offers CloudHSM.
You’ll need an Amazon EC2 key pair in the region.
You should have a working knowledge of the services I’ve mentioned.
Important: You’ll incur charges for the resources used in this example. You can find the cost of each service on that service’s pricing page.
Building a CloudHSM Infrastructure
Create an Amazon VPC with subnets in the us-west-2a, us-west-2b, and us-west-2c Availability Zones. I’ll use the Amazon VPC Architecture Quick Start, which is an AWS CloudFormation template that will do this on your behalf. Make sure you select the correct region after you load the Quick Start. Select the following parameters:
Availability Zones: us-west-2a, us-west-2b, us-west-2c
Number of Availability Zones: 3
Create private subnets: False
Create additional private subnets with dedicated network ACLs: False
Key pair name: The name of your Amazon EC2 key pair
Accept the default values for all other parameters.
Follow these instructions to create a CloudHSM cluster in your new VPC in the us-west-2a, us-west-2b and us-west-2c availability zones. Note that the cluster will not have any HSMs after it’s created.
Follow these instructions to initialize the cluster with an HSM in the us-west-2a availability zone. After the cluster is initialized, note the ENI IP address from the cluster details section in the console as shown here:
Install the client software on the EC2 instance you launched in step 4.
Add the IP of the EC2 instance that you identified in step 4 to the security group you identified in step 3.
Activate the cluster. The activation instructions will guide you through connecting to the EC2 instance you launched in step 4. Remain logged into the EC2 instance following the activation of the cluster for the steps below.
While you are still logged into the EC2 instance you just launched, follow the steps below to add a crypto user named example_user to the cluster:
Ensure the CloudHSM daemon is stopped:
$ sudo stop cloudhsm-client
Configure the IP address of the initial HSM using the ENI IP address from step 3:
$ sudo /opt/cloudhsm/bin/configure -a 10.0.129.209
Note: The configure tool updates two configuration files: one for the CloudHSM client, and the other for the cloudhsm_mgmt_util program that is used to administer users.
Start the CloudHSM client:
$ sudo start cloudhsm-client
Ensure the cloudhsm_mgmt_util configuration file is up to date. We need to do this to ensure cloudhsm_mgmt_util is aware of all the HSM instances in the cluster:
$ sudo /opt/cloudhsm/bin/configure -m
Connect to the HSM instances, enable end-to-end encryption, and log in to the HSM instances. Enabling end-to-end encryption encrypts the communication between cloudhsm_mgmt_util and the HSM to prevent interception of sensitive information such as passwords:
$ /opt/cloudhsm/bin/cloudhsm_mgmt_util /opt/cloudhsm/etc/cloudhsm_mgmt_util.cfg
aws-cloudhsm> enable_e2e
aws-cloudhsm> loginHSM CO admin
Figure 4: Connecting to a Single CloudHSM
Note: The connection or log in is automatically executed on every HSM instance that cloudhsm_mgmt_util is aware of. Note also that for each of the commands that you enter, the cloudhsm_mgmt_util program identifies the IP address of the HSM to which it is communicating.
Add the user example_user and then confirm the addition by listing the users in the HSM:
aws-cloudhsm> createUser CU example_user yourpassword
aws-cloudhsm> listUsers
Use the quit command to log out and exit the program:
aws-cloudhsm> quit
Now that we’ve added a user to the CloudHSM, let’s add a key so we can see how users and keys are synchronized as the cluster changes.
Start the key_mgmt_util program:
$ /opt/cloudhsm/bin/key_mgmt_util
Log in to the HSM:
Command: loginHSM -u CU -s example_user
Notice that key_mgmt_util displays the node ID to which it is communicating.
Use the exit command to leave the program:
exit
Add another HSM to the cluster in the us-west-2b availability zone and note the ENI IP address from the cluster details section in the console, as shown here:
Figure 6: The ENI IP address
Update the cluster configuration files and use cloudhsm_mgmt_util to examine the user configuration:
$ sudo stop cloudhsm-client
$ sudo /opt/cloudhsm/bin/configure -a 10.0.129.209
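The rest of the refresh mirrors the configuration steps used earlier; a minimal sketch:
$ sudo start cloudhsm-client
$ sudo /opt/cloudhsm/bin/configure -m
$ /opt/cloudhsm/bin/cloudhsm_mgmt_util /opt/cloudhsm/etc/cloudhsm_mgmt_util.cfg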
Figure 7: Connecting to the 2-node CloudHSM cluster
Note that cloudhsm_mgmt_util now sends commands to both of the HSMs in the cluster. You can see the same thing when we list the users in the cluster.
Figure 8: Showing proper user synchronization across two CloudHSMs
Now, use key_mgmt_util to examine the keys:
Command: findKey
Figure 9: Showing that keys are properly synchronized across a 2-node CloudHSM cluster
This command confirms that when we added the second HSM, CloudHSM used cluster-initiated synchronization to load the users and keys into the new HSM.
The CloudHSM Cluster Users Become Unsynchronized
Start cloudhsm_mgmt_util and enable end-to-end encryption:
$ /opt/cloudhsm/bin/cloudhsm_mgmt_util /opt/cloudhsm/etc/cloudhsm_mgmt_util.cfg
aws-cloudhsm> enable_e2e
Figure 10: Connecting to the 2-node CloudHSM cluster
While cloudhsm_mgmt_util is left running, add a third HSM in us-west-2c through the console and note the ENI IP address, as shown here:
Figure 11: Connecting to the 2-node CloudHSM cluster
Going back to cloudhsm_mgmt_util, let’s add a user named newest_user to our cluster. Note that we have not exited cloudhsm_mgmt_util and refreshed its configuration file. So it’s still connected only to the first two HSM instances.
aws-cloudhsm> enable_e2e
aws-cloudhsm> loginHSM CO admin yourpassword
aws-cloudhsm> createUser CU newest_user yourpassword
Figure 12: Adding a User to only two nodes of a 3-node CloudHSM Cluster and breaking synchronization
The cloudhsm_mgmt_util command adds the user to the two HSMs it already knows about and is connected to. It doesn’t communicate with the newly added HSM.
Let’s fix this by exiting cloudhsm_mgmt_util, refreshing the configuration, and then running the management utility again.
$ sudo stop cloudhsm-client
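The refresh sequence is the same one used earlier, and it now picks up all three HSM instances; a sketch (substitute the ENI IP address of your first HSM):
$ sudo /opt/cloudhsm/bin/configure -a 10.0.129.209
$ sudo start cloudhsm-client
$ sudo /opt/cloudhsm/bin/configure -m
$ /opt/cloudhsm/bin/cloudhsm_mgmt_util /opt/cloudhsm/etc/cloudhsm_mgmt_util.cfg
aws-cloudhsm> enable_e2e
aws-cloudhsm> loginHSM CO admin yourpassword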
You can now see cloudhsm_mgmt_util is communicating with all of the cluster nodes.
Figure 13: Connecting to a 3-node CloudHSM cluster
Let’s see what happens when we list the users:aws-cloudhsm> listUsers
Figure 14: Showing that users are now unsynchronized
You can see from the results that one of the HSMs (server 1) is missing the user named newest_user. The reason this happened is that cloudhsm_mgmt_util was unaware of the HSM instance that was added while it was running (recall that cloudhsm_mgmt_util doesn’t use the cloudhsm_client daemon and, therefore, doesn’t get automatic cluster configuration updates).
Restoring User Synchronization to the CloudHSM Cluster
We now want to add the user newest_user to the single HSM (server 1) that is out of sync. Normally, cloudhsm_mgmt_util works in cluster mode and applies your commands to all HSMs in the cluster. Since we want to work on a single HSM, we’re going to enter the server command to tell cloudhsm_mgmt_util to work in server mode and apply our commands just to that one HSM.
In the server command below, we specify the number of the HSM that we want to change based on the figure above. In the createUser command, you must use the same password that you used in step 3 (in the section titled “The CloudHSM Cluster Users Become Unsynchronized”) on the other HSMs in the cluster so that all HSMs in the cluster have identical user names and passwords. After we make this change, we use the exit command to transition from server mode back to cluster mode.
aws-cloudhsm> server 1
server1> createUser CU newest_user yourpassword
exit
Figure 15: Adding a user to a single-node of a 3-node CloudHSM cluster
Now that we have transitioned back to cluster mode, let’s confirm that the HSM user tables are now synchronized by listing the users:
aws-cloudhsm> listUsers
Figure 16: Showing that users are now synchronized across the 3-node CloudHSM cluster
Let’s take a look at the keys using key_mgmt_util:
Command: loginHSM -u CU -s example_user -p yourpassword
Command: findKey
Figure 17: Showing that keys continued to be synchronized across a 3-node CloudHSM Cluster
You can see that CloudHSM kept the keys in sync because key synchronization is cluster-initiated. No additional actions are required on our part.
Conclusion
AWS CloudHSM provides the ability to create scalable clusters of HSM instances to support high volumes of cryptographic operations and to provide resiliency by supporting multiple Availability Zones. As mentioned, it’s important to be aware of the various modes of synchronization used in CloudHSM so that each HSM can provide consistent service. In particular, users are synchronized only by the client. Because cloudhsm_mgmt_util doesn’t rely on the client daemon to talk to HSM instances in your cluster, it doesn’t automatically update its configuration. By following the steps above and refreshing the configuration information before changing users or passwords, CloudHSM will keep users and passwords synchronized within the cluster and provide consistent responses to cryptographic operations, even as the level of redundancy within the HSM cluster changes.
If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the Amazon CloudHSM forum or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
Thanks to Raja Mani, AWS Solutions Architect, for this great blog.
—
In this blog post, I’ll walk you through the steps for setting up continuous replication of an AWS CodeCommit repository from one AWS region to another AWS region using a serverless architecture. CodeCommit is a fully-managed, highly scalable source control service that stores anything from source code to binaries. It works seamlessly with your existing Git tools and eliminates the need to operate your own source control system. Replicating an AWS CodeCommit repository from one AWS region to another AWS region enables you to achieve lower latency pulls for global developers. This same approach can also be used to automatically back up repositories currently hosted on other services (for example, GitHub or BitBucket) to AWS CodeCommit.
This solution uses AWS Lambda and AWS Fargate for continuous replication. Benefits of this approach include:
The replication process can be easily set up to trigger based on events, such as commits made to the repository.
Setting up a serverless architecture means you don’t need to provision, maintain, or administer servers.
Note: AWS Fargate has a limitation of 10 GB for storage and is available in US East (N. Virginia) region. A similar solution that uses Amazon EC2 instances to replicate the repositories on a schedule was published in a previous blog and can be used if your repository does not meet these conditions.
Replication using Fargate
As you follow this blog post, you’ll set up an architecture that looks like this:
Any change in the AWS CodeCommit repository will trigger a Lambda function. The Lambda function will call the Fargate task that replicates the repository using a Git command line tool.
Let us assume a user wants to replicate a repository (Source) from US East (N. Virginia/us-east-1) region to a repository (Destination) in US West (Oregon/us-west-2) region. I’ll walk you through the steps for it:
Prerequisites
Create an IAM service role for Amazon EC2 that has permissions for both the source and destination repositories, the IAM CreateRole and AttachRolePolicy actions, and Amazon ECR. Here is the EC2 role policy I used:
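A sketch of what that policy can look like, written to a file and attached to the role as an inline policy; the role name is a placeholder, and in a real deployment you should scope the resources down to your own repositories and registry rather than using wildcards:
$ cat > ec2-replication-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "codecommit:*", "Resource": "*" },
    { "Effect": "Allow", "Action": ["iam:CreateRole", "iam:AttachRolePolicy"], "Resource": "*" },
    { "Effect": "Allow", "Action": "ecr:*", "Resource": "*" }
  ]
}
EOF
$ aws iam put-role-policy --role-name <YourEC2ReplicationRole> \
    --policy-name codecommit-replication-policy \
    --policy-document file://ec2-replication-policy.json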
You need a Docker environment to build this solution. You can launch an EC2 instance and install Docker, or you can use AWS Cloud9, which comes with Docker and Git preinstalled. I used an EC2 instance and installed Docker on it. Use the IAM role created in the previous step when creating the EC2 instance. I’ll refer to this environment as the “Docker environment” in the following steps.
You need to install the AWS CLI on the Docker environment. For AWS CLI installation, refer to this page.
You need to install Git, including a Git command line on the Docker environment.
Step 1: Create the Docker image
To create the Docker image, first it needs a Dockerfile. A Dockerfile is a manifest that describes the base image to use for your Docker image and what you want installed and running on it. For more information about Dockerfiles, go to the Dockerfile Reference.
1. Choose a directory in the Docker environment and perform the following steps in that directory. I used /home/ec2-user directory to perform the following steps.
2. Clone the AWS CodeCommit repository in the Docker environment. Open the terminal to the Docker environment and run the following commands to clone your source AWS CodeCommit repository (I ran the commands from /home/ec2-user directory):
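A sketch of those commands, assuming the source repository is named Source in us-east-1 and that you use the AWS CLI credential helper for Git authentication:
$ git config --global credential.helper '!aws codecommit credential-helper $@'
$ git config --global credential.UseHttpPath true
$ git clone https://git-codecommit.us-east-1.amazonaws.com/v1/repos/Source LocalRepository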
Note: Replace the repository URLs with your own source and destination repository URLs.
3. Create a file called Dockerfile (case sensitive) with the following content (I created it in /home/ec2-user directory):
# Pull the Amazon Linux latest base image
FROM amazonlinux:latest
#Install aws-cli and git command line tools
RUN yum -y install unzip aws-cli
RUN yum -y install git
WORKDIR /home/ec2-user
RUN mkdir LocalRepository
WORKDIR /home/ec2-user/LocalRepository
#Copy Cloned CodeCommit repository to Docker container
COPY ./LocalRepository /home/ec2-user/LocalRepository
#Copy shell script that does the replication
COPY ./repl_repository.bash /home/ec2-user/LocalRepository
RUN chmod ugo+rwx /home/ec2-user/LocalRepository/repl_repository.bash
WORKDIR /home/ec2-user/LocalRepository
#Call this script when Docker starts the container
ENTRYPOINT ["/home/ec2-user/LocalRepository/repl_repository.bash"]
4. Copy the following shell script into a file called repl_repository.bash in the same directory as the Dockerfile in the Docker environment (I created it in the /home/ec2-user directory).
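A minimal sketch of such a script, assuming the destination repository is named Destination in us-west-2 and that the task’s IAM role grants CodeCommit access; adjust the remote URL and branch handling to your own setup:
#!/bin/bash
# repl_repository.bash - pull the latest changes from the source clone's origin and mirror them to the destination repository

# Let Git authenticate to CodeCommit through the AWS CLI credential helper
git config --global credential.helper '!aws codecommit credential-helper $@'
git config --global credential.UseHttpPath true

cd /home/ec2-user/LocalRepository

# Bring the local clone up to date with the source repository
git pull

# Push all branches to the destination repository; prints "Everything up-to-date" when there is nothing new
git push --all https://git-codecommit.us-west-2.amazonaws.com/v1/repos/Destination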
6. Verify that the replication is working by running the repl_repository.bash script from the LocalRepository directory. Go to the LocalRepository directory and run the following command. If it is successful, you will see “Everything up-to-date” as the last line of the output, like this:
$ . ../repl_repository.bash
Everything up-to-date
Step 2: Build the Docker Image
1. Build the Docker image by running this command from the directory where you created the Dockerfile in the Docker environment in the previous step (I ran it from the /home/ec2-user directory):
$ docker build . -t ccrepl
Output: It installs various packages and sets environment variables as part of steps 1 to 3 from the Dockerfile. Steps 4 to 11 from the Dockerfile should produce an output similar to the following:
2. Run the following command to verify that the image was created successfully. It will display “Everything up-to-date” at the end if it is successful.
$ docker run ccrepl
Everything up-to-date
Step 3: Push the Docker Image to Amazon Elastic Container Registry (ECR)
Perform the following steps in the Docker Environment.
1. Run the AWS CLI configure command and set default region as your source repository region (I used us-east-1).
$ aws configure set default.region <Source Repository Region>
2. Create an Amazon ECR repository using this command to store your ccrepl image (Note the repositoryUri in the output):
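A sketch of the commands, with a placeholder account ID; depending on your AWS CLI version you may use aws ecr get-login instead of get-login-password:
$ aws ecr create-repository --repository-name ccrepl

# Authenticate Docker to ECR, then tag and push the image built earlier
$ aws ecr get-login-password --region us-east-1 | \
    docker login --username AWS --password-stdin 111122223333.dkr.ecr.us-east-1.amazonaws.com
$ docker tag ccrepl:latest 111122223333.dkr.ecr.us-east-1.amazonaws.com/ccrepl:latest
$ docker push 111122223333.dkr.ecr.us-east-1.amazonaws.com/ccrepl:latest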
2. Create a role called AccessRoleForCCfromFG using the following command in the DockerEnvironment:
$ aws iam create-role --role-name AccessRoleForCCfromFG --assume-role-policy-document file://trustpolicyforecs.json
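The command references trustpolicyforecs.json, which must exist before you run it; a sketch of its likely contents, allowing Amazon ECS tasks to assume the role:
$ cat > trustpolicyforecs.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF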
3. Assign CodeCommit service full access to the above role using the following command in the DockerEnvironment:
$ aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AWSCodeCommitFullAccess --role-name AccessRoleForCCfromFG
4. In the Amazon ECS Console, choose Repositories and select the ccrepl repository that was created in the previous step. Copy the Repository URI.
5. In the Amazon ECS Console, choose Task Definitions and click Create New Task Definition.
6. Select launch type compatibility as FARGATE and click Next Step.
7. In the create task definition screen, do the following:
In Task Definition Name, type ccrepl
In Task Role, choose AccessRoleForCCfromFG
In Task Memory, choose 2GB
In Task CPU, choose 1 vCPU
Click Add Container under Container Definitions in the same screen. In the Add Container screen, do the following:
Enter Container name as ccreplcont
Enter Image URL copied from step 4
Enter Memory Limits as 128 and click Add.
Note: Select TaskExecutionRole as “ecsTaskExecutionRole” if it already exists. If not, select create new role and it will create “ecsTaskExecutionRole” for you.
8. Click the Create button in the task definition screen to create the task. It will successfully create the task, execution role and AWS CloudWatch Log groups.
9. In the Amazon ECS Console, click Clusters and create cluster. Select template as “Networking only, Powered by AWS Fargate” and click next step.
10. Enter cluster name as ccreplcluster and click create.
Step 5: Create the Lambda Function
In this section, I used the Amazon Elastic Container Service (ECS) RunTask API from Lambda to invoke the Fargate task.
1. In the IAM Console, create a new role called ECSLambdaRole with permissions for AWS CodeCommit and Amazon ECS, as well as the pass-role privileges needed to run the ECS task. Your statement should look similar to the following (replace <your account id>):
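A sketch of such a policy document, with placeholder account ID and role names; tighten the resources and CodeCommit actions to what your function actually needs:
$ cat > ecslambdarole-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "ecs:RunTask", "Resource": "*" },
    { "Effect": "Allow", "Action": "iam:PassRole",
      "Resource": [
        "arn:aws:iam::<your account id>:role/AccessRoleForCCfromFG",
        "arn:aws:iam::<your account id>:role/ecsTaskExecutionRole"
      ] },
    { "Effect": "Allow", "Action": ["codecommit:GetRepository", "codecommit:GitPull"], "Resource": "*" },
    { "Effect": "Allow", "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"], "Resource": "*" }
  ]
}
EOF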
2. In the AWS Management Console, select the VPC service and choose Subnets in the left navigation pane. Note down the subnet IDs that you want to run the Fargate task in.
3. Create a new Lambda Node.js function called FargateTaskExecutionFunc and assign the role ECSLambdaRole with the following content:
Note: Replace the subnet values with the subnet IDs you identified in Step 2 of this section as the subnets you want to run the Fargate task in.
1. In the Lambda Console, click FargateTaskExecutionFunc under functions.
2. Under Add triggers in the Designer, select CodeCommit
3. In the Configure triggers screen, do the following:
Enter Repository name as Source (your source repository name)
Enter trigger name as LambdaTrigger
Leave the Events as “All repository events”
Leave the Branch names as “All branches”
Click Add button
Click Save button to save the changes
Step 6: Verification
To test the application, make a commit and push the changes to the source repository in AWS CodeCommit. That should automatically trigger the Lambda function and replicate the changes in the destination repository. You can verify this by checking CloudWatch Logs for Lambda and ECS, or simply going to the destination repository and verifying the change appears.
Conclusion
Congratulations! You have successfully configured repository replication of an AWS CodeCommit repository using AWS Lambda and AWS Fargate. You can use this technique in a deployment pipeline. You can also tweak the trigger configuration in AWS CodeCommit to call the Lambda function in response to any supported trigger event in AWS CodeCommit.
This blog was contributed by Rucha Nene, Sr. Product Manager for Amazon EBS
AWS customers use tags to track ownership of resources, implement compliance protocols, control access to resources via IAM policies, and drive their cost accounting processes. Last year, we made tagging for Amazon EC2 instances and Amazon EBS volumes easier by adding the ability to tag these resources upon creation. We are now extending this capability to EBS snapshots.
Earlier, you could tag your EBS snapshots only after the resource had been created, and you sometimes ended up with EBS snapshots in an untagged state if tagging failed. You also could not control the actions that users and groups could take on specific snapshots, or enforce tighter security policies.
To address these issues, we are making tagging for EBS snapshots more flexible and giving customers more control over EBS snapshots by introducing two new capabilities:
Tag on creation for EBS snapshots – You can now specify tags for EBS snapshots as part of the API call that creates the resource or via the Amazon EC2 Console when creating an EBS snapshot.
Resource-level permission and enforced tag usage – The CreateSnapshot, DeleteSnapshot, and ModifySnapshotAttribute API actions now support IAM resource-level permissions. You can now write IAM policies that mandate the use of specific tags when taking actions on EBS snapshots.
Tag on creation
You can now specify tags for EBS snapshots as part of the API call that creates the resources. The resource creation and the tagging are performed atomically; both must succeed in order for the operation CreateSnapshot to succeed. You no longer need to build tagging scripts that run after EBS snapshots have been created.
Here’s how you specify tags when you create an EBS snapshot, using the console:
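The equivalent AWS CLI call looks like this (placeholder volume ID and tag values):
$ aws ec2 create-snapshot \
    --volume-id vol-0123456789abcdef0 \
    --description "Nightly backup" \
    --tag-specifications 'ResourceType=snapshot,Tags=[{Key=costcenter,Value=115},{Key=User,Value=alice}]'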
CreateSnapshot, DeleteSnapshot, and ModifySnapshotAttribute now support resource-level permissions, which allow you to exercise more control over EBS snapshots. You can write IAM policies that give you precise control over access to resources and let you specify which users are able to create snapshots for a given set of volumes. You can also enforce the use of specific tags to help track resources and achieve more accurate cost allocation reporting.
For example, here’s a statement that requires that the costcenter tag (with a value of “115”) be present on the volume from which snapshots are being created. It requires that this tag be applied to all newly created snapshots. In addition, it requires that the created snapshots are tagged with User:username for the customer.
To implement stronger compliance and security policies, you could also restrict access to DeleteSnapshot, if the resource is not tagged with the user’s name. Here’s a statement that allows the deletion of a snapshot only if the snapshot is tagged with User:username for the customer.
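A sketch of that statement, written to a policy file with a placeholder Region and the aws:username policy variable; note that snapshot ARNs do not include an account ID:
$ cat > delete-own-snapshots-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:DeleteSnapshot",
      "Resource": "arn:aws:ec2:us-east-1::snapshot/*",
      "Condition": {
        "StringEquals": { "ec2:ResourceTag/User": "${aws:username}" }
      }
    }
  ]
}
EOF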
Apache Cassandra is a commonly used, high performance NoSQL database. AWS customers that currently maintain Cassandra on-premises may want to take advantage of the scalability, reliability, security, and economic benefits of running Cassandra on Amazon EC2.
Amazon EC2 and Amazon Elastic Block Store (Amazon EBS) provide secure, resizable compute capacity and storage in the AWS Cloud. When combined, you can deploy Cassandra, allowing you to scale capacity according to your requirements. Given the number of possible deployment topologies, it’s not always trivial to select the most appropriate strategy suitable for your use case.
In this post, we outline three Cassandra deployment options, as well as provide guidance about determining the best practices for your use case in the following areas:
Cassandra resource overview
Deployment considerations
Storage options
Networking
High availability and resiliency
Maintenance
Security
Before we jump into best practices for running Cassandra on AWS, we should mention that we have many customers who decided to use DynamoDB instead of managing their own Cassandra cluster. DynamoDB is fully managed, serverless, and provides multi-master cross-region replication, encryption at rest, and managed backup and restore. Integration with AWS Identity and Access Management (IAM) enables DynamoDB customers to implement fine-grained access control for their data security needs.
Several customers who have been using large Cassandra clusters for many years have moved to DynamoDB to eliminate the complications of administering Cassandra clusters and maintaining high availability and durability themselves. Gumgum.com is one customer who migrated to DynamoDB and observed significant savings. For more information, see Moving to Amazon DynamoDB from Hosted Cassandra: A Leap Towards 60% Cost Saving per Year.
AWS provides options, so you’re covered whether you want to run your own NoSQL Cassandra database, or move to a fully managed, serverless DynamoDB database.
Cassandra resource overview
Here’s a short introduction to standard Cassandra resources and how they are implemented with AWS infrastructure. If you’re already familiar with Cassandra or AWS deployments, this can serve as a refresher.
Cluster
Cassandra: A single Cassandra deployment. This typically consists of multiple physical locations, keyspaces, and physical servers.
AWS: A logical deployment construct in AWS that maps to an AWS CloudFormation StackSet, which consists of one or many CloudFormation stacks to deploy Cassandra.

Datacenter
Cassandra: A group of nodes configured as a single replication group.
AWS: A logical deployment construct in AWS. A datacenter is deployed with a single CloudFormation stack consisting of Amazon EC2 instances, networking, storage, and security resources.

Rack
Cassandra: A collection of servers. A datacenter consists of at least one rack. Cassandra tries to place the replicas on different racks.
AWS: A single Availability Zone.

Server/node
Cassandra: A physical or virtual machine running Cassandra software.
AWS: An EC2 instance.

Token
Cassandra: Conceptually, the data managed by a cluster is represented as a ring. The ring is then divided into ranges equal to the number of nodes, with each node responsible for one or more ranges of the data. Each node is assigned a token, which is essentially a random number from the range. The token value determines the node’s position in the ring and its range of data.
AWS: Managed within Cassandra.

Virtual node (vnode)
Cassandra: Responsible for storing a range of data. Each vnode receives one token in the ring. By default, each node is assigned 256 tokens (vnodes), which are uniformly distributed across all servers in the Cassandra datacenter.
AWS: Managed within Cassandra.

Replication factor
Cassandra: The total number of replicas across the cluster.
AWS: Managed within Cassandra.
Deployment considerations
One of the many benefits of deploying Cassandra on Amazon EC2 is that you can automate many deployment tasks. In addition, AWS includes services, such as CloudFormation, that allow you to describe and provision all your infrastructure resources in your cloud environment.
We recommend orchestrating each Cassandra ring with one CloudFormation template. If you are deploying in multiple AWS Regions, you can use a CloudFormation StackSet to manage those stacks. All the maintenance actions (scaling, upgrading, and backing up) should be scripted with an AWS SDK. These may live as standalone AWS Lambda functions that can be invoked on demand during maintenance.
You can get started by following the Cassandra Quick Start deployment guide. Keep in mind that this guide does not address the requirements to operate a production deployment and should be used only for learning more about Cassandra.
Deployment patterns
In this section, we discuss various deployment options available for Cassandra in Amazon EC2. A successful deployment starts with thoughtful consideration of these options. Consider the amount of data, network environment, throughput, and availability.
Single AWS Region, 3 Availability Zones
Active-active, multi-Region
Active-standby, multi-Region
Single region, 3 Availability Zones
In this pattern, you deploy the Cassandra cluster in one AWS Region and three Availability Zones. There is only one ring in the cluster. By using EC2 instances in three zones, you ensure that the replicas are distributed uniformly in all zones.
To ensure the even distribution of data across all Availability Zones, we recommend that you distribute the EC2 instances evenly in all three Availability Zones. The number of EC2 instances in the cluster is a multiple of three (the replication factor).
This pattern is suitable in situations where the application is deployed in one Region, or where deployments must be constrained to a single Region because of data privacy or other legal requirements.
Pros
● Highly available, can sustain failure of one Availability Zone.
● Simple deployment
Cons
● Does not protect in a situation when many of the resources in a Region are experiencing intermittent failure.
Active-active, multi-Region
In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.
We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.
This pattern is most suitable when the applications using the Cassandra cluster are deployed in more than one Region.
Pros
● No data loss during failover.
● Highly available; remains available when many of the resources in a Region are experiencing intermittent failures.
● Read/write traffic can be localized to the closest Region for the user for lower latency and higher performance.
Cons
● High operational overhead
● The second Region effectively doubles the cost
Active-standby, multi-region
In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.
However, the second Region does not receive traffic from the applications. It only functions as a secondary location for disaster recovery reasons. If the primary Region is not available, the second Region receives traffic.
We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.
This pattern is most suitable when the applications using the Cassandra cluster require low recovery point objective (RPO) and recovery time objective (RTO).
Pros
● No data loss during failover.
● Highly available, can sustain failure or partitioning of one whole Region.
Cons
● High operational overhead.
● High latency for writes for eventual consistency.
● The second Region effectively doubles the cost.
Storage options
On-premises Cassandra deployments use local disks to store data. There are two storage options for EC2 instances: instance store and Amazon EBS.
Your choice of storage is closely related to the type of workload supported by the Cassandra cluster. Instance store works best for most general purpose Cassandra deployments. However, in certain read-heavy clusters, Amazon EBS is a better choice.
The choice of instance type is generally driven by the type of storage:
If ephemeral storage is required for your application, a storage-optimized (I3) instance is the best option.
If your workload requires Amazon EBS, it is best to go with compute-optimized (C5) instances.
Burstable instance types (T2) don’t offer good performance for Cassandra deployments.
Instance store
Ephemeral storage is local to the EC2 instance. It may provide high input/output operations per second (IOPS) based on the instance type. An SSD-based instance store can support up to 3.3M IOPS on I3 instances. This high performance makes it an ideal choice for transactional or write-intensive applications such as Cassandra.
In general, instance storage is recommended for transactional, large, and medium-size Cassandra clusters. For a large cluster, read/write traffic is distributed across a higher number of nodes, so the loss of one node has less of an impact. However, for smaller clusters, a quick recovery for the failed node is important.
As an example, for a cluster with 100 nodes, the loss of 1 node is a 3.33% loss (with a replication factor of 3). Similarly, for a cluster with 10 nodes, the loss of 1 node is a 33% loss of capacity (with a replication factor of 3).
IOPS (translates to higher query performance)
Ephemeral storage: Up to 3.3M on I3.
Amazon EBS: 80K per instance; 10K per gp2 volume; 32K per io1 volume.
Comments: This results in a higher query performance on each host. However, Cassandra implicitly scales well in terms of horizontal scale. In general, we recommend scaling horizontally first, then scaling vertically to mitigate specific issues. Note: 3.3M IOPS is observed with 100% random reads with a 4-KB block size on Amazon Linux.

AWS instance types
Ephemeral storage: I3.
Amazon EBS: Compute optimized, C5.
Comments: Being able to choose between different instance types is an advantage in terms of CPU, memory, etc., for horizontal and vertical scaling.

Backup/recovery
Ephemeral storage: Custom.
Amazon EBS: Basic building blocks are available from AWS.
Comments: Amazon EBS offers a distinct advantage here. It is a small engineering effort to establish a backup/restore strategy. (a) In case of an instance failure, the EBS volumes from the failing instance are attached to a new instance. (b) In case of an EBS volume failure, the data is restored by creating a new EBS volume from the last snapshot.
Amazon EBS
EBS volumes offer higher resiliency, and IOPs can be configured based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. EBS volumes can support up to 32K IOPS per volume and up to 80K IOPS per instance in RAID configuration. They have an annualized failure rate (AFR) of 0.1–0.2%, which makes EBS volumes 20 times more reliable than typical commodity disk drives.
The primary advantage of using Amazon EBS in a Cassandra deployment is that it reduces data-transfer traffic significantly when a node fails or must be replaced. The replacement node joins the cluster much faster. However, Amazon EBS could be more expensive, depending on your data storage needs.
Cassandra has built-in fault tolerance by replicating data to partitions across a configurable number of nodes. It can not only withstand node failures but if a node fails, it can also recover by copying data from other replicas into a new node. Depending on your application, this could mean copying tens of gigabytes of data. This adds additional delay to the recovery process, increases network traffic, and could possibly impact the performance of the Cassandra cluster during recovery.
Data stored on Amazon EBS is persisted in case of an instance failure or termination. The node’s data stored on an EBS volume remains intact and the EBS volume can be mounted to a new EC2 instance. Most of the replicated data for the replacement node is already available in the EBS volume and won’t need to be copied over the network from another node. Only the changes made after the original node failed need to be transferred across the network. That makes this process much faster.
EBS volumes are snapshotted periodically. So, if a volume fails, a new volume can be created from the last known good snapshot and be attached to a new instance. This is faster than creating a new volume and copying all the data to it.
Most Cassandra deployments use a replication factor of three. However, Amazon EBS does its own replication under the covers for fault tolerance. In practice, EBS volumes are about 20 times more reliable than typical disk drives. So, it is possible to go with a replication factor of two. This not only saves cost, but also enables deployments in a region that has two Availability Zones.
EBS volumes are recommended in case of read-heavy, small clusters (fewer nodes) that require storage of a large amount of data. Keep in mind that the Amazon EBS provisioned IOPS could get expensive. General purpose EBS volumes work best when sized for required performance.
Networking
If your cluster is expected to receive high read/write traffic, select an instance type that offers 10-Gb/s performance. As an example, i3.8xlarge and c5.9xlarge both offer 10-Gb/s networking performance. A smaller instance type in the same family leads to a relatively lower networking throughput.
Cassandra generates a universally unique identifier (UUID) for each node based on the IP address of the instance. This UUID is used for distributing vnodes on the ring.
In an AWS deployment, an IP address is assigned automatically when an EC2 instance is created, so a replacement instance comes up with a new IP address. With the new IP address, the data distribution changes and the whole ring has to be rebalanced. This is not desirable.
To preserve the assigned IP address, use a secondary elastic network interface with a fixed IP address. Before swapping an EC2 instance with a new one, detach the secondary network interface from the old instance and attach it to the new one. This way, the UUID remains same and there is no change in the way that data is distributed in the cluster.
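A sketch of the swap with the AWS CLI (all IDs are placeholders); look up the attachment ID of the secondary interface first, detach it from the old instance, and attach it to the replacement at device index 1:
$ aws ec2 describe-network-interfaces --network-interface-ids eni-0123456789abcdef0 \
    --query 'NetworkInterfaces[0].Attachment.AttachmentId'
$ aws ec2 detach-network-interface --attachment-id eni-attach-0123456789abcdef0
$ aws ec2 attach-network-interface \
    --network-interface-id eni-0123456789abcdef0 \
    --instance-id i-0fedcba9876543210 \
    --device-index 1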
If you are deploying in more than one region, you can connect the two VPCs in two regions using cross-region VPC peering.
High availability and resiliency
Cassandra is designed to be fault-tolerant and highly available during multiple node failures. In the patterns described earlier in this post, you deploy Cassandra to three Availability Zones with a replication factor of three. Even though it limits the AWS Region choices to the Regions with three or more Availability Zones, it offers protection for the cases of one-zone failure and network partitioning within a single Region. The multi-Region deployments described earlier in this post protect when many of the resources in a Region are experiencing intermittent failure.
Resiliency is ensured through infrastructure automation. The deployment patterns all require a quick replacement of the failing nodes. In the case of a regionwide failure, when you deploy with the multi-Region option, traffic can be directed to the other active Region while the infrastructure is recovering in the failing Region. In the case of unforeseen data corruption, the standby cluster can be restored with point-in-time backups stored in Amazon S3.
Maintenance
In this section, we look at ways to ensure that your Cassandra cluster is healthy:
Scaling
Upgrades
Backup and restore
Scaling
Cassandra is horizontally scaled by adding more instances to the ring. We recommend doubling the number of nodes in a cluster to scale up in one scale operation. This leaves the data homogeneously distributed across Availability Zones. Similarly, when scaling down, it’s best to halve the number of instances to keep the data homogeneously distributed.
Cassandra is vertically scaled by increasing the compute power of each node. Larger instance types have proportionally bigger memory. Use deployment automation to swap instances for bigger instances without downtime or data loss.
Upgrades
All three types of upgrades (Cassandra, operating system patching, and instance type changes) follow the same rolling upgrade pattern.
In this process, you start with a new EC2 instance and install software and patches on it. Thereafter, remove one node from the ring. For more information, see Cassandra cluster Rolling upgrade. Then, you detach the secondary network interface from one of the EC2 instances in the ring and attach it to the new EC2 instance. Restart the Cassandra service and wait for it to sync. Repeat this process for all nodes in the cluster.
Backup and restore
Your backup and restore strategy is dependent on the type of storage used in the deployment. Cassandra supports snapshots and incremental backups. When using instance store, a file-based backup tool works best. Customers use rsync or other third-party products to copy data backups from the instance to long-term storage. For more information, see Backing up and restoring data in the DataStax documentation. This process has to be repeated for all instances in the cluster for a complete backup. These backup files are copied back to new instances to restore. We recommend using S3 to durably store backup files for long-term storage.
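A sketch of one such file-based approach on a single node, assuming a keyspace named mykeyspace, the default Cassandra data directory, and an S3 bucket you own for long-term storage:
# Flush memtables and take a point-in-time snapshot on this node
$ nodetool snapshot -t nightly mykeyspace

# Copy only the snapshot directories to Amazon S3 (bucket name is a placeholder)
$ aws s3 sync /var/lib/cassandra/data/mykeyspace/ \
    s3://example-cassandra-backups/$(hostname)/$(date +%F)/ \
    --exclude "*" --include "*/snapshots/nightly/*"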
For Amazon EBS based deployments, you can enable automated snapshots of EBS volumes to back up volumes. New EBS volumes can be easily created from these snapshots for restoration.
Security
We recommend that you think about security in all aspects of deployment. The first step is to ensure that the data is encrypted at rest and in transit. The second step is to restrict access to unauthorized users. For more information about security, see the Cassandra documentation.
Encryption at rest
Encryption at rest can be achieved by using EBS volumes with encryption enabled. Amazon EBS uses AWS KMS for encryption. For more information, see Amazon EBS Encryption.
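As a simple illustration, the following sketch creates an encrypted gp2 volume with the default EBS key; the Availability Zone, size, and key alias are placeholder choices, not values from this deployment.
$ aws ec2 create-volume \
    --availability-zone us-east-1a \
    --size 100 \
    --volume-type gp2 \
    --encrypted \
    --kms-key-id alias/aws/ebs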
Instance store–based deployments require using an encrypted file system or an AWS partner solution. If you are using DataStax Enterprise, it supports transparent data encryption.
Encryption in transit
Cassandra supports Transport Layer Security (TLS) for client and internode communication.
Authentication
The security mechanism is pluggable, which means that you can easily swap out one authentication method for another. You can also provide your own method of authenticating to Cassandra, such as a Kerberos ticket, or store passwords in a different location, such as an LDAP directory.
Authorization
The authorizer that’s plugged in by default is org.apache.cassandra.auth.AllowAllAuthorizer. Cassandra also provides a role-based access control (RBAC) capability, which allows you to create roles and assign permissions to these roles.
Conclusion
In this post, we discussed several patterns for running Cassandra in the AWS Cloud. This post describes how you can manage Cassandra databases running on Amazon EC2. AWS also provides managed offerings for a number of databases. To learn more, see Purpose-built databases for all your application needs.
If you have questions or suggestions, please comment below.
Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable big data, machine learning, artificial intelligence, and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as advanced edge computing and machine learning at the edge. In his spare time, he enjoys spending time with his family.
Provanshu Dey is a Senior IoT Consultant with AWS Professional Services. He works on highly scalable and reliable IoT, data and machine learning solutions with our customers. In his spare time, he enjoys spending time with his family and tinkering with electronics & gadgets.
Most malware tries to compromise your systems by using a known vulnerability that the operating system maker has already patched. As best practices to help prevent malware from affecting your systems, you should apply all operating system patches and actively monitor your systems for missing patches.
In this post, I show you how to:
Launch an Amazon EC2 instance for use with Systems Manager.
Configure Systems Manager to patch your Amazon EC2 Linux instances.
In two previous blog posts (Part 1 and Part 2), I showed how to use the AWS Management Console to perform the necessary steps to patch, inspect, and protect Microsoft Windows workloads. You can implement those same processes for your Linux instances running in AWS by changing the instance tags and types shown in the previous blog posts.
Because most Linux system administrators are more familiar with using a command line, I show how to patch Linux workloads by using the AWS CLI in this blog post. The steps to use the Amazon EBS Snapshot Scheduler and Amazon Inspector are identical for both Microsoft Windows and Linux.
What you should know first
To follow along with the solution in this post, you need one or more Amazon EC2 instances. You may use existing instances or create new instances. For this post, I assume this is an Amazon EC2 instance running Amazon Linux that was launched from an Amazon Machine Image (AMI).
Systems Manager is a collection of capabilities that helps you automate management tasks for AWS-hosted instances on Amazon EC2 and your on-premises servers. In this post, I use Systems Manager for two purposes: to run remote commands and apply operating system patches. To learn about the full capabilities of Systems Manager, see What Is AWS Systems Manager?
If you are not familiar with how to launch an Amazon EC2 instance, see Launching an Instance. I also assume you launched or will launch your instance in a private subnet. You must make sure that the Amazon EC2 instance can connect to the internet using a network address translation (NAT) instance or NAT gateway to communicate with Systems Manager. The following diagram shows how you should structure your VPC.
Later in this post, you will assign tasks to a maintenance window to patch your instances with Systems Manager. To do this, the IAM user you are using for this post must have the iam:PassRole permission. This permission allows the IAM user assigning tasks to pass its own IAM permissions to the AWS service. In this example, when you assign a task to a maintenance window, IAM passes your credentials to Systems Manager. You also should authorize your IAM user to use Amazon EC2 and Systems Manager. As mentioned before, you will be using the AWS CLI for most of the steps in this blog post. Our documentation shows you how to get started with the AWS CLI. Make sure you have the AWS CLI installed and configured with an AWS access key and secret access key that belong to an IAM user that has the following AWS managed policies attached: AmazonEC2FullAccess and AmazonSSMFullAccess.
Step 1: Launch an Amazon EC2 Linux instance
In this section, I show you how to launch an Amazon EC2 instance so that you can use Systems Manager with the instance. This step requires you to do three things:
Create an IAM role for Systems Manager before launching your Amazon EC2 instance.
Launch your Amazon EC2 instance with Amazon EBS and the IAM role for Systems Manager.
Add tags to the instances so that you can add your instances to a Systems Manager maintenance window based on tags.
A. Create an IAM role for Systems Manager
Before launching an Amazon EC2 instance, I recommend that you first create an IAM role for Systems Manager, which you will use to update the Amazon EC2 instance. AWS already provides a preconfigured managed policy that you can use for the new role: AmazonEC2RoleforSSM.
Create a JSON file named trustpolicy-ec2ssm.json that contains the following trust policy. This policy describes which principal (an entity that can take action on an AWS resource) is allowed to assume the role we are going to create. In this example, the principal is the Amazon EC2 service.
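A minimal trust policy that lets the Amazon EC2 service principal assume the role might look like the following sketch:
$ cat > trustpolicy-ec2ssm.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF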
Use the following command to create a role named EC2SSM that has the AWS managed policy AmazonEC2RoleforSSM attached to it. This generates JSON-based output that describes the role and its parameters, if the command is successful.
$ aws iam create-role --role-name EC2SSM --assume-role-policy-document file://trustpolicy-ec2ssm.json
Use the following command to attach the AWS managed IAM policy (AmazonEC2RoleforSSM) to your newly created role.
$ aws iam attach-role-policy --role-name EC2SSM --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM
Use the following commands to create the IAM instance profile and add the role to the instance profile. The instance profile is needed to attach the role we created earlier to your Amazon EC2 instance.
$ aws iam create-instance-profile --instance-profile-name EC2SSM-IP
$ aws iam add-role-to-instance-profile --instance-profile-name EC2SSM-IP --role-name EC2SSM
B. Launch your Amazon EC2 instance
To follow along, you need an Amazon EC2 instance that is running Amazon Linux. You can use any existing instance you may have or create a new instance.
When launching a new Amazon EC2 instance, be sure that you attach the instance profile you created earlier (EC2SSM-IP) and launch the instance into the subnet described earlier in this post.
Use the following command to launch a new Amazon EC2 instance using an Amazon Linux AMI available in the US East (N. Virginia) Region (also known as us-east-1). Replace YourKeyPair and YourSubnetId with your information. For more information about creating a key pair, see the create-key-pair documentation. Write down the InstanceId that is in the output because you will need it later in this post.
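A hedged sketch of such a launch command follows; the AMI ID and instance type are placeholders, and YourKeyPair and YourSubnetId are the values you replace as described above.
$ aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type t2.micro \
    --key-name YourKeyPair \
    --subnet-id YourSubnetId \
    --iam-instance-profile Name=EC2SSM-IP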
If you are using an existing Amazon EC2 instance, you can use the following command to attach the instance profile you created earlier to your instance.
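For example, with a placeholder instance ID:
$ aws ec2 associate-iam-instance-profile \
    --instance-id i-0123456789abcdef0 \
    --iam-instance-profile Name=EC2SSM-IP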
The final step of configuring your Amazon EC2 instances is to add tags. You will use these tags to configure Systems Manager in Step 2 of this post. For this example, I add a tag named Patch Group and set the value to Linux Servers. I could have other groups of Amazon EC2 instances that I treat differently by having the same tag name but a different tag value. For example, I might have a collection of other servers with the tag name Patch Group with a value of Web Servers.
Use the following command to add the Patch Group tag to your Amazon EC2 instance.
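A sketch of the tagging command, with a placeholder instance ID:
$ aws ec2 create-tags \
    --resources i-0123456789abcdef0 \
    --tags "Key=Patch Group,Value=Linux Servers"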
Note: You must wait a few minutes until the Amazon EC2 instance is available before you can proceed to the next section. To make sure your Amazon EC2 instance is online and ready, you can use the following AWS CLI command:
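For example, the following checks the EC2 status checks for a placeholder instance ID:
$ aws ec2 describe-instance-status --instance-ids i-0123456789abcdef0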
At this point, you now have at least one Amazon EC2 instance you can use to configure Systems Manager.
Step 2: Configure Systems Manager
In this section, I show you how to configure and use Systems Manager to apply operating system patches to your Amazon EC2 instances, and how to manage patch compliance.
To start, I provide some background information about Systems Manager. Then, I cover how to:
Create the Systems Manager IAM role so that Systems Manager is able to perform patch operations.
Create a Systems Manager patch baseline and associate it with your instance to define which patches Systems Manager should apply.
Define a maintenance window to make sure Systems Manager patches your instance when you tell it to.
Monitor patch compliance to verify the patch state of your instances.
You must meet two prerequisites to use Systems Manager to apply operating system patches. First, you must attach the IAM role you created in the previous section, EC2SSM, to your Amazon EC2 instance. Second, you must install the Systems Manager agent on your Amazon EC2 instance. If you have used a recent Amazon Linux AMI, Amazon has already installed the Systems Manager agent on your Amazon EC2 instance. You can confirm this by logging in to an Amazon EC2 instance and checking the Systems Manager agent log files that are located at /var/log/amazon/ssm/.
For a maintenance window to be able to run any tasks, you must create a new role for Systems Manager. This role is a different kind of role than the one you created earlier: this role will be used by Systems Manager instead of Amazon EC2. Earlier, you created the role, EC2SSM, with the policy, AmazonEC2RoleforSSM, which allowed the Systems Manager agent on your instance to communicate with Systems Manager. In this section, you need a new role with the policy, AmazonSSMMaintenanceWindowRole, so that the Systems Manager service can execute commands on your instance.
To create the new IAM role for Systems Manager:
Create a JSON file named trustpolicy-maintenancewindowrole.json that contains the following trust policy. This policy describes which principal is allowed to assume the role you are going to create. This trust policy allows not only Amazon EC2 to assume this role, but also Systems Manager.
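A sketch of a trust policy that allows both Amazon EC2 and Systems Manager to assume the role might look like this:
$ cat > trustpolicy-maintenancewindowrole.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": ["ec2.amazonaws.com", "ssm.amazonaws.com"]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF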
Use the following command to create a role named MaintenanceWindowRole that has the AWS managed policy, AmazonSSMMaintenanceWindowRole, attached to it. This command generates JSON-based output that describes the role and its parameters, if the command is successful.
$ aws iam create-role --role-name MaintenanceWindowRole --assume-role-policy-document file://trustpolicy-maintenancewindowrole.json
Use the following command to attach the AWS managed IAM policy (AmazonSSMMaintenanceWindowRole) to your newly created role.
$ aws iam attach-role-policy --role-name MaintenanceWindowRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonSSMMaintenanceWindowRole
B. Create a Systems Manager patch baseline and associate it with your instance
Next, you will create a Systems Manager patch baseline and associate it with your Amazon EC2 instance. A patch baseline defines which patches Systems Manager should apply to your instance. Before you can associate the patch baseline with your instance, though, you must determine if Systems Manager recognizes your Amazon EC2 instance. Use the following command to list all instances managed by Systems Manager. The --filters option ensures you look only for your newly created Amazon EC2 instance.
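A sketch of that check, with a placeholder instance ID:
$ aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=i-0123456789abcdef0"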
If your instance is missing from the list, verify that:
Your instance is running.
You attached the Systems Manager IAM role, EC2SSM.
You deployed a NAT gateway in your public subnet to ensure your VPC reflects the diagram shown earlier in this post so that the Systems Manager agent can connect to the Systems Manager internet endpoint.
Now that you have checked that Systems Manager can manage your Amazon EC2 instance, it is time to create a patch baseline. With a patch baseline, you define which patches are approved to be installed on all Amazon EC2 instances associated with the patch baseline. The Patch Group resource tag you defined earlier will determine to which patch group an instance belongs. If you do not specifically define a patch baseline, the default AWS-managed patch baseline is used.
To create a patch baseline:
Use the following command to create a patch baseline named AmazonLinuxServers. With approval rules, you can determine the approved patches that will be included in your patch baseline. In this example, you add all Critical severity patches to the patch baseline as soon as they are released, by setting the Auto approval delay to 0 days. By setting the Auto approval delay to 2 days, you add to this patch baseline the Important, Medium, and Low severity patches two days after they are released.
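A hedged sketch of a baseline with those two approval rules follows; the description and the severity filter values are assumptions.
$ aws ssm create-patch-baseline \
    --name "AmazonLinuxServers" \
    --description "Patch baseline for the Linux Servers patch group" \
    --operating-system AMAZON_LINUX \
    --approval-rules '{"PatchRules":[{"PatchFilterGroup":{"PatchFilters":[{"Key":"SEVERITY","Values":["Critical"]}]},"ApproveAfterDays":0},{"PatchFilterGroup":{"PatchFilters":[{"Key":"SEVERITY","Values":["Important","Medium","Low"]}]},"ApproveAfterDays":2}]}'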
Use the following command to register the patch baseline you created with your instance. To do so, you use the Patch Group tag that you added to your Amazon EC2 instance.
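A sketch of the registration, using a placeholder baseline ID returned by the previous command:
$ aws ssm register-patch-baseline-for-patch-group \
    --baseline-id pb-0123456789abcdef0 \
    --patch-group "Linux Servers"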
Now that you have successfully set up a role, created a patch baseline, and registered your Amazon EC2 instance with your patch baseline, you will define a maintenance window so that you can control when your Amazon EC2 instances will receive patches. By creating multiple maintenance windows and assigning them to different patch groups, you can make sure your Amazon EC2 instances do not all reboot at the same time.
To define a maintenance window:
Use the following command to define a maintenance window. In this example command, the maintenance window will start every Saturday at 10:00 P.M. UTC. It will have a duration of 4 hours and will not start any new tasks 1 hour before the end of the maintenance window.
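A sketch of such a window; the name is a placeholder, and the cron expression encodes Saturday at 10:00 P.M. UTC.
$ aws ssm create-maintenance-window \
    --name "SaturdayNightLinuxPatching" \
    --schedule "cron(0 22 ? * SAT *)" \
    --duration 4 \
    --cutoff 1 \
    --allow-unassociated-targets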
After defining the maintenance window, you must register the Amazon EC2 instance with the maintenance window so that Systems Manager knows which Amazon EC2 instance it should patch in this maintenance window. You can register the instance by using the same Patch Group tag you used to associate the Amazon EC2 instance with the AWS-provided patch baseline, as shown in the following command.
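A sketch of the registration, with a placeholder window ID:
$ aws ssm register-target-with-maintenance-window \
    --window-id mw-0123456789abcdef0 \
    --resource-type INSTANCE \
    --targets "Key=tag:Patch Group,Values=Linux Servers"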
Assign a task to the maintenance window that will install the operating system patches on your Amazon EC2 instance. The command, sketched after this list, includes the following options.
name is the name of your task and is optional. I named mine Patching.
task-arn is the name of the task document you want to run.
max-concurrency allows you to specify how many of your Amazon EC2 instances Systems Manager should patch at the same time. max-errors determines when Systems Manager should abort the task. For patching, this number should not be too low, because you do not want your entire patch task to stop on all instances if one instance fails. You can set this, for example, to 20%.
service-role-arn is the Amazon Resource Name (ARN) of the AmazonSSMMaintenanceWindowRole role you created earlier in this blog post.
task-invocation-parameters defines the parameters that are specific to the AWS-RunPatchBaseline task document and tells Systems Manager that you want to install patches with a timeout of 600 seconds (10 minutes).
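A hedged sketch of the full command follows; the window ID, window target ID, account ID in the role ARN, and the max-concurrency value are placeholders.
$ aws ssm register-task-with-maintenance-window \
    --window-id mw-0123456789abcdef0 \
    --targets "Key=WindowTargetIds,Values=<window-target-id>" \
    --task-arn "AWS-RunPatchBaseline" \
    --service-role-arn "arn:aws:iam::123456789012:role/MaintenanceWindowRole" \
    --task-type RUN_COMMAND \
    --name "Patching" \
    --max-concurrency 1 \
    --max-errors "20%" \
    --task-invocation-parameters '{"RunCommand":{"Parameters":{"Operation":["Install"]},"TimeoutSeconds":600}}'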
Now, you must wait for the maintenance window to run at least once according to the schedule you defined earlier. If your maintenance window has expired, you can check the status of any maintenance tasks Systems Manager has performed by using the following command.
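For example, with a placeholder window ID:
$ aws ssm describe-maintenance-window-executions \
    --window-id mw-0123456789abcdef0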
You also can see the overall patch compliance of all Amazon EC2 instances using the following command in the AWS CLI.
$ aws ssm list-compliance-summaries
This command returns JSON output that shows, for each compliance category, the number of instances that are compliant and the number that are not.
You also can see overall patch compliance by choosing Compliance under Insights in the navigation pane of the Systems Manager console. You will see a visual representation of how many Amazon EC2 instances are up to date, how many Amazon EC2 instances are noncompliant, and how many Amazon EC2 instances are compliant in relation to the earlier defined patch baseline.
In this section, you have set everything up for patch management on your instance. Now you know how to patch your Amazon EC2 instance in a controlled manner and how to check if your Amazon EC2 instance is compliant with the patch baseline you have defined. Of course, I recommend that you apply these steps to all Amazon EC2 instances you manage.
Summary
In this blog post, I showed how to use Systems Manager to create a patch baseline and maintenance window to keep your Amazon EC2 Linux instances up to date with the latest security patches. Remember that by creating multiple maintenance windows and assigning them to different patch groups, you can make sure your Amazon EC2 instances do not all reboot at the same time.
If you have comments about this post, submit them in the “Comments” section below. If you have questions about or issues implementing any part of this solution, start a new thread on the Amazon EC2 forum or contact AWS Support.
Microservice application requirements have changed dramatically in recent years. These days, applications operate with petabytes of data, need almost 100% uptime, and end users expect sub-second response times. Typical N-tier applications can’t deliver on these requirements.
The Reactive Manifesto, published in 2014, describes the essential characteristics of reactive systems: responsiveness, resiliency, elasticity, and being message driven.
Being message driven is perhaps the most important characteristic of reactive systems. Asynchronous messaging helps in the design of loosely coupled systems, which is a key factor for scalability. In order to build a highly decoupled system, it is important to isolate services from each other. As already described, isolation is an important aspect of the microservices pattern. Indeed, reactive systems and microservices are a natural fit.
Implemented Use Case
This reference architecture illustrates a typical ad-tracking implementation.
Many ad-tracking companies collect massive amounts of data in near-real-time. In many cases, these workloads are very spiky and heavily depend on the success of the ad-tech companies’ customers. Typically, an ad-tracking-data use case can be separated into a real-time part and a non-real-time part. In the real-time part, it is important to collect data as fast as possible and ask several questions, including: “Is this a valid combination of parameters?” “Does this program exist?” “Is this program still valid?”
Because response time has a huge impact on conversion rate in advertising, it is important for advertisers to respond as fast as possible. This information should be kept in memory to reduce communication overhead with the caching infrastructure. The tracking application itself should be as lightweight and scalable as possible. For example, the application shouldn’t have any shared mutable state and it should use reactive paradigms. In our implementation, one main application is responsible for this real-time part. It collects and validates data, responds to the client as fast as possible, and asynchronously sends events to backend systems.
The non-real-time part of the application consumes the generated events and persists them in a NoSQL database. In a typical tracking implementation, clicks, cookie information, and transactions are matched asynchronously and persisted in a data store. The matching part is not implemented in this reference architecture. Many ad-tech architectures use frameworks like Hadoop for the matching implementation.
The system can be logically divided into the data collection part and the core data update part. The data collection part is responsible for collecting, validating, and persisting the data. In the core data update part, the data that is used for validation gets updated and all subscribers are notified of new data.
Components and Services
Main Application
The main application is implemented using Java 8 and uses Vert.x as the main framework. Vert.x is an event-driven, reactive, non-blocking, polyglot framework to implement microservices. It runs on the Java virtual machine (JVM) by using the low-level IO library Netty. You can write applications in Java, JavaScript, Groovy, Ruby, Kotlin, Scala, and Ceylon. The framework offers a simple and scalable actor-like concurrency model. Vert.x calls handlers by using a thread known as an event loop. To use this model, you have to write code known as “verticles.” Verticles share certain similarities with actors in the actor model. To use them, you have to implement the verticle interface. Verticles communicate with each other by generating messages in a single event bus. Those messages are sent on the event bus to a specific address, and verticles can register to this address by using handlers.
With only a few exceptions, none of the APIs in Vert.x block the calling thread. Similar to Node.js, Vert.x uses the reactor pattern. However, in contrast to Node.js, Vert.x uses several event loops. Unfortunately, not all APIs in the Java ecosystem are written asynchronously, for example, the JDBC API. Vert.x offers a way to run these blocking APIs without blocking the event loop. These special verticles are called worker verticles. You don’t execute worker verticles by using the standard Vert.x event loops, but by using a dedicated thread from a worker pool. This way, the worker verticles don’t block the event loop.
Our application consists of five different verticles covering different aspects of the business logic. The main entry point for our application is the HttpVerticle, which exposes an HTTP endpoint to consume HTTP requests and to perform health checks. Data from HTTP requests such as parameters and user-agent information are collected and transformed into a JSON message. In order to validate the input data (to ensure that the program exists and is still valid), the message is sent to the CacheVerticle.
This verticle implements an LRU-cache with a TTL of 10 minutes and a capacity of 100,000 entries. Instead of adding additional functionality to a standard JDK map implementation, we use Google Guava, which has all the features we need. If the data is not in the L1 cache, the message is sent to the RedisVerticle. This verticle is responsible for data residing in Amazon ElastiCache and uses the Vert.x-redis-client to read data from Redis. In our example, Redis is the central data store. However, in a typical production implementation, Redis would just be the L2 cache with a central data store like Amazon DynamoDB. One of the most important paradigms of a reactive system is to switch from a pull- to a push-based model. To achieve this and reduce network overhead, we’ll use Redis pub/sub to push core data changes to our main application.
Vert.x also supports direct Redis pub/sub integration. The following code shows our subscriber implementation:
vertx.eventBus().<JsonObject>consumer(REDIS_PUBSUB_CHANNEL_VERTX, received -> {
    JsonObject value = received.body().getJsonObject("value");
    String message = value.getString("message");
    JsonObject jsonObject = new JsonObject(message);
    eb.send(CACHE_REDIS_EVENTBUS_ADDRESS, jsonObject);
});

redis.subscribe(Constants.REDIS_PUBSUB_CHANNEL, res -> {
    if (res.succeeded()) {
        LOGGER.info("Subscribed to " + Constants.REDIS_PUBSUB_CHANNEL);
    } else {
        LOGGER.info(res.cause());
    }
});
The verticle subscribes to the appropriate Redis pub/sub channel. If a message is sent over this channel, the payload is extracted and forwarded to the cache verticle, which stores the data in the L1 cache. After storing and enriching data, a response is sent back to the HttpVerticle, which responds to the HTTP request that initially hit this verticle. In addition, the message is converted to a ByteBuffer, wrapped in protocol buffers, and sent to an Amazon Kinesis Data Stream.
A dedicated KinesisVerticle performs this last step and writes the wrapped records to the Kinesis data stream.
Kinesis Consumer
This AWS Lambda function consumes data from an Amazon Kinesis Data Stream and persists the data in an Amazon DynamoDB table. In order to improve testability, the invocation code is separated from the business logic. The invocation code is implemented in the class KinesisConsumerHandler and iterates over the Kinesis events pulled from the Kinesis stream by AWS Lambda. Each Kinesis event is unwrapped, transformed from ByteBuffer to protocol buffers, and converted into a Java object. Those Java objects are passed to the business logic, which persists the data in a DynamoDB table. To reduce the duration of successive Lambda calls, the DynamoDB client is instantiated lazily and reused if possible.
Redis Updater
From time to time, it is necessary to update core data in Redis. A very efficient implementation for this requirement is to use AWS Lambda and Amazon Kinesis. New core data is sent over the Kinesis stream using JSON as the data format and consumed by a Lambda function. This function iterates over the Kinesis events pulled from the Kinesis stream by AWS Lambda. Each Kinesis event is unwrapped, transformed from ByteBuffer to String, and converted into a Java object. The Java object is passed to the business logic and stored in Redis. In addition, the new core data is also sent to the main application using Redis pub/sub in order to reduce network overhead and to convert from a pull- to a push-based model.
The following example shows the source code to store data in Redis and notify all subscribers:
public void updateRedisData(final TrackingMessage trackingMessage, final Jedis jedis, final LambdaLogger logger) {
    try {
        ObjectMapper mapper = new ObjectMapper();
        String jsonString = mapper.writeValueAsString(trackingMessage);
        Map<String, String> map = marshal(jsonString);
        String statusCode = jedis.hmset(trackingMessage.getProgramId(), map);
    }
    catch (Exception exc) {
        if (null == logger)
            exc.printStackTrace();
        else
            logger.log(exc.getMessage());
    }
}

public void notifySubscribers(final TrackingMessage trackingMessage, final Jedis jedis, final LambdaLogger logger) {
    try {
        ObjectMapper mapper = new ObjectMapper();
        String jsonString = mapper.writeValueAsString(trackingMessage);
        jedis.publish(Constants.REDIS_PUBSUB_CHANNEL, jsonString);
    }
    catch (final IOException e) {
        log(e.getMessage(), logger);
    }
}
Similarly to our Kinesis consumer, the Redis client is instantiated lazily.
Infrastructure as Code
As already outlined, latency and response time are a very critical part of any ad-tracking solution because response time has a huge impact on conversion rate. In order to reduce latency for customers world-wide, it is common practice to roll out the infrastructure in different AWS Regions in the world to be as close to the end customer as possible. AWS CloudFormation can help you model and set up your AWS resources so that you can spend less time managing those resources and more time focusing on your applications that run in AWS.
You create a template that describes all the AWS resources that you want (for example, Amazon EC2 instances or Amazon RDS DB instances), and AWS CloudFormation takes care of provisioning and configuring those resources for you. Our reference architecture can be rolled out in different Regions using an AWS CloudFormation template, which sets up the complete infrastructure (for example, Amazon Virtual Private Cloud (Amazon VPC), Amazon Elastic Container Service (Amazon ECS) cluster, Lambda functions, DynamoDB table, Amazon ElastiCache cluster, etc.).
Conclusion
In this blog post we described reactive principles and an example architecture with a common use case. We leveraged the capabilities of different frameworks in combination with several AWS services in order to implement reactive principles—not only at the application-level but also at the system-level. I hope I’ve given you ideas for creating your own reactive applications and systems on AWS.
About the Author
Sascha Moellering is a Senior Solution Architect. Sascha is primarily interested in automation, infrastructure as code, distributed computing, containers and JVM. He can be reached at [email protected]
Amazon EMR enables data analysts and scientists to deploy a cluster of any size running popular frameworks such as Spark, HBase, Presto, and Flink in minutes. When you launch a cluster, Amazon EMR automatically configures the underlying Amazon EC2 instances with the frameworks and applications that you choose for your cluster. This can include popular web interfaces such as Hue workbench, Zeppelin notebook, and Ganglia monitoring dashboards and tools.
These web interfaces are hosted on the EMR master node and must be accessed using the public DNS name of the master node (master public DNS value). The master public DNS value is dynamically created, not very user friendly, and hard to remember; it looks something like ip-###-###-###-###.us-west-2.compute.internal. Not having a friendly URL to connect to the popular workbench or notebook interfaces can slow down your workflow and reduce the agility that these tools are meant to provide.
Some customers have addressed this challenge through custom bootstrap actions, steps, or external scripts that periodically check for new clusters and register a friendlier name in DNS. These approaches either put additional burden on the data practitioners or require additional resources to execute the scripts. In addition, there is typically some lag time associated with such scripts. They often don’t do a great job cleaning up the DNS records after the cluster has terminated, potentially resulting in a security risk.
The solution in this post provides an automated, serverless approach to registering a friendly master node name for easy access to the web interfaces.
Before I dive deeper, I review these key services and how they are part of this solution.
CloudWatch Events
CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources. Using simple rules, you can match events and route them to one or more target functions or streams. An event can be generated in one of four ways:
From an AWS service when resources change state
From API calls that are delivered via AWS CloudTrail
From your own code that can generate application-level events
On a schedule that you define (for example, a cron or rate expression)
In this solution, I cover the first type of event, which is automatically emitted by EMR when the cluster state changes. Based on the state in this event, the solution either creates or updates the DNS record in Route 53 when the cluster state changes to STARTING, or deletes the DNS record when the cluster is no longer needed and the state changes to TERMINATED. For more information about all EMR event details, see Monitor CloudWatch Events.
Route 53 private hosted zones
A private hosted zone is a container that holds information about how to route traffic for a domain and its subdomains within one or more VPCs. Private hosted zones enable you to use custom DNS names for your internal resources without exposing the names or IP addresses to the internet.
Route 53 supports resource record sets with a wide range of record types. In this solution, you use a CNAME record, which specifies a domain name as an alias for another domain (the ‘canonical’ domain). You use a friendly name of the cluster as the CNAME for the EMR master public DNS value.
You are using private hosted zones because an EMR cluster is typically deployed within a private subnet and is accessed either from within the VPC or from on-premises resources over VPN or AWS Direct Connect. To resolve domain names in private hosted zones from your on-premises network, configure a DNS forwarder, as described in How can I resolve Route 53 private hosted zones from an on-premises network via an Ubuntu instance?.
Lambda
Lambda is a compute service that lets you run code without provisioning or managing servers. Lambda executes your code only when needed and scales automatically to thousands of requests per second. Lambda takes care of high availability, and server and OS maintenance and patching. You pay only for the consumed compute time. There is no charge when your code is not running.
Lambda provides the ability to invoke your code in response to events, such as when an object is put to an Amazon S3 bucket or as in this case, when a CloudWatch event is emitted. As part of this solution, you deploy a Lambda function as a target that is invoked by CloudWatch Events when the event matches your rule. You also configure the necessary permissions based on the Lambda permissions model, including a Lambda function policy and Lambda execution role.
Putting it all together
Now that you have all of the pieces, you can put together a complete solution. The following diagram illustrates how the solution works:
Start with a user activity such as launching or terminating an EMR cluster.
EMR automatically sends events to the CloudWatch Events stream.
A CloudWatch Events rule matches the specified event, and routes it to a target, which in this case is a Lambda function. Here, you use the EMR Cluster State Change event (a sketch of an equivalent rule definition appears after this list).
The Lambda function performs the following key steps:
Get the clusterId value from the event detail and use it to call the EMR DescribeCluster API to retrieve the following data points:
MasterPublicDnsName – public DNS name of the master node
Locate the tag containing the friendly name to use as the CNAME for the cluster. The tag key should be cluster_name, and the value should be specified as host.domain.com, where domain is the private hosted zone in which to update the DNS record.
Update DNS based on the state in the event detail.
If the state is STARTING, the function calls the Route 53 API to create or update a resource record set in the private hosted zone specified by the domain tag. This is a CNAME record mapped to MasterPublicDnsName.
Conversely, if the state is TERMINATED, the function calls the Route 53 API to delete the associated resource record set from the private hosted zone.
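The SAM template described in the next section creates this rule and the target wiring for you; a hedged sketch of the equivalent CLI calls (rule name and function ARN are placeholders) illustrates the event pattern being matched.
$ # Match EMR cluster state change events for the states the function acts on
$ aws events put-rule \
    --name emr-cluster-state-change \
    --event-pattern '{"source":["aws.emr"],"detail-type":["EMR Cluster State Change"],"detail":{"state":["STARTING","TERMINATED"]}}'
$ # Point the rule at the Lambda function (the template also grants the rule permission to invoke it)
$ aws events put-targets \
    --rule emr-cluster-state-change \
    --targets Id=1,Arn=arn:aws:lambda:us-east-1:123456789012:function:EmrDnsSetter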
Deploying the solution
Because all of the components of this solution are serverless, use the AWS Serverless Application Model (AWS SAM) template to deploy the solution. AWS SAM is natively supported by AWS CloudFormation and provides a simplified syntax for expressing serverless resources, resulting in fewer lines of code.
Overview of the SAM template
For this solution, the SAM template has 76 lines of text as compared to 142 lines without SAM resources (and writing the template in YAML would be even slightly smaller). The solution can be deployed using the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SAM Local.
CloudFormation transforms help simplify template authoring by condensing a multiple-line resource declaration into a single line in your template. To inform CloudFormation that your template defines a serverless application, add a Transform declaration under the template format version, as follows: Transform: AWS::Serverless-2016-10-31.
Before SAM, you would use the AWS::Lambda::Function resource type to define your Lambda function. You would then need a resource to define the permissions for the function (AWS::Lambda::Permission), another resource to define a Lambda execution role (AWS::IAM::Role), and finally a CloudWatch Events resource (Events::Rule) that triggers this function.
With SAM, you need to define just a single resource for your function, AWS::Serverless::Function. Using this single resource type, you can define everything that you need, including function properties such as function handler, runtime, and code URI, as well as the required IAM policies and the CloudWatch event.
A few additional things to note in the code example:
CodeUri – Before you can deploy a SAM template, first upload your Lambda function code zip to S3. You can do this manually or use the aws cloudformation package CLI command to automate the task of uploading local artifacts to an S3 bucket, as shown later.
Lambda execution role and permissions – You are not specifying a Lambda execution role in the template. Rather, you are providing the required permissions as IAM policy documents. When the template is submitted, CloudFormation expands the AWS::Serverless::Function resource, declaring a Lambda function and an execution role. The created role has two attached policies: a default AWSLambdaBasicExecutionRole and the inline policy specified in the template.
CloudWatch Events rule – Instead of specifying a CloudWatch Events resource type, you are defining an event source object as a property of the function itself. When the template is submitted, CloudFormation expands this into a CloudWatch Events rule resource and automatically creates the Lambda resource-based permissions to allow the CloudWatch Events rule to trigger the function.
NOTE: If you are trying this solution outside of us-east-1, then you should download the necessary files, upload them to the buckets in your region, edit the script as appropriate and then run it or use the CLI deployment method below.
3.) Choose Next.
4.) On the Specify Details page, keep or modify the stack name and choose Next.
5.) On the Options page, choose Next.
6.) On the Review page, take the following steps:
Acknowledge the two Transform access capabilities. This allows the CloudFormation transform to create the required IAM resources with custom names.
Under Transforms, choose Create Change Set.
Wait a few seconds for the change set to be created before proceeding. The change set should look as follows:
7.) Choose Execute to deploy the template.
After the template is deployed, you should see four resources created:
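If you deploy with the AWS CLI instead, you first package the local artifacts with the aws cloudformation package command referenced earlier. A sketch of that step, with placeholder template and bucket names:
$ aws cloudformation package \
    --template-file sam-template.yaml \
    --s3-bucket your-artifact-bucket \
    --output-template-file serverless-output.template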
After the package is successfully uploaded, the output should look as follows:
Uploading to 0f6d12c7872b50b37dbfd5a60385b854 1872 / 1872.0 (100.00%)
Successfully packaged artifacts and wrote output template to file serverless-output.template.
The CodeUri property in serverless-output.template is now referencing the packaged artifacts in the S3 bucket that you specified:
s3://<bucket>/0f6d12c7872b50b37dbfd5a60385b854
Use the aws cloudformation deploy CLI command to deploy the stack:
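A sketch of the deploy command; the stack name matches the output shown below, and the capability flag is an assumption based on the custom-named IAM resources mentioned earlier.
$ aws cloudformation deploy \
    --template-file serverless-output.template \
    --stack-name EmrDnsSetterCli \
    --capabilities CAPABILITY_NAMED_IAM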
You should see the following output after the stack has been successfully created:
Waiting for changeset to be created...
Waiting for stack create/update to complete
Successfully created/updated stack – EmrDnsSetterCli
Validating results
To test the solution, launch an EMR cluster. The Lambda function looks for the cluster_name tag associated with the EMR cluster. Make sure to specify the friendly name of your cluster as host.domain.com where the domain is the private hosted zone in which to create the CNAME record.
Here is a sample CLI command to launch a cluster within a specific subnet in a VPC with the required tag cluster_name.
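A hedged sketch of such a command follows; the release label, applications, instance type and count, key pair, subnet, and domain name are placeholders.
$ aws emr create-cluster \
    --name "emr-demo-cluster" \
    --release-label emr-5.12.0 \
    --applications Name=Hadoop Name=Hue Name=Spark Name=Zeppelin \
    --instance-type m4.large \
    --instance-count 3 \
    --use-default-roles \
    --ec2-attributes KeyName=YourKeyPair,SubnetId=subnet-xxxxxxxx \
    --tags cluster_name=emr.example.com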
After the cluster is launched, log in to the Route 53 console. In the left navigation pane, choose Hosted Zones to view the list of private and public zones currently configured in Route 53. Select the hosted zone that you specified in the ZONE tag when you launched the cluster. Verify that the resource records were created.
You can also monitor the CloudWatch Events metrics that are published to CloudWatch every minute, such as the number of TriggeredRules and Invocations.
Now that you’ve verified that the Lambda function successfully updated the Route 53 resource records in the zone file, terminate the EMR cluster and verify that the records are removed by the same function.
Conclusion
This solution provides a serverless approach to automatically assigning a friendly name for your EMR cluster for easy access to popular notebooks and other web interfaces. CloudWatch Events also supports cross-account event delivery, so if you are running EMR clusters in multiple AWS accounts, all cluster state events across accounts can be consolidated into a single account.
I hope that this solution provides a small glimpse into the power of CloudWatch Events and Lambda and how they can be leveraged with EMR and other AWS big data services. For example, by using the EMR step state change event, you can chain various pieces of your analytics pipeline. You may have a transient cluster perform data ingest and, when the task successfully completes, spin up an ETL cluster for transformation and upload to Amazon Redshift. The possibilities are truly endless.
AWS Fargate is a technology that allows you to focus on running your application without needing to provision, monitor, or manage the underlying compute infrastructure. You package your application into a Docker container that you can then launch using your container orchestration tool of choice.
Fargate allows you to use containers without being responsible for Amazon EC2 instances, similar to how EC2 allows you to run VMs without managing physical infrastructure. Currently, Fargate provides support for Amazon Elastic Container Service (Amazon ECS). Support for Amazon Elastic Container Service for Kubernetes (Amazon EKS) will be made available in the near future.
Despite offloading the responsibility for the underlying instances, Fargate still gives you deep control over configuration of network placement and policies. This includes the ability to use many networking fundamentals such as Amazon VPC and security groups.
This post covers how to take advantage of the different ways of networking your containers in Fargate when using ECS as your orchestration platform, with a focus on how to do networking securely.
The first step to running any application in Fargate is defining an ECS task for Fargate to launch. A task is a logical group of one or more Docker containers that are deployed with specified settings. When running a task in Fargate, there are two different forms of networking to consider:
Container (local) networking
External networking
Container Networking
Container networking is often used for tightly coupled application components. Perhaps your application has a web tier that is responsible for serving static content as well as generating some dynamic HTML pages. To generate these dynamic pages, it has to fetch information from another application component that has an HTTP API.
One potential architecture for such an application is to deploy the web tier and the API tier together as a pair and use local networking so the web tier can fetch information from the API tier.
If you are running these two components as two processes on a single EC2 instance, the web tier application process could communicate with the API process on the same machine by using the local loopback interface. The local loopback interface has a special IP address of 127.0.0.1 and hostname of localhost.
By making a networking request to this local interface, it bypasses the network interface hardware and instead the operating system just routes network calls from one process to the other directly. This gives the web tier a fast and efficient way to fetch information from the API tier with almost no networking latency.
In Fargate, when you launch multiple containers as part of a single task, they can also communicate with each other over the local loopback interface. Fargate uses a special container networking mode called awsvpc, which gives all the containers in a task a shared elastic network interface to use for communication.
If you specify a port mapping for each container in the task, then the containers can communicate with each other on that port. For example the following task definition could be used to deploy the web tier and the API tier:
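A hedged sketch of what such a task definition could look like, registered through the CLI, is shown below; the account ID, image URIs, role ARN, and CPU/memory sizes are placeholders, while port 8080 matches the curl example that follows.
$ cat > web-and-api-task.json <<'EOF'
{
  "family": "web-and-api",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web:latest",
      "portMappings": [{ "containerPort": 80, "protocol": "tcp" }],
      "essential": true
    },
    {
      "name": "api",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/api:latest",
      "portMappings": [{ "containerPort": 8080, "protocol": "tcp" }],
      "essential": true
    }
  ]
}
EOF
$ aws ecs register-task-definition --cli-input-json file://web-and-api-task.json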
ECS, with Fargate, is able to take this definition and launch two containers, each of which is bound to a specific static port on the elastic network interface for the task.
Because each Fargate task has its own isolated networking stack, there is no need for dynamic ports to avoid port conflicts between different tasks as in other networking modes. The static ports make it easy for containers to communicate with each other. For example, the web container makes a request to the API container using its well-known static port:
curl 127.0.0.1:8080/my-endpoint
This sends a local network request, which goes directly from one container to the other over the local loopback interface without traversing the network. This deployment strategy allows for fast and efficient communication between two tightly coupled containers. But most application architectures require more than just internal local networking.
External Networking
External networking is used for network communications that go outside the task to other servers that are not part of the task, or network communications that originate from other hosts on the internet and are directed to the task.
Configuring external networking for a task is done by modifying the settings of the VPC in which you launch your tasks. A VPC is a fundamental tool in AWS for controlling the networking capabilities of resources that you launch on your account.
When setting up a VPC, you create one or more subnets, which are logical groups that your resources can be placed into. Each subnet has an Availability Zone and its own route table, which defines rules about how network traffic operates for that subnet. There are two main types of subnets: public and private.
Public subnets
A public subnet is a subnet that has an associated internet gateway. Fargate tasks in that subnet are assigned both private and public IP addresses:
A browser or other client on the internet can send network traffic to the task via the internet gateway using its public IP address. The tasks can also send network traffic to other servers on the internet because the route table can route traffic out via the internet gateway.
If tasks want to communicate directly with each other, they can use each other’s private IP address to send traffic directly from one to the other so that it stays inside the subnet without going out to the internet gateway and back in.
Private subnets
A private subnet does not have direct internet access. The Fargate tasks inside the subnet don’t have public IP addresses, only private IP addresses. Instead of an internet gateway, a network address translation (NAT) gateway is attached to the subnet:
There is no way for another server or client on the internet to reach your tasks directly, because they don’t even have an address or a direct route to reach them. This is a great way to add another layer of protection for internal tasks that handle sensitive data. Those tasks are protected and can’t receive any inbound traffic at all.
In this configuration, the tasks can still communicate to other servers on the internet via the NAT gateway. They would appear to have the IP address of the NAT gateway to the recipient of the communication. If you run a Fargate task in a private subnet, you must add this NAT gateway. Otherwise, Fargate can’t make a network request to Amazon ECR to download the container image, or communicate with Amazon CloudWatch to store container metrics.
Load balancers
If you are running a container that is hosting internet content in a private subnet, you need a way for traffic from the public to reach the container. This is generally accomplished by using a load balancer such as an Application Load Balancer or a Network Load Balancer.
ECS integrates tightly with AWS load balancers by automatically configuring a service-linked load balancer to send network traffic to containers that are part of the service. When each task starts, the IP address of its elastic network interface is added to the load balancer’s configuration. When the task is being shut down, network traffic is safely drained from the task before removal from the load balancer.
To get internet traffic to containers using a load balancer, the load balancer is placed into a public subnet. ECS configures the load balancer to forward traffic to the container tasks in the private subnet:
This configuration allows your tasks in Fargate to be safely isolated from the rest of the internet. They can still initiate network communication with external resources via the NAT gateway, and still receive traffic from the public via the Application Load Balancer that is in the public subnet.
Another potential use case for a load balancer is for internal communication from one service to another service within the private subnet. This is typically used for a microservice deployment, in which one service such as an internet user account service needs to communicate with an internal service such as a password service. Obviously, it is undesirable for the password service to be directly accessible on the internet, so using an internet load balancer would be a major security vulnerability. Instead, this can be accomplished by hosting an internal load balancer within the private subnet:
With this approach, one container can distribute requests across an Auto Scaling group of other private containers via the internal load balancer, ensuring that the network traffic stays safely protected within the private subnet.
Best Practices for Fargate Networking
Determine whether you should use local task networking
Local task networking is ideal for communicating between containers that are tightly coupled and require maximum networking performance between them. However, when you deploy multiple containers as part of the same task, they are always deployed together, so you lose the ability to independently scale different types of workloads up and down.
In the example of the application with a web tier and an API tier, it may be the case that powering the application requires only two web tier containers but 10 API tier containers. If local container networking is used between these two container types, then an extra eight unnecessary web tier containers would end up being run instead of allowing the two different services to scale independently.
A better approach would be to deploy the two containers as two different services, each with its own load balancer. This allows clients to communicate with the two web containers via the web service’s load balancer. The web service could then distribute requests across the ten backend API containers via the API service’s load balancer.
Run internet tasks that require internet access in a public subnet
If you have tasks that require internet access and a lot of bandwidth for communication with other services, it is best to run them in a public subnet. Give them public IP addresses so that each task can communicate with other services directly.
If you run these tasks in a private subnet, then all their outbound traffic has to go through an NAT gateway. AWS NAT gateways support up to 10 Gbps of burst bandwidth. If your bandwidth requirements go over this, then all task networking starts to get throttled. To avoid this, you could distribute the tasks across multiple private subnets, each with their own NAT gateway. It can be easier to just place the tasks into a public subnet, if possible.
Avoid using a public subnet or public IP addresses for private, internal tasks
If you are running a service that handles private, internal information, you should not put it into a public subnet or use a public IP address. For example, imagine that you have one task, which is an API gateway for authentication and access control. You have another background worker task that handles sensitive information.
The intended access pattern is that requests from the public go to the API gateway, which then proxies request to the background task only if the request is from an authenticated user. If the background task is in a public subnet and has a public IP address, then it could be possible for an attacker to bypass the API gateway entirely. They could communicate directly to the background task using its public IP address, without being authenticated.
Conclusion
Fargate gives you a way to run containerized tasks directly without managing any EC2 instances, but you still have full control over how you want networking to work. You can set up containers to talk to each other over the local network interface for maximum speed and efficiency. For running workloads that require privacy and security, use a private subnet with public internet access locked down. Or, for simplicity with an internet workload, you can just use a public subnet and give your containers a public IP address.
To deploy one of these Fargate task networking approaches, check out some sample CloudFormation templates showing how to configure the VPC, subnets, and load balancers.
If you have questions or suggestions, please comment below.
AWS CodeDeploy is a service that automates application deployments to your compute infrastructure, including fleets of Amazon EC2 instances. AWS CodeDeploy can automatically deploy the latest app version to any new EC2 instance launched due to a scaling event. However, if your servers are not part of the Auto Scaling group, it might be a challenge to automate the code deployment for new EC2 launches. This is especially true if you are running your workload on an EC2 Spot Fleet. In this post, I’ll show you how to use AWS CodeDeploy, an Amazon CloudWatch Events rule, EC2 tags, and AWS Lambda to automate your deployments so that all EC2 instances run the same version of your application.
Solution overview:
An Amazon CloudWatch Events rule is triggered when a new Amazon EC2 instance is launched. The rule invokes an AWS Lambda function that extracts three tags:
· Name
· CodeDeployDeploymentGroup
· CodeDeployApplication
The Lambda function then uses the instance tags to add the EC2 instance to the deployment group. After the instance has been added, Lambda queries AWS CodeDeploy to retrieve the last successful deployment in that deployment group and synchronizes the EC2 instance with the latest code. The following diagram shows the sequence:
Note: For an EC2 Spot Fleet, the tags that you specify in the request are not automatically added to the Spot instances that are launched to fulfill the request. The Lambda function adds these tags after the Spot instance is launched.
Example:
Let’s take an example of an AWS CodeDeploy application cd-single-deploy-app, and its deployment group, cd-single-deploy-app-group, which has two instances in it. These instances are not part of an Auto Scaling group.
In the AWS CodeDeploy console, you’ll find the deployment ID, d-CSBCXR2GP. We will use this ID later in the testing phase. Our workflow ensures that when a new EC2 instance is launched, it joins the deployment group automatically and the latest version of the application is deployed to it.
You can configure the Lambda function, associated IAM role and policy, and the CloudWatch Events rule by running this CloudFormation template included in the code repository.
Step 3: In Amazon CloudWatch Events, set up a rule for running instances and configure the Lambda function as a target.
Step 4: In the AWS Lambda console, choose your Lambda function. On the Triggers tab, check that the trigger is enabled.
You can now test that when a new EC2 instance is launched, it is automatically synchronized with the latest code.
Test:
A new EC2 instance is launched from the EC2 console, the AWS CloudFormation template, or the EC2 APIs with the CodeDeployApplication, CodeDeployDeploymentGroup, and Name tags.
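For example, a hedged sketch of launching a test instance with those tags follows; the AMI, instance type, subnet, and Name value are placeholders, and the application and group names match the earlier example.
$ aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type t2.micro \
    --subnet-id subnet-xxxxxxxx \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=cd-demo-instance},{Key=CodeDeployApplication,Value=cd-single-deploy-app},{Key=CodeDeployDeploymentGroup,Value=cd-single-deploy-app-group}]'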
Go to the CloudWatch Logs console to check the Lambda execution logs.
Now, go to the AWS CodeDeploy console to confirm that the new deployment was successful.
In the deployment details, you can see that the new EC2 instance was synchronized with the latest code.
Summary:
In this post, I’ve demonstrated how to use AWS CodeDeploy, AWS Lambda, an Amazon CloudWatch Events rule, and EC2 tags to automate deployments to disparate EC2 instances. If you are running your workload on Spot instances or have a fleet of On-Demand EC2 instances as part of your application, you can use the steps in this blog post to solve the automation around the deployment for new instances.
Many enterprises use Microsoft Active Directory to manage users, groups, and computers in a network. A frequently asked question is: How can Active Directory users access big data workloads running on Amazon EMR with the same single sign-on (SSO) experience they have when accessing resources in the Active Directory network?
This post walks you through the process of using AWS CloudFormation to set up a cross-realm trust and extend authentication from an Active Directory network into an Amazon EMR cluster with Kerberos enabled. By establishing a cross-realm trust, Active Directory users can use their Active Directory credentials to access an Amazon EMR cluster and run jobs as themselves.
Walkthrough overview
In this example, you build a solution that allows Active Directory users to seamlessly access Amazon EMR clusters and run big data jobs. Here’s what you need before setting up this solution:
A possible service limit increase for your account (Note: a limit increase is usually not necessary; see the AWS Service Limits documentation if you encounter a limit error while building the solution.)
To make it easier for you to get started, I created AWS CloudFormation templates that automatically configure and deploy the solution for you. The following steps and resources are involved in setting up the solution:
Note: If you want to manually create and configure the components for this solution without using AWS CloudFormation, refer to the Amazon EMR cross-realm documentation.
IMPORTANT: The AWS CloudFormation templates used in this post are designed to work only in the us-east-1 (N. Virginia) Region. They are not intended for production use without modification.
Single-step solution deployment
If you don’t want to set up each component individually, you can use the single-step AWS CloudFormation template. The single-step template is a master template that uses nested stacks (additional templates) to launch and configure all the resources for the solution in one go.
To deploy the single-step template into your account, choose Launch Stack:
This takes you to the Create stack wizard in the AWS CloudFormation console. The template is launched in the US East (N. Virginia) Region by default. Do not change to a different Region because the template is designed to work only in us-east-1 (N. Virginia).
On the Select Template page, keep the default URL for the AWS CloudFormation template, and then choose Next.
On the Specify Details page, review the parameters for the template. Provide values for the parameters that require input (for more information, see the parameters table that follows).
The following parameters are available in this template.
Parameter | Default | Description
Domain Controller name | DC1 | NetBIOS (hostname) name of the Active Directory server. This name can be up to 15 characters long.
Active Directory domain | example.com | Fully qualified domain name (FQDN) of the forest root domain (for example, example.com).
Domain NetBIOS name | EXAMPLE | NetBIOS name of the domain for users of earlier versions of Windows. This name can be up to 15 characters long.
Domain admin user | CrossRealmAdmin | User name for the account that is added as domain administrator. This account is separate from the default administrator account.
Domain admin password | Requires input | Password for the domain admin user. Must be at least eight characters including letters, numbers, and symbols.
Key pair name | Requires input | Name of an existing key pair, which enables you to connect securely to your instance after it launches.
Instance type | m4.xlarge | Instance type for the domain controller and the Amazon EMR cluster.
Allowed IP address | 10.0.0.0/16 | The client IP address range that can reach your cluster. Specify an IP address range in CIDR notation (for example, 203.0.113.5/32). By default, only the VPC CIDR (10.0.0.0/16) can reach the cluster. Be sure to add your client IP range so that you can connect to the cluster using SSH.
EMR Kerberos realm | EC2.INTERNAL | Cluster’s Kerberos realm name. By default, the realm name is derived from the cluster’s VPC domain name in uppercase letters (for example, the default VPC domain name in the us-east-1 Region is ec2.internal, so the realm is EC2.INTERNAL).
Trusted AD domain | EXAMPLE.COM | The Active Directory (AD) domain that you want to trust. This is the same as the Active Directory domain, but it must use all uppercase letters (for example, EXAMPLE.COM).
Cross-realm trust password | Requires input | Password that you want to use for your cross-realm trust.
Instance count | 2 | The number of instances (core nodes) for the cluster.
EMR applications | Hadoop, Spark, Ganglia, Hive | Comma-separated list of applications to install on the cluster.
After you specify the template details, choose Next. On the Options page, choose Next again. On the Review page, select the I acknowledge that AWS CloudFormation might create IAM resources with custom names check box, and then choose Create.
It takes approximately 45 minutes for the deployment to complete. When the stack launch is complete, it will return outputs with information about the resources that were created. Note the outputs and skip to the Managing and testing the solution section. You can view the stack outputs on the AWS Management Console or by using the following AWS CLI command:
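For example (replace <stack-name> with the name you gave the stack when you launched it):
$ aws cloudformation describe-stacks --stack-name <stack-name> --query "Stacks[0].Outputs"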
Step-by-step solution deployment
This section describes how to use separate AWS CloudFormation templates to perform each step of the solution individually.
Create and configure an Amazon VPC
In order for you to establish a cross-realm trust between an Amazon EMR Kerberos realm and an Active Directory domain, your Amazon VPC must meet the following requirements:
The subnet used for the Amazon EMR cluster must have a CIDR block of fewer than nine digits (for example, 10.0.1.0/24).
Both DNS resolution and DNS hostnames must be enabled (set to “yes”); a sketch after this list shows how to enable them if you create the VPC yourself.
The Active Directory domain controller must be the DNS server for instances in the Amazon VPC (this is configured in the next step).
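If you create the VPC yourself instead of using the template, the two DNS attributes can be enabled with boto3 along these lines (a sketch only; the VPC ID is a placeholder, and the API accepts only one attribute per call):

import boto3

ec2 = boto3.client('ec2')
vpc_id = 'vpc-xxxxxxxx'  # placeholder: the VPC used for the solution

# Only one attribute can be modified per call.
ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsSupport={'Value': True})
ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsHostnames={'Value': True})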
To use the AWS CloudFormation template to create and configure an Amazon VPC with the prerequisites listed previously, choose Launch Stack:
Note: If you want to create the VPC manually (without using AWS CloudFormation), see Set Up the VPC and Subnet in the Amazon EMR documentation.
Launching this stack creates the following AWS resources:
Amazon VPC with CIDR block 10.0.0.0/16 (Name: CrossRealmVPC)
Internet Gateway (Name: CrossRealmGateway)
Public subnet with CIDR block 10.0.1.0/24 (Name: CrossRealmSubnet)
Security group allowing inbound access from the VPC’s subnets (Name tag: CrossRealmSecurityGroup)
When the stack launch is complete, it should return outputs similar to the following.
Key | Value example | Description
SubnetID | subnet-xxxxxxxx | The subnet for the Active Directory domain controller and the EMR cluster.
SecurityGroup | sg-xxxxxxxx | The security group for the Active Directory domain controller.
VPCID | vpc-xxxxxxxx | The Active Directory domain controller and EMR cluster will be launched on this VPC.
Note the outputs because they are used in the next step. You can view the stack outputs on the AWS Management Console or by using the following AWS CLI command:
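For example (replace <stack-name> with the name you gave the stack when you launched it):
$ aws cloudformation describe-stacks --stack-name <stack-name> --query "Stacks[0].Outputs"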
Launch and configure an Active Directory domain controller
In this step, you use an AWS CloudFormation template to automatically launch and configure a new Active Directory domain controller and cross-realm trust.
Note: There are various ways to install and configure an Active Directory domain controller. For details on manually launching and installing a domain controller without AWS CloudFormation, see Step 2: Launch and Install the AD Domain Controller in the Amazon EMR documentation.
In addition to launching and configuring an Active Directory domain controller and cross-realm trust, this AWS CloudFormation template also sets the domain controller as the DNS server (name server) for your Amazon VPC. In other words, the template creates a new DHCP options set for the VPC where it’s deployed and sets the private IP address of the domain controller as the name server in that options set.
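For reference, what the template does here is roughly equivalent to the following boto3 calls (a sketch only; the domain name, IP address, and IDs are placeholders that the template derives from its own resources):

import boto3

ec2 = boto3.client('ec2')

# Create a DHCP options set that points DNS at the domain controller's private IP.
dhcp_options_id = ec2.create_dhcp_options(DhcpConfigurations=[
    {'Key': 'domain-name', 'Values': ['example.com']},
    {'Key': 'domain-name-servers', 'Values': ['10.0.1.10']},
])['DhcpOptions']['DhcpOptionsId']

# Associate the new options set with the VPC used for the solution.
ec2.associate_dhcp_options(DhcpOptionsId=dhcp_options_id, VpcId='vpc-xxxxxxxx')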
IMPORTANT: You should not use this template on a production VPC with existing resources like Amazon EC2 instances. When you launch this stack, make sure that you use the new environment and resources (Amazon VPC, subnet, and security group) that were created in the Create and configure an Amazon VPC step.
To launch this stack, choose Launch Stack:
The following table contains information about the parameters available in this template. Review the parameters for the template and provide values for those that require input.
Parameter | Default | Description
VPC ID | Requires input | Launch the domain controller on this VPC (for example, use the VPC created in the Create and configure an Amazon VPC step).
Subnet ID | Requires input | Subnet used for the domain controller (for example, use the subnet created in the Create and configure an Amazon VPC step).
Security group ID | Requires input | Security group (SG) for the domain controller (for example, use the SG created in the Create and configure an Amazon VPC step).
Domain Controller name | DC1 | NetBIOS name of the Active Directory server (up to 15 characters).
Active Directory domain | example.com | Fully qualified domain name (FQDN) of the forest root domain (for example, example.com).
Domain NetBIOS name | EXAMPLE | NetBIOS name of the domain for users of earlier versions of Windows. This name can be up to 15 characters long.
Domain admin user | CrossRealmAdmin | User name for the account that is added as domain administrator. This account is separate from the default administrator account.
Domain admin password | Requires input | Password for the domain admin user. Must be at least eight characters including letters, numbers, and symbols.
Key pair name | Requires input | Name of an existing EC2 key pair to enable access to the domain controller instance.
Instance type | m4.xlarge | Instance type for the domain controller.
EMR Kerberos realm | EC2.INTERNAL | Cluster’s Kerberos realm name. By default, the realm name is derived from the cluster’s VPC domain name in uppercase letters (for example, the default VPC domain name in the us-east-1 Region is ec2.internal, so the realm is EC2.INTERNAL).
Cross-realm trust password | Requires input | Password that you want to use for your cross-realm trust.
It takes 25–30 minutes for this stack to be created. When it’s complete, note the stack’s outputs, and then move to the next step: Launch an EMR cluster with Kerberos enabled.
Create a security configuration and launch an Amazon EMR cluster with Kerberos enabled
To launch a kerberized Amazon EMR cluster, you first need to create a security configuration containing the cross-realm trust configuration. You then specify cluster-specific Kerberos attributes when launching the cluster.
In this step, you use AWS CloudFormation to launch and configure a kerberized Amazon EMR cluster with a cross-realm trust. If you want to manually launch and configure a cluster with Kerberos enabled, see Step 6: Launch a Kerberized EMR Cluster in the Amazon EMR documentation.
Note: At the time of this writing, AWS CloudFormation does not yet support launching Amazon EMR clusters with Kerberos authentication enabled. To overcome this limitation, I created a template that uses an AWS Lambda-backed custom resource to launch and configure the Amazon EMR cluster with Kerberos enabled. If you use this template, there’s nothing else that you need to do. Just keep in mind that the template creates and invokes an AWS Lambda function (custom resource) to launch the cluster.
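To make the note above concrete, the work that the Lambda-backed custom resource performs is roughly equivalent to the following boto3 calls. This is a much-simplified sketch, not the template’s actual code; the names, passwords, release label, and IDs are placeholders that you would replace with your own values.

import json
import boto3

emr = boto3.client('emr')

# 1. Create a security configuration with a cluster-dedicated KDC and a cross-realm trust.
emr.create_security_configuration(
    Name='emr-crossrealm-security-config',  # placeholder name
    SecurityConfiguration=json.dumps({
        'AuthenticationConfiguration': {
            'KerberosConfiguration': {
                'Provider': 'ClusterDedicatedKdc',
                'ClusterDedicatedKdcConfiguration': {
                    'TicketLifetimeInHours': 24,
                    'CrossRealmTrustConfiguration': {
                        'Realm': 'EXAMPLE.COM',
                        'Domain': 'example.com',
                        'AdminServer': 'example.com',
                        'KdcServer': 'example.com'
                    }
                }
            }
        }
    }))

# 2. Launch a kerberized cluster that references the security configuration.
emr.run_job_flow(
    Name='kerberized-emr-cluster',
    ReleaseLabel='emr-5.10.0',  # Kerberos requires EMR 5.10.0 or later
    Applications=[{'Name': app} for app in ['Hadoop', 'Spark', 'Ganglia', 'Hive']],
    Instances={
        'Ec2SubnetId': 'subnet-xxxxxxxx',
        'Ec2KeyName': 'your-key-pair',
        'KeepJobFlowAliveWhenNoSteps': True,
        'InstanceGroups': [
            {'InstanceRole': 'MASTER', 'InstanceType': 'm4.xlarge', 'InstanceCount': 1},
            {'InstanceRole': 'CORE', 'InstanceType': 'm4.xlarge', 'InstanceCount': 2},
        ],
    },
    JobFlowRole='EMR_EC2_DefaultRole',
    ServiceRole='EMR_DefaultRole',
    SecurityConfiguration='emr-crossrealm-security-config',
    KerberosAttributes={
        'Realm': 'EC2.INTERNAL',
        'KdcAdminPassword': 'your-kdc-admin-password',
        'CrossRealmTrustPrincipalPassword': 'your-cross-realm-trust-password',
        'ADDomainJoinUser': 'CrossRealmAdmin',
        'ADDomainJoinPassword': 'your-domain-admin-password',
    })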
To create a cross-realm trust security configuration and launch a kerberized Amazon EMR cluster using AWS CloudFormation, choose Launch Stack:
The following table lists and describes the template parameters for deploying a kerberized Amazon EMR cluster and configuring a cross-realm trust.
Parameter | Default | Description
Active Directory domain | example.com | The Active Directory domain that you want to establish the cross-realm trust with.
Domain admin user (joiner user) | CrossRealmAdmin | The user name of an Active Directory domain user with privileges to join domains/computers to the Active Directory domain (joiner user).
Domain admin password | Requires input | Password of the joiner user.
Cross-realm trust password | Requires input | Password of your cross-realm trust.
EC2 key pair name | Requires input | Name of an existing key pair, which enables you to connect securely to your cluster after it launches.
Subnet ID | Requires input | Subnet that you want to use for your Amazon EMR cluster (for example, choose the subnet created in the Create and configure an Amazon VPC step).
Security group ID | Requires input | Security group that you want to use for your Amazon EMR cluster (for example, choose the security group created in the Create and configure an Amazon VPC step).
Instance type | m4.xlarge | The instance type that you want to use for the cluster nodes.
Instance count | 2 | The number of instances (core nodes) for the cluster.
Allowed IP address | 10.0.0.0/16 | The client IP address range that can reach your cluster. Specify an IP address range in CIDR notation (for example, 203.0.113.5/32). By default, only the VPC CIDR (10.0.0.0/16) can reach the cluster. Be sure to add your client IP range so that you can connect to the cluster using SSH.
EMR applications | Hadoop, Spark, Ganglia, Hive | Comma-separated list of the applications that you want installed on the cluster.
EMR Kerberos realm | EC2.INTERNAL | Cluster’s Kerberos realm name. By default, the realm name is derived from the cluster’s VPC domain name in uppercase letters (for example, the default VPC domain name in the us-east-1 Region is ec2.internal, so the realm is EC2.INTERNAL).
Trusted AD domain | EXAMPLE.COM | The Active Directory domain that you want to trust. This name is the same as the Active Directory domain, but it must use all uppercase letters (for example, EXAMPLE.COM).
It takes 10–15 minutes for this stack to be created. When it’s complete, note the stack’s outputs, and then move to the next section: Managing and testing the solution.
Managing and testing the solution
Now that you’ve configured and built the solution, it’s time to test it by connecting to a cluster using Active Directory credentials.
SSH to a cluster using Active Directory credentials (single sign-on)
After you launch a kerberized Amazon EMR cluster, if you used the AWS CloudFormation templates and added your client IP range to the Allowed IP address parameter, you should be able to connect to the cluster using an SSH client and your Active Directory user credentials. If you have trouble connecting to the cluster using SSH, check the cluster’s security group to make sure that it allows inbound SSH connection (TCP port 22) from your client’s IP address (source).
The following steps assume that you’re using a client such as OpenSSH. If you’re using a different SSH application (for example, PuTTY), consult the application-specific documentation.
Note: Because the cluster was launched with a cross-realm trust configuration, you don’t need to use a private key (.pem file) when you connect to it as a domain user using SSH.
To connect to your Amazon EMR cluster as an Active Directory user using SSH, run the following command. Replace ad_user with the domain admin user that you created while setting up the domain controller and replace master_node_URL with the cluster’s URL (see the stack’s outputs to find this information):
$ ssh -l <ad_user> <master_node_URL>
If your SSH client is configured to use a key as the preferred authentication method, the login might fail. If that’s the case, you can add the following options to your SSH command to force the SSH connection to use password authentication:
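$ ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no -l <ad_user> <master_node_URL>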
After a domain user connects to the cluster using SSH, if this is the first time that the user is connecting, a local home directory is created for that user. In addition, if you used the create-hdfs-home-ba.sh bootstrap action when launching the cluster (done by default if you used the AWS CloudFormation template to launch a kerberized cluster), an HDFS home directory is also created automatically for that user.
Note: If you manually launched the cluster and did not use the create-hdfs-home-ba.sh bootstrap action, then you’ll need to manually create HDFS user home directories for your users.
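For example, from the master node, a user with HDFS superuser rights (such as the hadoop user) could create a home directory for a domain user along these lines:
$ hdfs dfs -mkdir -p /user/<ad_user>
$ hdfs dfs -chown <ad_user> /user/<ad_user>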
When you connect to the cluster using SSH for the first time (as a domain user), you should see the following messages if the HDFS home directory for your domain user was successfully created:
Running jobs on a kerberized Amazon EMR cluster
To run a job on a kerberized cluster, the user submitting the job must first be authenticated. If you followed the previous section to connect to your cluster as an Active Directory user using SSH, the user should be authenticated automatically.
If running the klist command returns a “No credentials cache found” message, it means that the user is not authenticated (the user doesn’t have a Kerberos ticket). You can re-authenticate a user at any time by running the following command (be sure to use all uppercase letters for the Active Directory domain):
$ kinit <username>@<AD_DOMAIN>
When the user is authenticated, they can submit jobs just like they would on a non-kerberized cluster.
Auditing jobs
Another advantage that Kerberos can provide is that you can easily tell which user ran a particular job. For example, connect (using SSH) to a kerberized cluster with an Active Directory user, and submit the SparkPi sample application:
$ spark-example SparkPi
After running the SparkPi application, go to the Amazon EMR console and choose your cluster. Then choose the Application history tab. There you can see information about the application, including the user that submitted the job:
Common issues
Although it would be hard to cover every possible Kerberos issue, this section covers some of the more common issues that might occur and ways to fix them.
Issue 1: You can successfully connect to and get authenticated on a cluster. However, whenever you try running a job, it fails with an error similar to the following:
Solution: Make sure that an HDFS home directory for the user was created and that it has the right permissions.
Issue 2: You can successfully connect to the cluster, but you can’t run any Hadoop or HDFS commands.
Solution: Use the klist command to confirm whether the user is authenticated and has a valid Kerberos ticket. Use the kinit command to re-authenticate a user.
Issue 3: You can’t connect (using SSH) to the cluster using Active Directory user credentials, but you can manually authenticate the user with kinit.
Solution: Make sure that the Active Directory domain controller is the DNS server (name server) for the cluster nodes.
Cleaning up
After completing and testing this solution, remember to clean up the resources. If you used the AWS CloudFormation templates to create the resources, then use the AWS CloudFormation console or AWS CLI/SDK to delete the stacks. Deleting a stack also deletes the resources created by that stack.
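For example, with the AWS CLI (replace <stack-name> with the name of each stack you launched):
$ aws cloudformation delete-stack --stack-name <stack-name>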
If one of your stacks does not delete, make sure that there are no dependencies on the resources created by that stack. For example, if you deployed an Amazon VPC using AWS CloudFormation and then deployed a domain controller into that VPC using a different AWS CloudFormation stack, you must first delete the domain controller stack before the VPC stack can be deleted.
Summary
The ability to authenticate users and services with Kerberos not only allows you to secure your big data applications, but it also enables you to easily integrate Amazon EMR clusters with an Active Directory environment. This post showed how you can use Kerberos on Amazon EMR to create a single sign-on solution where Active Directory domain users can seamlessly access Amazon EMR clusters and run big data applications. We also showed how you can use AWS CloudFormation to automate the deployment of this solution.
The following 20 pages were the most viewed AWS Identity and Access Management (IAM) documentation pages in 2017. I have included a brief description with each link to explain what each page covers. Use this list to see what other AWS customers have been viewing and perhaps to pique your own interest in a topic you’ve been meaning to learn about.
What Is IAM? Learn more about IAM, a web service that helps you securely control access to AWS resources for your users. You use IAM to control who can use your AWS resources (authentication) and how they can use resources (authorization).
Creating an IAM User in Your AWS Account You can create one or more IAM users in your AWS account. You might create an IAM user when someone joins your organization, or when you have a new application that needs to make API calls to AWS.
Managing Access Keys for IAM Users Users need their own access keys to make programmatic calls to AWS from the AWS Command Line Interface (AWS CLI), Tools for Windows PowerShell, the AWS SDKs, or direct HTTP calls using the APIs for individual AWS services. To fill this need, you can create, modify, view, or rotate access keys (access key IDs and secret access keys) for IAM users.
IAM JSON Policy Elements Reference Learn more about the elements that you can use when you create a JSON policy. View additional JSON policy examples and learn about conditions, supported data types, and how they are used in various services.
IAM Best Practices To help secure your AWS resources, follow these best practices for IAM.
Using Multi-Factor Authentication (MFA) in AWS For an additional layer of security when signing in to your AWS account, AWS recommends that you configure MFA to help protect your AWS resources. MFA adds extra security because it requires users to enter a unique authentication code from an approved authentication device when they access AWS websites or services.
The IAM Console and the Sign-in Page Learn about the IAM-enabled AWS Management Console sign-in page and how to sign in as an AWS account root user or as an IAM user. To help your users sign in easily, create a unique sign-in URL for your account.
How Users Sign In to Your Account After you create IAM users and passwords for each, your users can sign in to the AWS Management Console using your account ID or alias, or from a special URL that includes your account ID.
Working with Server Certificates Some AWS services can use server certificates that you manage with IAM or AWS Certificate Manager (ACM). ACM is the preferred tool to provision, manage, and deploy your server certificates. Use IAM as a certificate manager only when you must support HTTPS connections in a region that is not supported by ACM.
IAM Roles A role is an AWS identity with permission policies that determine what the identity can and cannot do in AWS using temporary security credentials that are created dynamically and provided to the user. A role is intended to be assumable by anyone who needs it using these temporary security credentials.
IAM Policies Read an overview of policies, which are entities in AWS that, when attached to an identity or resource, define their permissions. Policies are stored in AWS as JSON documents attached to principals as identity-based policies or to resources as resource-based policies.
Example Policies This collection of policies can help you define permissions for your IAM identities, such as granting access to a specific Amazon DynamoDB table or launching Amazon EC2 instances in a specific subnet.
Using an IAM Role to Grant Permissions to Applications Running on Amazon EC2 Instances Use an IAM role to manage temporary credentials for applications that run on an EC2 instance. When you use a role, you do not have to distribute long-term credentials to an EC2 instance. Instead, the role supplies temporary permissions that applications can use when they make calls to other AWS resources.
Creating Your First IAM Admin User and Group As a best practice, do not use the AWS account root user for any task where it’s not required. Instead, learn how to create an IAM administrator user and group for yourself.
Temporary Security Credentials You can use the AWS Security Token Service (AWS STS) to create and provide trusted users with temporary security credentials that can control access to your AWS resources. Temporary security credentials work almost identically to the long-term access key credentials that your IAM users can use.
The AWS Account Root User When you first create an AWS account, you begin with a single sign-in identity that has complete access to all AWS services and resources in the account. This identity is called the AWS account root user and is accessed by signing in with the email address and password that you used to create the account. To manage your root user, follow the steps on this page.
In the “Comments” section below, let us know if you would like to see anything on these or other IAM documentation pages expanded or updated to make them more useful to you.
– Stephenie