Tag Archives: IAM policy

How to centralize and automate IAM policy creation in sandbox, development, and test environments

Post Syndicated from Mahmoud ElZayet original https://aws.amazon.com/blogs/security/how-to-centralize-and-automate-iam-policy-creation-in-sandbox-development-and-test-environments/

To keep pace with AWS innovation, many customers allow their application teams to experiment with AWS services in sandbox environments as they move toward production-ready architecture. These teams need timely access to various sets of AWS services and resources, which means they also need a mechanism to help ensure least privilege is granted. In other words, your application team generally shouldn’t have access to administrative resources, such as an AWS Lambda function that takes periodic Amazon Elastic Block Store snapshot backups, or an Amazon CloudWatch Events rule that sends events to a centralized information security account managed by your security team.

In this blog post, I’ll show you how to create a centralized and automated workflow that creates and validates AWS Identity and Access Management (IAM) policies for application teams working in various sandbox, development, and test environments. Your security developers can customize this workflow according to the specific requirements of your security team. They can create logic to limit the allowed permission sets based on account type or owning team. I’ll use AWS CodePipeline to create and manage a workflow containing various stages and spanning multiple AWS accounts that I’ll describe in more detail in the next section.

Solution overview

I’ll start with this scenario: Alice is an administrator for an AWS sandbox account used by her organization’s data scientists to try out AWS analytics services such as Amazon Athena and Amazon EMR. The data scientists assess the suitability of these services for their production use cases by running sample analytics jobs on portions of real data sets after any sensitive information has been removed. The data sets are stored in an existing Amazon Simple Storage Service (Amazon S3) bucket. For every new project, Alice authors a new IAM policy that allows the project team to access their requested Amazon S3 bucket and create their analytics clusters. However, Alice must follow a company guideline that sandbox accounts can only launch specific Amazon Elastic Compute Cloud (Amazon EC2) instance types. She must also restrict access to all administrative AWS Lambda functions and CloudWatch Events rules that the security team uses to monitor sandbox account compliance. Below is the solution that meets these requirements and makes it easier for Alice and other administrators to perform their tasks.
 

Figure 1: Solution architecture

  1. Alice uses the IAM visual editor to author a template that gives the data science team access to launch and manage EMR clusters that analyze S3-based data sets. She then uploads the IAM JSON policy document into an existing S3 bucket using an AWS Key Management Service (AWS KMS) key. The key and the S3 bucket are already created by the security team as part of account baselining, which I will detail later in this post.
  2. AWS CodePipeline automatically fetches the IAM JSON policy document and invokes a sequence of validation checks that use a single and central Lambda function hosted in an AWS account managed by the security team.
  3. If the IAM JSON policy adheres to all account and general security requirements coded by the security developers, the central Lambda function automatically creates the policy in Alice’s account and the pipeline will succeed. The central validation Lambda function will also attach a set of predefined explicit denies to the IAM policy to ensure that it limits undesired user capabilities in the sandbox account. If the IAM JSON policy fails the checks, the pipeline will fail and provide Alice the specific reason for non-compliance. Alice must then modify the policy and resubmit. When the policy has been successfully created, Alice will attach it to the right IAM user, group, or role.

Solution deployment

This solution includes the following three steps: deploying the solution prerequisites in both accounts, setting up the policy validation Lambda function in the central information security account, and testing the sandbox account pipeline.

Prerequisites

As this solution manages permissions granted to AWS services or IAM entities, I highly recommend that you try the solution first in an isolated test environment to make sure it meets all your security requirements.

  1. You’ll need administrator access in two AWS accounts to set up the solution. The deployment of this solution is typically done by one of your organization’s administrators while setting up new AWS accounts. These are the two account types you’ll need access to:
     

    • A sandbox account. This lets application teams experiment with various AWS architectures. This could be a development or test account, as mentioned earlier.
    • A central information security account. Typically, this is owned by an information security team who monitors and enforces security compliance within a multi-account structure.


Important: Because the Lambda function that you’ll create in the information security account has highly privileged permissions, it’s important to strictly follow best practices for securing the account. You need to limit account access to security team members. Sandbox account administrators should also not give this central Lambda function any IAM permissions in their sandbox account beyond IAM policy creation.

  2. Because you’ll use the AWS Management Console for both AWS accounts, I strongly recommend that you have roles in both AWS accounts and use the console’s Switch Role feature. You can attach an alias to each account and give each a different color code so that you always know which one you’re logged into.
  3. Make sure to use the same AWS region for all the resources that you create for this solution.

Step 1: Deploy the solution prerequisites

Before building the pipeline across the two AWS accounts, you must first configure the required resources in both accounts, such as IAM roles and encryption keys. This configuration is typically done according to your security team’s guidelines when your organization first sets up the sandbox, development, or test environment.

Important

  • In addition to the initial setup you’ll create in this section, your security team must explicitly deny sandbox, development, or test account administrators the ability to attach IAM policies that do not meet the allowed security policies for that account type, such as the AdministratorAccess IAM policy. Moreover, your security team must ensure that no current or future user, group, or role in the account has permission to directly create or update IAM policies through actions such as CreatePolicy, CreatePolicyVersion, PutRolePolicy, PutUserPolicy, PutGroupPolicy, or UpdateAssumeRolePolicy. You want to ensure that permissions can be created only through the automation pipeline, which I’ll show you how to build shortly.
  • Because the solution I’ll be describing focuses on the creation of least privilege permissions, it’s highly advisable that your security team combine the solution with IAM permission boundaries to make sure that any permissions defined in this solution are scoped by a set of pre-defined permissions for every type of account in the organization. For example, your account administrators might only be allowed to create IAM users or roles with a pre-defined set of permission boundaries that limit the permissions attached to those principals. For more information about permission boundaries, please refer to this AWS Security blog post. A minimal sketch of creating a permission boundary follows this list.
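
To make the permission boundaries recommendation concrete, here is a minimal boto3 sketch of how a security team might create a permission boundary and apply it to a new sandbox role. The policy contents, names, and account ID below are placeholders for illustration; they are not part of the solution’s CloudFormation templates.

import json
import boto3

iam = boto3.client('iam')

# Hypothetical permission boundary: caps what any sandbox principal can do,
# regardless of the identity policies attached to it later.
boundary_doc = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["ec2:*", "s3:*", "elasticmapreduce:*", "athena:*"],
        "Resource": "*"
    }]
}

boundary = iam.create_policy(
    PolicyName='sandbox-permissions-boundary',   # placeholder name
    PolicyDocument=json.dumps(boundary_doc),
    Description='Permission boundary for sandbox principals')

# Any role created for the sandbox team is then capped by the boundary.
iam.create_role(
    RoleName='sandbox-data-science-role',        # placeholder name
    AssumeRolePolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111111111111:root"},  # placeholder account
            "Action": "sts:AssumeRole"
        }]
    }),
    PermissionsBoundary=boundary['Policy']['Arn'])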

Create the sandbox account prerequisites

Follow the steps below to deploy an AWS CloudFormation template that will create the following resources in the sandbox account:

  • An S3 bucket where your sandbox administrators will upload IAM policies
  • An IAM role that your automated pipeline will use to access the S3 bucket that stores the IAM policies
  • An AWS KMS key that you will use to encrypt the IAM policies in your S3 bucket
  1. While logged in to your sandbox account in your default browser, select this link to launch an AWS stack with the sandbox environment prerequisites. You’ll be redirected to the CloudFormation console with the template URL already populated.
     
    Figure 2: CloudFormation console with prepopulated URL

  2. Select Next and, optionally, provide a name for your stack. A suggested stack name, Sandbox-Prerequisites, should already be populated.
  3. The template defines an input parameter called CentralAccount that you can populate with the AWS account ID of your security account. For more information on how to find the account ID of your security account, check here.
  4. Select Next, and then select Next again.
  5. To have the stack create the IAM roles that your pipeline will use, select the check box that says I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then select Create Stack.
  6. Select the Stack info tab and refresh periodically while watching the Stack Status field value. After your stack reaches the state CREATE_COMPLETE, navigate to the CloudFormation Outputs tab and copy the following output values to the text editor of your choice. You’ll use these values in subsequent CloudFormation stacks.
     
    Figure 3: CloudFormation Outputs tab

Create the information security account prerequisites

Follow the steps below to deploy a CloudFormation template that will create the following resources in your information security account:

  • An IAM role used by your automated pipeline to invoke your central Lambda function and to provide access to the sandbox account KMS key
  • An IAM role used by the central Lambda function to assume a role in the sandbox account and manage IAM policies
  1. While logged in to your security account in your default browser, select this link to launch an AWS stack with the security environment prerequisites. You’ll be redirected to the CloudFormation console with the template URL already populated.
  2. Select Next and, optionally, provide a name for your stack. A suggested stack name, Sandbox-Prerequisites, should already be populated.
  3. Populate the following input parameter fields:
    • SandboxAccount: The AWS account ID for the sandbox account.
    • ArtifactBucket: The bucket name that you noted in your text editor from the previous stack run in the sandbox account
    • CMKARN: The Amazon Resource Name (ARN) of the KMS key that you noted in your text editor from the previous stack run in the sandbox account
    • PolicyCheckerFunctionName: The name of the Lambda function to be created later. The default value is PolicyChecker
  4. Select Next, and then select Next again.
  5. To have the stack create the IAM roles used by your pipeline, select the box that reads I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then select Create Stack.
  6. Wait until your stack reaches the CREATE_COMPLETE state.

Create the sandbox account pipeline

Now, switch back to your sandbox account and deploy the CloudFormation template that will create the following resources in the sandbox account:

  • An AWS CodePipeline automation pipeline that fetches the IAM policy document from S3 and sends it to the security account for centralized validation. If valid, a Lambda function in the information security account will also create the IAM policy in the sandbox account.
  • An S3 bucket policy to allow your central Lambda function to fetch the IAM policy JSON document from your bucket
  • An IAM role that will be assumed by the Lambda function in the central information security account and used to create IAM policies in the sandbox account. Sandbox account administrators can then attach those IAM policies to the required entities, such as an IAM user or role.
  1. While logged in to your sandbox account in your default browser, select this link to launch an AWS stack with the sandbox account pipeline. You’ll be redirected to the CloudFormation console with the template URL already populated.
  2. Click Next and, optionally, provide a name for your stack. A suggested stack name, Sandbox-Pipeline, should already be populated.
  3. Populate the following input parameter fields:
    • CentralAccount: The AWS account ID of the information security account, without hyphens.
    • ArtifactBucket: The same bucket name that you noted in your text editor earlier and used in the previous stack in the information security account.
    • CMKARN: The ARN of the KMS key that you noted in your text editor earlier and used in the previous stack in the information security account.
    • PolicyCheckerFunctionName: Again, the name of the Lambda function to be created later. It must be the same value you provided to the information security account template.
  4. Select Next, and then select Next again.
  5. To have the stack create the required IAM roles, select the box that reads I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then select Create Stack.
  6. Wait until your stack reaches the CREATE_COMPLETE state.

Step 2: Set up the policy validation Lambda function in the central information security account

In the central information security account, create the Lambda function to validate the IAM policies created in the sandbox environment.

  1. In the AWS Lambda console, select Create Function and then select Author from scratch. Provide values for the following fields:
    • Name. This must be the same function name defined as input parameter PolicyCheckerFunctionName to CloudFormation in step 1, when you set up the information security account prerequisites. If you did not change the default value in step 1, the default is still PolicyChecker.
    • Runtime. Python 2.7.
    • Role. To set the role, select Choose an existing role, and then select the role named policy-checker-lambda-role. This is the role you created in step 1, when you set up the information security account prerequisites.

    Choose Create Function, scroll down to Function Code, and then paste the following code into the editor (replacing the existing code):

    
    #  Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.
    #  Licensed under the Apache License, Version 2.0 (the "License"). You may not
    #  use this file except in compliance with
    #  the License. A copy of the License is located at
    #      http://aws.amazon.com/apache2.0/
    #  or in the "license" file accompanying this file. This file is distributed
    #  on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
    #  either express or implied. See the License for the
    #  specific language governing permissions and
    #  limitations under the License.
    from __future__ import print_function
    import json
    import boto3
    import zipfile
    import tempfile
    import os
    
    print('Loading function')
    PERMISSIVE_ERROR_MSG = """Policy creation request rejected: * permissions not
                             allowed in both actions and resources"""
    GENERAL_ERROR_MSG = """An error has occurred while validating policy.
                            Please contact admin"""
    
    
    def get_template(event, s3, artifact, file_in_zip):
        # Download the pipeline artifact zip from S3 and return the contents
        # of the requested file from inside it.
        bucket = event['CodePipeline.job']['data']['inputArtifacts'][0]['location']['s3Location']['bucketName']
        key = event['CodePipeline.job']['data']['inputArtifacts'][0]['location']['s3Location']['objectKey']
    
        with tempfile.NamedTemporaryFile() as tmp_file:
            s3.download_file(bucket, key, tmp_file.name)
            with zipfile.ZipFile(tmp_file.name, 'r') as zip_file:
                return zip_file.read(file_in_zip)
    
    
    def get_sts_session(event, account, rolename):
        sts = boto3.client("sts")
        RoleArn = str("arn:aws:iam::" + account + ":role/" + rolename)
        response = sts.assume_role(
            RoleArn=RoleArn,
            RoleSessionName='SecurityManageAccountPermissions',
            DurationSeconds=900)
        sts_session = boto3.Session(
            aws_access_key_id=response['Credentials']['AccessKeyId'],
            aws_secret_access_key=response['Credentials']['SecretAccessKey'],
            aws_session_token=response['Credentials']['SessionToken'],
            region_name=os.environ['AWS_REGION'],
            botocore_session=None,
            profile_name=None)
        return (sts_session)
    
    
    def ManagePolicy(event, context):
        # Set boto session to get pipeline artifact from sandbox/dev/test account
        artifact_session = boto3.Session(
            aws_access_key_id=event['CodePipeline.job']['data']
                                   ['artifactCredentials']['accessKeyId'],
            aws_secret_access_key=event['CodePipeline.job']['data']
                                       ['artifactCredentials']['secretAccessKey'],
            aws_session_token=event['CodePipeline.job']['data']
                                   ['artifactCredentials']['sessionToken'],
            region_name=os.environ['AWS_REGION'],
            botocore_session=None,
            profile_name=None)
        # Fetch pipeline artifact from S3
        s3 = artifact_session.client('s3')
        permission_doc = get_template(event, s3, '', 'policy.json')
        metadata_doc = json.loads(get_template(event, s3, '', 'metadata.json'))
        permission_doc_json = json.loads(permission_doc)
        # Assume the central account role in sandbox/dev/test account
        global STS_SESSION
        STS_SESSION = ''  
        STS_SESSION = get_sts_session(
            event, event['CodePipeline.job']['accountId'], 'central-account-role')
        iam = STS_SESSION.client('iam')
        codepipeline = STS_SESSION.client('codepipeline')
        policy_arn = 'arn:aws:iam::' + event['CodePipeline.job']['accountId'] + ':policy/' + metadata_doc['PolicyName']
    
        try:
            # 1.Sample code - Validate policy sent from sandbox/dev/test account:
            # look for * actions and * resources
            for statement in permission_doc_json['Statement']:
                if statement['Action'] == '*' and statement['Resource'] == '*':
                    return codepipeline.put_job_failure_result(
                                        jobId=event['CodePipeline.job']['id'],
                                        failureDetails={
                                            'type': 'JobFailed',
                                            'message': PERMISSIVE_ERROR_MSG})
            # 2.Sample code - Attach any required denies from central
            # pre-defined policy
            iam_local = boto3.client('iam')
            account_id = context.invoked_function_arn.split(":")[4]
            local_policy_arn = 'arn:aws:iam::' + account_id + ':policy/central-deny-policy-sandbox'
            policy_response = iam_local.get_policy(PolicyArn=local_policy_arn)
            policy_version_id = policy_response['Policy']['DefaultVersionId']
            policy_version_doc = iam_local.get_policy_version(
                PolicyArn=local_policy_arn,
                VersionId=policy_version_id)
            for statement in policy_version_doc['PolicyVersion']['Document']['Statement']:
                permission_doc_json['Statement'].append(
                   statement
                )
            # 3. If validated successfully, create policy in
            # sandbox/dev/test account
            iam.create_policy(
                PolicyName=metadata_doc['PolicyName'],
                PolicyDocument=json.dumps(permission_doc_json),
                Description=metadata_doc['PolicyDescription'])
    
            # successful creation, put result back to
            # sandbox/dev/test account pipeline
            codepipeline.put_job_success_result(
                jobId=event['CodePipeline.job']['id'])
        except Exception as e:
            print('Error: ' + str(e))
            codepipeline.put_job_failure_result(
                jobId=event['CodePipeline.job']['id'],
                failureDetails={'type': 'JobFailed', 'message': GENERAL_ERROR_MSG})
    
    def lambda_handler(event, context):
        print(event)
        ManagePolicy(event, context)
    

    This sample code shows how the Lambda function checks the IAM JSON policy submitted by Alice and rejects policies that are too permissive because they allow all actions on all account resources. The sample code also appends the statements of a predefined central deny policy; in this example, that policy contains an IAM deny that prevents the launch of Amazon EC2 instances that are not part of the T2 EC2 instance family. An explicit deny here ensures that only T2 instances can be launched. A sketch of one possible version of that deny policy appears after this list. Your security developers should author code similar to this sample code in order to meet the security policies of every account type and control the IAM policies created in various sandbox, development, and test environments.

  2. Before saving your new Lambda function code, scroll further down to the Basic Settings section and increase the function timeout to 10 seconds.
  3. Select Save.
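
The sample code references a managed policy named central-deny-policy-sandbox in the information security account but doesn’t show its contents. As one hedged example of what it could contain, the boto3 sketch below creates it with a single deny statement that blocks ec2:RunInstances for any instance type outside the T2 family; your security team would substitute its own set of denies.

import json
import boto3

iam = boto3.client('iam')

# One possible body for central-deny-policy-sandbox: deny launching any
# EC2 instance whose type is not in the T2 family.
deny_doc = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyNonT2Launches",
        "Effect": "Deny",
        "Action": "ec2:RunInstances",
        "Resource": "arn:aws:ec2:*:*:instance/*",
        "Condition": {
            "StringNotLike": {"ec2:InstanceType": "t2.*"}
        }
    }]
}

iam.create_policy(
    PolicyName='central-deny-policy-sandbox',
    PolicyDocument=json.dumps(deny_doc),
    Description='Explicit denies appended to every sandbox IAM policy')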

Step 3: Test the sandbox account pipeline

Now it’s time to deploy the solution in your sandbox account.

  1. Create the following files and compress them into an archive with the name policy.zip (this is the name expected by your created pipeline).
    • metadata.json: This file contains metadata like the name and description of the IAM policy to be created.
      
      {
        "PolicyDescription": "ec2 start permission policy",
        "PolicyName": "Ec2RunTeamA"
      }

    • policy.json: This file contains the JSON body of the IAM policy to be created.
      
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "EC2Run",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": "*"
          }
        ]
      }

  2. To upload your policy.zip file to the bucket you created earlier, go to the Amazon S3 console in the sandbox account and, in the search box at the top of the page, search for the bucket you noted in your text editor earlier as ArtifactBucket.
  3. When you locate your bucket, select the bucket name, and then select Upload. The upload dialog will appear.
  4. Select Add Files and navigate to the folder with the policy.zip file. Select the file, select Open, select Next, and then select Next again.
     
    Figure 4: S3 upload dialog

  5. Select the AWS KMS master-key radio button, and then select the KMS key that has the alias codepipeline-policy-crossaccounts.
     
    Figure 5: Selecting the KMS key

  6. Select Next, and then select Upload.
  7. Go to AWS CodePipeline console, select your sandbox pipeline, and wait for the pipeline to start running. It can take up to a minute for it to start.
     
    Figure 6: AWS CodePipeline console

  8. Wait for your pipeline to complete. There should be no validation errors for the IAM policy you just uploaded and your IAM policy should be successfully created. To view the newly created IAM policy, open the AWS IAM console.
  9. Select Policies on the left and search for the policy with the name defined in the metadata.json file.
     
    Figure 7: Viewing your new policy

  10. Select the policy name. Note the IAM deny that was automatically added to your defined policy.
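
If you prefer to script the packaging and upload steps instead of using the console, a boto3 sketch along these lines could produce and upload policy.zip. The bucket name is the ArtifactBucket output you noted earlier, and the object key policy.zip is an assumption about how the pipeline source is configured.

import json
import zipfile
import boto3

BUCKET = 'your-artifact-bucket'   # ArtifactBucket output from the prerequisites stack
KMS_KEY = 'alias/codepipeline-policy-crossaccounts'

metadata = {"PolicyDescription": "ec2 start permission policy",
            "PolicyName": "Ec2RunTeamA"}
policy = {"Version": "2012-10-17",
          "Statement": [{"Sid": "EC2Run",
                         "Effect": "Allow",
                         "Action": "ec2:RunInstances",
                         "Resource": "*"}]}

# Package the two files exactly as the pipeline expects them.
with zipfile.ZipFile('policy.zip', 'w') as archive:
    archive.writestr('metadata.json', json.dumps(metadata))
    archive.writestr('policy.json', json.dumps(policy))

# Upload with SSE-KMS so the object is encrypted with the cross-account key.
boto3.client('s3').upload_file(
    'policy.zip', BUCKET, 'policy.zip',
    ExtraArgs={'ServerSideEncryption': 'aws:kms', 'SSEKMSKeyId': KMS_KEY})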

If you’d like to test the pipeline further, you can modify the policy to permit all actions on all resources. When policy.zip is uploaded again, the pipeline should return the following error:


Policy creation request rejected: * permissions not allowed in both actions and resources

If you encounter any errors as you modify your Lambda function code, you can always go back to the Lambda function logs in the central information security account. For more information on how to access Lambda function logs, please refer to the documentation.

The same logic used here can be extended to other sandbox, development, or test environments. However, for the central information security account, the existing roles will need to be updated to trust and have access to the resources in the newly added sandbox, development, or test account.
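
As one hedged illustration of that last point, the sketch below adds an inline policy to the central Lambda execution role (policy-checker-lambda-role, from Step 2) so it can also assume the cross-account role that the sample code calls central-account-role in a newly added account. The account ID is a placeholder, and the new account still needs its own prerequisite and pipeline stacks.

import json
import boto3

iam = boto3.client('iam')

NEW_ACCOUNT_ID = '222222222222'   # placeholder: the newly added sandbox/dev/test account

# Allow the central policy-checker Lambda role to assume the cross-account
# role in the new account.
iam.put_role_policy(
    RoleName='policy-checker-lambda-role',
    PolicyName='assume-central-account-role-' + NEW_ACCOUNT_ID,
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::" + NEW_ACCOUNT_ID + ":role/central-account-role"
        }]
    }))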

Summary

In this blog post, I showed you how to centralize the validation and creation of IAM policies across various AWS accounts. This allows your security developers to start coding your security best practices, permitting automatic creation and validation of IAM policies across your various sandbox, development, and test accounts. Account administrators can then attach those validated IAM policies to the required IAM users, groups, or roles. This process strikes a balance between agility and control: it empowers your account administrators to create compliant, least-privilege IAM policies, while also allowing your application teams to keep experimenting and innovating quickly. If you have feedback about this blog post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Mahmoud ElZayet

Mahmoud is a Global Accounts Solutions Architect at AWS. He works with large enterprise customers providing guidance and technical assistance for building cloud solutions. Mahmoud is passionate about DevOps and Cloud Compliance topics. Outside of work, he enjoys exploring new places with his wife and two kids.

Easier way to control access to AWS regions using IAM policies

Post Syndicated from Sulay Shah original https://aws.amazon.com/blogs/security/easier-way-to-control-access-to-aws-regions-using-iam-policies/

We made it easier for you to comply with regulatory standards by controlling access to AWS Regions using IAM policies. For example, if your company requires users to create resources in a specific AWS region, you can now add a new condition to the IAM policies you attach to your IAM principal (user or role) to enforce this for all AWS services. In this post, I review conditions in policies, introduce the new condition, and review a policy example to demonstrate how you can control access across multiple AWS services to a specific region.

Condition concepts

Before I introduce the new condition, let’s review the condition element of an IAM policy. A condition is an optional IAM policy element that lets you specify special circumstances under which the policy grants or denies permission. A condition includes a condition key, operator, and value for the condition. There are two types of conditions: service-specific conditions and global conditions. Service-specific conditions are specific to certain actions in an AWS service. For example, the condition key ec2:InstanceType supports specific EC2 actions. Global conditions support all actions across all AWS services.
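
As a concrete, hypothetical illustration of a service-specific condition, the statement below (shown as a Python dictionary) allows ec2:RunInstances only when the requested instance type is t2.micro:

# Hypothetical statement using the service-specific ec2:InstanceType condition
# key: allow RunInstances only for t2.micro instances.
statement = {
    "Effect": "Allow",
    "Action": "ec2:RunInstances",
    "Resource": "arn:aws:ec2:*:*:instance/*",
    "Condition": {
        "StringEquals": {"ec2:InstanceType": "t2.micro"}
    }
}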

Now that I’ve reviewed the condition element in an IAM policy, let me introduce the new condition.

aws:RequestedRegion condition key

The new global condition key, aws:RequestedRegion, supports all actions across all AWS services. You can use any string operator and specify any AWS region for its value.

Condition key: aws:RequestedRegion
Description: Allows you to specify the region to which the IAM principal (user or role) can make API calls
Operator(s): All string operators (for example, StringEquals)
Value: Any AWS region (for example, us-east-1)

I’ll now demonstrate the use of the new global condition key.

Example: Policy with region-level control

Let’s say a group of software developers in my organization is working on a project using Amazon EC2 and Amazon RDS. The project requires a web server running on an EC2 instance using Amazon Linux and a MySQL database instance in RDS. The developers also want to test AWS Lambda, an event-driven platform, to retrieve data from the MySQL DB instance in RDS for future use.

My organization requires all the AWS resources to remain in the Frankfurt, eu-central-1, region. To make sure this project follows these guidelines, I create a single IAM policy for all the AWS services that this group is going to use and apply the new global condition key aws:RequestedRegion for all the services. This way I can ensure that any new EC2 instances launched or any database instances created using RDS are in Frankfurt. This policy also ensures that any Lambda functions this group creates for testing are also in the Frankfurt region.


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeAccountAttributes",
                "ec2:DescribeAvailabilityZones",
                "ec2:DescribeInternetGateways",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcAttribute",
                "ec2:DescribeVpcs",
                "ec2:DescribeInstances",
                "ec2:DescribeImages",
                "ec2:DescribeKeyPairs",
                "rds:Describe*",
                "iam:ListRolePolicies",
                "iam:ListRoles",
                "iam:GetRole",
                "iam:ListInstanceProfiles",
                "iam:AttachRolePolicy",
                "lambda:GetAccountSettings"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:RunInstances",
                "rds:CreateDBInstance",
                "rds:CreateDBCluster",
                "lambda:CreateFunction",
                "lambda:InvokeFunction"
            ],
            "Resource": "*",
      "Condition": {"StringEquals": {"aws:RequestedRegion": "eu-central-1"}}

        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": "arn:aws:iam::account-id:role/*"
        }
    ]
}

The first statement in the above example contains all the read-only actions that let my developers use the console for EC2, RDS, and Lambda. The permissions for IAM-related actions are required to launch EC2 instances with a role, enable enhanced monitoring in RDS, and for AWS Lambda to assume the IAM execution role to execute the Lambda function. I’ve combined all the read-only actions into a single statement for simplicity. The second statement is where I give write access to my developers for the three services and restrict the write access to the Frankfurt region using the aws:RequestedRegion condition key. You can also list multiple AWS regions with the new condition key if your developers are allowed to create resources in multiple regions. The third statement grants permissions for the IAM action iam:PassRole required by AWS Lambda. For more information on allowing users to create a Lambda function, see Using Identity-Based Policies for AWS Lambda.
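
Before attaching a policy like this, one way to sanity-check it is the IAM policy simulator. The boto3 sketch below (assuming the policy above is saved locally as region-policy.json, a hypothetical file name) simulates ec2:RunInstances with different values of aws:RequestedRegion; you should see an allow for eu-central-1 and an implicit deny for other regions.

import boto3

iam = boto3.client('iam')

with open('region-policy.json') as f:
    policy_doc = f.read()

for region in ('eu-central-1', 'us-east-1'):
    result = iam.simulate_custom_policy(
        PolicyInputList=[policy_doc],
        ActionNames=['ec2:RunInstances'],
        ContextEntries=[{
            'ContextKeyName': 'aws:RequestedRegion',
            'ContextKeyValues': [region],
            'ContextKeyType': 'string'
        }])
    decision = result['EvaluationResults'][0]['EvalDecision']
    print(region, '->', decision)   # expect "allowed" / "implicitDeny"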

Summary

You can now use the aws:RequestedRegion global condition key in your IAM policies to specify the region to which the IAM principal (user or role) can invoke an API call. This capability makes it easier for you to restrict the AWS regions your IAM principals can use to comply with regulatory standards and improve account security. For more information about this global condition key and policy examples using aws:RequestedRegion, see the IAM documentation.

If you have comments about this post, submit them in the Comments section below. If you have questions about or suggestions for this solution, start a new thread on the IAM forum.

Want more AWS Security news? Follow us on Twitter.

How to Patch Linux Workloads on AWS

Post Syndicated from Koen van Blijderveen original https://aws.amazon.com/blogs/security/how-to-patch-linux-workloads-on-aws/

Most malware tries to compromise your systems by using a known vulnerability that the operating system maker has already patched. As best practices to help prevent malware from affecting your systems, you should apply all operating system patches and actively monitor your systems for missing patches.

In this blog post, I show you how to patch Linux workloads using AWS Systems Manager. To accomplish this, I will show you how to use the AWS Command Line Interface (AWS CLI) to:

  1. Launch an Amazon EC2 instance for use with Systems Manager.
  2. Configure Systems Manager to patch your Amazon EC2 Linux instances.

In two previous blog posts (Part 1 and Part 2), I showed how to use the AWS Management Console to perform the necessary steps to patch, inspect, and protect Microsoft Windows workloads. You can implement those same processes for your Linux instances running in AWS by changing the instance tags and types shown in the previous blog posts.

Because most Linux system administrators are more familiar with using a command line, I show how to patch Linux workloads by using the AWS CLI in this blog post. The steps to use the Amazon EBS Snapshot Scheduler and Amazon Inspector are identical for both Microsoft Windows and Linux.

What you should know first

To follow along with the solution in this post, you need one or more Amazon EC2 instances. You may use existing instances or create new instances. For this post, I assume you are using an Amazon EC2 instance running Amazon Linux, launched from an Amazon Machine Image (AMI).

Systems Manager is a collection of capabilities that helps you automate management tasks for AWS-hosted instances on Amazon EC2 and your on-premises servers. In this post, I use Systems Manager for two purposes: to run remote commands and apply operating system patches. To learn about the full capabilities of Systems Manager, see What Is AWS Systems Manager?

As of Amazon Linux 2017.09, the AMI comes preinstalled with the Systems Manager agent. Systems Manager Patch Manager also supports Red Hat and Ubuntu. To install the agent on these Linux distributions or an older version of Amazon Linux, see Installing and Configuring SSM Agent on Linux Instances.

If you are not familiar with how to launch an Amazon EC2 instance, see Launching an Instance. I also assume you launched or will launch your instance in a private subnet. You must make sure that the Amazon EC2 instance can connect to the internet using a network address translation (NAT) instance or NAT gateway to communicate with Systems Manager. The following diagram shows how you should structure your VPC.

Diagram showing how to structure your VPC

Later in this post, you will assign tasks to a maintenance window to patch your instances with Systems Manager. To do this, the IAM user you are using for this post must have the iam:PassRole permission. This permission allows the IAM user assigning tasks to pass an IAM role (and its permissions) to the AWS service. In this example, when you assign a task to a maintenance window, IAM passes your credentials to Systems Manager. You also should authorize your IAM user to use Amazon EC2 and Systems Manager. As mentioned before, you will be using the AWS CLI for most of the steps in this blog post. Our documentation shows you how to get started with the AWS CLI. Make sure you have the AWS CLI installed and configured with an AWS access key and secret access key that belong to an IAM user that has the following AWS managed policies attached: AmazonEC2FullAccess and AmazonSSMFullAccess.
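
Although this post uses the AWS CLI for the walkthrough, here is a hedged boto3 sketch of that IAM user setup; the user name, account ID, and role ARN are placeholders, and the MaintenanceWindowRole referenced in the inline policy is created later in this post.

import json
import boto3

iam = boto3.client('iam')
USER = 'patching-admin'   # placeholder IAM user name

# Attach the two AWS managed policies mentioned above.
for policy_arn in ('arn:aws:iam::aws:policy/AmazonEC2FullAccess',
                   'arn:aws:iam::aws:policy/AmazonSSMFullAccess'):
    iam.attach_user_policy(UserName=USER, PolicyArn=policy_arn)

# Allow the user to pass the maintenance window role (created later in this
# post) to Systems Manager when registering tasks.
iam.put_user_policy(
    UserName=USER,
    PolicyName='pass-maintenance-window-role',
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::123456789012:role/MaintenanceWindowRole"  # placeholder account
        }]
    }))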

Step 1: Launch an Amazon EC2 Linux instance

In this section, I show you how to launch an Amazon EC2 instance so that you can use Systems Manager with the instance. This step requires you to do three things:

  1. Create an IAM role for Systems Manager before launching your Amazon EC2 instance.
  2. Launch your Amazon EC2 instance with Amazon EBS and the IAM role for Systems Manager.
  3. Add tags to the instances so that you can add your instances to a Systems Manager maintenance window based on tags.

A. Create an IAM role for Systems Manager

Before launching an Amazon EC2 instance, I recommend that you first create an IAM role for Systems Manager, which you will use to update the Amazon EC2 instance. AWS already provides a preconfigured policy that you can use for the new role and it is called AmazonEC2RoleforSSM.

  1. Create a JSON file named trustpolicy-ec2ssm.json that contains the following trust policy. This policy describes which principal (an entity that can take action on an AWS resource) is allowed to assume the role we are going to create. In this example, the principal is the Amazon EC2 service.
    {
      "Version": "2012-10-17",
      "Statement": {
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole"
      }
    }

  2. Use the following command to create a role named EC2SSM that uses the trust policy you just created. If the command is successful, it generates JSON-based output that describes the role and its parameters.
    $ aws iam create-role --role-name EC2SSM --assume-role-policy-document file://trustpolicy-ec2ssm.json

  3. Use the following command to attach the AWS managed IAM policy (AmazonEC2RoleforSSM) to your newly created role.
    $ aws iam attach-role-policy --role-name EC2SSM --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM

  4. Use the following commands to create the IAM instance profile and add the role to the instance profile. The instance profile is needed to attach the role you created earlier to your Amazon EC2 instance.
    $ aws iam create-instance-profile --instance-profile-name EC2SSM-IP
    $ aws iam add-role-to-instance-profile --instance-profile-name EC2SSM-IP --role-name EC2SSM

B. Launch your Amazon EC2 instance

To follow along, you need an Amazon EC2 instance that is running Amazon Linux. You can use any existing instance you may have or create a new instance.

To launch a new Amazon EC2 instance (or attach the instance profile to an existing one):

  1. Use the following command to launch a new Amazon EC2 instance using an Amazon Linux AMI available in the US East (N. Virginia) Region (also known as us-east-1). Replace YourKeyPair and YourSubnetId with your information. For more information about creating a key pair, see the create-key-pair documentation. Write down the InstanceId that is in the output because you will need it later in this post.
    $ aws ec2 run-instances --image-id ami-cb9ec1b1 --instance-type t2.micro --key-name YourKeyPair --subnet-id YourSubnetId --iam-instance-profile Name=EC2SSM-IP

  2. If you are using an existing Amazon EC2 instance, you can use the following command to attach the instance profile you created earlier to your instance.
    $ aws ec2 associate-iam-instance-profile --instance-id YourInstanceId --iam-instance-profile Name=EC2SSM-IP

C. Add tags

The final step of configuring your Amazon EC2 instances is to add tags. You will use these tags to configure Systems Manager in Step 2 of this post. For this example, I add a tag named Patch Group and set the value to Linux Servers. I could have other groups of Amazon EC2 instances that I treat differently by having the same tag name but a different tag value. For example, I might have a collection of other servers with the tag name Patch Group with a value of Web Servers.

  • Use the following command to add the Patch Group tag to your Amazon EC2 instance.
    $ aws ec2 create-tags --resources YourInstanceId --tags Key="Patch Group",Value="Linux Servers"

Note: You must wait a few minutes until the Amazon EC2 instance is available before you can proceed to the next section. To make sure your Amazon EC2 instance is online and ready, you can use the following AWS CLI command:

$ aws ec2 describe-instance-status --instance-ids YourInstanceId

At this point, you now have at least one Amazon EC2 instance you can use to configure Systems Manager.

Step 2: Configure Systems Manager

In this section, I show you how to configure and use Systems Manager to apply operating system patches to your Amazon EC2 instances, and how to manage patch compliance.

To start, I provide some background information about Systems Manager. Then, I cover how to:

  1. Create the Systems Manager IAM role so that Systems Manager is able to perform patch operations.
  2. Create a Systems Manager patch baseline and associate it with your instance to define which patches Systems Manager should apply.
  3. Define a maintenance window to make sure Systems Manager patches your instance when you tell it to.
  4. Monitor patch compliance to verify the patch state of your instances.

You must meet two prerequisites to use Systems Manager to apply operating system patches. First, you must attach the IAM role you created in the previous section, EC2SSM, to your Amazon EC2 instance. Second, you must install the Systems Manager agent on your Amazon EC2 instance. If you have used a recent Amazon Linux AMI, Amazon has already installed the Systems Manager agent on your Amazon EC2 instance. You can confirm this by logging in to an Amazon EC2 instance and checking the Systems Manager agent log files that are located at /var/log/amazon/ssm/.

To install the Systems Manager agent on an instance that does not have the agent preinstalled or if you want to use the Systems Manager agent on your on-premises servers, see Installing and Configuring the Systems Manager Agent on Linux Instances. If you forgot to attach the newly created role when launching your Amazon EC2 instance or if you want to attach the role to already running Amazon EC2 instances, see Attach an AWS IAM Role to an Existing Amazon EC2 Instance by Using the AWS CLI or use the AWS Management Console.

A. Create the Systems Manager IAM role

For a maintenance window to be able to run any tasks, you must create a new role for Systems Manager. This role is a different kind of role than the one you created earlier: this role will be used by Systems Manager instead of Amazon EC2. Earlier, you created the role, EC2SSM, with the policy, AmazonEC2RoleforSSM, which allowed the Systems Manager agent on your instance to communicate with Systems Manager. In this section, you need a new role with the policy, AmazonSSMMaintenanceWindowRole, so that the Systems Manager service can execute commands on your instance.

To create the new IAM role for Systems Manager:

  1. Create a JSON file named trustpolicy-maintenancewindowrole.json that contains the following trust policy. This policy describes which principal is allowed to assume the role you are going to create. This trust policy allows not only Amazon EC2 to assume this role, but also Systems Manager.
    {
       "Version":"2012-10-17",
       "Statement":[
          {
             "Sid":"",
             "Effect":"Allow",
             "Principal":{
                "Service":[
                   "ec2.amazonaws.com",
                   "ssm.amazonaws.com"
               ]
             },
             "Action":"sts:AssumeRole"
          }
       ]
    }

  2. Use the following command to create a role named MaintenanceWindowRole that uses the trust policy you just created. If the command is successful, it generates JSON-based output that describes the role and its parameters.
    $ aws iam create-role --role-name MaintenanceWindowRole --assume-role-policy-document file://trustpolicy-maintenancewindowrole.json

  3. Use the following command to attach the AWS managed IAM policy (AmazonSSMMaintenanceWindowRole) to your newly created role.
    $ aws iam attach-role-policy --role-name MaintenanceWindowRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonSSMMaintenanceWindowRole

B. Create a Systems Manager patch baseline and associate it with your instance

Next, you will create a Systems Manager patch baseline and associate it with your Amazon EC2 instance. A patch baseline defines which patches Systems Manager should apply to your instance. Before you can associate the patch baseline with your instance, though, you must determine if Systems Manager recognizes your Amazon EC2 instance. Use the following command to list all instances managed by Systems Manager. The --filters option ensures you look only for your newly created Amazon EC2 instance.

$ aws ssm describe-instance-information --filters Key=InstanceIds,Values=YourInstanceId

{
    "InstanceInformationList": [
        {
            "IsLatestVersion": true,
            "ComputerName": "ip-10-50-2-245",
            "PingStatus": "Online",
            "InstanceId": "YourInstanceId",
            "IPAddress": "10.50.2.245",
            "ResourceType": "EC2Instance",
            "AgentVersion": "2.2.120.0",
            "PlatformVersion": "2017.09",
            "PlatformName": "Amazon Linux AMI",
            "PlatformType": "Linux",
            "LastPingDateTime": 1515759143.826
        }
    ]
}

If your instance is missing from the list, verify that:

  1. Your instance is running.
  2. You attached the Systems Manager IAM role, EC2SSM.
  3. You deployed a NAT gateway in your public subnet to ensure your VPC reflects the diagram shown earlier in this post so that the Systems Manager agent can connect to the Systems Manager internet endpoint.
  4. The Systems Manager agent logs don’t include any unaddressed errors.

Now that you have checked that Systems Manager can manage your Amazon EC2 instance, it is time to create a patch baseline. With a patch baseline, you define which patches are approved to be installed on all Amazon EC2 instances associated with the patch baseline. The Patch Group resource tag you defined earlier will determine to which patch group an instance belongs. If you do not specifically define a patch baseline, the default AWS-managed patch baseline is used.

To create a patch baseline:

  1. Use the following command to create a patch baseline named AmazonLinuxServers. With approval rules, you can determine the approved patches that will be included in your patch baseline. In this example, you add all Critical severity patches to the patch baseline as soon as they are released, by setting the Auto approval delay to 0 days. By setting the Auto approval delay to 2 days, you add to this patch baseline the Important, Medium, and Low severity patches two days after they are released.
    $ aws ssm create-patch-baseline --name "AmazonLinuxServers" --description "Baseline containing all updates for Amazon Linux" --operating-system AMAZON_LINUX --approval-rules "PatchRules=[{PatchFilterGroup={PatchFilters=[{Values=[Critical],Key=SEVERITY}]},ApproveAfterDays=0,ComplianceLevel=CRITICAL},{PatchFilterGroup={PatchFilters=[{Values=[Important,Medium,Low],Key=SEVERITY}]},ApproveAfterDays=2,ComplianceLevel=HIGH}]"
    
    {
        "BaselineId": "YourBaselineId"
    }

  2. Use the following command to register the patch baseline you created with your instance. To do so, you use the Patch Group tag that you added to your Amazon EC2 instance.
    $ aws ssm register-patch-baseline-for-patch-group --baseline-id YourPatchBaselineId --patch-group "Linux Servers"
    
    {
        "PatchGroup": "Linux Servers",
        "BaselineId": "YourBaselineId"
    }

C.  Define a maintenance window

Now that you have successfully set up a role, created a patch baseline, and registered your Amazon EC2 instance with your patch baseline, you will define a maintenance window so that you can control when your Amazon EC2 instances will receive patches. By creating multiple maintenance windows and assigning them to different patch groups, you can make sure your Amazon EC2 instances do not all reboot at the same time.

To define a maintenance window:

  1. Use the following command to define a maintenance window. In this example command, the maintenance window will start every Saturday at 10:00 P.M. UTC. It will have a duration of 4 hours and will not start any new tasks 1 hour before the end of the maintenance window.
    $ aws ssm create-maintenance-window --name SaturdayNight --schedule "cron(0 0 22 ? * SAT *)" --duration 4 --cutoff 1 --allow-unassociated-targets
    
    {
        "WindowId": "YourMaintenanceWindowId"
    }

For more information about defining a cron-based schedule for maintenance windows, see Cron and Rate Expressions for Maintenance Windows.

  2. After defining the maintenance window, you must register the Amazon EC2 instance with the maintenance window so that Systems Manager knows which Amazon EC2 instance it should patch in this maintenance window. You can register the instance by using the same Patch Group tag you used to associate the Amazon EC2 instance with the AWS-provided patch baseline, as shown in the following command.
    $ aws ssm register-target-with-maintenance-window --window-id YourMaintenanceWindowId --resource-type INSTANCE --targets "Key=tag:Patch Group,Values=Linux Servers"
    
    {
        "WindowTargetId": "YourWindowTargetId"
    }

  3. Assign a task to the maintenance window that will install the operating system patches on your Amazon EC2 instance. The command includes the following options.
    1. name is the name of your task and is optional. I named mine Patching.
    2. task-arn is the name of the task document you want to run.
    3. max-concurrency allows you to specify how many of your Amazon EC2 instances Systems Manager should patch at the same time. max-errors determines when Systems Manager should abort the task. For patching, this number should not be too low, because you do not want your entire patch task to stop on all instances if one instance fails. You can set this, for example, to 20%.
    4. service-role-arn is the Amazon Resource Name (ARN) of the AmazonSSMMaintenanceWindowRole role you created earlier in this blog post.
    5. task-invocation-parameters defines the parameters that are specific to the AWS-RunPatchBaseline task document and tells Systems Manager that you want to install patches with a timeout of 600 seconds (10 minutes).
      $ aws ssm register-task-with-maintenance-window --name "Patching" --window-id "YourMaintenanceWindowId" --targets "Key=WindowTargetIds,Values=YourWindowTargetId" --task-arn AWS-RunPatchBaseline --service-role-arn "arn:aws:iam::123456789012:role/MaintenanceWindowRole" --task-type "RUN_COMMAND" --task-invocation-parameters "RunCommand={Comment=,TimeoutSeconds=600,Parameters={SnapshotId=[''],Operation=[Install]}}" --max-concurrency "500" --max-errors "20%"
      
      {
          "WindowTaskId": "YourWindowTaskId"
      }

Now, you must wait for the maintenance window to run at least once according to the schedule you defined earlier. If your maintenance window has expired, you can check the status of any maintenance tasks Systems Manager has performed by using the following command.

$ aws ssm describe-maintenance-window-executions --window-id "YourMaintenanceWindowId"

{
    "WindowExecutions": [
        {
            "Status": "SUCCESS",
            "WindowId": "YourMaintenanceWindowId",
            "WindowExecutionId": "b594984b-430e-4ffa-a44c-a2e171de9dd3",
            "EndTime": 1515766467.487,
            "StartTime": 1515766457.691
        }
    ]
}

D.  Monitor patch compliance

You also can see the overall patch compliance of all Amazon EC2 instances using the following command in the AWS CLI.

$ aws ssm list-compliance-summaries

This command returns JSON output showing, for each compliance category, the number of instances that are compliant and the number that are not.
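
If you want to work with that output programmatically, a small boto3 sketch like this one mirrors the CLI command and prints the compliant and noncompliant instance counts per compliance type:

import boto3

ssm = boto3.client('ssm')

# Mirror "aws ssm list-compliance-summaries": print compliant and
# noncompliant instance counts for each compliance type (for example, Patch).
response = ssm.list_compliance_summaries()
for item in response['ComplianceSummaryItems']:
    print(item['ComplianceType'],
          'compliant:', item['CompliantSummary']['CompliantCount'],
          'noncompliant:', item['NonCompliantSummary']['NonCompliantCount'])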

You also can see overall patch compliance by choosing Compliance under Insights in the navigation pane of the Systems Manager console. You will see a visual representation of how many Amazon EC2 instances are up to date, how many Amazon EC2 instances are noncompliant, and how many Amazon EC2 instances are compliant in relation to the earlier defined patch baseline.

Screenshot of the Compliance page of the Systems Manager console

In this section, you have set everything up for patch management on your instance. Now you know how to patch your Amazon EC2 instance in a controlled manner and how to check if your Amazon EC2 instance is compliant with the patch baseline you have defined. Of course, I recommend that you apply these steps to all Amazon EC2 instances you manage.

Summary

In this blog post, I showed how to use Systems Manager to create a patch baseline and maintenance window to keep your Amazon EC2 Linux instances up to date with the latest security patches. Remember that by creating multiple maintenance windows and assigning them to different patch groups, you can make sure your Amazon EC2 instances do not all reboot at the same time.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about or issues implementing any part of this solution, start a new thread on the Amazon EC2 forum or contact AWS Support.

– Koen

Building Blocks of Amazon ECS

Post Syndicated from Tiffany Jernigan original https://aws.amazon.com/blogs/compute/building-blocks-of-amazon-ecs/

So, what’s Amazon Elastic Container Service (ECS)? ECS is a managed service for running containers on AWS, designed to make it easy to run applications in the cloud without worrying about configuring the environment for your code to run in. Using ECS, you can easily deploy containers to host a simple website or run complex distributed microservices using thousands of containers.

Getting started with ECS isn’t too difficult. To fully understand how it works and how you can use it, it helps to understand the basic building blocks of ECS and how they fit together!

Let’s begin with an analogy

Imagine you’re in a virtual reality game with blocks and portals, in which your task is to build kingdoms.

In your spaceship, you pull up a holographic map of your upcoming destination: Nozama, a golden-orange planet. Looking at its various regions, you see that the nearest one is za-southwest-1 (SW Nozama). You set your destination, and use your jump drive to jump to the outer atmosphere of za-southwest-1.

As you approach SW Nozama, you see three portals, 1a, 1b, and 1c. Each portal lets you transport directly to an isolated zone (Availability Zone), where you can start construction on your new kingdom (cluster), Royaume.

With your supply of blocks, you take the portal to 1b, and erect the surrounding walls of your first territory (instance)*.

Before you get ahead of yourself, there are some rules to keep in mind. For your territory to be a part of Royaume, the land ordinance requires construction of a building (container), specifically a castle, from which your territory’s lord (agent)* rules.

You can then create architectural plans (task definitions) to build your developments (tasks), consisting of up to 10 buildings per plan. A development can be built now within this or any territory, or multiple territories.

If you do decide to create more territories, you can either stay here in 1b or take a portal to another location in SW Nozama and start building there.

Amazon EC2 building blocks

We currently provide two launch types: EC2 and Fargate. With Fargate, the Amazon EC2 instances are abstracted away and managed for you. Instead of worrying about ECS container instances, you can just worry about tasks. In this post, the infrastructure components used by ECS that are handled by Fargate are marked with a *.

Instance*

EC2 instances are good ol’ virtual machines (VMs). And yes, don’t worry, you can connect to them (via SSH). Because customers have varying needs in memory, storage, and computing power, many different instance types are offered. Just want to run a small application or try a free trial? Try t2.micro. Want to run memory-optimized workloads? R3 and X1 instances are a couple options. There are many more instance types as well, which cater to various use cases.

AMI*

Sorry if you wanted to immediately march forward, but before you create your instance, you need to choose an AMI. An AMI stands for Amazon Machine Image. What does that mean? Basically, an AMI provides the information required to launch an instance: root volume, launch permissions, and volume-attachment specifications. You can find and choose a Linux or Windows AMI provided by AWS, the user community, the AWS Marketplace (for example, the Amazon ECS-Optimized AMI), or you can create your own.

Region

AWS is divided into regions that are geographic areas around the world (for now it’s just Earth, but maybe someday…). These regions have semi-evocative names such as us-east-1 (N. Virginia), us-west-2 (Oregon), eu-central-1 (Frankfurt), ap-northeast-1 (Tokyo), etc.

Each region is designed to be completely isolated from the others, and consists of multiple, distinct data centers. This limits the “blast radius” of a failure, so that even if an entire region goes down, the others aren’t affected. Like many AWS services, to start using ECS, you first need to decide the region in which to operate. Typically, this is the region nearest to you or your users.

Availability Zone

AWS regions are subdivided into Availability Zones. A region has at minimum two zones, and up to a handful. Zones are physically isolated from each other, spanning one or more different data centers, but are connected through low-latency, fiber-optic networking, and share some common facilities. EC2 is designed so that the most common failures only affect a single zone to prevent region-wide outages. This means you can achieve high availability in a region by spanning your services across multiple zones and distributing across hosts.

Amazon ECS building blocks

Container

Well, without containers, ECS wouldn’t exist!

Are containers virtual machines?
Nope! Virtual machines virtualize the hardware (benefits), while containers virtualize the operating system (even more benefits!). If you look inside a container, you would see that it is made up of processes running on the host, tied together by kernel constructs like namespaces, cgroups, etc. But you don’t need to worry about that level of detail, at least not in this post!

Why containers?
Containers give you the ability to build, ship, and run your code anywhere!

Before the cloud, you had to host everything yourself: buy the machines, set up and configure the operating system (OS), and then run your code. In the cloud, with virtualization, you can skip straight to setting up the OS and running your code. Containers make the process even easier: you can just run your code.

Additionally, all of the dependencies travel in a package with the code, which is called an image. This allows containers to be deployed on any host machine. From the outside, it looks like a host is just holding a bunch of containers. They all look the same, in the sense that they are generic enough to be deployed on any host.

With ECS, you can easily run your containerized code and applications across a managed cluster of EC2 instances.

Are containers a fairly new technology?
The concept of containerization is not new. Its origins date back to 1979 with the creation of chroot. However, it wasn’t until the early 2000s that containers became a major technology. The most significant milestone to date was the release of Docker in 2013, which led to the popularization and widespread adoption of containers.

What does ECS use?
While other container technologies exist (LXC, rkt, etc.), ECS was designed to work natively with Docker containers first, because of Docker’s massive adoption and use by our customers.

Container instance*

Yep, you are back to instances. An instance is just slightly more complex in the ECS realm, though. Here, a container instance is an EC2 instance that runs the agent, has a specifically defined IAM policy and role, and has been registered into your cluster.

And as you probably guessed, in these instances, you are running containers. 

AMI*

These container instances can use any AMI that meets the following specifications: a modern Linux distribution running the agent and the Docker daemon, along with any Docker runtime dependencies.

Want it more simplified? Well, AWS created the Amazon ECS-Optimized AMI for just that. Not only does that AMI come preconfigured with all of the previously mentioned specifications, it’s tested and includes the recommended ecs-init upstart process to run and monitor the agent.

Cluster

An ECS cluster is a grouping of (container) instances* (or tasks in Fargate) that lies within a single region, but can span multiple Availability Zones; spanning zones is even a good idea for redundancy. When you launch an instance (or a task in Fargate), unless you specify otherwise, it registers with the cluster named “default”. If “default” doesn’t exist, it is created. You can also scale and delete your clusters.
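
As a rough sketch of what managing clusters looks like in code, here is a boto3 example that creates and lists clusters; the cluster name “Royaume” is just borrowed from the analogy above. On the instance side, the agent reads the cluster to join from the ECS_CLUSTER setting in /etc/ecs/ecs.config, and falls back to “default” if nothing is set.

```python
import boto3

ecs = boto3.client("ecs", region_name="us-west-2")

# Create a cluster (any name works; "default" is used if you never create one).
ecs.create_cluster(clusterName="Royaume")

# List the clusters in this region to confirm it exists.
print(ecs.list_clusters()["clusterArns"])

# Delete it when you no longer need it (deregister its instances first).
# ecs.delete_cluster(cluster="Royaume")
```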

Agent*

The Amazon ECS container agent is a Go program that runs in its own container within each EC2 instance that you use with ECS. (It’s also available open source on GitHub!) The agent is the intermediary component that takes care of the communication between the scheduler and your instances. Want to register your instance into a cluster? (Why wouldn’t you? A cluster is both a logical boundary and a provider of a pool of resources!) Then you need to run the agent on it.
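
Once the agent is running on an instance and has registered it into a cluster, you can confirm the registration with a couple of boto3 calls; the cluster name here is the same hypothetical one used earlier.

```python
import boto3

ecs = boto3.client("ecs", region_name="us-west-2")

# List the container instances that agents have registered into the cluster.
arns = ecs.list_container_instances(cluster="Royaume")["containerInstanceArns"]

# Fetch details such as the agent version running on each instance.
if arns:
    details = ecs.describe_container_instances(cluster="Royaume", containerInstances=arns)
    for ci in details["containerInstances"]:
        print(ci["ec2InstanceId"], ci["versionInfo"]["agentVersion"])
```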

Task

When you want to start a container, it has to be part of a task. Therefore, you have to create a task first. Succinctly, tasks are a logical grouping of 1 to N containers that run together on the same instance, with N defined by you, up to 10. Let’s say you want to run a custom blog engine. You could put together a web server, an application server, and an in-memory cache, each in their own container. Together, they form a basic frontend unit.

Task definition

Ah, but you cannot create a task directly. You have to create a task definition that tells ECS that “task definition X is composed of this container (and maybe that other container and that other container too!).” It’s kind of like an architectural plan for a city. Some other details it can include are how the containers interact, container CPU and memory constraints, and task permissions using IAM roles.

Then you can tell ECS, “start one task using task definition X.” It might sound like unnecessary planning at first. As soon as you start to deal with multiple tasks, scaling, upgrades, and other “real life” scenarios, you’ll be glad that you have task definitions to keep track of things!
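
To make that concrete, here is a hedged boto3 sketch that registers a small task definition with two containers and then starts one task from it. The family name, images, and memory sizes are illustrative only, and it assumes an EC2-backed cluster named “Royaume” already exists.

```python
import boto3

ecs = boto3.client("ecs", region_name="us-west-2")

# The "architectural plan": a task definition grouping two containers
# that always run together on the same instance.
ecs.register_task_definition(
    family="blog-engine",  # illustrative family name
    containerDefinitions=[
        {
            "name": "web",
            "image": "nginx:latest",
            "memory": 256,
            "essential": True,
            "portMappings": [{"containerPort": 80}],
        },
        {
            "name": "cache",
            "image": "redis:latest",
            "memory": 256,
            "essential": True,
        },
    ],
)

# "Start one task using task definition X": run a single copy on the cluster.
ecs.run_task(cluster="Royaume", taskDefinition="blog-engine", count=1)
```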

Scheduler*

So, the scheduler schedules… sorry, this should be more helpful, huh? The scheduler is part of the “hosted orchestration layer” provided by ECS. Wait a minute, what do I mean by “hosted orchestration”? Simply put, hosted means that it’s operated by ECS on your behalf, without you having to care about it. Your applications are deployed in containers running on your instances, but the management of tasks is taken care of by ECS. One less thing to worry about!

Also, the scheduler is the component that decides what (which containers) gets to run where (on which instances), according to a number of constraints. Say that you have a custom blog engine to scale for high availability. You could create a service, which, by default, spreads tasks across all zones in the chosen region. And if you want each task to be on a different instance, you can use the distinctInstance task placement constraint. ECS makes sure not only that this happens, but also that if a task fails, it is started again.

Service

To keep your task running without having to manage it yourself, you can create a service based on the task definition, and ECS makes sure that it stays running. A service is a special construct that says, “at any given time, I want to make sure that N tasks using task definition X1 are running.” If N=1, it just means “make sure that this task is running, and restart it if needed!” And with N>1, you’re basically scaling your application until you hit N, while also ensuring each task keeps running.
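
Here is what that might look like with boto3, continuing the hypothetical blog-engine example: a service that keeps three copies of the task running and uses the distinctInstance placement constraint mentioned in the scheduler section, so each task lands on a different instance.

```python
import boto3

ecs = boto3.client("ecs", region_name="us-west-2")

# Keep N=3 tasks from the "blog-engine" task definition running at all times,
# each on a separate instance thanks to the distinctInstance constraint.
ecs.create_service(
    cluster="Royaume",
    serviceName="blog-engine-svc",  # illustrative service name
    taskDefinition="blog-engine",
    desiredCount=3,
    placementConstraints=[{"type": "distinctInstance"}],
)
```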

So, what now?

Hopefully you, at the very least, learned a tiny something. All comments are very welcome!

Want to discuss ECS with others? Join the amazon-ecs Slack group, which was created and is managed by members of the community.

Also, if you’re interested in learning more about the core concepts of ECS and its relation to EC2, here are some resources:

Pages
Amazon ECS landing page
AWS Fargate landing page
Amazon ECS Getting Started
Nathan Peck’s AWSome ECS

Docs
Amazon EC2
Amazon ECS

Blogs
AWS Compute Blog
AWS Blog

GitHub code
Amazon ECS container agent
Amazon ECS CLI

AWS videos
Learn Amazon ECS
AWS videos
AWS webinars

 

— tiffany

 @tiffanyfayj