Tag Archives: roles

New IAMCTL tool compares multiple IAM roles and policies

Post Syndicated from Sudhir Reddy Maddulapally original https://aws.amazon.com/blogs/security/new-iamctl-tool-compares-multiple-iam-roles-and-policies/

If you have multiple Amazon Web Services (AWS) accounts with AWS Identity and Access Management (IAM) roles that are supposed to be similar across those accounts, the roles can deviate over time from your intended baseline due to manual, out-of-band changes. This deviation is known as drift. As part of regular compliance checks, you should confirm that these roles have no deviations. In this post, we present a tool called IAMCTL that you can use to extract the IAM roles and policies from two accounts, compare them, and report the differences and statistics. We explain how to use the tool and describe the key concepts.

Prerequisites

Before you install IAMCTL and start using it, here are a few prerequisites that need to be in place on the computer where you will run it:

To follow along in your environment, clone the files from the GitHub repository, and run the steps in order. You won’t incur any charges to run this tool.

Install IAMCTL

This section describes how to install and run the IAMCTL tool.

To install and run IAMCTL

  1. At the command line, enter the following command:
    pip3 install git+ssh://[email protected]/aws-samples/[email protected]
    

    You will see output similar to the following.

    Figure 1: IAMCTL tool installation output

  2. To confirm that your installation was successful, enter the following command.
    iamctl -h
    

    You will see results similar to those in figure 2.

    Figure 2: IAMCTL help message

Now that you’ve successfully installed the IAMCTL tool, the next section will show you how to use the IAMCTL commands.

Example use scenario

Here is an example of how IAMCTL can be used to find differences in IAM roles between two AWS accounts.

A system administrator for a product team is trying to accelerate a product launch in the middle of testing cycles. Developers have found that the same version of their application behaves differently in the development environment as compared to the QA environment, and they suspect this behavior is due to differences in IAM roles and policies.

The application called “app1” primarily reads from an Amazon Simple Storage Service (Amazon S3) bucket, and runs on an Amazon Elastic Compute Cloud (Amazon EC2) instance. In the development (DEV) account, the application uses an IAM role called “app1_dev” to access the S3 bucket “app1-dev”. In the QA account, the application uses an IAM role called “app1_qa” to access the S3 bucket “app1-qa”. This is depicted in figure 3.

Figure 3: Showing the “app1” application in the development and QA accounts

Setting up the scenario

To simulate this setup for the purposes of this walkthrough, you don’t have to create the EC2 instance or the S3 bucket; you only need to focus on the IAM role, its inline policy, and its trust policy.

As noted in the prerequisites, you will switch between the two AWS accounts by using the AWS CLI named profiles “dev-profile” and “qa-profile”, which are configured to point to the DEV and QA accounts respectively.
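
If you haven’t created these profiles yet, the following is a minimal sketch of the setup, assuming you have access keys for each account; the profile names match the ones used throughout this walkthrough.

# configure one named profile per account (you'll be prompted for keys and a default region)
aws configure --profile dev-profile
aws configure --profile qa-profile

# optional: confirm each profile resolves to the expected account
aws --profile dev-profile sts get-caller-identity
aws --profile qa-profile sts get-caller-identity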

Start by using this command:

mkdir -p iamctl_test iamctl_test/dev iamctl_test/qa

The command creates a directory structure that looks like this:
iamctl_test
|-- qa
|-- dev

Now switch to the dev folder, where you will run the following example commands against the DEV account, by using this command:

cd iamctl_test/dev

To create the required policies, first create a file named “app1_s3_access_policy.json” and add the following policy to it. You will use this file’s content as your role’s inline policy.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
         "arn:aws:s3:::app1-dev/shared/*"
            ]
        }
    ]
}

Second, create a file called “app1_trust_policy.json” and add the following policy to it. You will use this file’s content as your role’s trust policy.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Now use the two files to create an IAM role named “app1_dev” in the DEV account by running the following commands in the order listed:

#create role with trust policy

aws --profile dev-profile iam create-role --role-name app1_dev --assume-role-policy-document file://app1_trust_policy.json

#put the inline policy on the role created above
 
aws --profile dev-profile iam put-role-policy --role-name app1_dev --policy-name s3_inline_policy --policy-document file://app1_s3_access_policy.json

In the QA account, the IAM role is named “app1_qa” and the S3 bucket is named “app1-qa”.

Repeat the steps from the prior example against the QA account, changing dev to qa in the file contents and commands as shown in the following code samples. Change the directory to qa by using this command:

cd ../qa

To create the required policies, first create a file called “app1_s3_access_policy.json” and add the following policy to it.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
         "arn:aws:s3:::app1-qa/shared/*"
            ]
        }
    ]
}

Next, create a file called “app1_trust_policy.json” and add the following policy.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Now use the two files you just created to create an IAM role named “app1_qa” in your QA account by running the following commands in the order listed:

#create role with trust policy
aws --profile qa-profile iam create-role --role-name app1_qa --assume-role-policy-document file://app1_trust_policy.json

#put the inline policy on the role created above
aws --profile qa-profile iam put-role-policy --role-name app1_qa --policy-name s3_inline_policy --policy-document file://app1_s3_access_policy.json

So far, you have two accounts with an IAM role created in each of them for your application. In terms of permissions, there are no differences other than the name of the S3 bucket resource the permission is granted against.
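
As an optional sanity check before comparing the accounts, you can confirm that each role and its inline policy exist; this sketch assumes the roles were created exactly as shown above.

# each call should return the inline policy document for the role
aws --profile dev-profile iam get-role-policy --role-name app1_dev --policy-name s3_inline_policy
aws --profile qa-profile iam get-role-policy --role-name app1_qa --policy-name s3_inline_policy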

You can expect IAMCTL to generate a minimal set of differences between the DEV and QA accounts, assuming all other IAM roles and policies are the same. But to be sure about the current state of both accounts, you can run a process in IAMCTL called baselining.

Through the process of baselining, you will generate an equivalency dictionary that captures the known string patterns used to reduce noise in the generated deviations list. You will then introduce a change into one of the IAM roles in your QA account, followed by a final IAMCTL diff to show the deviation.

Baselining

Baselining is the process of bringing two accounts to an “equivalence” state for IAM roles and policies by establishing a baseline, which future diff operations can leverage. The process is as simple as:

  1. Run the iamctl diff command.
  2. Capture all string substitutions into an equivalence dictionary to remove or reduce noise.
  3. Save the generated detailed files as a snapshot.

Now you can go through these steps for your baseline.

Go ahead and set IAMCTL up to compare these two accounts. First, initialize it by using the following commands.

#change directory from qa to iamctl_test
cd ..

#run iamctl init
iamctl init

The results of running the init command are shown in figure 4.

Figure 4: Output of the iamctl init command

If you look at the iamctl_test directory now, shown in figure 5, you can see that the init command created two files in the iamctl_test directory.

Figure 5: The directory structure after running the init command

These two files are as follows:

  1. iam.json – A reference file that lists all AWS services and actions, among other things. IAMCTL uses this file to map the resources listed in an IAM policy to their corresponding AWS resources, based on Amazon Resource Name (ARN) regular expressions.
  2. equivalency_list.json – The default sample dictionary that IAMCTL uses to suppress false alarms when it compares two accounts. This is where you add the known string patterns that need to be substituted.

Note: A best practice is to make the directory where you store the equivalency dictionary, and from which you run IAMCTL, a Git repository. Doing this lets you capture any additions or modifications to the equivalency dictionary by using Git commits, which not only gives you an audit trail of your historical baselines but also provides context for each change. However, doing this is not necessary for the regular functioning of IAMCTL.
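
If you choose to follow that practice, a minimal sketch of the setup, run from the iamctl_test directory, looks like this.

# turn the working directory into a Git repository and record the dictionary
git init
git add equivalency_list.json
git commit -m "Initial equivalency dictionary for DEV/QA baselining"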

Next, run the iamctl diff command:

#run iamctl diff
iamctl diff dev-profile dev qa-profile qa

Figure 6: Result of diff command

Figure 6 shows the results of running the diff command. You can see that IAMCTL considers the app1_dev and app1_qa roles as unique to the DEV and QA accounts, respectively. This is because IAMCTL uses role names to decide whether to compare the role or tag the role as unique.

You will add the strings “dev” and “qa” to the equivalency dictionary to instruct IAMCTL to substitute occurrences of these two strings with “accountname”, by adding the following JSON to the equivalency_list.json file. You will also clean up some of the defaults already present there.

echo '{"accountname":["dev","qa"]}' > equivalency_list.json

Figure 7 shows the equivalency dictionary before you take these actions, and figure 8 shows the dictionary after these actions.

Figure 7: Equivalency dictionary before

Figure 8: Equivalency dictionary after

There’s another thing to notice here. In this example, one common role was flagged as having a difference. To know which role this is and what the difference is, go to the detail reports folder listed at the bottom of the summary report. The directory structure of this folder is shown in figure 9.

Notice that the reports are created under your home directory with a folder structure that mimics the time stamp down to the second. IAMCTL does this to maintain uniqueness for each run.

tree /Users/<username>/aws-idt/output/2020/08/24/08/38/49/

Figure 9: Files written to the output reports directory

You can see there is a file called common_roles_in_dev_with_differences.csv, and it lists a role called “AwsSecurity***Audit”.

You can see there is another file called dev_to_qa_common_role_difference_items.csv, which lists the granular IAM items belonging to the “AwsSecurity***Audit” role in the DEV account that differ from QA. All entries in that file have the DEV account number in the resource ARN, whereas in the qa_to_dev_common_role_difference_items.csv file, all entries for the same “AwsSecurity***Audit” role have the QA account number.

Add both of the account numbers to the equivalency dictionary to substitute them with a placeholder number, because you don’t want this role to get flagged as having differences.

echo '{"accountname":["dev","qa"],"000000000000":["123456789012","987654321098"]}' > equivalency_list.json
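
Because the dictionary is plain JSON, you can optionally validate it before re-running the diff; this sketch uses Python’s built-in json.tool.

# prints the dictionary if it is valid JSON, or an error message if it is not
python3 -m json.tool equivalency_list.json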

Now, re-run the diff command.

#run iamctl diff
iamctl diff dev-profile dev qa-profile qa

As you can see in figure 10, the result of the diff command now shows that the DEV account doesn’t have any differences in IAM roles as compared to the QA account.

Figure 10: Output showing no differences after completion of baselining

This concludes the baselining for your DEV and QA accounts. Now you will introduce a change.

Introducing drift

Drift occurs when there is a difference between the actual and the expected definition or configuration of a resource. There are several reasons why drift occurs, but for this scenario you will mimic an intentional change made in response to a time-sensitive operational event, and use it to introduce drift into what you have built so far.

To simulate this change, add “s3:PutObject” to the qa app1_s3_access_policy.json file as shown in the following example.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*",
                "s3:PutObject"
            ],
            "Resource": [
         "arn:aws:s3:::app1-qa/shared/*"
            ]
        }
    ]
}

Put this new inline policy on your existing role “app1_qa” by using this command:

aws --profile qa-profile iam put-role-policy --role-name app1_qa --policy-name s3_inline_policy --policy-document file://app1_s3_access_policy.json

The following table represents the new drift in the accounts.

Action          Account-DEV (role name: app1_dev)    Account-QA (role name: app1_qa)
s3:Get*         Yes                                   Yes
s3:List*        Yes                                   Yes
s3:PutObject    No                                    Yes
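
Before running the diff, you can optionally confirm that the changed inline policy is now in place on the QA role.

# the output should include the newly added s3:PutObject action
aws --profile qa-profile iam get-role-policy --role-name app1_qa --policy-name s3_inline_policy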

Next, run the iamctl diff command to see how it picks up the drift from your previously baselined accounts.

#change directory from qa to iamctl_test
cd ..
iamctl diff dev-profile dev qa-profile qa

Figure 11: Output showing the one deviation that was introduced

You can see that IAMCTL now shows that the QA account has one difference as compared to DEV, which is what you expect based on the deviation you introduced.

Open up the file qa_to_dev_common_role_difference_items.csv to look at the one difference. Again, adjust the following path example with the output from the iamctl diff command at the bottom of the summary report in Figure 11.

cat /Users/<username>/aws-idt/output/2020/09/18/07/38/15/qa_to_dev_common_role_difference_items.csv

As shown in figure 12, you can see that the file lists the specific S3 action “PutObject” with the role name and other relevant details.

Figure 12: Content of file qa_to_dev_common_role_difference_items.csv showing the one deviation that was introduced

You can use this information to remediate the deviation by performing corrective actions in either your DEV account or QA account. You can confirm the effectiveness of the corrective action by re-baselining to make sure that zero deviations appear.
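
For example, if you decide the QA change was unintended, a minimal sketch of the corrective action is to remove “s3:PutObject” from app1_s3_access_policy.json in the qa folder, overwrite the inline policy, and re-run the diff.

# run from the iamctl_test/qa directory after reverting the policy file
aws --profile qa-profile iam put-role-policy --role-name app1_qa --policy-name s3_inline_policy --policy-document file://app1_s3_access_policy.json

# back in iamctl_test, confirm you are at zero deviations again
cd ..
iamctl diff dev-profile dev qa-profile qa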

Conclusion

In this post, you learned how to use the IAMCTL tool to compare IAM roles between two accounts and arrive at a granular list of meaningful differences that can be used for compliance audits or for further remediation actions. If you’ve created your IAM roles by using an AWS CloudFormation stack, you can turn on drift detection to capture drift caused by changes made to those IAM resources outside of AWS CloudFormation. For more information about drift detection, see Detecting unmanaged configuration changes to stacks and resources. Lastly, see the GitHub repository where the tool is maintained, with documentation describing each of the subcommand concepts. We welcome any pull requests for issues and enhancements.
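
As a quick illustration of that CloudFormation option, here is a hedged sketch of running drift detection from the AWS CLI; the stack name is hypothetical.

# start drift detection on the stack that manages your IAM roles
aws cloudformation detect-stack-drift --stack-name app1-iam-roles

# check the status of the detection job (use the id returned by the previous call)
aws cloudformation describe-stack-drift-detection-status --stack-drift-detection-id <detection-id>

# list per-resource drift results for the stack
aws cloudformation describe-stack-resource-drifts --stack-name app1-iam-roles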

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS IAM forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Sudhir Reddy Maddulapally

Sudhir is a Senior Partner Solution Architect who builds SaaS solutions for partners by day and is a tech tinkerer by night. He enjoys trips to state and national parks, and Yosemite is his favorite thus far!

Author

Soumya Vanga

Soumya is a Cloud Application Architect with AWS Professional Services in New York, NY, helping customers design solutions and workloads and adopt cloud-native services.

How to use trust policies with IAM roles

Post Syndicated from Jonathan Jenkyn original https://aws.amazon.com/blogs/security/how-to-use-trust-policies-with-iam-roles/

AWS Identity and Access Management (IAM) roles are a significant component in the way customers operate in Amazon Web Services (AWS). In this post, I’ll dive into the details of how cloud security architects and account administrators can protect IAM roles from misuse by using trust policies. By the end of this post, you’ll know how to build trust policies for IAM roles that work at scale, providing guardrails to control access to resources in your organization.

In general, there are four different scenarios where you might use IAM roles in AWS:

  • One AWS service accesses another AWS service – When an AWS service needs access to other AWS services or functions, you can create a role that will grant that access.
  • One AWS account accesses another AWS account – This use case is commonly referred to as a cross-account role pattern. This allows human or machine IAM principals from other AWS accounts to assume this role and act on resources in this account.
  • A third-party web identity needs access – This use case allows users with identities in third-party systems like Google and Facebook, or Amazon Cognito, to use a role to access resources in the account.
  • Authentication using SAML 2.0 federation – This is commonly used by enterprises with Active Directory that want to connect using an IAM role so that their users can use single sign-on workflows to access AWS accounts.

In all cases, the makeup of an IAM role is the same as that of an IAM user and is only differentiated by the following qualities:

  • An IAM role does not have long-term credentials associated with it; rather, a principal (an IAM user, machine, or other authenticated identity) assumes the IAM role and inherits the permissions assigned to that role.
  • The tokens issued when a principal assumes an IAM role are temporary. Their expiration reduces the risks associated with credentials leaking and being reused.
  • An IAM role has a trust policy that defines which conditions must be met to allow other principals to assume it. This trust policy reduces the risks associated with privilege escalation.

Recommendation: You should make extensive use of temporary IAM roles rather than permanent credentials such as IAM users. For more information, review this page: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp.html

While the list of users having access to your AWS accounts can change over time, the roles used to manage your AWS account probably won’t. The use of IAM roles essentially decouples your enterprise identity system (SAML 2.0) from your permission system (AWS IAM policies), simplifying management of each.

Managing access to IAM roles

Let’s dive into how you can create relationships between your enterprise identity system and your permissions system by looking at the policy types you can apply to an IAM role.

An IAM role has three places where it uses policies:

  • Permission policies (inline and attached) – These policies define the permissions that a principal assuming the role is able (or restricted) to perform, and on which resources.
  • Permissions boundary – A permissions boundary is an advanced feature for using a managed policy to set the maximum permissions that an identity-based policy can grant to an IAM entity. An entity’s permissions boundary allows it to perform only the actions that are allowed by both its identity-based permission policies and its permissions boundaries.
  • Trust relationship – This policy defines which principals can assume the role, and under which conditions. This is sometimes referred to as a resource-based policy for the IAM role. We’ll refer to this policy simply as the ‘trust policy’.

A role can be assumed by a human user or a machine principal, such as an Amazon Elastic Compute Cloud (Amazon EC2) instance or an AWS Lambda function. Over the rest of this post, you’ll see how you can restrict the conditions under which principals are able to use roles by configuring their trust policies.

An example of a simple trust policy

A common use case is when you need to provide security audit access to your account, allowing a third party to review the configuration of that account. After attaching the relevant permission policies to an IAM role, you need to add a cross-account trust policy to allow the third-party auditor to make the sts:AssumeRole API call to elevate their access in the audited account. The following trust policy shows an example policy created through the AWS Management Console:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {}
    }
  ]
}

As you can see, it has the same structure as other IAM policies with Effect, Action, and Condition components. It also has the Principal parameter, but no Resource attribute. This is because the resource, in the context of the trust policy, is the IAM role itself. For the same reason, the Action parameter will only ever be set to one of the following values: sts:AssumeRole, sts:AssumeRoleWithSAML, or sts:AssumeRoleWithWebIdentity.

Note: The suffix root in the policy’s Principal attribute equates to “authenticated and authorized principals in the account,” not the special and all-powerful root user principal that is created when an AWS account is created.

Using the Principal attribute to reduce scope

In a trust policy, the Principal attribute indicates which other principals can assume the IAM role. In the example above, 111122223333 represents the AWS account number for the auditor’s AWS account. In effect, this allows any principal in the 111122223333 AWS account with sts:AssumeRole permissions to assume this role.

To restrict access to a specific IAM user, you can define the trust policy like the following example, which would allow only the IAM user LiJuan in the 111122223333 account to assume this role. LiJuan would also need to have sts:AssumeRole permissions attached to their IAM user for this to work:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:user/LiJuan"
      },
      "Action": "sts:AssumeRole",
      "Condition": {}
    }
  ]
}
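
For illustration, here is a hedged sketch of how LiJuan might assume that role by using the AWS CLI; the role name and the account that owns the role are hypothetical stand-ins.

# LiJuan's IAM user needs sts:AssumeRole permission on this role ARN
aws sts assume-role \
    --role-arn arn:aws:iam::444455556666:role/SecurityAuditRole \
    --role-session-name lijuan-audit-session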

The principals set in the Principal attribute can be any principal defined in the IAM documentation, and can refer to an AWS or a federated principal. You cannot use a wildcard ("*" or "?") within a Principal for a trust policy, other than one special case, which I’ll come back to in a moment. You must define precisely which principal you are referring to, because when you submit your trust policy, a translation occurs that ties it to each principal’s hidden principal ID, and that translation can’t happen if there are wildcards in the principal.

The only scenario where you can use a wildcard in the Principal parameter is where the parameter value is only the “*” wildcard. Use of the global wildcard “*” for the Principal isn’t recommended unless you have clearly defined Conditional attributes in the policy statement to restrict use of the IAM role, since doing so without Conditional attributes permits assumption of the role by any principal in any AWS account, regardless of who that is.

Using identity federation on AWS

Federated users from SAML 2.0 compliant enterprise identity services are given permissions to access AWS accounts through the use of IAM roles. While the user-to-role configuration of this connection is established within the SAML 2.0 identity provider, you should also put controls in the trust policy in IAM to reduce any abuse.

Because the Principal attribute contains configuration information about the SAML mapping (in the case of Active Directory), you need to use the Condition attribute in the trust policy to restrict use of the role from the AWS account management perspective. This can be done by restricting the SourceIp address, as demonstrated later, or by using one or more of the SAML-specific Condition keys available. My recommendation here is to be as specific as is practical in reducing the set of principals that can use the role. This is best achieved by adding qualifiers into the Condition attribute of your trust policy.

There’s a very good guide on creating roles for SAML 2.0 federation that contains a basic example trust policy you can use.

Using the Condition attribute in a trust policy to reduce scope

The Condition statement in your trust policy sets additional requirements for the Principal trying to assume the role. If you don’t set a Condition attribute, the IAM engine will rely solely on the Principal attribute of this policy to authorize role assumption. Given that it isn’t possible to use wildcards within the Principal attribute, the Condition attribute is a really flexible way to reduce the set of users that are able to assume the role without necessarily specifying the principals.

Limiting role use based on an identifier

Occasionally teams managing multiple roles can become confused as to which role achieves what and can inadvertently assume the wrong role. This is referred to as the Confused Deputy problem. This next section shows you a way to quickly reduce this risk.

The following trust policy requires that principals from the 111122223333 AWS account have provided a special phrase when making their request to assume the role. Adding this condition reduces the risk that someone from the 111122223333 account will assume this role by mistake. This phrase is configured by specifying an ExternalID conditional context key.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "ExampleSpecialPhrase"
        }
      }
    }
  ]
}

In the example trust policy above, the value ExampleSpecialPhrase isn’t a secret or a password. Adding the ExternalID condition limits this role from being assumed using the console. The only way to add this ExternalID argument into the role assumption API call is to use the AWS Command Line Interface (AWS CLI) or a programming interface. Having this condition doesn’t prevent a user who knows about this relationship and the ExternalId from assuming what might be a privileged set of permissions, but does help manage risks like the Confused Deputy problem. I see customers using an ExternalID that matches the name of the AWS account, which works to ensure that an operator is working on the account they believe they’re working on.
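
For example, a caller in the 111122223333 account could supply the external ID through the AWS CLI like this; the audited account number and role name are hypothetical.

# the call fails unless --external-id matches the value in the trust policy
aws sts assume-role \
    --role-arn arn:aws:iam::444455556666:role/SecurityAuditRole \
    --role-session-name audit-with-external-id \
    --external-id ExampleSpecialPhrase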

Limiting role use based on multi-factor authentication

By using the Condition attribute, you can also require that the principal assuming this role has passed a multi-factor authentication (MFA) check before they’re permitted to use this role. This again limits the risk associated with mistaken use of the role and adds some assurances about the principal’s identity.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "BoolIfExists": { 
          "aws:MultiFactorAuthPresent" : "true" 
		}
      }
    }
  ]
}

In the example trust policy above, I also introduced the MultiFactorAuthPresent conditional context key. Per the AWS global condition context keys documentation, the MultiFactorAuthPresent conditional context key does not apply to sts:AssumeRole requests in the following contexts:

  • When using access keys in the CLI or with the API
  • When using temporary credentials without MFA
  • When a user signs in to the AWS Console
  • When services (like AWS CloudFormation or Amazon Athena) reuse session credentials to call other APIs
  • When authentication has taken place via federation

In the example above, the use of the BoolIfExists qualifier to the MultiFactorAuthPresent conditional context key evaluates the condition as true if:

  • The principal type can have an MFA attached, and does.
    or
  • The principal type cannot have an MFA attached.

This is a subtle difference but makes the use of this conditional key in trust policies much more flexible across all principal types.
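
As an illustration, an IAM user in the trusted account can satisfy this condition by supplying their MFA device and a current code on the AssumeRole call; the device ARN, role name, and account numbers here are hypothetical.

# --serial-number and --token-code supply the MFA context that the
# aws:MultiFactorAuthPresent condition in the trust policy checks for
aws sts assume-role \
    --role-arn arn:aws:iam::444455556666:role/SecurityAuditRole \
    --role-session-name mfa-audit-session \
    --serial-number arn:aws:iam::111122223333:mfa/LiJuan \
    --token-code 123456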

Limiting role use based on time

During activities like security audits, it’s quite common for the activity to be time-bound and temporary. There’s a risk that the IAM role could be assumed even after the audit activity concludes, which might be undesirable. You can manage this risk by adding a time condition to the Condition attribute of the trust policy. This means that rather than having to remember to disable the IAM role immediately after the activity, customers can build the date restriction into the trust policy. You can do this by using policy attribute statements, like so:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "DateGreaterThan": {
          "aws:CurrentTime": "2020-09-01T12:00:00Z"
        },
        "DateLessThan": {
          "aws:CurrentTime": "2020-09-07T12:00:00Z"
        }
      }
    }
  ]
}

Limiting role use based on IP addresses or CIDR ranges

If the auditor for a security audit is using a known fixed IP address, you can build that information into the trust policy, further reducing the opportunity for the role to be assumed by unauthorized actors calling the assumeRole API function from another IP address or CIDR range:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": "203.0.113.0/24"
        }
      }
    }
  ]
}

Limiting role use based on tags

IAM tagging capabilities can also help you build flexible and adaptive trust policies that create an attribute-based access control (ABAC) model for IAM management. You can build trust policies that only permit principals that have already been tagged with a specific key and value to assume a specific role. The following example requires that IAM principals in the AWS account 111122223333 be tagged with department = OperationsTeam for them to assume the IAM role.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalTag/department": "OperationsTeam"
        }
      }
    }
  ]
}

If you want to create this effect, I highly recommend the use of the PrincipalTag pattern above. However, you must also be cautious about which principals are then given iam:TagUser, iam:TagRole, iam:UntagUser, and iam:UntagRole permissions, perhaps even using the aws:PrincipalTag condition within the permissions boundary policy to restrict their ability to retag their own IAM principal or another IAM role they can assume.
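
To show how the principal side of this pattern is set up, here is a hedged sketch of tagging an IAM user so that the aws:PrincipalTag condition above matches; the user name is hypothetical.

# tag the user so aws:PrincipalTag/department resolves to "OperationsTeam"
aws iam tag-user --user-name LiJuan --tags Key=department,Value=OperationsTeam

# confirm the tag is present
aws iam list-user-tags --user-name LiJuan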

Limiting or extending access to a role based on AWS Organizations

Since its announcement in 2016, almost every enterprise customer I work with uses AWS Organizations. This AWS service allows customers to create an organizational structure for their accounts by creating hard boundaries to manage blast-radius risks, among other advantages. You can use the PrincipalOrgID condition to limit assumption of an organization-wide core IAM role.

Caution: As you’ll see in the example below, you need to set the Principal attribute to “*” to do this, which would, without the conditional restriction, allow all role assumption requests to be accepted for this role, irrespective of the source of that assumption request. For that reason, be especially careful about the use of this pattern.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalOrgID": "o-abcd12efg1"
        }
      }
    }
  ]
}

It isn’t practical to write out all the AWS account identifiers into a trust policy, and because of the way policies like this are evaluated, you can’t include wildcard characters for the account number in the principal’s account number field. The use of the PrincipalOrgID global condition context key provides us with a neat and dynamic mechanism to create a short policy statement.

Role chaining

There are instances where a third party might themselves be using IAM roles, or where an AWS service resource that has already assumed a role needs to assume another role (perhaps in another account), and customers might need to allow only specific IAM roles in that remote account to assume the IAM role you create in your account. You can use role chaining to build permitted role escalation routes using role assumption from within the same account or AWS organization, or from third-party AWS accounts.

Consider the following trust policy example where I use a combination of the Principal attribute to scope down to an AWS account, and the aws:UserId global conditional context key to scope down to a specific role using its RoleId. To capture the RoleId for the role you want to be able to assume, you can run the following command using the AWS CLI:


# aws iam get-role --role-name CrossAccountAuditor
{
    "Role": {
        "Path": "/",
        "RoleName": "CrossAccountAuditor",
        "RoleId": "ARO1234567123456D",
        "Arn": "arn:aws:iam:: 111122223333:role/CrossAccountAuditor",
        "CreateDate": "2017-08-31T14:24:20+00:00",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "",
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": "arn:aws:iam::111122223333:root"
                    },
                    "Action": "sts:AssumeRole",
                    "Condition": {
                        "StringEquals": {
                            "sts:ExternalId": "ExampleSpecialPhrase"
                        }
                    }
                }
            ]
        }
    }
}

Here is an example trust policy that limits role assumption to only the CrossAccountAuditor role from AWS account 111122223333.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringLike": {
          "aws:userId": "ARO1234567123456D:*"
        }
      }
    }
  ]
}

If you’re using an IAM user and have assumed the CrossAccountAuditor IAM role, the policy above will work through the AWS CLI with a call to aws sts assume-role and through the console.
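
A hedged sketch of that chained flow from the AWS CLI follows; the target role and its account number are hypothetical, and the first hop reuses the ExternalId from the earlier example.

# first hop: assume CrossAccountAuditor and capture its temporary credentials
creds=$(aws sts assume-role \
    --role-arn arn:aws:iam::111122223333:role/CrossAccountAuditor \
    --role-session-name first-hop \
    --external-id ExampleSpecialPhrase \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text)

export AWS_ACCESS_KEY_ID=$(echo "$creds" | cut -f1)
export AWS_SECRET_ACCESS_KEY=$(echo "$creds" | cut -f2)
export AWS_SESSION_TOKEN=$(echo "$creds" | cut -f3)

# second hop: aws:userId is now "ARO1234567123456D:first-hop", so the
# StringLike condition in the trust policy above matches
aws sts assume-role \
    --role-arn arn:aws:iam::444455556666:role/TargetAuditRole \
    --role-session-name second-hop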

This type of trust policy also works for services like Amazon EC2, allowing those instances using their assigned instance profile role to assume a role in another account to perform actions. We’ll touch on this use case later in the post.

Putting it all together

AWS customers can use combinations of all the above Principal and Condition attributes to hone the trust they’re extending out to any third party, or even within their own organization. They might create an accumulated trust policy for an IAM role which achieves the following effect:

Allows only a user named PauloSantos, in AWS account number 111122223333, to assume the role if they have also authenticated with an MFA, are logging in from an IP address in the 203.0.113.0/24 CIDR range, and the date is between noon of September 1, 2020, and noon of September 7, 2020.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::111122223333:user/PauloSantos"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "BoolIfExists": {
          "aws:MultiFactorAuthPresent": "true"
        },
        "IpAddress": {
          "aws:SourceIp": "203.0.113.0/24"
        },
        "DateGreaterThan": {
          "aws:CurrentTime": "2020-09-01T12:00:00Z"
        },
        "DateLessThan": {
          "aws:CurrentTime": "2020-09-07T12:00:00Z"
        }
      }
    }
  ]
}

I’ve seen customers use this to create IAM users who have no permissions attached besides sts:AssumeRole. Trust relationships are then configured between the IAM users and the IAM roles, creating ultimate flexibility in defining who has access to what roles without needing to update the IAM user identity pool at all.

A word on Effect: Deny and NotPrincipal in IAM role trust policies

I have seen some customers make use of an “Effect”: “Deny” clause in their trust policies. This pattern can help manage a wildcard statement in another “Effect”: “Allow” clause of the same trust policy. However, this isn’t the best approach for most scenarios. You will typically be able to define each principal in your policy as being allowed access. An example of where this might not be true is where you have a clause that uses the global wildcard “*” as a principal, in which case it will be necessary to add Deny statements to further filter the access.

Putting a wildcard into the Principal attribute of an Allow policy statement, particularly in relation to trust policies, can be dangerous if you haven’t done a robust job of managing the Condition attribute in the same statement. Be as specific as possible in your Allow statement, and use Principal attributes first, rather than relying on Deny statements to manage potential security gaps created by your use of wildcards.

The following trust policy allows all IAM principals within the o-abcd12efg1 organization to assume the IAM role, but only if it’s before September 7, 2020:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalOrgID": "o-abcd12efg1"
        }
      }
    },
    {
      "Effect": "Deny",
      "Principal": {
        "AWS": "*"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "DateGreaterThan": {
          "aws:CurrentTime": "2020-09-07T12:00:00Z"
        }
      }
    }
  ]
}

The use of NotPrincipal in trust policies

You can also use the NotPrincipal element in your trust policies. Again, this is rarely the best choice, because it can introduce unnecessary complexity and confusion into your policies. Instead, you can avoid that problem by using fairly simple and prescriptive Principal statements.

Statements with NotPrincipal can also use a Deny effect, which can create quite baffling policy logic that, if misunderstood, could create unintended opportunities for misuse or abuse.

Here’s an example where you might think to use Deny and NotPrincipal in a trust policy; notice that it has the same effect as allowing arn:aws:iam::111122223333:role/CoreAccess in a single Allow statement. In general, Deny with NotPrincipal statements in trust policies create unnecessary complexity, and should be avoided.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Deny",
      "NotPrincipal": {
        "AWS": "arn:aws:iam::111122223333:role/CoreAccess"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Remember, your Principal attribute should be very specific, to reduce the set of those able to assume the role, and an IAM role trust policy won’t permit access if a corresponding Allow statement isn’t explicitly present in the trust policy. It’s better to rely on the default deny policy evaluation logic where you’re able, rather than introducing unnecessary complexity into your policy logic.

Creating trust policies for AWS services that assume roles

There are two types of contexts where AWS services need access to IAM roles to function:

  1. Resources managed by an AWS service (like Amazon EC2 or Lambda, for example) need access to an IAM role to execute functions on other AWS resources, and need permissions to do so.
  2. An AWS service that abstracts its functionality from other AWS services, like Amazon Elastic Container Service (Amazon ECS) or Amazon Lex, needs access to execute functions on AWS resources. These are called service-linked roles and are a special case that’s out of the scope of this post.

In both contexts, you have the service itself as an actor. The service is assuming your IAM role so it can provide your credentials to your Lambda function (the first context) or use those credentials to do things (the second context). In the same way that IAM roles are used by human operators to provide an escalation mechanism for users operating with specific functions in the examples above, so, too, do AWS resources, such as Lambda functions, Amazon EC2 instances, and even AWS CloudFormation, require the same mechanism. You can find more information about how to create IAM Roles for AWS Services here.

An IAM role for a human operator and for an AWS service are exactly the same, even though they have a different principal defined in the trust policy. The policy’s Principal will define the AWS service that is permitted to assume the role for its function.

Here’s an example trust policy for a role designed for an Amazon EC2 instance to assume. You can see that the principal provided is the ec2.amazonaws.com service:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Every configuration of an AWS resource should be passed a specific role unique to its function. So, if you have two Amazon EC2 launch configurations, you should design two separate IAM roles, even if the permissions they require are currently the same. This allows each configuration to grow or shrink the permissions it requires over time, without needing to reattach IAM roles to configurations, which might create a privilege escalation risk. Instead, you update the permissions attached to each IAM role independently, knowing that it will only be used by that one service resource. This helps reduce the potential impact of risks. Automating your management of roles will help here, too.

Several customers have asked if it’s possible to design a trust policy for an IAM role such that it can only be passed to a specific Amazon EC2 instance. This isn’t directly possible. You cannot place the Amazon Resource Name (ARN) for an EC2 instance into the Principal of a trust policy, nor can you use tag-based condition statements in the trust policy to limit the ability for the role to be used by a specific resource.

The only option is to manage access to the iam:PassRole action within the permission policy for those IAM principals you expect to be attaching IAM roles to AWS resources. This special Action is evaluated when a principal tries to attach another IAM role to an AWS service or AWS resource.

You should use restrictions on access to the iam:PassRole action with permission policies and permission boundaries. This means that the ability to attach roles to instance profiles for Amazon EC2 is limited, rather than using the trust policy on the role assumed by the EC2 instance to achieve this. This approach makes it much easier to manage scaling for both those principals attaching roles to EC2 instances, and the instances themselves.

You could attach a permission policy like the following to limit the ability to pass roles to Amazon EC2 instances, so that only roles whose names begin with EC2-Webserver- can be passed:


{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "iam:GetRole",
            "iam:PassRole"
        ],
        "Resource": "arn:aws:iam::111122223333:role/EC2-Webserver-*"
    }]
}
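
To see where iam:PassRole is actually evaluated, here is a hedged sketch of attaching a role to an EC2 instance through an instance profile; the role name, profile name, and instance ID are hypothetical, with the role name matching the EC2-Webserver- prefix allowed above.

# create an instance profile and add the role to it
aws iam create-instance-profile --instance-profile-name EC2-Webserver-App1
aws iam add-role-to-instance-profile \
    --instance-profile-name EC2-Webserver-App1 \
    --role-name EC2-Webserver-App1

# iam:PassRole is checked here, when the role is passed to the instance
aws ec2 associate-iam-instance-profile \
    --instance-id i-0123456789abcdef0 \
    --iam-instance-profile Name=EC2-Webserver-App1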

Conclusion

You now have all the tools you need to build robust and effective trust policies that work at scale, providing guardrails for your users and those who might want to access resources in your account from outside your organization.

Policy logic isn’t always simple, and I encourage you to use sandbox accounts to try out your ideas. In general, simplicity should win over cleverness. IAM policies and statements that might well be frugal in their use of policy language might also be difficult to read, interpret, and update by other IAM administrators in the future. Keeping your trust policies simple helps to build IAM relationships everyone understands and can manage, and use, effectively.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS IAM forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Jonathan Jenkyn

Jonathan is a Senior Security Growth Strategies Consultant with AWS Professional Services. He’s an active member of the People with Disabilities affinity group, and has built several Amazon initiatives supporting charities and social responsibility causes. Since 1998, he has been involved in IT Security at many levels, from implementation of cryptographic primitives to managing enterprise security governance. Outside of work, he enjoys running, cycling, fund-raising for the BHF and Ipswich Hospital Charity, and spending time with his wife and 5 children.

Identify unused IAM roles and remove them confidently with the last used timestamp

Post Syndicated from Mathangi Ramesh original https://aws.amazon.com/blogs/security/identify-unused-iam-roles-remove-confidently-last-used-timestamp/

As you build on AWS, you create AWS Identity and Access Management (IAM) roles to enable teams and applications to use AWS services. As those teams and applications evolve, you might only rely on a subset of your original roles to meet your needs. This can leave unused roles in your AWS account. To help you identify these unused roles, IAM now reports the last-used timestamp that represents when a role was last used to make an AWS request. You or your security team can use this information to identify, analyze, and then confidently remove unused roles. This helps you improve the security posture of your AWS environments. Additionally, by removing unused roles, you can simplify your monitoring and auditing efforts by focusing only on roles that are in use. You can review when a role was last used to access your AWS environment in the IAM console, using the AWS Command Line Interface (AWS CLI), or AWS SDK.

In this post, I demonstrate how to identify and remove roles that your team or applications don’t use by viewing the last-used timestamp in the IAM console.  Before I share an example, I’ll describe the existing IAM APIs where we now also report the last-used timestamp:

  • Get-role: Returns role details, including the path, ARN (Amazon Resource Name), and trust policy. You can now use this API to retrieve the last-used timestamp.
  • Get-account-authorization-details: Retrieves information about all the IAM users, groups, roles, and policies in your AWS account. You can now view the last-used timestamp along with the other role details (see the example commands after this list).
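
Here is a hedged sketch of calling both APIs from the AWS CLI; the role name comes from the example scenario later in this post, and the --query paths assume the last-used data is returned under a RoleLastUsed element.

# last-used details for a single role
aws iam get-role --role-name MigrationRole --query 'Role.RoleLastUsed'

# last-used dates for every role in the account
# (adjust the query path if your CLI output differs)
aws iam get-account-authorization-details --filter Role \
    --query 'RoleDetailList[].[RoleName,RoleLastUsed.LastUsedDate]' --output table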

How to use the AWS Management Console to view last-used information for roles

Imagine you’re a system administrator for Example Inc. and your development team is working on a new application. To enable them to get started with AWS quickly, you create roles for the team and their application. As the application goes through final review, you learn the team and application now rely on a smaller set of roles to access AWS services. This leaves unused roles in your AWS accounts that you might want to remove. You’re going to check the last time each role made a request to AWS and use this information to determine whether the team is using the role. If they aren’t, you plan to remove it knowing the team doesn’t need it for the application.

To view role-last-used information in the IAM Console, select Roles in the IAM navigation pane, then look for the Last activity column (see Figure 1 below). This displays the number of days that have passed since each role made an AWS service request. AWS records last-used information for the trailing 400 days. This is referred to as the tracking period. You can sort the column to identify the roles your team has not used recently.

In the case of Example Inc., let’s say you want to get rid of any roles that have been inactive for 90 days or more. From the information in Figure 1, you see that your team is using ApplicationEC2Access, TestRole, and CodeDeployRole. You also see they haven’t used AdminAccess, EC2FullAccess, and InfraSetupRole in the last 90 days. You can now delete these roles confidently. (Last activity “None”, as seen for the AdminAccess role, means that the role was not used within the trailing 400-day tracking period to make any service request.)
 

Figure 1: "Last activity" column in IAM console

Figure 1: “Last activity” column in IAM console

While analyzing the last-used timestamp for each role, you notice that the MigrationRole role was last active two months ago. You want to gather more information about the role’s access patterns to determine whether you ought to delete it. To do this, select the name of the role. From the role detail page, navigate to the Access Advisor tab and investigate the list of accessed services and verify what the role was used for. Access advisor provides a report that displays a list of services and timestamps that indicate when the selected IAM principal last accessed each of the services that it has permissions to. Based on this report, you can decide to follow up with the development team to see if they still need this role. Thus, you have reduced the number of roles in your account from 9 to 6, making it easier to monitor active roles and restrict access to your AWS environments.
 

Figure 2: Access Advisor report
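
If you prefer the CLI, a hedged sketch of generating the same access advisor data for the MigrationRole role looks like this; the account number is a placeholder.

# start a last-accessed report for the role and note the returned JobId
aws iam generate-service-last-accessed-details \
    --arn arn:aws:iam::111122223333:role/MigrationRole

# retrieve the report once the job completes
aws iam get-service-last-accessed-details --job-id <job-id-from-previous-call>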

Summary

In this post, I showed you how to use role-last-used information to identify and remove unused roles. By removing unused roles, you can simplify monitoring and improve your security posture. To learn more about deleting roles, visit the deleting roles or instance profiles documentation.
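
As a reference, here is a hedged sketch of removing one of the unused roles from the example with the AWS CLI; a role can only be deleted after its inline policies, attached managed policies, and instance profile associations are removed.

# list and delete inline policies
aws iam list-role-policies --role-name InfraSetupRole
aws iam delete-role-policy --role-name InfraSetupRole --policy-name <inline-policy-name>

# list and detach managed policies
aws iam list-attached-role-policies --role-name InfraSetupRole
aws iam detach-role-policy --role-name InfraSetupRole --policy-arn <managed-policy-arn>

# finally, delete the role itself
aws iam delete-role --role-name InfraSetupRole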

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the IAM forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Mathangi Ramesh

Mathangi is the product manager for AWS Identity and Access Management. She enjoys talking to customers and working with data to solve problems. Outside of work, Mathangi is a fitness enthusiast and a Bharatanatyam dancer. She holds an MBA degree from Carnegie Mellon University.

Amazon Neptune Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-neptune-generally-available/

Amazon Neptune is now Generally Available in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. At the core of Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latencies. Neptune supports two popular graph models, Property Graph and RDF, through Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune can be used to power everything from recommendation engines and knowledge graphs to drug discovery and network security. Neptune is fully-managed with automatic minor version upgrades, backups, encryption, and fail-over. I wrote about Neptune in detail for AWS re:Invent last year and customers have been using the preview and providing great feedback that the team has used to prepare the service for GA.

Now that Amazon Neptune is generally available, there are a few changes from the preview:

Launching an Amazon Neptune Cluster

Launching a Neptune cluster is as easy as navigating to the AWS Management Console and clicking create cluster. Of course you can also launch with CloudFormation, the CLI, or the SDKs.

You can monitor your cluster health and the health of individual instances through Amazon CloudWatch and the console.

Additional Resources

We’ve created two repos with some additional tools and examples here. You can expect continuous development on these repos as we add additional tools and examples.

  • Amazon Neptune Tools Repo
    This repo has a useful tool for converting GraphML files into Neptune compatible CSVs for bulk loading from S3.
  • Amazon Neptune Samples Repo
    This repo has a really cool example of building a collaborative filtering recommendation engine for video game preferences.

Purpose Built Databases

There’s an industry trend where we’re moving more and more onto purpose-built databases. Developers and businesses want to access their data in the format that makes the most sense for their applications. As cloud resources make transforming large datasets easier with tools like AWS Glue, we have a lot more options than we used to for accessing our data. With tools like Amazon Redshift, Amazon Athena, Amazon Aurora, Amazon DynamoDB, and more we get to choose the best database for the job or even enable entirely new use-cases. Amazon Neptune is perfect for workloads where the data is highly connected across data rich edges.

I’m really excited about graph databases and I see a huge number of applications. Looking for ideas of cool things to build? I’d love to build a web crawler in AWS Lambda that uses Neptune as the backing store. You could further enrich it by running Amazon Comprehend or Amazon Rekognition on the text and images found and creating a search engine on top of Neptune.

As always, feel free to reach out in the comments or on twitter to provide any feedback!

Randall

Easier way to control access to AWS regions using IAM policies

Post Syndicated from Sulay Shah original https://aws.amazon.com/blogs/security/easier-way-to-control-access-to-aws-regions-using-iam-policies/

We made it easier for you to comply with regulatory standards by controlling access to AWS Regions using IAM policies. For example, if your company requires users to create resources in a specific AWS region, you can now add a new condition to the IAM policies you attach to your IAM principal (user or role) to enforce this for all AWS services. In this post, I review conditions in policies, introduce the new condition, and review a policy example to demonstrate how you can control access across multiple AWS services to a specific region.

Condition concepts

Before I introduce the new condition, let’s review the condition element of an IAM policy. A condition is an optional IAM policy element that lets you specify special circumstances under which the policy grants or denies permission. A condition includes a condition key, operator, and value for the condition. There are two types of conditions: service-specific conditions and global conditions. Service-specific conditions are specific to certain actions in an AWS service. For example, the condition key ec2:InstanceType supports specific EC2 actions. Global conditions support all actions across all AWS services.

Now that I’ve reviewed the condition element in an IAM policy, let me introduce the new condition.

aws:RequestedRegion condition key

The new global condition key, aws:RequestedRegion, supports all actions across all AWS services. You can use any string operator and specify any AWS region for its value.

  • Condition key: aws:RequestedRegion
  • Description: Allows you to specify the region to which the IAM principal (user or role) can make API calls
  • Operator(s): All string operators (for example, StringEquals)
  • Value: Any AWS region (for example, us-east-1)

I’ll now demonstrate the use of the new global condition key.

Example: Policy with region-level control

Let's say a group of software developers in my organization is working on a project using Amazon EC2 and Amazon RDS. The project requires a web server running on an EC2 instance using Amazon Linux and a MySQL database instance in RDS. The developers also want to test AWS Lambda, an event-driven platform, to retrieve data from the MySQL DB instance in RDS for future use.

My organization requires all the AWS resources to remain in the Frankfurt, eu-central-1, region. To make sure this project follows these guidelines, I create a single IAM policy for all the AWS services that this group is going to use and apply the new global condition key aws:RequestedRegion for all the services. This way I can ensure that any new EC2 instances launched or any database instances created using RDS are in Frankfurt. This policy also ensures that any Lambda functions this group creates for testing are also in the Frankfurt region.


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeAccountAttributes",
                "ec2:DescribeAvailabilityZones",
                "ec2:DescribeInternetGateways",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcAttribute",
                "ec2:DescribeVpcs",
                "ec2:DescribeInstances",
                "ec2:DescribeImages",
                "ec2:DescribeKeyPairs",
                "rds:Describe*",
                "iam:ListRolePolicies",
                "iam:ListRoles",
                "iam:GetRole",
                "iam:ListInstanceProfiles",
                "iam:AttachRolePolicy",
                "lambda:GetAccountSettings"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:RunInstances",
                "rds:CreateDBInstance",
                "rds:CreateDBCluster",
                "lambda:CreateFunction",
                "lambda:InvokeFunction"
            ],
            "Resource": "*",
      "Condition": {"StringEquals": {"aws:RequestedRegion": "eu-central-1"}}

        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": "arn:aws:iam::account-id:role/*"
        }
    ]
}

The first statement in the above example contains all the read-only actions that let my developers use the console for EC2, RDS, and Lambda. The permissions for IAM-related actions are required to launch EC2 instances with a role, enable enhanced monitoring in RDS, and for AWS Lambda to assume the IAM execution role to execute the Lambda function. I’ve combined all the read-only actions into a single statement for simplicity. The second statement is where I give write access to my developers for the three services and restrict the write access to the Frankfurt region using the aws:RequestedRegion condition key. You can also list multiple AWS regions with the new condition key if your developers are allowed to create resources in multiple regions. The third statement grants permissions for the IAM action iam:PassRole required by AWS Lambda. For more information on allowing users to create a Lambda function, see Using Identity-Based Policies for AWS Lambda.
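If you prefer to manage this policy programmatically, a sketch with boto3 might look like the following; the policy name, role name, and file path are hypothetical.

import boto3

iam = boto3.client("iam")

# Load the policy document shown above from a local file (hypothetical path)
with open("frankfurt-only-policy.json") as f:
    policy_document = f.read()

response = iam.create_policy(
    PolicyName="FrankfurtOnlyDeveloperAccess",  # hypothetical name
    PolicyDocument=policy_document,
)

# Attach the policy to the developers' role (role name is an assumption)
iam.attach_role_policy(
    RoleName="project-developers",
    PolicyArn=response["Policy"]["Arn"],
)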

Summary

You can now use the aws:RequestedRegion global condition key in your IAM policies to specify the region to which the IAM principal (user or role) can invoke an API call. This capability makes it easier for you to restrict the AWS regions your IAM principals can use to comply with regulatory standards and improve account security. For more information about this global condition key and policy examples using aws:RequestedRegion, see the IAM documentation.

If you have comments about this post, submit them in the Comments section below. If you have questions about or suggestions for this solution, start a new thread on the IAM forum.

Want more AWS Security news? Follow us on Twitter.

Congratulations to Oracle on MySQL 8.0

Post Syndicated from Michael "Monty" Widenius original http://monty-says.blogspot.com/2018/04/congratulations-to-oracle-on-mysql-80.html

Last week, Oracle announced the general availability of MySQL 8.0. This is good news for database users, as it means Oracle is still developing MySQL.

I decided to celebrate the event by doing a quick test of MySQL 8.0. Here follows a step-by-step description of my first experience with MySQL 8.0.
Note that I did the following without reading the release notes, as I have done with every MySQL / MariaDB release to date; in this case it was not the right thing to do.

I pulled MySQL 8.0 from git@github.com:mysql/mysql-server.git
I was pleasantly surprised that 'cmake . ; make' worked without any compiler warnings! I even checked the used compiler options and noticed that MySQL was compiled with -Wall plus several other warning flags. Good job MySQL team!

I did have a little trouble finding the mysqld binary, as Oracle had moved it to 'runtime_output_directory'; unexpected, but no big deal.

Now it was time to install MySQL 8.0.

I knew that MySQL 8.0 had removed mysql_install_db, so I had to use the mysqld binary directly to install the default databases:
(I have specified datadir=/my/data3 in the /tmp/my.cnf file)

> cd runtime_output_directory
> mkdir /my/data3
> ./mysqld --defaults-file=/tmp/my.cnf --install

2018-04-22T12:38:18.332967Z 1 [ERROR] [MY-011011] [Server] Failed to find valid data directory.
2018-04-22T12:38:18.333109Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
2018-04-22T12:38:18.333135Z 0 [ERROR] [MY-010119] [Server] Aborting

A quick look at the mysqld --help --verbose output showed that the right command option is --initialize. My bad, let's try again.

> ./mysqld --defaults-file=/tmp/my.cnf --initialize

2018-04-22T12:39:31.910509Z 0 [ERROR] [MY-010457] [Server] --initialize specified but the data directory has files in it. Aborting.
2018-04-22T12:39:31.910578Z 0 [ERROR] [MY-010119] [Server] Aborting

Now I was using the right options, but it still didn't work.
I took a quick look around:

> ls /my/data3/
binlog.index

So even though mysqld noticed that the data3 directory was wrong, it still wrote things into it. This happened even though I didn't have --log-binlog enabled in the my.cnf file. Strange, but easy to fix:

> rm /my/data3/binlog.index
> ./mysqld --defaults-file=/tmp/my.cnf --initialize

2018-04-22T12:40:45.633637Z 0 [ERROR] [MY-011071] [Server] unknown variable ‘max-tmp-tables=100’
2018-04-22T12:40:45.633657Z 0 [Warning] [MY-010952] [Server] The privilege system failed to initialize correctly. If you have upgraded your server, make sure you’re executing mysql_upgrade to correct the issue.
2018-04-22T12:40:45.633663Z 0 [ERROR] [MY-010119] [Server] Aborting

The warning about the privilege system confused me a bit, but I ignored it for the time being and removed from my configuration files the variables that MySQL 8.0 no longer supports. I couldn't find a list of the removed variables anywhere, so this was done by trial and error.

> ./mysqld --defaults-file=/tmp/my.cnf

2018-04-22T12:42:56.626583Z 0 [ERROR] [MY-010735] [Server] Can’t open the mysql.plugin table. Please run mysql_upgrade to create it.
2018-04-22T12:42:56.827685Z 0 [Warning] [MY-010015] [Repl] Gtid table is not ready to be used. Table ‘mysql.gtid_executed’ cannot be opened.
2018-04-22T12:42:56.838501Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2018-04-22T12:42:56.848375Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2018-04-22T12:42:56.848863Z 0 [ERROR] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we’re sending the information to the error-log instead: MY-001146 – Table ‘mysql.component’ doesn’t exist
2018-04-22T12:42:56.848916Z 0 [Warning] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we’re sending the information to the error-log instead: MY-003543 – The mysql.component table is missing or has an incorrect definition.
….
2018-04-22T12:42:56.854141Z 0 [System] [MY-010931] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld: ready for connections. Version: ‘8.0.11’ socket: ‘/tmp/mysql.sock’ port: 3306 Source distribution.

I figured out that if there is a single wrong variable in the configuration file, running mysqld --initialize will leave the database in an inconsistent state. NOT GOOD! I am happy I didn't try this on a production system!

Time to start over from the beginning:

> rm -r /my/data3/*
> ./mysqld --defaults-file=/tmp/my.cnf --initialize

2018-04-22T12:44:45.548960Z 5 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: px)NaaSp?6um
2018-04-22T12:44:51.221751Z 0 [System] [MY-013170] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld (mysqld 8.0.11) initializing of server has completed

Success!

I wonder why the temporary password is so complex; it could easily have been something that one could remember without decreasing security, as it's temporary after all. No big deal, one can always paste it from the logs. (Side note: MariaDB uses socket authentication on many systems and thus doesn't need temporary installation passwords.)

Now let's start the MySQL server for real to do some testing:

> ./mysqld --defaults-file=/tmp/my.cnf

2018-04-22T12:45:43.683484Z 0 [System] [MY-010931] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld: ready for connections. Version: ‘8.0.11’ socket: ‘/tmp/mysql.sock’ port: 3306 Source distribution.

And then let's start the client:

> ./client/mysql --socket=/tmp/mysql.sock --user=root --password="px)NaaSp?6um"
ERROR 2059 (HY000): Plugin caching_sha2_password could not be loaded: /usr/local/mysql/lib/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory

Apparently MySQL 8.0 doesn’t work with old MySQL / MariaDB clients by default 🙁

I was testing this on a system with MariaDB installed, like most modern Linux systems today, and didn't want to use the MySQL clients or libraries.

I decided to try to fix this by changing the authentication to the native (original) MySQL authentication method.

> mysqld --skip-grant-tables

> ./client/mysql --socket=/tmp/mysql.sock --user=root
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)

Apparently --skip-grant-tables is not good enough anymore. Let's try again with:

> mysqld --skip-grant-tables --default_authentication_plugin=mysql_native_password

> ./client/mysql --socket=/tmp/mysql.sock --user=root mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.11 Source distribution

Great, we are getting somewhere. Now let's fix "root" to work with the old authentication:

MySQL [mysql]> update mysql.user set plugin="mysql_native_password",authentication_string=password("test") where user="root";
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '("test") where user="root"' at line 1

A quick look at the MySQL 8.0 release notes told me that the PASSWORD() function has been removed in 8.0. Why???? I don't know how one is supposed to generate passwords compatible with old installations of MySQL in 8.0. One could of course start an old MySQL or MariaDB version, execute the PASSWORD() function, and copy the result.
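If all you need is a mysql_native_password-compatible hash to put into mysql.user, one workaround is to compute it outside the server; the format is simply '*' followed by the uppercase hex of SHA1(SHA1(password)). A small sketch in Python:

import hashlib

def mysql_native_password_hash(password):
    # mysql_native_password stores '*' + uppercase hex of SHA1(SHA1(password))
    stage1 = hashlib.sha1(password.encode("utf-8")).digest()
    return "*" + hashlib.sha1(stage1).hexdigest().upper()

# The result can be pasted into the authentication_string column
print(mysql_native_password_hash("test"))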

I decided to fix this the easy way and use an empty password:

(Update: I later discovered that the right way would have been to use: FLUSH PRIVILEGES; ALTER USER 'root'@'localhost' IDENTIFIED BY 'test'; I however dislike this syntax as it has the password in clear text, which is easy to grab, and the command can't be used to easily update the mysql.user table. One must also disable the --skip-grant mode to use this.)

MySQL [mysql]> update mysql.user set plugin="mysql_native_password",authentication_string="" where user="root";
Query OK, 1 row affected (0.077 sec)
Rows matched: 1 Changed: 1 Warnings: 0
 
I restarted mysqld:
> mysqld --default_authentication_plugin=mysql_native_password

> ./client/mysql --user=root --password="" mysql
ERROR 1862 (HY000): Your password has expired. To log in you must change it using a client that supports expired passwords.

Ouch, I forgot that. Let's try again:

> mysqld --skip-grant-tables --default_authentication_plugin=mysql_native_password

> ./client/mysql --user=root --password="" mysql
MySQL [mysql]> update mysql.user set password_expired="N" where user="root";

Now a restart and test worked:

> ./mysqld --default_authentication_plugin=mysql_native_password

> ./client/mysql --user=root --password="" mysql

Finally I had a working account that I can use to create other users!

When looking at mysqld --help --verbose again, I noticed the option:

--initialize-insecure
Create the default database and exit. Create a super user
with empty password.

I decided to check if this would have made things easier:

> rm -r /my/data3/*
> ./mysqld --defaults-file=/tmp/my.cnf --initialize-insecure

2018-04-22T13:18:06.629548Z 5 [Warning] [MY-010453] [Server] root@localhost is created with an empty password! Please consider switching off the --initialize-insecure option.

Hm. I don't understand the warning, as --initialize-insecure is not an option that one would use more than once and thus nothing one would 'switch off'.

> ./mysqld --defaults-file=/tmp/my.cnf

> ./client/mysql --user=root --password="" mysql
ERROR 2059 (HY000): Plugin caching_sha2_password could not be loaded: /usr/local/mysql/lib/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory

Back to the beginning 🙁

To get things to work with old clients, one has to initialize the database with:
> ./mysqld --defaults-file=/tmp/my.cnf --initialize-insecure --default_authentication_plugin=mysql_native_password

Now I finally had MySQL 8.0 up and running and thought I would take it for a spin by running the "standard" MySQL/MariaDB sql-bench test suite. This was removed in MySQL 5.7, but as I happened to have MariaDB 10.3 installed, I decided to run it from there.

sql-bench is a single-threaded benchmark that measures the "raw" speed of some common operations. It gives you the 'maximum' performance for a single query. It's different from other benchmarks that measure maximum throughput when you have a lot of users, but sql-bench still tells you a lot about what kind of performance to expect from the database.

I first tried to be clever and create the "test" database that I needed for sql-bench with:
> mkdir /my/data3/test

but when I tried to run the benchmark, MySQL 8.0 complained that the test database didn’t exist.

MySQL 8.0 has moved away from the original MySQL concept where the user can easily create directories and copy databases into the database directory. This may have serious implications for anyone doing backups of databases and/or trying to restore a backup with normal OS commands.

I created the ‘test’ database with mysqladmin and then tried to run sql-bench:

> ./run-all-tests --user=root

The first run failed in test-ATIS:

Can’t execute command ‘create table class_of_service (class_code char(2) NOT NULL,rank tinyint(2) NOT NULL,class_description char(80) NOT NULL,PRIMARY KEY (class_code))’
Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘rank tinyint(2) NOT NULL,class_description char(80) NOT NULL,PRIMARY KEY (class_’ at line 1

This happened because 'rank' is now a reserved word in MySQL 8.0. It is also reserved in ANSI SQL, but I don't know of any other database that has failed to run test-ATIS before. In the past I have run it against Oracle, PostgreSQL, Mimer, MSSQL, and so on without any problems.

MariaDB also has 'rank' as a keyword in 10.2 and 10.3, but one can still use it as an identifier.

I fixed test-ATIS and then managed to run all tests on MySQL 8.0.

I ran the tests with both MySQL 8.0 and MariaDB 10.3 using the InnoDB storage engine and identical values for all InnoDB variables, table-definition-cache, and table-open-cache. I turned off the performance schema for both databases. All tests were run by a user with an empty password (to keep things comparable, and because it was too complex to generate a password in MySQL 8.0).

The results are as follows.
Results per test in seconds:

Operation         |MariaDB|MySQL-8|
-----------------------------------
ATIS              | 153.00| 228.00|
alter-table       |  92.00| 792.00|
big-tables        | 990.00|2079.00|
connect           | 186.00| 227.00|
create            | 575.00|4465.00|
insert            |4552.00|8458.00|
select            | 333.00| 412.00|
table-elimination |1900.00|3916.00|
wisconsin         | 272.00| 590.00|
-----------------------------------

This is of course just a first view of the performance of MySQL 8.0 in a single user environment. Some reflections about the results:

  • The alter-table test is slower (as expected) in 8.0, as some of the alter tests benefit from the instant add column in MariaDB 10.3.
  • The connect test is also better for MariaDB, as we put a lot of effort into speeding this up in MariaDB 10.2.
  • table-elimination shows an optimization in MariaDB for the Anchor table model, which MySQL doesn't have.
  • CREATE and DROP TABLE is almost 8 times slower in MySQL 8.0 than in MariaDB 10.3. I assume this is the cost of 'atomic DDL'. This may also cause performance problems for any thread using the data dictionary when another thread is creating/dropping tables.
  • When looking at the individual test results, MySQL 8.0 was slower in almost every test, in many cases significantly slower.
  • The only test where MySQL was faster was "update_with_key_prefix". I checked this and noticed that there was a bug in the test: the columns were updated to their original values (which should be instant with any storage engine). This is an old bug that MySQL has found and fixed and that we have not been aware of in the test or in MariaDB.
  • While writing this, I noticed that MySQL 8.0 now uses utf8mb4 as the default character set instead of latin1. This may affect some of the benchmarks slightly (not much, as most tests work with numbers and Oracle claims that utf8mb4 is only 20% slower than latin1), but it needs to be verified.
  • Oracle claims that MySQL 8.0 is much faster on multi-user benchmarks. The above test indicates that they may have achieved this by sacrificing single-user performance.
  • We need to run more and different benchmarks to better understand exactly what is going on. Stay tuned!

Short summary of my first run with MySQL 8.0:

  • Using the new caching_sha2_password authentication as the default for new installations is likely to cause a lot of problems for users. No old application will be able to use MySQL 8.0, installed with default options, without moving to MySQL's client libraries. While working on this blog I saw MySQL users complaining on IRC that not even MySQL Workbench can authenticate with MySQL 8.0. This is the first time in MySQL's history that such an incompatible change has been made!
  • Atomic DDL is a good thing (we plan to have this in MariaDB 10.4), but it should not have such a drastic impact on performance. I am also a bit skeptical of MySQL 8.0 having just one copy of the data dictionary, as if this gets corrupted you will lose all your data (a single point of failure).
  • MySQL 8.0 has several new reserved words and has removed a lot of variables, which makes upgrades hard. Before upgrading to MySQL 8.0 one has to check all one's databases and applications to ensure that there are no conflicts.
  • As my test above shows, if you have a single deprecated variable in your configuration files, the installation of MySQL will abort and can leave the database in an inconsistent state. I did of course do my tests by installing into an empty data directory, but one can assume that some of the problems may also happen when upgrading an old installation.

Conclusions:
In many ways, MySQL 8.0 has caught up with some earlier versions of MariaDB. For instance, in MariaDB 10.0, we introduced roles (four years ago). In MariaDB 10.1, we introduced encrypted redo/undo logs (three years ago). In MariaDB 10.2, we introduced window functions and CTEs (a year ago). However, some catch-up of MariaDB Server 10.2 features still remains for MySQL (such as check constraints, binlog compression, and log-based rollback).

MySQL 8.0 has a few new interesting features (mostly Atomic DDL and JSON TABLE functions), but at the same time MySQL has strayed away from some of the fundamental cornerstone principles of MySQL:

From the start of the first version of MySQL in 1995, all development has been focused around 3 core principles:

  • Ease of use
  • Performance
  • Stability

With MySQL 8.0, Oracle has sacrificed two of these three.

In addition (as part of ease of use), while I was working on MySQL, we did our best to ensure that the following should hold:

  • Upgrades should be trivial
  • Things should be kept compatible, if possible (don’t remove features/options/functions that are used)
  • Minimize reserved words, don’t remove server variables
  • One should be able to use normal OS commands to create and drop databases, and to copy and move tables around within the same system or between different systems. With 8.0 and the data dictionary, taking backups of specific tables will be hard, even if the server is not running.
  • mysqldump should always be usable for backups and for moving to new releases.
  • Old clients and applications should be able to use 'any' MySQL server version unchanged. (Some Oracle client libraries, like the C++ one, by default only support the new X protocol and can thus not be used with older MySQL or any MariaDB version.)

We plan to add a data dictionary to MariaDB 10.4 or MariaDB 10.5, but in a way to not sacrifice any of the above principles!

The competition between MySQL and MariaDB is not just about a tactical arms race on features. It’s about design philosophy, or strategic vision, if you will.

This shows in two main ways: our respective view of the Storage Engine structure, and of the top-level direction of the roadmap.

On the Storage Engine side, MySQL is converging on InnoDB, even for clustering and partitioning. In doing so, they are abandoning the advantages of multiple ways of storing data. By contrast, MariaDB sees lots of value in the Storage Engine architecture: MariaDB Server 10.3 will see the general availability of MyRocks (for write-intensive workloads) and Spider (for scalable workloads). On top of that, we have ColumnStore for analytical workloads. One can use the CONNECT engine to join with other databases. The use of different storage engines for different workloads and different hardware is a competitive differentiator, now more than ever.

On the roadmap side, MySQL is carefully steering clear of features that close the gap between MySQL and Oracle. MariaDB has no such constraints. With MariaDB 10.3, we are introducing PL/SQL compatibility (Oracle's stored procedures) and AS OF (built-in system-versioned tables with point-in-time querying). For both of those features, MariaDB is the first open source database to do so. I don't expect Oracle to provide any of the above features in MySQL!

Also on the roadmap side, MySQL is not working with the ecosystem to extend the functionality. In 2017, MariaDB accepted more code contributions in one year than MySQL has done during its entire lifetime, and the rate is increasing!

I am sure that the experience I had testing MySQL 8.0 would have been significantly better if MySQL had an open development model where the community could easily participate in developing and testing MySQL continuously. Most of the confusing error messages and strange behavior would have been found and fixed long before the GA release.

Before upgrading to MySQL 8.0, please read https://dev.mysql.com/doc/refman/8.0/en/upgrading-from-previous-series.html to see what problems you can run into! Don't expect that old installations or applications will work out of the box without testing, as a lot of features and options have been removed (the query cache, partitioning of MyISAM tables, etc.)! You probably also have to revise your backup methods, especially if you ever want to restore just a few tables. (With 8.0, I don't know how this can be easily done.)

According to the MySQL 8.0 release notes, one can't use mysqldump to copy a database to MySQL 8.0. One first has to move to a MySQL 5.7 GA version (with mysqldump, as recommended by Oracle) and then to MySQL 8.0 with an in-place upgrade. I assume this means that all old mysqldump backups are useless for MySQL 8.0?

MySQL 8.0 seems to be a one-way street to an unknown future. Up to MySQL 5.7 it has been trivial to move to MariaDB, and one could always move back to MySQL with mysqldump. All MySQL client libraries have worked with MariaDB and all MariaDB client libraries have worked with MySQL. With MySQL 8.0 this has changed in the wrong direction.

As long as you are using MySQL 5.7 and below you have choices for your future; after MySQL 8.0 you have very little choice. But don't despair, as MariaDB will always be able to load a mysqldump file and it's very easy to upgrade your old MySQL installation to MariaDB 🙂

I wish you good luck to try MySQL 8.0 (and also the upcoming MariaDB 10.3)!

Implement continuous integration and delivery of serverless AWS Glue ETL applications using AWS Developer Tools

Post Syndicated from Prasad Alle original https://aws.amazon.com/blogs/big-data/implement-continuous-integration-and-delivery-of-serverless-aws-glue-etl-applications-using-aws-developer-tools/

AWS Glue is an increasingly popular way to develop serverless ETL (extract, transform, and load) applications for big data and data lake workloads. Organizations that transform their ETL applications to cloud-based, serverless ETL architectures need a seamless, end-to-end continuous integration and continuous delivery (CI/CD) pipeline: from source code, to build, to deployment, to product delivery. Having a good CI/CD pipeline can help your organization discover bugs before they reach production and deliver updates more frequently. It can also help developers write quality code and automate the ETL job release management process, mitigate risk, and more.

AWS Glue is a fully managed data catalog and ETL service. It simplifies and automates the difficult and time-consuming tasks of data discovery, conversion, and job scheduling. AWS Glue crawls your data sources and constructs a data catalog using pre-built classifiers for popular data formats and data types, including CSV, Apache Parquet, JSON, and more.

When you are developing ETL applications using AWS Glue, you might come across some of the following CI/CD challenges:

  • Iterative development with unit tests
  • Continuous integration and build
  • Pushing the ETL pipeline to a test environment
  • Pushing the ETL pipeline to a production environment
  • Testing ETL applications using real data (live test)
  • Exploring and validating data

In this post, I walk you through a solution that implements a CI/CD pipeline for serverless AWS Glue ETL applications supported by AWS Developer Tools (including AWS CodePipeline, AWS CodeCommit, and AWS CodeBuild) and AWS CloudFormation.

Solution overview

The following diagram shows the pipeline workflow:

This solution uses AWS CodePipeline, which lets you orchestrate and automate the test and deploy stages for ETL application source code. The solution consists of a pipeline that contains the following stages:

1.) Source Control: In this stage, the AWS Glue ETL job source code and the AWS CloudFormation template file for deploying the ETL jobs are both committed to version control. I chose to use AWS CodeCommit for version control.

To get the ETL job source code and AWS CloudFormation template, download the gluedemoetl.zip file. This solution is developed based on a previous post, Build a Data Lake Foundation with AWS Glue and Amazon S3.

2.) LiveTest: In this stage, all resources—including AWS Glue crawlers, jobs, S3 buckets, roles, and other resources that are required for the solution—are provisioned, deployed, live tested, and cleaned up.

The LiveTest stage includes the following actions:

  • Deploy: In this action, all the resources that are required for this solution (crawlers, jobs, buckets, roles, and so on) are provisioned and deployed using an AWS CloudFormation template.
  • AutomatedLiveTest: In this action, all the AWS Glue crawlers and jobs are executed and data exploration and validation tests are performed. These validation tests include, but are not limited to, record counts in both raw tables and transformed tables in the data lake and any other business validations. I used AWS CodeBuild for this action.
  • LiveTestApproval: This action is included for the cases in which a pipeline administrator approval is required to deploy/promote the ETL applications to the next stage. The pipeline pauses in this action until an administrator manually approves the release.
  • LiveTestCleanup: In this action, all the LiveTest stage resources, including test crawlers, jobs, roles, and so on, are deleted using the AWS CloudFormation template. This action helps minimize cost by ensuring that the test resources exist only for the duration of the AutomatedLiveTest and LiveTestApproval actions.

3.) DeployToProduction: In this stage, all the resources are deployed using the AWS CloudFormation template to the production environment.

Try it out

This code pipeline takes approximately 20 minutes to complete the LiveTest test stage (up to the LiveTest approval stage, in which manual approval is required).

To get started with this solution, choose Launch Stack:

This creates the CI/CD pipeline with all of its stages, as described earlier. It performs an initial commit of the sample AWS Glue ETL job source code to trigger the first release change.

In the AWS CloudFormation console, choose Create. After the template finishes creating resources, you see the pipeline name on the stack Outputs tab.

After that, open the CodePipeline console and select the newly created pipeline. Initially, your pipeline’s CodeCommit stage shows that the source action failed.

Allow a few minutes for your new pipeline to detect the initial commit applied by the CloudFormation stack creation. As soon as the commit is detected, your pipeline starts. You will see the successful stage completion status as soon as the CodeCommit source stage runs.

In the CodeCommit console, choose Code in the navigation pane to view the solution files.

Next, you can watch how the pipeline goes through the LiveTest stage of the deploy and AutomatedLiveTest actions, until it finally reaches the LiveTestApproval action.

At this point, if you check the AWS CloudFormation console, you can see that a new template has been deployed as part of the LiveTest deploy action.

At this point, make sure that the AWS Glue crawlers and the AWS Glue job ran successfully. Also check whether the corresponding databases and external tables have been created in the AWS Glue Data Catalog. Then verify that the data is validated using Amazon Athena, as shown following.

Open the AWS Glue console, and choose Databases in the navigation pane. You will see the following databases in the Data Catalog:

Open the Amazon Athena console, and run the following queries. Verify that the record counts are matching.

SELECT count(*) FROM "nycitytaxi_gluedemocicdtest"."data";
SELECT count(*) FROM "nytaxiparquet_gluedemocicdtest"."datalake";
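If you want to run the same validation outside the console, here is a sketch using the Athena API in boto3; the S3 output location is an assumption, and the table names come from the queries above.

import boto3

athena = boto3.client("athena")

# Start the raw-data count query; results land in a hypothetical S3 location
execution = athena.start_query_execution(
    QueryString='SELECT count(*) FROM "nycitytaxi_gluedemocicdtest"."data"',
    ResultConfiguration={"OutputLocation": "s3://mybucket/athena-results/"},
)
print(execution["QueryExecutionId"])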

The following shows the raw data:

The following shows the transformed data:

The pipeline pauses the action until the release is approved. After validating the data, manually approve the revision on the LiveTestApproval action on the CodePipeline console.

Add comments as needed, and choose Approve.
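You can also approve the action programmatically. The following boto3 sketch assumes hypothetical pipeline, stage, and action names; the approval token comes from get_pipeline_state.

import boto3

codepipeline = boto3.client("codepipeline")

# Find the token for the pending approval action (pipeline name is hypothetical)
state = codepipeline.get_pipeline_state(name="glue-etl-demo-pipeline")
# Inspect state["stageStates"] to locate the LiveTestApproval action and its token

codepipeline.put_approval_result(
    pipelineName="glue-etl-demo-pipeline",
    stageName="LiveTest",
    actionName="LiveTestApproval",
    result={"summary": "Data validated in Athena", "status": "Approved"},
    token="token-from-get-pipeline-state",  # placeholder
)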

The LiveTestApproval stage now appears as Approved on the console.

After the revision is approved, the pipeline proceeds to use the AWS CloudFormation template to destroy the resources that were deployed in the LiveTest deploy action. This helps reduce cost and ensures a clean test environment on every deployment.

Production deployment is the final stage. In this stage, all the resources—AWS Glue crawlers, AWS Glue jobs, Amazon S3 buckets, roles, and so on—are provisioned and deployed to the production environment using the AWS CloudFormation template.

After successfully running the whole pipeline, feel free to experiment with it by changing the source code stored on AWS CodeCommit. For example, if you modify the AWS Glue ETL job to generate an error, it should make the AutomatedLiveTest action fail. Or if you change the AWS CloudFormation template to make its creation fail, it should affect the LiveTest deploy action. The objective of the pipeline is to ensure that all changes deployed to production work as expected.

Conclusion

In this post, you learned how easy it is to implement CI/CD for serverless AWS Glue ETL solutions with AWS developer tools like AWS CodePipeline and AWS CodeBuild at scale. Implementing such solutions can help you accelerate ETL development and testing at your organization.

If you have questions or suggestions, please comment below.

 


Additional Reading

If you found this post useful, be sure to check out Implement Continuous Integration and Delivery of Apache Spark Applications using AWS and Build a Data Lake Foundation with AWS Glue and Amazon S3.

 


About the Authors

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.

 
Luis Caro is a Big Data Consultant for AWS Professional Services. He works with our customers to provide guidance and technical assistance on big data projects, helping them improve the value of their solutions when using AWS.

 

 

 

How to retain system tables’ data spanning multiple Amazon Redshift clusters and run cross-cluster diagnostic queries

Post Syndicated from Karthik Sonti original https://aws.amazon.com/blogs/big-data/how-to-retain-system-tables-data-spanning-multiple-amazon-redshift-clusters-and-run-cross-cluster-diagnostic-queries/

Amazon Redshift is a data warehouse service that logs the history of the system in STL log tables. The STL log tables manage disk space by retaining only two to five days of log history, depending on log usage and available disk space.

To retain STL tables' data for an extended period, you usually have to create a replica table for every system table. Then, for each table, you load the data from the system table into the replica at regular intervals. By maintaining replica tables for STL tables, you can run diagnostic queries on historical data from the STL tables. You then can derive insights from query execution times, query plans, and disk-spill patterns, and make better cluster-sizing decisions. However, refreshing replica tables with live data from STL tables at regular intervals requires schedulers such as Cron or AWS Data Pipeline. Also, these tables are specific to one cluster and are not accessible after the cluster is terminated. This is especially true for transient Amazon Redshift clusters that last for only a finite period of ad hoc query execution.

In this blog post, I present a solution that exports system tables from multiple Amazon Redshift clusters into an Amazon S3 bucket. This solution is serverless, and you can schedule it as frequently as every five minutes. The AWS CloudFormation deployment template that I provide automates the solution setup in your environment. The system tables’ data in the Amazon S3 bucket is partitioned by cluster name and query execution date to enable efficient joins in cross-cluster diagnostic queries.

I also provide another CloudFormation template later in this post. This second template helps to automate the creation of tables in the AWS Glue Data Catalog for the system tables’ data stored in Amazon S3. After the system tables are exported to Amazon S3, you can run cross-cluster diagnostic queries on the system tables’ data and derive insights about query executions in each Amazon Redshift cluster. You can do this using Amazon QuickSight, Amazon Athena, Amazon EMR, or Amazon Redshift Spectrum.

You can find all the code examples in this post, including the CloudFormation templates, AWS Glue extract, transform, and load (ETL) scripts, and the resolution steps for common errors you might encounter in this GitHub repository.

Solution overview

The solution in this post uses AWS Glue to export system tables’ log data from Amazon Redshift clusters into Amazon S3. The AWS Glue ETL jobs are invoked at a scheduled interval by AWS Lambda. AWS Systems Manager, which provides secure, hierarchical storage for configuration data management and secrets management, maintains the details of Amazon Redshift clusters for which the solution is enabled. The last-fetched time stamp values for the respective cluster-table combination are maintained in an Amazon DynamoDB table.

The following diagram covers the key steps involved in this solution.

The solution as illustrated in the preceding diagram flows like this:

  1. The Lambda function, invoke_rs_stl_export_etl, is triggered at regular intervals, as controlled by Amazon CloudWatch. It’s triggered to look up the AWS Systems Manager parameter store to get the details of the Amazon Redshift clusters for which the system table export is enabled.
  2. The same Lambda function, based on the Amazon Redshift cluster details obtained in step 1, invokes the AWS Glue ETL job designated for the Amazon Redshift cluster. If an ETL job for the cluster is not found, the Lambda function creates one. (A code sketch of these first two steps follows this list.)
  3. The ETL job invoked for the Amazon Redshift cluster gets the cluster credentials from the parameter store. It gets from the DynamoDB table the last exported time stamp of when each of the system tables was exported from the respective Amazon Redshift cluster.
  4. The ETL job unloads the system tables’ data from the Amazon Redshift cluster into an Amazon S3 bucket.
  5. The ETL job updates the DynamoDB table with the last exported time stamp value for each system table exported from the Amazon Redshift cluster.
  6. The Amazon Redshift cluster system tables’ data is available in Amazon S3 and is partitioned by cluster name and date for running cross-cluster diagnostic queries.
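A minimal sketch of how the Lambda function might implement steps 1 and 2 with boto3 is shown below; the parameter name follows the convention described later in this post, and the job name suffix matches the ETL job naming used in this solution.

import boto3

ssm = boto3.client("ssm")
glue = boto3.client("glue")

def handler(event, context):
    # Step 1: read the list of enabled clusters from the parameter store
    param = ssm.get_parameter(Name="redshift_query_logs.global.enabled_cluster_list")
    clusters = param["Parameter"]["Value"].split(",")

    # Step 2: invoke the per-cluster AWS Glue ETL job
    for cluster in clusters:
        glue.start_job_run(JobName="{}_extract_rs_query_logs".format(cluster))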

Understanding the configuration data

This solution uses AWS Systems Manager parameter store to store the Amazon Redshift cluster credentials securely. The parameter store also securely stores other configuration information that the AWS Glue ETL job needs for extracting and storing system tables’ data in Amazon S3. Systems Manager comes with a default AWS Key Management Service (AWS KMS) key that it uses to encrypt the password component of the Amazon Redshift cluster credentials.

The following list describes the global parameters and cluster-specific parameters required in this solution. The global parameters are defined once and applicable at the overall solution level. The cluster-specific parameters are specific to an Amazon Redshift cluster and repeat for each cluster for which you enable this post's solution. The CloudFormation template explained later in this post creates these parameters as part of the deployment process.

Global parameters (defined once and applied to all jobs):

  • redshift_query_logs.global.s3_prefix (String): The Amazon S3 path where the query logs are exported. Under this path, each exported table is partitioned by cluster name and date.
  • redshift_query_logs.global.tempdir (String): The Amazon S3 path that AWS Glue ETL jobs use for temporarily staging the data.
  • redshift_query_logs.global.role (String): The name of the role that the AWS Glue ETL jobs assume. Just the role name is sufficient; the complete Amazon Resource Name (ARN) is not required.
  • redshift_query_logs.global.enabled_cluster_list (StringList): A comma-separated list of cluster names for which system tables' data export is enabled. This gives you the flexibility to exclude certain clusters.

Cluster-specific parameters (one set for each cluster specified in the enabled_cluster_list parameter):

  • redshift_query_logs.<<cluster_name>>.connection (String): The name of the AWS Glue Data Catalog connection to the Amazon Redshift cluster. For example, if the cluster name is product_warehouse, the entry is redshift_query_logs.product_warehouse.connection.
  • redshift_query_logs.<<cluster_name>>.user (String): The user name that AWS Glue uses to connect to the Amazon Redshift cluster.
  • redshift_query_logs.<<cluster_name>>.password (SecureString): The password that AWS Glue uses to connect to the Amazon Redshift cluster, encrypted with a key that is managed in AWS KMS.

For example, suppose that you have two Amazon Redshift clusters, product-warehouse and category-management, for which the solution described in this post is enabled. In this case, the parameters shown in the following screenshot are created by the solution deployment CloudFormation template in the AWS Systems Manager parameter store.

Solution deployment

To make it easier for you to get started, I created a CloudFormation template that automatically configures and deploys the solution—only one step is required after deployment.

Prerequisites

To deploy the solution, you must have one or more Amazon Redshift clusters in a private subnet. This subnet must have a network address translation (NAT) gateway or a NAT instance configured, and also a security group with a self-referencing inbound rule for all TCP ports. For more information about why AWS Glue ETL needs the configuration it does, described previously, see Connecting to a JDBC Data Store in a VPC in the AWS Glue documentation.

To start the deployment, launch the CloudFormation template:

CloudFormation stack parameters

The following list describes the parameters for deploying the solution to export query logs from multiple Amazon Redshift clusters.

  • S3Bucket (default: mybucket): The bucket this solution uses to store the exported query logs, stage code artifacts, and perform unloads from Amazon Redshift. For example, the mybucket/extract_rs_logs/data prefix is used for storing all the exported query logs for each system table, partitioned by cluster. The mybucket/extract_rs_logs/temp/ prefix is used for temporarily staging the unloaded data from Amazon Redshift. The mybucket/extract_rs_logs/code prefix is used for storing all the code artifacts required for Lambda and the AWS Glue ETL jobs.
  • ExportEnabledRedshiftClusters (requires input): A comma-separated list of cluster names from which the system table logs need to be exported.
  • DataStoreSecurityGroups (requires input): A list of security groups with an inbound rule to the Amazon Redshift clusters provided in the parameter ExportEnabledRedshiftClusters. These security groups should also have a self-referencing inbound rule on all TCP ports, as explained in Connecting to a JDBC Data Store in a VPC.

After you launch the template and create the stack, you see that the following resources have been created:

  1. AWS Glue connections for each Amazon Redshift cluster you provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  2. All parameters required for this solution created in the parameter store.
  3. The Lambda function that invokes the AWS Glue ETL jobs for each configured Amazon Redshift cluster at a regular interval of five minutes.
  4. The DynamoDB table that captures the last exported time stamps for each exported cluster-table combination.
  5. The AWS Glue ETL jobs to export query logs from each Amazon Redshift cluster provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  6. The IAM roles and policies required for the Lambda function and AWS Glue ETL jobs.

After the deployment

For each Amazon Redshift cluster for which you enabled the solution through the CloudFormation stack parameter, ExportEnabledRedshiftClusters, the automated deployment includes temporary credentials that you must update after the deployment:

  1. Go to the parameter store.
  2. Note the parameters redshift_query_logs.<<cluster_name>>.user and redshift_query_logs.<<cluster_name>>.password that correspond to each Amazon Redshift cluster for which you enabled this solution. Edit these parameters to replace the placeholder values with the right credentials.

For example, if product-warehouse is one of the clusters for which you enabled system table export, you edit these two parameters with the right user name and password and choose Save parameter.

Querying the exported system tables

Within a few minutes after the solution deployment, you should see Amazon Redshift query logs being exported to the Amazon S3 location, <<S3Bucket_you_provided>>/extract_redshift_query_logs/data/. In that bucket, you should see the eight system tables partitioned by cluster name and date: stl_alert_event_log, stl_dlltext, stl_explain, stl_query, stl_querytext, stl_scan, stl_utilitytext, and stl_wlm_query.

To run cross-cluster diagnostic queries on the exported system tables, create external tables in the AWS Glue Data Catalog. To make it easier for you to get started, I provide a CloudFormation template that creates an AWS Glue crawler, which crawls the exported system tables stored in Amazon S3 and builds the external tables in the AWS Glue Data Catalog.

Launch this CloudFormation template to create external tables that correspond to the Amazon Redshift system tables. S3Bucket is the only input parameter required for this stack deployment. Provide the same Amazon S3 bucket name where the system tables’ data is being exported. After you successfully create the stack, you can see the eight tables in the database, redshift_query_logs_db, as shown in the following screenshot.

Now, navigate to the Athena console to run cross-cluster diagnostic queries. The following screenshot shows a diagnostic query executed in Athena that retrieves query alerts logged across multiple Amazon Redshift clusters.

You can build the following example Amazon QuickSight dashboard by running cross-cluster diagnostic queries on Athena to identify the hourly query count and the key query alert events across multiple Amazon Redshift clusters.

How to extend the solution

You can extend this post’s solution in two ways:

  • Add any new Amazon Redshift clusters that you spin up after you deploy the solution.
  • Add other system tables or custom query results to the list of exports from an Amazon Redshift cluster.

Extend the solution to other Amazon Redshift clusters

To extend the solution to more Amazon Redshift clusters, add the three cluster-specific parameters in the AWS Systems Manager parameter store following the guidelines earlier in this post. Modify the redshift_query_logs.global.enabled_cluster_list parameter to append the new cluster to the comma-separated string.
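For example, appending a hypothetical new cluster named sales-mart to the list could be sketched with boto3 as follows (the Type value should match the type of the existing parameter).

import boto3

ssm = boto3.client("ssm")

# Read the current comma-separated list and append the new cluster name
current = ssm.get_parameter(
    Name="redshift_query_logs.global.enabled_cluster_list"
)["Parameter"]["Value"]

ssm.put_parameter(
    Name="redshift_query_logs.global.enabled_cluster_list",
    Value=current + ",sales-mart",
    Type="StringList",
    Overwrite=True,
)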

Extend the solution to add other tables or custom queries to an Amazon Redshift cluster

The current solution ships with the export functionality for the following Amazon Redshift system tables:

  • stl_alert_event_log
  • stl_dlltext
  • stl_explain
  • stl_query
  • stl_querytext
  • stl_scan
  • stl_utilitytext
  • stl_wlm_query

You can easily add another system table or custom query by adding a few lines of code to the AWS Glue ETL job, <<cluster-name>>_extract_rs_query_logs. For example, suppose that from the product-warehouse Amazon Redshift cluster you want to export orders greater than $2,000. To do so, add the following five lines of code to the AWS Glue ETL job product-warehouse_extract_rs_query_logs, where product-warehouse is your cluster name:

  1. Get the last-processed time-stamp value. The function creates a value if it doesn't already exist.

salesLastProcessTSValue = functions.getLastProcessedTSValue(trackingEntry="mydb.sales_2000",job_configs=job_configs)

  2. Run the custom query with the time stamp.

returnDF=functions.runQuery(query="select * from sales s join order o where o.order_amnt > 2000 and sale_timestamp > '{}'".format(salesLastProcessTSValue),tableName="mydb.sales_2000",job_configs=job_configs)

  3. Save the results to Amazon S3.

functions.saveToS3(dataframe=returnDF,s3Prefix=s3Prefix,tableName="mydb.sales_2000",partitionColumns=["sale_date"],job_configs=job_configs)

  4. Get the latest time-stamp value from the returned data frame in Step 2.

latestTimestampVal=functions.getMaxValue(returnDF,"sale_timestamp",job_configs)

  5. Update the last-processed time-stamp value in the DynamoDB table.

functions.updateLastProcessedTSValue("mydb.sales_2000",latestTimestampVal[0],job_configs)

Conclusion

In this post, I demonstrate a serverless solution to retain the system tables’ log data across multiple Amazon Redshift clusters. By using this solution, you can incrementally export the data from system tables into Amazon S3. By performing this export, you can build cross-cluster diagnostic queries, build audit dashboards, and derive insights into capacity planning by using services such as Athena. I also demonstrate how you can extend this solution to other ad hoc query use cases or tables other than system tables by adding a few lines of code.


Additional Reading

If you found this post useful, be sure to check out Using Amazon Redshift Spectrum, Amazon Athena, and AWS Glue with Node.js in Production and Amazon Redshift – 2017 Recap.


About the Author

Karthik Sonti is a senior big data architect at Amazon Web Services. He helps AWS customers build big data and analytical solutions and provides guidance on architecture and best practices.

 

 

 

 

AWS Secrets Manager: Store, Distribute, and Rotate Credentials Securely

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/aws-secrets-manager-store-distribute-and-rotate-credentials-securely/

Today we’re launching AWS Secrets Manager which makes it easy to store and retrieve your secrets via API or the AWS Command Line Interface (CLI) and rotate your credentials with built-in or custom AWS Lambda functions. Managing application secrets like database credentials, passwords, or API Keys is easy when you’re working locally with one machine and one application. As you grow and scale to many distributed microservices, it becomes a daunting task to securely store, distribute, rotate, and consume secrets. Previously, customers needed to provision and maintain additional infrastructure solely for secrets management which could incur costs and introduce unneeded complexity into systems.

AWS Secrets Manager

Imagine that I have an application that takes incoming tweets from Twitter and stores them in an Amazon Aurora database. Previously, I would have had to request a username and password from my database administrator and embed those credentials in environment variables or, in my race to production, even in the application itself. I would also need to have our social media manager create the Twitter API credentials and figure out how to store those. This is a fairly manual process, involving multiple people, that I have to restart every time I want to rotate these credentials. With Secrets Manager, my database administrator can provide the credentials in Secrets Manager once and subsequently rely on a Secrets Manager-provided Lambda function to automatically update and rotate those credentials. My social media manager can put the Twitter API keys in Secrets Manager, which I can then access with a simple API call, and I can even rotate these programmatically with a custom Lambda function calling out to the Twitter API. My secrets are encrypted with the KMS key of my choice, and each of these administrators can explicitly grant access to these secrets with granular IAM policies for individual roles or users.

Let’s take a look at how I would store a secret using the AWS Secrets Manager console. First, I’ll click Store a new secret to get to the new secrets wizard. For my RDS Aurora instance it’s straightforward to simply select the instance and provide the initial username and password to connect to the database.

Next, I’ll fill in a quick description and a name to access my secret by. You can use whatever naming scheme you want here.

Next, we’ll configure rotation to use the Secrets Manager-provided Lambda function to rotate our password every 10 days.

Finally, we’ll review all the details and check out our sample code for storing and retrieving our secret!

Then I can review the secrets in the console.

Now, if I needed to access these secrets I’d simply call the API.

import json
import boto3
secrets = boto3.client("secretsmanager")
rds = json.loads(secrets.get_secret_value(SecretId="prod/TwitterApp/Database")["SecretString"])
print(rds)

Which would give me the following values:


{'engine': 'mysql',
 'host': 'twitterapp2.abcdefg.us-east-1.rds.amazonaws.com',
 'password': '-)Kw>THISISAFAKEPASSWORD:lg{&sad+Canr',
 'port': 3306,
 'username': 'ranman'}

More than passwords

AWS Secrets Manager works for more than just passwords. I can store OAuth credentials, binary data, and more. Let’s look at storing my Twitter OAuth application keys.
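For example, storing a set of OAuth keys programmatically might look like the following sketch; the secret name and key names are placeholders, not values from this post.

import json
import boto3

secrets = boto3.client("secretsmanager")

secrets.create_secret(
    Name="prod/TwitterApp/OAuth",          # hypothetical secret name
    Description="Twitter OAuth application keys",
    SecretString=json.dumps({
        "consumer_key": "example-key",     # placeholder values
        "consumer_secret": "example-secret",
    }),
)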

Now, I can define the rotation for these third-party OAuth credentials with a custom AWS Lambda function that can call out to Twitter whenever we need to rotate our credentials.

Custom Rotation

One of the niftiest features of AWS Secrets Manager is custom AWS Lambda functions for credential rotation. This allows you to define completely custom workflows for credentials. Secrets Manager will call your Lambda function with a payload that includes a Step, which specifies which step of the rotation you're in, a SecretId, which specifies which secret the rotation is for, and, importantly, a ClientRequestToken, which is used to ensure idempotency in any changes to the underlying secret.

When you’re rotating secrets you go through a few different steps:

  1. createSecret
  2. setSecret
  3. testSecret
  4. finishSecret

The advantage of these steps is that you can add any kind of approval steps you want for each phase of the rotation. For more details on custom rotation check out the documentation.
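A skeleton of such a rotation Lambda function, dispatching on the Step value described above, might look like this; the per-step logic is left out because it depends entirely on your own workflow.

def lambda_handler(event, context):
    step = event["Step"]
    secret_id = event["SecretId"]
    token = event["ClientRequestToken"]

    if step == "createSecret":
        pass  # generate and stage the new secret value
    elif step == "setSecret":
        pass  # set the new credential in the target service or database
    elif step == "testSecret":
        pass  # verify that the new credential works
    elif step == "finishSecret":
        pass  # mark the staged version as the current version
    else:
        raise ValueError("Unknown rotation step: {}".format(step))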

Available Now
AWS Secrets Manager is available today in US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), EU (Frankfurt), EU (Ireland), EU (London), and South America (São Paulo). Secrets are priced at $0.40 per month per secret and $0.05 per 10,000 API calls. I’m looking forward to seeing more users adopt rotating credentials to secure their applications!

Randall

AWS Key Management Service now offers FIPS 140-2 validated cryptographic modules enabling easier adoption of the service for regulated workloads

Post Syndicated from Sreekumar Pisharody original https://aws.amazon.com/blogs/security/aws-key-management-service-now-offers-fips-140-2-validated-cryptographic-modules-enabling-easier-adoption-of-the-service-for-regulated-workloads/

AWS Key Management Service (KMS) now uses FIPS 140-2 validated hardware security modules (HSM) and supports FIPS 140-2 validated endpoints, which provide independent assurances about the confidentiality and integrity of your keys. Having additional third-party assurances about the keys you manage in AWS KMS can make it easier to use the service for regulated workloads.

The process of gaining FIPS 140-2 validation is rigorous. First, AWS KMS HSMs were tested by an independent lab; those results were further reviewed by the Cryptographic Module Validation Program run by NIST. You can view the FIPS 140-2 certificate of the AWS Key Management Service HSM to get more details.

AWS KMS HSMs are designed so that no one, not even AWS employees, can retrieve your plaintext keys. The service uses the FIPS 140-2 validated HSMs to protect your keys when you request the service to create keys on your behalf or when you import them. Your plaintext keys are never written to disk and are only used in volatile memory of the HSMs while performing your requested cryptographic operation. Furthermore, AWS KMS keys are never transmitted outside the AWS Regions in which they were created. And HSM firmware updates are controlled by multi-party access that is audited and reviewed by an independent group within AWS.

AWS KMS HSMs are validated at level 2 overall and at level 3 in the following areas:

  • Cryptographic Module Specification
  • Roles, Services, and Authentication
  • Physical Security
  • Design Assurance

You can also make AWS KMS requests to API endpoints that terminate TLS sessions using a FIPS 140-2 validated cryptographic software module. To do so, connect to the unique FIPS 140-2 validated HTTPS endpoints when making AWS KMS requests from your applications. AWS KMS FIPS 140-2 validated HTTPS endpoints are powered by the OpenSSL FIPS Object Module. FIPS 140-2 validated API endpoints are available in all commercial regions where AWS KMS is available.
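For example, with boto3 you can point the KMS client at a FIPS endpoint explicitly; the endpoint shown follows the kms-fips.<region>.amazonaws.com pattern for us-east-1, so adjust it for your Region.

import boto3

# Connect to the FIPS 140-2 validated KMS endpoint in us-east-1
kms = boto3.client(
    "kms",
    region_name="us-east-1",
    endpoint_url="https://kms-fips.us-east-1.amazonaws.com",
)

print(kms.list_keys(Limit=10))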