All posts by Jonathan Jenkyn

Architecting for database encryption on AWS

Post Syndicated from Jonathan Jenkyn original https://aws.amazon.com/blogs/security/architecting-for-database-encryption-on-aws/

In this post, I review the options you have to protect your customer data when migrating or building new databases in Amazon Web Services (AWS). I focus on how you can support sensitive workloads in ways that help you maintain compliance and regulatory obligations, and meet security objectives.

Understanding transparent data encryption

I commonly see enterprise customers migrating existing databases straight from on-premises to AWS without reviewing their design. This might seem simpler and faster, but they miss the opportunity to review the scalability, cost savings, and feature capability of native cloud services. A straight lift-and-shift migration can also create unnecessary operational overhead, carry over unneeded complexity, and result in more time spent troubleshooting and responding to events.

One example is when enterprise customers who use Transparent Data Encryption (TDE) or Extensible Key Management (EKM) technologies want to reuse those same technologies in their migration to AWS. TDE and EKM are database technologies that encrypt and decrypt database records as the records are written to and read from the underlying storage medium. Customers use TDE features in Microsoft SQL Server, Oracle 10g and 11g, and Oracle Enterprise Edition to meet requirements for data-at-rest encryption. Using TDE today, however, doesn’t make TDE itself the requirement. It’s rare for an organizational policy or compliance framework to specify a technology such as TDE in the actual requirement. For example, the Payment Card Industry Data Security Standard (PCI-DSS) requires that sensitive data be protected using “strong cryptography with associated key-management processes and procedures.” Nowhere does PCI-DSS endorse or require the use of a specific technology.

Understanding risks

It’s important that you understand the risks that encryption-at-rest mitigates before selecting a technology to use. Encryption-at-rest, in the context of databases, generally manages the risk that one of the disks used to store database data is physically stolen and thus compromised. In on-premises scenarios, TDE is an effective technology used to manage this risk. All data from the database—up to and including the disk—is encrypted. The database manages all key management and cryptographic operations. You can also use TDE with a hardware security module (HSM) so that the keys and cryptography for the database are managed outside of the database itself. In TDE implementations, the HSM is used only to manage the key encryption keys (KEK), and not the data encryption keys (DEK) themselves. The DEKs are in volatile memory in the database at runtime, and so the cryptographic operations occur on the database itself.

You can also use native operating system encryption technologies such as dm-crypt or LUKS (Linux Unified Key Setup). Dm-crypt is a full disk encryption (FDE) subsystem available in Linux kernel version 2.6 and beyond. Dm-crypt can be used on its own or with LUKS as an extension to add more features. When using dm-crypt, the operating system kernel is responsible for encrypting and decrypting data as it’s written to and read from the attached volumes. This achieves the same outcome as TDE—data written to and read from the disk volume is encrypted, and the risk related to physical disk compromise is managed. DEKs are held in the runtime memory of the machine running the database.
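For illustration, here’s a minimal sketch of preparing a LUKS-encrypted volume for a database; the device name /dev/xvdf, the mapper name, and the mount point are assumptions you’d adjust for your host and database engine:

# Hypothetical device and mount point; run as root on the host
cryptsetup luksFormat /dev/xvdf              # initialize the LUKS header and set a passphrase (the KEK)
cryptsetup luksOpen /dev/xvdf dbdata         # open the encrypted volume as /dev/mapper/dbdata
mkfs.ext4 /dev/mapper/dbdata                 # create a filesystem on the mapped device
mount /dev/mapper/dbdata /var/lib/mysql      # mount it where the database stores its files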

With some TDE implementations, you can encrypt tables, rows, columns, and cells with different DEKs to achieve granular separation of duties between operators. Customers can then configure TDE to authorize access to each DEK based on database login credentials and job function, helping to manage risks associated with unauthorized access. However, the most common configuration I’ve seen is to rely on whole database encryption when using TDE. This configuration gives similar protection against the identified risks as dm-crypt with LUKS used without an HSM, since the DEKs and KEKs are stored within the instance in both cases and the result is that the database data on disk is encrypted.

Using encryption to manage data at rest risks in AWS

When you move to AWS, you gain additional security capabilities that can simplify your security implementations. Since the announcement of the AWS Key Management Service (AWS KMS) in 2014, it has been tightly integrated with Amazon Elastic Block Store (Amazon EBS), Amazon Simple Storage Service (Amazon S3), and dozens of other services on AWS. This means that data can be encrypted on disk by selecting a single check box. Furthermore, you get the benefits of AWS KMS for key management and cryptographic operations, while the encryption and decryption remain transparent to the Amazon Elastic Compute Cloud (Amazon EC2) instance where the data is used. For simplicity, the authorization for access to the data is managed entirely by AWS Identity and Access Management (IAM) and AWS KMS key resource policies.
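As a brief sketch of that integration, creating an encrypted EBS volume with a customer managed KMS key from the AWS CLI looks like the following; the Availability Zone, size, volume type, and key alias are placeholders:

aws ec2 create-volume \
  --availability-zone eu-west-1a \
  --size 100 \
  --volume-type gp3 \
  --encrypted \
  --kms-key-id alias/database-ebs-key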

If you need more granular access control to the data, you can use the AWS Encryption SDK to encrypt data at the application layer. That provides the same effect as TDE cell-level protection, backed by a FIPS140-2 Level 2 validated HSM, as might be required by an applicable standard.

If you must use a FIPS140-2 Level 3 validated HSM to meet more stringent compliance standards or regulations, then you can use the Custom Key Store capability of AWS KMS to achieve that—again in a transparent way. This option has a trade-off, as there is additional operational overhead in terms of managing an AWS CloudHSM cluster.
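A hedged sketch of that flow with the AWS CLI follows; the key store name, CloudHSM cluster ID, certificate path, password, key store ID, and description are all placeholders you’d replace with your own values:

aws kms create-custom-key-store \
  --custom-key-store-name ExampleKeyStore \
  --cloud-hsm-cluster-id cluster-1a23b4cdefg \
  --trust-anchor-certificate file://customerCA.crt \
  --key-store-password kmsPswd

# Create a KMS key whose key material is generated and held in the CloudHSM cluster
aws kms create-key \
  --custom-key-store-id cks-1234567890abcdef0 \
  --origin AWS_CLOUDHSM \
  --description "EBS encryption key backed by CloudHSM"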

Many customers choose to migrate their database into the managed Amazon Relational Database Service (Amazon RDS), rather than managing the database instance themselves. Like Amazon EC2, Amazon RDS uses Amazon EBS volumes for its data storage, and so can seamlessly use AWS KMS for encryption at rest. When you do so, your management overhead for the protection of data-at-rest reduces to almost zero. This lets you focus on business value while AWS is responsible for the management of your database and the protection of the underlying data. The next section reviews this option and others in more detail.

You can review the available Amazon RDS database engines and versions via the Amazon RDS User Guide documentation, or by running the following AWS Command Line Interface (AWS CLI) command:

aws rds describe-db-engine-versions --query "DBEngineVersions[].DBEngineVersionDescription" --region <regionIdentifier>

Recommended Solutions

If you’re moving an existing database to AWS, you have the following solutions for data at rest encryption. I go into more detail for each option below.

Table 1 – Encryption options

Option | Database management | Host       | Encryption | Key management
1      | Amazon managed      | Amazon RDS | Amazon EBS | AWS KMS
2      | Amazon managed      | Amazon RDS | Amazon EBS | AWS KMS Custom Key Store
3      | Customer managed    | Amazon EC2 | Amazon EBS | AWS KMS
4      | Customer managed    | Amazon EC2 | Amazon EBS | AWS KMS Custom Key Store
5      | Customer managed    | Amazon EC2 | Amazon EBS | LUKS
6      | Customer managed    | Amazon EC2 | Database   | Database TDE
7      | Customer managed    | Amazon EC2 | Database   | CloudHSM

Option 1 – Using Amazon RDS with Amazon EBS encryption and key management provided by AWS KMS

This approach uses the Amazon RDS service where AWS manages the operating system and database engine. You can configure this service to be a highly scalable resource spanning multiple Availability Zones within an AWS Region to provide resiliency. AWS KMS manages the keys that are used to encrypt the attached Amazon EBS volumes at rest.

Note: This configuration is recommended as your default database encryption approach.
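A minimal AWS CLI sketch of provisioning this configuration follows; the instance identifier, class, engine, credentials, and key alias are placeholders rather than recommendations:

aws rds create-db-instance \
  --db-instance-identifier example-encrypted-db \
  --db-instance-class db.m5.large \
  --engine postgres \
  --allocated-storage 100 \
  --master-username dbadmin \
  --master-user-password 'ExamplePassword123!' \
  --storage-encrypted \
  --kms-key-id alias/database-rds-key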

Benefits

  • No key management requirement on host; key management is automated and performed by AWS KMS
  • Meets FIPS140-2 Level 2 validation requirements
  • Simple vertical and horizontal scalability
  • Snapshots for recovery are encrypted automatically
  • AWS manages the patching, maintenance, and configuration of the operating system and database engine
  • Well-recognized configuration, with support offered through AWS Support
  • AWS KMS costs are comparatively low

Challenges

  • Dependent on Amazon RDS supported engines and versions
  • Might require additional controls to manage unauthorized access at table, row, column, or cell level

Option 2 – Using Amazon RDS with Amazon EBS encryption and key management provided by AWS KMS custom key store

This approach uses the Amazon RDS service where AWS manages the operating system and database engine. You can configure this service to be a highly scalable resource spanning multiple Availability Zones within a Region to provide resiliency. CloudHSM keys are used via AWS KMS service integration to encrypt the Amazon EBS volumes at rest.

Note: This configuration is recommended where FIPS140-2 Level 3 validation is a specified compliance requirement.

Benefits

  • No key management requirement on host; key management is performed by AWS KMS
  • Meets FIPS140-2 Level 3 validation requirements
  • Simple vertical and horizontal scalability
  • Snapshots for recovery are encrypted automatically
  • AWS manages the patching, maintenance, and configuration of the database engine
  • Well-recognized configuration with support offered through AWS Support

Challenges

  • Dependent on Amazon RDS supported engines and versions
  • You are responsible for provisioning, configuration, scaling, maintenance, and costs of running CloudHSM cluster
  • Might require additional controls to manage unauthorized access at table, row, column, or cell level

Option 3 – Customer-managed database platform hosted on Amazon EC2 with Amazon EBS encryption and key management provided by KMS

In this approach, the key difference is that you’re responsible for managing the EC2 instances, operating systems, and database engines. You can still configure your databases to be highly scalable resources spanning multiple Availability Zones within a Region to provide resiliency, but it takes more effort. AWS KMS manages the keys that are used to encrypt the attached Amazon EBS volumes at rest.

Note: This configuration is recommended when Amazon RDS doesn’t support the desired database engine type or version.

Benefits

  • A 1:1 relationship for migration of database engine configuration
  • Key rotation and management is handled transparently by AWS
  • Data encryption keys are managed by the hypervisor, not by your EC2 instance
  • AWS KMS costs are comparatively low

Challenges

  • You’re responsible for patching and updates of the database engine and OS
  • Might require additional controls to manage unauthorized access at table, row, column, or cell level

Option 4 – Customer-managed database platform hosted on Amazon EC2 with Amazon EBS encryption and key management provided by KMS custom key store

In this approach, you are again responsible for managing the EC2 instances, operating systems, and database engines. You can still configure your databases to be highly scalable resources spanning multiple Availability Zones within a Region to provide resiliency, but it takes more effort. And similar to Option 2, CloudHSM keys are used via AWS KMS service integration to encrypt the Amazon EBS volumes at rest.

Note: This configuration is recommended when Amazon RDS doesn’t support the desired database engine type or version and when FIPS140-2 Level 3 compliance is required.

Benefits

  • A 1:1 relationship for migration of database engine configuration
  • Data encryption keys managed by the hypervisor, not by your EC2 instance
  • Keys managed by FIPS140-2 Level 3 validated HSM

Challenges

  • You’re responsible for provisioning, configuration, scaling, maintenance, and costs of running CloudHSM cluster
  • You’re responsible for patching and updates of the database engine and OS
  • Might require additional controls to manage unauthorized access at table, row, column, or cell level

Option 5 – Customer-managed database platform hosted on Amazon EC2 with Amazon EBS encryption and key management provided by LUKS

In this approach, you’re still responsible for managing the EC2 instances, operating systems, and database engines. You also need to install LUKS onto the Linux instance to manage the encryption of data on Amazon EBS.

Benefits

  • A 1:1 relationship for migration of database engine configuration
  • Transparent encryption is managed by OS with LUKS

Challenges

  • You’re responsible for patching and updates of the database engine and OS
  • Data encryption keys are managed directly on the EC2 instance, and not a dedicated key management system
  • Scaling must be vertical, which is slow and costly
  • LUKS is open-source software, so support relies on the community rather than a commercial vendor agreement
  • Support for backup and recovery is LUKS specific and requires additional consideration
  • Might require additional controls to manage unauthorized access at table, row, column, or cell level

Note: This approach limits you to only Linux instances and requires the most technical knowledge and effort on your part. Options such as BitLocker and SQL Server Always Encrypted exist for Windows hosts, and the complexity and challenges are similar to those of LUKS.

Option 6 – Customer-managed database platform hosted on Amazon EC2 with database encryption and key management provided by TDE

In this approach, you’re still responsible for managing the EC2 instances, operating systems, and database engines. However, instead of encrypting the Amazon EBS volume where the database is stored, you use TDE wallet keys managed by the database engine to encrypt and decrypt records as they are stored and retrieved.

Benefits

  • A 1:1 relationship for migration of database engine configuration
  • Table, row, column, and cell level encryption are managed by TDE, reducing endpoint risks relating to unauthorized access

Challenges

  • You’re responsible for patching and updates of the database engine and OS
  • Costly license for TDE feature
  • Data encryption keys are managed directly on the EC2 instance
  • Scaling is dependent on TDE functionality and Amazon EC2 scaling
  • Support is split between AWS and a third-party database vendor
  • Cannot share snapshots

Note: This approach is not available with Amazon RDS.

Option 7 – Customer-managed database platform hosted on Amazon EC2 with database encryption performed by TDE and key management provided by CloudHSM

In this approach, you’re still responsible for managing the EC2 instances, operating systems, and database engines. However, instead of encrypting the Amazon EBS volume where the database is stored, you use TDE wallet keys managed by a CloudHSM cluster to encrypt and decrypt records as they are stored and retrieved.

Benefits

  • A 1:1 relationship for migration of database engine configuration
  • Wallet keys (KEK) are managed by a FIPS140-2 Level 3 validated HSM
  • Table, row, column, and cell level encryption are managed by TDE, reducing endpoint risks relating to unauthorized access

Challenges

  • You’re responsible for patching and updates of the database engine and OS
  • Costly license for TDE feature
  • You are responsible for provisioning, configuration, scaling, maintenance, and costs of running CloudHSM cluster
  • Integration and support of CloudHSM with TDE might vary
  • Scaling is dependent on TDE functionality, Amazon EC2 scaling, and the CloudHSM cluster
  • Data encryption keys are managed on EC2 instance
  • Support is split between AWS and a third-party database vendor
  • Cannot share snapshots

Note: This approach is not available with Amazon RDS.

Summary

While you can operate in AWS much as you do in your on-premises environment, the preceding configurations and recommendations show how you can significantly reduce your challenges and increase your benefits by using cloud-native security services like AWS KMS, Amazon RDS, and CloudHSM. Specifically, using Amazon RDS with Amazon EBS volumes encrypted by AWS KMS provides a highly scalable, resilient, and secure way to manage your keys in AWS.

While there might be some architectural redesign and configuration work needed to move an on-premises database into Amazon RDS, you can leverage AWS services to help you meet your compliance requirements with less effort. By offloading the OS and database maintenance responsibility to AWS, you simultaneously reduce operational friction and increase security. By migrating this way, you can benefit from the scalability and resilience of the AWS global infrastructure and expertise. Lastly, to get started with migrating your database to AWS, I encourage you to use the AWS Database Migration Service.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Jonathan Jenkyn

Jonathan is a Senior Security Growth Strategies Consultant with AWS Professional Services. He’s an active member of the People with Disabilities affinity group, and has built several Amazon initiatives supporting charities and social responsibility causes. Since 1998, he has been involved in IT Security at many levels, from implementation of cryptographic primitives to managing enterprise security governance. Outside of work, he enjoys running, cycling, fund-raising for the BHF and Ipswich Hospital Charity, and spending time with his wife and 5 children.

Author

Scott Conklin

Scott is a Senior Security Consultant with AWS Professional Services (Global Specialty Practice). Based out of Chicago with 4 years tenure, he is an avid distance runner, crypto nerd, lover of unicorns, and enjoys camping, nature, playing Minecraft with his 3 kids, and binge watching Amazon Prime with his wife.

How to use trust policies with IAM roles

Post Syndicated from Jonathan Jenkyn original https://aws.amazon.com/blogs/security/how-to-use-trust-policies-with-iam-roles/

AWS Identity and Access Management (IAM) roles are a significant component in the way customers operate in Amazon Web Services (AWS). In this post, I’ll dive into the details on how cloud security architects and account administrators can protect IAM roles from misuse by using trust policies. By the end of this post, you’ll know how to use IAM roles to build trust policies that work at scale, providing guardrails to control access to resources in your organization.

In general, there are four different scenarios where you might use IAM roles in AWS:

  • One AWS service accesses another AWS service – When an AWS service needs access to other AWS services or functions, you can create a role that will grant that access.
  • One AWS account accesses another AWS account – This use case is commonly referred to as a cross-account role pattern. This allows human or machine IAM principals from other AWS accounts to assume this role and act on resources in this account.
  • A third-party web identity needs access – This use case allows users with identities in third-party systems like Google and Facebook, or Amazon Cognito, to use a role to access resources in the account.
  • Authentication using SAML 2.0 federation – This is commonly used by enterprises with Active Directory that want to connect using an IAM role so that their users can use single sign-on workflows to access AWS accounts.

In all cases, the makeup of an IAM role is the same as that of an IAM user and is only differentiated by the following qualities:

  • An IAM role does not have long term credentials associated with it; rather, a principal (an IAM user, machine, or other authenticated identity) assumes the IAM role and inherits the permissions assigned to that role.
  • The tokens issued when a principal assumes an IAM role are temporary. Their expiration reduces the risks associated with credentials leaking and being reused.
  • An IAM role has a trust policy that defines which conditions must be met to allow other principals to assume it. This trust policy reduces the risks associated with privilege escalation.

Recommendation: You should make extensive use of IAM roles, which provide temporary credentials, rather than long-term credentials such as IAM user access keys. For more information, see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp.html

While the list of users having access to your AWS accounts can change over time, the roles used to manage your AWS account probably won’t. The use of IAM roles essentially decouples your enterprise identity system (SAML 2.0) from your permission system (AWS IAM policies), simplifying management of each.

Managing access to IAM roles

Let’s dive into how you can create relationships between your enterprise identity system and your permissions system by looking at the policy types you can apply to an IAM role.

An IAM role has three places where it uses policies:

  • Permission policies (inline and attached) – These policies define the permissions that a principal assuming the role is able (or restricted) to perform, and on which resources.
  • Permissions boundary – A permissions boundary is an advanced feature for using a managed policy to set the maximum permissions that an identity-based policy can grant to an IAM entity. An entity’s permissions boundary allows it to perform only the actions that are allowed by both its identity-based permission policies and its permissions boundaries.
  • Trust relationship – This policy defines which principals can assume the role, and under which conditions. This is sometimes referred to as a resource-based policy for the IAM role. We’ll refer to this policy simply as the ‘trust policy’.

A role can be assumed by a human user or a machine principal, such as an Amazon Elastic Compute Cloud (Amazon EC2) instance or an AWS Lambda function. Over the rest of this post, you’ll see how you can tighten the conditions under which principals are allowed to use roles by configuring their trust policies.

An example of a simple trust policy

A common use case is when you need to provide security audit access to your account, allowing a third party to review the configuration of that account. After attaching the relevant permission policies to an IAM role, you need to add a cross-account trust policy to allow the third-party auditor to make the sts:AssumeRole API call to elevate their access in the audited account. The following trust policy shows an example policy created through the AWS Management Console:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {}
    }
  ]
}

As you can see, it has the same structure as other IAM policies with Effect, Action, and Condition components. It also has the Principal parameter, but no Resource attribute. This is because the resource, in the context of the trust policy, is the IAM role itself. For the same reason, the Action parameter will only ever be set to one of the following values: sts:AssumeRole, sts:AssumeRoleWithSAML, or sts:AssumeRoleWithWebIdentity.

Note: The suffix root in the policy’s Principal attribute equates to “authenticated and authorized principals in the account,” not the special and all-powerful root user principal that is created when an AWS account is created.
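To attach a trust policy like the one above to a new role, you can use the AWS CLI; in this sketch the role name and the trust-policy.json file path are placeholders:

# Create the role with the trust policy shown above saved as trust-policy.json
aws iam create-role \
  --role-name SecurityAuditAccess \
  --assume-role-policy-document file://trust-policy.json

# Apply later changes to the trust policy of an existing role
aws iam update-assume-role-policy \
  --role-name SecurityAuditAccess \
  --policy-document file://trust-policy.json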

Using the Principal attribute to reduce scope

In a trust policy, the Principal attribute indicates which other principals can assume the IAM role. In the example above, 111122223333 represents the AWS account number for the auditor’s AWS account. In effect, this allows any principal in the 111122223333 AWS account with sts:AssumeRole permissions to assume this role.

To restrict access to a specific IAM user account, you can define the trust policy like the following example, which would allow only the IAM user LiJuan in the 111122223333 account to assume this role. LiJuan would also need to have sts:AssumeRole permissions attached to their IAM user for this to work:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:user/LiJuan"
      },
      "Action": "sts:AssumeRole",
      "Condition": {}
    }
  ]
}

The principals set in the Principal attribute can be any principal defined by the IAM documentation, and can refer to an AWS or a federated principal. You cannot use a wildcard (“*” or “?”) within a Principal for a trust policy, other than one special case, which I’ll come back to in a moment. You must define precisely which principal you are referring to, because a translation occurs when you submit your trust policy that ties it to each principal’s hidden principal ID, and that translation can’t happen if there are wildcards in the principal.

The only scenario where you can use a wildcard in the Principal parameter is where the parameter value is only the “*” wildcard. Use of the global wildcard “*” for the Principal isn’t recommended unless you have clearly defined Conditional attributes in the policy statement to restrict use of the IAM role, since doing so without Conditional attributes permits assumption of the role by any principal in any AWS account, regardless of who that is.

Using identity federation on AWS

Federated users from SAML 2.0 compliant enterprise identity services are given permissions to access AWS accounts through the use of IAM roles. While the user-to-role configuration of this connection is established within the SAML 2.0 identity provider, you should also put controls in the trust policy in IAM to reduce any abuse.

Because the Principal attribute contains configuration information about the SAML mapping (in the case of Active Directory), you need to use the Condition attribute in the trust policy to restrict use of the role from the AWS account management perspective. This can be done by restricting the SourceIp address, as demonstrated later, or by using one or more of the SAML-specific Condition keys available. My recommendation is to reduce the set of principals that can use the role to the smallest set that is practical. This is best achieved by adding qualifiers into the Condition attribute of your trust policy.

There’s a very good guide on creating roles for SAML 2.0 federation that contains a basic example trust policy you can use.

Using the Condition attribute in a trust policy to reduce scope

The Condition statement in your trust policy sets additional requirements for the Principal trying to assume the role. If you don’t set a Condition attribute, the IAM engine will rely solely on the Principal attribute of this policy to authorize role assumption. Given that it isn’t possible to use wildcards within the Principal attribute, the Condition attribute is a really flexible way to reduce the set of users that are able to assume the role without necessarily specifying the principals.

Limiting role use based on an identifier

Occasionally teams managing multiple roles can become confused as to which role achieves what and can inadvertently assume the wrong role. This is referred to as the Confused Deputy problem. This next section shows you a way to quickly reduce this risk.

The following trust policy requires that principals from the 111122223333 AWS account have provided a special phrase when making their request to assume the role. Adding this condition reduces the risk that someone from the 111122223333 account will assume this role by mistake. This phrase is configured by specifying an ExternalID conditional context key.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "ExampleSpecialPhrase"
        }
      }
    }
  ]
}

In the example trust policy above, the value ExampleSpecialPhrase isn’t a secret or a password. Adding the ExternalID condition effectively blocks the role from being assumed through the console, because the only way to supply the ExternalID argument in the role assumption call is to use the AWS Command Line Interface (AWS CLI) or a programming interface. Having this condition doesn’t prevent a user who knows about this relationship and the ExternalID from assuming what might be a privileged set of permissions, but it does help manage risks like the Confused Deputy problem. I see customers using an ExternalID that matches the name of the AWS account, which works to ensure that an operator is working on the account they believe they’re working on.
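For example, the auditor’s call might look like the following sketch; the role ARN (including the audited account number 444455556666) and session name are placeholders, while the external ID matches the trust policy above:

aws sts assume-role \
  --role-arn arn:aws:iam::444455556666:role/SecurityAuditAccess \
  --role-session-name audit-session \
  --external-id ExampleSpecialPhrase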

Limiting role use based on multi-factor authentication

By using the Condition attribute, you can also require that the principal assuming this role has passed a multi-factor authentication (MFA) check before they’re permitted to use this role. This again limits the risk associated with mistaken use of the role and adds some assurances about the principal’s identity.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "BoolIfExists": { 
          "aws:MultiFactorAuthPresent" : "true" 
		}
      }
    }
  ]
}

In the example trust policy above, I also introduced the MultiFactorAuthPresent conditional context key. Per the AWS global condition context keys documentation, the MultiFactorAuthPresent conditional context key does not apply to sts:AssumeRole requests in the following contexts:

  • When using access keys in the CLI or with the API
  • When using temporary credentials without MFA
  • When a user signs in to the AWS Console
  • When services (like AWS CloudFormation or Amazon Athena) reuse session credentials to call other APIs
  • When authentication has taken place via federation

In the example above, the use of the BoolIfExists qualifier to the MultiFactorAuthPresent conditional context key evaluates the condition as true if:

  • The principal type can have an MFA attached, and does.
    or
  • The principal type cannot have an MFA attached.

This is a subtle difference but makes the use of this conditional key in trust policies much more flexible across all principal types.

Limiting role use based on time

During activities like security audits, it’s quite common for the activity to be time-bound and temporary. There’s a risk that the IAM role could be assumed even after the audit activity concludes, which might be undesirable. You can manage this risk by adding a time condition to the Condition attribute of the trust policy. This means that rather than being concerned with disabling the IAM role created immediately following the activity, customers can build the date restriction into the trust policy. You can do this by using policy attribute statements, like so:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "DateGreaterThan": {
          "aws:CurrentTime": "2020-09-01T12:00:00Z"
        },
        "DateLessThan": {
          "aws:CurrentTime": "2020-09-07T12:00:00Z"
        }
      }
    }
  ]
}

Limiting role use based on IP addresses or CIDR ranges

If the auditor for a security audit is using a known fixed IP address, you can build that information into the trust policy, further reducing the opportunity for the role to be assumed by unauthorized actors calling sts:AssumeRole from another IP address or CIDR range:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": "203.0.113.0/24"
        }
      }
    }
  ]
}

Limiting role use based on tags

IAM tagging capabilities can also help you build flexible and adaptive trust policies that create an attribute-based access control (ABAC) model for IAM management. You can build trust policies that only permit principals that have already been tagged with a specific key and value to assume a specific role. The following example requires that IAM principals in the AWS account 111122223333 be tagged with department = OperationsTeam for them to assume the IAM role.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalTag/department": "OperationsTeam"
        }
      }
    }
  ]
}

If you want to create this effect, I highly recommend using the PrincipalTag pattern above. However, you must also be cautious about which principals are then given iam:TagUser, iam:TagRole, iam:UntagUser, and iam:UntagRole permissions, perhaps even using the aws:PrincipalTag condition within the permissions boundary policy to restrict their ability to retag their own IAM principal or that of another IAM role they can assume.
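Tagging the principals themselves is done with the IAM tagging APIs; here’s a brief sketch in which the user and role names are placeholders:

# Tag an IAM user so it satisfies the aws:PrincipalTag condition above
aws iam tag-user \
  --user-name LiJuan \
  --tags Key=department,Value=OperationsTeam

# The equivalent for an IAM role principal
aws iam tag-role \
  --role-name OperationsAutomation \
  --tags Key=department,Value=OperationsTeam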

Limiting or extending access to a role based on AWS Organizations

Since its announcement in 2016, almost every enterprise customer I work with uses AWS Organizations. This AWS service allows customers to create an organizational structure for their accounts by creating hard boundaries to manage blast-radius risks, among other advantages. You can use the PrincipalOrgID condition to limit assumption of an organization-wide core IAM role.

Caution: As you’ll see in the example below, you need to set the Principal attribute to “*” to do this, which would, without the conditional restriction, allow all role assumption requests to be accepted for this role, irrespective of the source of that assumption request. For that reason, be especially careful about the use of this pattern.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalOrgID": "o-abcd12efg1"
        }
      }
    }
  ]
}

It isn’t practical to write out all the AWS account identifiers into a trust policy, and because of the way policies like this are evaluated, you can’t include wildcard characters for the account number in the principal’s account number field. The use of the PrincipalOrgID global condition context key provides us with a neat and dynamic mechanism to create a short policy statement.
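If you don’t have your organization ID to hand, you can retrieve it with a call like the following (assuming your principal has organizations:DescribeOrganization permissions):

aws organizations describe-organization \
  --query Organization.Id \
  --output text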

Role chaining

There are instances where a third party might themselves be using IAM roles, or where an AWS service resource that has already assumed a role needs to assume another role (perhaps in another account), and customers might need to allow only specific IAM roles in that remote account to assume the IAM role you create in your account. You can use role chaining to build permitted role escalation routes using role assumption from within the same account or AWS organization, or from third-party AWS accounts.

Consider the following trust policy example where I use a combination of the Principal attribute to scope down to an AWS account, and the aws:UserId global conditional context key to scope down to a specific role using its RoleId. To capture the RoleId for the role you want to be able to assume, you can run the following command using the AWS CLI:


# aws iam get-role --role-name CrossAccountAuditor
{
    "Role": {
        "Path": "/",
        "RoleName": "CrossAccountAuditor",
        "RoleId": "ARO1234567123456D",
        "Arn": "arn:aws:iam:: 111122223333:role/CrossAccountAuditor",
        "CreateDate": "2017-08-31T14:24:20+00:00",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "",
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": "arn:aws:iam::111122223333:root"
                    },
                    "Action": "sts:AssumeRole",
                    "Condition": {
                        "StringEquals": {
                            "sts:ExternalId": "ExampleSpecialPhrase"
                        }
                    }
                }
            ]
        }
    }
}

Here is the example trust policy that limits to only the CrossAccountAuditor role from AWS Account 111122223333.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringLike": {
          "aws:userId": "ARO1234567123456D:*"
        }
      }
    }
  ]
}

If you’re using an IAM user and have assumed the CrossAccountAuditor IAM role, the policy above will work through the AWS CLI with a call to aws sts assume-role and through the console.
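To illustrate the chained call path from the CLI, here’s a hedged sketch; the second role name and account number (444455556666) are placeholders, while the first hop matches the CrossAccountAuditor example above:

# First hop: assume CrossAccountAuditor in account 111122223333
creds=$(aws sts assume-role \
  --role-arn arn:aws:iam::111122223333:role/CrossAccountAuditor \
  --role-session-name first-hop \
  --external-id ExampleSpecialPhrase \
  --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
  --output text)
export AWS_ACCESS_KEY_ID=$(echo "$creds" | cut -f1)
export AWS_SECRET_ACCESS_KEY=$(echo "$creds" | cut -f2)
export AWS_SESSION_TOKEN=$(echo "$creds" | cut -f3)

# Second hop: the aws:userId condition above now matches "ARO1234567123456D:first-hop"
aws sts assume-role \
  --role-arn arn:aws:iam::444455556666:role/ChainedTargetRole \
  --role-session-name second-hop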

This type of trust policy also works for services like Amazon EC2, allowing those instances using their assigned instance profile role to assume a role in another account to perform actions. We’ll touch on this use case later in the post.

Putting it all together

AWS customers can use combinations of all the above Principal and Condition attributes to hone the trust they’re extending out to any third party, or even within their own organization. They might create an accumulated trust policy for an IAM role which achieves the following effect:

Allows only a user named PauloSantos, in AWS account number 111122223333, to assume the role if they have also authenticated with an MFA, are logging in from an IP address in the 203.0.113.0/24 CIDR range, and the date is between noon of September 1, 2020, and noon of September 7, 2020.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::111122223333:user/PauloSantos"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "BoolIfExists": {
          "aws:MultiFactorAuthPresent": "true"
        },
        "IpAddress": {
          "aws:SourceIp": "203.0.113.0/24"
        },
        "DateGreaterThan": {
          "aws:CurrentTime": "2020-09-01T12:00:00Z"
        },
        "DateLessThan": {
          "aws:CurrentTime": "2020-09-07T12:00:00Z"
        }
      }
    }
  ]
}

I’ve seen customers use this to create IAM users who have no permissions attached besides sts:AssumeRole. Trust relationships are then configured between the IAM users and the IAM roles, creating ultimate flexibility in defining who has access to what roles without needing to update the IAM user identity pool at all.

A word on Effect: Deny and NotPrincipal in IAM role trust policies

I have seen some customers make use of an “Effect”: “Deny” clause in their trust policies. This pattern can help manage a wildcard statement in another “Effect”: “Allow” clause of the same trust policy. However, this isn’t the best approach for most scenarios. You will typically be able to define each principal in your policy as being allowed access. An example of where this might not be true is where you have a clause that uses the global wildcard “*” as a principal, in which case it will be necessary to add Deny statements to further filter the access.

Putting a wildcard into the Principal attribute of an Allow policy statement, particularly in relation to trust policies, can be dangerous if you haven’t done a robust job of managing the Condition attribute in the same statement. Be as specific as possible in your Allow statement, and use Principal attributes first, rather than relying on Deny statements to manage potential security gaps created by your use of wildcards.

The following trust policy allows all IAM principals within the o-abcd12efg1 organization to assume the IAM role, but only if it’s before September 7, 2020:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalOrgID": "o-abcd12efg1"
        }
      }
    },
    {
      "Effect": "Deny",
      "Principal": {
        "AWS": "*"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "DateGreaterThan": {
          "aws:CurrentTime": "2020-09-07T12:00:00Z"
        }
      }
    }
  ]
}

The use of NotPrincipal in trust policies

You can also build into your trust policies a NotPrincipal condition. Again, this is rarely the best choice, because you can introduce unnecessary complexity and confusion into your policies. Instead, you can avoid that problem by using fairly simple and prescriptive Principal statements.

Statements with NotPrincipal can also use a Deny statement, which can create quite baffling policy logic that, if misunderstood, could create unintended opportunities for misuse or abuse.

Here’s an example where you might think to use Deny and NotPrincipal in a trust policy. Notice that it has the same effect as allowing arn:aws:iam::111122223333:role/CoreAccess in a single Allow statement. In general, Deny with NotPrincipal statements in trust policies create unnecessary complexity, and should be avoided.


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Deny",
      "NotPrincipal": {
        "AWS": "arn:aws:iam::111122223333:role/CoreAccess"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Remember, your Principal attribute should be very specific, to reduce the set of those able to assume the role, and an IAM role trust policy won’t permit access if a corresponding Allow statement isn’t explicitly present in the trust policy. It’s better to rely on the default deny policy evaluation logic where you’re able, rather than introducing unnecessary complexity into your policy logic.

Creating trust policies for AWS services that assume roles

There are two types of contexts where AWS services need access to IAM roles to function:

  1. Resources managed by an AWS service (such as Amazon EC2 instances or Lambda functions) need access to an IAM role to execute functions on other AWS resources, and need permissions to do so.
  2. An AWS service that abstracts its functionality from other AWS services, like Amazon Elastic Container Service (Amazon ECS) or Amazon Lex, needs access to execute functions on AWS resources. These are called service-linked roles and are a special case that’s out of the scope of this post.

In both contexts, you have the service itself as an actor. The service assumes your IAM role so it can provide your credentials to your Lambda function (the first context) or use those credentials to do things (the second context). In the same way that IAM roles provide human operators with an escalation mechanism for specific functions, as in the examples above, AWS resources such as Lambda functions, Amazon EC2 instances, and even AWS CloudFormation require the same mechanism. You can find more information about how to create IAM roles for AWS services here.

An IAM role for a human operator and for an AWS service are exactly the same, even though they have a different principal defined in the trust policy. The policy’s Principal will define the AWS service that is permitted to assume the role for its function.

Here’s an example trust policy for a role designed for an Amazon EC2 instance to assume. You can see that the principal provided is the ec2.amazonaws.com service:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
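As a sketch of how a role with this trust policy typically reaches an instance, you wrap it in an instance profile and associate that profile; the role name, profile name, file path, and instance ID below are placeholders:

# Create the role using the EC2 service trust policy above (saved as ec2-trust-policy.json)
aws iam create-role \
  --role-name WebServerRole \
  --assume-role-policy-document file://ec2-trust-policy.json

# Wrap the role in an instance profile and attach it to a running instance
aws iam create-instance-profile \
  --instance-profile-name WebServerProfile
aws iam add-role-to-instance-profile \
  --instance-profile-name WebServerProfile \
  --role-name WebServerRole
aws ec2 associate-iam-instance-profile \
  --instance-id i-0123456789abcdef0 \
  --iam-instance-profile Name=WebServerProfile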

Every configuration of an AWS resource should be passed a specific role unique to its function. So, if you have two Amazon EC2 launch configurations, you should design two separate IAM roles, even if the permissions they require are currently the same. This allows each configuration to grow or shrink the permissions it requires over time, without needing to reattach IAM roles to configurations, which might create a privilege escalation risk. Instead, you update the permissions attached to each IAM role independently, knowing that it will only be used by that one service resource. This helps reduce the potential impact of risks. Automating your management of roles will help here, too.

Several customers have asked if it’s possible to design a trust policy for an IAM role such that it can only be passed to a specific Amazon EC2 instance. This isn’t directly possible. You cannot place the Amazon Resource Name (ARN) for an EC2 instance into the Principal of a trust policy, nor can you use tag-based condition statements in the trust policy to limit the ability for the role to be used by a specific resource.

The only option is to manage access to the iam:PassRole action within the permission policy for those IAM principals you expect to be attaching IAM roles to AWS resources. This special Action is evaluated when a principal tries to attach another IAM role to an AWS service or AWS resource.

You should use restrictions on access to the iam:PassRole action with permission policies and permission boundaries. This means that the ability to attach roles to instance profiles for Amazon EC2 is limited, rather than using the trust policy on the role assumed by the EC2 instance to achieve this. This approach makes it much easier to manage scaling for both those principals attaching roles to EC2 instances, and the instances themselves.

For example, the following permission policy allows the principals it’s attached to, to get and pass only those roles whose names begin with EC2-Webserver-:


{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "iam:GetRole",
            "iam:PassRole"
        ],
        "Resource": "arn:aws:iam::111122223333:role/EC2-Webserver-*"
    }]
}

Conclusion

You now have all the tools you need to build robust and effective trust policies that work at scale, providing guardrails for your users and those who might want to access resources in your account from outside your organization.

Policy logic isn’t always simple, and I encourage you to use sandbox accounts to try out your ideas. In general, simplicity should win over cleverness. IAM policies and statements that might well be frugal in their use of policy language might also be difficult to read, interpret, and update by other IAM administrators in the future. Keeping your trust policies simple helps to build IAM relationships everyone understands and can manage, and use, effectively.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS IAM forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Jonathan Jenkyn

Jonathan is a Senior Security Growth Strategies Consultant with AWS Professional Services. He’s an active member of the People with Disabilities affinity group, and has built several Amazon initiatives supporting charities and social responsibility causes. Since 1998, he has been involved in IT Security at many levels, from implementation of cryptographic primitives to managing enterprise security governance. Outside of work, he enjoys running, cycling, fund-raising for the BHF and Ipswich Hospital Charity, and spending time with his wife and 5 children.

How to manage security governance using DevOps methodologies

Post Syndicated from Jonathan Jenkyn original https://aws.amazon.com/blogs/security/how-to-manage-security-governance-using-devops-methodologies/

I’ve conducted more security audits and reviews than I can comfortably count, and I’ve found that these reviews can be surprisingly open to interpretation (as much as they try not to be). Many companies use spreadsheets to explain and limit business risks, with an annual review to confirm the continued suitability of their controls. However, multiple business stakeholders often influence the master security control set, which can result in challenges like security control definitions being repeated with different wording, or being inconsistently scoped. Reviewing these spreadsheets is not especially fun for anyone.

I believe it’s possible for businesses to not only define their security controls in a less ambiguous way, but also to automate security audits, allowing for more rapid innovation. The approach I’ll demonstrate in this post isn’t a silver bullet, but it’s a method by which you can control some of that inevitable shift in threat evaluations resulting from changes in business and technical operations, such as vulnerability announcements, feature updates, or new requirements.

My solution comes in two parts and borrows some foundational methodologies from DevOps culture. If you’re not familiar with DevOps, you can read more about it here. AWS defines DevOps as “the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes.” Sounds pretty good, right?

User story definitions of security controls

The process of developing security controls should start with a threat modeling exercise. There are some great tools out there that can help you develop very rich threat models for your solutions. I have a personal preference for using STRIDE with my customers, as it’s very widely accepted and has a low barrier to use, but you might also try PASTA, DREAD, VAST, Trike, or OCTAVE. All of these tools result in a risk register being published. A risk register is a prioritized list of risks to your business (or some component of your business or solution), their likelihood of being realized, and their impact if they were to be realized. Combining these factors results in a risk score. You’ll walk through the various mechanisms for those risks to be addressed based on their likelihood or impact. The mechanisms can be directive, preventative, detective or responsive controls; collectively, these are your security controls. (If you want to learn more about the difference between control types, check out the whitepaper AWS Cloud Adoption Framework: Security Perspective.)
 

Figure 1: The AWS Cloud Adoption Framework Security Perspective

Security controls should be carefully worded to avoid ambiguity. Each control typically takes the form of a single statement that requires some action or configuration. The action or configuration results in the documented risk either being mitigated in full or else leaving some residual risk, which can be managed further with other security controls as required by your business’s risk tolerance.

However, the end result can feel like a children’s game of “Telephone” in that an implemented control doesn’t always relate closely to the originally envisioned threat. Consider the following security control definition:

  • Only approved AMIs are allowed to be used.

On the surface, this looks like an easy preventative control to implement, but it immediately raises multiple questions, including:

  • Who approves AMIs, and how are they approved?
  • How can users get AMIs to use?
  • What constitutes “use”? Starting? Connecting to?

This is where DevOps comes in. Many DevOps practices use the notion of a “user story” to help define the requirements for solutions. A user story is simply a syntax for defining a requirement. In other words: As a <user>, I want to <requirement>, so that <outcome>. If you use the same approach to define your security controls, you’ll notice that a lot, if not all, of the ambiguity fades:


As a Security Operations Manager,
I want only images tagged as being hardened by the Security Operations Team to be permitted to start in a VPC
So that I can be assured that the solution is not vulnerable to common attack vectors.

Boom! Now the engineer trying to implement the security control has a better understanding of the intention behind the control, and thus a better idea of how to implement it, and test it.

Documenting these controls for your security stakeholders (legal, governance, CISO, and so on) in an accessible, agile project management tool rather than in a spreadsheet is also a good idea. While spreadsheets are a very common method of documentation, a project management tool makes it easier for you to update your controls, ensuring that they keep pace with your company’s innovations. There are many agile project management suites that can assist you here. I’ve used Jira by Atlassian with most of my customers, but there are a few other tools that achieve similar outcomes: Agilean, Wrike, Trello, and Asana, to name a few.

Continuous integration and evaluation of security controls

Once you’ve written your security control as a user story, you can borrow from DevOps again and write some acceptance criteria. This is done through a process that’s very similar to creating a threat model in the first place. You’ll create a scenario and then define actions for actors plus expected outcomes. The syntax starts by defining the scenario being tested, and then uses “Given that <conditions of the test> When <test action> Then <expected outcome>.” For example:


Scenario: User is starting an instance in a VPC

“Given that I am logged in to the AWS Console
and I have permissions to start an instance in a VPC
When I try to start an instance
and the AMI is not tagged as hardened
Then it is denied” [Preventative]

“Given that I am logged into the AWS console
and I have permission to start an instance in the VPC
When I try to start an instance 
and the AMI is not tagged as hardened
Then an email is sent to the Security Operations Team.” [Responsive]

“Given that I am logged into the AWS Console
and I have permission to start an instance in the VPC
When I try to start an instance that is tagged as hardened
Then the instance starts.” [Allowed]

After you write multiple action statements and scenarios supporting the user story (both positive and negative), you can write them up as a runbook, an AWS Config rule, or a combination of both as required.

The second example acceptance criteria above would need to be written as a runbook, as it’s a responsive control. You wouldn’t want to generate a stream of emails to your security operations manager to validate that it’s working.

The other two examples could be written as AWS Config rules using a call to the iam:simulate-custom-policy API, since they relate to preventative controls. An AWS Config rule allows your entire account to be continuously evaluated for compliance, essentially checking your control adherence on a sub-15-minute basis rather than through a yearly audit.
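For the “approved AMIs” control specifically, one hedged alternative is to express the outcome with the managed APPROVED_AMIS_BY_TAG rule rather than a custom iam:simulate-custom-policy evaluation; the rule name, description, and the “hardened” tag key below are assumptions you’d adapt to your own tagging scheme:

aws configservice put-config-rule --config-rule '{
  "ConfigRuleName": "ec2-instances-use-hardened-amis",
  "Description": "EC2 instances must be launched from AMIs tagged as hardened",
  "Source": {
    "Owner": "AWS",
    "SourceIdentifier": "APPROVED_AMIS_BY_TAG"
  },
  "Scope": {
    "ComplianceResourceTypes": ["AWS::EC2::Instance"]
  },
  "InputParameters": "{\"amisByTagKeyAndValue\": \"hardened\"}"
}'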

Committing those runbooks and AWS Config rules to a central code repository fosters the agility of the controls. In the case of runbooks, you may want to adopt a lightweight markup format, such as Markdown, that you can check in like code. The defined controls can then sit in a CI/CD pipeline, allowing the security controls to be as agile as your pace of innovation.
 

Figure 2: A standard DevOps pipeline

There are numerous benefits to this approach:

  • You get immediate feedback on compliance with your security controls, and thus your business’s security posture.
  • Unlike traditional annual security compliance audits, you have a record that not only are you compliant now, but you’ve been compliant all year. And publication of this evidence to provide support to audit processes requires almost negligible effort on your part.
  • You may not have to take weeks out of your schedule to audit your security controls. Instead, you can check your AWS Config dashboard and run some simple procedural runbooks.
  • Your developers are now empowered to get early feedback on any solutions they’re designing.
  • Changes to your threat model can quickly radiate down to applicable security controls and acceptance tests, again making security teams enablers of innovation rather than blockers.

One word of caution: You will inevitably have exceptions to your security controls. It’s tempting to hardcode these exceptions or write configuration files that allow for exclusions to rules. However, this approach can create hidden complexity in your control. If a resource is identified as being non-compliant, it may be better to allow it to remain as such, and to document it as an exception to be periodically reviewed. Remember to keep a clean separation between your risk evidence and risk management processes here. Exception lists in code are difficult to maintain and ultimately mean that your AWS Config dashboard can show a distorted evaluation of your resources’ compliance. I advise against codified exceptions in most cases. In fact, if you find yourself preparing to write out exceptions in code, consider that maybe your user story needs rewriting. And the cycle begins again!

Closing notes

As cloud computing becomes the new normal, agility and innovation are crucial behaviors for long-term success. Adopting the use of user stories and acceptance criteria to mature your security governance process empowers your business to plan for acceleration. I’ve used the DevOps approach with several customers in the finance sector and have seen a shift in the perception of how security governance affects teams. DevOps has the ability to turn security teams into enablers of business innovation.

If you want help finding practical ways to build DevOps into your business, please reach out to the AWS Security, Risk and Compliance Professional Services team. For information about AWS Config pricing, check out the pricing details page. If you have feedback about this blog post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Author

Jonathan Jenkyn

Since graduating from the University of East Anglia in 1998, Jonathan has been involved in IT Security at many levels, from the implementation of cryptographic primitives to managing enterprise security governance. He joined AWS Professional Services as a Senior Security Consultant in 2017 and supports CloudHSM, DevSecOps, Blockchain and GDPR initiatives, as well as the People with Disabilities affinity group. Outside of work, he enjoys running, volunteering for the BHF, and spending time with his wife and 5 children.