Tag Archives: sts

How to use Regional AWS STS endpoints

Post Syndicated from Darius Januskis original https://aws.amazon.com/blogs/security/how-to-use-regional-aws-sts-endpoints/

This blog post provides recommendations that you can use to help improve resiliency in the unlikely event of disrupted availability of the global (now legacy) AWS Security Token Service (AWS STS) endpoint. Although the global (legacy) AWS STS endpoint https://sts.amazonaws.com is highly available, it’s hosted in a single AWS Region—US East (N. Virginia)—and like other endpoints, it doesn’t provide automatic failover to endpoints in other Regions. In this post I will show you how to use Regional AWS STS endpoints in your configurations to improve the performance and resiliency of your workloads.

For authentication, it’s best to use temporary credentials instead of long-term credentials to help reduce risks, such as inadvertent disclosure, sharing, or theft of credentials. With AWS STS, trusted users can request temporary, limited-privilege credentials to access AWS resources.

Temporary credentials include an access key pair and a session token. The access key pair consists of an access key ID and a secret key. AWS STS generates temporary security credentials dynamically and provides them to the user when requested, which eliminates the need for long-term storage. Temporary security credentials have a limited lifetime so you don’t have to manage or rotate them.

To get these credentials, you can use several different methods:

Figure 1: Methods to request credentials from AWS STS

Figure 1: Methods to request credentials from AWS STS

Global (legacy) and Regional AWS STS endpoints

To connect programmatically to an AWS service, you use an endpoint. An endpoint is the URL of the entry point for AWS STS.

AWS STS provides Regional endpoints in every Region. AWS initially built AWS STS with a global endpoint (now legacy) https://sts.amazonaws.com, which is hosted in the US East (N. Virginia) Region (us-east-1). Regional AWS STS endpoints are activated by default for Regions that are enabled by default in your AWS account. For example, https://sts.us-east-2.amazonaws.com is the US East (Ohio) Regional endpoint. By default, AWS services use Regional AWS STS endpoints. For example, IAM Roles Anywhere uses the Regional STS endpoint that corresponds to the trust anchor. For a complete list of AWS STS endpoints for each Region, see AWS Security Token Service endpoints and quotas. You can’t activate an AWS STS endpoint in a Region that is disabled. For more information on which AWS STS endpoints are activated by default and which endpoints you can activate or deactivate, see Regions and endpoints.

As noted previously, the global (legacy) AWS STS endpoint https://sts.amazonaws.com is hosted in a single Region — US East (N. Virginia) — and like other endpoints, it doesn’t provide automatic failover to endpoints in other Regions. If your workloads on AWS or outside of AWS are configured to use the global (legacy) AWS STS endpoint https://sts.amazonaws.com, you introduce a dependency on a single Region: US East (N. Virginia). In the unlikely event that the endpoint becomes unavailable in that Region or connectivity between your resources and that Region is lost, your workloads won’t be able to use AWS STS to retrieve temporary credentials, which poses an availability risk to your workloads.

AWS recommends that you use Regional AWS STS endpoints (https://sts.<region-name>.amazonaws.com) instead of the global (legacy) AWS STS endpoint.

In addition to improved resiliency, Regional endpoints have other benefits:

  • Isolation and containment — By making requests to an AWS STS endpoint in the same Region as your workloads, you can minimize cross-Region dependencies and align the scope of your resources with the scope of your temporary security credentials to help address availability and security concerns. For example, if your workloads are running in the US East (Ohio) Region, you can target the Regional AWS STS endpoint in the US East (Ohio) Region (us-east-2) to remove dependencies on other Regions.
  • Performance — By making your AWS STS requests to an endpoint that is closer to your services and applications, you can access AWS STS with lower latency and shorter response times.

Figure 2 illustrates the process for using an AWS principal to assume an AWS Identity and Access Management (IAM) role through the AWS STS AssumeRole API, which returns a set of temporary security credentials:

Figure 2: Assume an IAM role by using an API call to a Regional AWS STS endpoint

Figure 2: Assume an IAM role by using an API call to a Regional AWS STS endpoint

Calls to AWS STS within the same Region

You should configure your workloads within a specific Region to use only the Regional AWS STS endpoint for that Region. By using a Regional endpoint, you can use AWS STS in the same Region as your workloads, removing cross-Region dependency. For example, workloads in the US East (Ohio) Region should use only the Regional endpoint https://sts.us-east-2.amazonaws.com to call AWS STS. If a Regional AWS STS endpoint becomes unreachable, your workloads shouldn’t call AWS STS endpoints outside of the operating Region. If your workload has a multi-Region resiliency requirement, your other active or standby Region should use a Regional AWS STS endpoint for that Region and should be deployed such that the application can function despite a Regional failure. You should direct STS traffic to the STS endpoint within the same Region, isolated and independent from other Regions, and remove dependencies on the global (legacy) endpoint.

Calls to AWS STS from outside AWS

You should configure your workloads outside of AWS to call the appropriate Regional AWS STS endpoints that offer the lowest latency to your workload located outside of AWS. If your workload has a multi-Region resiliency requirement, build failover logic for AWS STS calls to other Regions in the event that Regional AWS STS endpoints become unreachable. Temporary security credentials obtained from Regional AWS STS endpoints are valid globally for the default session duration or duration that you specify.

How to configure Regional AWS STS endpoints for your tools and SDKs

I recommend that you use the latest major versions of the AWS Command Line Interface (CLI) or AWS SDK to call AWS STS APIs.

AWS CLI

By default, the AWS CLI version 2 sends AWS STS API requests to the Regional AWS STS endpoint for the currently configured Region. If you are using AWS CLI v2, you don’t need to make additional changes.

By default, the AWS CLI v1 sends AWS STS requests to the global (legacy) AWS STS endpoint. To check the version of the AWS CLI that you are using, run the following command: $ aws –version.

When you run AWS CLI commands, the AWS CLI looks for credential configuration in a specific order—first in shell environment variables and then in the local AWS configuration file (~/.aws/config).

AWS SDK

AWS SDKs are available for a variety of programming languages and environments. Since July 2022, major new versions of the AWS SDK default to Regional AWS STS endpoints and use the endpoint corresponding to the currently configured Region. If you use a major version of the AWS SDK that was released after July 2022, you don’t need to make additional changes.

An AWS SDK looks at various configuration locations until it finds credential configuration values. For example, the AWS SDK for Python (Boto3) adheres to the following lookup order when it searches through sources for configuration values:

  1. A configuration object created and passed as the AWS configuration parameter when creating a client
  2. Environment variables
  3. The AWS configuration file ~/.aws/config

If you still use AWS CLI v1, or your AWS SDK version doesn’t default to a Regional AWS STS endpoint, you have the following options to set the Regional AWS STS endpoint:

Option 1 — Use a shared AWS configuration file setting

The configuration file is located at ~/.aws/config on Linux or macOS, and at C:\Users\USERNAME\.aws\config on Windows. To use the Regional endpoint, add the sts_regional_endpoints parameter.

The following example shows how you can set the value for the Regional AWS STS endpoint in the US East (Ohio) Region (us-east-2), by using the default profile in the AWS configuration file:

[default]
region = us-east-2
sts_regional_endpoints = regional

The valid values for the AWS STS endpoint parameter (sts_regional_endpoints) are:

  • legacy (default) — Uses the global (legacy) AWS STS endpoint, sts.amazonaws.com.
  • regional — Uses the AWS STS endpoint for the currently configured Region.

Note: Since July 2022, major new versions of the AWS SDK default to Regional AWS STS endpoints and use the endpoint corresponding to the currently configured Region. If you are using AWS CLI v1, you must use version 1.16.266 or later to use the AWS STS endpoint parameter.

You can use the --debug option with the AWS CLI command to receive the debug log and validate which AWS STS endpoint was used.

$ aws sts get-caller-identity \
$ --region us-east-2 \
$ --debug

If you search for UseGlobalEndpoint in your debug log, you’ll find that the UseGlobalEndpoint parameter is set to False, and you’ll see the Regional endpoint provider fully qualified domain name (FQDN) when the Regional AWS STS endpoint is configured in a shared AWS configuration file or environment variables:

2023-09-11 18:51:11,300 – MainThread – botocore.regions – DEBUG – Calling endpoint provider with parameters: {'Region': 'us-east-2', 'UseDualStack': False, 'UseFIPS': False, 'UseGlobalEndpoint': False}
2023-09-11 18:51:11,300 – MainThread – botocore.regions – DEBUG – Endpoint provider result: https://sts.us-east-2.amazonaws.com

For a list of AWS SDKs that support shared AWS configuration file settings for Regional AWS STS endpoints, see Compatibility with AWS SDKS.

Option 2 — Use environment variables

Environment variables provide another way to specify configuration options. They are global and affect calls to AWS services. Most SDKs support environment variables. When you set the environment variable, the SDK uses that value until the end of your shell session or until you set the variable to a different value. To make the variables persist across future sessions, set them in your shell’s startup script.

The following example shows how you can set the value for the Regional AWS STS endpoint in the US East (Ohio) Region (us-east-2) by using environment variables:

Linux or macOS

$ export AWS_DEFAULT_REGION=us-east-2
$ export AWS_STS_REGIONAL_ENDPOINTS=regional

You can run the command $ (echo $AWS_DEFAULT_REGION; echo $AWS_STS_REGIONAL_ENDPOINTS) to validate the variables. The output should look similar to the following:

us-east-2
regional

Windows

C:\> set AWS_DEFAULT_REGION=us-east-2
C:\> set AWS_STS_REGIONAL_ENDPOINTS=regional

The following example shows how you can configure an STS client with the AWS SDK for Python (Boto3) to use a Regional AWS STS endpoint by setting the environment variable:

import boto3
import os
os.environ["AWS_DEFAULT_REGION"] = "us-east-2"
os.environ["AWS_STS_REGIONAL_ENDPOINTS"] = "regional"

You can use the metadata attribute sts_client.meta.endpoint_url to inspect and validate how an STS client is configured. The output should look similar to the following:

>>> sts_client = boto3.client("sts")
>>> sts_client.meta.endpoint_url
'https://sts.us-east-2.amazonaws.com'

For a list of AWS SDKs that support environment variable settings for Regional AWS STS endpoints, see Compatibility with AWS SDKs.

Option 3 — Construct an endpoint URL

You can also manually construct an endpoint URL for a specific Regional AWS STS endpoint.

The following example shows how you can configure the STS client with AWS SDK for Python (Boto3) to use a Regional AWS STS endpoint by setting a specific endpoint URL:

import boto3
sts_client = boto3.client('sts', region_name='us-east-2', endpoint_url='https://sts.us-east-2.amazonaws.com')

Use a VPC endpoint with AWS STS

You can create a private connection to AWS STS from the resources that you deployed in your Amazon VPCs. AWS STS integrates with AWS PrivateLink by using interface VPC endpoints. The network traffic on AWS PrivateLink stays on the global AWS network backbone and doesn’t traverse the public internet. When you configure a VPC endpoint for AWS STS, the traffic for the Regional AWS STS endpoint traverses to that endpoint.

By default, the DNS in your VPC will update the entry for the Regional AWS STS endpoint to resolve to the private IP address of the VPC endpoint for AWS STS in your VPC. The following output from an Amazon Elastic Compute Cloud (Amazon EC2) instance shows the DNS name for the AWS STS endpoint resolving to the private IP address of the VPC endpoint for AWS STS:

[ec2-user@ip-10-120-136-166 ~]$ nslookup sts.us-east-2.amazonaws.com
Server:         10.120.0.2
Address:        10.120.0.2#53

Non-authoritative answer:
Name:   sts.us-east-2.amazonaws.com
Address: 10.120.138.148

After you create an interface VPC endpoint for AWS STS in your Region, set the value for the respective Regional AWS STS endpoint by using environment variables to access AWS STS in the same Region.

The output of the following log shows that an AWS STS call was made to the Regional AWS STS endpoint:

POST
/

content-type:application/x-www-form-urlencoded; charset=utf-8
host:sts.us-east-2.amazonaws.com

Log AWS STS requests

You can use AWS CloudTrail events to get information about the request and endpoint that was used for AWS STS. This information can help you identify AWS STS request patterns and validate if you are still using the global (legacy) STS endpoint.

An event in CloudTrail is the record of an activity in an AWS account. CloudTrail events provide a history of both API and non-API account activity made through the AWS Management Console, AWS SDKs, command line tools, and other AWS services.

Log locations

  • Requests to Regional AWS STS endpoints sts.<region-name>.amazonaws.com are logged in CloudTrail within their respective Region.
  • Requests to the global (legacy) STS endpoint sts.amazonaws.com are logged within the US East (N. Virginia) Region (us-east-1).

Log fields

  • Requests to Regional AWS STS endpoints and global endpoint are logged in the tlsDetails field in CloudTrail. You can use this field to determine if the request was made to a Regional or global (legacy) endpoint.
  • Requests made from a VPC endpoint are logged in the vpcEndpointId field in CloudTrail.

The following example shows a CloudTrail event for an STS request to a Regional AWS STS endpoint with a VPC endpoint.

"eventType": "AwsApiCall",
"managementEvent": true,
"recipientAccountId": "123456789012",
"vpcEndpointId": "vpce-021345abcdef6789",
"eventCategory": "Management",
"tlsDetails": {
    "tlsVersion": "TLSv1.2",
    "cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
    "clientProvidedHostHeader": "sts.us-east-2.amazonaws.com"
}

The following example shows a CloudTrail event for an STS request to the global (legacy) AWS STS endpoint.

"eventType": "AwsApiCall",
"managementEvent": true,
"recipientAccountId": "123456789012",
"eventCategory": "Management",
"tlsDetails": {
    "tlsVersion": "TLSv1.2",
    "cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
    "clientProvidedHostHeader": "sts.amazonaws.com"
}

To interactively search and analyze your AWS STS log data, use AWS CloudWatch Logs Insights or Amazon Athena.

CloudWatch Logs Insights

The following example shows how to run a CloudWatch Logs Insights query to look for API calls made to the global (legacy) AWS STS endpoint. Before you can query CloudTrail events, you must configure a CloudTrail trail to send events to CloudWatch Logs.

filter eventSource="sts.amazonaws.com" and tlsDetails.clientProvidedHostHeader="sts.amazonaws.com"
| fields eventTime, recipientAccountId, eventName, tlsDetails.clientProvidedHostHeader, sourceIPAddress, userIdentity.arn, @message
| sort eventTime desc

The query output shows event details for an AWS STS call made to the global (legacy) AWS STS endpoint https://sts.amazonaws.com.

Figure 3: Use a CloudWatch Log Insights query to look for STS API calls

Figure 3: Use a CloudWatch Log Insights query to look for STS API calls

Amazon Athena

The following example shows how to query CloudTrail events with Amazon Athena and search for API calls made to the global (legacy) AWS STS endpoint.

SELECT
    eventtime,
    recipientaccountid,
    eventname,
    tlsdetails.clientProvidedHostHeader,
    sourceipaddress,
    eventid,
    useridentity.arn
FROM "cloudtrail_logs"
WHERE
    eventsource = 'sts.amazonaws.com' AND
    tlsdetails.clientProvidedHostHeader = 'sts.amazonaws.com'
ORDER BY eventtime DESC

The query output shows STS calls made to the global (legacy) AWS STS endpoint https://sts.amazonaws.com.

Figure 4: Use Athena to search for STS API calls and identify STS endpoints

Figure 4: Use Athena to search for STS API calls and identify STS endpoints

Conclusion

In this post, you learned how to use Regional AWS STS endpoints to help improve resiliency, reduce latency, and increase session token usage for the operating Regions in your AWS environment.

AWS recommends that you check the configuration and usage of AWS STS endpoints in your environment, validate AWS STS activity in your CloudTrail logs, and confirm that Regional AWS STS endpoints are used.

If you have questions, post them in the Security Identity and Compliance re:Post topic or reach out to AWS Support.

Want more AWS Security news? Follow us on X.

Darius Januskis

Darius is a Senior Solutions Architect at AWS helping global financial services customers in their journey to the cloud. He is a passionate technology enthusiast who enjoys working with customers and helping them build well-architected solutions. His core interests include security, DevOps, automation, and serverless technologies.

Announcing an update to IAM role trust policy behavior

Post Syndicated from Mark Ryland original https://aws.amazon.com/blogs/security/announcing-an-update-to-iam-role-trust-policy-behavior/

AWS Identity and Access Management (IAM) is changing an aspect of how role trust policy evaluation behaves when a role assumes itself. Previously, roles implicitly trusted themselves from a role trust policy perspective if they had identity-based permissions to assume themselves. After receiving and considering feedback from customers on this topic, AWS is changing role assumption behavior to always require self-referential role trust policy grants. This change improves consistency and visibility with regard to role behavior and privileges. This change allows customers to create and understand role assumption permissions in a single place (the role trust policy) rather than two places (the role trust policy and the role identity policy). It increases the simplicity of role trust permission management: “What you see [in the trust policy] is what you get.”

Therefore, beginning today, for any role that has not used the identity-based behavior since June 30, 2022, a role trust policy must explicitly grant permission to all principals, including the role itself, that need to assume it under the specified conditions. Removal of the role’s implicit self-trust improves consistency and increases visibility into role assumption behavior.

Most AWS customers will not be impacted by the change at all. Only a tiny percentage (approximately 0.0001%) of all roles are involved. Customers whose roles have recently used the previous implicit trust behavior are being notified, beginning today, about those roles, and may continue to use this behavior with those roles until February 15, 2023, to allow time for making the necessary updates to code or configuration. Or, if these customers are confident that the change will not impact them, they can opt out immediately by substituting in new roles, as discussed later in this post.

The first part of this post briefly explains the change in behavior. The middle sections answer practical questions like: “why is this happening?,” “how might this change impact me?,” “which usage scenarios are likely to be impacted?,” and “what should I do next?” The usage scenario section is important because it shows that, based on our analysis, the self-assuming role behavior exhibited by code or human users is very likely to be unnecessary and counterproductive. Finally, for security professionals interested in better understanding the reasons for the old behavior, the rationale for the change, as well as its possible implications, the last section reviews a number of core IAM concepts and digs in to additional details.

What is changing?

Until today, an IAM role implicitly trusted itself. Consider the following role trust policy attached to the role named RoleA in AWS account 123456789012.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::123456789012:role/RoleB"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

This role trust policy grants role assumption access to the role named RoleB in the same account. However, if the corresponding identity-based policy for RoleA grants the sts:AssumeRole action with respect to itself, then RoleA could also assume itself. Therefore, there were actually two roles that could assume RoleA: the explicitly permissioned RoleB, and RoleA, which implicitly trusted itself as a byproduct of the IAM ownership model (explained in detail in the final section). Note that the identity-based permission that RoleA must have to assume itself is not required in the case of RoleB, and indeed an identity-based policy associated with RoleB that references other roles is not sufficient to allow RoleB to assume them. The resource-based permission granted by RoleA’s trust policy is both necessary and sufficient to allow RoleB to assume RoleA.

Although earlier we summarized this behavior as “implicit self-trust,” the key point here is that the ability of Role A to assume itself is not actually implicit behavior. The role’s self-referential permission had to be explicit in one place or the other (or both): either in the role’s identity-based policy (perhaps based on broad wildcard permissions), or its trust policy. But unlike the case with other principals and role trust, an IAM administrator would have to look in two different policies to determine whether a role could assume itself.

As of today, for any new role, or any role that has not recently assumed itself while relying on the old behavior, IAM administrators must modify the previously shown role trust policy as follows to allow RoleA to assume itself, regardless of the privileges granted by its identity-based policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::123456789012:role/RoleB",
                    "arn:aws:iam::123456789012:role/RoleA"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

This change makes role trust behavior clearer and more consistent to understand and manage, whether directly by humans or as embodied in code.

How might this change impact me?

As previously noted, most customers will not be impacted by the change at all. For those customers who do use the prior implicit trust grant behavior, AWS will work with you to eliminate your usage prior to February 15, 2023. Here are more details for the two cases of customers who have not used the behavior, and those who have.

If you haven’t used the implicit trust behavior since June 30, 2022

Beginning today, if you have not used the old behavior for a given role at any time since June 30, 2022, you will now experience the new behavior. Those existing roles, as well as any new roles, will need an explicit reference in their own trust policy in order to assume themselves. If you have roles that are used only very occasionally, such as once per quarter for a seldom-run batch process, you should identify those roles and if necessary either remove the dependency on the old behavior or update their role trust policies to include the role itself prior to their next usage (see the second sample policy above for an example).

If you have used the implicit trust behavior since June 30, 2022

If you have a role that has used the implicit trust behavior since June 30, 2022, then you will continue to be able to do so with that role until February 15, 2023. AWS will provide you with notice referencing those roles beginning today through your AWS Health Dashboard and will also send an email with the relevant information to the account owner and security contact. We are allowing time for you to make any necessary changes to your existing processes, code, or configurations to prepare for removal of the implicit trust behavior. If you can’t change your processes or code, you can continue to use the behavior by making a configuration change—namely, by updating the relevant role trust policies to reference the role itself. On the other hand, you can opt out of the old behavior at any time by creating a new role with a different Amazon Resource Name (ARN) with the desired identity-based and trust-policy-based permissions and substituting it for any older role that was identified as using the implicit trust behavior. (The new role will not be allow-listed, because the allow list is based on role ARNs.) You can also modify an existing allow-listed role’s trust policy to explicitly deny access to itself. See the “What should I do next?” section for more information.

Notifications and retirement

As we previously noted, starting today, accounts with existing roles that use the implicit self-assume role assumption behavior will be notified of this change by email and through their AWS Health Dashboard. Those roles have been allow-listed, and so for now their behavior will continue as before. After February 15, 2023, the old behavior will be retired for all roles and all accounts. IAM Documentation has been updated to make clear the new behavior.

After the old behavior is retired from the allow-listed roles and accounts, role sessions that make self-referential role assumption calls will fail with an Access Denied error unless the role’s trust policy explicitly grants the permission directly through a role ARN. Another option is to grant permission indirectly through an ARN to the root principal in the trust policy that acts as a delegation of privilege management, after which permission grants in identity-based policies determine access, similar to the typical cross-account case.

Which usage scenarios are likely to be impacted?

Users often attach an IAM role to an Amazon Elastic Compute Cloud (Amazon EC2) instance, an Amazon Elastic Container Service (Amazon ECS) task, or AWS Lambda function. Attaching a role to one of these runtime environments enables workloads to use short-term session credentials based on that role. For example, when an EC2 instance is launched, AWS automatically creates a role session and assigns it to the instance. An AWS best practice is for the workload to use these credentials to issue AWS API calls without explicitly requesting short-term credentials through sts:AssumeRole calls.

However, examples and code snippets commonly available on internet forums and community knowledge sharing sites might incorrectly suggest that workloads need to call sts:AssumeRole to establish short-term sessions credentials for operation within those environments.

We analyzed AWS Security Token Service (AWS STS) service metadata about role self-assumption in order to understand the use cases and possible impact of the change. What the data shows is that in almost all cases this behavior is occurring due to unnecessarily reassuming the role in an Amazon EC2, Amazon ECS, Amazon Elastic Kubernetes Services (EKS), or Lambda runtime environment already provided by the environment. There are two exceptions, discussed at the end of this section under the headings, “self-assumption with a scoped-down policy” and “assuming a target compute role during development.”

There are many variations on this theme, but overall, most role self-assumption occurs in scenarios where the person or code is unnecessarily reassuming the role that the code was already running as. Although this practice and code style can still work with a configuration change (by adding an explicit self-reference to the role trust policy), the better practice will almost always be to remove this unnecessary behavior or code from your AWS environment going forward. By removing this unnecessary behavior, you save CPU, memory, and network resources.

Common mistakes when using Amazon EKS

Some users of the Amazon EKS service (or possibly their shell scripts) use the command line interface (CLI) command aws eks get-token to obtain an authentication token for use in managing a Kubernetes cluster. The command takes as an optional parameter a role ARN. That parameter allows a user to assume another role other than the one they are currently using before they call get-token. However, the CLI cannot call that API without already having an IAM identity. Some users might believe that they need to specify the role ARN of the role they are already using. We have updated the Amazon EKS documentation to make clear that this is not necessary.

Common mistakes when using AWS Lambda

Another example is the use of an sts:AssumeRole API call from a Lambda function. The function is already running in a preassigned role provided by user configuration within the Lambda service, or else it couldn’t successfully call any authenticated API action, including sts:AssumeRole. However, some Lambda functions call sts:AssumeRole with the target role being the very same role that the Lambda function has already been provided as part of its configuration. This call is unnecessary.

AWS Software Development Kits (SDKs) all have support for running in AWS Lambda environments and automatically using the credentials provided in that environment. We have updated the Lambda documentation to make clear that such STS calls are unnecessary.

Common mistakes when using Amazon ECS

Customers can associate an IAM role with an Amazon ECS task to give the task AWS credentials to interact with other AWS resources.

We detected ECS tasks that call sts:AssumeRole on the same role that was provided to the ECS task. Amazon ECS makes the role’s credentials available inside the compute resources of the ECS task, whether on Amazon EC2 or AWS Fargate, and these credentials can be used to access AWS services or resources as the IAM role associated with the ECS talk, without being called through sts:AssumeRole. AWS handles renewing the credentials available on ECS tasks before the credentials expire. AWS STS role assumption calls are unnecessary, because they simply create a new set of the same temporary role session credentials.

AWS SDKs all have support for running in Amazon ECS environments and automatically using the credentials provided in that ECS environment. We have updated the Amazon ECS documentation to make clear that calling sts:AssumeRole for an ECS task is unnecessary.

Common mistakes when using Amazon EC2

Users can configure an Amazon EC2 instance to contain an instance profile. This instance profile defines the IAM role that Amazon EC2 assigns the compute instance when it is launched and begins to run. The role attached to the EC2 instance enables your code to send signed requests to AWS services. Without this attached role, your code would not be able to access your AWS resources (nor would it be able to call sts:AssumeRole). The Amazon EC2 service handles renewing these temporary role session credentials that are assigned to the instance before they expire.

We have observed that workloads running on EC2 instances call sts:AssumeRole to assume the same role that is already associated with the EC2 instance and use the resulting role-session for communication with AWS services. These role assumption calls are unnecessary, because they simply create a new set of the same temporary role session credentials.

AWS SDKs all have support for running in Amazon EC2 environments and automatically using the credentials provided in that EC2 environment. We have updated the Amazon EC2 documentation to make clear that calling sts:AssumeRole for an EC2 instance with a role assigned is unnecessary.

For information on creating an IAM role, attaching that role to an EC2 instance, and launching an instance with an attached role, see “IAM roles for Amazon EC2” in the Amazon EC2 User Guide.

Other common mistakes

If your use case does not use any of these AWS execution environments, you might still experience an impact from this change. We recommend that you examine the roles in your account and identify scenarios where your code (or human use through the AWS CLI) results in a role assuming itself. We provide Amazon Athena and AWS CloudTrail Lake queries later in this post to help you locate instances where a role assumed itself. For each instance, you can evaluate whether a role assuming itself is the right operation for your needs.

Self-assumption with a scoped-down policy

The first pattern we have observed that is not a mistake is the use of self-assumption combined with a scoped-down policy. Some systems use this approach to provide different privileges for different use cases, all using the same underlying role. Customers who choose to continue with this approach can do so by adding the role to its own trust policy. While the use of scoped-down policies and the associated least-privilege approach to permissions is a good idea, we recommend that customers switch to using a second generic role and assume that role along with the scoped-down policy rather than using role self-assumption. This approach provides more clarity in CloudTrail about what is happening, and limits the possible iterations of role assumption to one round, since the second role should not be able to assume the first. Another possible approach in some cases is to limit subsequent assumptions is by using an IAM condition in the role trust policy that is no longer satisfied after the first role assumption. For example, for Lambda functions, this would be done by a condition checking for the presence of the “lambda:SourceFunctionArn” property; for EC2, by checking for presence of “ec2:SourceInstanceARN.”

Assuming an expected target compute role during development

Another possible reason for role self-assumption may result from a development practice in which developers attempt to normalize the roles that their code is running in between scenarios in which role credentials are not automatically provided by the environment, and scenarios where they are. For example, imagine a developer is working on code that she expects to run as a Lambda function, but during development is using her laptop to do some initial testing of the code. In order to provide the same execution role as is expected later in product, the developer might configure the role trust policy to allow assumption by a principal readily available on the laptop (an IAM Identity Center role, for example), and then assume the expected Lambda function execution role when the code is initializing. The same approach could be used on a build and test server. Later, when the code is deployed to Lambda, the actual role is already available and in use, but the code need not be modified in order to provide the same post-role-assumption behavior that existing outside of Lambda: the unmodified code can automatically assume what is in this case the same role, and proceed. While this approach is not illogical, as with the scope-down policy case we recommend that customers configure distinct roles for assumption both in development and test environments as well as later production environments. Again, this approach provides more clarity in CloudTrail about what is happening, and limits the possible iterations of role assumption to one round, since the second role should not be able to assume the first.

What should I do next?

If you receive an email or AWS Health Dashboard notification for an account, we recommend that you review your existing role trust policies and corresponding code. For those roles, you should remove the dependency on the old behavior, or if you can’t, update those role trust policies with an explicit self-referential permission grant. After the grace period expires on February 15, 2023, you will no longer be able to use the implicit self-referential permission grant behavior.

If you currently use the old behavior and need to continue to do so for a short period of time in the context of existing infrastructure as code or other automated processes that create new roles, you can do so by adding the role’s ARN to its own trust policy. We strongly encourage you to treat this as a temporary stop-gap measure, because in almost all cases it should not be necessary for a role to be able to assume itself, and the correct solution is to change the code that results in the unnecessary self-assumption. If for some reason that self-service solution is not sufficient, you can reach out to AWS Support to seek an accommodation of your use case for new roles or accounts.

If you make any necessary code or configuration changes and want to remove roles that are currently allow-listed, you can also ask AWS Support to remove those roles from the allow list so that their behavior follows the new model. Or, as previously noted, you can opt out of the old behavior at any time by creating a new role with a different ARN that has the desired identity-based and trust-policy–based permissions and substituting it for the allow-listed role. Another stop-gap type of option is to add an explicit deny that references the role to its own trust policy.

If you would like to understand better the history of your usage of role self-assumption in a given account or organization, you can follow these instructions on querying CloudTrail data with Athena and then use the following Athena query against your account or organization CloudTrail data, as stored in Amazon Simple Storage Services (Amazon S3). The results of the query can help you understand the scenarios and conditions and code involved. Depending on the size of your CloudTrail logs, you may need to follow the partitioning instructions to query subsets of your CloudTrail logs sequentially. If this query yields no results, the role self-assumption scenario described in this blog post has never occurred within the analyzed CloudTrail dataset.

SELECT eventid, eventtime, userIdentity.sessioncontext.sessionissuer.arn as RoleARN, split_part(userIdentity.principalId, ':', 2) as RoleSessionName from cloudtrail_logs t CROSS JOIN UNNEST(t.resources) unnested (resources_entry) where eventSource = 'sts.amazonaws.com' and eventName = 'AssumeRole' and userIdentity.type = 'AssumedRole' and errorcode IS NULL and substr(userIdentity.sessioncontext.sessionissuer.arn,12) = substr(unnested.resources_entry.ARN,12)

As another option, you can follow these instructions to set up CloudTrail Lake to perform a similar analysis. CloudTrail Lake allows richer, faster queries without the need to partition the data. As of September 20, 2022, CloudTrail Lake now supports import of CloudTrail logs from Amazon S3. This allows you to perform a historical analysis even if you haven’t previously enabled CloudTrail Lake. If this query yields no results, the scenario described in this blog post has never occurred within the analyzed CloudTrail dataset.

SELECT eventid, eventtime, userIdentity.sessioncontext.sessionissuer.arn as RoleARN, userIdentity.principalId as RoleIdColonRoleSessionName from $EDS_ID where eventSource = 'sts.amazonaws.com' and eventName = 'AssumeRole' and userIdentity.type = 'AssumedRole' and errorcode IS NULL and userIdentity.sessioncontext.sessionissuer.arn = element_at(resources,1).arn

Understanding the change: more details

To better understand the background of this change, we need to review the IAM basics of identity-based policies and resource-based policies, and then explain some subtleties and exceptions. You can find additional overview material in the IAM documentation.

The structure of each IAM policy follows the same basic model: one or more statements with an effect (allow or deny), along with principals, actions, resources, and conditions. Although the identity-based and resource-based policies share the same basic syntax and semantics, the former is associated with a principal, the latter with a resource. The main difference between the two is that identity-based policies do not specify the principal, because that information is supplied implicitly by associating the policy with a given principal. On the other hand, resource policies do not specify an arbitrary resource, because at least the primary identifier of the resource (for example, the bucket identifier of an S3 bucket) is supplied implicitly by associating the policy with that resource. Note that an IAM role is the only kind of AWS object that is both a principal and a resource.

In most cases, access to a resource within the same AWS account can be granted by either an identity-based policy or a resource-based policy. Consider an Amazon S3 example. An identity-based policy attached to an IAM principal that allows the s3:GetObject action does not require an equivalent grant in the S3 bucket resource policy. Conversely, an s3:GetObject permission grant in a bucket’s resource policy is all that is needed to allow a principal in the same account to call the API with respect to that bucket; an equivalent identity-based permission is not required. Either the identity-based policy or the resource-based policy can grant the necessary permission. For more information, see IAM policy types: How and when to use them.

However, in order to more tightly govern access to certain security-sensitive resources, such as AWS Key Management Service (AWS KMS) keys and IAM roles, those resource policies need to grant access to the IAM principal explicitly, even within the same AWS account. A role trust policy is the resource policy associated with a role that specifies which IAM principals can assume the role by using one of the sts:AssumeRole* API calls. For example, in order for RoleB to assume RoleA in the same account, whether or not RoleB’s identity-based policy explicitly allows it to assume RoleA, RoleA’s role trust policy must grant access to RoleB. Within the same account, an identity-based permission by itself is not sufficient to allow assumption of a role. On the other hand, a resource-based permission—a grant of access in the role trust policy—is sufficient. (Note that it’s possible to construct a kind of hybrid permission to a role by using both its resource policy and other identity-based policies. In that case, the role trust policy grants permission to the root principal ARN; after that, the identity-based policy of a principal in that account would need to explicitly grant permission to assume that role. This is analogous to the typical cross-account role trust scenario.)

Until now, there has been a nonintuitive exception to these rules for situations where a role assumes itself. Since a role is both a principal (potentially with an identity-based policy) and a resource (with a resource-based policy), it is in the unique position of being both a subject and an object within the IAM system, as well as being an object owned by itself rather than its containing account. Due to this ownership model, roles with identity-based permission to assume themselves implicitly trusted themselves as resources, and vice versa. That is to say, roles that had the privilege as principals to assume themselves implicitly trusted themselves as resources, without an explicit self-referential Allow in the role trust policy. Conversely, a grant of permission in the role trust policy was sufficient regardless of whether there was a grant in the same role’s identity-based policy. Thus, in the self-assumption case, roles behaved like most other resources in the same account: only a single permission was required to allow role self-assumption, either on the identity side or the resource side of their dual-sided nature. Because of a role’s implicit trust of itself as a resource, the role’s trust policy—which might otherwise limit assumption of the role with properties such as actions and conditions—was not applied, unless it contained an explicit deny of itself.

The following example is a role trust policy attached to the role named RoleA in account 123456789012. It grants explicit access only to the role named RoleB.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::123456789012:role/RoleB"
            },
            "Action": ["sts:AssumeRole", "sts:TagSession"],
            "Condition": {
                "StringEquals": {
                    "aws:PrincipalTag/project": "BlueSkyProject"
                }
            }
        }
    ]
}

Assuming that the corresponding identity-based policy for RoleA granted the sts:AssumeRole action with regard to RoleA, this role trust policy provided that there were two roles that could assume RoleA: RoleB (explicitly referenced in the trust policy) and RoleA (assuming it was explicitly referenced in its identity policy). RoleB could assume RoleA only if it had the principal tag project:BlueSkyProject because of the trust policy condition. (The sts:TagSession permission is needed here in case tags need to be added by the caller as parted of the RoleAssumption call.) RoleA, on the other hand, did not need to meet that condition because it relied on a different explicit permission—the one granted in the identity-based policy. RoleA would have needed the principal tag project:BlueSkyProject to meet the trust policy condition if and only if it was relying on the trust policy to gain access through the sts:AssumeRole action; that is, in the case where its identity-based policy did not provide the needed privilege.

As we previously noted, after considering feedback from customers on this topic, AWS has decided that requiring self-referential role trust policy grants even in the case where the identity-based policy also grants access is the better approach to delivering consistency and visibility with regard to role behavior and privileges. Therefore, as of today, r­ole assumption behavior requires an explicit self-referential permission in the role trust policy, and the actions and conditions within that policy must also be satisfied, regardless of the permissions expressed in the role’s identity-based policy. (If permissions in the identity-based policy are present, they must also be satisfied.)

Requiring self-reference in the trust policy makes role trust policy evaluation consistent regardless of which role is seeking to assume the role. Improved consistency makes role permissions easier to understand and manage, whether through human inspection or security tooling. This change also eliminates the possibility of continuing the lifetime of an otherwise temporary credential without explicit, trackable grants of permission in trust policies. It also means that trust policy constraints and conditions are enforced consistently, regardless of which principal is assuming the role. Finally, as previously noted, this change allows customers to create and understand role assumption permissions in a single place (the role trust policy) rather than two places (the role trust policy and the role identity policy). It increases the simplicity of role trust permission management: “what you see [in the trust policy] is what you get.”

Continuing with the preceding example, if you need to allow a role to assume itself, you now must update the role trust policy to explicitly allow both RoleB and RoleA. The RoleA trust policy now looks like the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::123456789012:role/RoleB",
                    "arn:aws:iam::123456789012:role/RoleA"
                ]
            },
            "Action": ["sts:AssumeRole", "sts:TagSession"],
            "Condition": {
                "StringEquals": {
					"aws:PrincipalTag/project": "BlueSkyProject"
				}
            }
        }
    ]
}

Without this new principal grant, the role can no longer assume itself. The trust policy conditions are also applied, even if the role still has unconditioned access to itself in its identity-based policy.

Conclusion

In this blog post we’ve reviewed the old and new behavior of role assumption in the case where a role seeks to assume itself. We’ve seen that, according to our analysis of service metadata, the vast majority of role self-assumption behavior that relies solely on identity-based privileges is totally unnecessary, because the code (or human) who calls sts:AssumeRole is already, without realizing it, using the role’s credentials to call the AWS STS API. Eliminating that mistake will improve performance and decrease resource consumption. We’ve also explained in more depth the reasons for the old behavior and the reasons for making the change, and provided Athena and CloudTrail Lake queries that you can use to examine past or (in the case of allow-listed roles) current self-assumption behavior in your own environments. You can reach out to AWS Support or your customer account team if you need help in this effort.

If you currently use the old behavior and need to continue to do so, your primary option is to create an explicit allow for the role in its own trust policy. If that option doesn’t work due to operational constraints, you can reach out to AWS Support to seek an accommodation of your use case for new roles or new accounts. You can also ask AWS Support to remove roles from the allow-list if you want their behavior to follow the new model.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new IAM-tagged discussion on AWS re:Post or contact AWS Support.

AWS would like to thank several customers and partners who highlighted this behavior as something they found surprising and unhelpful, and asked us to consider making this change. We would also like to thank independent security researcher Ryan Gerstenkorn who engaged with AWS on this topic and worked with us prior to this update.

Want more AWS Security news? Follow us on Twitter.

Mark Ryland

Mark Ryland

Mark is the director of the Office of the CISO for AWS. He has over 30 years of experience in the technology industry and has served in leadership roles in cybersecurity, software engineering, distributed systems, technology standardization and public policy. Previously, he served as the Director of Solution Architecture and Professional Services for the AWS World Public Sector team.

Stephen Whinston

Stephen Whinston

Stephen is a Senior Product Manager with the AWS Identity and Access Management organization. Prior to Amazon, Stephen worked in product management for cloud service and identity management providers. Stephen holds degrees in computer science and an MBA from the University of Colorado Leeds School of Business. Outside of work, Stephen enjoys his family time and the Pacific Northwest.