Amazon Web Services (AWS) was recognized by KuppingerCole Analysts AG as an Overall Leader in the firm’s Leadership Compass report for Policy Based Access Management. The Leadership Compass report reveals Amazon Verified Permissions as an Overall Leader (as shown in Figure 1), a Product Leader for functional strength, and an Innovation Leader for open source security. The recognition is based on a comparison with 14 other vendors, using standardized evaluation criteria set by KuppingerCole.
Figure 1: KuppingerCole Leadership Compass for Policy Based Access Management
The report helps organizations learn about policy-based access management solutions for common use cases and requirements. KuppingerCole defines policy-based access management as an approach that helps to centralize policy management, run authorization decisions across a variety of applications and resource types, continually evaluate authorization decisions, and support corporate governance.
Policy-based access management has three major benefits: consistency, security, and agility. Many organizations grapple with a patchwork of access control mechanisms, which can hinder their ability to implement a consistent approach across the organization, increase their security risk exposure, and reduce the agility of their development teams. A policy-based access control architecture helps organizations centralize their policies in a policy store outside the application codebase, where the policies can be audited and consistently evaluated. This enables teams to build, refactor, and expand applications faster, because policy guardrails are in place and access management is externalized.
Amazon Verified Permissions is a scalable, fine-grained permissions management and authorization service for the applications that you build. This service helps your developers to build more secure applications faster, by externalizing authorization and centralizing policy management and administration. Developers can align their application access with Zero Trust principles by implementing least privilege and continual authorization within applications. Security and audit teams can better analyze and audit who has access to what within applications.
Verified Permissions uses Cedar, a purpose-built and security-first open source policy language, to define policy-based access controls by using roles and attributes for more granular, context-aware access control. Cedar demonstrates the AWS commitment to raising the bar for open source security by developing key security-related technologies in collaboration with the community, with a goal of improving security postures across the industry.
The KuppingerCole Leadership Compass report offers insightful guidance as you evaluate policy-based access management solutions for your organization. Access a complimentary copy of the 2024 KuppingerCole Leadership Compass for Policy-Based Access Management.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
AWS Identity and Access Management (IAM) Roles Anywhere enables workloads that run outside of Amazon Web Services (AWS), such as servers, containers, and applications, to use X.509 digital certificates to obtain temporary AWS credentials and access AWS resources, the same way that you use IAM roles for workloads on AWS. Now, IAM Roles Anywhere allows you to use PKCS #11–compatible cryptographic modules to help you securely store private keys associated with your end-entity X.509 certificates.
Cryptographic modules allow you to generate non-exportable asymmetric keys in the module hardware. The cryptographic module exposes high-level functions, such as encrypt, decrypt, and sign, through an interface such as PKCS #11. Using a cryptographic module with IAM Roles Anywhere helps to ensure that the private keys associated with your end-identity X.509 certificates remain in the module and cannot be accessed or copied to the system.
In this post, I will show how you can use PKCS #11–compatible cryptographic modules, such as YubiKey 5 Series and Thales ID smart cards, with your on-premises servers to securely store private keys. I’ll also show how to use those private keys and certificates to obtain temporary credentials for the AWS Command Line Interface (AWS CLI) and AWS SDKs.
Cryptographic modules use cases
IAM Roles Anywhere reduces the need to manage long-term AWS credentials for workloads running outside of AWS, to help improve your security posture. Now IAM Roles Anywhere has added support for compatible PKCS #11 cryptographic modules to the credential helper tool so that organizations that are currently using these (such as defense, government, or large enterprises) can benefit from storing their private keys on their security devices. This mitigates the risk of storing the private keys as files on servers where they can be accessed or copied by unauthorized users.
Note: If your organization does not implement PKCS #11–compatible modules, IAM Roles Anywhere credential helper supports OS certificate stores (Keychain Access for macOS and Cryptography API: Next Generation (CNG) for Windows) to help protect your certificates and private keys.
Solution overview
This authentication flow is shown in Figure 1 and is described in the following sections.
Figure 1: Authentication flow using crypto modules with IAM Roles Anywhere
How it works
As a prerequisite, you must first create a trust anchor and profile within IAM Roles Anywhere. The trust anchor will establish trust between your public key infrastructure (PKI) and IAM Roles Anywhere, and the profile allows you to specify which roles IAM Roles Anywhere assumes and what your workloads can do with the temporary credentials. You establish trust between IAM Roles Anywhere and your certificate authority (CA) by creating a trust anchor. A trust anchor is a reference to either AWS Private Certificate Authority (AWS Private CA) or an external CA certificate. For this walkthrough, you will use the AWS Private CA.
The one-time initialization process (step “0 – Module initialization” in Figure 1) works as follows:
You first generate the non-exportable private key within the secure container of the cryptographic module.
You then create the X.509 certificate that will bind an identity to a public key:
Create a certificate signing request (CSR).
Submit the CSR to the AWS Private CA.
Obtain the certificate signed by the CA in order to establish trust.
The certificate is then imported into the cryptographic module for mobility purposes, to make it available and simple to locate when the module is connected to the server.
After initialization is done, the module is connected to the server, which can then interact with the AWS CLI and AWS SDK without long-term credentials stored on a disk.
To obtain temporary security credentials from IAM Roles Anywhere:
The server will use the credential helper tool that IAM Roles Anywhere provides. The credential helper works with the credential_process feature of the AWS CLI to provide credentials that can be used by the CLI and the language SDKs. The helper manages the process of creating a signature with the private key.
The credential helper tool calls the IAM Roles Anywhere endpoint to obtain temporary credentials that are issued in a standard JSON format to IAM Roles Anywhere clients via the API method CreateSession action.
The server uses the temporary credentials for programmatic access to AWS services.
Alternatively, you can use the update or serve commands instead of credential-process. The update command will be used as a long-running process that will renew the temporary credentials 5 minutes before the expiration time and replace them in the AWS credentials file. The serve command will be used to vend temporary credentials through an endpoint running on the local host using the same URIs and request headers as IMDSv2 (Instance Metadata Service Version 2).
Supported modules
The credential helper tool for IAM Roles Anywhere supports most devices that are compatible with PKCS #11. The PKCS #11 standard specifies an API for devices that hold cryptographic information and perform cryptographic functions such as signature and encryption.
I will showcase how to use a YubiKey 5 Series device that is a multi-protocol security key that supports Personal Identity Verification (PIV) through PKCS #11. I am using YubiKey 5 Series for the purpose of demonstration, as it is commonly accessible (you can purchase it at the Yubico store or Amazon.com) and is used by some of the world’s largest companies as a means of providing a one-time password (OTP), Fast IDentity Online (FIDO) and PIV for smart card interface for multi-factor authentication. For a production server, we recommend using server-specific PKCS #11–compatible hardware security modules (HSMs) such as the YubiHSM 2, Luna PCIe HSM, or Trusted Platform Modules (TPMs) available on your servers.
Note: The implementation might differ with other modules, because some of these come with their own proprietary tools and drivers.
Implement the solution: Module initialization
You need to have the following prerequisites in order to initialize the module:
The Security Officer (SO) should have permissions to use the AWS Private CA service
Following are the high-level steps for initializing the YubiKey device and generating the certificate that is signed by AWS Private Certificate Authority (AWS Private CA). Note that you could also use your own public key infrastructure (PKI) and register it with IAM Roles Anywhere.
To initialize the module and generate a certificate
Verify that the YubiKey PIV interface is enabled, because some organizations might disable interfaces that are not being used. To do so, run the YubiKey Manager CLI, as follows:
ykman info
The output should look like the following, with the PIV interface enabled for USB.
Figure 2:YubiKey Manager CLI showing that the PIV interface is enabled
Use the YubiKey Manager CLI to generate a new RSA2048 private key on the security module in slot 9a and store the associated public key in a file. Different slots are available on YubiKey, and we will use the slot 9a that is for PIV authentication purpose. Use the following command to generate an asymmetric key pair. The private key is generated on the YubiKey, and the generated public key is saved as a file. Enter the YubiKey management key to proceed:
Create a certificate request (CSR) based on the public key and specify the subject that will identify your server. Enter the user PIN code when prompted.
Import the certificate file on the module by using the YubiKey Manager CLI or through the YubiKey Manager UI. Enter the YubiKey management key to proceed.
The security module is now initialized and can be plugged into the server.
Configuration to use the security module for programmatic access
The following steps will demonstrate how to configure the server to interact with the AWS CLI and AWS SDKs by using the private key stored on the YubiKey or PKCS #11–compatible device.
Install the p11-kit package. Most providers (including opensc) will ship with a p11-kit “module” file that makes them discoverable. Users shouldn’t need to specify the PKCS #11 “provider” library when using the credential helper, because we use p11-kit by default.
If your device library is not supported by p11-kit, you can install that library separately.
Verify the content of the YubiKey by using the following command:
ykman --device 123456 piv info
The output should look like the following.
Figure 3: YubiKey Manager CLI output for the PIV information
This command provides the general status of the PIV application and content in the different slots such as the certificates installed.
Use the credential helper command with the security module. The command will require at least:
The ARN of the trust anchor
The ARN of the target role to assume
The ARN of the profile to pull policies from
The certificate and/or key identifiers in the form of a PKCS #11 URI
You can use the certificate flag to search which slot on the security module contains the private key associated with the user certificate.
To specify an object stored in a cryptographic module, you should use the PKCS #11 URI that is defined in RFC7512. The attributes in the identifier string are a set of search criteria used to filter a set of objects. See a recommended method of locating objects in PKCS #11.
In the following example, we search for an object of type certificate, with the object label as “Certificate for Digital Signature”, in slot 1. The pin-value attribute allows you to directly use the pin to log into the cryptographic device.
From the folder where you have installed the credential helper tool, use the following command. Because we only have one certificate on the device, we can limit the filter to the certificate type in our PKCS #11 URI.
If everything is configured correctly, the credential helper tool will return a JSON that contains the credentials, as follows. The PIN code will be requested if you haven’t specified it in the command.
Please enter your user PIN:
{
"Version":1,
"AccessKeyId": <String>,
"SecretAccessKey": <String>,
"SessionToken": <String>,
"Expiration": <Timestamp>
}
To use temporary security credentials with AWS SDKs and the AWS CLI, you can configure the credential helper tool as a credential process. For more information, see Source credentials with an external process. The following example shows a config file (usually in ~/.aws/config) that sets the helper tool as the credential process.
You can provide the PIN as part of the credential command with the option pin-value=<PIN> so that the user input is not required.
If you prefer not to store your PIN in the config file, you can remove the attribute pin-value. In that case, you will be prompted to enter the PIN for every CLI command.
You can use the serve and update commands of the credential helper mentioned in the solution overview to manage credential rotation for unattended workloads. After the successful use of the PIN, the credential helper will store it in memory for the duration of the process and not ask for it anymore.
Auditability and fine-grained access
You can audit the activity of servers that are assuming roles through IAM Roles Anywhere. IAM Roles Anywhere is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, role, or an AWS service in IAM Roles Anywhere.
For Lookup attributes, filter by Event source and enter rolesanywhere.amazonaws.com in the textbox. You will find all the API calls that relate to IAM Roles Anywhere, including the CreateSession API call that returns temporary security credentials for workloads that have been authenticated with IAM Roles Anywhere to access AWS resources.
Figure 4: CloudTrail Events filtered on the “IAM Roles Anywhere” event source
When you review the CreateSession event record details, you can find the assumed role ID in the form of <PrincipalID>:<serverCertificateSerial>, as in the following example:
Figure 5: Details of the CreateSession event in the CloudTrail console showing which role is being assumed
If you want to identify API calls made by a server, for Lookup attributes, filter by User name, and enter the serverCertificateSerial value from the previous step in the textbox.
Figure 6: CloudTrail console events filtered by the username associated to our certificate on the security module
The API calls to AWS services made with the temporary credentials acquired through IAM Roles Anywhere will contain the identity of the server that made the call in the SourceIdentity field. For example, the EC2 DescribeInstances API call provides the following details:
Figure 7: The event record in the CloudTrail console for the EC2 describe instances call, with details on the assumed role and certificate CN.
Additionally, you can include conditions in the identity policy for the IAM role to apply fine-grained access control. This will allow you to apply a fine-grained access control filter to specify which server in the group of servers can perform the action.
To apply access control per server within the same IAM Roles Anywhere profile
In the IAM Roles Anywhere console, select the profile used by the group of servers, then select one of the roles that is being assumed.
Apply the following policy, which will allow only the server with CN=server1-demo to list all buckets by using the condition on aws:SourceIdentity.
In this blog post, I’ve demonstrated how you can use the YubiKey 5 Series (or any PKCS #11 cryptographic module) to securely store the private keys for the X.509 certificates used with IAM Roles Anywhere. I’ve also highlighted how you can use AWS CloudTrail to audit API actions performed by the roles assumed by the servers.
Amazon Simple Email Service (Amazon SES) is a bulk and transactional email sending service for businesses and developers. To send emails from a particular email address through SES, users have to verify ownership of the email address, the domain used by the email address, or a parent domain of the domain used by the email address. This is referred to as an identity and is treated as a user-owned resource by SES.
For example, to send an email from [email protected], the user must verify ownership of the email address [email protected], the subdomain mail.example.com, or the domain example.com. Only identity owners are allowed to send emails from email addresses covered by their identities.
Why use the sending authorization feature in email?
This post will show you how you can grant another account or user to send emails from the identity that you own . By using sending authorization , you can authorize other users to send emails from the identities that you own using their Amazon SES accounts . In this blog post I’d like to walk you through how to setup sending authorization and addressing common concerns regarding the same.
With sending authorization, you can verify the identity under a single account and then grant the other accounts/users permission to send emails from that verified identity.
Let’s look at the below use case :
For example, if you’re a business owner who has collaborated with a email marketing company to send emails from your domain but you would like that only the domain you own should be verified in your account whereas , the email sending, and the monitoring of those emails ( bounce/complaint/delivery notifications for the emails) should be taken care by the email marketing company itself.
With sending authorization, the business owner can verify the identity in their SES account and provide the necessary permissions to the user of the email marketing company in order to send emails using their domain .
Before we proceed further , there are two important terms shared below which you should know that are used throughout the blog:
Delegate Sender : The user that will be using the verified identity from another account to send email.
Identity Owner : The account where the identity is verified . A policy is attached to an identity to specify who may send for that identity and under which conditions. You can refer the SES developer guide to know more
Overview of solution
If you want to enable a delegate sender to send on your behalf, you create a sending authorization policy and associate the policy to your identity by using the Amazon SES console or the Amazon SES API.
When the delegate sender attempts to send an email through Amazon SES on your behalf, the delegate sender passes the ARN of your identity in the request or in the header of the email as you can see from the Figure 1 shared below. Figure 1 shows the architecture of the sending authorization process.
3. When Amazon SES receives the request to send the email, it checks your identity’s policy (if present) to determine if you have authorized the delegate sender to send on the identity’s behalf. If the delegate sender is authorized, Amazon SES accepts the email; otherwise, Amazon SES returns an error message. The error message is similar to error message :“ AccessDenied: User is not authorized to perform ses sendemail”
Walkthrough
In this section, you’ll learn the steps needed to setup email sending authorization:
Create a IAM user in Delegate Sender Account with the necessary email sending permissions.You can read more about the necessary email sending permission in our developer guide
Verify Identity in Identity Owner Account which will be used by the Delegate Sender account later to send email.
Set up Identity policy to authorize the Delegate Sender Account to send emails using an email address or domain (an identity) owned by Identity Owner Account. The below steps illustrates how you can setup the identity policy .
In order to add the identity policy , go to the Verified-identities screen of the SES console, select the verified identity you wish to authorize for the delegate sender to send on your behalf.
Choose the verified identity’s Authorization tab. Please refer the below screenshot for reference :
You can use both policy generator or create a custom policy .
In the Authorization policies pane, if you wish to use the policy generator to create the policy then you can select Use policy generator from the drop-down. You can create the sending authorization policy depending on your use case . The below screenshot demonstrates the policy generator view :
You can also create the policy using the option “create custom policy ” . Please see the below screenshot for reference for a sample policy :
Add the identity policy to the verified identity in Identity owner account . Check the sample policy below for reference :
{ “Version”: “2008-10-17”, “Statement”: [ { “Sid”: “stmt1532578375047”, “Effect”: “Allow”, “Principal”: { “AWS”: “<write ARN of user belonging to Delegate sender account>” }, “Action”: [ “ses:SendEmail”, “ses:SendRawEmail” ], “Resource”: “<write ARN of the identity verified in Identity owner Account >” } ] }
Note: Please make sure to write the ARN’s for the Principal and the Resource in the above given sample policy.
3.Click on Apply policy after you have reviewed the authorization policy.
You can use the policy generator to create a sending authorization policy or use Amazon SES API or console to create a custom policy . This policy can also restrict usage based on different conditions . A condition is any restriction about the permission in the statement. A key is the specific characteristic that’s the basis for access restriction .
4. Send email from Account B using the source ARN of the identity of Account A . Here we will be sending emails using the send-email api command using AWS CLI . When you send an email using the Amazon SES API, you specify the content of the message, and Amazon SES assembles a MIME email for you.
This blogpost assumes that you have installed and configured AWS CLI on your terminal. For more information on Installing or updating the latest version of the AWS CLI, refer this link.
aws ses send-email –source-arn “arn:aws:ses:us-east-1:XXXXXXXXX:identity/example.com” –from [email protected] –to [email protected] –text “This is for those who cannot read HTML.” –html “<h1>Hello World</h1><p>This is a pretty mail with HTML formatting</p>” –subject “Hello World”
Replace the From address , To address and source ARN (identity ARN from identity owner account) in the above command.
Once the email request is sent to SES , SES will acknowledge it with a Message ID. This Message ID is a string of characters that uniquely identifies the request and looks something like this: “000001271b15238a-fd3ae762-2563-11df-8cd4-6d4e828a9ae8-000000” .
If you are using SMTP interface for delegate sending, you have to add the authorisation policy in the SMTP user and include the X-SES-SOURCE-ARN, X-SES-FROM-ARN, and X-SES-RETURN-PATH-ARN headers in your message. Pass these headers after you issue the DATA command in the SMTP conversation.
Notifications in case of email sending authorization
If you authorize a delegate sender to send email on your behalf, Amazon SES counts all bounces or complaints that those emails generate toward the delegate sender’s bounce and complaint limits, rather than the identity owner. However, if your IP address appears on third-party anti-spam, DNS-based Blackhole Lists (DNSBLs) as a result of messages sent by a delegate sender, the reputation of your identities may be damaged. For this reason, if you’re an identity owner, you should set up email feedback forwarding for all your identities, including those that you’ve authorized for delegate sending.
For setting up notifications for Identity owner , refer the steps mentioned in the SES developer guide
Delegate senders can and should set up their own bounce and complaint notifications for the identities that you have authorized them to use. They can set up event publishing to to publish bounce and complaint events to an Amazon SNS topic or a Kinesis Data Firehose stream.
Note : If neither the identity owner nor the delegate sender sets up a method of sending notifications for bounce and complaint events, or if the sender doesn’t apply the configuration set that uses the event publishing rule, then Amazon SES automatically sends event notifications by email to the address in the Return-Path field of the email (or the address in the Source field, if you didn’t specify a Return-Path address), even if you disabled email feedback forwarding
Cleaning up resources:
To remove the resources created by this solution:
You can delete the verified identities from Idenitity owner account if you no longer wish to send emails from that verified identity. You can check the SES developer guide for steps for deleting the verified identity .
Frequently Asked Questions
Q.1 If my delegate sender account is in sandbox, can I send emails from the delegate sender account to non-verified addresses ?
Sanbox Restriction : If delegate sender account is in sandbox mode then you need to submit a limit increase case to move the Delegate sender account out of Sandbox mode to “get rid of the Sandbox limitations“. The AWS account of the delegate sender has to be removed from the sandbox before it can be used to send email to non-verified addresses.
If delegate sender account is in sandbox mode, you will face the following error while email sending to unverified identities :
An error occurred (MessageRejected) when calling the SendEmail operation: Email address is not verified. The following identities failed the check in region US-EAST-1 [email protected]
However , you can sent email to verified identities successfully from the delegate sender account in case of sandbox access .
Q2. Is it necessary to have production access in identity owner account ? It is not necessary to have the Identity owner account to have production access for using Sending authorization.
Q.3 Will the delegate sender account or the identity owner get charged for the emails sent using sending authorization ?
Billing : Emails sent from the delegate sender account are billed to delegate sender account .
Reputation and sending quota : Cross-account emails count against the delegate’s sending limits, so the delegate is responsible for applying for any sending limit increases they might need. Similarly, delegated emails get charged to the delegate’s account, and any bounces and complaints count against the delegate’s reputation.
Region : The delegate sender must send the emails from the AWS Region in which the identity owner’s identity is verified.
Conclusion:
By using Sending Authorization, identity owners will be able to grant delegate senders the permission to send emails through their own verified identities in SES. With the sending authorization feature, you will have complete control over your identities so that you can change or revoke permissions at any time.
The authorization team at Netflix recently sponsored work to add Attribute Based Access Control (ABAC) support to AuthZed’s open source Google Zanzibar inspired authorization system, SpiceDB. Netflix required attribute support in SpiceDB to support core Netflix application identity constructs. This post discusses why Netflix wanted ABAC support in SpiceDB, how Netflix collaborated with AuthZed, the end result–SpiceDB Caveats, and how Netflix may leverage this new feature.
Netflix is always looking for security, ergonomic, or efficiency improvements, and this extends to authorization tools. Google Zanzibar is exciting to Netflix as it makes it easier to produce authorization decision objects and reverse indexes for resources a principal can access.
Last year, while experimenting with Zanzibar approaches to authorization, Netflix found SpiceDB, the open source Google Zanzibar inspired permission system, and built a prototype to experiment with modeling. The prototype uncovered trade-offs required to implement Attribute Based Access Control in SpiceDB, which made it poorly suited to Netflix’s core requirements for application identities.
Why did Netflix Want Caveated Relationships?
Netflix application identities are fundamentally attribute based: e.g. an instance of the Data Processor runs in eu-west-1 in the test environment with a public shard.
Authorizing these identities is done not only by application name, but by specifying specific attributes on which to match. An application owner might want to craft a policy like “Application members of the EU data processors group can access a PI decryption key”. This is one normal relationship in SpiceDB. But, they might also want to specify a policy for compliance reasons that only allows access to the PI key from data processor instances running in the EU within a sensitive shard. Put another way, an identity should only be considered to have the “is member of the EU-data-processors group” if certain identity attributes (like region==eu) match in addition to the application name. This is a Caveated SpiceDB relationship.
Netflix Modeling Challenges Before Caveats
SpiceDB, being a Relationship Based Access Control (ReBAC) system, expected authorization checks to be performed against the existence of a specific relationship between objects. Users fit this model — they have a single user ID to describe who they are. As described above, Netflix applications do not fit this model. Their attributes are used to scope permissions to varying degrees.
Netflix ran into significant difficulties in trying to fit their existing policy model into relations. To do so Netflix’s design required:
An event based mechanism that could ingest information about application autoscaling groups. An autoscaling group isn’t the lowest level of granularity, but it’s relatively close to the lowest level where we’d typically see authorization policy applied.
Ingest the attributes describing the autoscaling group and write them as separate relations. That is for the data-processor, Netflix would need to write relations describing the region, environment, account, application name, etc.
At authZ check time, provide the attributes for the identity to check, e.g. “can app bar in us-west-2 access this document.” SpiceDB is then responsible for figuring out which relations map back to the autoscaling group, e.g. name, environment, region, etc.
A cleanup process to prune stale relationships from the database.
What was problematic about this design? Aside from being complicated, there were a few specific things that made Netflix uncomfortable. The most salient being that it wasn’t resilient to an absence of relationship data, e.g. if a new autoscaling group started and reporting its presence to SpiceDB had not yet happened, the autoscaling group members would be missing necessary permissions to run. All this meant that Netflix would have to write and prune the relationship state with significant freshness requirements. This would be a significant departure from its existing policy based system.
While working through this, Netflix hopped into the SpiceDB Discord to chat about possible solutions and found an open community issue: the caveated relationships proposal.
The Beginning of SpiceDB Caveats
The SpiceDB community had already explored integrating SpiceDB with Open Policy Agent (OPA) and concluded it strayed too far from Zanzibar’s core promise of global horizontal scalability with strong consistency. With Netflix’s support, the AuthZed team pondered a Zanzibar-native approach to Attribute-Based Access Control.
The requirements were captured and published as the caveated relationships proposal on GitHub for feedback from the SpiceDB community. The community’s excitement and interest became apparent through comments, reactions, and conversations on the SpiceDB Discord server. Clearly, Netflix wasn’t the only one facing challenges when reconciling SpiceDB with policy-based approaches, so Netflix decided to help! By sponsoring the project, Netflix was able to help AuthZed prioritize engineering effort and accelerate adding Caveats to SpiceDB.
Building SpiceDB Caveats
Quick Intro to SpiceDB
The SpiceDB Schema Language lays the rules for how to build, traverse, and interpret SpiceDB’s Relationship Graph to make authorization decisions. SpiceDB Relationships, e.g., document:readme writer user:emilia, are stored as relationships that represent a graph within a datastore like CockroachDB or PostgreSQL. SpiceDB walks the graph and decomposes it into subproblems. These subproblems are assigned through consistent hashing and dispatched to a node in a cluster running SpiceDB. Over time, each node caches a subset of subproblems to support a distributed cache, reduce the datastore load, and achieve SpiceDB’s horizontal scalability.
SpiceDB Caveats Design
The fundamental challenge with policies is that their input arguments can change the authorization result as understood by a centralized relationships datastore. If SpiceDB were to cache subproblems that have been “tainted” with policy variables, the likelihood those are reused for other requests would decrease and thus severely affect the cache hit rate. As you’d suspect, this would jeopardize one of the pillars of the system: its ability to scale.
Once you accept that adding input arguments to the distributed cache isn’t efficient, you naturally gravitate toward the first question: what if you keep those inputs out of the cached subproblems? They are only known at request-time, so let’s add them as a variable in the subproblem! The cost of propagating those variables, assembling them, and executing the logic pales compared to fetching relationships from the datastore.
The next question was: how do you integrate the policy decisions into the relationships graph? The SpiceDB Schema Languages’ core concepts are Relations and Permissions; these are how a developer defines the shape of their relationships and how to traverse them. Naturally, being a graph, it’s fitting to add policy logic at the edges or the nodes. That leaves at least two obvious options: policy at the Relation level, or policy at the Permission level.
After iterating on both options to get a feel for the ergonomics and expressiveness the choice was policy at the relation level. After all, SpiceDB is a Relationship Based Access Control (ReBAC) system. Policy at the relation level allows you to parameterize each relationship, which brought about the saying “this relationship exists, but with a Caveat!.” With this approach, SpiceDB could do request-time relationship vetoing like so:
definition human {}
caveat the_answer(received int) { received == 42 } definition the_answer_to_life_the_universe_and_everything { relation humans: human with the_answer permission enlightenment = humans
Netflix and AuthZed discussed the concept of static versus dynamic Caveats as well. A developer would define static Caveat expressions in the SpiceDB Schema, while dynamic Caveats would have expressions defined at run time. The discussion centered around typed versus dynamic programming languages, but given SpiceDB’s Schema Language was designed for type safety, it seemed coherent with the overall design to continue with static Caveats. To support runtime-provided policies, the choice was to introduce expressions as arguments to a Caveat. Keeping the SpiceDB Schema easy to understand was a key driver for this decision.
For defining Caveats, the main requirement was to provide an expression language with first-class support for partially-evaluated expressions. Google’s CEL seemed like the obvious choice: a protobuf-native expression language that evaluates in linear time, with first-class support for partial results that can be run at the edge, and is not turing complete. CEL expressions are type-safe, so they wouldn’t cause as many errors at runtime and can be stored in the datastore as a compiled protobuf. Given the near-perfect requirement match, it does make you wonder what Google’s Zanzibar has been up to since the white paper!
To execute the logic, SpiceDB would have to return a third response CAVEATED, in addition to ALLOW and DENY, to signal that a result of a CheckPermission request depends on computing an unresolved chain of CEL expressions.
SpiceDB Caveats needed to allow static input variables to be stored before evaluation to represent the multi-dimensional nature of Netflix application identities. Today, this is called “Caveat context,” defined by the values written in a SpiceDB Schema alongside a Relation and those provided by the client. Think of build time variables as an expansion of a templated CEL expression, and those take precedence over request-time arguments. Here is an example:
caveat the_answer(received int, expected int) { received == expected }
Lastly, to deal with scenarios where there are multiple Caveated subproblems, the decision was to collect up a final CEL expression tree before evaluating it. The result of the final evaluation can be ALLOW, DENY, or CAVEATED. Things get trickier with wildcards and SpiceDB APIs, but let’s save that for another post! If the response is CAVEATED, the client receives a list of missing variables needed to properly evaluate the expression.
To sum up! The primary design decisions were:
Caveats defined at the Relation-level, not the Permission-level
Keep Caveats in line with SpiceDB Schema’s type-safe nature
Support well-typed values provided by the caller
Use Google’s CEL to define Caveat expressions
Introduce a new result type: CAVEATED
How do SpiceDB Caveats Change Authorizing Netflix Identities?
SpiceDB Caveats simplify this approach by allowing Netflix to specify authorization policy as they have in the past for applications. Instead of needing to have the entire state of the authorization world persisted as relations, the system can have relations and attributes of the identity used at authorization check time.
Now Netflix can write a Caveat similar to match_fine , described below, that takes lists of expected attributes, e.g. region, account, etc. This Caveat would allow the specific application named by the relation as long as the context of the authorization check had an observed account, stack, detail, region, and extended attribute values that matched the values in their expected counterparts. This playground has a live version of the schema, relations, etc. with which to experiment.
With the playground we can also make assertions that can mirror the behavior we’d see from the CheckPermission API. These assertions make it clear that our caveats work as expected.
Netflix and AuthZed are both excited about the collaboration’s outcome. Netflix has another authorization tool it can employ and SpiceDB users have another option with which to perform rich authorization checks. Bridging the gap between policy based authorization and ReBAC is a powerful paradigm that is already benefiting companies looking to Zanzibar based implementations for modernizing their authorization stack.
In this post, we continue with our recommendations for using AWS Identity and Access Management (IAM) APIs. In part 1 of this two-part series, we described how you could create IAM resources and use them soon after for authorization decisions. We also described options for monitoring and responding to IAM resource changes for entire accounts. Now, in part 2 of this post, we’ll cover the API throttling behavior of IAM and AWS Security Token Service (AWS STS) and how you can effectively plan your usage of these APIs. We’ll also cover the features of IAM that enable you to right-size the permissions granted to principals in your organization and assess external access to your resources.
Increase your usage of IAM APIs
If you’re a developer or a security engineer, you might find yourself writing more and more automation that interacts with IAM APIs. Other engineers, teams, or applications might also call the IAM APIs within the same account or cross-account. Over time, anyone calling the APIs in your account incrementally increases the number of requests per second. If so, IAM might send a “Rate exceeded” error that indicates you have exceeded a certain threshold of API calls per second. This is called API throttling.
Understand IAM API throttling
API throttling occurs when you exceed the call rate limits for an API. AWS uses API throttling to limit requests to a service. Like many AWS services, IAM limits API requests to maintain the performance of the service, and to ensure fair usage across customers. IAM and AWS STS independently implement a token bucket algorithm for throttling, in which a bucket of virtual tokens is refilled every second. Each token represents a non-throttled API call that you can make. The number of tokens that a bucket holds and the refill rate depends on the API. For each IAM API, a number of token buckets might apply.
We refer to this simply as rate-limiting criteria. Essentially, there are several rate-limiting criteria that are considered when evaluating whether a customer is generating more traffic than the service allows. The following are some examples of these criteria:
The account where the API is called
The account for read or write APIs (depending on whether the API is a read or write operation)
The account from which AssumeRole was called prior to the API call (for example, third-party cross-account calls)
The account from which AssumeRole was called prior to the API call for read APIs
The API and organization where the API is called
Understand STS API throttling
Although IAM has criteria pertaining to the account from which AssumeRole was called, IAM has its own API rate limits that are distinct from AWS STS. Therefore, the preceding criteria are IAM-specific and are separate from the throttling that can occur if you call STS APIs. IAM is also a global service, and the limits are not Region-aware. In contrast, while STS has a single global endpoint, every Region has its own STS endpoint with its own limits.
The STS rate-limiting criteria pertain to each account and endpoint for API calls. For example, if you call the AssumeRole API against the sts.ap-northeast-1.amazonaws.com endpoint, STS will evaluate the rate-limiting criteria associated with that account and the ap-northeast-1 endpoint. Other STS API requests that you perform under the same account and endpoint will also count towards these criteria. However, if you make a request from the same account to a different regional endpoint or the global endpoint, that request will count against different criteria.
Note: AWS recommends that you use the STS regional endpoints instead of the STS global endpoint. Regional endpoints have several benefits, including redundancy and reduced latency. To learn more about other benefits, see Managing AWS STS in an AWS Region.
How multiple criteria affect throttling
The preceding examples show the different ways that IAM and STS can independently limit requests. Limits might be applied at the account level and across read or write APIs. More than one rate-limiting criterion is typically associated with an API call, with each request counted against each rate-limiting criterion independently. This means that if the requests-per-second exceeds the applicable criteria, then API throttling occurs and returns a rate-limiting error.
How to address IAM and STS API throttling
In this section, we’ll walk you through some strategies to reduce IAM and STS API throttling.
Query for top callers
With AWS CloudTrail Lake, your organization can aggregate, store, and query events recorded by CloudTrail for auditing, security investigation, and operational troubleshooting. To monitor API throttling, you can run a simple query that identifies the top callers of IAM and STS.
For example, you can make a SQL-based query in the CloudTrail console to identify the top callers of IAM, as shown in Figure 1. This query includes the API count, API event name, and more that are made to IAM (shown under eventSource). In this example, the top result is a call to GetServiceLastAccessedDetails, which occurred 163 times. The result includes the account ID and principal ID that made those requests.
To reduce call throttling, you need to know when you exceed a rate limit. You can identify when you are being throttled by catching the RateLimitExceeded exception in your API calls. Or, you can send your application logs to Amazon CloudWatch Logs and then configure a metric filter to record each time that throttling occurs, for later analysis or notification. Ideally, you should do this across your applications, and log this information centrally so that you can investigate whether calls from a specific account (such as your central monitoring account) are affecting API availability across your other accounts by exceeding a rate-limiting criterion in those accounts.
Call your APIs with a less aggressive retry strategy
In the AWS SDKs, you can use the existing retry library and provide a custom base for the initial sleep done between API calls. For example, you can set a custom configuration for the backoff or edit the defaults directly. The default SDK_DEFAULT_THROTTLED_BASE_DELAY is 500 milliseconds (ms) in the relevant Java SDK file, but if you’re experiencing throttling consistently, we recommend a minimum 1000 ms for the throttled base delay. You can change this value or implement a custom configuration through the PredefinedBackoffStrategies.SDKDefaultBackoffStrategy() class, which is referenced in the same file. As another example, in the Javascript SDK, you can edit the base retry of the retryDelayOptions configuration in the AWS.Config class, as described in the documentation.
The difference between making these changes and using the SDK defaults is that the custom base provides a less aggressive retry. You shouldn’t retry multiple requests that are throttled during the same one-second window. If the API has other applicable rate-limiting criteria, you can potentially exceed those limits as well, preventing other calls in your account from performing requests. Lastly, be careful that you don’t implement your own retry or backoff logic on top of the SDK retry or backoff logic because this could make throttling worse — instead, you should override the SDK defaults.
Reduce the number of requests by using max items
For some APIs, you can increase the number of items returned by a single API call. Consider the example of the GetServiceLastAccessedDetails API. This API returns a lot of data, but the results are truncated by default to 100 items, ordered alphabetically by the service namespace. If the number of items returned is greater than 100, then the results are paginated, and you need to make multiple requests to retrieve the paginated results individually. But if you increase the value of the MaxItems parameter, you can decrease the number of requests that you need to make to obtain paginated results.
AWS has hundreds of services, so you should set the value of the MaxItems parameter no higher than your application can handle (the response size could be large). At the time of our testing, the results were no longer truncated when this value was 300. For this particular API, IAM might return fewer results, even when more results are available. This means that your code still needs to check whether the results are paginated and make an additional request if paginated results are available.
Consistent use of the MaxItems parameter across AWS APIs can help reduce your total number of API requests. The MaxItems parameter is also available through the GetOrganizationsAccessReport operation, which defaults to 100 items but offers a maximum of 1000 items, with the same caveat that fewer results might be returned, so check for paginated results.
Smooth your high burst traffic
In the table from part 1 of this post, we stated that you should evaluate IAM resources every 24 hours. However, if you use a simple script to perform this check, you could initiate a throttling event. Consider the following fictional example:
As a member of ExampleCorp’s Security team, you are working on a task to evaluate the company’s IAM resources through some custom evaluation scripts. The scripts run in a central security account. ExampleCorp has 1000 accounts. You write automation that assumes a role in every account to run the GetAccountAuthorizationDetails API call. Everything works fine during development on a few accounts, but you later build a dashboard to graph the data. To get the results faster for the dashboard, you update your code to run concurrently every hour. But after this change, you notice that many requests result in the throttling error “Rate exceeded.” Other security teams see “Rate exceeded” errors in their application logs, too.
Can you guess what happened? When you tried to make the requests faster, you used concurrency to make the requests run in parallel. By initiating this large number of requests simultaneously, you exceeded the rate-limiting criteria for the security account from which the sts:AssumeRole action was called prior to the GetAccountAuthorizationDetails API call.
To address scenarios like this, we recommend that you set your own client-side limitations when you need to make a large number of API requests. You can spread these calls out so that they happen sequentially and avoid large spikes. For example, if you run checks every 24 hours, make sure that the calls don’t happen at exactly midnight. Figure 2 shows two different ways to distribute API volume over time:
Figure 2: Call volume that periodically spikes compared to evenly-distributed call volume
The graph on the left represents a large, recurring API call volume, with calls occurring at roughly the same time each day—such as 1000 requests at midnight every 24 hours. However, if you intend to make these 1000 requests consistently every 24 hours, you can spread them out over the 24-hour period. This means that you could make about 41 requests every hour, so that 41 accounts are queried at 00:00 UTC, another 41 the next hour, and so on. By using this strategy, you can make these requests blend into your other traffic, as shown in the graph on the right, rather than create large spikes. In summary, your automation scheduler should avoid large spikes and distribute API requests evenly over the 24-hour period. Using queues such as those provided by Amazon Simple Queue Service (Amazon SQS) can also help, and when errors are identified, you can put them in a dead letter queue to try again later.
Some IAM APIs have rate-limiting criteria for API requests made from the account from which the AssumeRole was called prior to the call. We recommend that you serially iterate over the accounts in your organization to avoid throttling. To continue with our example, you should iterate the 41 accounts one-by-one each hour, rather than running 41 calls at once, to reduce spikes in your request rates.
Recommendations specific to STS
You can adjust how you use AWS STS to reduce your number of API calls. When you write code that calls the AssumeRole API, you can reuse the returned credentials for future requests because the credentials might still be valid. Imagine that you have an event-driven application running in a central account that assumes a role in a target account and does an API call for each event that occurs in that account. You should consider reusing the credentials returned by the AssumeRole call for each subsequent call in the target account, especially if calls in the central accounts are being throttled. You can do this for AssumeRole calls because there is no service-side limit to the number of credentials that you can create and use. Whether it’s one credential or many, you need to use and store these carefully. You can also adjust the role session duration, which determines how long the role’s credentials are valid. This value can be up to 12 hours, depending on the maximum session duration configured on the role. If you reuse short-term credentials or adjust the session duration, make sure that you evaluate these changes from a security perspective as you optimize your use of STS to reduce API call volume.
Use case #3: Pare down permissions for least-privileged access
Let’s assume that you want to evaluate your organization’s IAM resources with some custom evaluation scripts. AWS has native functionality that can reduce your need for a custom solution. Let’s take a look at some of these that can help you accomplish these goals.
Identify unintended external sharing
To identify whether resources in your accounts, such as IAM roles and S3 buckets, have been shared with external entities, you can use IAM Access Analyzer instead of writing your own checks. With IAM Access Analyzer, you can identify whether resources are accessible outside your account or even your entire organization. Not only can you identify these resources on-demand, but IAM Access Analyzer proactively re-analyzes resources when their policies change, and reports new findings. This can help you feel confident that you will be notified of new external sharing of supported resources, so that you can act quickly to investigate. For more details, see the IAM Access Analyzer user guide.
Right-size permissions
You can also use IAM Access Analyzer to help right-size the permissions policies for key roles in your accounts. IAM Access Analyzer has a policy generation feature that allows you to generate a policy by analyzing your CloudTrail logs to identify actions used from over 140 services. You can compare this generated policy with the existing policy to see if permissions are unused, and if so, remove them.
You can perform policy generation through the API or the IAM console. For example, you can use the console to navigate to the role that you want to analyze, and then choose Generate policy to start analyzing the actions used over a specified period. Actions that are missing from the generated policy are permissions that can be potentially removed from the existing policy, after you confirm your changes with those who administer the IAM role. To learn more about generating policies based on CloudTrail activity, see IAM Access Analyzer makes it easier to implement least privilege permissions by generating IAM policies based on access activity.
Conclusion
In this two-part series, you learned more about how to use IAM so that you can test and query IAM more efficiently. In this post, you learned about the rate-limiting criteria for IAM and STS, to help you address API throttling when increasing your usage of these services. You also learned how IAM Access Analyzer helps you identify unintended resource sharing while also generating policies that serve as a baseline for principals in your account. In part 1, you learned how to quickly create IAM resources and use them when refining permissions. You also learned how to get information about IAM resources and respond to IAM changes through the various services integrated with IAM. Lastly, when calling IAM directly, you learned about bulk APIs, which help you efficiently retrieve the state of your principals and policies. We hope these posts give you valuable insights about IAM to help you better monitor, review, and secure access to your AWS cloud environment!
In this two-part blog post, we’ll provide recommendations for using AWS Identity and Access Management (IAM) APIs, and we’ll share useful details on how IAM works so that you can use it more effectively. For example, you might be creating new IAM resources such as roles and policies through automation and notice a delay for resource propagations. Or you might be building a custom cloud security monitoring solution that uses IAM APIs to evaluate the security and compliance of your AWS accounts, and you want to know how to do that without exceeding limits. Although these are just a few example use cases, the insights described in this post are intended to help you avoid anti-patterns when building scalable cloud services that use IAM APIs.
In this post, we describe how to create IAM resources and use them soon after for authorization decisions. We also describe options for monitoring and responding to IAM resource changes for entire accounts. In part 2, we’ll cover the API throttling behavior of IAM and AWS Security Token Service (AWS STS) and how you can effectively plan your usage of these APIs. Let’s dive in!
Use case 1: Create IAM resources and attempt to use them immediately
If you’re a cloud developer, you create and use IAM resources when you develop applications on AWS. For your application to interact with AWS services, you need to grant IAM permissions to your application. Your application—whether it runs on AWS Lambda, Amazon Elastic Compute Cloud (Amazon EC2), or another service—will need an associated IAM role and policy that provide the necessary permissions.
Imagine that you want to create least privilege policies for your application. You begin by deploying new or updated IAM resources, such as roles and policies, along with your application updates, and you automate this process to speed up testing and development.
During development, you begin removing unnecessary policy permissions, with your automation testing the updated permissions. However, you notice that some of your updates do not immediately take effect. The following sections address why this occurs and provide insights to help you architect for other scenarios.
Understand the IAM control plane and data plane
Let’s first learn more about the control plane and data plane in IAM. The control plane involves operations to create, read, update, and delete IAM resources, and it’s how you get the current state of IAM. When you invoke IAM APIs, you interact with the control plane. This includes any API that falls under the iam:* namespace. The data plane, in contrast, consists of the authorization system that is used at scale to grant access to the broader set of AWS services and resources. This includes the AWS STS APIs, which have their own sts:* namespace.
When you call the IAM control plane APIs to create, update, or delete resources, you can expect a read-after-write consistent response. This means that you can retrieve (read) the resource and its latest updates immediately after it’s written. In contrast, the IAM data plane, where authorizations occur, is eventually consistent. This means that there will be a delay for IAM resource changes, such as updates to roles and policies, to propagate and reflect in the authorizations that follow. The delay can be several seconds or longer. Because of this, you need to allow for propagation time when you test changes to IAM resources. To learn more about the control plane and data plane of IAM, see Resilience in AWS Identity and Access Management.
Note: Because calls to AWS APIs rely on IAM to check permissions, the availability and scalability of the data plane are paramount. In 2011, the “can the caller do this?” function handled a couple of thousand requests per second. Today, as new services continue to launch and the number of AWS customers increases, AWS Identity handles over half a billion API calls per second worldwide, and the number is growing. Eventually consistent design enables the IAM data plane to maintain the high availability and low latency needed to evaluate permissions on AWS.
This is why when architecting your application, we recommend that you don’t depend on control plane actions such as resource updates for critical parts of your application’s workflow. Instead, you should architect to take advantage of the data plane, which includes STS and the authorization system of IAM. In the next section, we describe how you can do this.
Test permissions with STS scope-down policies
IAM role sessions have a feature called a session policy, which takes effect immediately when a role is assumed. This is an optional policy that you can provide to scope down the role’s existing identity policies, with the permissions being the intersection of the role’s identity-based policies and the session policy. By using session policies, you get specific, scoped-down credentials from a single pre-existing role without having to create new roles or identity policies for each particular session’s use case. You can use session policies for your application or when you test which least privilege policies are best for your application.
Let’s walk through an example of when to use session policies for permissions testing. Imagine that you need permissions that require very specific, fine-grained conditions to attain your ideal least privilege policy. You might iterate on the policy several times, making updates and testing the changes over and over again. If you update a policy attached to a role, you need to wait for these changes to propagate to the IAM data plane. But if you instead specify a scope-down policy when assuming the pre-existing role prior to testing, you can immediately test and observe the effects of your permissions changes. Immediate testing is possible because your role and its original policy have already propagated to the data plane, enabling you to iterate over various scoped-down session policies that operate against the IAM data plane.
Use STS session policies to assume a role with the AWS CLI
There are two ways to provide a session policy during the AssumeRole process: you can provide an inline policy document or the Amazon Resource Names (ARNs) of managed session policies. The following example shows how to do this through the AWS Command Line Interface (AWS CLI), by passing in a policy document along with the AssumeRole call. If you use this example policy, make sure to replace <123456789012> and <DOC-EXAMPLE-BUCKET> with your own information.
In this example, we provide a previously created role ARN named s3-full-access, which provides full access to Amazon Simple Storage Service (Amazon S3). We can further restrict the role’s permissions by supplying a policy with the optional --policy option. The inline policy document only allows the GetObject request against the S3 bucket named <DOC-EXAMPLE-BUCKET>. The effective permissions for the returned session are the intersection of the role’s identity-based policies and our provided session policy. Therefore, the role session’s permissions are limited to only performing the GetObject request against the <DOC-EXAMPLE-BUCKET>.
Note: The combined size of the passed inline policy document and all passed managed policy ARN characters cannot exceed 2,048 characters. You can reduce the size of the JSON policy document by removing unnecessary whitespace and shortening or removing tags associated with your session.
Use case 2: Monitor and respond to IAM resources for entire accounts
You might need to periodically audit the state of your IAM resources, such as roles and policies, including whether these IAM resources have changed, in a single account or across your entire organization. For example, you might want to check whether roles have overly broad access to actions and resources. Or you might want to monitor IAM resource creation and updates to respond to security-relevant permission changes. In this section, you will learn how to choose the right tool for auditing and monitoring IAM resources across accounts. You will learn about the AWS services that support this use case, the benefits of polling compared to event-based architectures, and powerful APIs that aggregate common information.
Respond to configuration changes with an event-driven approach
Sometimes you might need to perform actions relatively quickly based on IAM changes. For example, you might need to check if a trust policy for a newly created or updated role allows cross-account access. In cases like this, you can use AWS Config rules, AWS CloudTrail, or Amazon EventBridge to detect state changes and perform actions based on these state changes. You can use AWS Config rules to evaluate whether a resource complies with the conditions that you specify. If it doesn’t comply, you can provide a workflow to remediate the non-compliance. With CloudTrail, you can monitor your account’s API calls, and log API calls for your accounts with AWS Organizations integration. EventBridge works closely with CloudTrail and helps you create rules that match incoming events and send them to targets, such as Lambda, where your code can perform analysis or automated remediation. You can even filter out events from your accounts and send them to a central account’s event bus for processing. For an example of how to use EventBridge with IAM Access Analyzer to remediate cross-account access in a role’s trust policy, see Automate resolution for IAM Access Analyzer cross-account access findings on IAM roles. Which feature you choose depends on whether you need to monitor one account or all accounts in your organization, as well as which solution you are more comfortable building with.
One caveat to an event-driven approach is that if many events occur over a short period and your application responds to each event with an IAM API call of its own, you could eventually be throttled by IAM. To address this, you can queue up your responding API calls, distribute them over a longer period, or aggregate them to reduce API call volume. For example, if some of your calls are write APIs (such as UpdateAssumeRolePolicy or CreatePolicyVersion) or read APIs (such as GetRole or GetRolePolicy), you can call them serially with a delay between calls. If you need the latest status on a large number of principals and policies, you can call IAM bulk APIs such as GetAccountAuthorizationDetails, which will return data to you for principals and policies and their relationships in your organization. This approach helps you avoid throttling and querying the IAM control plane with unnecessary and redundant API calls. You will learn more about throttling and how to address it in part two of this post.
Retrieve point-in-time resource information with AWS Config
AWS Config helps you assess, audit, and evaluate the configuration of your AWS resources. It also offers multi-account, multi-Region data aggregation and is integrated with AWS Organizations. With AWS Config, you can create rules that detect and respond to changes. AWS Config also keeps an inventory of AWS resource configurations that you can query through its API, so that you don’t need to make direct API calls to each resource’s service. AWS Config also offers the ability to return the status of resources from multiple accounts and AWS Regions. As shown in Figure 1, you can use the AWS Config console to run a simple SQL-like statement for details on the IAM roles in your entire organization.
Figure 1: Run a query on IAM roles in AWS Config
The preceding results also show associated resources, such as the inline and attached policies for the IAM roles. Alternatively, you can obtain these results from the SDK or CLI. The following query that uses the CLI is equivalent to the preceding query that uses the console. If you use this query, make sure to replace DOC-EXAMPLE-CONFIG-AGGREGATOR> with your AWS Config aggregator.
The preceding command returns the details of roles in your organization’s accounts, including the full policy document for the associated inline policy. It also returns the customer-managed policy names and their ARNs, for which you can view the policy documents and versions by using the BatchGetResourceConfig API. Note that AWS Config doesn’t provide the AWS-managed policy documents. However, these are common across accounts, and we will show you how to query that data later in this section.
To query the status of roles in your organization, you need to have AWS Config enabled in each account. You also need an aggregator to monitor your accounts with your organization’s management account or a delegated administrator account. For more details on how to set up AWS Config, see the AWS Config developer guide. After you set up AWS Config, you can periodically call the AWS Config APIs to get a snapshot of the current or prior state of your resources. Furthermore, you can periodically pull the snapshot records and evaluate this information in other tools outside of AWS Config. So before you directly use the IAM APIs to get IAM information, consider using AWS Config—this is what it’s for!
Retrieve IAM resource information directly from IAM
As previously noted, AWS Config can give you a bulk view of your AWS and IAM resources. Additionally, CloudTrail and EventBridge can detect AWS and IAM resource changes and help you act on them. If you need data from IAM beyond what these services offer, you can query the IAM APIs directly to get the latest information on your resources.
A few key APIs can help you audit IAM resources more efficiently, especially in bulk. The first is GetAccountAuthorizationDetails, which enables you to retrieve the principals in your account, their associated inline policy documents (if any), attached managed policies, and their relationships to each other. This API reduces the need to individually call ListRolePolicies and ListAttachedRolePolicies for each role in an account. GetAccountAuthorizationDetails also returns the role trust policy document for roles in the results. Finally, GetAccountAuthorizationDetails allows you to filter the result set. For example, if you don’t need information relating to groups or AWS managed policies, you can exclude these from the API response. You can do this by using the filter parameter to only include the details that you need at the time.
Another useful API is GenerateServiceLastAccessedDetails. This API gives you details about when an IAM resource (user, group, role, or policy) was last used in an attempt to access AWS services. You can use this API to identify roles that are unused and remove them if you don’t need them. IAM Access Analyzer, which you will learn about later in this post, also uses the same information.
The following table summarizes the key APIs that you can use, rather than building your own code that loops for this information individually.
Submit a job through the GenerateServiceLastAccessedDetails API which returns a JobId; then retrieve the results after the job completes. Pass ACTION_LEVEL as the required Granularity parameter.
Spread total number of requests evenly across 24 hours
Note: In the table, we suggest that you perform some of these API requests once every 24 hours as a starting point. You might prefer to perform your own analysis at a longer time interval, such as every 48 hours, but we don’t recommend requesting it more often than every 24 hours because these resources (and therefore the details in the responses) don’t change often. These APIs are suitable for periodic, point-in-time collection of information. If you need faster detection of information from GetAccountAuthorizationDetails, consider whether AWS Config rules or EventBridge will fit your needs. For GetServiceLastAccessedDetails, recent activity usually appears within four hours, so more frequent requests are unlikely to provide much value.
Use of these APIs can help you avoid writing code that loops through results to make individual read API calls for each principal, policy, and policy version in an account, which could result in tens of thousands of API requests and call throttling. Instead of iterating over each resource, you should use solutions that return bulk data, such as GetAccountAuthorizationDetails, AWS Config, or an AWS Partner Network solution. However, if you’re experiencing throttling, you will learn some practical considerations on how to handle that later in this post.
Inspect IAM resources across multiple accounts and organizations
Your use case might require that you inspect IAM resources across multiple accounts in your organization. Or perhaps you are an independent software vendor and need to build a software-as-a-service tool to evaluate IAM resources across many organizations. The following considerations can help you address use cases like these.
AWS Organizations integration
Previously, you learned of the benefits of the “service last accessed data” that the GenerateServiceLastAccessedDetails and GetServiceLastAccessedDetails APIs provide. But what if you want to pull this data for multiple accounts in your organization? IAM has bulk APIs that support querying this data across your entire organization, so you don’t need to assume a role in each account to generate the request. To generate a report for entities (organization root, organizational unit, or account) or policies in your organization, use the GenerateOrganizationsAccessReport operation, which returns a JobId that is passed as a parameter to the GetOrganizationsAccessReport operation to check if the report has been generated. When the job status is marked complete, you can retrieve the report.
AWS managed policies
Many customers use AWS managed policies because they align to common job functions. AWS creates and administers these policies, which have their own ARNs, such as arn:aws:iam::aws:policy/AWSCodeCommitPowerUser. AWS managed policies are available for every account, and they are the same for every account. AWS updates them when new services and API operations are introduced. Updated policies are recorded and visible as a new version, so you only need to query for the current AWS managed policies once per evaluation cycle, rather than once per account. Therefore, if you’re evaluating hundreds or thousands of accounts, you shouldn’t include the AWS managed policies and their policy versions in your query. Doing so would result in thousands of redundant API requests and could cause throttling. Instead, you can query the AWS managed policies once and then reuse the results across your analysis and evaluation by caching the results for a period of time (for example, every 24 hours) in your application before requesting them again to check for updates. Because AWS managed policies are available through the GetAccountAuthorizationDetails API, you don’t need to query for the AWS managed policies or their versions as a separate action.
Multi-account limits
The preceding table lists the frequency of API requests as “per account” in many places. If you’re calling IAM APIs by assuming a role in other accounts from a central account, some IAM APIs have rate-limiting criteria that apply to API requests performed from the assuming account (the central account). To query data from multiple accounts, we recommend that you serially iterate over the accounts one-by-one to avoid throttling. You’ll learn more about this strategy, as well as throttling, in part 2 of this blog post.
Conclusion
In this post, you learned about different aspects of IAM and best practices to test and query IAM efficiently. With STS session policies, you can test different policies to help achieve least privilege access. With AWS Config, EventBridge, CloudTrail, and CloudTrail Lake, you can audit your IAM resources and respond to changes while reducing the number of IAM API calls that you make. If you need to call IAM directly, you can use IAM bulk APIs for more efficient retrieval of your resource state. You can learn more about IAM and best practices in part 2 of blog post: How to monitor and query IAM resources at scale – Part 2.
AWS Identity and Access Management (IAM) has now made it easier for you to use IAM roles for your workloads that are running outside of AWS, with the release of IAM Roles Anywhere. This feature extends the capabilities of IAM roles to workloads outside of AWS. You can use IAM Roles Anywhere to provide a secure way for on-premises servers, containers, or applications to obtain temporary AWS credentials and remove the need for creating and managing long-term AWS credentials.
In this post, I will briefly discuss how IAM Roles Anywhere works. I’ll mention some of the common use cases for IAM Roles Anywhere. And finally, I’ll walk you through an example scenario to demonstrate how the implementation works.
Background
To enable your applications to access AWS services and resources, you need to provide the application with valid AWS credentials for making AWS API requests. For workloads running on AWS, you do this by associating an IAM role with Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), or AWS Lambda resources, depending on the compute platform hosting your application. This is secure and convenient, because you don’t have to distribute and manage AWS credentials for applications running on AWS. Instead, the IAM role supplies temporary credentials that applications can use when they make AWS API calls.
IAM Roles Anywhere enables you to use IAM roles for your applications outside of AWS to access AWS APIs securely, the same way that you use IAM roles for workloads on AWS. With IAM Roles Anywhere, you can deliver short-term credentials to your on-premises servers, containers, or other compute platforms. When you use IAM Roles Anywhere to vend short-term credentials you can remove the need for long-term AWS access keys and secrets, which can help improve security, and remove the operational overhead of managing and rotating the long-term credentials. You can also use IAM Roles Anywhere to provide a consistent experience for managing credentials across hybrid workloads.
In this post, I assume that you have a foundational knowledge of IAM, so I won’t go into the details here about IAM roles. For more information on IAM roles, see the IAM documentation.
How does IAM Roles Anywhere work?
IAM Roles Anywhere relies on public key infrastructure (PKI) to establish trust between your AWS account and certificate authority (CA) that issues certificates to your on-premises workloads. Your workloads outside of AWS use IAM Roles Anywhere to exchange X.509 certificates for temporary AWS credentials. The certificates are issued by a CA that you register as a trust anchor (root of trust) in IAM Roles Anywhere. The CA can be part of your existing PKI system, or can be a CA that you created with AWS Certificate Manager Private Certificate Authority (ACM PCA).
Your application makes an authentication request to IAM Roles Anywhere, sending along its public key (encoded in a certificate) and a signature signed by the corresponding private key. Your application also specifies the role to assume in the request. When IAM Roles Anywhere receives the request, it first validates the signature with the public key, then it validates that the certificate was issued by a trust anchor previously configured in the account. For more details, see the signature validation documentation.
After both validations succeed, your application is now authenticated and IAM Roles Anywhere will create a new role session for the role specified in the request by calling AWS Security Token Service (AWS STS). The effective permissions for this role session are the intersection of the target role’s identity-based policies and the session policies, if specified, in the profile you create in IAM Roles Anywhere. Like any other IAM role session, it is also subject to other policy types that you might have in place, such as permissions boundaries and service control policies (SCPs).
There are typically three main tasks, performed by different personas, that are involved in setting up and using IAM Roles Anywhere:
Initial configuration of IAM Roles Anywhere – This task involves creating a trust anchor, configuring the trust policy of the role that IAM Roles Anywhere is going to assume, and defining the role profile. These activities are performed by the AWS account administrator and can be limited by IAM policies.
Provisioning of certificates to workloads outside AWS – This task involves ensuring that the X.509 certificate, signed by the CA, is installed and available on the server, container, or application outside of AWS that needs to authenticate. This is performed in your on-premises environment by an infrastructure admin or provisioning actor, typically by using existing automation and configuration management tools.
Using IAM Roles Anywhere – This task involves configuring the credential provider chain to use the IAM Roles Anywhere credential helper tool to exchange the certificate for session credentials. This is typically performed by the developer of the application that interacts with AWS APIs.
I’ll go into the details of each task when I walk through the example scenario later in this post.
Common use cases for IAM Roles Anywhere
You can use IAM Roles Anywhere for any workload running in your data center, or in other cloud providers, that requires credentials to access AWS APIs. Here are some of the use cases we think will be interesting to customers based on the conversations and patterns we have seen:
Send security findings from on-premises sources to AWS Security Hub
Enable hybrid workloads to access AWS services over the course of phased migrations
Example scenario and walkthrough
To demonstrate how IAM Roles Anywhere works in action, let’s walk through a simple scenario where you want to call S3 APIs to upload some data from a server in your data center.
Prerequisites
Before you set up IAM Roles Anywhere, you need to have the following requirements in place:
The certificate bundle of your own CA, or an active ACM PCA CA in the same AWS Region as IAM Roles Anywhere
An end-entity certificate and associated private key available on the on-premises server
Administrator permissions for IAM roles and IAM Roles Anywhere
Setup
Here I demonstrate how to perform the setup process by using the IAM Roles Anywhere console. Alternatively, you can use the AWS API or Command Line Interface (CLI) to perform these actions. There are three main activities here:
Create a trust anchor
Create and configure a role that trusts IAM Roles Anywhere
Under Trust anchors, choose Create a trust anchor.
On the Create a trust anchor page, enter a name for your trust anchor and select the existing AWS Certificate Manager Private CA from the list. Alternatively, if you want to use your own external CA, choose External certificate bundle and provide the certificate bundle.
Figure 1: Create a trust anchor in IAM Roles Anywhere
To create and configure a role that trusts IAM Roles Anywhere
Using the AWS Command Line Interface (AWS CLI), you are going to create an IAM role with appropriate permissions that you want your on-premises server to assume after authenticating to IAM Roles Anywhere. Save the following trust policy as rolesanywhere-trust-policy.json on your computer.
Save the following identity-based policy as onpremsrv-permissions-policy.json. This grants the role permissions to write objects into the specified S3 bucket.
You can optionally use condition statements based on the attributes extracted from the X.509 certificate to further restrict the trust policy to control the on-premises resources that can obtain credentials from IAM Roles Anywhere. IAM Roles Anywhere sets the SourceIdentity value to the CN of the subject (onpremsrv01 in my example). It also sets individual session tags (PrincipalTag/) with the derived attributes from the certificate. So, you can use the principal tags in the Condition clause in the trust policy as additional authorization constraints.
For example, the Subject for the certificate I use in this post is as follows.
Subject: … O = Example Corp., OU = SecOps, CN = onpremsrv01
So, I can add condition statements like the following into the trust policy (rolesanywhere-trust-policy.json):
On the Create a profile page, enter a name for the profile.
For Roles, select the role that you created in the previous step (ExampleS3WriteRole).
5. Optionally, you can define session policies to further scope down the sessions delivered by IAM Roles Anywhere. This is particularly useful when you configure the profile with multiple roles and want to restrict permissions across all the roles. You can add the desired session polices as managed policies or inline policy. Here, for demonstration purpose, I add an inline policy to only allow requests coming from my specified IP address.
Figure 2: Create a profile in IAM Roles Anywhere
At this point, IAM Roles Anywhere setup is complete and you can start using it.
Use IAM Roles Anywhere
IAM Roles Anywhere provides a credential helper tool that can be used with the process credentials functionality that all current AWS SDKs support. This simplifies the signing process for the applications. See the IAM Roles Anywhere documentation to learn how to get the credential helper tool.
To test the functionality first, run the credential helper tool (aws_signing_helper) manually from the on-premises server, as follows.
Figure 3: Running the credential helper tool manually
You should successfully receive session credentials from IAM Roles Anywhere, similar to the example in Figure 3. Once you’ve confirmed that the setup works, update or create the ~/.aws/config file and add the signing helper as a credential_process. This will enable unattended access for the on-premises server. To learn more about the AWS CLI configuration file, see Configuration and credential file settings.
To verify that the config works as expected, call the aws sts get-caller-identity AWS CLI command and confirm that the assumed role is what you configured in IAM Roles Anywhere. You should also see that the role session name contains the Serial Number of the certificate that was used to authenticate (cc:c3:…:85:37 in this example). Finally, you should be able to copy a file to the S3 bucket, as shown in Figure 4.
Figure 4: Verify the assumed role
Audit
As with other AWS services, AWS CloudTrail captures API calls for IAM Roles Anywhere. Let’s look at the corresponding CloudTrail log entries for the activities we performed earlier.
The first log entry I’m interested in is CreateSession, when the on-premises server called IAM Roles Anywhere through the credential helper tool and received session credentials back.
You can see that the cert, along with other parameters, is sent to IAM Roles Anywhere and a role session along with temporary credentials is sent back to the server.
The next log entry we want to look at is the one for the s3:PutObject call we made from our on-premises server.
In addition to the CloudTrail logs, there are several metrics and events available for you to use for monitoring purposes. To learn more, see Monitoring IAM Roles Anywhere.
Additional notes
You can disable the trust anchor in IAM Roles Anywhere to immediately stop new sessions being issued to your resources outside of AWS. Certificate revocation is supported through the use of imported certificate revocation lists (CRLs). You can upload a CRL that is generated from your CA, and certificates used for authentication will be checked for their revocation status. IAM Roles Anywhere does not support callbacks to CRL Distribution Points (CDPs) or Online Certificate Status Protocol (OCSP) endpoints.
Another consideration, not specific to IAM Roles Anywhere, is to ensure that you have securely stored the private keys on your server with appropriate file system permissions.
Conclusion
In this post, I discussed how the new IAM Roles Anywhere service helps you enable workloads outside of AWS to interact with AWS APIs securely and conveniently. When you extend the capabilities of IAM roles to your servers, containers, or applications running outside of AWS you can remove the need for long-term AWS credentials, which means no more distribution, storing, and rotation overheads.
I mentioned some of the common use cases for IAM Roles Anywhere. You also learned about the setup process and how to use IAM Roles Anywhere to obtain short-term credentials.
If you have any questions, you can start a new thread on AWS re:Post or reach out to AWS Support.
With Amazon Cognito, you can quickly add user sign-up, sign-in, and access control to your web and mobile applications. After a user signs in successfully, Cognito generates an identity token for user authorization. The service provides a pre token generation trigger, which you can use to customize identity token claims before token generation. In this blog post, we’ll demonstrate how to perform fine-grained authorization, which provides additional details about an authenticated user by using claims that are added to the identity token. The solution uses a pre token generation trigger to add these claims to the identity token.
Scenario
Imagine a web application that is used by a construction company, where engineers log in to review information related to multiple projects. We’ll look at two different ways of designing the architecture for this scenario: a standard design and a more optimized design.
Standard architecture
A sample standard architecture for such an application is shown in Figure 1, with labels for the various workflow steps:
The user interface is implemented by using ReactJS (a JavaScript library for building user interfaces).
AWS Lambda functions exist to implement business logic.
The AWS Lambda CheckUserAccess function (5) checks whether the user has authorization to call the AWS Lambda functions (4).
The project information is stored in an Amazon DynamoDB database.
Figure 1: Lambda functions that need the user’s projectID call the GetProjectID Lambda function
In this scenario, because the user has access to information from several projects, several backend functions use calls to the CheckUserAccess Lambda function (step 5 in Figure 1) in order to serve the information that was requested. This will result in multiple calls to the function for the same user, which introduces latency into the system.
Optimized architecture
This blog post introduces a new optimized design, shown in Figure 2, which substantially reduces calls to the CheckUserAccess API endpoint:
The user logs in.
Amazon Cognito makes a single call to the PretokenGenerationLambdaFunction-pretokenCognito function.
The PretokenGenerationLambdaFunction-pretokenCognito function queries the Project ID from the DynamoDB table and adds that information to the Identity token.
DynamoDB delivers the query result to the PretokenGenerationLambdaFunction-pretokenCognito function.
This Identity token is passed in the authorization header for making calls to the Amazon API Gateway endpoint.
Information in the identity token claims is used by the Lambda functions that contain business logic, for additional fine-grained authorization. Therefore, the CheckUserAccess function (7) need not be called.
The improved architecture is shown in Figure 2.
Figure 2. Get the projectID and inset it in a custom claim in the Identity token
The benefits of this approach are:
The number of calls to get the Project ID from the DynamoDB table are reduced, which in turn reduces overall latency.
The dependency on the CheckUserAccess Lambda function is removed from the business logic. This reduces coupling in the architecture, as depicted in the diagram.
In the code sample provided in this post, the user interface is run locally from the user’s computer, for simplicity.
Code sample
You can download a zip file that contains the code and the AWS CloudFormation template to implement this solution. The code that we provide to illustrate this solution is described in the following sections.
Prerequisites
Before you deploy this solution, you must first do the following:
The code provided with this post installs the following infrastructure in your AWS account.
Resource
Description
Amazon Cognito user pool
The users, added by the addUserInfo.py script, are added to this pool. The client ID is used to identify the web client that will connect to the user pool. The user pool domain is used by the web client to request authentication of the user.
Policies used for running the Lambda function and connecting to the DynamoDB database.
Lambda function for the pre token generation trigger
A Lambda function to add custom claims to the Identity token.
DynamoDB table with user information
A sample database to store user information that is specific to the application.
Deploy the solution
In this section, we describe how to deploy the infrastructure, save the trigger configuration, add users to the Cognito user pool, and run the web application.
To deploy the solution infrastructure
Download the zip file to your machine. The readme.md file in the addclaimstoidtoken folder includes a table that describes the key files in the code.
Change the directory to addclaimstoidtoken. cd addclaimstoidtoken
Review stackInputs.json. Change the value of the userPoolDomainName parameter to a random unique value of your choice. This example uses pretokendomainname as the Amazon Cognito domain name; you should change it to a unique domain name of your choice.
Deploy the infrastructure by running the following Python script. python3 setup_pretoken.py
After the CloudFormation stack creation is complete, you should see the details of the infrastructure created as depicted in Figure 3.
Figure 3: Details of infrastructure
Now you’re ready to add users to your Amazon Cognito user pool.
To add users to your Cognito user pool
To add users to the Cognito user pool and configure the DynamoDB store, run the Python script from the addclaimstoidtoken directory. python3 add_user_info.py
This script adds one user. It will prompt you to provide a username, email, and password for the user.
Note: Because this is sample code, advanced features of Cognito, like multi-factor authentication, are not enabled. We recommend enabling these features for a production application.
The addUserInfo.py script performs two actions:
Adds the user to the Cognito user pool.
Figure 4: User added to the Cognito user pool
Adds sample data to the DynamoDB table.
Figure 5: Sample data added to the DynamoDB table named UserInfoTable
Now you’re ready to run the application to verify the custom claim addition.
To run the web application
Change the directory to the pre-token-web-app directory and run the following command. cd pre-token-web-app
This directory contains a ReactJS web application that displays details of the identity token. On the terminal, run the following commands to run the ReactJS application. npm install npm start
This should open http://localhost:8081 in your default browser window that shows the Login button.
Figure 6: Browser opens to URL http://localhost:8081
Choose the Login button. After you do so, the Cognito-hosted login screen is displayed. Log in to the website with the user identity you created by using the addUserInfo.py script in step 1 of the To add users to your Cognito user pool procedure.
Figure 7: Input credentials in the Cognito-hosted login screen
When the login is successful, the next screen displays the identity and access tokens in the URL. You can reveal the token details to verify that the custom claim has been added to the token by choosing the Show Token Detail button.
Figure 8: Token details displayed in the browser
What happened behind the scenes?
In this web application, the following steps happened behind the scenes:
When you ran the npm start command on the terminal command line, that ran the react-scripts start command from package.json. The port number (8081) was configured in the pre-token-web-app/.env file. This opened the web application that was defined in app.js in the default browser at the URL http://localhost:8081.
The Login button is configured to navigate to the URL that was defined in the constants.js file. The constants.js file was generated during the running of the setup_pretoken.py script. This URL points to the Cognito-hosted default login user interface.
When you provided the login information (username and password), Amazon Cognito authenticated the user. Before generating the set of tokens (identity token and access token), Cognito first called the pre-token-generation Lambda trigger. This Lambda function has the code to connect to the DynamoDB database. The Lambda function can then access the project information for the user that is stored in the userInfo table. The Lambda function read this project information and added it to the identity token that was delivered to the web application.
After a successful login, Amazon Cognito redirected to the URL that was specified in the App Client Settings section, and added the token to the URL.
The webpage detected the token in the URL and displayed the Show Token Detail button. When you selected the button, the webpage read the token in the URL, decoded the token, and displayed the information in the relevant text boxes.
Notice that the Decoded ID Token box shows the custom claim named projects that displays the projectID that was added by the PretokenGenerationLambdaFunction-pretokenCognito trigger.
How to use the sample code in your application
We recommend that you use this sample code with the following modifications:
The code provided does not implement the API Gateway and Lambda functions that consume the custom claim information. You should implement the necessary Lambda functions and read the custom claim for the event object. This event object is a JSON-formatted object that contains authorization data.
The projectId of the user is available in the token. Therefore, when the token is passed by the Authorization trigger to the back end, this custom claim information can be used to perform actions specific to the project for that user. For example, getting all of that user’s work items that are related to the project.
Because the token is valid for one hour, the information in the custom claim information is available to the user interface during that time.
You can use the AWS Amplify library to simplify the communication between your web application and Amazon Cognito. AWS Amplify can handle the token retention and refresh token mechanism for the web application. This also removes the need for the token to be displayed in the URL.
If you’re using Amazon Cognito to manage your users and authenticate them, using the Amazon Cognito user pool to control access to your API is easier, because you don’t have to write the authentication code in your authorizer.
If you decide to use Lambda authorizers, note the following important information from the topic Steps to create an API Gateway Lambda authorizer: “In production code, you may need to authenticate the user before granting authorization. If so, you can add authentication logic in the Lambda function as well by calling an authentication provider as directed in the documentation for that provider.”
Lambda authorizer is recommended if the final authorization (not just token validity) decision is made based on custom claims.
Conclusion
In this blog post, we demonstrated how to implement fine-grained authorization based on data stored in the back end, by using claims stored in an identity token that is generated by the Amazon Cognito pre token generation trigger. This solution can help you achieve a reduction in latency and improvement in performance.
This file is encrypted using AES-256-CBC encryption combined with Base64 encoding.
A 4-digit application PIN (which gets set during the initial onboarding when a user first instals the application) is the encryption password used to protect or encrypt the licence data.
The problem here is that an attacker who has access to the encrypted licence data (whether that be through accessing a phone backup, direct access to the device or remote compromise) could easily brute-force this 4-digit PIN by using a script that would try all 10,000 combinations….
[…]
The second design flaw that is favourable for attackers is that the Digital Driver Licence data is never validated against the back-end authority which is the Service NSW API/database.
This means that the application has no native method to validate the Digital Driver Licence data that exists on the phone and thus cannot perform further actions such as warn users when this data has been modified.
As the Digital Licence is stored on the client’s device, validation should take place to ensure the local copy of the data actually matches the Digital Driver’s Licence data that was originally downloaded from the Service NSW API.
As this verification does not take place, an attacker is able to display the edited data on the Service NSW application without any preventative factors.
This blog post shows you how to share encrypted Amazon Simple Storage Service (Amazon S3) buckets across accounts on a multi-tenant data lake. Our objective is to show scalability over a larger volume of accounts that can access the data lake, in a scenario where there is one central account to share from. Most use cases involve multiple groups or customers that need to access data across multiple accounts, which makes data lake solutions inherently multi-tenant. Therefore, it becomes very important to associate data assets and set policies to manage these assets in a consistent way. The use of AWS Key Management Service (AWS KMS) simplifies seamless integration with AWS services and offers improved data protection to ultimately enable data lake services (for example, Amazon EMR, AWS Glue, or Amazon Redshift).
To further enable access at scale, you can use AWS Identity and Access Management (IAM) to restrict access to all IAM principals in your organization in AWS Organizations through the aws:PrincipalOrgID condition key. When used in a role trust policy, this key enables you to restrict access to IAM principals in your organization when assuming the role to gain access to resources in your account. This is more scalable than adding every role or account that needs access in the trust policy’s Principal element, and automatically includes future accounts that you may add to your organization.
A data lake that is built on AWS uses Amazon S3 as its primary storage location; Amazon S3 provides an optimal foundation for a data lake because of its virtually unlimited scalability and high durability. AWS Lake Formation can be used as your authorization engine to manage or enforce permissions to your data lake with integrated services such as Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum. For AWS services that integrate with Lake Formation and honor Lake Formation permissions, see the Lake Formation Developer Guide.
This post focuses on services that are not integrated with Lake Formation or where Lake Formation is not enabled on the account; in these cases, direct access for S3 buckets is required for defining and enforcing access control policies. You will learn how to use attribute-based access control (ABAC) as an authorization strategy that defines fine-grained permissions based on attributes, where fewer IAM roles are required to be provisioned, which helps scale your authorization needs. By using ABAC in conjunction with S3 bucket policies, you can authorize users to read objects based on one or more tags that are applied to S3 objects and to the IAM role session of your users based on key-value pair attributes, named session tags. ABAC reduces the number of policies, because session tags are easier to manage and establish a differentiation in job policies. The session tags will be passed when you assume an IAM role or federate a user (through your identity provider) in AWS Security Token Service (AWS STS). This enables administrators to configure a SAML-based identity provider (IdP) to send specific employee attributes as tags in AWS. As a result, you can simplify the creation of fine-grained permissions for employees to get access only to the AWS resources with matching tags.
For simplicity, in this solution you will be using an AWS Command Line Interface (AWS CLI) request for temporary security credentials when generating a session. You will apply the global aws:PrincipalOrgID condition key in your resource-based policies to restrict access to accounts in your AWS organization. This key will restrict access even though the Principal element uses the wildcard character * which applies to all AWS principals in the Principal element. We will suggest additional controls where feasible.
Cross-account access
From a high-level overview perspective, the following items are a starting point when enabling cross-account access. In order to grant cross-account access to AWS KMS-encrypted S3 objects in Account A to a user in Account B, you must have the following permissions in place (objective #1):
The bucket policy in Account A must grant access to Account B.
The AWS KMS key policy in Account A must grant access to the user in Account B.
By establishing these permissions, you will learn how to maintain entitlements (objective #2) at the bucket or object level, explore cross-account bucket sharing at scale, and overcome limitations such as inline policy size or bucket policy file size (you can learn more details in the Policies overview section). As an extension, you:
Enable granular permissions.
Grant access to groups of resources by tags.
Configuration options can be challenging for cross-account access, especially when the objective is to scale across a large number of accounts to a multi-tenant data lake. This post offers to orchestrate the various configurations options in such a way that both objectives #1 and #2 are met and challenges are addressed. Note that although not all AWS services support tag-based authorization today, we are seeing an increase in support for ABAC.
Solution overview
Our objective is to overcome challenges and design backwards with scalability in mind. The following table depicts the challenges, and outlines recommendations for a better design.
Challenge
Recommendation detail
Use employee attributes from your corporate directory
You can configure your SAML-based or web identity provider to pass session tags to AWS. When your employees federate into AWS, their attributes are applied to their resulting principal in AWS. You can then use ABAC to allow or deny permissions based on those attributes.
Enable granular permissions
ABAC requires fewer policies, because differentiation in job policies is given through session tags, which are easier to manage. Permissions are granted automatically based on attributes.
Grant access to resources by tags
When you use ABAC, you can allow actions on all resources, but only if the resource tag matches the principal’s tag and/or the organization’s tag. It’s best practice to grant least privilege.
Not all services support tag-based authorization
Check for service updates and design around the limitations.
Scale with innovation
ABAC permissions scale with innovation, because it’s not necessary for the administrator to update existing policies to allow access to new resources.
Architecture overview
Figure 1: High-level architecture
The high-level architecture in Figure 1 is designed to show you the advantages of scaling ABAC across accounts and to reflect on a common model that applies to most large organizations.
Solution walkthrough
Here is a brief introduction to the basic components that make up the solution:
Identity provider (IdP) – Can be either on-premises or in the cloud. Your administrators configure your SAML-based IdP to allow workforce users federated access to AWS resources using credentials from your corporate directory (backed by Active Directory). Now you can configure your IdP to pass in user attributes as session tags in federated AWS sessions. These tags can also be transitive, persisting even across subsequent role assumptions.
Central governance account – Helps to establish the access control policies. It takes in the user’s attributes from the IdP and grants permission to access resources. The X-Acct Control Manager is a collection of IAM roles with trust boundaries that grants access to the SAML role by role re-assumption, also known as role chaining. The governance account manages access to the SAML role and trust relationship by using OpenID Connect (OIDC) to allow users to assume roles and pass session tags. This account’s primary purpose is to manage permissions for secure data access. This account can also consume data from the lake if necessary.
Multiple accounts to connect to a multi-tenant data lake – Mirrors a large organization with a multitude of accounts that would like cross-account read access. Each account uses a pre-pave data manager role (one that has been previously established) that can tag or untag any IAM roles in the account, with the cross-account trust allowing users to assume a role to a specific central account. Any role besides the data manager should be prevented from tagging or untagging IAM principals to help prevent unauthorized changes.
Member Analytics organizational unit (OU) – Is where the EMR analytics clusters are connecting to the data in the consumptions layer to visualize by using business intelligence (BI) tools such as Tableau. Access to the shared buckets is granted through bucket and identity policies. Multiple accounts may also have access to buckets within their own accounts. Since this is a consumer account, this account is only joining the lake to consume data from the lake and will not be contributing any data to the data lake.
Central Data Lake OU – Is the account that owns the data stored on Amazon S3. The objects are encrypted, which will require the IAM role to have permissions to the specified AWS KMS key in the key policy. AWS KMS supports the use of the aws:ResourceTag/tag-key global condition context key within identity policies, which lets you control access to AWS KMS keys based on the tags on the key.
Prerequisites
For using SAML session tags for ABAC, you need to have the following:
Access to a SAML-based IdP where you can create test users with specific attributes.
For simplicity, you will be using an AWS CLI request for temporary security credentials when generating a session.
You’ll be passing session tags using AssumeRoleWithSAML.
AWS accounts where users can sign in. There are five accounts for interaction defined with accommodating policies and permission, as outlined in Figure 1. The numerals listed here refer to the labels on the figure:
IdP account – IAM user with administrative permission (1)
Central governance account – Admin/Writer ABAC policy (2)
Sample accounts to connect to multi-tenant data lake – Reader ABAC policy (3)
Member Analytics OU account – Assume roles in shared bucket (4)
Central Data Lake OU account – Pave (that is create) or update virtual private cloud (VPC) condition and principals/PrincipalOrgId in a shared bucket (5)
AWS resources, such as Amazon S3, AWS KMS, or Amazon EMR.
Any third-party software or hardware, such as BI tools.
Our objective is to design an AWS perimeter where access is allowed only if necessary and sufficient conditions are met for getting inside the AWS perimeter. See the following table, and the blog post Establishing a data perimeter on AWS for more information.
Boundary
Perimeter objective
AWS services used
Identity
Only My Resources Only My Networks
Identity-based policies and SCPs
Resource
Only My IAM Principals Only My Networks
Resource-based policies
Network
Only My IAM Principals Only My Resources
VPC endpoint (VPCE) policies
There are multiple design options, as described in the whitepaper Building A Data Perimeter on AWS, but this post will focus on option A, which is a logical AND (∧) conjunction of principal organization, resource, and network:
(Only My aws:PrincipalOrgID) ∧ (Only My Resource) ∧ (Only My Network)
(Only My IAM Principals) ∧ (Only My Resource) ∧ (Only My Network)
(Only My aws:PrincipalOrgID) ∧ (Only My IAM Principals) ∧ (Only My Resource) ∧ (Only My Network)
Policies overview
In order to properly design and consider control points as well as scalability limitations, the following table shows an overview of the policies applied in this post. It outlines the design limitations and briefly discusses the proposed solutions, which are described in more detail in the Establish an ABAC policy section.
Policy type
Sample policies
Limitations to overcome
Solutions outlined in this blog
IAM user policy
AllowIamUserAssumeRole
AllowPassSessionTagsAndTransitive
IAMPolicyWithResourceTagForKMS
Inline JSON policy document is limited to (2048 bytes).
For using KMS keys, accounts are limited to specific AWS Regions.
Enable session tags, which expire and require credentials.
It is a best practice to grant least privilege permissions with AWS Identity and Access Management (IAM) policies.
S3 bucket policy
Principals-only-from-given-Org
Access-to-specific-VPCE-only
KMS-Decrypt-with-specific-key
DenyUntaggedData
The bucket policy has a 20 KB maximum file size.
Data exfiltration or non-trusted writing to buckets can be restricted by a combination of policies and conditions.
Use aws:PrincipalOrgID to simplify specifying the Principal element in a resource-based policy.
No manual updating of account IDs required, if they belong to the intended organization.
~ VPCE policy
DenyNonApprovedIPAddresses VPCEndpointsNonAdmin
DenyNonSSLConnections
DenyIncorrectEncryptionKey
DenyUnEncryptedObjectUploads
aws:PrincipalOrgID opens up broadly to accounts with single control—requiring additional controls.
Make sure that principals aren’t inadvertently allowed or denied access.
Restrict access to VPC with endpoint policies and deny statements.
KMS key policy
Allow-use-of-the-KMS-key-for-organization
Listing all AWS account IDs in an organization.
Maximum key policy document size is 32 KB, which applies to KMS key.
Changing a tag or alias might allow or deny permission to a KMS key.
Specify the organization ID in the condition element, instead of listing all the AWS account IDs.
AWS owned keys do not count against these quotas, so use these where possible.
Assure visibility of API operations
By using a combination of these policies and conditions, you can work to mitigate accidental or intentional exfiltration of data, IAM credentials, and VPC endpoints. You can also alleviate writing to a bucket that is owned by a non-trusted account by using even more restrictive policies for write operations. For example, use the s3:ResourceAccount condition key to filter access to trusted S3 buckets that belong only to specific AWS accounts.
Today, the scalability of cross-account bucket sharing is limited by the current allowed S3 bucket policy size (20 KB) and KMS key policy size (32 KB). Cross-account sharing also may increase risk, unless the appropriate guardrails are in place. The aws:PrincipalOrgID condition helps by allowing sharing of resources such as buckets only to accounts and principals within your organization. If ABAC is also used for enforcement, tags used for authorization also need to be governed across all participating accounts; this includes protecting them from modification and also helps ensure only authorized principals can use ABAC when gaining access.
Figure 2 demonstrates a sample scenario for the roles from a specific organization in AWS Organizations, including multiple application accounts (Consumer Account 1, Consumer Account 2) with defined tags accessing an S3 bucket that is owned by Central Data Lake Account.
Figure 2: Sample scenario
Establish an ABAC policy
AWS has published additional blog posts and documentation for various services and features that are helpful supplements to this post. For more information, see the following resources:
Federate the user with permissions to assume roles with the same tags. By way of tagging the users, they automatically get access to assume the correct role, if they work only on one project and team.
Create the customer managed policy to attach to the federated user that uses ABAC to allow role assumption.
The federated user can assume a role only when the user and role tags are matching. More detailed instructions for defining permissions to access AWS resources based on tags are available in Create the ABAC policy in the IAM tutorial: Define permissions to access AWS resources based on tags.
Policy sample to allow the user to assume the role
The AllowUserAssumeRole statement in the following sample policy allows the federated user to assume the role test-session-tags with the attached policy. When that federated user assumes the role, they must pass the required session tags. For more information about federation, see Identity providers and federation in the AWS Identity and Access Management User Guide.
Allow cross-account access to an AWS KMS key. This is a key policy to allow principals to call specific operations on KMS keys.
Using ABAC with AWS KMS provides a flexible way to authorize access without editing policies or managing grants. Additionally, the aws:PrincipalOrgID global condition key can be used to restrict access to all accounts in your organization. This is more scalable than having to list all the account IDs in an organization within a policy. You accomplish this by specifying the organization ID in the condition element with the condition key aws:PrincipalOrgID. For detailed instructions about changing the key policy by using the AWS Management Console, see Changing a key policy in the AWS KMS Developer Guide.
You should use these features with care so that principals aren’t inadvertently allowed or denied access.
Policy sample to allow a role in another account to use a KMS key
To grant another account access to a KMS key, update its key policy to grant access to the external account or role. For instructions, see Allowing users in other accounts to use a KMS key in the AWS KMS Developer Guide.
The accounts are limited to roles that have an access-project=CDO5951 tag. You might attach this policy to roles in the example Alpha project.
Note: when using ABAC to grant access to an external account, you need to ensure that the tags used for ABAC are protected from modification across your organization.
Configure the S3 bucket policy.
For cross-account permissions to other AWS accounts or users in another account, you must use a bucket policy. Bucket policy is limited to a size of 20KB. For more information, see Access policy guidelines.
The idea of the S3 bucket policy is based on data classification, where the S3 bucket policy is used with deny statements that apply if the user doesn’t have the appropriate tags applied. You don’t need to explicitly deny all actions in the bucket policy, because a user must be authorized in both their identity policy and the S3 bucket policy in a cross-account scenario.
Policy sample for the S3 bucket
You can use the Amazon S3 console to add a new bucket policy or edit an existing bucket policy. For detailed instructions to create or edit a bucket policy, see Adding a bucket policy using the Amazon S3 console in the Amazon S3 User Guide. The following sample policy restricts access only to principals from organizations as listed in the policy and a specific VPC endpoint. It is important to test that all the AWS services involved in a solution can access this S3 bucket. For more information, see Resource-based policy examples in the whitepaper Building A Data Perimeter on AWS.
To create an interface endpoint, you must specify the VPC in which to create the interface endpoint, and the service to which to establish the connection.
Configure the SAML IdP (if you have one) or post in commands manually.
Typically, you configure your SAML IdP to pass in the project, cost-center, and department attributes as session tags. For more information, see Passing session tags using AssumeRoleWithSAML.
The following assume-role AWS CLI command helps you perform and test this request.
This example request assumes the test-session-tags role for the specified duration with the included session policy, session tags, external ID, and source identity. The resulting session is named my-session.
Cleanup
To avoid unwanted charges to your AWS account, follow the instructions to delete the AWS resources that you created during this walkthrough.
Conclusion
This post explains how to share encrypted S3 buckets across accounts at scale by granting access across a large number of accounts to a multi-tenant data lake. The approach does not use Lake Formation, which may have advantages for use-cases requiring fine-grained access control. Instead, this solution uses a central governance account combining role-chaining with session tags to produce data for the lake. Permissions are granted through these tags. For read-only consumption of data by multiple accounts, cross-account read-only access is granted through Amazon EMR analytics clusters that access the data, visualizing them though BI tools. Access to the shared buckets is restricted through bucket and identity policies. Through the use of multiple AWS services and IAM features such as KMS key policies, IAM policies, S3 bucket policies, and VPCE policies, this solution enables controls that help secure access to the data, while enabling the users to use ABAC for access.
This approach is focused on scalability; you can generalize and repurpose the steps for different requirements and projects. As a result, this scalable solution for services not integrated with AWS Lake Formation can be customized with AWS KMS by way of combining with ABAC to authorize access without editing policies or managing grants. For services that honor Lake Formation permissions, you can use the Lake Formation Developer Guide to more easily set up integrated services to encrypt and decrypt data.
In summary, the design provided here is feasible for large projects, with appropriate controls to allow scalability potentially across more than 1,000 accounts.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
As an AWS customer, if you use multiple Amazon Elastic Container Service (Amazon ECS) services/tasks to achieve better isolation, you often have the challenge of how to manage access to these containers. In such cases, using tags can enable you to categorize these services in different ways, such as by owner or environment.
This blog post shows you how tags allow conditional access to Amazon ECS resources. You can use attribute-based access control (ABAC) policies to grant access rights to users through the use of policies that combine attributes together. ABAC can be helpful in rapidly-growing environments, where policy management can become cumbersome. This blog post uses ECS resource tags (owner tag and environment tag) as the attributes that are used to control access in the policies.
Amazon ECS resources have many attributes, such as tags, which can be used to control permissions. You can attach tags to AWS Identity and Access Management (IAM) principals, and create either a single ABAC policy, or a small set of policies for your IAM principals. These ABAC policies can be designed to allow operations when the principal tag (a tag that exists on the user or role making the call) matches the resource tag. They can be used to simplify permission management at scale. A single Amazon ECS policy can enforce permissions across a range of applications, without having to update the policy each time you create new Amazon ECS resources.
This post provides a step-by-step procedure for creating ABAC policies for controlling access to Amazon ECS containers. As the team adds ECS resources to its projects, permissions are automatically applied based on the owner tag and the environment tag. As a result, no policy update is required for each new resource. Using this approach can save time and help improve security, because it relies on granular permissions rules.
Condition key mappings
It’s important to note that each IAM permission in Amazon ECS supports different types of tagging condition keys. The following table maps each condition key to its ECS actions.
Condition key
Description
ECS actions
aws:RequestTag/${TagKey}
Set this tag value to require that a specific tag be used (or not used) when making an API request to create or modify a resource that allows tags.
The following tutorial gives you a step-by-step process to create and test an Amazon ECS policy that allows IAM roles with principal tags to access resources with matching tags. When a principal makes a request to AWS, their permissions are granted based on whether the principal and resource tags match. This strategy allows individuals to view or edit only the ECS resources required for their jobs.
Scenario
Example Corp. has multiple Amazon ECS containers created for different applications. Each of these containers are created by different owners within the company. The permissions for each of the Amazon ECS resources must be restricted based on the owner of the container, and also based on the environment where the action is performed.
Assume that you’re a lead developer at this company, and you’re an experienced IAM administrator. You’re familiar with creating and managing IAM users, roles, and policies. You want to ensure that the development engineering team members can access only the containers they own. You also need a strategy that will scale as your company grows.
For this scenario, you choose to use AWS resource tags and IAM role principal tags to implement an ABAC strategy for Amazon ECS resources. The condition key mappings table shows which tagging condition keys you can use in a policy for each Amazon ECS action and resources. You can define the tags in the role you created. For this scenario, you define two tags Owner and Environment. These tags restrict permissions in the role based on the tags you defined.
Prerequisites
To perform the steps in this tutorial, you must already have the following:
An IAM role or user with sufficient privileges for services like IAM and ECS. Following the security best practices the role should have a minimum set of permissions and grant additional permissions as necessary. You can add the AWS managed policiesIAMFullAccess and AmazonECS_FullAccess to create the IAM role to provide permissions for creating IAM and ECS resources.
An AWS account that you can sign in to as an IAM role or user.
Experience creating and editing IAM users, roles, and policies in the AWS Management Console. For more information, see Tutorial to create IAM resources.
Create an ABAC policy for Amazon ECS resources
After you complete the prerequisites for the tutorial, you will need to define which Amazon ECS privileges and access controls you want in place for the users, and configure the tags needed for creating the ABAC policies. This tutorial focuses on providing step-by-step instructions for creating test users, defining the ABAC policies for the Amazon ECS resources, creating a role, and defining tags for the implementation.
To create the ABAC policy
You create an ABAC policy that defines permissions based on attributes. In AWS, these attributes are called tags.
The sample ABAC policy that follows provides ECS permissions to users when the principal’s tag matches the resource tag.
Sample ABAC policy for ECS resources
The sample ECS ABAC policy that follows allows the user to perform action on the ECS resources, but only when those resources are tagged with the same key-pairs as the principal.
Download the sample ECS policy. This policy allows principals to create, read, edit, and delete resources, but only when those resources are tagged with the same key-value pairs as the principal.
Use the downloaded ECS policy to create the ECS ABAC policy, and name your new policy ECSABACpolicy. For more information, see Creating IAM policies.
This sample policy provides permission to each ECS action based on the condition key that action supports. See to the condition key mappings table for a mapping of the ECS actions and the condition key they support.
What does this policy do?
The ECSCreateCluster statement allows users to create cluster, create and tag resources. These ECS actions only support the RequestTag condition key. This condition block returns true if every tag passed (tags: owner and environment) in the request is included in the specified list. This is done using the StringEquals condition operator. If an incorrect tag key other than owner or environment tag is passed, or incorrect value for the tags are passed, then the condition returns false. The ECS actions within these statements do not have a specific requirement of a resource type.
The ECSDeletion, ECSUpdate, and ECSDescribe statements allow users to update, delete or list/describe ECS resources. The ECS actions under these statements only support the ResourceTag condition key. Statements return true if the specified tag keys are present on the ECS resource and their values match the principal’s tags. These statements return false for mismatched tags (in this policy, the only acceptable tags are owner and environment), or for an incorrect value for the owner and environment tag passed to the ECS resources. They also return false for any ECS action that does not support resource tagging.
The ECSCreateService, ECSTaskControl, and ECSRegistration statements contain ECS actions that allow users to create a service, start or run tasks and register container instances in ECS. The ECS actions within these statements support both Request and Resource tag condition keys.
Create IAM roles
Create the following IAM roles and attach the ECSABAC policy you created in the previous procedure. You can create the roles and add tags to them using the AWS console, through the role creation flow, as shown in the following steps.
To create IAM roles
Sign in to the AWS Management Console and navigate to the IAM console.
In the left navigation pane, select Roles, and then select CreateRole.
Choose the Another AWS account role type.
For Account ID, enter the AWS account ID mentioned in the prerequisites to which you want to grant access to your resources.
Choose Next: Permissions.
IAM includes a list of the AWS managed and customer managed policies in your account. Select the ECSABAC policy you created previously from the dropdown menu to use for the permissions policy. Alternatively, you can choose Create policy to open a new browser tab and create a new policy, as shown in Figure 1.
Figure 1. Attach the ECS ABAC policy to the role
Choose Next: Tags.
Add metadata to the role by attaching tags as key-value pairs. Add the following tags to the role: for Keyowner, enter Valuemock_owner; and for Keyenvironment, enter development, as shown in Figure 2.
Figure 2. Define the tags in the IAM role
Choose Next: Review.
For Role name, enter a name for your role. Role names must be unique within your AWS account.
Review the role and then choose Create role.
Test the solution
The following sections present some positive and negative test cases that show how tags can provide fine-grained permission to users through ABAC policies.
Prerequisites for the negative and positive testing
Before you can perform the positive and negative tests, you must first do these steps in the AWS Management Console:
Follow the procedures above for creating IAM role and the ABAC policy.
Switch the role from the role assumed in the prerequisites to the role you created in To create IAM Roles above, following the steps in the documentation Switching to a role.
Perform negative testing
For the negative testing, three test cases are presented here that show how the ABAC policies prevent successful creation of the ECS resources if the owner or environment tags are missing, or if an incorrect tag is used for the creation of the ECS resource.
Negative test case 1: Create cluster without the required tags
In this test case, you check if an ECS cluster is successfully created without any tags. Create an Amazon ECS cluster without any tags (in other words, without adding the owner and environment tag).
To create a cluster without the required tags
Sign in to the AWS Management Console and navigate to the IAM console.
From the navigation bar, select the Region to use.
In the navigation pane, choose Clusters.
On the Clusters page, choose Create Cluster.
For Select cluster compatibility, choose Networking only, then choose Next Step.
On the Configure cluster page, enter a cluster name. For Provisioning Model, choose On-Demand Instance, as shown in Figure 3.
Figure 3. Create a cluster
In the Networking section, configure the VPC for your cluster.
Don’t add any tags in the Tags section, as shown in Figure 4.
Figure 4. No tags added to the cluster
Choose Create.
Expected result of negative test case 1
Because the owner and the environment tags are absent, the ABAC policy prevents the creation of the cluster and throws an error, as shown in Figure 5.
Figure 5. Unsuccessful creation of the ECS cluster due to missing tags
Negative test case 2: Create cluster with a missing tag
In this test case, you check whether an ECS cluster is successfully created missing a single tag. You create a cluster similar to the one created in Negative test case 1. However, in this test case, in the Tags section, you enter only the owner tag. The environment tag is missing, as shown in Figure 6.
In the Tags section, add the owner tag and enter its value as mock_user.
Figure 6. Create a cluster with the environment tag missing
Expected result of negative test case 2
The ABAC policy prevents the creation of the cluster, due to the missing environment tag in the cluster. This results in an error, as shown in Figure 7.
Figure 7. Unsuccessful creation of the ECS cluster due to missing tag
Negative test case 3: Create cluster with incorrect tag values
In this test case, you check whether an ECS cluster is successfully created with incorrect tag-value pairs. Create a cluster similar to the one in Negative test case 1. However, in this test case, in the Tags section, enter incorrect values for the owner and the environment tag keys, as shown in Figure 8.
In the Tags section, add the owner tag and enter the value as test_user; add the environment tag and enter the value as production.
Figure 8. Create a cluster with the incorrect values for the tags
Expected result of negative test case 3
The ABAC policy prevents the creation of the cluster, due to incorrect values for the owner and environment tags in the cluster. This results in an error, as shown in Figure 9.
Figure 9. Unsuccessful creation of the ECS cluster due to incorrect value for the tags
Perform positive testing
For the positive testing, two test cases are provided here that show how the ABAC policies allow successful creation of ECS resources, such as ECS clusters and ECS tasks, if the correct tags with correct values are provided as input for the ECS resources.
Positive test case 1: Create cluster with all the correct tag-value pairs
This test case checks whether an ECS cluster is successfully created with the correct tag-value pairs when you create a cluster with both the owner and environment tag that matches the ABAC policy you created earlier.
To create a cluster with all the correct tag-value pairs
In the Tags section, add the owner tag and enter the value as mock_user; add the environment tag and enter the value as development, as shown in Figure 10.
Figure 10. Add correct tags to the cluster
Expected result of positive test case 1
Because both the owner and the environment tags were input correctly, the ABAC policy allows the successful creation of the cluster without throwing an error, as shown in Figure 11.
Figure 11. Successful creation of the cluster
Positive test case 2: Create standalone task with all the correct tag-value pairs
Deploying your application as a standalone task can be ideal in certain situations. For example, suppose you’re developing an application, but you aren’t ready to deploy it with the service scheduler. Maybe your application is a one-time or periodic batch job, and it doesn’t make sense to keep running it, or to restart when it finishes.
For this test case, you run a standalone task with the correct owner and environment tags that match the ABAC policy.
To create a standalone task with all the correct tag-value pairs
To run a standalone task, see Run a standalone task in the Amazon ECS Developer Guide. Figure 12 shows the beginning of the Run Task process.
Figure 12. Run a standalone task
In the Task tagging configuration section, under Tags, add the owner tag and enter the value as mock_user; add the environment tag and enter the value as development, as shown in Figure 13.
Figure 13. Creation of the task with the correct tag
Expected result of positive test case 2
Because you applied the correct tags in the creation phase, the task is created successfully, as shown in Figure 14.
Figure 14. Successful creation of the task
Cleanup
To avoid incurring future charges, after completing testing, delete any resources you created for this solution that are no longer needed. See the following links for step-by-step instructions for deleting the resources you created in this blog post.
This post demonstrates the basics of how to use ABAC policies to provide fine-grained permissions to users based on attributes such as tags. You learned how to create ABAC policies to restrict permissions to users by associating tags with each ECS resource you create. You can use tags to manage and secure access to ECS resources, including ECS clusters, ECS tasks, ECS task definitions, and ECS services.
If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on AWS Secrets Manager re:Post or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
This blog post demonstrates how you can secure Amazon API Gateway HTTP endpoints with JSON web token (JWT) authorizers. Amazon API Gateway helps developers create, publish, and maintain secure APIs at any scale, helping manage thousands of API calls. There are no minimum fees, and you only pay for the API calls you receive.
This post includes step-by-step guidance for setting up JWT authorizers using Amazon Cognito as the identity provider, configuring HTTP APIs to use JWT authorizers, and examples to test the entire setup. If you want to protect HTTP APIs using Lambda and IAM authorizers, you can refer to Introducing IAM and Lambda authorizers for Amazon API Gateway HTTP APIs.
Prerequisites
Before you can set up a JWT authorizer using Cognito, you first need to create three Lambda functions. You should create each Lambda function using the following configuration settings, permissions, and code:
The first Lambda function (Pre-tokenAuthLambda) is invoked before the token generation, allowing you to customize the claims in the identity token.
The second Lambda function (LambdaForAdminUser) acts as the HTTP API Gateway integration target for /AdminUser HTTP API resource route.
The third Lambda function (LambdaForRegularUser) acts as the HTTP API Gateway integration target for /RegularUser HTTP API resource route.
IAM policy for Lambda function
You first need to create an IAM role using the following IAM policy for each of the three Lambda functions:
For the three Lambda functions, use these settings:
Function name
Enter an appropriate name for the Lambda function, for example:
Pre-tokenAuthLambda for the first Lambda
LambdaForAdminUser for the second
LambdaForRegularUser for the third
Runtime
Choose Node.js 12.x
Permissions
Choose Use an existing role and select the role you created with the IAM policy in the Prerequisites section above.
Pre-tokenAuthLambda code
This first Lambda code, Pre-tokenAuthLambda, converts the authenticated user’s Cognito group details to be returned as the scope claim in the id_token returned by Cognito.
This Lambda code, LambdaForAdminUser, acts as the HTTP API Gateway integration target and sends back the response Hello from Admin User when the /AdminUser resource path is invoked in API Gateway.
This Lambda code, LambdaForRegularUser , acts as the HTTP API Gateway integration target and sends back the response Hello from Regular User when the /RegularUser resource path is invoked within API Gateway.
Choose Manage User Pools, then choose Create a user pool.
Figure 1: Create a user pool
Enter a Pool name, then choose Review defaults.
Figure 2: Review defaults while creating the user pool
Choose Add app client.
Figure 3: Add an app client for the user pool
Enter an app client name. For this example, keep the default options. Choose Create app client to finish.
Figure 4: Review the app client configuration and create it
Choose Return to pool details, and then choose Create pool.
Figure 5: Complete the creation of user pool setup
To configure Cognito user pool settings
Now you can configure app client settings:
On the left pane, choose App client settings. In Enabled Identity Providers, select the identity providers you want for the apps you configured in the App Clients tab.
Enter the Callback URLs you want, separated by commas. These URLs apply to all selected identity providers.
Under OAuth 2.0, select the from the following options.
For Allowed OAuth Flows, select Authorization code grant.
For Allowed OAuth Scopes, select phone, email, openID, and profile.
Choose Save changes.
Figure 6: Configure app client settings
Now add the domain prefix to use for the sign-in pages hosted by Amazon Cognito. On the left pane, choose Domain name and enter the appropriate domain prefix, then Save changes.
Figure 7: Choose a domain name prefix for the Amazon Cognito domain
Next, create the pre-token generation trigger. On the left pane, choose Triggers and under Pre Token Generation, select the Pre-tokenAuthLambda Lambda function you created in the Prerequisites procedure above, then choose Save changes.
Figure 8: Configure Pre Token Generation trigger Lambda for user pool
Finally, create two Cognito groups named admin and regular. Create two Cognito users named adminuser and regularuser. Assign adminuser to both admin and regular group. Assign regularuser to regular group.
Figure 9: Create groups and users for user pool
Configuring HTTP endpoints with JWT authorizer
The first step to configure HTTP endpoints is to create the API in the API Gateway management console.
Figure 10: Create an API in API Gateway management console
Choose HTTP API and select Build.
Figure 11: Choose Build option for HTTP API
Under Create and configure integrations, enter JWTAuth for the API name and choose Review and Create.
Figure 12: Create Integrations for HTTP API
Once you’ve created the API JWTAuth, choose Routes on the left pane.
Figure 13: Navigate to Routes tab
Choose Create a route and select GET method. Then, enter /AdminUser for the path.
Figure 14: Create the first route for HTTP API
Repeat step 5 and create a second route using the GET method and /RegularUser for the path.
Figure 15: Create the second route for HTTP API
To create API integrations
Now that the two routes are created, select Integrations from the left pane.
Figure 16: Navigate to Integrations tab
Select GET for the /AdminUser resource path, and choose Create and attach an integration.
Figure 17: Attach an integration to first route
To create an integration, select the following values
Integration type: Lambda function Integration target: LambdaForAdminUser
Choose Create. NOTE: LambdaForAdminUser is the Lambda function you previously created as part of the Prerequisites procedure LambdaForAdminUser code.
Figure 18: Create an integration for first route
Next, select GET for the /RegularUser resource path and choose Create and attach an integration.
Figure 19: Attach an integration to second route
To create an integration, select the following values
Integration type: Lambda function Integration target: LambdaForRegularUser
Choose Create. NOTE: LambdaForRegularUser is the Lambda function you previously created as part of the Prerequisites procedure LambdaForRegularUser code.
Figure 20: Create an integration for the second route
To configure API authorization
Select Authorization from the left pane, select /AdminUser path and choose Create and attach an authorizer.
Figure 21: Navigate to Authorization left pane option to create an authorizer
For Authorizer type select JWT and under Authorizer settings enter the following details:
Figure 22: Create and attach an authorizer to HTTP API first route
In the Authorizer for route GET /AdminUser screen, choose Add scope in the Authorization Scope section and enter scope name as admin-<app_client_id> and choose Save.
Figure 23: Add authorization scopes to first route of HTTP API
Now select the /RegularUser path and from the dropdown, select the JWTAuth authorizer you created in step 3. Choose Attach authorizer.
Figure 24: Attach an authorizer to HTTP API second route
Choose Add scope and enter the scope name as regular-<app_client_id> and choose Save.
Figure 25: Add authorization scopes to second route of HTTP API
Enter Test as the Name and then choose Create.
Figure 26: Create a stage for HTTP API
Under Select a stage, enter Test, and then choose Deploy to stage.
Figure 27: Deploy HTTP API to stage
Test the JWT authorizer
You can use the following examples to test the API authentication. We use Curl in this example, but you can use any HTTP client.
To test the API authentication
Send a GET request to the /RegularUser HTTP API resource without specifying any authorization header.
curl -s -X GET https://a1b2c3d4e5.execute-api.us-east-1.amazonaws.com/RegularUser
API Gateway returns a 401 Unauthorized response, as expected.
{“message”:”Unauthorized”}
The required $request.header.Authorization identity source is not provided, so the JWT authorizer is not called. Supply a valid Authorization header key and value. You authenticate as the regularuser, using the aws cognito-idp initiate-auth AWS CLI command.
aws cognito-idp initiate-auth --auth-flow USER_PASSWORD_AUTH --client-id <Cognito User Pool App Client ID> --auth-parameters USERNAME=regularuser,PASSWORD=<Password for regularuser>
The command response contains a JWT (IdToken) that contains information about the authenticated user. This information can be used as the Authorization header value.
curl -H "Authorization: aaabbbcccddd1234567890" -s -X GET https://a1b2c3d4e5.execute-api.us-east-1.amazonaws.com/RegularUser
API Gateway returns the response Hello from Regular User. Now test access for the /AdminUser HTTP API resource with the JWT token for the regularuser.
curl -H "Authorization: aaabbbcccddd1234567890" -s -X GET "https://a1b2c3d4e5.execute-api.us-east-1.amazonaws.com/AdminUser"
API Gateway returns a 403 – Forbidden response. {“message”:”Forbidden”} The JWT token for the regularuser does not have the authorization scope defined for the /AdminUser resource, so API Gateway returns a 403 – Forbidden response.
Next, log in as adminuser and validate that you can successfully access both /RegularUser and /AdminUser resource. You use the cognito-idp initiate-auth AWS CLI command.
aws cognito-idp initiate-auth --auth-flow USER_PASSWORD_AUTH --client-id <Cognito User Pool App Client ID> --auth-parameters USERNAME=adminuser,PASSWORD==<Password for adminuser>
Using Curl, you can validate that the adminuser JWT token now has access to both the /RegularUser resource and the /AdminUser resource. This is possible when adminuser is part of both Cognito groups, so the JWT token contains both authorization scopes.
curl -H "Authorization: a1b2c3d4e5c6aabbbcccddd" -s -X GET https://a1b2c3d4e5.execute-api.us-east-1.amazonaws.com/RegularUser
API Gateway returns the response Hello from Regular User
curl -H "Authorization: a1b2c3d4e5c6aabbbcccddd" -s -X GET https://a1b2c3d4e5.execute-api.us-east-1.amazonaws.com/AdminUser
API Gateway returns the following response Hello from Admin User
Conclusion
AWS enabled the ability to manage access to an HTTP API in API Gateway in multiple ways: with Lambda authorizers, IAM roles and policies, and JWT authorizers. This post demonstrated how you can secure API Gateway HTTP API endpoints with JWT authorizers. We configured a JWT authorizer using Amazon Cognito as the identity provider (IdP). You can achieve the same results with any IdP that supports OAuth 2.0 standards. API Gateway validates the JWT that the client submits with API requests. API Gateway allows or denies requests based on token validation along with the scope of the token. You can configure distinct authorizers for each route of an API, or use the same authorizer for multiple routes.
If you’re considering delegating encryption to an AWS service to use a key under your control when it encrypts your data in that service, you might wonder how to ensure the AWS service can only use your key when you want it to and not have full access to decrypt any of your resources at any time. The answer is to use scoped-down dynamic permissions in AWS KMS. Specifically, a combination of permissions that you define in the KMS key policy document along with additional permissions that are created dynamically using KMS grants define the conditions under which one or more AWS services can use your KMS keys to encrypt and decrypt your data.
In this blog post, I discuss:
An example of how an AWS service uses your KMS key policy and grants to securely manage access to your encryption keys. The example uses Amazon RDS and demonstrates how the block storage volume behind your database instance is encrypted.
Best practices for using grants from AWS KMS in your own workloads.
Recent performance improvements when using grants in AWS KMS.
Case study: How RDS uses grants from AWS KMS to encrypt your database volume
Many Amazon RDS instance types are hosted on an Amazon Elastic Compute Cloud (Amazon EC2) instance where the underlying storage layer is an Amazon Elastic Block Store (Amazon EBS) volume. The blocks of the EBS volume that stores the database content are encrypted under a randomly generated 256-bit symmetric data key that is itself encrypted under a KMS key that you configure RDS to use when you create your database instance. Let’s look at how RDS interacts with EBS, EC2, and AWS KMS to securely create an RDS instance using an KMS key.
When you send a request to RDS to create your database, there are several asynchronous requests being made among the RDS, EC2, EBS, and KMS services to:
Create the underlying storage volume with a unique encryption key.
Create the compute instance in EC2.
Load the database engine into the EC2 instance.
Give the EC2 instance permissions to use the encryption key to read and write data to the database storage volume.
The initial authenticated request that you make to RDS to create a new database is made by an AWS Identity and Access Management (IAM) principal in your account (e.g. a user or role). Once the request is received, a series of things has to happen:
RDS needs to request EBS to create an encrypted volume to store your future data.
EBS needs to request AWS KMS generate a unique 256-bit data key for the volume and encrypt it under the KMS key you told RDS to use.
RDS then needs to request that EC2 launch an instance, attach that encrypted volume, and make the data key available to EC2 for use in reads and writes to the volume.
From your perspective, the IAM principal used to create the database also must have permissions in the KMS key policy for the GenerateDataKeyWithoutPlaintext and Decrypt actions. This enables the unique 256-bit data key to be created and encrypted under the desired KMS key as well as allowing the user or role to have the data key decrypted and provisioned to the Nitro card managing your EC2 instance so that reads/writes can happen from/to the database. Given the asynchronous nature of the process of creating the database vs. launching the database volume in the future, how do the RDS, EBS, and EC2 services all get the necessary least privileged permissions to create and provision the data key for use with your database? The answer starts with your IAM principal having permission for the AWS KMS CreateGrant action in the key policy.
RDS uses the identity from your IAM principal to create a grant in AWS KMS that allows it to create other grants for EC2 and EBS with very limited permissions that are further scoped down compared to the original permissions your IAM principal has on the AWS KMS key. A total of three grants are created:
The initial RDS grant.
A subsequent EBS grant that allows EBS to call AWS KMS and generate a 256-bit data key that is encrypted under the KMS key you defined when creating your database.
The attachment grant, which allows the specific EC2 instance hosting your database volume to decrypt the encrypted data key for and provision it for use during I/O between the instance and the EBS volume.
RDS grant
In this example, let’s say you’ve created an RDS instance with an ID of db-1234 and specified a KMS key for encryption. The following grant is created on the KMS key, allowing RDS to create more grants for EC2 and EBS to use in the asynchronous processes required to launch your database instance. The RDS grant is as follows:
In plain English, this grant gives RDS permissions to use the KMS key for three specific operations (API actions) only when the call specifies the RDS instance ID db-1234 in the Encryption Context parameter. The grant provides access for the the grantee principal, which in this case is the value shown for the <Regional RDS service account>. This grant is created in AWS KMS and associated with your KMS key. Because the EC2 instance hasn’t yet been created and launched, the grantee principal cannot include the EC2 instance ID and must instead be the regional RDS service account.
EBS grant
With the RDS instance and initial AWS KMS grant created, RDS requests EC2 to launch an instance for the RDS database. EC2 creates an instance with a unique ID (e.g. i-1234567890abcdefg) using EC2 permissions you gave to the original IAM principal. In addition to the EC2 instance being created, RDS requests that Amazon EBS create an encrypted volume dedicated to the database. As a part of volume creation, EBS needs permission to call AWS KMS to generate a unique 256-bit data key for the volume and encrypt that data key under the KMS key you defined.
The EC2 instance ID is used as the name of the identity for future calls to AWS KMS, so RDS inserts it as the grantee principal in the EBS grant it creates. The EBS grant is as follows:
You’ll notice that this grant uses the same encryption context as the initial RDS grant. However, now that we have the EC2 instance ID associated with the database ID, the permissions that EBS gets to use your key as the grantee principal can be scoped down to require both values. Once this grant is created, EBS can create the EBS volume (e.g. vol-0987654321gfedcba) and call AWS KMS to generate and encrypt a 256-bit data key that can only be used for that volume. This encrypted data key is stored by EBS in preparation for the volume attachment process.
Attachment grant
The final step in creating the RDS instance is to attach the EBS volume to the EC2 instance hosting your database. EC2 now uses the previously created EBS grant to create the attachment grant with the i-1234567890abcdefg instance identity. This grant allows EC2 to decrypt the encrypted data key, provision it to the Nitro card that manages the instance, and begin encrypting I/O to the EBS volume of the RDS database. The attachment grant in this example will be as follows:
The attachment grant is the most restrictive of the three grants. It requires the caller to know the IDs of all the AWS entities involved: EC2 instance ID, EBS volume ID, and RDS database ID. This design ensures that your KMS key can only be used for decryption by these AWS services in order to launch the specific RDS database you want.
The encrypted EBS volume is now active and attached to the EC2 instance. Should you terminate the RDS instance, the services retire all the relevant KMS grants so they no longer have any permission to use your KMS key to decrypt the 256-bit data key required to decrypt data in your database. If you need to launch your encrypted database again, a similar set of three grants will be dynamically created with the RDS database, EC2 instance, and EBS volume IDs used to scope down permissions on the AWS KMS key.
The process described in the previous paragraphs is graphically shown in Figure 1:
Considering all the AWS KMS key permissions that are added and removed as a part of launching a database, you might ask why not just use the key policy document to make these changes? A KMS key allows only one key policy with a maximum document size of 32 KB. Because one key could be used to encrypt any number of AWS resources, trying to dynamically add and remove scoped-down permissions related to each resource to the key policy document creates two risks. First, the maximum allowable size of the key policy document (32KB) might be exceeded. Second, depending on how many resources are being accessed concurrently, you may exceed the request rate quota for the PutKeyPolicy API action in AWS KMS.
In contrast, there can be any number of grants on a given AWS KMS key, each grant specifying a scoped-down permission for the use of a KMS key with any AWS service that integrated with AWS KMS. Grant creation and deletion is also designed for much higher-volume request rates than modifications to the key policy document. Finally, permission to call PutKeyPolicy is a highly privileged permission, as it lets the caller make unrestricted changes to the permissions on the key, including changes to administrative permissions to disable or schedule the key for deletion. Grants on a key can only allow permissions to use the key, not administer the key. Also, grants that allow the creation of other grants by other IAM principals prohibit the escalation of privilege. In the RDS example above, the permissions RDS receives from the IAM principal in your account during the first CreateGrant request cannot be more permissive than what you defined for the IAM principal in the KMS key policy. The permissions RDS gives to EC2 and EBS during the database creation process cannot be more permissive than the original permission RDS has from the initial grant. This design ensures that AWS services cannot escalate their privileges and use your KMS key for purposes different than what you intend.
Best practices for using AWS KMS grants
AWS KMS grants are a powerful tool to dynamically define permissions to use keys. They are automatically created when you use server-side encryption features in various AWS services. You can also use grants to control permission in your own applications that perform client-side encryption. Here are some best practices to consider:
Design the permissions to be as scoped down as possible. Use a specific grantee principal, such as an IAM role, and give the principal access only to the AWS KMS API actions that are needed. You can further limit the scope of grants with the Encryption Context parameter by using any element you want to ensure callers are using the AWS KMS key only for the intended purpose. Below is a specific example that grants AWS account 123456789012 permission to call the GenerateDataKey or Decrypt APIs, but only if the supplied encryption context for customerID is 5678.
This grant could prevent your application from decrypting data belonging to customer “5678” without explicitly passing the expected customerID in the request to AWS KMS. This may be a useful defense-in-depth mechanism to prevent unauthorized access to your customers’ data if your application’s AWS credentials were compromised and used from a different caller who doesn’t know that encryption context is a required parameter for all reads and writes in order to encrypt and decrypt data.
Remember that grants don’t automatically expire. Your code needs to retire or revoke them once you know the permission is no longer needed on the KMS key. Grants that aren’t retired are leftover permissions that might create a security risk for encrypted resources. See retiring and revoking grants in the AWS KMS developer guide for more detail.
Avoid creating duplicate grants. A duplicate grant is a grant that shares the same AWS KMS key ID, API actions, grantee principal, encryption context, and name. If you retire the original grant after use and not the duplicates, then the leftover duplicate grants can lead to unintended access to encrypt or decrypt data.
Recent performance improvements to AWS KMS grants: Removing a resource quota
For customers who use AWS KMS to encrypt resources in AWS services that use grants, there used to be cases where AWS KMS had to enforce a quota on the number of concurrently active resources that could be encrypted under the same KMS key. For example, customers of Amazon RDS, Amazon WorkSpaces, or Amazon EBS would run into this quota at very large scale. This was the Grants for a given principal per key quota and was previously set to 500. You might have seen the error message “Keys only support 500 grants per grantee principal in this region” when trying to create a resource in one of these services.
We recently made a change to AWS KMS to remove this quota entirely and this error message no longer exists. With this quota removed, you can now attach unlimited grants to any KMS key when using any AWS service.
Summary
In this blog post, you’ve seen how services such as Amazon RDS use AWS KMS grants to pass scoped-down permissions through the Amazon EC2 and Amazon EBS instances. You also saw some best practices for using AWS KMS grants in your own applications. Finally, you learned about how AWS KMS has improved grants by removing one of the resource quotas.
Below are some additional resources for AWS KMS and grants.
June 5, 2021: We’ve updated Figure 1: User request flow.
Authorizing functionality of an application based on group membership is a best practice. If you’re building APIs with Amazon API Gateway and you need fine-grained access control for your users, you can use Amazon Cognito. Amazon Cognito allows you to use groups to create a collection of users, which is often done to set the permissions for those users. In this post, I show you how to build fine-grained authorization to protect your APIs using Amazon Cognito, API Gateway, and AWS Identity and Access Management (IAM).
As a developer, you’re building a customer-facing application where your users are going to log into your web or mobile application, and as such you will be exposing your APIs through API Gateway with upstream services. The APIs could be deployed on Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), AWS Lambda, or Elastic Load Balancing where each of these options will forward the request to your Amazon Elastic Compute Cloud (Amazon EC2) instances. Additionally, you can use on-premises services that are connected to your Amazon Web Services (AWS) environment over an AWS VPN or AWS Direct Connect. It’s important to have fine-grained controls for each API endpoint and HTTP method. For instance, the user should be allowed to make a GET request to an endpoint, but should not be allowed to make a POST request to the same endpoint. As a best practice, you should assign users to groups and use group membership to allow or deny access to your API services.
Solution overview
In this blog post, you learn how to use an Amazon Cognito user pool as a user directory and let users authenticate and acquire the JSON Web Token (JWT) to pass to the API Gateway. The JWT is used to identify what group the user belongs to, as mapping a group to an IAM policy will display the access rights the group is granted.
The following figure shows the basic architecture and information flow for user requests.
Figure 1: User request flow
Let’s go through the request flow to understand what happens at each step, as shown in Figure 1:
A user logs in and acquires an Amazon Cognito JWT ID token, access token, and refresh token. To learn more about each token, see using tokens with user pools.
A RestAPI request is made and a bearer token—in this solution, an access token—is passed in the headers.
API Gateway forwards the request to a Lambda authorizer—also known as a custom authorizer.
The Lambda authorizer verifies the Amazon Cognito JWT using the Amazon Cognito public key. On initial Lambda invocation, the public key is downloaded from Amazon Cognito and cached. Subsequent invocations will use the public key from the cache.
The Lambda authorizer looks up the Amazon Cognito group that the user belongs to in the JWT and does a lookup in Amazon DynamoDB to get the policy that’s mapped to the group.
Lambda returns the policy and—optionally—context to API Gateway. The context is a map containing key-value pairs that you can pass to the upstream service. It can be additional information about the user, the service, or anything that provides additional information to the upstream service.
The API Gateway policy engine evaluates the policy.
Note: Lambda isn’t responsible for understanding and evaluating the policy. That responsibility falls on the native capabilities of API Gateway.
The request is forwarded to the service.
Note: To further optimize Lambda authorizer, the authorization policy can be cached or disabled, depending on your needs. By enabling cache, you could improve the performance as the authorization policy will be returned from the cache whenever there is a cache key match. To learn more, see Configure a Lambda authorizer using the API Gateway console.
Let’s have a closer look at the following example policy that is stored as part of an item in DynamoDB.
Based on this example policy, the user is allowed to make calls to the petstore API. For version v1, the user can make requests to any verb and any path, which is expressed by an asterisk (*). For v2, the user is only allowed to make a GET request for path /status. To learn more about how the policies work, see Output from an Amazon API Gateway Lambda authorizer.
Getting started
For this solution, you need the following prerequisites:
Let’s review each service, and how those will be used, before creating the resources for this solution.
Amazon Cognito user pool
A user pool is a user directory in Amazon Cognito. With a user pool, your users can log in to your web or mobile app through Amazon Cognito. You use the Amazon Cognito user directory directly, as this sample solution creates an Amazon Cognito user. However, your users can also log in through social IdPs, OpenID Connect (OIDC), and SAML IdPs.
Lambda as backing API service
Initially, you create a Lambda function that serves your APIs. API Gateway forwards all requests to the Lambda function to serve up the requests.
An API Gateway instance and integration with Lambda
Next, you create an API Gateway instance and integrate it with the Lambda function you created. This API Gateway instance serves as an entry point for the upstream service. The following bash command below creates an Amazon Cognito user pool, a Lambda function, and an API Gateway instance. The command then configures proxy integration with Lambda and deploys an API Gateway stage.
Deploy the sample solution
From within the directory where you downloaded the sample code from GitHub, run the following command to generate a random Amazon Cognito user password and create the resources described in the previous section.
$ bash ./helper.sh cf-create-stack-gen-password
...
Successfully created CloudFormation stack.
When the command is complete, it returns a message confirming successful stack creation.
Validate Amazon Cognito user creation
To validate that an Amazon Cognito user has been created successfully, run the following command to open the Amazon Cognito UI in your browser and then log in with your credentials.
Note: When you run this command, it returns the user name and password that you should use to log in.
$ bash ./helper.sh open-cognito-ui
Opening Cognito UI. Please use following credentials to login:
Username: cognitouser
Password: xxxxxxxx
Alternatively, you can open the CloudFormation stack and get the Amazon Cognito hosted UI URL from the stack outputs. The URL is the value assigned to the CognitoHostedUiUrl variable.
Since we haven’t installed a web application that would respond to the redirect request, Amazon Cognito will redirect to localhost, which might look like an error. The key aspect is that after a successful log in, there is a URL similar to the following in the navigation bar of your browser:
Before you protect the API with Amazon Cognito so that only authorized users can access it, let’s verify that the configuration is correct and the API is served by API Gateway. The following command makes a curl request to API Gateway to retrieve data from the API service.
The expected result is that the response will be a list of pets. In this case, the setup is correct: API Gateway is serving the API.
Protect the API
To protect your API, the following is required:
DynamoDB to store the policy that will be evaluated by the API Gateway to make an authorization decision.
A Lambda function to verify the user’s access token and look up the policy in DynamoDB.
Let’s review all the services before creating the resources.
Lambda authorizer
A Lambda authorizer is an API Gateway feature that uses a Lambda function to control access to an API. You use a Lambda authorizer to implement a custom authorization scheme that uses a bearer token authentication strategy. When a client makes a request to one of the API operations, the API Gateway calls the Lambda authorizer. The Lambda authorizer takes the identity of the caller as input and returns an IAM policy as the output. The output is the policy that is returned in DynamoDB and evaluated by the API Gateway. If there is no policy mapped to the caller identity, Lambda will generate a deny policy and request will be denied.
DynamoDB table
DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. This is ideal for this use case to ensure that the Lambda authorizer can quickly process the bearer token, look up the policy, and return it to API Gateway. To learn more, see Control access for invoking an API.
The final step is to create the DynamoDB table for the Lambda authorizer to look up the policy, which is mapped to an Amazon Cognito group.
Figure 3 illustrates an item in DynamoDB. Key attributes are:
Group, which is used to look up the policy.
Policy, which is returned to API Gateway to evaluate the policy.
Figure 3: DynamoDB item
Based on this policy, the user that is part of the Amazon Cognito group pet-veterinarian is allowed to make API requests to endpoints https://<domain>/<api-gateway-stage>/petstore/v1/* and https://<domain>/<api-gateway-stage>/petstore/v2/status for GET requests only.
Update and create resources
Run the following command to update existing resources and create a Lambda authorizer and DynamoDB table.
The request is denied with the message Unauthorized. At this point, the Amazon API Gateway expects a header named Authorization (case sensitive) in the request. If there’s no authorization header, the request is denied before it reaches the lambda authorizer. This is a way to filter out requests that don’t include required information.
Use the following command for the next test. In this test, you pass the required header but the token is invalid because it wasn’t issued by Amazon Cognito but is a simple JWT-format token stored in ./helper.sh. To learn more about how to decode and validate a JWT, see decode and verify an Amazon Cognito JSON token.
$ bash ./helper.sh curl-api-invalid-token
{"Message":"User is not authorized to access this resource"}
This time the message is different. The Lambda authorizer received the request and identified the token as invalid and responded with the message User is not authorized to access this resource.
To make a successful request to the protected API, your code will need to perform the following steps:
Use a user name and password to authenticate against your Amazon Cognito user pool.
Acquire the tokens (id token, access token, and refresh token).
Make an HTTPS (TLS) request to API Gateway and pass the access token in the headers.
Before the request is forwarded to the API service, API Gateway receives the request and passes it to the Lambda authorizer. The authorizer performs the following steps. If any of the steps fail, the request is denied.
Retrieve the public keys from Amazon Cognito.
Cache the public keys so the Lambda authorizer doesn’t have to make additional calls to Amazon Cognito as long as the Lambda execution environment isn’t shut down.
The access token has claims such as Amazon Cognito assigned groups, user name, token use, and others, as shown in the following example (some fields removed).
Finally, let’s programmatically log in to Amazon Cognito UI, acquire a valid access token, and make a request to API Gateway. Run the following command to call the protected API.
This time, you receive a response with data from the API service. Let’s examine the steps that the example code performed:
Lambda authorizer validates the access token.
Lambda authorizer looks up the policy in DynamoDB based on the group name that was retrieved from the access token.
Lambda authorizer passes the IAM policy back to API Gateway.
API Gateway evaluates the IAM policy and the final effect is an allow.
API Gateway forwards the request to Lambda.
Lambda returns the response.
Let’s continue to test our policy from Figure 3. In the policy document, arn:aws:execute-api:*:*:*/*/GET/petstore/v2/status is the only endpoint for version V2, which means requests to endpoint /GET/petstore/v2/pets should be denied. Run the following command to test this.
$ bash ./helper.sh curl-protected-api-not-allowed-endpoint
{"Message":"User is not authorized to access this resource"}
Note: Now that you understand fine grained access control using Cognito user pool, API Gateway and lambda function, and you have finished testing it out, you can run the following command to clean up all the resources associated with this solution:
In this post, you learned how IAM and Amazon Cognito can be used to provide fine-grained access control for your API behind API Gateway. You can use this approach to transparently apply fine-grained control to your API, without having to modify the code in your API, and create advanced policies by using IAM condition keys.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Cognito forum or contact AWS Support.
Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.
As most developers can attest, dealing with security protocols and identity tokens, as well as user and device authentication, can be challenging. Imagine having multiple protocols, multiple tokens, 200M+ users, and thousands of device types, and the problem can explode in scope. A few years ago, we decided to address this complexity by spinning up a new initiative, and eventually a new team, to move the complex handling of user and device authentication, and various security protocols and tokens, to the edge of the network, managed by a set of centralized services, and a single team. In the process, we changed end-to-end identity propagation within the network of services to use a cryptographically-verifiable token-agnostic identity object.
Read on to learn more about this journey and how we have been able to:
Reduce complexity for service owners, who no longer need to have knowledge of and responsibility for terminating security protocols and dealing with myriad security tokens,
Improve security by delegating token management to services and teams with expertise in this area, and
Improve audit-ability and forensic analysis.
How We Got Here
Netflix started as a website that allowed members to manage their DVD queue. This website was later enhanced with the capability to stream content. Streaming devices came a bit later, but these initial devices were limited in capability. Over time, devices increased in capability and functions that were once only accessible on the website became accessible through streaming devices. Scale of the Netflix service was growing rapidly, with over 2000 device types supported.
Services supporting these functions now had an increased burden of being able to understand multiple tokens and security protocols in order to identify the user and device and authorize access to those functions. The whole system was quite complex, and starting to become brittle. Plus, the architecture of the Edge tier was evolving to a PaaS (platform as a service) model, and we had some tough decisions to make about how, and where, to handle identity token handling.
To demonstrate the complexity of the system, following is a description of how the user login flow worked prior to the changes described in this article:
At the highest level, the steps involved in this (greatly simplified) flow are as follows:
User enters their credentials and the Netflix client transmits the credentials, along with the ESN of the device to the Edge gateway, AKA Zuul.
Zuul redirects the user call to the API /login endpoint.
The API server orchestrates backend systems to authenticate the user.
Upon successful authentication of the claims provided, the API server sends a cookie response back upstream, including the customerId (a Long), the ESN (a String) and an expiration directive.
Zuul sends the Cookies back to the Netflix client.
This model had some problems, e.g.:
Externally valid tokens were being minted deep down in the stack and they needed to be propagated all the way upstream, opening possibilities for them to be logged inappropriately or potentially mismanaged.
Upstream systems had to reopen the tokens to identify the user logging in and potentially manage multiple parallel identity data structures, which could easily get out of sync.
Multiple Protocols & Tokens
The example above shows one flow, dealing with one protocol (HTTP/S) and one type of token (Cookies). There are several protocols and tokens in use across the Netflix streaming product, as summarized below:
These tokens were consumed by, and potentially mutated by, several systems within the Netflix streaming ecosystem, for example:
To complicate things further, there were multiple methods for transmitting these tokens, or the data contained therein, from system to system. In some cases, tokens were cracked open and identity data elements extracted as simple primitives or strings to be used in API calls, or passed from system to system via request context headers, or even as URL parameters. There were no checks in place to ensure the integrity of the tokens or the data contained therein.
At Netflix Scale
Meanwhile, the scale at which Netflix operated grew exponentially. At the time of this article, Netflix has 200M+ subscribers, with over a billion devices. We are serving over 2.5 million requests per second, a large percentage of which require some form of authentication. In the old architecture, each of these requests resulted in an API call to authenticate the claims presented with the request, as shown:
EdgePaas Enters the Picture
To further complicate the situation, the Edge Engineering team was in the middle of migrating from an old API server architecture to a new PaaS-based approach. As we migrated to EdgePaaS, front-end services were moved from the Java-based API to a BFF (backend for frontend), aka NodeQuark, as shown:
This model enables front-end engineers to own and operate their services outside of the core API framework. However, this introduced another layer of complexity — how would these NodeQuark services deal with identity tokens? NodeQuark services are written in JavaScript and terminating a protocol as complex as MSL would have been difficult and wasteful, as would replicating all of the logic for token management.
So, Where Were We Again?
To summarize, we found ourselves with a complex and inefficient solution for handling authentication and identity tokens at massive scale. We had multiple types and sources of identity tokens, each requiring special handling, the logic for which was replicated in various systems. Critical identity data was being propagated throughout the server ecosystem in an inconsistent fashion.
Edge Authentication to the Rescue
We realized that in order to solve this problem, a unified identity model was needed. We would need to process authentication tokens (and protocols) further upstream. We did this by moving authentication and protocol termination to the edge of the network, and created a new integrity-protected token-agnostic identity object to propagate throughout the server ecosystem.
Moving Authentication to the Edge
Keeping in mind our objectives to improve security and reduce complexity, and ultimately provide a better user experience, we strategized on how to centralize device authentication operations and user identification and authentication token management to the services edge.
At a high-level, Zuul (cloud gateway) was to become the termination point for token inspection and payload encryption/decryption. In the case that Zuul would be unable to handle these operations (a small percentage), e.g., if tokens were not present, needed to be renewed, or were otherwise invalid, Zuul would delegate those operations to a new set of Edge Authentication Services to handle cryptographic key exchange and token creation or renewal.
Edge Authentication Services
Edge Authentication Services (EAS) is both an architectural concept of moving authentication and identification of devices and users higher up on the stack to the cloud edge, as well as a suite of services that have been developed to handle each token type.
EAS is functionally a series of filters that run in Zuul, which may call out to external services to support their domain, e.g., to a service to handle MSL tokens or another for Cookies. EAS also covers the read-only processing of tokens to create Passports (more on that later).
The basic pattern for how EAS handles requests is as follows:
For each request coming into the Netflix service, the EAS Inbound Filter in Zuul inspects the tokens provided by the device client and either passes through the request to the Passport Injection Filter, or delegates to one of the Edge Authentication Services to process. The Passport Injection Filter generates a token-agnostic identity to propagate down through the rest of the server ecosystem. On the response path, the EAS Outbound Filter determines, with help from the Edge Authentication Services as needed, generates the tokens needed to send back to the client device.
The system architecture now takes the form of:
Notice that tokens never traverse past the Edge gateway / EAS boundary. The MSL security protocol is terminated at the Edge and all tokens are cracked open and identity data is propagated through the server ecosystem in a token-agnostic manner.
A Note on Resilience
On the happy path, Zuul is able to process the large percentage of tokens that are valid and not expired, and the Edge Auth Services handle the remainder of the requests.
The EAS services are designed to be fault tolerant, e.g., in the case where Zuul identifies that Cookies are valid, but expired, and the renewal call to EAS fails or is latent:
In this failure scenario, the EAS filter in Zuul will be lenient and allow the resolved identity to be propagated and will indicate that the renewal call should be rescheduled on the next request.
Token-Agnostic Identity (Passport)
An easily mutable identity structure would not suffice because that would mean passing less trusted identities from service to service. A token-agnostic identity structure was needed.
We introduced an identity structure called “Passport” which allowed us to propagate the user and device identity information in a uniform way. The Passport is also a kind of token, but there are many benefits to using an internal structure that differs from external tokens. However, downstream systems still need access to the user and device identity.
A Passport is a short-lived identity structure created at the Edge for each request, i.e., it is scoped to the life of the request and it is completely internal to the Netflix ecosystem. These are generated in Zuul via a set of Identity Filters. A Passport contains both user & device identity, is in protobuf format, and is integrity protected by HMAC.
Passport Structure
As noted above, the Passport is modeled as a Protocol Buffer. At the highest level, the definition of the Passport is as follows:
messagePassport {
Header header = 1;
UserInfo user_info = 2;
DeviceInfo device_info = 3;
Integrity user_integrity = 4;
Integrity device_integrity = 5;
}
The Header element communicates the name of the service that created the Passport. What’s more interesting is what is propagated related to the user and device.
User & Device Information
The UserInfo element contains all of the information required to identify the user on whose behalf requests are being made, with the DeviceInfo element containing all of the information required for the device on which the user is visiting Netflix:
Both UserInfo and DeviceInfo carry the Source and PassportAuthenticationLevel for the request. The Source list is a classification of claims, with the protocol being used and the services used to validate the claims. The PassportAuthenticationLevel is the level of trust that we put into the authentication claims.
enum Source {
NONE = 0;
COOKIE = 1;
COOKIE_INSECURE = 2;
MSL = 3;
PARTNER_TOKEN = 4;
…
}
enum PassportAuthenticationLevel {
LOW = 1; // untrusted transport
HIGH = 2; // secure tokens over TLS
HIGHEST = 3; // MSL or user credentials
}
Downstream applications can use these values to make Authorization and/or user experience decisions.
Passport Integrity
The integrity of the Passport is protected via an HMAC (hash-based message authentication code), which is a specific type of MAC involving a crytographic hash function and a secret cryptographic key. It may be used to simultaneously verify both the data integrity and authenticity of a message.
User and device integrity are defined as:
messageIntegrity {
int32 version = 1;
string key_name = 2;
bytes hmac = 3;
}
Version 1 of the Integrity element uses SHA-256 for the HMAC, which is encoded as a ByteArray. Future versions of Integrity may use a different has function or encoding. In version 1, the HMAC field contains the 256 bits from MacSpec.SHA_256.
Integrity protection guarantees that Passport field are not mutated after the Passport is created. Client applications can use the Passport Introspector to check the integrity of the Passport before using any of the values contained therein.
Passport Introspector
The Passport object itself is opaque; clients can use the Passport Introspector to extract the Passport from the headers and retrieve the contents inside it. The Passport Introspector is a wrapper over the Passport binary data. Clients create an Introspector via a factory and then have access to basic accessor methods:
public interface PassportIntrospector {
Long getCustomerId();
Long getAccountOwnerId();
String getEsn();
Integer getDeviceTypeId();
String getPassportAsString();
…
}
Passport Actions
In the Passport protocol buffer definition shown above, there are Passport Actions defined:
messageUserInfo {
repeated UserAction actions = 12;
…
}
messageDeviceInfo {
repeated DeviceAction actions = 7;
…
}
Passport Actions are explicit signals sent by downstream services, when an update to user or device identity has been performed. The signal is used by EAS to either create or update the corresponding type of token.
Login Flow, Revisited
Let’s wrap up with an example of all of these solutions working together.
With the movement of authentication and protocol termination to the Edge, and the introduction of Passports as identity, the Login Flow described earlier has morphed into the following:
User enters their credentials and the Netflix client transmits the credentials, along with the ESN of the device to the Edge gateway, AKA Zuul.
Identity filters running in Zuul generate a device-bound Passport and pass it along to the API /login endpoint.
The API server propagates the Passport to the mid-tier services responsible for authentication the user.
Upon successful authentication of the claims provided, these services create a Passport Action and send it, along with the original Passport, back up stream to API and Zuul.
Zuul makes a call to the Cookie Service to resolve the Passport and Passport Actions and sends the Cookies back to the Netflix client.
Key Benefits and Learnings
Simplified Authorization
One of the reasons there were external tokens flowing into downstream systems was because authorization decisions often depend on authentication claims in tokens and the trust associated with each token type. In our Passport structure, we have assigned levels to this trust, meaning that systems requiring authorization decisions can write sensible rules around the Passport instead of replicating the trust rules in code across many services.
An Explicit and Extensible Identity Model
Having a structure that is the canonical identity is very useful. Alternatives where identity primitives are passed around are brittle and hard to debug. If the customer identity changed from service A to service D in a call chain, who changed it? Once the identity structure is passed through all key systems, it is relatively easy to add new external token types, new trust levels, or new ways to represent identity.
Operational Concerns and Visibility
Having a structure, like Passport, allows you to define the services that can write a Passport and other services can validate it. When the Passport is propagated and when we see it in logs, we can open it up, validate it, and know what the identity is. We also know the provenance of the Passport, and can trace it back to where it entered the system. This makes the debugging of any identity-related anomalies much easier.
Reduced Downstream System Complexity & Load
Passing a uniform structure to downstream systems means that those systems can easily look up the device and user identity, using an introspection library. Instead of having separate handling for each type of external token, they can use the common structure.
By offloading token processing from these systems to the central Edge Authentication Services, downstream systems saw significant gains in CPU, request latency, and garbage collection metrics, all of which help reduce cluster footprint and cloud costs. The following examples of these gains are from the primary API service.
In the prior implementation, it was necessary to incur decryption/termination costs twice per request because we needed the ability to route at the edge but also needed rich termination in the downstream service. Some of the performance improvement is due to consolidation of this — MSL requests now only need to be processed once.
CPU to RPS Ratio
Offloading token processing resulted in a 30% reduction in CPU cost per request and a 40% reduction in load average. The following graph shows the CPU to RPS ratio, where lower is better:
API Response Time
Response times for all calls on the API service showed significant improvement, with a 30% reduction in average latency and a 20% drop in 99th percentile latency:
Garbage Collection
The API service also saw a significant reduction in GC pressure and GC pause times, as shown in the Stop The World Garbage Collection metrics:
Developer Velocity
Abstracting these authentication and identity-related concerns away from the developers of microservices means that they can focus on their core domain. Changes in this area are now done once, and in one set of specialized services, versus being distributed across multiple.
What’s Next?
Strong(er) Authentication
We are currently expanding the Edge Authentication Services to support Multi-Factor Authentication via a new service called “Resistor”. We selectively introduce the second factor for connections that are suspicious, based on machine learning models. As we onboard new flows, we are introducing new factors, e.g., one-time passwords (OTP) sent to email or phone, push notifications to mobile devices, and third-party authenticator applications. We may also explore opt-in Multi-Factor Authentication for users who desire the added security on their accounts.
Flexible Authorization
Now that we have a verified identity flowing through the system, we can use that as a strong signal for authorization decisions. Last year, we started to explore a new Product Access Strategy (PACS) and are currently working on moving it into production for several new experiences in the Netflix streaming product. PACS recently powered the experience access control for the Streamfest, a weekend of free Netflix in India.
Want More?
Team members presented this work at QCon San Francisco (and were two of the top three attended talks at the conference!):
In this post, I’ll be showing you how to configure Amazon Cognito as an OpenID provider (OP) with a single-page web application.
This use case describes using Amazon Cognito to integrate with an existing authorization system following the OpenID Connect (OIDC) specification. OIDC is an identity layer on top of the OAuth 2.0 protocol to enable clients to verify the identity of users. Amazon Cognito lets you add user sign-up, sign-in, and access control to your web and mobile apps quickly and easily. Some key reasons customers select Amazon Cognito include:
Simplicity of implementation: The console is very intuitive; it takes a short time to understand how to configure and use Amazon Cognito. Amazon Cognito also has key out-of-the-box functionality, including social sign-in, multi-factor authentication (MFA), forgotten password support, and infrastructure as code (AWS CloudFormation) support.
Ability to customize workflows: Amazon Cognito offers the option of a hosted UI where users can sign-in directly to Amazon Cognito or sign-in via social identity providers such as Amazon, Google, Apple, and Facebook. The Amazon Cognito hosted UI and workflows help save your team significant time and effort.
OIDC support: Amazon Cognito can securely pass user profile information to an existing authorization system following the ODIC authorization code flow. The authorization system uses the user profile information to secure access to the app.
Amazon Cognito overview
Amazon Cognito follows the OIDC specification to authenticate users of web and mobile apps. Users can sign in directly through the Amazon Cognito hosted UI or through a federated identity provider, such as Amazon, Facebook, Apple, or Google. The hosted UI workflows include sign-in and sign-up, password reset, and MFA. Since not all customer workflows are the same, you can customize Amazon Cognito workflows at key points with AWS Lambda functions, allowing you to run code without provisioning or managing servers. After a user authenticates, Amazon Cognito returns standard OIDC tokens. You can use the user profile information in the ID token to grant your users access to your own resources or you can use the tokens to grant access to APIs hosted by Amazon API Gateway. You can also exchange the tokens for temporary AWS credentials to access other AWS services.
Figure 1: Amazon Cognito sign-in flow
OAuth 2.0 and OIDC
OAuth 2.0 is an open standard that allows a user to delegate access to their information to other websites or applications without handing over credentials. OIDC is an identity layer on top of OAuth 2.0 that uses OAuth 2.0 flows. OAuth 2.0 defines a number of flows to manage the interaction between the application, user, and authorization server. The right flow to use depends on the type of application.
The client credentials flow is used in machine-to-machine communications. You can use the client credentials flow to request an access token to access your own resources, which means you can use this flow when your app is requesting the token on its own behalf, not on behalf of a user. The authorization code grant flow is used to return an authorization code that is then exchanged for user pool tokens. Because the tokens are never exposed directly to the user, they are less likely to be shared broadly or accessed by an unauthorized party. However, a custom application is required on the back end to exchange the authorization code for user pool tokens. For security reasons, we recommend the Authorization Code Flow with Proof Key Code Exchange (PKCE) for public clients, such as single-page apps or native mobile apps.
The following table shows recommended flows per application type.
Application
CFlow
Description
Machine
Client credentials
Use this flow when your application is requesting the token on its own behalf, not on behalf of the user
Web app on a server
Authorization code grant
A regular web app on a web server
Single-page app
Authorization code grant PKCE
An app running in the browser, such as JavaScript
Mobile app
Authorization code grant PKCE
iOS or Android app
Securing the authorization code flow
Amazon Cognito can help you achieve compliance with regulatory frameworks and certifications, but it’s your responsibility to use the service in a way that remains compliant and secure. You need to determine the sensitivity of the user profile data in Amazon Cognito; adhere to your company’s security requirements, applicable laws and regulations; and configure your application and corresponding Amazon Cognito settings appropriately for your use case.
We recommend that you use the authorization code flow with PKCE for single-page apps. Applications that use PKCE generate a random code verifier that’s created for every authorization request. Proof Key for Code Exchange by OAuth Public Clients has more information on use of a code verifier. In the following sections, I will show you how to set up the Amazon Cognito authorization endpoint for your app to support a code verifier.
The authorization code flow
In OpenID terms, the app is the relying party (RP) and Amazon Cognito is the OP. The flow for the authorization code flow with PKCE is as follows:
The user enters the app home page URL in the browser and the browser fetches the app.
The app generates the PKCE code challenge and redirects the request to the Amazon Cognito OAuth2 authorization endpoint (/oauth2/authorize).
Amazon Cognito responds back to the user’s browser with the Amazon Cognito hosted sign-in page.
The user signs in with their user name and password, signs up as a new user, or signs in with a federated sign-in. After a successful sign-in, Amazon Cognito returns the authorization code to the browser, which redirects the authorization code back to the app.
The app sends a request to the Amazon Cognito OAuth2 token endpoint (/oauth2/token) with the authorization code, its client credentials, and the PKCE verifier.
Amazon Cognito authenticates the app with the supplied credentials, validates the authorization code, validates the request with the code verifier, and returns the OpenID tokens, access token, ID token, and refresh token.
The app validates the OpenID ID token and then uses the user profile information (claims) in the ID token to provide access to resources.(Optional) The app can use the access token to retrieve the user profile information from the Amazon Cognito user information endpoint (/userInfo).
Amazon Cognito returns the user profile information (claims) about the authenticated user to the app. The app then uses the claims to provide access to resources.
The following diagram shows the authorization code flow with PKCE.
Figure 2: Authorization code flow
Implementing an app with Amazon Cognito authentication
Now that you’ve learned about Amazon Cognito OAuth implementation, let’s create a working example app that uses Amazon Cognito OAuth implementation. You’ll create an Amazon Cognito user pool along with an app client, the app, an Amazon Simple Storage Service (Amazon S3) bucket, and an Amazon CloudFront distribution for the app, and you’ll configure the app client.
Step 1. Create a user pool
Start by creating your user pool with the default configuration.
Create a user pool:
Go to the Amazon Cognito console and select Manage User Pools. This takes you to the User Pools Directory.
Select Create a user pool in the upper corner.
Enter a Pool name, select Review defaults, and select Create pool.
Copy the Pool ID, which will be used later to create your single-page app. It will be something like region_xxxxx. You will use it to replace the variable YOUR_USERPOOL_ID in a later step.(Optional) You can add additional features to the user pool, but this demonstration uses the default configuration. For more information see, the Amazon Cognito documentation.
The following figure shows you entering the user pool name.
Figure 3: Enter a name for the user pool
The following figure shows the resulting user pool configuration.
Figure 4: Completed user pool configuration
Step 2. Create a domain name
The Amazon Cognito hosted UI lets you use your own domain name or you can add a prefix to the Amazon Cognito domain. This example uses an Amazon Cognito domain with a prefix.
Create a domain name:
Sign in to the Amazon Cognito console, select Manage User Pools, and select your user pool.
Under App integration, select Domain name.
In the Amazon Cognito domain section, add your Domain prefix (for example, myblog).
Select Check availability. If your domain isn’t available, change the domain prefix and try again.
When your domain is confirmed as available, copy the Domain prefix to use when you create your single-page app. You will use it to replace the variable YOUR_COGNITO_DOMAIN_PREFIX in a later step.
Choose Save changes.
The following figure shows creating an Amazon Cognito hosted domain.
Figure 5: Creating an Amazon Cognito hosted UI domain
Step 3. Create an app client
Now create the app client user pool. An app client is where you register your app with the user pool. Generally, you create an app client for each app platform. For example, you might create an app client for a single-page app and another app client for a mobile app. Each app client has its own ID, authentication flows, and permissions to access user attributes.
Create an app client:
Sign in to the Amazon Cognito console, select Manage User Pools, and select your user pool.
Under General settings, select App clients.
Choose Add an app client.
Enter a name for the app client in the App client name field.
Uncheck Generate client secret and accept the remaining default configurations.
Note: The client secret is used to authenticate the app client to the user pool. Generate client secret is unchecked because you don’t want to send the client secret on the URL using client-side JavaScript. The client secret is used by applications that have a server-side component that can secure the client secret.
Choose Create app client as shown in the following figure.
Figure 6: Create and configure an app client
Copy the App client ID. You will use it to replace the variable YOUR_APPCLIENT_ID in a later step.
The following figure shows the App client ID which is automatically generated when the app client is created.
Figure 7: App client configuration
Step 4. Create an Amazon S3 website bucket
Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance. We use Amazon S3 here to host a static website.
Sign in to the AWS Management Console and open the Amazon S3 console.
Choose Create bucket to start the Create bucket wizard.
In Bucket name, enter a DNS-compliant name for your bucket. You will use this in a later step to replace the YOURS3BUCKETNAME variable.
In Region, choose the AWS Region where you want the bucket to reside.
Note: It’s recommended to create the Amazon S3 bucket in the same AWS Region as Amazon Cognito.
Look up the region code from the region table (for example, US-East [N. Virginia] has a region code of us-east-1). You will use the region code to replace the variable YOUR_REGION in a later step.
Choose Next.
Select the Versioning checkbox.
Choose Next.
Choose Next.
Choose Create bucket.
Select the bucket you just created from the Amazon S3 bucket list.
Select the Properties tab.
Choose Static website hosting.
Choose Use this bucket to host a website.
For the index document, enter index.html and then choose Save.
Step 5. Create a CloudFront distribution
Amazon CloudFront is a fast content delivery network service that helps securely deliver data, videos, applications, and APIs to customers globally with low latency and high transfer speeds—all within a developer-friendly environment. In this step, we use CloudFront to set up an HTTPS-enabled domain for the static website hosted on Amazon S3.
On the first page of the Create Distribution Wizard, in the Web section, choose Get Started.
Choose the Origin Domain Name from the dropdown list. It will be YOURS3BUCKETNAME.s3.amazonaws.com.
For Restrict Bucket Access, select Yes.
For Origin Access Identity, select Create a New Identity.
For Grant Read Permission on Bucket, select Yes, Update Bucket Policy.
For the Viewer Protocol Policy, select Redirect HTTP to HTTPS.
For Cache Policy, select Managed-Caching Disabled.
Set the Default Root Object to index.html.(Optional) Add a comment. Comments are a good place to describe the purpose of your distribution, for example, “Amazon Cognito SPA.”
Select Create Distribution. The distribution will take a few minutes to create and update.
Copy the Domain Name. This is the CloudFront distribution domain name, which you will use in a later step as the DOMAINNAME value in the YOUR_REDIRECT_URI variable.
Step 6. Create the app
Now that you’ve created the Amazon S3 bucket for static website hosting and the CloudFront distribution for the site, you’re ready to use the code that follows to create a sample app.
Use the following information from the previous steps:
YOUR_COGNITO_DOMAIN_PREFIX is from Step 2.
YOUR_REGION is the AWS region you used in Step 4 when you created your Amazon S3 bucket.
YOUR_APPCLIENT_ID is the App client ID from Step 3.
YOUR_USERPOOL_ID is the Pool ID from Step 1.
YOUR_REDIRECT_URI, which is https://DOMAINNAME/index.html, where DOMAINNAME is your domain name from Step 5.
Create userprofile.js
Use the following text to create the userprofile.js file. Substitute the preceding pre-existing values for the variables in the text.
var myHeaders = new Headers();
myHeaders.set('Cache-Control', 'no-store');
var urlParams = new URLSearchParams(window.location.search);
var tokens;
var domain = "YOUR_COGNITO_DOMAIN_PREFIX";
var region = "YOUR_REGION";
var appClientId = "YOUR_APPCLIENT_ID";
var userPoolId = "YOUR_USERPOOL_ID";
var redirectURI = "YOUR_REDIRECT_URI";
//Convert Payload from Base64-URL to JSON
const decodePayload = payload => {
const cleanedPayload = payload.replace(/-/g, '+').replace(/_/g, '/');
const decodedPayload = atob(cleanedPayload)
const uriEncodedPayload = Array.from(decodedPayload).reduce((acc, char) => {
const uriEncodedChar = ('00' + char.charCodeAt(0).toString(16)).slice(-2)
return `${acc}%${uriEncodedChar}`
}, '')
const jsonPayload = decodeURIComponent(uriEncodedPayload);
return JSON.parse(jsonPayload)
}
//Parse JWT Payload
const parseJWTPayload = token => {
const [header, payload, signature] = token.split('.');
const jsonPayload = decodePayload(payload)
return jsonPayload
};
//Parse JWT Header
const parseJWTHeader = token => {
const [header, payload, signature] = token.split('.');
const jsonHeader = decodePayload(header)
return jsonHeader
};
//Generate a Random String
const getRandomString = () => {
const randomItems = new Uint32Array(28);
crypto.getRandomValues(randomItems);
const binaryStringItems = randomItems.map(dec => `0${dec.toString(16).substr(-2)}`)
return binaryStringItems.reduce((acc, item) => `${acc}${item}`, '');
}
//Encrypt a String with SHA256
const encryptStringWithSHA256 = async str => {
const PROTOCOL = 'SHA-256'
const textEncoder = new TextEncoder();
const encodedData = textEncoder.encode(str);
return crypto.subtle.digest(PROTOCOL, encodedData);
}
//Convert Hash to Base64-URL
const hashToBase64url = arrayBuffer => {
const items = new Uint8Array(arrayBuffer)
const stringifiedArrayHash = items.reduce((acc, i) => `${acc}${String.fromCharCode(i)}`, '')
const decodedHash = btoa(stringifiedArrayHash)
const base64URL = decodedHash.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
return base64URL
}
// Main Function
async function main() {
var code = urlParams.get('code');
//If code not present then request code else request tokens
if (code == null){
// Create random "state"
var state = getRandomString();
sessionStorage.setItem("pkce_state", state);
// Create PKCE code verifier
var code_verifier = getRandomString();
sessionStorage.setItem("code_verifier", code_verifier);
// Create code challenge
var arrayHash = await encryptStringWithSHA256(code_verifier);
var code_challenge = hashToBase64url(arrayHash);
sessionStorage.setItem("code_challenge", code_challenge)
// Redirtect user-agent to /authorize endpoint
location.href = "https://"+domain+".auth."+region+".amazoncognito.com/oauth2/authorize?response_type=code&state="+state+"&client_id="+appClientId+"&redirect_uri="+redirectURI+"&scope=openid&code_challenge_method=S256&code_challenge="+code_challenge;
} else {
// Verify state matches
state = urlParams.get('state');
if(sessionStorage.getItem("pkce_state") != state) {
alert("Invalid state");
} else {
// Fetch OAuth2 tokens from Cognito
code_verifier = sessionStorage.getItem('code_verifier');
await fetch("https://"+domain+".auth."+region+".amazoncognito.com/oauth2/token?grant_type=authorization_code&client_id="+appClientId+"&code_verifier="+code_verifier+"&redirect_uri="+redirectURI+"&code="+ code,{
method: 'post',
headers: {
'Content-Type': 'application/x-www-form-urlencoded'
}})
.then((response) => {
return response.json();
})
.then((data) => {
// Verify id_token
tokens=data;
var idVerified = verifyToken (tokens.id_token);
Promise.resolve(idVerified).then(function(value) {
if (value.localeCompare("verified")){
alert("Invalid ID Token - "+ value);
return;
}
});
// Display tokens
document.getElementById("id_token").innerHTML = JSON.stringify(parseJWTPayload(tokens.id_token),null,'\t');
document.getElementById("access_token").innerHTML = JSON.stringify(parseJWTPayload(tokens.access_token),null,'\t');
});
// Fetch from /user_info
await fetch("https://"+domain+".auth."+region+".amazoncognito.com/oauth2/userInfo",{
method: 'post',
headers: {
'authorization': 'Bearer ' + tokens.access_token
}})
.then((response) => {
return response.json();
})
.then((data) => {
// Display user information
document.getElementById("userInfo").innerHTML = JSON.stringify(data, null,'\t');
});
}}}
main();
Create the verifier.js file
Use the following text to create the verifier.js file.
var key_id;
var keys;
var key_index;
//verify token
async function verifyToken (token) {
//get Cognito keys
keys_url = 'https://cognito-idp.'+ region +'.amazonaws.com/' + userPoolId + '/.well-known/jwks.json';
await fetch(keys_url)
.then((response) => {
return response.json();
})
.then((data) => {
keys = data['keys'];
});
//Get the kid (key id)
var tokenHeader = parseJWTHeader(token);
key_id = tokenHeader.kid;
//search for the kid key id in the Cognito Keys
const key = keys.find(key =>key.kid===key_id)
if (key === undefined){
return "Public key not found in Cognito jwks.json";
}
//verify JWT Signature
var keyObj = KEYUTIL.getKey(key);
var isValid = KJUR.jws.JWS.verifyJWT(token, keyObj, {alg: ["RS256"]});
if (isValid){
} else {
return("Signature verification failed");
}
//verify token has not expired
var tokenPayload = parseJWTPayload(token);
if (Date.now() >= tokenPayload.exp * 1000) {
return("Token expired");
}
//verify app_client_id
var n = tokenPayload.aud.localeCompare(appClientId)
if (n != 0){
return("Token was not issued for this audience");
}
return("verified");
};
Create an index.html file
Use the following text to create the index.html file.
Upload the files into the Amazon S3 Bucket you created earlier
Upload the files you just created to the Amazon S3 bucket that you created in Step 4. If you’re using Chrome or Firefox browsers, you can choose the folders and files to upload and then drag and drop them into the destination bucket. Dragging and dropping is the only way that you can upload folders.
Sign in to the AWS Management Console and open the Amazon S3 console.
In the Bucket name list, choose the name of the bucket that you created earlier in Step 4.
In a window other than the console window, select the index.html file to upload. Then drag and drop the file into the console window that lists the destination bucket.
In the Upload dialog box, choose Upload.
Choose Create Folder.
Enter the name js and choose Save.
Choose the js folder.
In a window other than the console window, select the userprofile.js and verifier.js files to upload. Then drag and drop the files into the console window js folder.
Note: The Amazon S3 bucket root will contain the index.html file and a js folder. The js folder will contain the userprofile.js and verifier.js files.
Step 7. Configure the app client settings
Use the Amazon Cognito console to configure the app client settings, including identity providers, OAuth flows, and OAuth scopes.
Select App integration, and then select App client settings.
Under Enabled Identity Providers, select Cognito User Pool.(Optional) You can add federated identity providers. Adding User Pool Sign-in Through a Third-Party has more information about how to add federation providers.
Enter the Callback URL(s) where the user is to be redirected after successfully signing in. The callback URL is the URL of your web app that will receive the authorization code. In our example, this will be the Domain Name for the CloudFront distribution you created earlier. It will look something like https://DOMAINNAME/index.html where DOMAINNAME is xxxxxxx.cloudfront.net.
Note: HTTPS is required for the Callback URLs. For this example, I used CloudFront as a HTTPS endpoint for the app in Amazon S3.
Next, select Authorization code grant from the Allowed OAuth Flows and OpenID from Allowed OAuth Scopes. The OpenID scope will return the ID token and grant access to all user attributes that are readable by the client.
Choose Save changes.
Step 8. Show the app home page
Now that the Amazon Cognito user pool is configured and the sample app is built, you can test using Amazon Cognito as an OP from the sample JavaScript app you created in Step 6.
View the app’s home page:
Open a web browser and enter the app’s home page URL using the CloudFront distribution to serve your index.html page created in Step 6 (https://DOMAINNAME/index.html) and the app will redirect the browser to the Amazon Cognito /authorize endpoint.
The /authorize endpoint redirects the browser to the Amazon Cognito hosted UI, where the user can sign in or sign up. The following figure shows the user sign-in page.
Figure 8: User sign-in page
Step 9. Create a user
You can use the Amazon Cognito user pool to manage your users or you can use a federated identity provider. Users can sign in or sign up from the Amazon Cognito hosted UI or from a federated identity provider. If you configured a federated identity provider, users will see a list of federated providers that they can choose from. When a user chooses a federated identity provider, they are redirected to the federated identity provider sign-in page. After signing in, the browser is directed back to Amazon Cognito. For this post, Amazon Cognito is the only identity provider, so you will use the Amazon Cognito hosted UI to create an Amazon Cognito user.
Create a new user using Amazon Cognito hosted UI:
Create a new user by selecting Sign up and entering a username, password, and email address. Then select the Sign up button. The following figure shows the sign up screen.
Figure 9: Sign up with a new account
The Amazon Cognito sign up workflow will verify the email address by sending a verification code to that address. The following figure shows the prompt to enter the verification code.
Figure 10: Enter the verification code
Enter the code from the verification email in the Verification Code text box.
Select Confirm Account.
Step 10. Viewing the Amazon Cognito tokens and profile information
After authentication, the app displays the tokens and user information. The following figure shows the OAuth2 access token and OIDC ID token that are returned from the /token endpoint and the user profile returned from the /userInfo endpoint. Now that the user has been authenticated, the application can use the user’s email address to look up the user’s account information in an application data store. Based on the user’s account information, the application can grant/restrict access to paid content or show account information like order history.
Figure 11: Token and user profile information
Note: Many browsers will cache redirects. If your browser is repeatedly redirecting to the index.html page, clear the browser cache.
Summary
In this post, we’ve shown you how easy it is to add user authentication to your web and mobile apps with Amazon Cognito.
We created a Cognito User Pool as our user directory, assigned a domain name to the Amazon Cognito hosted UI, and created an application client for our application. Then we created an Amazon S3 bucket to host our website. Next, we created a CloudFront distribution for our Amazon S3 bucket. Then we created our application and uploaded it to our Amazon S3 website bucket. From there, we configured the client app settings with our identity provider, OAuth flows, and scopes. Then we accessed our application and used the Amazon Cognito sign-in flow to create a username and password. Finally, we logged into our application to see the OAuth and OIDC tokens.
Amazon Cognito saves you time and effort when implementing authentication with an intuitive UI, OAuth2 and OIDC support, and customizable workflows. You can now focus on building features that are important to your core business.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Cognito forum or contact AWS Support.
Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.
The collective thoughts of the interwebz
By continuing to use the site, you agree to the use of cookies. more information
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.