Tag Archives: Tenant isolation

SaaS authentication: Identity management with Amazon Cognito user pools

Post Syndicated from Shubhankar Sumar original https://aws.amazon.com/blogs/security/saas-authentication-identity-management-with-amazon-cognito-user-pools/

Amazon Cognito is a customer identity and access management (CIAM) service that can scale to millions of users. Although the Cognito documentation details which multi-tenancy models are available, determining when to use each model can sometimes be challenging. In this blog post, we’ll provide guidance on when to use each model and review their pros and cons to help inform your decision.

Cognito overview

Amazon Cognito handles user identity management and access control for web and mobile apps. With Cognito user pools, you can add sign-up, sign-in, and access control to your apps. A Cognito user pool is a user directory within a specific AWS Region where users can authenticate and register for applications. In addition, a Cognito user pool is an OpenID Connect (OIDC) identity provider (IdP). App users can either sign in directly through a user pool or federate through a third-party IdP. Cognito issues a user pool token after successful authentication, which can be used to securely access backend APIs and resources.

Cognito issues three types of tokens:

  • ID token – Contains user identity claims like name, email, and phone number. This token type authenticates users and enables authorization decisions in apps and API gateways.
  • Access token – Includes user claims, groups, and authorized scopes. This token type grants access to API operations based on the authenticated user and application permissions. It also enables fine-grained, user-based access control within the application or service.
  • Refresh token – Retrieves new ID and access tokens when these are expired. Access and ID tokens are short-lived, while the refresh token is long-lived. By default, refresh tokens expire 30 days after the user signs in, but this can be configured to a value between 60 minutes and 10 years.

You can find more information on using tokens and their contents in the Cognito documentation.

Multi-tenancy approaches

Software as a service (SaaS) architectures often use silo, pool, or bridge deployment models, which also apply to CIAM services like Cognito. The silo model isolates tenants in dedicated resources. The pool model shares resources between tenants. The bridge model connects siloed and pooled components. This post compares the Cognito silo and pool models for SaaS identity management.

It’s also possible to combine the silo and pool models by having multiple tiers of resources. For example, you could have a siloed tier for sensitive tenant data along with a pooled tier for shared functionality. This is similar to the silo model but with added routing complexity to connect the tiers. When you have multiple pools or silos, this is a similar approach to the pure silo model but with more components to manage.

More detail on these models are included in the AWS SaaS Lens.

We’ve detailed five possible patterns in the following sections and explored the scenarios where each of the patterns can be used, along with the advantages and disadvantages for each. The rest of the post delves deeper into the details of these different patterns, enabling you to make an informed decision that best aligns with your unique requirements and constraints.

Pattern 1: Representing SaaS identity with custom attributes

To implement multi-tenancy in a SaaS application, tenant context needs to be associated with user identity. This allows implementation of the multi-tenant policies and strategies that comprise our SaaS application. Cognito has user pool attributes, which are pieces of information to represent identity. There are standard attributes, such as name and email, that describe the user identity. Cognito also supports custom attributes that can be used to hold information about the user’s relationship to a tenant, such as tenantId.

By using custom attributes for multi-tenancy in Amazon Cognito, the tenant context for each user can be stored in their user profile.

To enable multi-tenancy, you can add a custom attribute like tenantId to the user profile. When a new user signs up, this tenantId attribute can be set to a value indicating which tenant the user belongs to. For example, users with tenantId “1234” belong to Tenant A, while users with tenantId “5678” belong to Tenant B.

The tenantId attribute value gets returned in the ID token after a successful user authentication. (This value can also be added to the access token through customization by using a pre-token generation Lambda trigger.) The application can then inspect this claim to determine which tenant the user belongs to. The tenantId attribute is typically managed at the SaaS platform level and is read-only to users and the application layer. (Note: SaaS providers need to configure the tenantId attribute to be read-only.)

In addition to storing a tenant ID, you can use custom attributes to model additional tenant context. For instance, attributes like tenantName, tenantTier, or tenantRegion could be defined and set appropriately for each user to provide relevant informational context for the application. However, make sure not to use custom attributes as a database—they are meant to represent identity, not store application data. Custom attributes should only contain information that is relevant for authorization decisions and JSON web token (JWT) compactness and should be relatively static because their values are stored in the Cognito directory. Updating frequently changing data requires modifying the directory, which can be cumbersome.

The custom attributes themselves need to be defined at the time of creating the Amazon Cognito user pool, and there is a maximum of 50 custom attributes that you can create. Once the pool is created, these custom attribute fields will be present on every user profile in that user pool. However, they won’t have values populated yet. The actual tenant attribute values get populated only when a new user is created in the user pool. This can be done in two ways:

  1. During user sign-up, a post confirmation AWS Lambda trigger can be used to set the appropriate tenant attribute values based on the user’s input.
  2. An admin user can provision a new user through the AdminCreateUser API operation and specify the tenant attribute values at that time.

After user creation, the custom tenant attribute values can still be updated by an administrator through the AdminUpdateUserAttributes API operation or by a user with the UpdateUserAttributes API operation, if needed. But the key point is that the custom attributes themselves must be predefined at user pool creation, while the values get set later during user creation and provisioning flows. Figure 1 shows how custom attributes are associated with an ID token and used subsequently in downstream applications.

Figure 1: Associating tenant context with custom attributes

Figure 1: Associating tenant context with custom attributes

As shown in Figure 1:

  • The custom tenant attribute values from the user profile are included in the Cognito ID token that is generated after a successful user authentication. These values can be used for access control for other AWS services, such as Amazon API Gateway.
  • You can configure Amazon API Gateway with a Lambda authorizer function that validates the ID token signature (the aws-jwt-verify library can be used for this purpose) and inspects the tenant ID claim in each request.
  • Based on the tenant ID value extracted from the ID token, the Lambda authorizer can determine which backend resources and services each authenticated user is authorized to access.

You can use this method to provide fine-grained access control, as described in this blog post, by using tenant claims as context in addition to the user claims embedded within the token. This pattern of embedding information about the user’s identity, along with details on their associated tenant, in a single token is what AWS refers to as SaaS identity.

The multi-tenancy approaches of using siloed user pools, shared pools, or custom attributes rely on embedding tenant context within the user identity. This is accomplished by having Cognito include claims with tenant information in the JWTs issued after authentication.

The JWT encodes user identity information like the username, email address, and so on. By adding custom claims that contain tenant identifiers or metadata, the tenant context gets tightly coupled to the user identity. The embedded tenant context in the JWT allows applications to implement access control and authorization based on the associated tenant for each user.

This combination of user identity information and tenant context in the issued JWT represents the SaaS identity—a unified identity spanning both user and tenant dimensions. The application uses this SaaS identity for implementing multi-tenant logic and policies.

Pattern 2: Shared user pool (pool model)

A single, shared Amazon Cognito user pool simplifies identity management for multi-tenant SaaS applications. With one consolidated pool, changes and configurations apply across tenants in one place, which can reduce overhead.

For example, you can define password complexity rules and other settings once at the user pool level, and then these settings are shared across tenants. Adding new tenants is streamlined by using the settings in the existing shared pool, without duplicating setup per tenant. This avoids deploying isolated pools when onboarding new tenants.

Additionally, the tokens issued from the shared pool are signed by the same issuer. There is no tenant-specific issuer in the tokens when using a shared pool. For SaaS apps with common identity needs, a shared multi-tenant pool minimizes friction for rapid onboarding despite that loss of per-tenant customization.

Advantages of the pool model:

  • This model uses a single shared user pool for tenants. This simplifies onboarding by setting user attributes rather than configuring multiple user pools.
  • Tenants authenticate using the same application client and user pool, which keeps the SaaS client configuration simple.

Disadvantages of the pool model:

  • Sharing one pool means that settings like password policies and MFA apply uniformly, without customization per tenant.
  • Some resource quotas are managed at a user pool level (for example, the number of application clients or customer attributes), so you need to consider quotas carefully when adopting this model.

Pattern 3: Group-based multi-tenancy (pool model)

Amazon Cognito user pools give an administrator the capability to add groups and associate users with groups. Doing so introduces specific attributes (cognito:groups and cognito:roles) that are managed and maintained by Cognito and available within the ID tokens. (Access tokens only have the cognito:groups attribute.) These groups can be used to enable multi-tenancy by creating a separate group for each tenant. Users can be assigned to the appropriate tenant group based on the value of a custom tenantId attribute. The application can then implement authorization logic to limit access to resources and data based on the user’s tenant group membership that is encoded in the tokens. This provides isolation and access control across tenants, making use of the native group constructs in Cognito rather than relying entirely on custom attributes.

The group information contained in the tokens can then be used by downstream services to make authorization decisions. Groups are often combined with custom attributes for more granular access control. For example, in the SaaS Factory Serverless SaaS – Reference Solution developed by the AWS SaaS Factory team, roles are specified by using Cognito groups, but tenant identity relies on a custom tenantId attribute. The tenant ID attribute provides isolation between tenants, while the groups define individual user roles and access privileges that apply within a tenant.

Figure 2 shows how groups are associated with the user and then the Lambda authorizer can determine which backend resources and services each authenticated user is authorized to access.

Figure 2: Group-based multi-tenancy

Figure 2: Group-based multi-tenancy

In this model, groups can provide role-based controls, while custom attributes like tenant ID provide the contextual information needed to enforce tenant isolation. The authorization decisions are then made by evaluating a user’s group memberships and attribute values in order to provide fine-grained access tailored to each tenant and user. So groups directly enable role-based checks, while custom attributes provide broader context for conditional access across tenants. Together they can provide the data that is needed to implement granular authorization in a multi-tenant application.

Advantages of group-based multi-tenancy:

  • This model uses a single shared user pool for tenants, so that onboarding requires setting user attributes rather than configuring multiple pools.
  • Tenants authenticate through the same application client and pool, keeping SaaS client configuration straightforward.

Disadvantages of group-based multi-tenancy:

  • Sharing one pool means that settings like password policies and MFA apply uniformly without per-tenant customization.
  • There is a limit of 10,000 groups per user pool.

Pattern 4: Dedicated user pool per tenant (silo model)

Another common approach for multi-tenant identity with Cognito is to provision a separate user pool for each tenant. A Cognito user pool is a user directory, so using distinct pools provides maximum isolation. However, this approach requires that you implement tenant routing logic in the application to determine which user pool a user should authenticate against, based on their tenant.

Tenant routing

With separate user pools per tenant (or application clients, as we’ll discuss later), the application needs logic to route each user to the appropriate pool (or client) for authentication. There are a few options that you can use for this approach:

  • Use a subdomain in the URL that maps to the tenant—for example, tenant1.myapp.com routes to Tenant 1’s user pool. This requires mapping subdomains to tenant pools.
  • Rely on unique email domains per tenant—for example, @tenant1.com goes to Tenant 1’s pool. This requires mapping email domains to pools.
  • Have the user select their tenant from a dropdown list. This requires the tenant choices to be configured.
  • Prompt the user to enter a tenant ID code that maps to pools. This requires mapping codes to pools.

No matter the approach you chose, the key requirements are the following:

  • A data point to identify the tenant (such as subdomain, email, selection, or code).
  • A mapping dataset that takes tenant identifying information from the user and looks up the corresponding user pool to route to for authentication.
  • Routing logic to redirect to the appropriate user pool.

For example, the AWS SaaS Factory Serverless Reference Architecture uses the approach shown in Figure 3.

Figure 3: Dedicated user pool per tenant

Figure 3: Dedicated user pool per tenant

The workflow is as follows:

  1. The user enters their tenant name during sign-in.
  2. The tenant name retrieves tenant-specific information like the user pool ID, application client ID, and API URLs.
  3. Tenant-specific information is passed to the SaaS app to initialize authentication to the correct user pool and app client, and this is used to initialize an authorization code flow.
  4. The app redirects to the Cognito hosted UI for authentication.
  5. User credentials are validated, and Cognito issues an OAuth code.
  6. The OAuth code is exchanged for a JWT token from Cognito.
  7. The JWT token is used to authenticate the user to access microservices.

Advantages of the one pool per tenant model:

  • Users exist in a single directory with no cross-tenant visibility. Tokens are issued and signed with keys that are unique to that pool.
  • Each pool can have customized security policies, like password rules or MFA requirements per tenant.
  • Pools can be hosted in different AWS Regions to meet data residency needs.

Potential disadvantages of the one pool per tenant model:

  • There are limits on the number of pools per account. (The default is 1,000 pools, and the maximum is 10,000.)
  • Additional automation is required to create multiple pools, especially with customized configurations.
  • Applications must implement tenant routing to direct authentication requests to the correct user pool.
  • Troubleshooting can be more difficult, because configuration of each pool is managed separately and tenant routing functionality is added.

In summary, separate user pools maximize tenant isolation but require more complex provisioning and routing. You might also need to consider limits on the pool count for large multi-tenant deployments.

Pattern 5: Application client per tenant (bridge model)

You can achieve some extra tenant isolation by using separate application clients per tenant in a single user pool, in addition to using groups and custom attributes. Cognito configurations from the application client, such as OAuth scopes, hosted UI customization, and security policies can be specific to each tenant. The application client also enables external IdP federation per tenant. However, user pool–level settings, such as password policy, remain shared.

Figure 4 shows how a single user pool can be configured with multiple application clients. Each of those application clients is assigned to a tenant. However, this approach requires that you implement tenant routing logic in the application to determine which application client a tenant should be mapped to (similar to the approach we discussed for the shared user pool). Once the user is authenticated, you can configure Amazon API Gateway with a Lambda authorizer function that validates the ID token signature. Subsequently, the Lambda authorizer can determine which backend resources and services each authenticated user is authorized to access.

Figure 4: Application client based multi-tenancy

Figure 4: Application client based multi-tenancy

For tenants that want to use their own IdP through SAML or OpenID Connect federation, you can create a dedicated application client that will redirect users to authenticate with the tenant’s federated IdP. This has some key benefits:

  • If a single external IdP is enabled on the application client, the hosted UI automatically redirects users without presenting Cognito sign-in screens. This provides a familiar sign-in experience for tenants and is frictionless if users have existing sessions with the tenant IdP.
  • Management of user activities like joining and leaving, passwords, and other tasks are entirely handled by the tenant in their own IdP. The SaaS provider doesn’t need to get involved in these processes.

Importantly, even with federation, Cognito still issues tokens after successful external authentication. So the SaaS provider gets consistent tokens from Cognito to validate during authorization, regardless of the IdP.

Attribute mapping

When federating with an external IdP, Amazon Cognito can dynamically map attributes to populate the tokens it issues. This allows attributes like groups, email addresses, and roles created in the IdP to be passed to Cognito during authentication and added to the tokens.

The mapping occurs upon every sign-in, overwriting the existing mapped attributes to stay in sync with the latest IdP values. Therefore, changes made in the external IdP related to mapped attributes are reflected in Cognito after signing in. If a mapped attribute is required in the Cognito user pool, like email for sign-in, it must have an equivalent in the IdP to map. The target attributes in Cognito must be configured as mutable, since immutable attributes cannot be overwritten after creation, even through mapping.

Important: For SaaS identity, tenant attributes should be defined in Cognito rather than mapped from an external IdP. This helps to prevent tenants from tampering with values and maintains isolation. However, user attributes like groups and roles can be mapped from the tenant’s IdP to manage permissions. This allows tenants to configure application roles by using their own IdP groups.

Advantages of the bridge model:

  • This model enables tenant-specific configuration like OAuth scopes, UI, and IdPs.
  • Tenant users access familiar workflows through external IdPs, and when using external IdPs, tenant user management is handled externally.
  • No custom claim mappings are needed, but can be used optionally.
  • Cognito still issues tokens for authorization.

Disadvantages of the bridge model:

  • Requires routing users to the correct app client per tenant.
  • There is a limit on the number of app clients per user pool.
  • Some user pool settings remain shared, such as password policy.
  • There is no dynamic group claim modification.

Conclusion

In this blog post, we explored various ways Amazon Cognito user pools can enable multi-tenant identity for SaaS solutions. A single shared user pool simplifies management but limits the option to customize user pool–level policies, while separate pools maximize isolation and configurability at the cost of complexity. If you use multiple application clients, you can balance tailored options like external IdPs and OAuth scopes with centralized policies in the user pool. Custom claim mappings provide flexibility but require additional logic.

These two approaches can also be combined. For example, you can have dedicated user pools for select high-tier tenants while others share a multi-tenant pool. The optimal choice depends on the specific tenant needs and on the customization that is required.

In this blog post, we have mainly focused on a static approach. You can also use a pre-token generation Lambda trigger to modify tokens by adding, changing, or removing claims dynamically. The trigger can also override the group membership in both the identity and access tokens. Other claim changes only apply to the ID token. A common use case for this trigger is injecting tenant attributes into the token dynamically.

Evaluate the pros and cons of each approach against the requirements of the SaaS architecture and tenants. Often a hybrid model works best. Cognito constructs like user pools, IdPs, and triggers provide various levers that you can use to fine-tune authentication and authorization across tenants.

For further reading on these topics, see the Common Amazon Cognito scenarios topic in the Cognito Developer Guide and the related blog post How to Use Cognito Pre-Token Generation trigger to Customize Claims in ID Tokens.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on Amazon Cognito re:Post

Shubhankar Sumar

Shubhankar Sumar
Shubhankar is a Senior Solutions Architect at AWS, working with enterprise software and SaaS customers across the UK to help architect secure, scalable, efficient and cost-effective systems. He is an experienced software engineer having built many SaaS solutions. Shubhankar specializes in building multi-tenant platform on the cloud. He is also working closely with customers to bring in GenAI capabilities in their SaaS application.

Owen Hawkins

Owen Hawkins
With over 20 years of information security experience, Owen brings deep expertise to his role as a Principal Solutions Architect at AWS. He works closely with ISV customers, drawing on his extensive background in digital banking security. Owen specializes in SaaS and multi-tenant architecture. He is passionate about enabling companies to securely embrace the cloud. Solving complex challenges excites Owen, who thrives on finding innovative ways to protect and run applications on AWS.

SaaS access control using Amazon Verified Permissions with a per-tenant policy store

Post Syndicated from Manuel Heinkel original https://aws.amazon.com/blogs/security/saas-access-control-using-amazon-verified-permissions-with-a-per-tenant-policy-store/

Access control is essential for multi-tenant software as a service (SaaS) applications. SaaS developers must manage permissions, fine-grained authorization, and isolation.

In this post, we demonstrate how you can use Amazon Verified Permissions for access control in a multi-tenant document management SaaS application using a per-tenant policy store approach. We also describe how to enforce the tenant boundary.

We usually see the following access control needs in multi-tenant SaaS applications:

  • Application developers need to define policies that apply across all tenants.
  • Tenant users need to control who can access their resources.
  • Tenant admins need to manage all resources for a tenant.

Additionally, independent software vendors (ISVs) implement tenant isolation to prevent one tenant from accessing the resources of another tenant. Enforcing tenant boundaries is imperative for SaaS businesses and is one of the foundational topics for SaaS providers.

Verified Permissions is a scalable, fine-grained permissions management and authorization service that helps you build and modernize applications without having to implement authorization logic within the code of your application.

Verified Permissions uses the Cedar language to define policies. A Cedar policy is a statement that declares which principals are explicitly permitted, or explicitly forbidden, to perform an action on a resource. The collection of policies defines the authorization rules for your application. Verified Permissions stores the policies in a policy store. A policy store is a container for policies and templates. You can learn more about Cedar policies from the Using Open Source Cedar to Write and Enforce Custom Authorization Policies blog post.

Before Verified Permissions, you had to implement authorization logic within the code of your application. Now, we’ll show you how Verified Permissions helps remove this undifferentiated heavy lifting in an example application.

Multi-tenant document management SaaS application

The application allows to add, share, access and manage documents. It requires the following access controls:

  • Application developers who can define policies that apply across all tenants.
  • Tenant users who can control who can access their documents.
  • Tenant admins who can manage all documents for a tenant.

Let’s start by describing the application architecture and then dive deeper into the design details.

Application architecture overview

There are two approaches to multi-tenant design in Verified Permissions: a single shared policy store and a per-tenant policy store. You can learn about the considerations, trade-offs and guidance for these approaches in the Verified Permissions user guide.

For the example document management SaaS application, we decided to use the per-tenant policy store approach for the following reasons:

  • Low-effort tenant policies isolation
  • The ability to customize templates and schema per tenant
  • Low-effort tenant off-boarding
  • Per-tenant policy store resource quotas

We decided to accept the following trade-offs:

  • High effort to implement global policies management (because the application use case doesn’t require frequent changes to these policies)
  • Medium effort to implement the authorization flow (because we decided that in this context, the above reasons outweigh implementing a mapping from tenant ID to policy store ID)

Figure 1 shows the document management SaaS application architecture. For simplicity, we omitted the frontend and focused on the backend.

Figure 1: Document management SaaS application architecture

Figure 1: Document management SaaS application architecture

  1. A tenant user signs in to an identity provider such as Amazon Cognito. They get a JSON Web Token (JWT), which they use for API requests. The JWT contains claims such as the user_id, which identifies the tenant user, and the tenant_id, which defines which tenant the user belongs to.
  2. The tenant user makes API requests with the JWT to the application.
  3. Amazon API Gateway verifies the validity of the JWT with the identity provider.
  4. If the JWT is valid, API Gateway forwards the request to the compute provider, in this case an AWS Lambda function, for it to run the business logic.
  5. The Lambda function assumes an AWS Identity and Access Management (IAM) role with an IAM policy that allows access to the Amazon DynamoDB table that provides tenant-to-policy-store mapping. The IAM policy scopes down access such that the Lambda function can only access data for the current tenant_id.
  6. The Lambda function looks up the Verified Permissions policy_store_id for the current request. To do this, it extracts the tenant_id from the JWT. The function then retrieves the policy_store_id from the tenant-to-policy-store mapping table.
  7. The Lambda function assumes another IAM role with an IAM policy that allows access to the Verified Permissions policy store, the document metadata table, and the document store. The IAM policy uses tenant_id and policy_store_id to scope down access.
  8. The Lambda function gets or stores documents metadata in a DynamoDB table. The function uses the metadata for Verified Permissions authorization requests.
  9. Using the information from steps 5 and 6, the Lambda function calls Verified Permissions to make an authorization decision or create Cedar policies.
  10. If authorized, the application can then access or store a document.

Application architecture deep dive

Now that you know the architecture for the use cases, let’s review them in more detail and work backwards from the user experience to the related part of the application architecture. The architecture focuses on permissions management. Accessing and storing the actual document is out of scope.

Define policies that apply across all tenants

The application developer must define global policies that include a basic set of access permissions for all tenants. We use Cedar policies to implement these permissions.

Because we’re using a per-tenant policy store approach, the tenant onboarding process should create these policies for each new tenant. Currently, to update policies, the deployment pipeline should apply changes to all policy stores.

The “Add a document” and “Manage all the documents for a tenant” sections that follow include examples of global policies.

Make sure that a tenant can’t edit the policies of another tenant

The application uses IAM to isolate the resources of one tenant from another. Because we’re using a per-tenant policy store approach we can use IAM to isolate one tenant policy store from another.

Architecture

Figure 2: Tenant isolation

Figure 2: Tenant isolation

  1. A tenant user calls an API endpoint using a valid JWT.
  2. The Lambda function uses AWS Security Token Service (AWS STS) to assume an IAM role with an IAM policy that allows access to the tenant-to-policy-store mapping DynamoDB table. The IAM policy only allows access to the table and the entries that belong to the requesting tenant. When the function assumes the role, it uses tenant_id to scope access to the items whose partition key matches the tenant_id. See the How to implement SaaS tenant isolation with ABAC and AWS IAM blog post for examples of such policies.
  3. The Lambda function uses the user’s tenant_id to get the Verified Permissions policy_store_id.
  4. The Lambda function uses the same mechanism as in step 2 to assume a different IAM role using tenant_id and policy_store_id which only allows access to the tenant policy store.
  5. The Lambda function accesses the tenant policy store.

Add a document

When a user first accesses the application, they don’t own any documents. To add a document, the frontend calls the POST /documents endpoint and supplies a document_name in the request’s body.

Cedar policy

We need a global policy that allows every tenant user to add a new document. The tenant onboarding process creates this policy in the tenant’s policy store.

permit (    
  principal,
  action == DocumentsAPI::Action::"addDocument",
  resource
);

This policy allows any principal to add a document. Because we’re using a per-tenant policy store approach, there’s no need to scope the principal to a tenant.

Architecture

Figure 3: Adding a document

Figure 3: Adding a document

  1. A tenant user calls the POST /documents endpoint to add a document.
  2. The Lambda function uses the user’s tenant_id to get the Verified Permissions policy_store_id.
  3. The Lambda function calls the Verified Permissions policy store to check if the tenant user is authorized to add a document.
  4. After successful authorization, the Lambda function adds a new document to the documents metadata database and uploads the document to the documents storage.

The database structure is described in the following table:

tenant_id (Partition key): String document_id (Sort key): String document_name: String document_owner: String
<TENANT_ID> <DOCUMENT_ID> <DOCUMENT_NAME> <USER_ID>
  • tenant_id: The tenant_id from the JWT claims.
  • document_id: A random identifier for the document, created by the application.
  • document_name: The name of the document supplied with the API request.
  • document_owner: The user who created the document. The value is the user_id from the JWT claims.

Share a document with another user of a tenant

After a tenant user has created one or more documents, they might want to share them with other users of the same tenant. To share a document, the frontend calls the POST /shares endpoint and provides the document_id of the document the user wants to share and the user_id of the receiving user.

Cedar policy

We need a global document owner policy that allows the document owner to manage the document, including sharing. The tenant onboarding process creates this policy in the tenant’s policy store.

permit (    
  principal,
  action,
  resource
) when {
  resource.owner == principal && 
  resource.type == "document"
};

The policy allows principals to perform actions on available resources (the document) when the principal is the document owner. This policy allows the shareDocument action, which we describe next, to share a document.

We also need a share policy that allows the receiving user to access the document. The application creates these policies for each successful share action. We recommend that you use policy templates to define the share policy. Policy templates allow a policy to be defined once and then attached to multiple principals and resources. Policies that use a policy template are called template-linked policies. Updates to the policy template are reflected across the principals and resources that use the template. The tenant onboarding process creates the share policy template in the tenant’s policy store.

We define the share policy template as follows:

permit (    
  principal == ?principal,  
  action == DocumentsAPI::Action::"accessDocument",
  resource == ?resource 
);

The following is an example of a template-linked policy using the share policy template:

permit (    
  principal == DocumentsAPI::User::"<user_id>",
  action == DocumentsAPI::Action::"accessDocument",
  resource == DocumentsAPI::Document::"<document_id>" 
);

The policy includes the user_id of the receiving user (principal) and the document_id of the document (resource).

Architecture

Figure 4: Sharing a document

Figure 4: Sharing a document

  1. A tenant user calls the POST /shares endpoint to share a document.
  2. The Lambda function uses the user’s tenant_id to get the Verified Permissions policy_store_id and policy template IDs for each action from the DynamoDB table that stores the tenant to policy store mapping. In this case the function needs to use the share_policy_template_id.
  3. The function queries the documents metadata DynamoDB table to retrieve the document_owner attribute for the document the user wants to share.
  4. The Lambda function calls Verified Permissions to check if the user is authorized to share the document. The request context uses the user_id from the JWT claims as the principal, shareDocument as the action, and the document_id as the resource. The document entity includes the document_owner attribute, which came from the documents metadata DynamoDB table.
  5. If the user is authorized to share the resource, the function creates a new template-linked share policy in the tenant’s policy store. This policy includes the user_id of the receiving user as the principal and the document_id as the resource.

Access a shared document

After a document has been shared, the receiving user wants to access the document. To access the document, the frontend calls the GET /documents endpoint and provides the document_id of the document the user wants to access.

Cedar policy

As shown in the previous section, during the sharing process, the application creates a template-linked share policy that allows the receiving user to access the document. Verified Permissions evaluates this policy when the user tries to access the document.

Architecture

Figure 5: Accessing a shared document

Figure 5: Accessing a shared document

  1. A tenant user calls the GET /documents endpoint to access the document.
  2. The Lambda function uses the user’s tenant_id to get the Verified Permissions policy_store_id.
  3. The Lambda function calls Verified Permissions to check if the user is authorized to access the document. The request context uses the user_id from the JWT claims as the principal, accessDocument as the action, and the document_id as the resource.

Manage all the documents for a tenant

When a customer signs up for a SaaS application, the application creates the tenant admin user. The tenant admin must have permissions to perform all actions on all documents for the tenant.

Cedar policy

We need a global policy that allows tenant admins to manage all documents. The tenant onboarding process creates this policy in the tenant’s policy store.

permit (    
  principal in DocumentsAPI::Group::"<admin_group_id>”,
  action,
  resource
);

This policy allows every member of the <admin_group_id> group to perform any action on any document.

Architecture

Figure 6: Managing documents

Figure 6: Managing documents

  1. A tenant admin calls the POST /documents endpoint to manage a document. 
  2. The Lambda function uses the user’s tenant_id to get the Verified Permissions policy_store_id.
  3. The Lambda function calls Verified Permissions to check if the user is authorized to manage the document.

Conclusion

In this blog post, we showed you how Amazon Verified Permissions helps to implement fine-grained authorization decisions in a multi-tenant SaaS application. You saw how to apply the per-tenant policy store approach to the application architecture. See the Verified Permissions user guide for how to choose between using a per-tenant policy store or one shared policy store. To learn more, visit the Amazon Verified Permissions documentation and workshop.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Verified Permissions re:Post or contact AWS Support.

Manuel Heinkel

Manuel Heinkel

Manuel is a Solutions Architect at AWS, working with software companies in Germany to build innovative and secure applications in the cloud. He supports customers in solving business challenges and achieving success with AWS. Manuel has a track record of diving deep into security and SaaS topics. Outside of work, he enjoys spending time with his family and exploring the mountains.

Alex Pulver

Alex Pulver

Alex is a Principal Solutions Architect at AWS. He works with customers to help design processes and solutions for their business needs. His current areas of interest are product engineering, developer experience, and platform strategy. He’s the creator of Application Design Framework, which aims to align business and technology, reduce rework, and enable evolutionary architecture.

Security practices in AWS multi-tenant SaaS environments

Post Syndicated from Keith P original https://aws.amazon.com/blogs/security/security-practices-in-aws-multi-tenant-saas-environments/

Securing software-as-a-service (SaaS) applications is a top priority for all application architects and developers. Doing so in an environment shared by multiple tenants can be even more challenging. Identity frameworks and concepts can take time to understand, and forming tenant isolation in these environments requires deep understanding of different tools and services.

While security is a foundational element of any software application, specific considerations apply to SaaS applications. This post dives into the challenges, opportunities and best practices for securing multi-tenant SaaS environments on Amazon Web Services (AWS).

SaaS application security considerations

Single tenant applications are often deployed for a specific customer, and typically only deal with this single entity. While security is important in these environments, the threat profile does not include potential access by other customers. Multi-tenant SaaS applications have unique security considerations when compared to single tenant applications.

In particular, multi-tenant SaaS applications must pay special attention to identity and tenant isolation. These considerations are in addition to the security measures all applications must take. This blog post reviews concepts related to identity and tenant isolation, and how AWS can help SaaS providers build secure applications.

Identity

SaaS applications are accessed by individual principals (often referred to as users). These principals may be interactive (for example, through a web application) or machine-based (for example, through an API). Each principal is uniquely identified, and is usually associated with information about the principal, including email address, name, role and other metadata.

In addition to the unique identification of each individual principal, a SaaS application has another construct: a tenant. A paper on multi-tenancy defines a tenant as a group of one or more users sharing the same view on an application they use. This view may differ for different tenants. Each individual principal is associated with a tenant, even if it is only a 1:1 mapping. A tenant is uniquely identified, and contains information about the tenant administrator, billing information and other metadata.

When a principal makes a request to a SaaS application, the principal provides their tenant and user identifier along with the request. The SaaS application validates this information and makes an authorization decision. In well-designed SaaS applications, this authorization step should not rely on a centralized authorization service. A centralized authorization service is a single point of failure in an application. If it fails, or is overwhelmed with requests, the application will no longer be able to process requests.

There are two key techniques to providing this type of experience in a SaaS application: using an identity provider (IdP) and representing identity or authorization in a token.

Using an Identity Provider (IdP)

In the past, some web applications often stored user information in a relational database table. When a principal authenticated successfully, the application issued a session ID. For subsequent requests, the principal passed the session ID to the application. The application made authorization decisions based on this session ID. Figure 1 provides an example of how this setup worked.

Figure 1 - An example of legacy application authentication.

Figure 1 – An example of legacy application authentication.

In applications larger than a simple web application, this pattern is suboptimal. Each request usually results in at least one database query or cache look up, creating a bottleneck on the data store holding the user or session information. Further, because of the tight coupling between the application and its user management, federation with external identity providers becomes difficult.

When designing your SaaS application, you should consider the use of an identity provider like Amazon Cognito, Auth0, or Okta. Using an identity provider offloads the heavy lifting required for managing identity by having user authentication, including federation, handled by external identity providers. Figure 2 provides an example of how a SaaS provider can use an identity provider in place of the self-managed solution shown in Figure 1.

Figure 2 – An example of an authentication flow that involves an identity provider.

Figure 2 – An example of an authentication flow that involves an identity provider.

Once a user authenticates with an identity provider, the identity provider issues a standardized token. This token is the same regardless of how a user authenticates, which means your application does not need to build in support for multiple different authentication methods tenants might use.

Identity providers also commonly support federated access. Federated access means that a third party maintains the identities, but the identity provider has a trust relationship with this third party. When a customer tries to log in with an identity managed by the third party, the SaaS application’s identity provider handles the authentication transaction with the third-party identity provider.

This authentication transaction commonly uses a protocol like Security Assertion Markup Language (SAML) 2.0. The SaaS application’s identity provider manages the interaction with the tenant’s identity provider. The SaaS application’s identity provider issues a token in a format understood by the SaaS application. Figure 3 provides an example of how a SaaS application can provide support for federation using an identity provider.

Figure 3 - An example of authentication that involves a tenant-provided identity provider

Figure 3 – An example of authentication that involves a tenant-provided identity provider

For an example, see How to set up Amazon Cognito for federated authentication using Azure AD.

Representing identity with tokens

Identity is usually represented by signed tokens. JSON Web Signatures (JWS), often referred to as JSON Web Tokens (JWT), are signed JSON objects used in web applications to demonstrate that the bearer is authorized to access a particular resource. These JSON objects are signed by the identity provider, and can be validated without querying a centralized database or service.

The token contains several key-value pairs, called claims, which are issued by the identity provider. Besides several claims relating to the issuance and expiration of the token, the token can also contain information about the individual principal and tenant.

Sample access token claims

The example below shows the claims section of a typical access token issued by Amazon Cognito in JWT format.

{
  "sub": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "cognito:groups": [
"TENANT-1"
  ],
  "token_use": "access",
  "auth_time": 1562190524,
  "iss": "https://cognito-idp.us-west-2.amazonaws.com/us-west-2_example",
  "exp": 1562194124,
  "iat": 1562190524,
  "origin_jti": "bbbbbbbbb-cccc-dddd-eeee-aaaaaaaaaaaa",
  "jti": "cccccccc-dddd-eeee-aaaa-bbbbbbbbbbbb",
  "client_id": "12345abcde",

The principal, and the tenant the principal is associated with, are represented in this token by the combination of the user identifier (the sub claim) and the tenant ID in the cognito:groups claim. In this example, the SaaS application represents a tenant by creating a Cognito group per tenant. Other identity providers may allow you to add a custom attribute to a user that is reflected in the access token.

When a SaaS application receives a JWT as part of a request, the application validates the token and unpacks its contents to make authorization decisions. The claims within the token set what is known as the tenant context. Much like the way environment variables can influence a command line application, the tenant context influences how the SaaS application processes the request.

By using a JWT, the SaaS application can process a request without frequent reference to an external identity provider or other centralized service.

Tenant isolation

Tenant isolation is foundational to every SaaS application. Each SaaS application must ensure that one tenant cannot access another tenant’s resources. The SaaS application must create boundaries that adequately isolate one tenant from another.

Determining what constitutes sufficient isolation depends on your domain, deployment model and any applicable compliance frameworks. The techniques for isolating tenants from each other depend on the isolation model and the applications you use. This section provides an overview of tenant isolation strategies.

Your deployment model influences isolation

How an application is deployed influences how tenants are isolated. SaaS applications can use three types of isolation: silo, pool, and bridge.

Silo deployment model

The silo deployment model involves customers deploying one set of infrastructure per tenant. Depending on the application, this may mean a VPC-per-tenant, a set of containers per tenant, or some other resource that is deployed for each tenant. In this model, there is one deployment per tenant, though there may be some shared infrastructure for cross-tenant administration. Figure 4 shows an example of a siloed deployment that uses a VPC-per-tenant model.

Figure 4 - An example of a siloed deployment that provisions a VPC-per-tenant

Figure 4 – An example of a siloed deployment that provisions a VPC-per-tenant

Pool deployment model

The pool deployment model involves a shared set of infrastructure for all tenants. Tenant isolation is implemented logically in the application through application-level constructs. Rather than having separate resources per tenant, isolation enforcement occurs within the application. Figure 5 shows an example of a pooled deployment model that uses serverless technologies.

Figure 5 - An example of a pooled deployment model using serverless technologies

Figure 5 – An example of a pooled deployment model using serverless technologies

In Figure 5, an AWS Lambda function that retrieves an item from an Amazon DynamoDB table shared by all tenants needs temporary credentials issued by the AWS Security Token Service. These credentials only allow the requester to access items in the table that belong to the tenant making the request. A requester gets these credentials by assuming an AWS Identity and Access Management (IAM) role. This allows a SaaS application to share the underlying infrastructure, while still isolating tenants from one another. See Isolation enforcement depends on service below for more details on this pattern.

Bridge deployment model

The bridge model combines elements of both the silo and pool models. Some resources may be separate, others may be shared. For example, suppose your application has a shared application layer and an Amazon Relational Database Service (RDS) instance per tenant. The application layer evaluates each request and connects to the database for the tenant that made the request.

This model is useful in a situation where each tenant may require a certain response time and one set of resources acts as a bottleneck. In the RDS example, the application layer could handle the requests imposed by the tenants, but a single RDS instance could not.

The decision on which isolation model to implement depends on your customer’s requirements, compliance needs or industry needs. You may find that some customers can be deployed onto a pool model, while larger customers may require their own silo deployment.

Your tiering strategy may also influence the type of isolation model you use. For example, a basic tier customer might be deployed onto pooled infrastructure, while an enterprise tier customer is deployed onto siloed infrastructure.

For more information about different tenant isolation models, read the tenant isolation strategies whitepaper.

Isolation enforcement depends on service

Most SaaS applications will need somewhere to store state information. This could be a relational database, a NoSQL database, or some other storage medium which persists state. SaaS applications built on AWS use various mechanisms to enforce tenant isolation when accessing a persistent storage medium.

IAM provides fine grain access controls access for the AWS API. Some services, like Amazon Simple Storage Service (Amazon S3) and DynamoDB, provide the ability to control access to individual objects or items with IAM policies. When possible, your application should use IAM’s built-in functionality to limit access to tenant resources. See Isolating SaaS Tenants with Dynamically Generated IAM Policies for more information about using IAM to implement tenant isolation.

AWS IAM also offers the ability to restrict access to resources based on tags. This is known as attribute-based access control (ABAC). This technique allows you to apply tags to supported resources, and make access control decisions based on which tags are applied. This is a more scalable access control mechanism than role-based access control (RBAC), because you do not need to modify an IAM policy each time a resource is added or removed. See How to implement SaaS tenant isolation with ABAC and AWS IAM for more information about how this can be applied to a SaaS application.

Some relational databases offer features that can enforce tenant isolation. For example, PostgreSQL offers a feature called row level security (RLS). Depending on the context in which the query is sent to the database, only tenant-specific items are returned in the results. See Multi-tenant data isolation with PostgreSQL Row Level Security for more information about row level security in PostgreSQL.

Other persistent storage mediums do not have fine grain permission models. They may, however, offer some kind of state container per tenant. For example, when using MongoDB, each tenant is assigned a MongoDB user and a MongoDB database. The secret associated with the user can be stored in AWS Secrets Manager. When retrieving a tenant’s data, the SaaS application first retrieves the secret, then authenticates with MongoDB. This creates tenant isolation because the associated credentials only have permission to access collections in a tenant-specific database.

Generally, if the persistent storage medium you’re using offers its own permission model that can enforce tenant isolation, you should use it, since this keeps you from having to implement isolation in your application. However, there may be cases where your data store does not offer this level of isolation. In this situation, you would need to write application-level tenant isolation enforcement. Application-level tenant isolation means that the SaaS application, rather than the persistent storage medium, makes sure that one tenant cannot access another tenant’s data.

Conclusion

This post reviews the challenges, opportunities and best practices for the unique security considerations associated with a multi-tenant SaaS application, and describes specific identity considerations, as well as tenant isolation methods.

If you’d like to know more about the topics above, the AWS Well-Architected SaaS Lens Security pillar dives deep on performance management in SaaS environments. It also provides best practices and resources to help you design and improve performance efficiency in your SaaS application.

Get Started with the AWS Well-Architected SaaS Lens

The AWS Well-Architected SaaS Lens focuses on SaaS workloads, and is intended to drive critical thinking for developing and operating SaaS workloads. Each question in the lens has a list of best practices, and each best practice has a list of improvement plans to help guide you in implementing them.

The lens can be applied to existing workloads, or used for new workloads you define in the tool. You can use it to improve the application you’re working on, or to get visibility into multiple workloads used by the department or area you’re working with.

The SaaS Lens is available in all Regions where the AWS Well-Architected Tool is offered, as described in the AWS Regional Services List. There are no costs for using the AWS Well-Architected Tool.

If you’re an AWS customer, find current AWS Partners that can conduct a review by learning about AWS Well-Architected Partners and AWS SaaS Competency Partners.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Security Hub forum. To start your 30-day free trial of Security Hub, visit AWS Security Hub.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Keith P

Keith is a senior partner solutions architect on the SaaS Factory team.

Andy Powell

Andy is the global lead partner for solutions architecture on the SaaS Factory team.

Implement tenant isolation for Amazon S3 and Aurora PostgreSQL by using ABAC

Post Syndicated from Ashutosh Upadhyay original https://aws.amazon.com/blogs/security/implement-tenant-isolation-for-amazon-s3-and-aurora-postgresql-by-using-abac/

In software as a service (SaaS) systems, which are designed to be used by multiple customers, isolating tenant data is a fundamental responsibility for SaaS providers. The practice of isolation of data in a multi-tenant application platform is called tenant isolation. In this post, we describe an approach you can use to achieve tenant isolation in Amazon Simple Storage Service (Amazon S3) and Amazon Aurora PostgreSQL-Compatible Edition databases by implementing attribute-based access control (ABAC). You can also adapt the same approach to achieve tenant isolation in other AWS services.

ABAC in Amazon Web Services (AWS), which uses tags to store attributes, offers advantages over the traditional role-based access control (RBAC) model. You can use fewer permissions policies, update your access control more efficiently as you grow, and last but not least, apply granular permissions for various AWS services. These granular permissions help you to implement an effective and coherent tenant isolation strategy for your customers and clients. Using the ABAC model helps you scale your permissions and simplify the management of granular policies. The ABAC model reduces the time and effort it takes to maintain policies that allow access to only the required resources.

The solution we present here uses the pool model of data partitioning. The pool model helps you avoid the higher costs of duplicated resources for each tenant and the specialized infrastructure code required to set up and maintain those copies.

Solution overview

In a typical customer environment where this solution is implemented, the tenant request for access might land at Amazon API Gateway, together with the tenant identifier, which in turn calls an AWS Lambda function. The Lambda function is envisaged to be operating with a basic Lambda execution role. This Lambda role should also have permissions to assume the tenant roles. As the request progresses, the Lambda function assumes the tenant role and makes the necessary calls to Amazon S3 or to an Aurora PostgreSQL-Compatible database. This solution helps you to achieve tenant isolation for objects stored in Amazon S3 and data elements stored in an Aurora PostgreSQL-Compatible database cluster.

Figure 1 shows the tenant isolation architecture for both Amazon S3 and Amazon Aurora PostgreSQL-Compatible databases.

Figure 1: Tenant isolation architecture diagram

Figure 1: Tenant isolation architecture diagram

As shown in the numbered diagram steps, the workflow for Amazon S3 tenant isolation is as follows:

  1. AWS Lambda sends an AWS Security Token Service (AWS STS) assume role request to AWS Identity and Access Management (IAM).
  2. IAM validates the request and returns the tenant role.
  3. Lambda sends a request to Amazon S3 with the assumed role.
  4. Amazon S3 sends the response back to Lambda.

The diagram also shows the workflow steps for tenant isolation for Aurora PostgreSQL-Compatible databases, as follows:

  1. Lambda sends an STS assume role request to IAM.
  2. IAM validates the request and returns the tenant role.
  3. Lambda sends a request to IAM for database authorization.
  4. IAM validates the request and returns the database password token.
  5. Lambda sends a request to the Aurora PostgreSQL-Compatible database with the database user and password token.
  6. Aurora PostgreSQL-Compatible database returns the response to Lambda.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  1. An AWS account for your workload.
  2. An Amazon S3 bucket.
  3. An Aurora PostgreSQL-Compatible cluster with a database created.

    Note: Make sure to note down the default master database user and password, and make sure that you can connect to the database from your desktop or from another server (for example, from Amazon Elastic Compute Cloud (Amazon EC2) instances).

  4. A security group and inbound rules that are set up to allow an inbound PostgreSQL TCP connection (Port 5432) from Lambda functions. This solution uses regular non-VPC Lambda functions, and therefore the security group of the Aurora PostgreSQL-Compatible database cluster should allow an inbound PostgreSQL TCP connection (Port 5432) from anywhere (0.0.0.0/0).

Make sure that you’ve completed the prerequisites before proceeding with the next steps.

Deploy the solution

The following sections describe how to create the IAM roles, IAM policies, and Lambda functions that are required for the solution. These steps also include guidelines on the changes that you’ll need to make to the prerequisite components Amazon S3 and the Aurora PostgreSQL-Compatible database cluster.

Step 1: Create the IAM policies

In this step, you create two IAM policies with the required permissions for Amazon S3 and the Aurora PostgreSQL database.

To create the IAM policies

  1. Open the AWS Management Console.
  2. Choose IAM, choose Policies, and then choose Create policy.
  3. Use the following JSON policy document to create the policy. Replace the placeholder <111122223333> with the bucket name from your account.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:Get*",
                    "s3:List*"
                ],
                "Resource": "arn:aws:s3:::sts-ti-demo-<111122223333>/${aws:PrincipalTag/s3_home}/*"
            }
        ]
    }
    

  4. Save the policy with the name sts-ti-demo-s3-access-policy.

    Figure 2: Create the IAM policy for Amazon S3 (sts-ti-demo-s3-access-policy)

    Figure 2: Create the IAM policy for Amazon S3 (sts-ti-demo-s3-access-policy)

  5. Open the AWS Management Console.
  6. Choose IAM, choose Policies, and then choose Create policy.
  7. Use the following JSON policy document to create a second policy. This policy grants an IAM role permission to connect to an Aurora PostgreSQL-Compatible database through a database user that is IAM authenticated. Replace the placeholders with the appropriate Region, account number, and cluster resource ID of the Aurora PostgreSQL-Compatible database cluster, respectively.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "rds-db:connect"
            ],
            "Resource": [
                "arn:aws:rds-db:<us-west-2>:<111122223333>:dbuser:<cluster- ZTISAAAABBBBCCCCDDDDEEEEL4>/${aws:PrincipalTag/dbuser}"
            ]
        }
    ]
}
  • Save the policy with the name sts-ti-demo-dbuser-policy.

    Figure 3: Create the IAM policy for Aurora PostgreSQL database (sts-ti-demo-dbuser-policy)

    Figure 3: Create the IAM policy for Aurora PostgreSQL database (sts-ti-demo-dbuser-policy)

Note: Make sure that you use the cluster resource ID for the clustered database. However, if you intend to adapt this solution for your Aurora PostgreSQL-Compatible non-clustered database, you should use the instance resource ID instead.

Step 2: Create the IAM roles

In this step, you create two IAM roles for the two different tenants, and also apply the necessary permissions and tags.

To create the IAM roles

  1. In the IAM console, choose Roles, and then choose Create role.
  2. On the Trusted entities page, choose the EC2 service as the trusted entity.
  3. On the Permissions policies page, select sts-ti-demo-s3-access-policy and sts-ti-demo-dbuser-policy.
  4. On the Tags page, add two tags with the following keys and values.

    Tag key Tag value
    s3_home tenant1_home
    dbuser tenant1_dbuser
  5. On the Review screen, name the role assumeRole-tenant1, and then choose Save.
  6. In the IAM console, choose Roles, and then choose Create role.
  7. On the Trusted entities page, choose the EC2 service as the trusted entity.
  8. On the Permissions policies page, select sts-ti-demo-s3-access-policy and sts-ti-demo-dbuser-policy.
  9. On the Tags page, add two tags with the following keys and values.

    Tag key Tag value
    s3_home tenant2_home
    dbuser tenant2_dbuser
  10. On the Review screen, name the role assumeRole-tenant2, and then choose Save.

Step 3: Create and apply the IAM policies for the tenants

In this step, you create a policy and a role for the Lambda functions. You also create two separate tenant roles, and establish a trust relationship with the role that you created for the Lambda functions.

To create and apply the IAM policies for tenant1

  1. In the IAM console, choose Policies, and then choose Create policy.
  2. Use the following JSON policy document to create the policy. Replace the placeholder <111122223333> with your AWS account number.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": [
                    "arn:aws:iam::<111122223333>:role/assumeRole-tenant1",
                    "arn:aws:iam::<111122223333>:role/assumeRole-tenant2"
                ]
            }
        ]
    }
    

  3. Save the policy with the name sts-ti-demo-assumerole-policy.
  4. In the IAM console, choose Roles, and then choose Create role.
  5. On the Trusted entities page, select the Lambda service as the trusted entity.
  6. On the Permissions policies page, select sts-ti-demo-assumerole-policy and AWSLambdaBasicExecutionRole.
  7. On the review screen, name the role sts-ti-demo-lambda-role, and then choose Save.
  8. In the IAM console, go to Roles, and enter assumeRole-tenant1 in the search box.
  9. Select the assumeRole-tenant1 role and go to the Trust relationship tab.
  10. Choose Edit the trust relationship, and replace the existing value with the following JSON document. Replace the placeholder <111122223333> with your AWS account number, and choose Update trust policy to save the policy.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<111122223333>:role/sts-ti-demo-lambda-role"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
    

To verify that the policies are applied correctly for tenant1

In the IAM console, go to Roles, and enter assumeRole-tenant1 in the search box. Select the assumeRole-tenant1 role and on the Permissions tab, verify that sts-ti-demo-dbuser-policy and sts-ti-demo-s3-access-policy appear in the list of policies, as shown in Figure 4.

Figure 4: The assumeRole-tenant1 Permissions tab

Figure 4: The assumeRole-tenant1 Permissions tab

On the Trust relationships tab, verify that sts-ti-demo-lambda-role appears under Trusted entities, as shown in Figure 5.

Figure 5: The assumeRole-tenant1 Trust relationships tab

Figure 5: The assumeRole-tenant1 Trust relationships tab

On the Tags tab, verify that the following tags appear, as shown in Figure 6.

Tag key Tag value
dbuser tenant1_dbuser
s3_home tenant1_home

 

Figure 6: The assumeRole-tenant1 Tags tab

Figure 6: The assumeRole-tenant1 Tags tab

To create and apply the IAM policies for tenant2

  1. In the IAM console, go to Roles, and enter assumeRole-tenant2 in the search box.
  2. Select the assumeRole-tenant2 role and go to the Trust relationship tab.
  3. Edit the trust relationship, replacing the existing value with the following JSON document. Replace the placeholder <111122223333> with your AWS account number.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<111122223333>:role/sts-ti-demo-lambda-role"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
    

  4. Choose Update trust policy to save the policy.

To verify that the policies are applied correctly for tenant2

In the IAM console, go to Roles, and enter assumeRole-tenant2 in the search box. Select the assumeRole-tenant2 role and on the Permissions tab, verify that sts-ti-demo-dbuser-policy and sts-ti-demo-s3-access-policy appear in the list of policies, you did for tenant1. On the Trust relationships tab, verify that sts-ti-demo-lambda-role appears under Trusted entities.

On the Tags tab, verify that the following tags appear, as shown in Figure 7.

Tag key Tag value
dbuser tenant2_dbuser
s3_home tenant2_home
Figure 7: The assumeRole-tenant2 Tags tab

Figure 7: The assumeRole-tenant2 Tags tab

Step 4: Set up an Amazon S3 bucket

Next, you’ll set up an S3 bucket that you’ll use as part of this solution. You can either create a new S3 bucket or re-purpose an existing one. The following steps show you how to create two user homes (that is, S3 prefixes, which are also known as folders) in the S3 bucket.

  1. In the AWS Management Console, go to Amazon S3 and select the S3 bucket you want to use.
  2. Create two prefixes (folders) with the names tenant1_home and tenant2_home.
  3. Place two test objects with the names tenant.info-tenant1_home and tenant.info-tenant2_home in the prefixes that you just created, respectively.

Step 5: Set up test objects in Aurora PostgreSQL-Compatible database

In this step, you create a table in Aurora PostgreSQL-Compatible Edition, insert tenant metadata, create a row level security (RLS) policy, create tenant users, and grant permission for testing purposes.

To set up Aurora PostgreSQL-Compatible

  1. Connect to Aurora PostgreSQL-Compatible through a client of your choice, using the master database user and password that you obtained at the time of cluster creation.
  2. Run the following commands to create a table for testing purposes and to insert a couple of testing records.
    CREATE TABLE tenant_metadata (
        tenant_id VARCHAR(30) PRIMARY KEY,
        email     VARCHAR(50) UNIQUE,
        status    VARCHAR(10) CHECK (status IN ('active', 'suspended', 'disabled')),
        tier      VARCHAR(10) CHECK (tier IN ('gold', 'silver', 'bronze')));
    
    INSERT INTO tenant_metadata (tenant_id, email, status, tier) 
    VALUES ('tenant1_dbuser','[email protected]','active','gold');
    INSERT INTO tenant_metadata (tenant_id, email, status, tier) 
    VALUES ('tenant2_dbuser','[email protected]','suspended','silver');
    ALTER TABLE tenant_metadata ENABLE ROW LEVEL SECURITY;
    

  3. Run the following command to query the newly created database table.
    SELECT * FROM tenant_metadata;
    

    Figure 8: The tenant_metadata table content

    Figure 8: The tenant_metadata table content

  4. Run the following command to create the row level security policy.
    CREATE POLICY tenant_isolation_policy ON tenant_metadata
    USING (tenant_id = current_user);
    

  5. Run the following commands to establish two tenant users and grant them the necessary permissions.
    CREATE USER tenant1_dbuser WITH LOGIN;
    CREATE USER tenant2_dbuser WITH LOGIN;
    GRANT rds_iam TO tenant1_dbuser;
    GRANT rds_iam TO tenant2_dbuser;
    
    GRANT select, insert, update, delete ON tenant_metadata to tenant1_dbuser, tenant2_dbuser;
    

  6. Run the following commands to verify the newly created tenant users.
    SELECT usename AS role_name,
      CASE
         WHEN usesuper AND usecreatedb THEN
           CAST('superuser, create database' AS pg_catalog.text)
         WHEN usesuper THEN
            CAST('superuser' AS pg_catalog.text)
         WHEN usecreatedb THEN
            CAST('create database' AS pg_catalog.text)
         ELSE
            CAST('' AS pg_catalog.text)
      END role_attributes
    FROM pg_catalog.pg_user
    WHERE usename LIKE (‘tenant%’)
    ORDER BY role_name desc;
    

    Figure 9: Verify the newly created tenant users output

    Figure 9: Verify the newly created tenant users output

Step 6: Set up the AWS Lambda functions

Next, you’ll create two Lambda functions for Amazon S3 and Aurora PostgreSQL-Compatible. You also need to create a Lambda layer for the Python package PG8000.

To set up the Lambda function for Amazon S3

  1. Navigate to the Lambda console, and choose Create function.
  2. Choose Author from scratch. For Function name, enter sts-ti-demo-s3-lambda.
  3. For Runtime, choose Python 3.7.
  4. Change the default execution role to Use an existing role, and then select sts-ti-demo-lambda-role from the drop-down list.
  5. Keep Advanced settings as the default value, and then choose Create function.
  6. Copy the following Python code into the lambda_function.py file that is created in your Lambda function.
    import json
    import os
    import time 
    
    def lambda_handler(event, context):
        import boto3
        bucket_name     =   os.environ['s3_bucket_name']
    
        try:
            login_tenant_id =   event['login_tenant_id']
            data_tenant_id  =   event['s3_tenant_home']
        except:
            return {
                'statusCode': 400,
                'body': 'Error in reading parameters'
            }
    
        prefix_of_role  =   'assumeRole'
        file_name       =   'tenant.info' + '-' + data_tenant_id
    
        # create an STS client object that represents a live connection to the STS service
        sts_client = boto3.client('sts')
        account_of_role = sts_client.get_caller_identity()['Account']
        role_to_assume  =   'arn:aws:iam::' + account_of_role + ':role/' + prefix_of_role + '-' + login_tenant_id
    
        # Call the assume_role method of the STSConnection object and pass the role
        # ARN and a role session name.
        RoleSessionName = 'AssumeRoleSession' + str(time.time()).split(".")[0] + str(time.time()).split(".")[1]
        try:
            assumed_role_object = sts_client.assume_role(
                RoleArn         = role_to_assume, 
                RoleSessionName = RoleSessionName, 
                DurationSeconds = 900) #15 minutes
    
        except:
            return {
                'statusCode': 400,
                'body': 'Error in assuming the role ' + role_to_assume + ' in account ' + account_of_role
            }
    
        # From the response that contains the assumed role, get the temporary 
        # credentials that can be used to make subsequent API calls
        credentials=assumed_role_object['Credentials']
        
        # Use the temporary credentials that AssumeRole returns to make a connection to Amazon S3  
        s3_resource=boto3.resource(
            's3',
            aws_access_key_id=credentials['AccessKeyId'],
            aws_secret_access_key=credentials['SecretAccessKey'],
            aws_session_token=credentials['SessionToken']
        )
    
        try:
            obj = s3_resource.Object(bucket_name, data_tenant_id + "/" + file_name)
            return {
                'statusCode': 200,
                'body': obj.get()['Body'].read()
            }
        except:
            return {
                'statusCode': 400,
                'body': 'error in reading s3://' + bucket_name + '/' + data_tenant_id + '/' + file_name
            }
    

  7. Under Basic settings, edit Timeout to increase the timeout to 29 seconds.
  8. Edit Environment variables to add a key called s3_bucket_name, with the value set to the name of your S3 bucket.
  9. Configure a new test event with the following JSON document, and save it as testEvent.
    {
      "login_tenant_id": "tenant1",
      "s3_tenant_home": "tenant1_home"
    }
    

  10. Choose Test to test the Lambda function with the newly created test event testEvent. You should see status code 200, and the body of the results should contain the data for tenant1.

    Figure 10: The result of running the sts-ti-demo-s3-lambda function

    Figure 10: The result of running the sts-ti-demo-s3-lambda function

Next, create another Lambda function for Aurora PostgreSQL-Compatible. To do this, you first need to create a new Lambda layer.

To set up the Lambda layer

  1. Use the following commands to create a .zip file for Python package pg8000.

    Note: This example is created by using an Amazon EC2 instance running the Amazon Linux 2 Amazon Machine Image (AMI). If you’re using another version of Linux or don’t have the Python 3 or pip3 packages installed, install them by using the following commands.

    sudo yum update -y 
    sudo yum install python3 
    sudo pip3 install pg8000 -t build/python/lib/python3.8/site-packages/ 
    cd build 
    sudo zip -r pg8000.zip python/
    

  2. Download the pg8000.zip file you just created to your local desktop machine or into an S3 bucket location.
  3. Navigate to the Lambda console, choose Layers, and then choose Create layer.
  4. For Name, enter pgdb, and then upload pg8000.zip from your local desktop machine or from the S3 bucket location.

    Note: For more details, see the AWS documentation for creating and sharing Lambda layers.

  5. For Compatible runtimes, choose python3.6, python3.7, and python3.8, and then choose Create.

To set up the Lambda function with the newly created Lambda layer

  1. In the Lambda console, choose Function, and then choose Create function.
  2. Choose Author from scratch. For Function name, enter sts-ti-demo-pgdb-lambda.
  3. For Runtime, choose Python 3.7.
  4. Change the default execution role to Use an existing role, and then select sts-ti-demo-lambda-role from the drop-down list.
  5. Keep Advanced settings as the default value, and then choose Create function.
  6. Choose Layers, and then choose Add a layer.
  7. Choose Custom layer, select pgdb with Version 1 from the drop-down list, and then choose Add.
  8. Copy the following Python code into the lambda_function.py file that was created in your Lambda function.
    import boto3
    import pg8000
    import os
    import time
    import ssl
    
    connection = None
    assumed_role_object = None
    rds_client = None
    
    def assume_role(event):
        global assumed_role_object
        try:
            RolePrefix  = os.environ.get("RolePrefix")
            LoginTenant = event['login_tenant_id']
        
            # create an STS client object that represents a live connection to the STS service
            sts_client      = boto3.client('sts')
            # Prepare input parameters
            role_to_assume  = 'arn:aws:iam::' + sts_client.get_caller_identity()['Account'] + ':role/' + RolePrefix + '-' + LoginTenant
            RoleSessionName = 'AssumeRoleSession' + str(time.time()).split(".")[0] + str(time.time()).split(".")[1]
        
            # Call the assume_role method of the STSConnection object and pass the role ARN and a role session name.
            assumed_role_object = sts_client.assume_role(
                RoleArn         =   role_to_assume, 
                RoleSessionName =   RoleSessionName,
                DurationSeconds =   900) #15 minutes 
            
            return assumed_role_object['Credentials']
        except Exception as e:
            print({'Role assumption failed!': {'role': role_to_assume, 'Exception': 'Failed due to :{0}'.format(str(e))}})
            return None
    
    def get_connection(event):
        global rds_client
        creds = assume_role(event)
    
        try:
            # create an RDS client using assumed credentials
            rds_client = boto3.client('rds',
                aws_access_key_id       = creds['AccessKeyId'],
                aws_secret_access_key   = creds['SecretAccessKey'],
                aws_session_token       = creds['SessionToken'])
    
            # Read the environment variables and event parameters
            DBEndPoint   = os.environ.get('DBEndPoint')
            DatabaseName = os.environ.get('DatabaseName')
            DBUserName   = event['dbuser']
    
            # Generates an auth token used to connect to a database with IAM credentials.
            pwd = rds_client.generate_db_auth_token(
                DBHostname=DBEndPoint, Port=5432, DBUsername=DBUserName, Region='us-west-2'
            )
    
            ssl_context             = ssl.SSLContext()
            ssl_context.verify_mode = ssl.CERT_REQUIRED
            ssl_context.load_verify_locations('rds-ca-2019-root.pem')
    
            # create a database connection
            conn = pg8000.connect(
                host        =   DBEndPoint,
                user        =   DBUserName,
                database    =   DatabaseName,
                password    =   pwd,
                ssl_context =   ssl_context)
            
            return conn
        except Exception as e:
            print ({'Database connection failed!': {'Exception': "Failed due to :{0}".format(str(e))}})
            return None
    
    def execute_sql(connection, query):
        try:
            cursor = connection.cursor()
            cursor.execute(query)
            columns = [str(desc[0]) for desc in cursor.description]
            results = []
            for res in cursor:
                results.append(dict(zip(columns, res)))
            cursor.close()
            retry = False
            return results    
        except Exception as e:
            print ({'Execute SQL failed!': {'Exception': "Failed due to :{0}".format(str(e))}})
            return None
    
    
    def lambda_handler(event, context):
        global connection
        try:
            connection = get_connection(event)
            if connection is None:
                return {'statusCode': 400, "body": "Error in database connection!"}
    
            response = {'statusCode':200, 'body': {
                'db & user': execute_sql(connection, 'SELECT CURRENT_DATABASE(), CURRENT_USER'), \
                'data from tenant_metadata': execute_sql(connection, 'SELECT * FROM tenant_metadata')}}
            return response
        except Exception as e:
            try:
                connection.close()
            except Exception as e:
                connection = None
            return {'statusCode': 400, 'statusDesc': 'Error!', 'body': 'Unhandled error in Lambda Handler.'}
    

  9. Add a certificate file called rds-ca-2019-root.pem into the Lambda project root by downloading it from https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem.
  10. Under Basic settings, edit Timeout to increase the timeout to 29 seconds.
  11. Edit Environment variables to add the following keys and values.

    Key Value
    DBEndPoint Enter the database cluster endpoint URL
    DatabaseName Enter the database name
    RolePrefix assumeRole
    Figure 11: Example of environment variables display

    Figure 11: Example of environment variables display

  12. Configure a new test event with the following JSON document, and save it as testEvent.
    {
      "login_tenant_id": "tenant1",
      "dbuser": "tenant1_dbuser"
    }
    

  13. Choose Test to test the Lambda function with the newly created test event testEvent. You should see status code 200, and the body of the results should contain the data for tenant1.

    Figure 12: The result of running the sts-ti-demo-pgdb-lambda function

    Figure 12: The result of running the sts-ti-demo-pgdb-lambda function

Step 7: Perform negative testing of tenant isolation

You already performed positive tests of tenant isolation during the Lambda function creation steps. However, it’s also important to perform some negative tests to verify the robustness of the tenant isolation controls.

To perform negative tests of tenant isolation

  1. In the Lambda console, navigate to the sts-ti-demo-s3-lambda function. Update the test event to the following, to mimic a scenario where tenant1 attempts to access other tenants’ objects.
    {
      "login_tenant_id": "tenant1",
      "s3_tenant_home": "tenant2_home"
    }
    

  2. Choose Test to test the Lambda function with the updated test event. You should see status code 400, and the body of the results should contain an error message.

    Figure 13: The results of running the sts-ti-demo-s3-lambda function (negative test)

    Figure 13: The results of running the sts-ti-demo-s3-lambda function (negative test)

  3. Navigate to the sts-ti-demo-pgdb-lambda function and update the test event to the following, to mimic a scenario where tenant1 attempts to access other tenants’ data elements.
    {
      "login_tenant_id": "tenant1",
      "dbuser": "tenant2_dbuser"
    }
    

  4. Choose Test to test the Lambda function with the updated test event. You should see status code 400, and the body of the results should contain an error message.

    Figure 14: The results of running the sts-ti-demo-pgdb-lambda function (negative test)

    Figure 14: The results of running the sts-ti-demo-pgdb-lambda function (negative test)

Cleaning up

To de-clutter your environment, remove the roles, policies, Lambda functions, Lambda layers, Amazon S3 prefixes, database users, and the database table that you created as part of this exercise. You can choose to delete the S3 bucket, as well as the Aurora PostgreSQL-Compatible database cluster that we mentioned in the Prerequisites section, to avoid incurring future charges.

Update the security group of the Aurora PostgreSQL-Compatible database cluster to remove the inbound rule that you added to allow a PostgreSQL TCP connection (Port 5432) from anywhere (0.0.0.0/0).

Conclusion

By taking advantage of attribute-based access control (ABAC) in IAM, you can more efficiently implement tenant isolation in SaaS applications. The solution we presented here helps to achieve tenant isolation in Amazon S3 and Aurora PostgreSQL-Compatible databases by using ABAC with the pool model of data partitioning.

If you run into any issues, you can use Amazon CloudWatch and AWS CloudTrail to troubleshoot. If you have feedback about this post, submit comments in the Comments section below.

To learn more, see these AWS Blog and AWS Support articles:

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Ashutosh Upadhyay

Ashutosh works as a Senior Security, Risk, and Compliance Consultant in AWS Professional Services (GFS) and is based in Singapore. Ashutosh started off his career as a developer, and for the past many years has been working in the Security, Risk, and Compliance field. Ashutosh loves to spend his free time learning and playing with new technologies.

Author

Chirantan Saha

Chirantan works as a DevOps Engineer in AWS Professional Services (GFS) and is based in Singapore. Chirantan has 20 years of development and DevOps experience working for large banking, financial services, and insurance organizations. Outside of work, Chirantan enjoys traveling, spending time with family, and helping his children with their science projects.

How to implement SaaS tenant isolation with ABAC and AWS IAM

Post Syndicated from Michael Pelts original https://aws.amazon.com/blogs/security/how-to-implement-saas-tenant-isolation-with-abac-and-aws-iam/

Multi-tenant applications must be architected so that the resources of each tenant are isolated and cannot be accessed by other tenants in the system. AWS Identity and Access Management (IAM) is often a key element in achieving this goal. One of the challenges with using IAM, however, is that the number and complexity of IAM policies you need to support your tenants can grow rapidly and impact the scale and manageability of your isolation model. The attribute-based access control (ABAC) mechanism of IAM provides developers with a way to address this challenge.

In this blog post, we describe and provide detailed examples of how you can use ABAC in IAM to implement tenant isolation in a multi-tenant environment.

Choose an IAM isolation method

IAM makes it possible to implement tenant isolation and scope down permissions in a way that is integrated with other AWS services. By relying on IAM, you can create strong isolation foundations in your system, and reduce the risk of developers unintentionally introducing code that leads to a violation of tenant boundaries. IAM provides an AWS native, non-invasive way for you to achieve isolation for those cases where IAM supports policies that align with your overall isolation model.

There are several methods in IAM that you can use for isolating tenants and restricting access to resources. Choosing the right method for your application depends on several parameters. The number of tenants and the number of role definitions are two important dimensions that you should take into account.

Most applications require multiple role definitions for different user functions. A role definition refers to a minimal set of privileges that users or programmatic components need in order to do their job. For example, business users and data analysts would typically have different set of permissions to allow minimum necessary access to resources that they use.

In software-as-a-service (SaaS) applications, in addition to functional boundaries, there are also boundaries between tenant resources. As a result, the entire set of role definitions exists for each individual tenant. In highly dynamic environments (e.g., collaboration scenarios with cross-tenant access), new role definitions can be added ad-hoc. In such a case, the number of role definitions and their complexity can grow significantly as the system evolves.

There are three main tenant isolation methods in IAM. Let’s briefly review them before focusing on the ABAC in the following sections.

Figure 1: IAM tenant isolation methods

Figure 1: IAM tenant isolation methods

RBAC – Each tenant has a dedicated IAM role or static set of IAM roles that it uses for access to tenant resources. The number of IAM roles in RBAC equals to the number of role definitions multiplied by the number of tenants. RBAC works well when you have a small number of tenants and relatively static policies. You may find it difficult to manage multiple IAM roles as the number of tenants and the complexity of the attached policies grows.

Dynamically generated IAM policies – This method dynamically generates an IAM policy for a tenant according to user identity. Choose this method in highly dynamic environments with changing or frequently added role definitions (e.g., tenant collaboration scenario). You may also choose dynamically generated policies if you have a preference for generating and managing IAM policies by using your code rather than relying on built-in IAM service features. You can find more details about this method in the blog post Isolating SaaS Tenants with Dynamically Generated IAM Policies.

ABAC – This method is suitable for a wide range of SaaS applications, unless your use case requires support for frequently changed or added role definitions, which are easier to manage with dynamically generated IAM policies. Unlike Dynamically generated IAM policies, where you manage and apply policies through a self-managed mechanism, ABAC lets you rely more directly on IAM.

ABAC for tenant isolation

ABAC is achieved by using parameters (attributes) to control tenant access to resources. Using ABAC for tenant isolation results in temporary access to resources, which is restricted according to the caller’s identity and attributes.

One of the key advantages of the ABAC model is that it scales to any number of tenants with a single role. This is achieved by using tags (such as the tenant ID) in IAM polices and a temporary session created specifically for accessing tenant data. The session encapsulates the attributes of the requesting entity (for example, a tenant user). At policy evaluation time, IAM replaces these tags with session attributes.

Another component of ABAC is the assignation of attributes to tenant resources by using special naming conventions or resource tags. The access to a resource is granted when session and resource attributes match (for example, a session with the TenantID: yellow attribute can access a resource that is tagged as TenantID: yellow).

For more information about ABAC in IAM, see What is ABAC for AWS?

ABAC in a typical SaaS architecture

To demonstrate how you can use ABAC in IAM for tenant isolation, we will walk you through an example of a typical microservices-based application. More specifically, we will focus on two microservices that implement a shipment tracking flow in a multi-tenant ecommerce application.

Our sample tenant, Yellow, which has many users in many roles, has exclusive access to shipment data that belongs to this particular tenant. To achieve this, all microservices in the call chain operate in a restricted context that prevents cross-tenant access.

Figure 2: Sample shipment tracking flow in a SaaS application

Figure 2: Sample shipment tracking flow in a SaaS application

Let’s take a closer look at the sequence of events and discuss the implementation in detail.

A shipment tracking request is initiated by an authenticated Tenant Yellow user. The authentication process is left out of the scope of this discussion for the sake of brevity. The user identity expressed in the JSON Web Token (JWT) includes custom claims, one of which is a TenantID. In this example, TenantID equals yellow.

The JWT is delivered from the user’s browser in the HTTP header of the Get Shipment request to the shipment service. The shipment service then authenticates the request and collects the required parameters for getting the shipment estimated time of arrival (ETA). The shipment service makes a GetShippingETA request using the parameters to the tracking service along with the JWT.

The tracking service manages shipment tracking data in a data repository. The repository stores data for all of the tenants, but each shipment record there has an attached TenantID resource tag, for instance yellow, as in our example.

An IAM role attached to the tracking service, called TrackingServiceRole in this example, determines the AWS resources that the microservice can access and the actions it can perform.

Note that TrackingServiceRole itself doesn’t have permissions to access tracking data in the data store. To get access to tracking records, the tracking service temporarily assumes another role called TrackingAccessRole. This role remains valid for a limited period of time, until credentials representing this temporary session expire.

To understand how this works, we need to talk first about the trust relationship between TrackingAccessRole and TrackingServiceRole. The following trust policy lists TrackingServiceRole as a trusted entity.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account-id>:role/TrackingServiceRole"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account-id>:role/TrackingServiceRole"
      },
      "Action": "sts:TagSession",
      "Condition": {
        "StringLike": {
          "aws:RequestTag/TenantID": "*"
        }
      }
    }
  ]
}

This policy needs to be associated with TrackingAccessRole. You can do that on the Trust relationships tab of the Role Details page in the IAM console or via the AWS CLI update-assume-role-policy method. That association is what allows the tracking service with the attached TrackingServiceRole role to assume TrackingAccessRole. The policy also allows TrackingServiceRole to attach the TenantID session tag to the temporary sessions it creates.

Session tags are principal tags that you specify when you request a session. This is how you inject variables into the request context for API calls executed during the session. This is what allows IAM policies evaluated in subsequent API calls to reference TenantID with the aws:PrincipalTag context key.

Now let’s talk about TrackingAccessPolicy. It’s an identity policy attached to TrackingAccessRole. This policy makes use of the aws:PrincipalTag/TenantID key to dynamically scope access to a specific tenant.

Later in this post, you can see examples of such data access policies for three different data storage services.

Now the stage is set to see how the tracking service creates a temporary session and injects TenantID into the request context. The following Python function does that by using AWS SDK for Python (Boto3). The function gets the TenantID (extracted from the JWT) and the TrackingAccessRole Amazon Resource Name (ARN) as parameters and returns a scoped Boto3 session object.

import boto3

def create_temp_tenant_session(access_role_arn, session_name, tenant_id, duration_sec):
    """
    Create a temporary session
    :param access_role_arn: The ARN of the role that the caller is assuming
    :param session_name: An identifier for the assumed session
    :param tenant_id: The tenant identifier the session is created for
    :param duration_sec: The duration, in seconds, of the temporary session
    :return: The session object that allows you to create service clients and resources
    """
    sts = boto3.client('sts')
    assume_role_response = sts.assume_role(
        RoleArn=access_role_arn,
        DurationSeconds=duration_sec,
        RoleSessionName=session_name,
        Tags=[
            {
                'Key': 'TenantID',
                'Value': tenant_id
            }
        ]
    )
    session = boto3.Session(aws_access_key_id=assume_role_response['Credentials']['AccessKeyId'],
                    aws_secret_access_key=assume_role_response['Credentials']['SecretAccessKey'],
                    aws_session_token=assume_role_response['Credentials']['SessionToken'])
    return session

Use these parameters to create temporary sessions for a specific tenant with a duration that meets your needs.

access_role_arn – The assumed role with an attached templated policy. The IAM policy must include the aws:PrincipalTag/TenantID tag key.

session_name – The name of the session. Use the role session name to uniquely identify a session. The role session name is used in the ARN of the assumed role principal and included in the AWS CloudTrail logs.

tenant_id – The tenant identifier that describes which tenant the session is created for. For better compatibility with resource names in IAM policies, it’s recommended to generate non-guessable alphanumeric lowercase tenant identifiers.

duration_sec – The duration of your temporary session.

Note: The details of token management can be abstracted away from the application by extracting the token generation into a separate module, as described in the blog post Isolating SaaS Tenants with Dynamically Generated IAM Policies. In that post, the reusable application code for acquiring temporary session tokens is called a Token Vending Machine.

The returned session can be used to instantiate IAM-scoped objects such as a storage service. After the session is returned, any API call performed with the temporary session credentials contains the aws:PrincipalTag/TenantID key-value pair in the request context.

When the tracking service attempts to access tracking data, IAM completes several evaluation steps to determine whether to allow or deny the request. These include evaluation of the principal’s identity-based policy, which is, in this example, represented by TrackingAccessPolicy. It is at this stage that the aws:PrincipalTag/TenantID tag key is replaced with the actual value, policy conditions are resolved, and access is granted to the tenant data.

Common ABAC scenarios

Let’s take a look at some common scenarios with different data storage services. For each example, a diagram is included that illustrates the allowed access to tenant data and how the data is partitioned in the service.

These examples rely on the architecture described earlier and assume that the temporary session context contains a TenantID parameter. We will demonstrate different versions of TrackingAccessPolicy that are applicable to different services. The way aws:PrincipalTag/TenantID is used depends on service-specific IAM features, such as tagging support, policy conditions and ability to parameterize resource ARN with session tags. Examples below illustrate these techniques applied to different services.

Pooled storage isolation with DynamoDB

Many SaaS applications rely on a pooled data partitioning model where data from all tenants is combined into a single table. The tenant identifier is then introduced into each table to identify the items that are associated with each tenant. Figure 3 provides an example of this model.

Figure 3: DynamoDB index-based partitioning

Figure 3: DynamoDB index-based partitioning

In this example, we’ve used Amazon DynamoDB, storing each tenant identifier in the table’s partition key. Now, we can use ABAC and IAM fine-grained access control to implement tenant isolation for the items in this table.

The following TrackingAccessPolicy uses the dynamodb:LeadingKeys condition key to restrict permissions to only the items whose partition key matches the tenant’s identifier as passed in a session tag.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:BatchGetItem",
        "dynamodb:Query"
      ],
      "Resource": [
        "arn:aws:dynamodb:<region>:<account-id>:table/TrackingData"
      ],
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:LeadingKeys": [
            "${aws:PrincipalTag/TenantID}"
          ]
        }
      }
    }
  ]
}

This example uses the dynamodb:LeadingKeys condition key in the policy to describe how you can control access to tenant resources. You’ll notice that we haven’t bound this policy to any specific tenant. Instead, the policy relies on the aws:PrincipalTag tag to resolve the TenantID parameter at runtime.

This approach means that you can add new tenants without needing to create any new IAM constructs. This reduces the maintenance burden and limits your chances that any IAM quotas will be exceeded.

Siloed storage isolation with Amazon Elasticsearch Service

Let’s look at another example that illustrates how you might implement tenant isolation of Amazon Elasticsearch Service resources. Figure 4 illustrates a silo data partitioning model, where each tenant of your system has a separate Elasticsearch index for each tenant.

Figure 4: Elasticsearch index-per-tenant strategy

Figure 4: Elasticsearch index-per-tenant strategy

You can isolate these tenant resources by creating a parameterized identity policy with the principal TenantID tag as a variable (similar to the one we created for DynamoDB). In the following example, the principal tag is a part of the index name in the policy resource element. At access time, the principal tag is replaced with the tenant identifier from the request context, yielding the Elasticsearch index ARN as a result.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "es:ESHttpGet",
        "es:ESHttpPut"
      ],
      "Resource": [
        "arn:aws:es:<region>:<account-id>:domain/test/${aws:PrincipalTag/TenantID}*/*"
      ]
    }
  ]
}

In the case where you have multiple indices that belong to the same tenant, you can allow access to them by using a wildcard. The preceding policy allows es:ESHttpGet and es:ESHttpPut actions to be taken on documents if the documents belong to an index with a name that matches the pattern.

Important: In order for this to work, the tenant identifier must follow the same naming restrictions as indices.

Although this approach scales the tenant isolation strategy, you need to keep in mind that this solution is constrained by the number of indices your Elasticsearch cluster can support.

Amazon S3 prefix-per-tenant strategy

Amazon Simple Storage Service (Amazon S3) buckets are commonly used as shared object stores with dedicated prefixes for different tenants. For enhanced security, you can optionally use a dedicated customer master key (CMK) per tenant. If you do so, attach a corresponding TenantID resource tag to a CMK.

By using ABAC and IAM, you can make sure that each tenant can only get and decrypt objects in a shared S3 bucket that have the prefixes that correspond to that tenant.

Figure 5: S3 prefix-per-tenant strategy

Figure 5: S3 prefix-per-tenant strategy

In the following policy, the first statement uses the TenantID principal tag in the resource element. The policy grants s3:GetObject permission, but only if the requested object key begins with the tenant’s prefix.

The second statement allows the kms:Decrypt operation on a KMS key that the requested object is encrypted with. The KMS key must have a TenantID resource tag attached to it with a corresponding tenant ID as a value.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::sample-bucket-12345678/${aws:PrincipalTag/TenantID}/*"
    },
    {
      "Effect": "Allow",
      "Action": "kms:Decrypt",
       "Resource": "arn:aws:kms:<region>:<account-id>:key/*",
       "Condition": {
           "StringEquals": {
           "aws:PrincipalTag/TenantID": "${aws:ResourceTag/TenantID}"
        }
      }
    }
  ]
}

Important: In order for this policy to work, the tenant identifier must follow the S3 object key name guidelines.

With the prefix-per-tenant approach, you can support any number of tenants. However, if you choose to use a dedicated customer managed KMS key per tenant, you will be bounded by the number of KMS keys per AWS Region.

Conclusion

The ABAC method combined with IAM provides teams who build SaaS platforms with a compelling model for implementing tenant isolation. By using this dynamic, attribute-driven model, you can scale your IAM isolation policies to any practical number of tenants. This approach also makes it possible for you to rely on IAM to manage, scale, and enforce isolation in a way that’s integrated into your overall tenant identity scheme. You can start experimenting with IAM ABAC by using either the examples in this blog post, or this resource: IAM Tutorial: Define permissions to access AWS resources based on tags.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS IAM forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Michael Pelts

As a Senior Solutions Architect at AWS, Michael works with large ISV customers, helping them create innovative solutions to address their cloud challenges. Michael is passionate about his work, enjoys the creativity that goes into building solutions in the cloud, and derives pleasure from passing on his knowledge.

Author

Oren Reuveni

Oren is a Principal Solutions Architect and member of the SaaS Factory team. He helps guide and assist AWS partners with building their SaaS products on AWS. Oren has over 15 years of experience in the modern IT and Cloud domains. He is passionate about shaping the right dynamics between technology and business.