Tag Archives: SaaS

Top Architecture Blog Posts of 2023

Post Syndicated from Andrea Courtright original https://aws.amazon.com/blogs/architecture/top-architecture-blog-posts-of-2023/

2023 was a rollercoaster year in tech, and we at the AWS Architecture Blog feel so fortunate to have shared in the excitement. As we move into 2024 and all of the new technologies we could see, we want to take a moment to highlight the brightest stars from 2023.

As always, thanks to our readers and to the many talented and hardworking Solutions Architects and other contributors to our blog.

I give you our 2023 cream of the crop!

#10: Build a serverless retail solution for endless aisle on AWS

In this post, Sandeep and Shashank help retailers and their customers alike in this guided approach to finding inventory that doesn’t live on shelves.

Building endless aisle architecture for order processing

Figure 1. Building endless aisle architecture for order processing

Check it out!

#9: Optimizing data with automated intelligent document processing solutions

Who else dreads wading through large amounts of data in multiple formats? Just me? I didn’t think so. Using Amazon AI/ML and content-reading services, Deependra, Anirudha, Bhajandeep, and Senaka have created a solution that is scalable and cost-effective to help you extract the data you need and store it in a format that works for you.

AI-based intelligent document processing engine

Figure 2: AI-based intelligent document processing engine

Check it out!

#8: Disaster Recovery Solutions with AWS managed services, Part 3: Multi-Site Active/Passive

Disaster recovery posts are always popular, and this post by Brent and Dhruv is no exception. Their creative approach in part 3 of this series is most helpful for customers who have business-critical workloads with higher availability requirements.

Warm standby with managed services

Figure 3. Warm standby with managed services

Check it out!

#7: Simulating Kubernetes-workload AZ failures with AWS Fault Injection Simulator

Continuing with the theme of “when bad things happen,” we have Siva, Elamaran, and Re’s post about preparing for workload failures. If resiliency is a concern (and it really should be), the secret is test, test, TEST.

Architecture flow for Microservices to simulate a realistic failure scenario

Figure 4. Architecture flow for Microservices to simulate a realistic failure scenario

Check it out!

#6: Let’s Architect! Designing event-driven architectures

Luca, Laura, Vittorio, and Zamira weren’t content with their four top-10 spots last year – they’re back with some things you definitely need to know about event-driven architectures.

Let's Architect

Figure 5. Let’s Architect artwork

Check it out!

#5: Use a reusable ETL framework in your AWS lake house architecture

As your lake house increases in size and complexity, you could find yourself facing maintenance challenges, and Ashutosh and Prantik have a solution: frameworks! The reusable ETL template with AWS Glue templates might just save you a headache or three.

Reusable ETL framework architecture

Figure 6. Reusable ETL framework architecture

Check it out!

#4: Invoking asynchronous external APIs with AWS Step Functions

It’s possible that AWS’ menagerie of services doesn’t have everything you need to run your organization. (Possible, but not likely; we have a lot of amazing services.) If you are using third-party APIs, then Jorge, Hossam, and Shirisha’s architecture can help you maintain a secure, reliable, and cost-effective relationship among all involved.

Invoking Asynchronous External APIs architecture

Figure 7. Invoking Asynchronous External APIs architecture

Check it out!

#3: Announcing updates to the AWS Well-Architected Framework

The Well-Architected Framework continues to help AWS customers evaluate their architectures against its six pillars. They are constantly striving for improvement, and Haleh’s diligence in keeping us up to date has not gone unnoticed. Thank you, Haleh!

Well-Architected logo

Figure 8. Well-Architected logo

Check it out!

#2: Let’s Architect! Designing architectures for multi-tenancy

The practically award-winning Let’s Architect! series strikes again! This time, Luca, Laura, Vittorio, and Zamira were joined by Federica to discuss multi-tenancy and why that concept is so crucial for SaaS providers.

Let's Architect

Figure 9. Let’s Architect

Check it out!

And finally…

#1: Understand resiliency patterns and trade-offs to architect efficiently in the cloud

Haresh, Lewis, and Bonnie revamped this 2022 post into a masterpiece that completely stole our readers’ hearts and is among the top posts we’ve ever made!

Resilience patterns and trade-offs

Figure 10. Resilience patterns and trade-offs

Check it out!

Bonus! Three older special mentions

These three posts were published before 2023, but we think they deserve another round of applause because you, our readers, keep coming back to them.

Thanks again to everyone for their contributions during a wild year. We hope you’re looking forward to the rest of 2024 as much as we are!

SaaS access control using Amazon Verified Permissions with a per-tenant policy store

Post Syndicated from Manuel Heinkel original https://aws.amazon.com/blogs/security/saas-access-control-using-amazon-verified-permissions-with-a-per-tenant-policy-store/

Access control is essential for multi-tenant software as a service (SaaS) applications. SaaS developers must manage permissions, fine-grained authorization, and isolation.

In this post, we demonstrate how you can use Amazon Verified Permissions for access control in a multi-tenant document management SaaS application using a per-tenant policy store approach. We also describe how to enforce the tenant boundary.

We usually see the following access control needs in multi-tenant SaaS applications:

  • Application developers need to define policies that apply across all tenants.
  • Tenant users need to control who can access their resources.
  • Tenant admins need to manage all resources for a tenant.

Additionally, independent software vendors (ISVs) implement tenant isolation to prevent one tenant from accessing the resources of another tenant. Enforcing tenant boundaries is imperative for SaaS businesses and is one of the foundational topics for SaaS providers.

Verified Permissions is a scalable, fine-grained permissions management and authorization service that helps you build and modernize applications without having to implement authorization logic within the code of your application.

Verified Permissions uses the Cedar language to define policies. A Cedar policy is a statement that declares which principals are explicitly permitted, or explicitly forbidden, to perform an action on a resource. The collection of policies defines the authorization rules for your application. Verified Permissions stores the policies in a policy store. A policy store is a container for policies and templates. You can learn more about Cedar policies from the Using Open Source Cedar to Write and Enforce Custom Authorization Policies blog post.

Before Verified Permissions, you had to implement authorization logic within the code of your application. Now, we’ll show you how Verified Permissions helps remove this undifferentiated heavy lifting in an example application.

Multi-tenant document management SaaS application

The application allows to add, share, access and manage documents. It requires the following access controls:

  • Application developers who can define policies that apply across all tenants.
  • Tenant users who can control who can access their documents.
  • Tenant admins who can manage all documents for a tenant.

Let’s start by describing the application architecture and then dive deeper into the design details.

Application architecture overview

There are two approaches to multi-tenant design in Verified Permissions: a single shared policy store and a per-tenant policy store. You can learn about the considerations, trade-offs and guidance for these approaches in the Verified Permissions user guide.

For the example document management SaaS application, we decided to use the per-tenant policy store approach for the following reasons:

  • Low-effort tenant policies isolation
  • The ability to customize templates and schema per tenant
  • Low-effort tenant off-boarding
  • Per-tenant policy store resource quotas

We decided to accept the following trade-offs:

  • High effort to implement global policies management (because the application use case doesn’t require frequent changes to these policies)
  • Medium effort to implement the authorization flow (because we decided that in this context, the above reasons outweigh implementing a mapping from tenant ID to policy store ID)

Figure 1 shows the document management SaaS application architecture. For simplicity, we omitted the frontend and focused on the backend.

Figure 1: Document management SaaS application architecture

Figure 1: Document management SaaS application architecture

  1. A tenant user signs in to an identity provider such as Amazon Cognito. They get a JSON Web Token (JWT), which they use for API requests. The JWT contains claims such as the user_id, which identifies the tenant user, and the tenant_id, which defines which tenant the user belongs to.
  2. The tenant user makes API requests with the JWT to the application.
  3. Amazon API Gateway verifies the validity of the JWT with the identity provider.
  4. If the JWT is valid, API Gateway forwards the request to the compute provider, in this case an AWS Lambda function, for it to run the business logic.
  5. The Lambda function assumes an AWS Identity and Access Management (IAM) role with an IAM policy that allows access to the Amazon DynamoDB table that provides tenant-to-policy-store mapping. The IAM policy scopes down access such that the Lambda function can only access data for the current tenant_id.
  6. The Lambda function looks up the Verified Permissions policy_store_id for the current request. To do this, it extracts the tenant_id from the JWT. The function then retrieves the policy_store_id from the tenant-to-policy-store mapping table.
  7. The Lambda function assumes another IAM role with an IAM policy that allows access to the Verified Permissions policy store, the document metadata table, and the document store. The IAM policy uses tenant_id and policy_store_id to scope down access.
  8. The Lambda function gets or stores documents metadata in a DynamoDB table. The function uses the metadata for Verified Permissions authorization requests.
  9. Using the information from steps 5 and 6, the Lambda function calls Verified Permissions to make an authorization decision or create Cedar policies.
  10. If authorized, the application can then access or store a document.

Application architecture deep dive

Now that you know the architecture for the use cases, let’s review them in more detail and work backwards from the user experience to the related part of the application architecture. The architecture focuses on permissions management. Accessing and storing the actual document is out of scope.

Define policies that apply across all tenants

The application developer must define global policies that include a basic set of access permissions for all tenants. We use Cedar policies to implement these permissions.

Because we’re using a per-tenant policy store approach, the tenant onboarding process should create these policies for each new tenant. Currently, to update policies, the deployment pipeline should apply changes to all policy stores.

The “Add a document” and “Manage all the documents for a tenant” sections that follow include examples of global policies.

Make sure that a tenant can’t edit the policies of another tenant

The application uses IAM to isolate the resources of one tenant from another. Because we’re using a per-tenant policy store approach we can use IAM to isolate one tenant policy store from another.

Architecture

Figure 2: Tenant isolation

Figure 2: Tenant isolation

  1. A tenant user calls an API endpoint using a valid JWT.
  2. The Lambda function uses AWS Security Token Service (AWS STS) to assume an IAM role with an IAM policy that allows access to the tenant-to-policy-store mapping DynamoDB table. The IAM policy only allows access to the table and the entries that belong to the requesting tenant. When the function assumes the role, it uses tenant_id to scope access to the items whose partition key matches the tenant_id. See the How to implement SaaS tenant isolation with ABAC and AWS IAM blog post for examples of such policies.
  3. The Lambda function uses the user’s tenant_id to get the Verified Permissions policy_store_id.
  4. The Lambda function uses the same mechanism as in step 2 to assume a different IAM role using tenant_id and policy_store_id which only allows access to the tenant policy store.
  5. The Lambda function accesses the tenant policy store.

Add a document

When a user first accesses the application, they don’t own any documents. To add a document, the frontend calls the POST /documents endpoint and supplies a document_name in the request’s body.

Cedar policy

We need a global policy that allows every tenant user to add a new document. The tenant onboarding process creates this policy in the tenant’s policy store.

permit (    
  principal,
  action == DocumentsAPI::Action::"addDocument",
  resource
);

This policy allows any principal to add a document. Because we’re using a per-tenant policy store approach, there’s no need to scope the principal to a tenant.

Architecture

Figure 3: Adding a document

Figure 3: Adding a document

  1. A tenant user calls the POST /documents endpoint to add a document.
  2. The Lambda function uses the user’s tenant_id to get the Verified Permissions policy_store_id.
  3. The Lambda function calls the Verified Permissions policy store to check if the tenant user is authorized to add a document.
  4. After successful authorization, the Lambda function adds a new document to the documents metadata database and uploads the document to the documents storage.

The database structure is described in the following table:

tenant_id (Partition key): String document_id (Sort key): String document_name: String document_owner: String
<TENANT_ID> <DOCUMENT_ID> <DOCUMENT_NAME> <USER_ID>
  • tenant_id: The tenant_id from the JWT claims.
  • document_id: A random identifier for the document, created by the application.
  • document_name: The name of the document supplied with the API request.
  • document_owner: The user who created the document. The value is the user_id from the JWT claims.

Share a document with another user of a tenant

After a tenant user has created one or more documents, they might want to share them with other users of the same tenant. To share a document, the frontend calls the POST /shares endpoint and provides the document_id of the document the user wants to share and the user_id of the receiving user.

Cedar policy

We need a global document owner policy that allows the document owner to manage the document, including sharing. The tenant onboarding process creates this policy in the tenant’s policy store.

permit (    
  principal,
  action,
  resource
) when {
  resource.owner == principal && 
  resource.type == "document"
};

The policy allows principals to perform actions on available resources (the document) when the principal is the document owner. This policy allows the shareDocument action, which we describe next, to share a document.

We also need a share policy that allows the receiving user to access the document. The application creates these policies for each successful share action. We recommend that you use policy templates to define the share policy. Policy templates allow a policy to be defined once and then attached to multiple principals and resources. Policies that use a policy template are called template-linked policies. Updates to the policy template are reflected across the principals and resources that use the template. The tenant onboarding process creates the share policy template in the tenant’s policy store.

We define the share policy template as follows:

permit (    
  principal == ?principal,  
  action == DocumentsAPI::Action::"accessDocument",
  resource == ?resource 
);

The following is an example of a template-linked policy using the share policy template:

permit (    
  principal == DocumentsAPI::User::"<user_id>",
  action == DocumentsAPI::Action::"accessDocument",
  resource == DocumentsAPI::Document::"<document_id>" 
);

The policy includes the user_id of the receiving user (principal) and the document_id of the document (resource).

Architecture

Figure 4: Sharing a document

Figure 4: Sharing a document

  1. A tenant user calls the POST /shares endpoint to share a document.
  2. The Lambda function uses the user’s tenant_id to get the Verified Permissions policy_store_id and policy template IDs for each action from the DynamoDB table that stores the tenant to policy store mapping. In this case the function needs to use the share_policy_template_id.
  3. The function queries the documents metadata DynamoDB table to retrieve the document_owner attribute for the document the user wants to share.
  4. The Lambda function calls Verified Permissions to check if the user is authorized to share the document. The request context uses the user_id from the JWT claims as the principal, shareDocument as the action, and the document_id as the resource. The document entity includes the document_owner attribute, which came from the documents metadata DynamoDB table.
  5. If the user is authorized to share the resource, the function creates a new template-linked share policy in the tenant’s policy store. This policy includes the user_id of the receiving user as the principal and the document_id as the resource.

Access a shared document

After a document has been shared, the receiving user wants to access the document. To access the document, the frontend calls the GET /documents endpoint and provides the document_id of the document the user wants to access.

Cedar policy

As shown in the previous section, during the sharing process, the application creates a template-linked share policy that allows the receiving user to access the document. Verified Permissions evaluates this policy when the user tries to access the document.

Architecture

Figure 5: Accessing a shared document

Figure 5: Accessing a shared document

  1. A tenant user calls the GET /documents endpoint to access the document.
  2. The Lambda function uses the user’s tenant_id to get the Verified Permissions policy_store_id.
  3. The Lambda function calls Verified Permissions to check if the user is authorized to access the document. The request context uses the user_id from the JWT claims as the principal, accessDocument as the action, and the document_id as the resource.

Manage all the documents for a tenant

When a customer signs up for a SaaS application, the application creates the tenant admin user. The tenant admin must have permissions to perform all actions on all documents for the tenant.

Cedar policy

We need a global policy that allows tenant admins to manage all documents. The tenant onboarding process creates this policy in the tenant’s policy store.

permit (    
  principal in DocumentsAPI::Group::"<admin_group_id>”,
  action,
  resource
);

This policy allows every member of the <admin_group_id> group to perform any action on any document.

Architecture

Figure 6: Managing documents

Figure 6: Managing documents

  1. A tenant admin calls the POST /documents endpoint to manage a document. 
  2. The Lambda function uses the user’s tenant_id to get the Verified Permissions policy_store_id.
  3. The Lambda function calls Verified Permissions to check if the user is authorized to manage the document.

Conclusion

In this blog post, we showed you how Amazon Verified Permissions helps to implement fine-grained authorization decisions in a multi-tenant SaaS application. You saw how to apply the per-tenant policy store approach to the application architecture. See the Verified Permissions user guide for how to choose between using a per-tenant policy store or one shared policy store. To learn more, visit the Amazon Verified Permissions documentation and workshop.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Verified Permissions re:Post or contact AWS Support.

Manuel Heinkel

Manuel Heinkel

Manuel is a Solutions Architect at AWS, working with software companies in Germany to build innovative and secure applications in the cloud. He supports customers in solving business challenges and achieving success with AWS. Manuel has a track record of diving deep into security and SaaS topics. Outside of work, he enjoys spending time with his family and exploring the mountains.

Alex Pulver

Alex Pulver

Alex is a Principal Solutions Architect at AWS. He works with customers to help design processes and solutions for their business needs. His current areas of interest are product engineering, developer experience, and platform strategy. He’s the creator of Application Design Framework, which aims to align business and technology, reduce rework, and enable evolutionary architecture.

How to improve cross-account access for SaaS applications accessing customer accounts

Post Syndicated from Ashwin Phadke original https://aws.amazon.com/blogs/security/how-to-improve-cross-account-access-for-saas-applications-accessing-customer-accounts/

Several independent software vendors (ISVs) and software as a service (SaaS) providers need to access their customers’ Amazon Web Services (AWS) accounts, especially if the SaaS product accesses data from customer environments. SaaS providers have adopted multiple variations of this third-party access scenario. In some cases, the providers ask the customer for an access key and a secret key, which is not recommended because these are long-term user credentials and require processes to be built for periodic rotation. However, in most cases, the provider has an integration guide with specific details on creating a cross-account AWS Identity and Access Management (IAM) role.

In all these scenarios, as a SaaS vendor, you should add the necessary protections to your SaaS implementation. At AWS, security is the top priority and we recommend that customers follow best practices and incorporate security in their product design. In this blog post intended for SaaS providers, I describe three ways to improve your cross-account access implementation for your products.

Why is this important?

As a security specialist, I’ve worked with multiple ISV customers on improving the security of their products, specifically on this third-party cross-account access scenario. Consumers of your SaaS products don’t want to give more access permissions than are necessary for the product’s proper functioning. At the same time, you should maintain and provide a secure SaaS product to protect your customers’ and your own AWS accounts from unauthorized access or privilege escalations.

Let’s consider a hypothetical scenario with a simple SaaS implementation where a customer is planning to use a SaaS product. In Figure 1, you can see that the SaaS product has multiple different components performing separate functions, for example, a SaaS product with separate components performing compute analysis, storage analysis, and log analysis. The SaaS provider asks the customer to provide IAM user credentials and uses those in their product to access customer resources. Let’s look at three techniques for improving the cross-account access for this scenario. Each technique builds on the previous one, so you could adopt an incremental approach to implement these techniques.

Figure 1: SaaS architecture using customer IAM user credentials

Figure 1: SaaS architecture using customer IAM user credentials

Technique 1 – Using IAM roles and an external ID

As stated previously, IAM user credentials are long-term, so customers would need to implement processes to rotate these periodically and share them with the ISV.

As a better option, SaaS product components can use IAM roles, which provide short-term credentials to the component assuming the role. These credentials need to be refreshed depending on the role’s session duration setting (the default is 1 hour) to continue accessing the resources. IAM roles also provide an advantage for auditing purposes because each time an IAM principal assumes a role, a new session is created, and this can be used to identify and audit activity for separate sessions.

When using IAM roles for third-party access, an important consideration is the confused deputy problem, where an unauthorized entity could coerce the product components into performing an action against another customers’ resources. To mitigate this problem, a highly recommended approach is to use the external ID parameter when assuming roles in customers’ accounts. It’s important and recommended that you generate these external ID parameters to make sure they’re unique for each of your customers, for example, using a customer ID or similar attribute. For external ID character restrictions, see the IAM quotas page. Your customers will use this external ID in their IAM role’s trust policy, and your product components will pass this as a parameter in all AssumeRole API calls to customer environments. An example of the trust policy principal and condition blocks for the role to be assumed in the customer’s account follows:

    "Principal": {"AWS": "<SaaS Provider’s AWS account ID>"},
    "Condition": {"StringEquals": {"sts:ExternalId": "<Unique ID Assigned by SaaS Provider>"}}
Figure 2: SaaS architecture using an IAM role and external ID

Figure 2: SaaS architecture using an IAM role and external ID

Technique 2 – Using least-privilege IAM policies and role chaining

As an IAM best practice, we recommend that an IAM role should only have the minimum set of permissions as required to perform its functions. When your customers create an IAM role in Technique 1, they might inadvertently provide more permissions than necessary to use your product. The role could have permissions associated with multiple AWS services and might become overly permissive. If you provide granular permissions for separate AWS services, you might reach the policy size quota or policies per role quota. See IAM quotas for more information. That’s why, in addition to Technique 1, we recommend that each component have a separate IAM role in the customer’s account with only the minimum permissions required for its functions.

As a part of your integration guide to the customer, you should ask them to create appropriate IAM policies for these IAM roles. There needs to be a clear separation of duties and least privilege access for the product components. For example, an account-monitoring SaaS provider might use a separate IAM role for Amazon Elastic Compute Cloud (Amazon EC2) monitoring and another one for AWS CloudTrail monitoring. Your components will also use separate IAM roles in your own AWS account. However, you might want to provide a single integration IAM role to customers to establish the trust relationship with each component role in their account. In effect, you will be using the concept of role chaining to access your customer’s accounts. The auditing mechanisms on the customer’s end will only display the integration IAM role sessions.

When using role chaining, you must be aware of certain caveats and limitations. Your components will each have separate roles: Role A, which will assume the integration role (Role B), and then use the Role B credentials to assume the customer role (Role C) in customer’s accounts. You need to properly define the correct permissions for each of these roles, because the permissions of the previous role aren’t passed while assuming the role. Optionally, you can pass an IAM policy document known as a session policy as a parameter while assuming the role, and the effective permissions will be a logical intersection of the passed policy and the attached permissions for the role. To learn more about these session policies, see session policies.

Another consideration of using role chaining is that it limits your AWS Command Line Interface (AWS CLI) or AWS API role session duration to a maximum of one hour. This means that you must track the sessions and perform credential refresh actions every hour to continue accessing the resources.

Figure 3: SaaS architecture with role chaining

Figure 3: SaaS architecture with role chaining

Technique 3 – Using role tags and session tags for attribute-based access control

When you create your IAM roles for role chaining, you define which entity can assume the role. You will need to add each component-specific IAM role to the integration role’s trust relationship. As the number of components within your product increases, you might reach the maximum length of the role trust policy. See IAM quotas for more information.

That’s why, in addition to the above two techniques, we recommend using attribute-based access control (ABAC), which is an authorization strategy that defines permissions based on tag attributes. You should tag all the component IAM roles with role tags and use these role tags as conditions in the trust policy for the integration role as shown in the following example. Optionally, you could also include instructions in the product integration guide for tagging customers’ IAM roles with certain role tags and modify the IAM policy of the integration role to allow it to assume only roles with those role tags. This helps in reducing IAM policy length and minimizing the risk of reaching the IAM quota.

"Condition": {
     "StringEquals": {"iam:ResourceTag/<Product>": "<ExampleSaaSProduct>"}

Another consideration for improving the auditing and traceability for your product is IAM role session tags. These could be helpful if you use CloudTrail log events for alerting on specific role sessions. If your SaaS product also operates on CloudTrail logs, you could use these session tags to identify the different sessions from your product. As opposed to role tags, which are tags attached to an IAM role, session tags are key-value pair attributes that you pass when you assume an IAM role. These can be used to identify a session and further control or restrict access to resources based on the tags. Session tags can also be used along with role chaining. When you use session tags with role chaining, you can set the keys as transitive to make sure that you pass them to subsequent sessions. CloudTrail log events for these role sessions will contain the session tags, transitive tags, and role (also called principal) tags.

Conclusion

In this post, we discussed three incremental techniques that build on each other and are important for SaaS providers to improve security and access control while implementing cross-account access to their customers. As a SaaS provider, it’s important to verify that your product adheres to security best practices. When you improve security for your product, you’re also improving security for your customers.

To see more tutorials about cross-account access concepts, visit the AWS documentation on IAM Roles, ABAC, and session tags.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Identity and Access Management re:Post or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Ashwin Phadke

Ashwin Phadke

Ashwin is a Sr. Solutions Architect, working with large enterprises and ISV customers to build highly available, scalable, and secure applications, and to help them successfully navigate through their cloud journey. He is passionate about information security and enjoys working on creative solutions for customers’ security challenges.

Empower your Jira data in a data lake with Amazon AppFlow and AWS Glue

Post Syndicated from Tom Romano original https://aws.amazon.com/blogs/big-data/empower-your-jira-data-in-a-data-lake-with-amazon-appflow-and-aws-glue/

In the world of software engineering and development, organizations use project management tools like Atlassian Jira Cloud. Managing projects with Jira leads to rich datasets, which can provide historical and predictive insights about project and development efforts.

Although Jira Cloud provides reporting capability, loading this data into a data lake will facilitate enrichment with other business data, as well as support the use of business intelligence (BI) tools and artificial intelligence (AI) and machine learning (ML) applications. Companies often take a data lake approach to their analytics, bringing data from many different systems into one place to simplify how the analytics are done.

This post shows you how to use Amazon AppFlow and AWS Glue to create a fully automated data ingestion pipeline that will synchronize your Jira data into your data lake. Amazon AppFlow provides software as a service (SaaS) integration with Jira Cloud to load the data into your AWS account. AWS Glue is a serverless data discovery, load, and transformation service that will prepare data for consumption in BI and AI/ML activities. Additionally, this post strives to achieve a low-code and serverless solution for operational efficiency and cost optimization, and the solution supports incremental loading for cost optimization.

Solution overview

This solution uses Amazon AppFlow to retrieve data from the Jira Cloud. The data is synchronized to an Amazon Simple Storage Service (Amazon S3) bucket using an initial full download and subsequent incremental downloads of changes. When new data arrives in the S3 bucket, an AWS Step Functions workflow is triggered that orchestrates extract, transform, and load (ETL) activities using AWS Glue crawlers and AWS Glue DataBrew. The data is then available in the AWS Glue Data Catalog and can be queried by services such as Amazon Athena, Amazon QuickSight, and Amazon Redshift Spectrum. The solution is completely automated and serverless, resulting in low operational overhead. When this setup is complete, your Jira data will be automatically ingested and kept up to date in your data lake!

The following diagram illustrates the solution architecture.

The Jira Appflow Architecture is shown. The Jira Cloud data is retrieved by Amazon AppFlow and is stored in Amazon S3. This triggers an Amazon EventBridge event that runs an AWS Step Functions workflow. The workflow uses AWS Glue to catalog and transform the data, The data is then queried with QuickSight.

The Step Functions workflow orchestrates the following ETL activities, resulting in two tables:

  • An AWS Glue crawler collects all downloads into a single AWS Glue table named jira_raw. This table is comprised of a mix of full and incremental downloads from Jira, with many versions of the same records representing changes over time.
  • A DataBrew job prepares the data for reporting by unpacking key-value pairs in the fields, as well as removing depreciated records as they are updated in subsequent change data captures. This reporting-ready data will available in an AWS Glue table named jira_data.

The following figure shows the Step Functions workflow.

A diagram represents the AWS Step Functions workflow. It contains the steps to run an AWS Crawler, wait for it's completion, and then run a AWS Glue DataBrew data transformation job.

Prerequisites

This solution requires the following:

  • Administrative access to your Jira Cloud instance, and an associated Jira Cloud developer account.
  • An AWS account and a login with access to the AWS Management Console. Your login will need AWS Identity and Access Management (IAM) permissions to create and access the resources in your AWS account.
  • Basic knowledge of AWS and working knowledge of Jira administration.

Configure the Jira Instance

After logging in to your Jira Cloud instance, you establish a Jira project with associated epics and issues to download into a data lake. If you’re starting with a new Jira instance, it helps to have at least one project with a sampling of epics and issues for the initial data download, because it allows you to create an initial dataset without errors or missing fields. Note that you may have multiple projects as well.

An image show a Jira Cloud example, with several issues arranged in a Kansan board.

After you have established your Jira project and populated it with epics and issues, ensure you also have access to the Jira developer portal. In later steps, you use this developer portal to establish authentication and permissions for the Amazon AppFlow connection.

Provision resources with AWS CloudFormation

For the initial setup, you launch an AWS CloudFormation stack to create an S3 bucket to store data, IAM roles for data access, and the AWS Glue crawler and Data Catalog components. Complete the following steps:

  1. Sign in to your AWS account.
  2. Click Launch Stack:
  3. For Stack name, enter a name for the stack (the default is aws-blog-jira-datalake-with-AppFlow).
  4. For GlueDatabaseName, enter a unique name for the Data Catalog database to hold the Jira data table metadata (the default is jiralake).
  5. For InitialRunFlag, choose Setup. This mode will scan all data and disable the change data capture (CDC) features of the stack. (Because this is the initial load, the stack needs an initial data load before you configure CDC in later steps.)
  6. Under Capabilities and transforms, select the acknowledgement check boxes to allow IAM resources to be created within your AWS account.
  7. Review the parameters and choose Create stack to deploy the CloudFormation stack. This process will take around 5–10 minutes to complete.
    An image depicts the Amazon CloudFormation configuration steps, including setting a stack name, setting parameters to "jiralake" and "Setup" mode, and checking all IAM capabilities requested.
  8. After the stack is deployed, review the Outputs tab for the stack and collect the following values to use when you set up Amazon AppFlow:
    • Amazon AppFlow destination bucket (o01AppFlowBucket)
    • Amazon AppFlow destination bucket path (o02AppFlowPath)
    • Role for Amazon AppFlow Jira connector (o03AppFlowRole)
      An image demonstrating the Amazon Cloudformation "Outputs" tab, highlighting the values to add to the Amazon AppFlow configuration.

Configure Jira Cloud

Next, you configure your Jira Cloud instance for access by Amazon AppFlow. For full instructions, refer to Jira Cloud connector for Amazon AppFlow. The following steps summarize these instructions and discuss the specific configuration to enable OAuth in the Jira Cloud:

  1. Open the Jira developer portal.
  2. Create the OAuth 2 integration from the developer application console by choosing Create an OAuth 2.0 Integration. This will provide a login mechanism for AppFlow.
  3. Enable fine-grained permissions. See Recommended scopes for the permission settings to grant AppFlow appropriate access to your Jira instance.
  4. Add the following permission scopes to your OAuth app:
    1. manage:jira-configuration
    2. read:field-configuration:jira
  5. Under Authorization, set the Call Back URL to return to Amazon AppFlow with the URL https://us-east-1.console.aws.amazon.com/AppFlow/oauth.
  6. Under Settings, note the client ID and secret to use in later steps to set up authentication from Amazon AppFlow.

Create the Amazon AppFlow Jira Cloud connection

In this step, you configure Amazon AppFlow to run a one-time full data fetch of all your data, establishing the initial data lake:

  1. On the Amazon AppFlow console, choose Connectors in the navigation pane.
  2. Search for the Jira Cloud connector.
  3. Choose Create flow on the connector tile to create the connection to your Jira instance.
    An image of Amazon AppFlor, showing the search for the "Jira Cloud" connector.
  4. For Flow name, enter a name for the flow (for example, JiraLakeFlow).
  5. Leave the Data encryption setting as the default.
  6. Choose Next.
    The Amazon AppFlow Jira connector configuration, showing the Flow name set to "JiraLakeFlow" and clicking the "next" button.
  7. For Source name, keep the default of Jira Cloud.
  8. Choose Create new connection under Jira Cloud connection.
  9. In the Connect to Jira Cloud section, enter the values for Client ID, Client secret, and Jira Cloud Site that you collected earlier. This provides the authentication from AppFlow to Jira Cloud.
  10. For Connection Name, enter a connection name (for example, JiraLakeCloudConnection).
  11. Choose Connect. You will be prompted to allow your OAuth app to access your Atlassian account to verify authentication.
    An image of the Amazon AppFlow conflagration, reflecting the completion of the prior steps.
  12. In the Authorize App window that pops up, choose Accept.
  13. With the connection created, return to the Configure flow section on the Amazon AppFlow console.
  14. For API version, choose V2 to use the latest Jira query API.
  15. For Jira Cloud object, choose Issue to query and download all issues and associated details.
    An image of the Amazon AppFlow configuration, reflecting the completion of the prior steps.
  16. For Destination Name in the Destination Details section, choose Amazon S3.
  17. For Bucket details, choose the S3 bucket name that matches the Amazon AppFlow destination bucket value that you collected from the outputs of the CloudFormation stack.
  18. Enter the Amazon AppFlow destination bucket path to complete the full S3 path. This will send the Jira data to the S3 bucket created by the CloudFormation script.
  19. Leave Catalog your data in the AWS Glue Data Catalog unselected. The CloudFormation script uses an AWS Glue crawler to update the Data Catalog in a different manner, grouping all the downloads into a common table, so we disable the update here.
  20. For File format settings, select Parquet format and select Preserve source data types in Parquet output. Parquet is a columnar format to optimize subsequent querying.
  21. Select Add a timestamp to the file name for Filename preference. This will allow you to easily find data files downloaded at a specific date and time.
    An image of the Amazon AppFlow configuration, reflecting the completion of the prior steps.
  22. For now, select Run on Demand for the Flow trigger to run the full load flow manually. You will schedule downloads in a later step when implementing CDC.
  23. Choose Next.
    An image of the Amazon AppFlow Flow Trigger configuration, reflecting the completion of the prior steps.
  24. On the Map data fields page, select Manually map fields.
  25. For Source to destination field mapping, choose the drop-down box under Source field name and select Map all fields directly. This will bring down all fields as they are received, because we will instead implement data preparation in later steps.
    An image of the Amazon AppFlow configuration, reflecting the completion of steps 24 & 25.
  26. Under Partition and aggregation settings, you can set up the partitions in a way that works for your use case. For this example, we use a daily partition, so select Date and time and choose Daily.
  27. For Aggregation settings, leave it as the default of Don’t aggregate.
  28. Choose Next.
    An image of the Amazon AppFlow configuration, reflecting the completion of steps 26-28.
  29. On the Add filters page, you can create filters to only download specific data. For this example, you download all the data, so choose Next.
  30. Review and choose Create flow.
  31. When the flow is created, choose Run flow to start the initial data seeding. After some time, you should receive a banner indicating the run finished successfully.
    An image of the Amazon AppFlow configuration, reflecting the completion of step 31.

Review seed data

At this stage in the process, you now have data in your S3 environment. When new data files are created in the S3 bucket, it will automatically run an AWS Glue crawler to catalog the new data. You can see if it’s complete by reviewing the Step Functions state machine for a Succeeded run status. There is a link to the state machine on the CloudFormation stack’s Resources tab, which will redirect you to the Step Functions state machine.

A image showing the CloudFormation resources tab of the stack, with a link to the AWS Step Functions workflow.

When the state machine is complete, it’s time to review the raw Jira data with Athena. The database is as you specified in the CloudFormation stack (jiralake by default), and the table name is jira_raw. If you kept the default AWS Glue database name of jiralake, the Athena SQL is as follows:

SELECT * FROM "jiralake"."jira_raw" limit 10;

If you explore the data, you’ll notice that most of the data you would want to work with is actually packed into a column called fields. This means the data is not available as columns in your Athena queries, making it harder to select, filter, and sort individual fields within an Athena SQL query. This will be addressed in the next steps.

An image demonstrating the Amazon Athena query SELECT * FROM "jiralake"."jira_raw" limit 10;

Set up CDC and unpack the fields columns

To add the ongoing CDC and reformat the data for analytics, we introduce a DataBrew job to transform the data and filter to the most recent version of each record as changes come in. You can do this by updating the CloudFormation stack with a flag that includes the CDC and data transformation steps.

  1. On the AWS CloudFormation console, return to the stack.
  2. Choose Update.
  3. Select Use current template and choose Next.
    An image showing Amazon CloudFormation, with steps 1-3 complete.
  4. For SetupOrCDC, choose CDC, then choose Next. This will enable both the CDC steps and the data transformation steps for the Jira data.
    An image showing Amazon CloudFormation, with step 4 complete.
  5. Continue choosing Next until you reach the Review section.
  6. Select I acknowledge that AWS CloudFormation might create IAM resources, then choose Submit.
    An image showing Amazon CloudFormation, with step 5-6 complete.
  7. Return to the Amazon AppFlow console and open your flow.
  8. On the Actions menu, choose Edit flow. We will now edit the flow trigger to run an incremental load on a periodic basis.
  9. Select Run flow on schedule.
  10. Configure the desired repeats, as well as start time and date. For this example, we choose Daily for Repeats and enter 1 for the number of days you’ll have the flow trigger. For Starting at, enter 01:00.
  11. Select Incremental transfer for Transfer mode.
  12. Choose Updated on the drop-down menu so that changes will be captured based on when the records were updated.
  13. Choose Save. With these settings in our example, the run will happen nightly at 1:00 AM.
    An image showing the Flow Trigger, with incremental transfer selected.

Review the analytics data

When the next incremental load occurs that results in new data, the Step Functions workflow will start the DataBrew job and populate a new staged analytical data table named jira_data in your Data Catalog database. If you don’t want to wait, you can trigger the Step Functions workflow manually.

The DataBrew job performs data transformation and filtering tasks. The job unpacks the key-values from the Jira JSON data and the raw Jira data, resulting in a tabular data schema that facilitates use with BI and AI/ML tools. As Jira items are changed, the changed item’s data is resent, resulting in multiple versions of an item in the raw data feed. The DataBrew job filters the raw data feed so that the resulting data table only contains the most recent version of each item. You could enhance this DataBrew job to further customize the data for your needs, such as renaming the generic Jira custom field names to reflect their business meaning.

When the Step Functions workflow is complete, we can query the data in Athena again using the following query:

SELECT * FROM "jiralake"."jira_data" limit 10;

You can see that in our transformed jira_data table, the nested JSON fields are broken out into their own columns for each field. You will also notice that we’ve filtered out obsolete records that have been superseded by more recent record updates in later data loads so the data is fresh. If you want to rename custom fields, remove columns, or restructure what comes out of the nested JSON, you can modify the DataBrew recipe to accomplish this. At this point, the data is ready to be used by your analytics tools, such as Amazon QuickSight.

An image demonstrating the Amazon Athena query SELECT * FROM "jiralake"."jira_data" limit 10;

Clean up

If you would like to discontinue this solution, you can remove it with the following steps:

  1. On the Amazon AppFlow console, deactivate the flow for Jira, and optionally delete it.
  2. On the Amazon S3 console, select the S3 bucket for the stack, and empty the bucket to delete the existing data.
  3. On the AWS CloudFormation console, delete the CloudFormation stack that you deployed.

Conclusion

In this post, we created a serverless incremental data load process for Jira that will synchronize data while handling custom fields using Amazon AppFlow, AWS Glue, and Step Functions. The approach uses Amazon AppFlow to incrementally load the data into Amazon S3. We then use AWS Glue and Step Functions to manage the extraction of the Jira custom fields and load them in a format to be queried by analytics services such as Athena, QuickSight, or Redshift Spectrum, or AI/ML services like Amazon SageMaker.

To learn more about AWS Glue and DataBrew, refer to Getting started with AWS Glue DataBrew. With DataBrew, you can take the sample data transformation in this project and customize the output to meet your specific needs. This could include renaming columns, creating additional fields, and more.

To learn more about Amazon AppFlow, refer to Getting started with Amazon AppFlow. Note that Amazon AppFlow supports integrations with many SaaS applications in addition to the Jira Cloud.

To learn more about orchestrating flows with Step Functions, see Create a Serverless Workflow with AWS Step Functions and AWS Lambda. The workflow could be enhanced to load the data into a data warehouse, such as Amazon Redshift, or trigger a refresh of a QuickSight dataset for analytics and reporting.

In future posts, we will cover how to unnest parent-child relationships within the Jira data using Athena and how to visualize the data using QuickSight.


About the Authors

Tom Romano is a Sr. Solutions Architect for AWS World Wide Public Sector from Tampa, FL, and assists GovTech and EdTech customers as they create new solutions that are cloud native, event driven, and serverless. He is an enthusiastic Python programmer for both application development and data analytics, and is an Analytics Specialist. In his free time, Tom flies remote control model airplanes and enjoys vacationing with his family around Florida and the Caribbean.

Shane Thompson is a Sr. Solutions Architect based out of San Luis Obispo, California, working with AWS Startups. He works with customers who use AI/ML in their business model and is passionate about democratizing AI/ML so that all customers can benefit from it. In his free time, Shane loves to spend time with his family and travel around the world.

Scan and secure Atlassian with Cloudflare CASB

Post Syndicated from Alex Dunbrack original https://blog.cloudflare.com/scan-atlassian-casb/

Scan and secure Atlassian with Cloudflare CASB

Scan and secure Atlassian with Cloudflare CASB

As part of Security Week, two new integrations are coming to Cloudflare CASB, one for Atlassian Confluence and the other for Atlassian Jira.

We’re excited to launch support for these two new SaaS applications (in addition to those we already support) given the reliance that we’ve seen organizations from around the world place in them for streamlined, end-to-end project management.

Let’s dive into what Cloudflare Zero Trust customers can expect from these new integrations.

CASB: Security for your SaaS apps

First, a quick recap. CASB, or Cloud Access Security Broker, is one of Cloudflare’s newer offerings, released last September to provide security operators – CISOs and security engineers – clear visibility and administrative control over the security of their SaaS apps.

Whether it’s Google Workspace, Microsoft 365, Slack, Salesforce, Box, GitHub, or Atlassian (whew!), CASB can easily connect and scan these apps for critical security issues, and provide users an exhaustive list of identified problems, organized for triage.

Scan and secure Atlassian with Cloudflare CASB

Scan Confluence with Cloudflare CASB

Scan and secure Atlassian with Cloudflare CASB

Over time, Atlassian Confluence has become the go-to collaboration platform for teams to create, organize, and share content, such as documents, notes, and meeting minutes. However, from a security perspective, Confluence’s flexibility and wide compatibility with third-party applications can pose a security risk if not properly configured and monitored.

With this new integration, IT and security teams can begin scanning for Atlassian- and Confluence-specific security issues that may be leaving sensitive corporate data at risk. Customers of CASB using Confluence Cloud can expect to identify issues like publicly shared content, unauthorized access, and other vulnerabilities that could be exploited by bad actors.

By providing this additional layer of SaaS security, Cloudflare CASB can help organizations better protect their sensitive data while still leveraging the collaborative power of Confluence.

Scan Jira with Cloudflare CASB

Scan and secure Atlassian with Cloudflare CASB

A mainstay project management tool used to track tasks, issues, and progress on projects, Atlassian Jira has become an essential part of the software development process for teams of all sizes. At the same time, this also means that Jira has become a rich target for those looking to exploit and gain access to sensitive data.

With Cloudflare CASB, security teams can now easily identify security issues that could leave employees and sensitive business data vulnerable to compromise. Compatible with Jira Cloud accounts, Identified issues can range from flagging user and third-party app access issues, such as account misuse and users not following best practices, to identification of files that could be potentially overshared and worth deeper investigation.

By providing security admins with a single view to see security issues across their entire SaaS footprint, now including Jira and Confluence, Cloudflare CASB makes it easier for security teams to stay up-to-date with potential security risks.

Getting started

With the addition of Jira and Confluence to the growing list of CASB integrations, we’re making our products as widely compatible as possible so that organizations can continue placing their trust and confidence in us to help keep them secure.

Today, Cloudflare CASB supports integrations with Google Workspace, Microsoft 365, Slack, Salesforce, Box, GitHub, Jira, and Confluence, with a growing list of other critical applications on their way, so if there’s one in particular you’d like to see soon, let us know!

For those not already using Cloudflare Zero Trust, don’t hesitate to get started today – see the platform yourself with 50 free seats by signing up here, then get in touch with our team here to learn more about how Cloudflare CASB can help your organization lock down its SaaS apps.

New: Scan Salesforce and Box for security issues

Post Syndicated from Alex Dunbrack original https://blog.cloudflare.com/casb-adds-salesforce-and-box-integrations/

New: Scan Salesforce and Box for security issues

New: Scan Salesforce and Box for security issues

Today, we’re sharing the release of two new SaaS integrations for Cloudflare CASB – Salesforce and Box – in order to help CIOs, IT leaders, and security admins swiftly identify looming security issues present across the exact type of tools housing this business-critical data.

Recap: What is Cloudflare CASB?

Released in September, Cloudflare’s API CASB has already proven to organizations from around the world that security risks – like insecure settings and inappropriate file sharing – can often exist across the friendly SaaS apps we all know and love, and indeed pose a threat. By giving operators a comprehensive view of the issues plaguing their SaaS environments, Cloudflare CASB has allowed them to effortlessly remediate problems in a timely manner before they can be leveraged against them.

But as both we and other forward-thinking administrators have come to realize, it’s not always Microsoft 365, Google Workspace, and business chat tools like Slack that contain an organization’s most sensitive information.

Scan Salesforce with Cloudflare CASB

The first Software-as-a-Service. Salesforce, the sprawling, intricate, hard-to-contain Customer Relationship Management (CRM) platform, gives workforces a flexible hub from which they can do just as the software describes: manage customer relationships. Whether it be tracking deals and selling opportunities, managing customer conversations, or storing contractual agreements, Salesforce has truly become the ubiquitous solution for organizations looking for a way to manage every customer-facing interaction they have.

This reliance, however, also makes Salesforce a business data goldmine for bad actors.

New: Scan Salesforce and Box for security issues

With CASB’s new integration for Salesforce, IT and security operators will be able to quickly connect their environments and scan them for the kind of issues putting their sensitive business data at risk. Spot uploaded files that have been shared publicly with anyone who has the link. Identify default permissions that give employees access to records that should be need-to-know only. You can even see employees who are sending out emails as other Salesforce users!

Using this new integration, we’re excited to help close the security visibility gap for yet another SaaS app serving as the lifeblood for teams out in the field making business happen.

Scan Box with Cloudflare CASB

Box is the leading Content Cloud that enables organizations to accelerate business processes, power workplace collaboration, and protect their most valuable information, all while working with a best-of-breed enterprise IT stack like Cloudflare.

A platform used to store everything – from contracts and financials to product roadmaps and employee records – Box has given collaborative organizations a single place to convene and share information that, in a growing remote-first world, has no better place to be stored.

So where are disgruntled employees and people with malicious intent going to look when they want to unveil private business files?

New: Scan Salesforce and Box for security issues

With Cloudflare CASB’s new integration for Box, security and IT teams alike can now link their admin accounts and scan them for under-the-radar security issues leaving them prone to compromise and data exfiltration. In addition to Box’s built-in content and collaboration security, Cloudflare CASB gives you another added layer of protection where you can catch files and folders shared publicly or with users outside your organization. By providing security admins with a single view to see employees who aren’t following security policies, we make it harder for bad actors to get inside and do damage.

With Cloudflare’s status as an official Box Technology Partner, we’re looking forward to offering both Cloudflare and Box users a robust, yet easy-to-use toolset that can help stop pressing, real-world data security incidents right in their tracks.

“Organizations today need products that are inherently secure to support employees working from anywhere,” said Areg Alimian, Head of Security Products at Box. “At Box, we continuously strive to improve our integrations with third-party apps so that it’s easier than ever for customers to use Box alongside best-in-class solutions. With today’s integration with Cloudflare CASB, we enable our joint customers to have a single pane of glass view allowing them to consistently enforce security policies and protect leakage of sensitive information across all their apps.”

Taking action on your business data security

Salesforce and Box are certainly not the only SaaS applications managing this type of sensitive organizational data. At Cloudflare, we strive to make our products as widely compatible as possible so that organizations can continue to place their trust and confidence in us to help keep them secure.

Today, Cloudflare CASB supports integrations with Google Workspace, Microsoft 365, Slack, GitHub, Salesforce, and Box, with a growing list of other critical applications on their way, so if there’s one in particular you’d like to see soon, let us know!

For those not already using Cloudflare Zero Trust, don’t hesitate to get started today – see the platform yourself with 50 free seats by signing up here, then get in touch with our team here to learn more about how Cloudflare CASB can help your organization lock down its SaaS apps.

How to secure your SaaS tenant data in DynamoDB with ABAC and client-side encryption

Post Syndicated from Jani Muuriaisniemi original https://aws.amazon.com/blogs/security/how-to-secure-your-saas-tenant-data-in-dynamodb-with-abac-and-client-side-encryption/

If you’re a SaaS vendor, you may need to store and process personal and sensitive data for large numbers of customers across different geographies. When processing sensitive data at scale, you have an increased responsibility to secure this data end-to-end. Client-side encryption of data, such as your customers’ contact information, provides an additional mechanism that can help you protect your customers and earn their trust.

In this blog post, we show how to implement client-side encryption of your SaaS application’s tenant data in Amazon DynamoDB with the Amazon DynamoDB Encryption Client. This is accomplished by leveraging AWS Identity and Access Management (IAM) together with AWS Key Management Service (AWS KMS) for a more secure and cost-effective isolation of the client-side encrypted data in DynamoDB, both at run-time and at rest.

Encrypting data in Amazon DynamoDB

Amazon DynamoDB supports data encryption at rest using encryption keys stored in AWS KMS. This functionality helps reduce operational burden and complexity involved in protecting sensitive data. In this post, you’ll learn about the benefits of adding client-side encryption to achieve end-to-end encryption in transit and at rest for your data, from its source to storage in DynamoDB. Client-side encryption helps ensure that your plaintext data isn’t available to any third party, including AWS.

You can use the Amazon DynamoDB Encryption Client to implement client-side encryption with DynamoDB. In the solution in this post, client-side encryption refers to the cryptographic operations that are performed on the application-side in the application’s Lambda function, before the data is sent to or retrieved from DynamoDB. The solution in this post uses the DynamoDB Encryption Client with the Direct KMS Materials Provider so that your data is encrypted by using AWS KMS. However, the underlying concept of the solution is not limited to the use of the DynamoDB Encryption Client, you can apply it to any client-side use of AWS KMS, for example using the AWS Encryption SDK.

For detailed information about using the DynamoDB Encryption Client, see the blog post How to encrypt and sign DynamoDB data in your application. This is a great place to start if you are not yet familiar with DynamoDB Encryption Client. If you are unsure about whether you should use client-side encryption, see Client-side and server-side encryption in the Amazon DynamoDB Encryption Client Developer Guide to help you with the decision.

AWS KMS encryption context

AWS KMS gives you the ability to add an additional layer of authentication for your AWS KMS API decrypt operations by using encryption context. The encryption context is one or more key-value pairs of additional data that you want associated with AWS KMS protected information.

Encryption context helps you defend against the risks of ciphertexts being tampered with, modified, or replaced — whether intentionally or unintentionally. Encryption context helps defend against both an unauthorized user replacing one ciphertext with another, as well as problems like operational events. To use encryption context, you specify associated key-value pairs on encrypt. You must provide the exact same key-value pairs in the encryption context on decrypt, or the operation will fail. Encryption context is not secret, and is not an access-control mechanism. The encryption context is a means of authenticating the data, not the caller.

The Direct KMS Materials Provider used in this blog post transparently generates a unique data key by using AWS KMS for each item stored in the DynamoDB table. It automatically sets the item’s partition key and sort key (if any) as AWS KMS encryption context key-value pairs.

The solution in this blog post relies on the partition key of each table item being defined in the encryption context. If you encrypt data with your own implementation, make sure to add your tenant ID to the encryption context in all your AWS KMS API calls.

For more information about the concept of AWS KMS encryption context, see the blog post How to Protect the Integrity of Your Encrypted Data by Using AWS Key Management Service and EncryptionContext. You can also see another example in Exercise 3 of the Busy Engineer’s Document Bucket Workshop.

Attribute-based access control for AWS

Attribute-based access control (ABAC) is an authorization strategy that defines permissions based on attributes. In AWS, these attributes are called tags. In the solution in this post, ABAC helps you create tenant-isolated access policies for your application, without the need to provision tenant specific AWS IAM roles.

If you are new to ABAC, or need a refresher on the concepts and the different isolation methods, see the blog post How to implement SaaS tenant isolation with ABAC and AWS IAM.

Solution overview

If you are a SaaS vendor expecting large numbers of tenants, it is important that your underlying architecture can cost effectively scale with minimal complexity to support the required number of tenants, without compromising on security. One way to meet these criteria is to store your tenant data in a single pooled DynamoDB table, and to encrypt the data using a single AWS KMS key.

Using a single shared KMS key to read and write encrypted data in DynamoDB for multiple tenants reduces your per-tenant costs. This may be especially relevant to manage your costs if you have users on your organization’s free tier, with no direct revenue to offset your costs.

When you use shared resources such as a single pooled DynamoDB table encrypted by using a single KMS key, you need a mechanism to help prevent cross-tenant access to the sensitive data. This is where you can use ABAC for AWS. By using ABAC, you can build an application with strong tenant isolation capabilities, while still using shared and pooled underlying resources for storing your sensitive tenant data.

You can find the solution described in this blog post in the aws-dynamodb-encrypt-with-abac GitHub repository. This solution uses ABAC combined with KMS encryption context to provide isolation of tenant data, both at rest and at run time. By using a single KMS key, the application encrypts tenant data on the client-side, and stores it in a pooled DynamoDB table, which is partitioned by a tenant ID.

Solution Architecture

Figure 1: Components of solution architecture

Figure 1: Components of solution architecture

The presented solution implements an API with a single AWS Lambda function behind an Amazon API Gateway, and implements processing for two types of requests:

  1. GET request: fetch any key-value pairs stored in the tenant data store for the given tenant ID.
  2. POST request: store the provided key-value pairs in the tenant data store for the given tenant ID, overwriting any existing data for the same tenant ID.

The application is written in Python, it uses AWS Lambda Powertools for Python, and you deploy it by using the AWS CDK.

It also uses the DynamoDB Encryption Client for Python, which includes several helper classes that mirror the AWS SDK for Python (Boto3) classes for DynamoDB. This solution uses the EncryptedResource helper class which provides Boto3 compatible get_item and put_item methods. The helper class is used together with the KMS Materials Provider to handle encryption and decryption with AWS KMS transparently for the application.

Note: This example solution provides no authentication of the caller identity. See chapter “Considerations for authentication and authorization” for further guidance.

How it works

Figure 2: Detailed architecture for storing new or updated tenant data

Figure 2: Detailed architecture for storing new or updated tenant data

As requests are made into the application’s API, they are routed by API Gateway to the application’s Lambda function (1). The Lambda function begins to run with the IAM permissions that its IAM execution role (DefaultExecutionRole) has been granted. These permissions do not grant any access to the DynamoDB table or the KMS key. In order to access these resources, the Lambda function first needs to assume the ResourceAccessRole, which does have the necessary permissions. To implement ABAC more securely in this use case, it is important that the application maintains clear separation of IAM permissions between the assumed ResourceAccessRole and the DefaultExecutionRole.

As the application assumes the ResourceAccessRole using the AssumeRole API call (2), it also sets a TenantID session tag. Session tags are key-value pairs that can be passed when you assume an IAM role in AWS Simple Token Service (AWS STS), and are a fundamental core building block of ABAC on AWS. When the session credentials (3) are used to make a subsequent request, the request context includes the aws:PrincipalTag context key, which can be used to access the session’s tags. The chapter “The ResourceAccessRole policy” describes how the aws:PrincipalTag context key is used in IAM policy condition statements to implement ABAC for this solution. Note that for demonstration purposes, this solution receives the value for the TenantID tag directly from the request URL, and it is not authenticated.

The trust policy of the ResourceAccessRole defines the principals that are allowed to assume the role, and to tag the assumed role session. Make sure to limit the principals to the least needed for your application to function. In this solution, the application Lambda function is the only trusted principal defined in the trust policy.

Next, the Lambda function prepares to encrypt or decrypt the data (4). To do so, it uses the DynamoDB Encryption Client. The KMS Materials Provider and the EncryptedResource helper class are both initialized with sessions by using the temporary credentials from the AssumeRole API call. This allows the Lambda function to access the KMS key and DynamoDB table resources, with access restricted to operations on data belonging only to the specific tenant ID.

Finally, using the EncryptedResource helper class provided by the DynamoDB Encryption Library, the data is written to and read from the DynamoDB table (5).

Considerations for authentication and authorization

The solution in this blog post intentionally does not implement authentication or authorization of the client requests. Instead, the requested tenant ID from the request URL is passed as the tenant identity. Your own applications should always authenticate and authorize tenant requests. There are multiple ways you can achieve this.

Modern web applications commonly use OpenID Connect (OIDC) for authentication, and OAuth for authorization. JSON Web Tokens (JWTs) can be used to pass the resulting authorization data from client to the application. You can validate a JWT when using AWS API Gateway with one of the following methods:

  1. When using a REST or a HTTP API, you can use a Lambda authorizer
  2. When using a HTTP API, you can use a JWT authorizer
  3. You can validate the token directly in your application code

If you write your own authorizer code, you can pick a popular open source library or you can choose the AWS provided open source library. To learn more about using a JWT authorizer, see the blog post How to secure API Gateway HTTP endpoints with JWT authorizer.

Regardless of the chosen method, you must be able to map a suitable claim from the user’s JWT, such as the subject, to the tenant ID, so that it can be used as the session tag in this solution.

The ResourceAccessRole policy

A critical part of the correct operation of ABAC in this solution is with the definition of the IAM access policy for the ResourceAccessRole. In the following policy, be sure to replace <region>, <account-id>, <table-name>, and <key-id> with your own values.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:DescribeTable",
                "dynamodb:GetItem",
                "dynamodb:PutItem"
            ],
            "Resource": [
                "arn:aws:dynamodb:<region>:<account-id>:table/<table-name>",
           ],
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": [
                        "${aws:PrincipalTag/TenantID}"
                    ]
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt",
                "kms:GenerateDataKey",
            ],
            "Resource": "arn:aws:kms:<region>:<account-id>:key/<key-id>",
            "Condition": {
                "StringEquals": {
                    "kms:EncryptionContext:tenant_id": "${aws:PrincipalTag/TenantID}"
                }
            }
        }
    ]
}

The policy defines two access statements, both of which apply separate ABAC conditions:

  1. The first statement grants access to the DynamoDB table with the condition that the partition key of the item matches the TenantID session tag in the caller’s session.
  2. The second statement grants access to the KMS key with the condition that one of the key-value pairs in the encryption context of the API call has a key called tenant_id with a value that matches the TenantID session tag in the caller’s session.

Warning: Do not use a ForAnyValue or ForAllValues set operator with the kms:EncryptionContext single-valued condition key. These set operators can create a policy condition that does not require values you intend to require, and allows values you intend to forbid.

Deploying and testing the solution

Prerequisites

To deploy and test the solution, you need the following:

Deploying the solution

After you have the prerequisites installed, run the following steps in a command line environment to deploy the solution. Make sure that your AWS CLI is configured with your AWS account credentials. Note that standard AWS service charges apply to this solution. For more information about pricing, see the AWS Pricing page.

To deploy the solution into your AWS account

  1. Use the following command to download the source code:
    git clone https://github.com/aws-samples/aws-dynamodb-encrypt-with-abac
    cd aws-dynamodb-encrypt-with-abac

  2. (Optional) You will need an AWS CDK version compatible with the application (2.37.0) to deploy. The simplest way is to install a local copy with npm, but you can also use a globally installed version if you already have one. To install locally, use the following command to use npm to install the AWS CDK:
    npm install [email protected]

  3. Use the following commands to initialize a Python virtual environment:
    python3 -m venv demoenv
    source demoenv/bin/activate
    python3 -m pip install -r requirements.txt

  4. (Optional) If you have not used AWS CDK with this account and Region before, you first need to bootstrap the environment:
    npx cdk bootstrap

  5. Use the following command to deploy the application with the AWS CDK:
    npx cdk deploy

  6. Make note of the API endpoint URL https://<api url>/prod/ in the Outputs section of the CDK command. You will need this URL for the next steps.
    Outputs:
    DemoappStack.ApiEndpoint4F160690 = https://<api url>/prod/

Testing the solution with example API calls

With the application deployed, you can test the solution by making API calls against the API URL that you captured from the deployment output. You can start with a simple HTTP POST request to insert data for a tenant. The API expects a JSON string as the data to store, so make sure to post properly formatted JSON in the body of the request.

An example request using curl -command looks like:

curl https://<api url>/prod/tenant/<tenant-name> -X POST --data '{"email":"<[email protected]>"}'

You can then read the same data back with an HTTP GET request:

curl https://<api url>/prod/tenant/<tenant-name>

You can store and retrieve data for any number of tenants, and can store as many attributes as you like. Each time you store data for a tenant, any previously stored data is overwritten.

Additional considerations

A tenant ID is used as the DynamoDB table’s partition key in the example application in this solution. You can replace the tenant ID with another unique partition key, such as a product ID, as long as the ID is consistently used in the IAM access policy, the IAM session tag, and the KMS encryption context. In addition, while this solution does not use a sort key in the table, you can modify the application to support a sort key with only a few changes. For more information, see Working with tables and data in DynamoDB.

Clean up

To clean up the application resources that you deployed while testing the solution, in the solution’s home directory, run the command cdk destroy.

Then, if you no longer plan to deploy to this account and Region using AWS CDK, you can also use the AWS CloudFormation console to delete the bootstrap stack (CDKToolKit).

Conclusion

In this post, you learned a method for simple and cost-efficient client-side encryption for your tenant data. By using the DynamoDB Encryption Client, you were able to implement the encryption with less effort, all while using a standard Boto3 DynamoDB Table resource compatible interface.

Adding to the client-side encryption, you also learned how to apply attribute-based access control (ABAC) to your IAM access policies. You used ABAC for tenant isolation by applying conditions for both the DynamoDB table access, as well as access to the KMS key that is used for encryption of the tenant data in the DynamoDB table. By combining client-side encryption with ABAC, you have increased your data protection with multiple layers of security.

You can start experimenting today on your own by using the provided solution. If you have feedback about this post, submit comments in the Comments section below. If you have questions on the content, consider submitting them to AWS re:Post

Want more AWS Security news? Follow us on Twitter.

Jani Muuriaisniemi

Jani is a Principal Solutions Architect at Amazon Web Services based out of Helsinki, Finland. With more than 20 years of industry experience, he works as a trusted advisor with a broad range of customers across different industries and segments, helping the customers on their cloud journey.

The easiest way to build a modern SaaS application

Post Syndicated from Tanushree Sharma original https://blog.cloudflare.com/workers-for-platforms-ga/

The easiest way to build a modern SaaS application

The easiest way to build a modern SaaS application

The Software as a Service (SaaS) model has changed the way we work – 80% of businesses use at least one SaaS application. Instead of investing in building proprietary software or installing and maintaining on-prem licensed software, SaaS vendors provide businesses with out-of-the-box solutions.

SaaS has many benefits over the traditional software model: cost savings, continuous updates and scalability, to name a few. However, any managed solution comes with trade-offs. As a business, one of the biggest challenges in adopting SaaS tooling is loss of customization. Not every business uses software in the same way and as you grow as a SaaS company it’s not long until you get customers saying “if only I could do X”.

Enter Workers for Platforms – Cloudflare’s serverless functions offering for SaaS businesses. With Workers for Platforms, your customers can build custom logic to meet their requirements right into your application.

We’re excited to announce that Workers for Platforms is now in GA for all Enterprise customers! If you’re an existing customer, reach out to your Customer Success Manager (CSM) to get access. For new customers, fill out our contact form to get started.

The conundrum of customization

As a SaaS business invested in capturing the widest market, you want to build a universal solution that can be used by customers of different sizes, in various industries and regions. However, every one of your customers has a unique set of tools and vendors and best practices. A generalized platform doesn’t always meet their needs.

For SaaS companies, once you get in the business of creating customizations yourself, it can be hard to keep up with seemingly never ending requests. You want your engineering teams to focus on building out your core business instead of building and maintaining custom solutions for each of your customer’s use cases.

With Workers for Platforms, you can give your customers the ability to write code that customizes your software. This gives your customers the flexibility to meet their exact use case while also freeing up internal engineering time  – it’s a win-win!

How is this different from Workers?

Workers is Cloudflare’s serverless execution environment that runs your code on Cloudflare’s global network. Workers is lightning fast and scalable; running at data centers in 275+ cities globally and serving requests from as close as possible to the end user. Workers for Platforms extends the power of Workers to our customer’s developers!

So, what’s new?

Dispatch Worker: As a platform customer, you want to have full control over how end developer code fits in with your APIs. A Dispatch Worker is written by our platform customers to run their own logic before dispatching (aka routing) to Workers written by end developers. In addition to routing, it can be used to run authentication, create boilerplate functions and sanitize responses.

User Workers: User Workers are written by end developers, that is, our customers’ developers. End developers can deploy User Workers to script automated actions, create integrations or modify response payload to return custom content. Unlike self-managed Function-as-a-Service (FaaS) options, with Workers for Platforms, end developers don’t need to worry about setting up and maintaining their code on any 3rd party platform. All they need to do is upload their code and you – or rather Cloudflare – takes care of the rest.

Unlimited Scripts: Yes, you read that correctly! With hundreds-plus end developers, the existing 100 script limit for Workers won’t cut it for Workers for Platforms customers. Some of our Workers for Platforms customers even deploy a new script each time their end developers make a change to their code in order to maintain version control and to easily revert to a previous state if a bug is deployed.

Dynamic Dispatch Namespaces: If you’ve used Workers before, you may be familiar with a feature we released earlier this year, Service Bindings. Service Bindings are a way for two Workers to communicate with each other. They allow developers to break up their applications into modules that can be chained together. Service Bindings explicitly link two Workers together, and they’re meant for use cases where you know exactly which Workers need to communicate with each other.

Service Bindings don’t work in the Workers for Platforms model because User Workers are uploaded ad hoc. Dynamic Dispatch Namespaces is our solution to this! A Dispatch Namespace is composed of a collection of User Workers. With Dispatch Namespaces, a Dispatch Worker can be used to call any User Worker in a namespace (similar to how Service Bindings work) but without needing to explicitly pre-define the relationship.

Read more about how to use these features below!

How to use Workers for Platforms

The easiest way to build a modern SaaS application

Dispatch Workers

Dispatch Workers are the entry point for requests to Workers in a Dispatch Namespace. The Dispatch Worker can be used to run any functions ahead of User Workers. They can make a request to any User Workers in the Dispatch Namespace, and they ultimately handle the routing to User Workers.

Dispatch Workers are created the same way as a regular Worker, except they need a Dispatch Namespace binding in the project’s wrangler.toml configuration file.

[[dispatch_namespaces]]
binding = "dispatcher"
namespace = "api-prod"

In the example below, this Dispatch Worker reads the subdomain from the path and calls the appropriate User Worker. Alternatively you can use KV, D1 or your data store of choice to map identifying parameters from an incoming request to a User Worker.

export default {
 async fetch(request, env) {
   try {
       // parse the URL, read the subdomain
       let worker_name = new URL(request.url).host.split('.')[0]
       let user_worker = env.dispatcher.get(worker_name)
       return user_worker.fetch(request)
   } catch (e) {
       if (e.message == 'Error: Worker not found.') {
           // we tried to get a worker that doesn't exist in our dispatch namespace
           return new Response('', {status: 404})
       }
       // this could be any other exception from `fetch()` *or* an exception
       // thrown by the called worker (e.g. if the dispatched worker has
       // `throw MyException()`, you could check for that here).
       return new Response(e.message, {status: 500})
   }
 }

}

Uploading User Workers

User Workers must be uploaded to a Dispatch Namespace through the Cloudflare API (wrangler support coming soon!). This code snippet below uses a simple HTML form to take in a script and customer id and then uploads it to the Dispatch Namespace.

export default {
 async fetch(request: Request) {
   try {
     // on form submit
     if (request.method === "POST"){
       const str = JSON.stringify(await request.json())
       const upload_obj = JSON.parse(str)
       await upload(upload_obj.customerID, upload_obj.script)
   }
   //render form
     return new Response (html, {
       headers: {
         "Content-Type": "text/html"
       }
     })
   } catch (e) {
       // form error
       return new Response(e.message, {status: 500})
   }
 }
}

async function upload(customerID:string, script:string){
 const scriptName = customerID;
 const scriptContent = script;
 const accountId = "<ACCOUNT_ID>";
 const dispatchNamespace = "api-prod";
 const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/workers/dispatch/namespaces/${dispatchNamespace}/scripts/${scriptName}`;
 // construct and send request
 const response = await fetch(url, {
   method: "PUT",
   body: scriptContent,
   headers: {
     "Content-Type": "application/javascript",
     "X-Auth-Email": "<EMAIL>",
     "X-Auth-Key": "<API_KEY>"
   }
 });

 const result = (await response.json());
 if (response.status != 200) {
   throw new Error(`Upload error`);
 }
}

It’s that simple. With Dispatch Namespaces and Dispatch Workers, we’re giving you the building blocks to customize your SaaS applications. Along with the Platforms APIs, we’re also releasing a Workers for Platforms UI on the Cloudflare dashboard where you can view your Dispatch Namespaces, search scripts and view analytics for User Workers.

The easiest way to build a modern SaaS application

To view an end to end example, check out our Workers for Platforms example application.

Get started today!

We’re releasing Workers for Platforms to all Cloudflare Enterprise customers. If you’re interested, reach out to your Customer Success Manager (CSM) to get access. To get started, take a look at our Workers for Platforms starter project and developer documentation.

We also have plans to release this down to the Workers Paid plan. Stay tuned on the Cloudflare Discord (channel name: workers-for-platforms-beta) for updates.

What’s next?

We’ve heard lots of great feature requests from our early Workers for Platforms customers. Here’s a preview of what’s coming next on the roadmap:

  • Fine-grained controls over User Workers: custom script limits, allowlist/blocklist for fetch requests
  • GraphQL and Logs: Metrics for User Workers by tag
  • Plug and play Platform Development Kit
  • Tighter integration with Cloudflare for SaaS custom domains

If you have specific feature requests in mind, please reach out to your CSM or get in touch through the Discord!

Introducing new Cloudflare for SaaS documentation

Post Syndicated from Mia Malden original https://blog.cloudflare.com/introducing-new-cloudflare-for-saas-documentation/

Introducing new Cloudflare for SaaS documentation

Introducing new Cloudflare for SaaS documentation

As a SaaS provider, you’re juggling many challenges while building your application, whether it’s custom domain support, protection from attacks, or maintaining an origin server. In 2021, we were proud to announce Cloudflare for SaaS for Everyone, which allows anyone to use Cloudflare to cover those challenges, so they can focus on other aspects of their business. This product has a variety of potential implementations; now, we are excited to announce a new section in our Developer Docs specifically devoted to Cloudflare for SaaS documentation to allow you take full advantage of its product suite.

Cloudflare for SaaS solution

You may remember, from our October 2021 blog post, all the ways that Cloudflare provides solutions for SaaS providers:

  • Set up an origin server
  • Encrypt your customers’ traffic
  • Keep your customers online
  • Boost the performance of global customers
  • Support custom domains
  • Protect against attacks and bots
  • Scale for growth
  • Provide insights and analytics
Introducing new Cloudflare for SaaS documentation

However, we received feedback from customers indicating confusion around actually using the capabilities of Cloudflare for SaaS because there are so many features! With the existing documentation, it wasn’t 100% clear how to enhance security and performance, or how to support custom domains. Now, we want to show customers how to use Cloudflare for SaaS to its full potential by including more product integrations in the docs, as opposed to only focusing on the SSL/TLS piece.

Bridging the gap

Cloudflare for SaaS can be overwhelming with so many possible add-ons and configurations. That’s why the new docs are organized into six main categories, housing a number of new, detailed guides (for example, WAF for SaaS and Regional Services for SaaS):

Introducing new Cloudflare for SaaS documentation

Once you get your SaaS application up and running with the Get Started page, you can find which configurations are best suited to your needs based on your priorities as a provider. Even if you aren’t sure what your goals are, this setup outlines the possibilities much more clearly through a number of new documents and product guides such as:

Instead of pondering over vague subsection titles, you can peruse with purpose in mind. The advantages and possibilities of Cloudflare for SaaS are highlighted instead of hidden.

Possible configurations

This setup facilitates configurations much more easily to meet your goals as a SaaS provider.

For example, consider performance. Previously, there was no documentation surrounding reduced latency for SaaS providers. Now, the Performance section explains the automatic benefits to your performance by onboarding with Cloudflare for SaaS. Additionally, it offers three options of how to reduce latency even further through brand-new docs:

Similarly, the new organization offers WAF for SaaS as a previously hidden security solution, extending providers the ability to enable automatic protection from vulnerabilities and the flexibility to create custom rules. This is conveniently accompanied by a step-by-step tutorial using Cloudflare Managed Rulesets.

What’s next

While this transition represents an improvement in the Cloudflare for SaaS docs, we’re going to expand its accessibility even more. Some tutorials, such as our Managed Ruleset Tutorial, are already live within the tile. However, more step-by-step guides for Cloudflare for SaaS products and add-ons will further enable our customers to take full advantage of the available product suite. In particular, keep an eye out for expanding documentation around using Workers for Platforms.

Check it out

Visit the new Cloudflare for SaaS tile to see the updates. If you are a SaaS provider interested in extending Cloudflare benefits to your customers through Cloudflare for SaaS, visit our Cloudflare for SaaS overview and our Plans page.

Throttling a tiered, multi-tenant REST API at scale using API Gateway: Part 2

Post Syndicated from Nick Choi original https://aws.amazon.com/blogs/architecture/throttling-a-tiered-multi-tenant-rest-api-at-scale-using-api-gateway-part-2/

In Part 1 of this blog series, we demonstrated why tiering and throttling become necessary at scale for multi-tenant REST APIs, and explored tiering strategy and throttling with Amazon API Gateway.

In this post, Part 2, we will examine tenant isolation strategies at scale with API Gateway and extend the sample code from Part 1.

Enhancing the sample code

To enable this functionality in the sample code (Figure 1), we will make manual changes. First, create one API key for the Free Tier and five API keys for the Basic Tier. Currently, these API keys are private keys for your Amazon Cognito login, but we will make a further change in the backend business logic that will promote them to pooled resources. Note that all of these modifications are specific to this sample code’s implementation; the implementation and deployment of a production code may be completely different (Figure 1).

Cloud architecture of the sample code

Figure 1. Cloud architecture of the sample code

Next, in the business logic for thecreateKey(), find the AWS Lambda function in lambda/create_key.js.  It appears like this:

function createKey(tableName, key, plansTable, jwt, rand, callback) {
  const pool = getPoolForPlanId( key.planId ) 
  if (!pool) {
    createSiloedKey(tableName, key, plansTable, jwt, rand, callback);
  } else {
    createPooledKey(pool, tableName, key, jwt, callback);
  }
}

The getPoolForPlanId() function does a search for a pool of keys associated with the usage plan. If there is a pool, we “create” a kind of reference to the pooled resource, rather than a completely new key that is created by the API Gateway service directly. The lambda/api_key_pools.js should be empty.

exports.apiKeyPools = [];

In effect, all usage plans were considered as siloed keys up to now. To change that, populate the data structure with values from the six API keys that were created manually. You will have to look up the IDs of the API keys and usage plans that were created in API Gateway (Figures 2 and 3). Using the AWS console to navigate to API Gateway is the most intuitive.

A view of the AWS console when inspecting the ID for the Basic usage plan

Figure 2. A view of the AWS console when inspecting the ID for the Basic usage plan

A view of the AWS Console when looking up the API key value (not the ID)

Figure 3. A view of the AWS Console when looking up the API key value (not the ID)

When done, your code in lambda/api_key_pools.js should be the following, but instead of ellipses (), the IDs for the user plans and API keys specific to your environment will appear.

exports.apiKeyPools = [{
    planName: "FreePlan"
    planId: "...",
    apiKeys: [ "..." ]
  },
 {
    planName: "BasicPlan"
    planId: "...",
    apiKeys: [ "...", "...", "...", "...", "..." ]
  }];

After making the code changes, run cdk deploy from the command line to update the Lambda functions. This change will only affect key creation and deletion because of the system implementation. Updates affect only the user’s specific reference to the key, not the underlying resource managed by API Gateway.

When the web application is run now, it will look similar to before—tenants should not be aware what tiering strategy they have been assigned to. The only way to notice the difference would be to create two Free Tier keys, test them, and note that the value of the X-API-KEY header is unchanged between the two.

Now, you have a virtually unlimited number of users who can have API keys in the Free or Basic Tier. By keeping the Premium Tier siloed, you are subject to the 10,000-API-key maximum (less any keys allocated for the lower tiers). You may consider additional techniques to continue to scale, such as replicating your service in another AWS account.

Other production considerations

The sample code is minimal, and it illustrates just one aspect of scaling a Software-as-a-service (SaaS) application. There are many other aspects be considered in a production setting that we explore in this section.

The throttled endpoint, GET /api rely only on API key for authorization for demonstration purpose. For any production implementation consider authentication options for your REST APIs. You may explore and extend to require authentication with Cognito similar to /admin/* endpoints in the sample code.

One API key for Free Tier access and five API keys for Basic Tier access are illustrative in a sample code but not representative of production deployments. Number of API keys with service quota into consideration, business and technical decisions may be made to minimize noisy neighbor effect such as setting blast radius upper threshold of 0.1% of all users. To satisfy that requirement, each tier would need to spread users across at least 1,000 API keys. The number of keys allocated to Basic or Premium Tier would depend on market needs and pricing strategies. Additional allocations of keys could be held in reserve for troubleshooting, QA, tenant migrations, and key retirement.

In the planning phase of your solution, you will decide how many tiers to provide, how many usage plans are needed, and what throttle limits and quotas to apply. These decisions depend on your architecture and business.

To define API request limits, examine the system API Gateway is protecting and what load it can sustain. For example, if your service will scale up to 1,000 requests per second, it is possible to implement three tiers with a 10/50/40 split: the lowest tier shares one common API key with a 100 request per second limit; an intermediate tier has a pool of 25 API keys with a limit of 20 requests per second each; and the highest tier has a maximum of 10 API keys, each supporting 40 requests per second.

Metrics play a large role in continuously evolving your SaaS-tiering strategy (Figure 4). They provide rich insights into how tenants are using the system. Tenant-aware and SaaS-wide metrics on throttling and quota limits can be used to: assess tiering in-place, if tenants’ requirements are being met, and if currently used tenant usage profiles are valid (Figure 5).

Tiering strategy example with 3 tiers and requests allocation per tier

Figure 4. Tiering strategy example with 3 tiers and requests allocation per tier

An example SaaS metrics dashboard

Figure 5. An example SaaS metrics dashboard

API Gateway provides options for different levels of granularity required, including detailed metrics, and execution and access logging to enable observability of your SaaS solution. Granular usage metrics combined with underlying resource consumption leads to managing optimal experience for your tenants with throttling levels and policies per method and per client.

Cleanup

To avoid incurring future charges, delete the resources. This can be done on the command line by typing:

cd ${TOP}/cdk
cdk destroy

cd ${TOP}/react
amplify delete

${TOP} is the topmost directory of the sample code. For the most up-to-date information, see the README.md file.

Conclusion

In this two-part blog series, we have reviewed the best practices and challenges of effectively guarding a tiered multi-tenant REST API hosted in AWS API Gateway. We also explored how throttling policy and quota management can help you continuously evaluate the needs of your tenants and evolve your tiering strategy to protect your backend systems from being overwhelmed by inbound traffic.

Further reading:

Throttling a tiered, multi-tenant REST API at scale using API Gateway: Part 1

Post Syndicated from Nick Choi original https://aws.amazon.com/blogs/architecture/throttling-a-tiered-multi-tenant-rest-api-at-scale-using-api-gateway-part-1/

Many software-as-a-service (SaaS) providers adopt throttling as a common technique to protect a distributed system from spikes of inbound traffic that might compromise reliability, reduce throughput, or increase operational cost. Multi-tenant SaaS systems have an additional concern of fairness; excessive traffic from one tenant needs to be selectively throttled without impacting the experience of other tenants. This is also known as “the noisy neighbor” problem. AWS itself enforces some combination of throttling and quota limits on nearly all its own service APIs. SaaS providers building on AWS should design and implement throttling strategies in all of their APIs as well.

In this two-part blog series, we will explore tiering and throttling strategies for multi-tenant REST APIs and review tenant isolation models with hands-on sample code. In part 1, we will look at why a tiering and throttling strategy is needed and show how Amazon API Gateway can help by showing sample code. In part 2, we will dive deeper into tenant isolation models as well as considerations for production.

We selected Amazon API Gateway for this architecture since it is a fully managed service that helps developers to create, publish, maintain, monitor, and secure APIs. First, let’s focus on how Amazon API Gateway can be used to throttle REST APIs with fine granularity using Usage Plans and API Keys. Usage Plans define the thresholds beyond which throttling should occur. They also enable quotas, which sets a maximum usage per a day, week, or month. API Keys are identifiers for distinguishing traffic and determining which Usage Plans to apply for each request. We limit the scope of our discussion to REST APIs because other protocols that API Gateway supports — WebSocket APIs and HTTP APIs — have different throttling mechanisms that do not employ Usage Plans or API Keys.

SaaS providers must balance minimizing cost to serve and providing consistent quality of service for all tenants. They also need to ensure one tenant’s activity does not affect the other tenants’ experience. Throttling and quotas are a key aspect of a tiering strategy and important for protecting your service at any scale. In practice, this impact of throttling polices and quota management is continuously monitored and evaluated as the tenant composition and behavior evolve over time.

Architecture Overview

Figure 1. Cloud Architecture of the sample code.

Figure 1 – Architecture of the sample code

To get a firm foundation of the basics of throttling and quotas with API Gateway, we’ve provided sample code in AWS-Samples on GitHub. Not only does it provide a starting point to experiment with Usage Plans and API Keys in the API Gateway, but we will modify this code later to address complexity that happens at scale. The sample code has two main parts: 1) a web frontend and, 2) a serverless backend. The backend is a serverless architecture using Amazon API Gateway, AWS Lambda, Amazon DynamoDB, and Amazon Cognito. As Figure I illustrates, it implements one REST API endpoint, GET /api, that is protected with throttling and quotas. There are additional APIs under the /admin/* resource to provide Read access to Usage Plans, and CRUD operations on API Keys.

All these REST endpoints could be tested with developer tools such as curl or Postman, but we’ve also provided a web application, to help you get started. The web application illustrates how tenants might interact with the SaaS application to browse different tiers of service, purchase API Keys, and test them. The web application is implemented in React and uses AWS Amplify CLI and SDKs.

Prerequisites

To deploy the sample code, you should have the following prerequisites:

For clarity, we’ll use the environment variable, ${TOP}, to indicate the top-most directory in the cloned source code or the top directory in the project when browsing through GitHub.

Detailed instructions on how to install the code are in ${TOP}/INSTALL.md file in the code. After installation, follow the ${TOP}/WALKTHROUGH.md for step-by-step instructions to create a test key with a very small quota limit of 10 requests per day, and use the client to hit that limit. Search for HTTP 429: Too Many Requests as the signal your client has been throttled.

Figure 2: The web application (with browser developer tools enabled) shows that a quick succession of API calls starts returning an HTTP 429 after the quota for the day is exceeded.

Figure 2: The web application (with browser developer tools enabled) shows that a quick succession of API calls starts returning an HTTP 429 after the quota for the day is exceeded.

Responsibilities of the Client to support Throttling

The Client must provide an API Key in the header of the HTTP request, labelled, “X-Api-Key:”. If a resource in API Gateway has throttling enabled and that header is missing or invalid in the request, then API Gateway will reject the request.

Important: API Keys are simple identifiers, not authorization tokens or cryptographic keys. API keys are for throttling and managing quotas for tenants only and not suitable as a security mechanism. There are many ways to properly control access to a REST API in API Gateway, and we refer you to the AWS documentation for more details as that topic is beyond the scope of this post.

Clients should always test for the response to any network call, and implement logic specific to an HTTP 429 response. The correct action is almost always “try again later.” Just how much later, and how many times before giving up, is application dependent. Common approaches include:

  • Retry – With simple retry, client retries the request up to defined maximum retry limit configured
  • Exponential backoff – Exponential backoff uses progressively larger wait time between retries for consecutive errors. As the wait time can become very long quickly, maximum delay and a maximum retry limits should be specified.
  • Jitter – Jitter uses a random amount of delay between retry to prevent large bursts by spreading the request rate.

AWS SDK is an example client-responsibility implementation. Each AWS SDK implements automatic retry logic that uses a combination of retry, exponential backoff, jitter, and maximum retry limit.

SaaS Considerations: Tenant Isolation Strategies at Scale

While the sample code is a good start, the design has an implicit assumption that API Gateway will support as many API Keys as we have number of tenants. In fact, API Gateway has a quota on available per region per account. If the sample code’s requirements are to support more than 10,000 tenants (or if tenants are allowed multiple keys), then the sample implementation is not going to scale, and we need to consider more scalable implementation strategies.

This is one instance of a general challenge with SaaS called “tenant isolation strategies.” We highly recommend reviewing this white paper ‘SasS Tenant Isolation Strategies‘. A brief explanation here is that the one-resource-per-customer (or “siloed”) model is just one of many possible strategies to address tenant isolation. While the siloed model may be the easiest to implement and offers strong isolation, it offers no economy of scale, has high management complexity, and will quickly run into limits set by the underlying AWS Services. Other models besides siloed include pooling, and bridged models. Again, we recommend the whitepaper for more details.

Figure 3. Tiered multi-tenant architectures often employ different tenant isolation strategies at different tiers. Our example is specific to API Keys, but the technique generalizes to storage, compute, and other resources.

Figure 3- Tiered multi-tenant architectures often employ different tenant isolation strategies at different tiers. Our example is specific to API Keys, but the technique generalizes to storage, compute, and other resources.

In this example, we implement a range of tenant isolation strategies at different tiers of service. This allows us to protect against “noisy-neighbors” at the highest tier, minimize outlay of limited resources (namely, API-Keys) at the lowest tier, and still provide an effective, bounded “blast radius” of noisy neighbors at the mid-tier.

A concrete development example helps illustrate how this can be implemented. Assume three tiers of service: Free, Basic, and Premium. One could create a single API Key that is a pooled resource among all tenants in the Free Tier. At the other extreme, each Premium customer would get their own unique API Key. They would protect Premium tier tenants from the ‘noisy neighbor’ effect. In the middle, the Basic tenants would be evenly distributed across a set of fixed keys. This is not complete isolation for each tenant, but the impact of any one tenant is contained within “blast radius” defined.

In production, we recommend a more nuanced approach with additional considerations for monitoring and automation to continuously evaluate tiering strategy. We will revisit these topics in greater detail after considering the sample code.

Conclusion

In this post, we have reviewed how to effectively guard a tiered multi-tenant REST API hosted in Amazon API Gateway. We also explored how tiering and throttling strategies can influence tenant isolation models. In Part 2 of this blog series, we will dive deeper into tenant isolation models and gaining insights with metrics.

If you’d like to know more about the topic, the AWS Well-Architected SaaS Lens Performance Efficiency pillar dives deep on tenant tiers and providing differentiated levels of performance to each tier. It also provides best practices and resources to help you design and reduce impact of noisy neighbors your SaaS solution.

To learn more about Serverless SaaS architectures in general, we recommend the AWS Serverless SaaS Workshop and the SaaS Factory Serverless SaaS reference solution that inspired it.

Zero Trust for SaaS: Deploying mTLS on custom hostnames

Post Syndicated from Dina Kozlov original https://blog.cloudflare.com/zero-trust-for-saas-deploying-mtls-on-custom-hostnames/

Zero Trust for SaaS: Deploying mTLS on custom hostnames

Cloudflare has a large base of Software-as-a-Service (SaaS) customers who manage thousands or millions of their customers’ domains that use their SaaS service. We have helped those SaaS providers grow by extending our infrastructure and services to their customer’s domains through a product called Cloudflare for SaaS. Today, we’re excited to give our SaaS providers a new tool that will help their customers add an extra layer of security: they can now enable mutual TLS authentication on their customer’s domains through our Access product.

Primer on Mutual TLS

When you connect to a website, you should see a lock icon in the address bar — that’s your browser telling you that you’re connecting to a website over a secure connection and that the website has a valid public TLS certificate. TLS certificates keep Internet traffic encrypted using a public/private key pair to encrypt and decrypt traffic. They also provide authentication, proving to clients that they are connecting to the correct server.

To make a secure connection, a TLS handshake needs to take place. During the handshake, the client and the server exchange cryptographic keys, the client authenticates the identity of the server, and both the client and the server generate session keys that are later used to encrypt traffic.

A TLS handshake looks like this:

Zero Trust for SaaS: Deploying mTLS on custom hostnames

In a TLS handshake, the client always validates the certificate that is served by the server to make sure that it’s sending requests to the right destination. In the same way that the client needs to authenticate the identity of the server, sometimes the server needs to authenticate the client — to ensure that only authorized clients are sending requests to the server.

Let’s say that you’re managing a few services: service A writes information to a database. This database is absolutely crucial and should only have entries submitted by service A. Now, what if you have a bug in your system and service B accidentally makes a write call to the database?

You need something that checks whether a service is authorized to make calls to your database — like a bouncer. A bouncer has a VIP list — they can check people’s IDs against the list to see whether they’re allowed to enter a venue. Servers can use a similar model, one that uses TLS certificates as a form of ID.

In the same way that a bouncer has a VIP list, a server can have a Certificate Authority (CA) Root from which they issue certificates. Certificates issued from the CA Root are then provisioned onto clients. These client certificates can then be used to identify and authorize the client. As long as a client presents a valid certificate — one that the server can validate against the Root CA, it’s allowed to make requests. If a client doesn’t present a client certificate (isn’t on the VIP list) or presents an unauthorized client certificate, then the server can choose to reject the request. This process of validating client and server certificates is called mutual TLS authentication (mTLS) and is done during the TLS handshake.

When mTLS isn’t used, only the server is responsible for presenting a certificate, which the client verifies. With mTLS, both the client and the server present and validate one another’s certificates, pictured below.

Zero Trust for SaaS: Deploying mTLS on custom hostnames

mTLS + Access = Zero Trust

A few years ago, we added mTLS support to our Access product, allowing customers to enable a Zero Trust policy on their applications. Access customers can deploy a policy that dictates that all clients must present a valid certificate when making a request. That means that requests made without a valid certificate — usually from unauthorized clients — will be blocked, adding an extra layer of protection. Cloudflare has allowed customers to configure mTLS on their Cloudflare domains by setting up Access policies. The only caveat was that to use this feature, you had to be the owner of the domain. Now, what if you’re not the owner of a domain, but you do manage that domain’s origin? This is the case for a large base of our customers, the SaaS providers that extend their services to their customers’ domains that they do not own.

Extending Cloudflare benefits through SaaS providers

Cloudflare for SaaS enables SaaS providers to extend the benefits of the Cloudflare network to their customers’ domains. These domains are not owned by the SaaS provider, but they do use the SaaS provider’s service, routing traffic back to the SaaS provider’s origin.

By doing this, SaaS providers take on the responsibility of providing their customers with the highest uptime, lightning fast performance, and unparalleled security — something they can easily extend to their customers through Cloudflare.

Cloudflare for SaaS actually started out as SSL for SaaS. We built SSL for SaaS to give SaaS providers the ability to issue TLS certificates for their customers, keeping the SaaS provider’s customers safe and secure.

Since then, our SaaS customers have come to us with a new request: extend the mTLS support that we built out for our direct customers, but to their customers.

Why would SaaS providers want to use mTLS?

As a SaaS provider, there’s a wide range of services that you can provide. Some of these services require higher security controls than others.

Let’s say that the SaaS solution that you’re building is a payment processor. Each customer gets its own API endpoint that their users send requests to, for example, pay.<business_name>.com. As a payment processor, you don’t want any client or device to make requests to your service, instead you only want authorized devices to do so — mTLS does exactly that.

As the SaaS provider, you can configure a Root CA for each of your customers’ API endpoints. Then, have each Root CA issue client certificates that will be installed on authorized devices. Once the client certificates have been installed, all that is left is enforcing a check for valid certificates.

To recap, by doing this, as a SaaS provider, your customers can now ensure that requests bound for their payment processing API endpoint only come from valid devices. In addition, by deploying individual Root CAs for each customer, you also prevent clients that are authorized to make requests to one customers’ API endpoint from making requests to another customers’ API endpoint when they are not authorized to do so.

How can you set this up with Cloudflare?

As a SaaS provider, configure Cloudflare for SaaS and add your customer’s domains as Custom Hostnames. Then, in the Cloudflare for Teams dashboard, add mTLS authentication with a few clicks.

This feature is currently in Beta and is available for Enterprise customers to use. If you have any feedback, please let your Account Team know.

Security for SaaS providers

Post Syndicated from Dina Kozlov original https://blog.cloudflare.com/waf-for-saas/

Security for SaaS providers

Security for SaaS providers

Some of the largest Software-as-a-Service (SaaS) providers use Cloudflare as the underlying infrastructure to provide their customers with fast loading times, unparalleled redundancy, and the strongest security — all through our Cloudflare for SaaS product. Today, we’re excited to give our SaaS providers new tools that will help them enhance the security of their customers’ applications.

For our Enterprise customers, we’re bringing WAF for SaaS — the ability for SaaS providers to easily create and deploy different sets of WAF rules for their customers. This gives SaaS providers the ability to segment customers into different groups based on their security requirements.

For developers who are getting their application off the ground, we’re thrilled to announce a Free tier of Cloudflare for SaaS for the Free, Pro, and Biz plans, giving our customers 100 custom hostnames free of charge to provision and test across their account. In addition to that, we want to make it easier for developers to scale their applications, so we’re happy to announce that we are lowering our custom hostname price from \$2 to \$0.10 a month.

But that’s not all! At Cloudflare, we believe security should be available for all. That’s why we’re extending a new selection of WAF rules to Free customers — giving all customers the ability to secure both their applications and their customers’.

Making SaaS infrastructure available to all

At Cloudflare, we take pride in our Free tier which gives any customer the ability to make use of our Network to stay secure and online. We are eager to extend the same support to customers looking to build a new SaaS offering, giving them a Free tier of Cloudflare for SaaS and allowing them to onboard 100 custom hostnames at no charge. The 100 custom hostnames will be automatically allocated to new and existing Cloudflare for SaaS customers. Beyond that, we are also dropping the custom hostname price from \$2 to \$0.10 a month, giving SaaS providers the power to onboard and scale their application. Existing Cloudflare for SaaS customers will see the updated custom hostname pricing reflected in their next billing cycle.

Cloudflare for SaaS started as a TLS certificate issuance product for SaaS providers. Now, we’re helping our customers go a step further in keeping their customers safe and secure.

Introducing WAF for SaaS

SaaS providers may have varying customer bases — from mom-and-pop shops to well established banks. No matter the customer, it’s important that as a SaaS provider you’re able to extend the best protection for your customers, regardless of their size.

At Cloudflare, we have spent years building out the best Web Application Firewall for our customers. From managed rules that offer advanced zero-day vulnerability protections to OWASP rules that block popular attack techniques, we have given our customers the best tools to keep themselves protected. Now, we want to hand off the tools to our SaaS providers who are responsible for keeping their customer base safe and secure.

One of the benefits of Cloudflare for SaaS is that SaaS providers can configure security rules and settings on their SaaS zone which their customers automatically inherit. But one size does not fit all, which is why we are excited to give Enterprise customers the power to create various sets of WAF rules that they can then extend as different security packages to their customers — giving end users differing levels of protection depending on their needs.

Getting Started

WAF for SaaS can be easily set up. We have an example below that shows how you can configure different buckets of WAF rules to your various customers.

There’s no limit to the number of rulesets that you can create, so feel free to create a handful of configurations for your customers, or deploy one ruleset per customer — whatever works for you!

End-to-end example

Step 1 – Define custom hostname

Cloudflare for SaaS customers define their customer’s domains by creating custom hostnames. Custom hostnames indicate which domains need to be routed to the SaaS provider’s origin. Custom hostnames can define specific domains, like example.com, or they can extend to wildcards like *.example.com which allows subdomains under example.com to get routed to the SaaS service. WAF for SaaS supports both types of custom hostnames, so that SaaS providers have flexibility in choosing the scope of their protection.

The first step is to create a custom hostname to define your customer’s domain. This can be done through the dashboard or the API.

curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone:id}/custom_hostnames" \
     -H "X-Auth-Email: {email}" -H "X-Auth-Key: {key}"\
     -H "Content-Type: application/json" \
     --data '{

"Hostname":{“example.com”},
"Ssl":{wildcard: true}
}'

Step 2 – Associate custom metadata to a custom hostname

Next, create an association between the custom hostnames — your customer’s domain — and the firewall ruleset that you’d like to attach to it.

This is done by associating a JSON blob to a custom hostname. Our product, Custom Metadata allows customers to easily do this via API.

In the example below, a JSON blob with two fields (“customer_id” and “security_level”) will be associated to each request for *.example.com and example.com.

There is no predetermined schema for custom metadata. Field names and structure are fully customisable based on our customer’s needs. In this example, we have chosen the tag “security_level” to which we expect to assign three values (low, medium or high). These will, in turn, trigger three different sets of rules.

curl -sXPATCH "https://api.cloudflare.com/client/v4/zones/{zone:id}/custom_hostnames/{custom_hostname:id}"\
    -H "X-Auth-Email: {email}" -H "X-Auth-Key: {key}"\
    -H "Content-Type: application/json"\
    -d '{
"Custom_metadata":{
"customer_id":"12345",
“security_level”: “low”
}
}'

Step 3 – Trigger security products based on tags

Finally, you can trigger a rule based on the custom hostname. The custom metadata field e.g. “security_level” is available in the Ruleset Engine where the WAF runs. In this example, “security_level” can be used to trigger different configurations of products such as WAF, Firewall Rules, Advanced Rate Limiting and Transform Rules.

Rules can be built through the dashboard or via the API, as shown below. Here, a rate limiting rule is triggered on traffic with “security_level” set to low.

curl -X PUT "https://api.cloudflare.com/client/v4/zones/{zone:id}/rulesets/phases/http_ratelimit/entrypoint" \
    -H "X-Auth-Email: {email}" -H "X-Auth-Key: {key}"\
    -H "Content-Type: application/json"\
    -d '{

"rules": [
              {
                "action": "block",
                "ratelimit": {
                  "characteristics": [
                    "cf.colo.id",
                    "ip.src"
                  ],
                  "period": 10,
                  "requests_per_period": 2,
                  "mitigation_timeout": 60
                },
                "expression": "lookup_json_string(cf.hostname.metadata, \"security_level\") eq \"low\" and http.request.uri contains \"login\""
              }
            ]
          }}'

If you’d like to learn more about our Advanced Rate Limiting rules, check out our documentation.

Security for SaaS providers

Conclusion

We’re excited to be the provider for our SaaS customers’ infrastructure needs. From custom domains to TLS certificates to Web Application Firewall, we’re here to help. Sign up for Cloudflare for SaaS today, or if you’re an Enterprise customer, reach out to your account team to get started with WAF for SaaS.

Security practices in AWS multi-tenant SaaS environments

Post Syndicated from Keith P original https://aws.amazon.com/blogs/security/security-practices-in-aws-multi-tenant-saas-environments/

Securing software-as-a-service (SaaS) applications is a top priority for all application architects and developers. Doing so in an environment shared by multiple tenants can be even more challenging. Identity frameworks and concepts can take time to understand, and forming tenant isolation in these environments requires deep understanding of different tools and services.

While security is a foundational element of any software application, specific considerations apply to SaaS applications. This post dives into the challenges, opportunities and best practices for securing multi-tenant SaaS environments on Amazon Web Services (AWS).

SaaS application security considerations

Single tenant applications are often deployed for a specific customer, and typically only deal with this single entity. While security is important in these environments, the threat profile does not include potential access by other customers. Multi-tenant SaaS applications have unique security considerations when compared to single tenant applications.

In particular, multi-tenant SaaS applications must pay special attention to identity and tenant isolation. These considerations are in addition to the security measures all applications must take. This blog post reviews concepts related to identity and tenant isolation, and how AWS can help SaaS providers build secure applications.

Identity

SaaS applications are accessed by individual principals (often referred to as users). These principals may be interactive (for example, through a web application) or machine-based (for example, through an API). Each principal is uniquely identified, and is usually associated with information about the principal, including email address, name, role and other metadata.

In addition to the unique identification of each individual principal, a SaaS application has another construct: a tenant. A paper on multi-tenancy defines a tenant as a group of one or more users sharing the same view on an application they use. This view may differ for different tenants. Each individual principal is associated with a tenant, even if it is only a 1:1 mapping. A tenant is uniquely identified, and contains information about the tenant administrator, billing information and other metadata.

When a principal makes a request to a SaaS application, the principal provides their tenant and user identifier along with the request. The SaaS application validates this information and makes an authorization decision. In well-designed SaaS applications, this authorization step should not rely on a centralized authorization service. A centralized authorization service is a single point of failure in an application. If it fails, or is overwhelmed with requests, the application will no longer be able to process requests.

There are two key techniques to providing this type of experience in a SaaS application: using an identity provider (IdP) and representing identity or authorization in a token.

Using an Identity Provider (IdP)

In the past, some web applications often stored user information in a relational database table. When a principal authenticated successfully, the application issued a session ID. For subsequent requests, the principal passed the session ID to the application. The application made authorization decisions based on this session ID. Figure 1 provides an example of how this setup worked.

Figure 1 - An example of legacy application authentication.

Figure 1 – An example of legacy application authentication.

In applications larger than a simple web application, this pattern is suboptimal. Each request usually results in at least one database query or cache look up, creating a bottleneck on the data store holding the user or session information. Further, because of the tight coupling between the application and its user management, federation with external identity providers becomes difficult.

When designing your SaaS application, you should consider the use of an identity provider like Amazon Cognito, Auth0, or Okta. Using an identity provider offloads the heavy lifting required for managing identity by having user authentication, including federation, handled by external identity providers. Figure 2 provides an example of how a SaaS provider can use an identity provider in place of the self-managed solution shown in Figure 1.

Figure 2 – An example of an authentication flow that involves an identity provider.

Figure 2 – An example of an authentication flow that involves an identity provider.

Once a user authenticates with an identity provider, the identity provider issues a standardized token. This token is the same regardless of how a user authenticates, which means your application does not need to build in support for multiple different authentication methods tenants might use.

Identity providers also commonly support federated access. Federated access means that a third party maintains the identities, but the identity provider has a trust relationship with this third party. When a customer tries to log in with an identity managed by the third party, the SaaS application’s identity provider handles the authentication transaction with the third-party identity provider.

This authentication transaction commonly uses a protocol like Security Assertion Markup Language (SAML) 2.0. The SaaS application’s identity provider manages the interaction with the tenant’s identity provider. The SaaS application’s identity provider issues a token in a format understood by the SaaS application. Figure 3 provides an example of how a SaaS application can provide support for federation using an identity provider.

Figure 3 - An example of authentication that involves a tenant-provided identity provider

Figure 3 – An example of authentication that involves a tenant-provided identity provider

For an example, see How to set up Amazon Cognito for federated authentication using Azure AD.

Representing identity with tokens

Identity is usually represented by signed tokens. JSON Web Signatures (JWS), often referred to as JSON Web Tokens (JWT), are signed JSON objects used in web applications to demonstrate that the bearer is authorized to access a particular resource. These JSON objects are signed by the identity provider, and can be validated without querying a centralized database or service.

The token contains several key-value pairs, called claims, which are issued by the identity provider. Besides several claims relating to the issuance and expiration of the token, the token can also contain information about the individual principal and tenant.

Sample access token claims

The example below shows the claims section of a typical access token issued by Amazon Cognito in JWT format.

{
  "sub": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "cognito:groups": [
"TENANT-1"
  ],
  "token_use": "access",
  "auth_time": 1562190524,
  "iss": "https://cognito-idp.us-west-2.amazonaws.com/us-west-2_example",
  "exp": 1562194124,
  "iat": 1562190524,
  "origin_jti": "bbbbbbbbb-cccc-dddd-eeee-aaaaaaaaaaaa",
  "jti": "cccccccc-dddd-eeee-aaaa-bbbbbbbbbbbb",
  "client_id": "12345abcde",

The principal, and the tenant the principal is associated with, are represented in this token by the combination of the user identifier (the sub claim) and the tenant ID in the cognito:groups claim. In this example, the SaaS application represents a tenant by creating a Cognito group per tenant. Other identity providers may allow you to add a custom attribute to a user that is reflected in the access token.

When a SaaS application receives a JWT as part of a request, the application validates the token and unpacks its contents to make authorization decisions. The claims within the token set what is known as the tenant context. Much like the way environment variables can influence a command line application, the tenant context influences how the SaaS application processes the request.

By using a JWT, the SaaS application can process a request without frequent reference to an external identity provider or other centralized service.

Tenant isolation

Tenant isolation is foundational to every SaaS application. Each SaaS application must ensure that one tenant cannot access another tenant’s resources. The SaaS application must create boundaries that adequately isolate one tenant from another.

Determining what constitutes sufficient isolation depends on your domain, deployment model and any applicable compliance frameworks. The techniques for isolating tenants from each other depend on the isolation model and the applications you use. This section provides an overview of tenant isolation strategies.

Your deployment model influences isolation

How an application is deployed influences how tenants are isolated. SaaS applications can use three types of isolation: silo, pool, and bridge.

Silo deployment model

The silo deployment model involves customers deploying one set of infrastructure per tenant. Depending on the application, this may mean a VPC-per-tenant, a set of containers per tenant, or some other resource that is deployed for each tenant. In this model, there is one deployment per tenant, though there may be some shared infrastructure for cross-tenant administration. Figure 4 shows an example of a siloed deployment that uses a VPC-per-tenant model.

Figure 4 - An example of a siloed deployment that provisions a VPC-per-tenant

Figure 4 – An example of a siloed deployment that provisions a VPC-per-tenant

Pool deployment model

The pool deployment model involves a shared set of infrastructure for all tenants. Tenant isolation is implemented logically in the application through application-level constructs. Rather than having separate resources per tenant, isolation enforcement occurs within the application. Figure 5 shows an example of a pooled deployment model that uses serverless technologies.

Figure 5 - An example of a pooled deployment model using serverless technologies

Figure 5 – An example of a pooled deployment model using serverless technologies

In Figure 5, an AWS Lambda function that retrieves an item from an Amazon DynamoDB table shared by all tenants needs temporary credentials issued by the AWS Security Token Service. These credentials only allow the requester to access items in the table that belong to the tenant making the request. A requester gets these credentials by assuming an AWS Identity and Access Management (IAM) role. This allows a SaaS application to share the underlying infrastructure, while still isolating tenants from one another. See Isolation enforcement depends on service below for more details on this pattern.

Bridge deployment model

The bridge model combines elements of both the silo and pool models. Some resources may be separate, others may be shared. For example, suppose your application has a shared application layer and an Amazon Relational Database Service (RDS) instance per tenant. The application layer evaluates each request and connects to the database for the tenant that made the request.

This model is useful in a situation where each tenant may require a certain response time and one set of resources acts as a bottleneck. In the RDS example, the application layer could handle the requests imposed by the tenants, but a single RDS instance could not.

The decision on which isolation model to implement depends on your customer’s requirements, compliance needs or industry needs. You may find that some customers can be deployed onto a pool model, while larger customers may require their own silo deployment.

Your tiering strategy may also influence the type of isolation model you use. For example, a basic tier customer might be deployed onto pooled infrastructure, while an enterprise tier customer is deployed onto siloed infrastructure.

For more information about different tenant isolation models, read the tenant isolation strategies whitepaper.

Isolation enforcement depends on service

Most SaaS applications will need somewhere to store state information. This could be a relational database, a NoSQL database, or some other storage medium which persists state. SaaS applications built on AWS use various mechanisms to enforce tenant isolation when accessing a persistent storage medium.

IAM provides fine grain access controls access for the AWS API. Some services, like Amazon Simple Storage Service (Amazon S3) and DynamoDB, provide the ability to control access to individual objects or items with IAM policies. When possible, your application should use IAM’s built-in functionality to limit access to tenant resources. See Isolating SaaS Tenants with Dynamically Generated IAM Policies for more information about using IAM to implement tenant isolation.

AWS IAM also offers the ability to restrict access to resources based on tags. This is known as attribute-based access control (ABAC). This technique allows you to apply tags to supported resources, and make access control decisions based on which tags are applied. This is a more scalable access control mechanism than role-based access control (RBAC), because you do not need to modify an IAM policy each time a resource is added or removed. See How to implement SaaS tenant isolation with ABAC and AWS IAM for more information about how this can be applied to a SaaS application.

Some relational databases offer features that can enforce tenant isolation. For example, PostgreSQL offers a feature called row level security (RLS). Depending on the context in which the query is sent to the database, only tenant-specific items are returned in the results. See Multi-tenant data isolation with PostgreSQL Row Level Security for more information about row level security in PostgreSQL.

Other persistent storage mediums do not have fine grain permission models. They may, however, offer some kind of state container per tenant. For example, when using MongoDB, each tenant is assigned a MongoDB user and a MongoDB database. The secret associated with the user can be stored in AWS Secrets Manager. When retrieving a tenant’s data, the SaaS application first retrieves the secret, then authenticates with MongoDB. This creates tenant isolation because the associated credentials only have permission to access collections in a tenant-specific database.

Generally, if the persistent storage medium you’re using offers its own permission model that can enforce tenant isolation, you should use it, since this keeps you from having to implement isolation in your application. However, there may be cases where your data store does not offer this level of isolation. In this situation, you would need to write application-level tenant isolation enforcement. Application-level tenant isolation means that the SaaS application, rather than the persistent storage medium, makes sure that one tenant cannot access another tenant’s data.

Conclusion

This post reviews the challenges, opportunities and best practices for the unique security considerations associated with a multi-tenant SaaS application, and describes specific identity considerations, as well as tenant isolation methods.

If you’d like to know more about the topics above, the AWS Well-Architected SaaS Lens Security pillar dives deep on performance management in SaaS environments. It also provides best practices and resources to help you design and improve performance efficiency in your SaaS application.

Get Started with the AWS Well-Architected SaaS Lens

The AWS Well-Architected SaaS Lens focuses on SaaS workloads, and is intended to drive critical thinking for developing and operating SaaS workloads. Each question in the lens has a list of best practices, and each best practice has a list of improvement plans to help guide you in implementing them.

The lens can be applied to existing workloads, or used for new workloads you define in the tool. You can use it to improve the application you’re working on, or to get visibility into multiple workloads used by the department or area you’re working with.

The SaaS Lens is available in all Regions where the AWS Well-Architected Tool is offered, as described in the AWS Regional Services List. There are no costs for using the AWS Well-Architected Tool.

If you’re an AWS customer, find current AWS Partners that can conduct a review by learning about AWS Well-Architected Partners and AWS SaaS Competency Partners.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Security Hub forum. To start your 30-day free trial of Security Hub, visit AWS Security Hub.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Keith P

Keith is a senior partner solutions architect on the SaaS Factory team.

Andy Powell

Andy is the global lead partner for solutions architecture on the SaaS Factory team.

What’s new with Cloudflare for SaaS?

Post Syndicated from Dina Kozlov original https://blog.cloudflare.com/whats-new-with-cloudflare-for-saas/

What’s new with Cloudflare for SaaS?

What’s new with Cloudflare for SaaS?

This past April, we announced the Cloudflare for SaaS Beta which makes our SSL for SaaS product available to everyone. This allows any customer — from first-time developers to large enterprises — to use Cloudflare for SaaS to extend our full product suite to their own customers. SSL for SaaS is the subset of Cloudflare for SaaS features that focus on a customer’s Public Key Infrastructure (PKI) needs.

Today, we’re excited to announce all the customizations that our team has been working on for our Enterprise customers — for both Cloudflare for SaaS and SSL for SaaS.

Let’s start with the basics — the common SaaS setup

If you’re running a SaaS company, your solution might exist as a subdomain of your SaaS website, e.g. template.<mysaas>.com, but ideally, your solution would allow the customer to use their own vanity hostname for it, such as example.com.

The most common way to begin using a SaaS company’s service is to point a CNAME DNS record to the subdomain that the SaaS provider has created for your application. This ensures traffic gets to the right place, and it allows the SaaS provider to make infrastructure changes without involving your end customers.

What’s new with Cloudflare for SaaS?

We kept this in mind when we built our SSL for SaaS a few years ago. SSL for SaaS takes away the burden of certificate issuance and management from the SaaS provider by proxying traffic through Cloudflare’s edge. All the SaaS provider needs to do is onboard their zone to Cloudflare and ask their end customers to create a CNAME to the SaaS zone — something they were already doing.

The big benefit of giving your customers a CNAME record (instead of a fixed IP address) is that it gives you, the SaaS provider, more flexibility and control. It allows you to seamlessly change the IP address of your server in the background. For example, if your IP gets blocked by ISP providers, you can update that address on your customers’ behalf with a CNAME record. With a fixed A record, you rely on each of your customers to make a change.

While the CNAME record works great for most customers, some came back and wanted to bypass the limitation that CNAME records present. RFC 1912 states that CNAME records cannot coexist with other DNS records, so in most cases, you cannot have a CNAME at the root of your domain, e.g. example.com. Instead, the CNAME needs to exist at the subdomain level, i.e. www.example.com. Some DNS providers (including Cloudflare) bypass this restriction through a method called CNAME flattening.

Since SaaS providers have no control over the DNS provider that their customers are using, the feedback they got from their customers was that they wanted to use the apex of their zone and not the subdomain. So when our SaaS customers came back asking us for a solution, we delivered. We call it Apex Proxying.

Apex Proxying

For our SaaS customers who want to allow their customers to proxy their apex to their zone, regardless of which DNS provider they are using, we give them the option of Apex Proxying. Apex Proxying is an SSL for SaaS feature that gives SaaS providers a pair of IP addresses to provide to their customers when CNAME records do not work for them.

Cloudflare starts by allocating a dedicated set of IPs for the SaaS provider. The SaaS provider then gives their customers these IPs that they can add as A or AAAA DNS records, allowing them to proxy traffic directly from the apex of their zone.

While this works for most, some of our customers want more flexibility. They want to have multiple IPs that they can change, or they want to assign different sets of customers to different buckets of IPs. For those customers, we give them the option to bring their own IP range, so they control the IP allocation for their application.

What’s new with Cloudflare for SaaS?

Bring Your Own IPs

Last year, we announced Bring Your Own IP (BYOIP), which allows customers to bring their own IP range for Cloudflare to announce at our edge. One of the benefits of BYOIP is that it allows SaaS customers to allocate that range to their account and then, instead of having a few IPs that their customers can point to, they can distribute all the IPs in their range.

What’s new with Cloudflare for SaaS?

SaaS customers often require granular control of how their IPs are allocated to different zones that belong to different customers. With 256 IPs to use, you have the flexibility to either group customers together or to give them dedicated IPs. It’s up to you!

While we’re on the topic of grouping customers, let’s talk about how you might want to do this when sending traffic to your origin.

Custom Origin Support

When setting up Cloudflare for SaaS, you indicate your fallback origin, which defines the origin that all of your Custom Hostnames are routed to. This origin can be defined by an IP address or point to a load balancer defined in the zone. However, you might not want to route all customers to the same origin. Instead, you want to route different customers (or custom hostnames) to different origins — either because you want to group customers together or to help you scale the origins supporting your application.

Our Enterprise customers can now choose a custom origin that is not the default fallback origin for any of their Custom Hostnames. Traditionally, this has been done by emailing your account manager and requesting custom logic at Cloudflare’s edge, a very cumbersome and outdated practice. But now, customers can easily indicate this in the UI or in their API requests.

Wildcard Support

Oftentimes, SaaS providers have customers that don’t just want their domain to stay protected and encrypted, but also the subdomains that fall under it.

We wanted to give our Enterprise customers the flexibility to extend this benefit to their end customers by offering wildcard support for Custom Hostnames.

Wildcard Custom Hostnames extend the Custom Hostname’s configuration from a specific hostname — e.g. “blog.example.com” — to the next level of subdomains of that hostname, e.g. “*.blog.example.com”.

To create a Custom Hostname with a wildcard, you can either indicate Enable wildcard support when creating a Custom Hostname in the Cloudflare dashboard or when you’re creating a Custom Hostname through the API, indicate wildcard: “true”.

What’s new with Cloudflare for SaaS?

Now let’s switch gears to TLS certificate management and the improvements our team has been working on.

TLS Management for All

SSL for SaaS was built to reduce the burden of certificate management for SaaS providers. The initial functionality was meant to serve most customers and their need to issue, maintain, and renew certificates on their customers’ behalf. But one size does not fit all, and some of our customers have more specific needs for their certificate management — and we want to make sure we accommodate them.

CSR Support/Custom certs

One of the superpowers of SSL for SaaS is that it allows Cloudflare to manage all the certificate issuance and renewals on behalf of our customers and their customers. However, some customers want to allow their end customers to upload their own certificates.

For example, a bank may only trust certain certificate authorities (CAs) for their certificate issuance. Alternatively, the SaaS provider may have initially built out TLS support for their customers and now their customers expect that functionality to be available. Regardless of the reasoning, we have given our customers a few options that satisfy these requirements.

For customers who want to maintain control over their TLS private keys or give their customers the flexibility to use their certification authority (CA) of choice, we allow the SaaS provider to upload their customer’s certificate.

If you are a SaaS provider and one of your customers does not allow third parties to generate keys on their behalf, then you want to allow that customer to upload their own certificate. Cloudflare allows SaaS providers to upload their customers’ certificates to any of their custom hostnames — in just one API call!

Some SaaS providers never want a person to see private key material, but want to be able to use the CA of their choice. They can do so by generating a Certificate Signing Request (CSR) for their Custom Hostnames, and then either use those CSRs themselves to order certificates for their customers or relay the CSRs to their customers so that they can provision their own certificates. In either case, the SaaS provider is able to then upload the certificate for the Custom Hostname after the certificate has been issued from their customer’s CA for use at Cloudflare’s edge.

Custom Metadata

For our customers who need to customize their configuration to handle specific rules for their customer’s domains, they can do so by using Custom Metadata and Workers.

By adding metadata to an individual custom hostname and then deploying a Worker to read the data, you can use the Worker to customize per-hostname behavior.

Some customers use this functionality to add a customer_id field to each custom hostname that they then send in a request header to their origin. Another way to use this is to set headers like HTTP Strict Transport Security (HSTS) on a per-customer basis.

Saving the best for last: Analytics!

Tomorrow, we have a very special announcement about how you can now get more visibility into your customers’ traffic and — more importantly —  how you can share this information back to them.

Interested? Reach out!

If you’re an Enterprise customer, and you’re interested in any of these features, reach out to your account team to get access today!

Zero Trust controls for your SaaS applications

Post Syndicated from Kenny Johnson original https://blog.cloudflare.com/access-saas-integrations/

Zero Trust controls for your SaaS applications

Zero Trust controls for your SaaS applications

Most teams start that journey by moving the applications that lived on their private networks into this Zero Trust model. Instead of a private network where any user on the network is assumed to be trusted, the applications that use Cloudflare Access now check every attempt against the rules you create. For your end users, this makes these applications just feel like regular SaaS apps, while your security teams have full control and logs.

However, we kept hearing from teams that wanted to use their Access control plane to apply consistent security controls to their SaaS apps, and consolidate logs from self-hosted and SaaS in one place.

We’re excited to give your team the tools to solve that challenge. With Access in front of your SaaS applications, you can build Zero Trust rules that determine who can reach your SaaS applications in the same place where your rules for self-hosted applications and network access live. To make that easier, we are launching guided integrations with the Amazon Web Services (AWS) management console, Zendesk, and Salesforce. In just a few minutes, your team can apply a Zero Trust layer over every resource you use and ensure your logs never miss a request.

How it works

Cloudflare Access secures applications that you host by becoming the authoritative DNS for the application itself. All DNS queries, and subsequent HTTP requests, hit Cloudflare’s network first. Once there, Cloudflare can apply the types of identity-aware and context-driven rules that make it possible to move to a Zero Trust model. Enforcing these rules in our network means your application doesn’t need to change. You can secure it on Cloudflare, integrate your single sign-on (SSO) provider and other systems like Crowdstrike and Tanium, and begin building rules.

Zero Trust controls for your SaaS applications

SaaS applications pose a different type of challenge. You do not control where your SaaS applications are hosted — and that’s a big part of the value. You don’t need to worry about maintaining the hardware or software of the application.

However, that also means that your team cannot control how users reach those resources. In most cases, any user on the Internet can attempt to log in. Even if you incorporate SSO authentication or IP-based allowlisting, you might not have the ability to add location or device rules. You also have no way to centrally capture logs of user behavior on a per-request basis. Logging and permissions vary across SaaS applications — some are quite granular while others have non-existent controls and logging.

Cloudflare Access for SaaS solves that problem by injecting Zero Trust checks into the SSO flow for any application that supports SAML authentication. When users visit your SaaS application and attempt to log in, they are redirected through Cloudflare and then to your identity provider. They authenticate with your identity provider and are sent back to Cloudflare, where we layer on additional rules like device posture, multi factor method, and country of login. If the user meets all the requirements, Cloudflare converts the user’s authentication with your identity provider into a SAML assertion that we send to the SaaS application.

Zero Trust controls for your SaaS applications

We built support for SaaS applications by using Workers to take the JWT and convert its content into SAML assertions that are sent to the SaaS application. The application thinks that Cloudflare Access is the identity provider, even though we’re just aggregating identity signals from your SSO provider and other sources into the JWT, and sending that summary to the app via SAML. All of this leverages Cloudflare’s global network and ensures users do not see a performance penalty.

Enforcing managed devices and Gateway for SaaS applications

COVID-19 made it commonplace for employees to work from anywhere and, more concerning, from any device. Many SaaS applications contain sensitive data that should only be accessed with a corporately managed device. A benefit of SaaS tools is that they’re readily available from any device, it’s up to security administrators to enforce which devices can be used to log in.

Once Access for SaaS has been configured as the SSO provider for SaaS applications, policies that verify a device can be configured. You can then lock a tool like Salesforce down to only users with a device that has a known serial number, hard auth key plugged in, an up to date operating system and much more.

Zero Trust controls for your SaaS applications

Cloudflare Gateway keeps your users and data safe from threats on the Internet by filtering Internet-bound connections that leave laptops and offices. Gateway gives administrators the ability to block, allow, or log every connection and request to SaaS applications.

However, users are connecting from personal devices and home WiFi networks, potentially bypassing Internet security filtering available on corporate networks. If users have their password and MFA token, they can bypass security requirements and reach into SaaS applications from their own, unprotected devices at home.

Zero Trust controls for your SaaS applications

To ensure traffic to your SaaS apps only connects over Gateway-protected devices, Cloudflare Access will add a new rule type that requires Gateway when users login to your SaaS applications. Once enabled, users will only be able to connect to your SaaS applications when they use Cloudflare Gateway. Gateway will log those connections and provide visibility into every action within SaaS apps and the Internet.

Getting started and what’s next

It’s easy to get started with setting up Access for SaaS application. Visit the Cloudflare for Teams Dashboard and follow one of our published guides.

We will make it easier to protect SaaS applications and will soon be supporting configuration via metadata files. We will also continue to publish SaaS app specific integration guides. Are there specific applications you’ve been trying to integrate? Let us know in the community!

Implement tenant isolation for Amazon S3 and Aurora PostgreSQL by using ABAC

Post Syndicated from Ashutosh Upadhyay original https://aws.amazon.com/blogs/security/implement-tenant-isolation-for-amazon-s3-and-aurora-postgresql-by-using-abac/

In software as a service (SaaS) systems, which are designed to be used by multiple customers, isolating tenant data is a fundamental responsibility for SaaS providers. The practice of isolation of data in a multi-tenant application platform is called tenant isolation. In this post, we describe an approach you can use to achieve tenant isolation in Amazon Simple Storage Service (Amazon S3) and Amazon Aurora PostgreSQL-Compatible Edition databases by implementing attribute-based access control (ABAC). You can also adapt the same approach to achieve tenant isolation in other AWS services.

ABAC in Amazon Web Services (AWS), which uses tags to store attributes, offers advantages over the traditional role-based access control (RBAC) model. You can use fewer permissions policies, update your access control more efficiently as you grow, and last but not least, apply granular permissions for various AWS services. These granular permissions help you to implement an effective and coherent tenant isolation strategy for your customers and clients. Using the ABAC model helps you scale your permissions and simplify the management of granular policies. The ABAC model reduces the time and effort it takes to maintain policies that allow access to only the required resources.

The solution we present here uses the pool model of data partitioning. The pool model helps you avoid the higher costs of duplicated resources for each tenant and the specialized infrastructure code required to set up and maintain those copies.

Solution overview

In a typical customer environment where this solution is implemented, the tenant request for access might land at Amazon API Gateway, together with the tenant identifier, which in turn calls an AWS Lambda function. The Lambda function is envisaged to be operating with a basic Lambda execution role. This Lambda role should also have permissions to assume the tenant roles. As the request progresses, the Lambda function assumes the tenant role and makes the necessary calls to Amazon S3 or to an Aurora PostgreSQL-Compatible database. This solution helps you to achieve tenant isolation for objects stored in Amazon S3 and data elements stored in an Aurora PostgreSQL-Compatible database cluster.

Figure 1 shows the tenant isolation architecture for both Amazon S3 and Amazon Aurora PostgreSQL-Compatible databases.

Figure 1: Tenant isolation architecture diagram

Figure 1: Tenant isolation architecture diagram

As shown in the numbered diagram steps, the workflow for Amazon S3 tenant isolation is as follows:

  1. AWS Lambda sends an AWS Security Token Service (AWS STS) assume role request to AWS Identity and Access Management (IAM).
  2. IAM validates the request and returns the tenant role.
  3. Lambda sends a request to Amazon S3 with the assumed role.
  4. Amazon S3 sends the response back to Lambda.

The diagram also shows the workflow steps for tenant isolation for Aurora PostgreSQL-Compatible databases, as follows:

  1. Lambda sends an STS assume role request to IAM.
  2. IAM validates the request and returns the tenant role.
  3. Lambda sends a request to IAM for database authorization.
  4. IAM validates the request and returns the database password token.
  5. Lambda sends a request to the Aurora PostgreSQL-Compatible database with the database user and password token.
  6. Aurora PostgreSQL-Compatible database returns the response to Lambda.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  1. An AWS account for your workload.
  2. An Amazon S3 bucket.
  3. An Aurora PostgreSQL-Compatible cluster with a database created.

    Note: Make sure to note down the default master database user and password, and make sure that you can connect to the database from your desktop or from another server (for example, from Amazon Elastic Compute Cloud (Amazon EC2) instances).

  4. A security group and inbound rules that are set up to allow an inbound PostgreSQL TCP connection (Port 5432) from Lambda functions. This solution uses regular non-VPC Lambda functions, and therefore the security group of the Aurora PostgreSQL-Compatible database cluster should allow an inbound PostgreSQL TCP connection (Port 5432) from anywhere (0.0.0.0/0).

Make sure that you’ve completed the prerequisites before proceeding with the next steps.

Deploy the solution

The following sections describe how to create the IAM roles, IAM policies, and Lambda functions that are required for the solution. These steps also include guidelines on the changes that you’ll need to make to the prerequisite components Amazon S3 and the Aurora PostgreSQL-Compatible database cluster.

Step 1: Create the IAM policies

In this step, you create two IAM policies with the required permissions for Amazon S3 and the Aurora PostgreSQL database.

To create the IAM policies

  1. Open the AWS Management Console.
  2. Choose IAM, choose Policies, and then choose Create policy.
  3. Use the following JSON policy document to create the policy. Replace the placeholder <111122223333> with the bucket name from your account.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:Get*",
                    "s3:List*"
                ],
                "Resource": "arn:aws:s3:::sts-ti-demo-<111122223333>/${aws:PrincipalTag/s3_home}/*"
            }
        ]
    }
    

  4. Save the policy with the name sts-ti-demo-s3-access-policy.

    Figure 2: Create the IAM policy for Amazon S3 (sts-ti-demo-s3-access-policy)

    Figure 2: Create the IAM policy for Amazon S3 (sts-ti-demo-s3-access-policy)

  5. Open the AWS Management Console.
  6. Choose IAM, choose Policies, and then choose Create policy.
  7. Use the following JSON policy document to create a second policy. This policy grants an IAM role permission to connect to an Aurora PostgreSQL-Compatible database through a database user that is IAM authenticated. Replace the placeholders with the appropriate Region, account number, and cluster resource ID of the Aurora PostgreSQL-Compatible database cluster, respectively.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "rds-db:connect"
            ],
            "Resource": [
                "arn:aws:rds-db:<us-west-2>:<111122223333>:dbuser:<cluster- ZTISAAAABBBBCCCCDDDDEEEEL4>/${aws:PrincipalTag/dbuser}"
            ]
        }
    ]
}
  • Save the policy with the name sts-ti-demo-dbuser-policy.

    Figure 3: Create the IAM policy for Aurora PostgreSQL database (sts-ti-demo-dbuser-policy)

    Figure 3: Create the IAM policy for Aurora PostgreSQL database (sts-ti-demo-dbuser-policy)

Note: Make sure that you use the cluster resource ID for the clustered database. However, if you intend to adapt this solution for your Aurora PostgreSQL-Compatible non-clustered database, you should use the instance resource ID instead.

Step 2: Create the IAM roles

In this step, you create two IAM roles for the two different tenants, and also apply the necessary permissions and tags.

To create the IAM roles

  1. In the IAM console, choose Roles, and then choose Create role.
  2. On the Trusted entities page, choose the EC2 service as the trusted entity.
  3. On the Permissions policies page, select sts-ti-demo-s3-access-policy and sts-ti-demo-dbuser-policy.
  4. On the Tags page, add two tags with the following keys and values.

    Tag key Tag value
    s3_home tenant1_home
    dbuser tenant1_dbuser
  5. On the Review screen, name the role assumeRole-tenant1, and then choose Save.
  6. In the IAM console, choose Roles, and then choose Create role.
  7. On the Trusted entities page, choose the EC2 service as the trusted entity.
  8. On the Permissions policies page, select sts-ti-demo-s3-access-policy and sts-ti-demo-dbuser-policy.
  9. On the Tags page, add two tags with the following keys and values.

    Tag key Tag value
    s3_home tenant2_home
    dbuser tenant2_dbuser
  10. On the Review screen, name the role assumeRole-tenant2, and then choose Save.

Step 3: Create and apply the IAM policies for the tenants

In this step, you create a policy and a role for the Lambda functions. You also create two separate tenant roles, and establish a trust relationship with the role that you created for the Lambda functions.

To create and apply the IAM policies for tenant1

  1. In the IAM console, choose Policies, and then choose Create policy.
  2. Use the following JSON policy document to create the policy. Replace the placeholder <111122223333> with your AWS account number.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": [
                    "arn:aws:iam::<111122223333>:role/assumeRole-tenant1",
                    "arn:aws:iam::<111122223333>:role/assumeRole-tenant2"
                ]
            }
        ]
    }
    

  3. Save the policy with the name sts-ti-demo-assumerole-policy.
  4. In the IAM console, choose Roles, and then choose Create role.
  5. On the Trusted entities page, select the Lambda service as the trusted entity.
  6. On the Permissions policies page, select sts-ti-demo-assumerole-policy and AWSLambdaBasicExecutionRole.
  7. On the review screen, name the role sts-ti-demo-lambda-role, and then choose Save.
  8. In the IAM console, go to Roles, and enter assumeRole-tenant1 in the search box.
  9. Select the assumeRole-tenant1 role and go to the Trust relationship tab.
  10. Choose Edit the trust relationship, and replace the existing value with the following JSON document. Replace the placeholder <111122223333> with your AWS account number, and choose Update trust policy to save the policy.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<111122223333>:role/sts-ti-demo-lambda-role"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
    

To verify that the policies are applied correctly for tenant1

In the IAM console, go to Roles, and enter assumeRole-tenant1 in the search box. Select the assumeRole-tenant1 role and on the Permissions tab, verify that sts-ti-demo-dbuser-policy and sts-ti-demo-s3-access-policy appear in the list of policies, as shown in Figure 4.

Figure 4: The assumeRole-tenant1 Permissions tab

Figure 4: The assumeRole-tenant1 Permissions tab

On the Trust relationships tab, verify that sts-ti-demo-lambda-role appears under Trusted entities, as shown in Figure 5.

Figure 5: The assumeRole-tenant1 Trust relationships tab

Figure 5: The assumeRole-tenant1 Trust relationships tab

On the Tags tab, verify that the following tags appear, as shown in Figure 6.

Tag key Tag value
dbuser tenant1_dbuser
s3_home tenant1_home

 

Figure 6: The assumeRole-tenant1 Tags tab

Figure 6: The assumeRole-tenant1 Tags tab

To create and apply the IAM policies for tenant2

  1. In the IAM console, go to Roles, and enter assumeRole-tenant2 in the search box.
  2. Select the assumeRole-tenant2 role and go to the Trust relationship tab.
  3. Edit the trust relationship, replacing the existing value with the following JSON document. Replace the placeholder <111122223333> with your AWS account number.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<111122223333>:role/sts-ti-demo-lambda-role"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
    

  4. Choose Update trust policy to save the policy.

To verify that the policies are applied correctly for tenant2

In the IAM console, go to Roles, and enter assumeRole-tenant2 in the search box. Select the assumeRole-tenant2 role and on the Permissions tab, verify that sts-ti-demo-dbuser-policy and sts-ti-demo-s3-access-policy appear in the list of policies, you did for tenant1. On the Trust relationships tab, verify that sts-ti-demo-lambda-role appears under Trusted entities.

On the Tags tab, verify that the following tags appear, as shown in Figure 7.

Tag key Tag value
dbuser tenant2_dbuser
s3_home tenant2_home
Figure 7: The assumeRole-tenant2 Tags tab

Figure 7: The assumeRole-tenant2 Tags tab

Step 4: Set up an Amazon S3 bucket

Next, you’ll set up an S3 bucket that you’ll use as part of this solution. You can either create a new S3 bucket or re-purpose an existing one. The following steps show you how to create two user homes (that is, S3 prefixes, which are also known as folders) in the S3 bucket.

  1. In the AWS Management Console, go to Amazon S3 and select the S3 bucket you want to use.
  2. Create two prefixes (folders) with the names tenant1_home and tenant2_home.
  3. Place two test objects with the names tenant.info-tenant1_home and tenant.info-tenant2_home in the prefixes that you just created, respectively.

Step 5: Set up test objects in Aurora PostgreSQL-Compatible database

In this step, you create a table in Aurora PostgreSQL-Compatible Edition, insert tenant metadata, create a row level security (RLS) policy, create tenant users, and grant permission for testing purposes.

To set up Aurora PostgreSQL-Compatible

  1. Connect to Aurora PostgreSQL-Compatible through a client of your choice, using the master database user and password that you obtained at the time of cluster creation.
  2. Run the following commands to create a table for testing purposes and to insert a couple of testing records.
    CREATE TABLE tenant_metadata (
        tenant_id VARCHAR(30) PRIMARY KEY,
        email     VARCHAR(50) UNIQUE,
        status    VARCHAR(10) CHECK (status IN ('active', 'suspended', 'disabled')),
        tier      VARCHAR(10) CHECK (tier IN ('gold', 'silver', 'bronze')));
    
    INSERT INTO tenant_metadata (tenant_id, email, status, tier) 
    VALUES ('tenant1_dbuser','[email protected]','active','gold');
    INSERT INTO tenant_metadata (tenant_id, email, status, tier) 
    VALUES ('tenant2_dbuser','[email protected]','suspended','silver');
    ALTER TABLE tenant_metadata ENABLE ROW LEVEL SECURITY;
    

  3. Run the following command to query the newly created database table.
    SELECT * FROM tenant_metadata;
    

    Figure 8: The tenant_metadata table content

    Figure 8: The tenant_metadata table content

  4. Run the following command to create the row level security policy.
    CREATE POLICY tenant_isolation_policy ON tenant_metadata
    USING (tenant_id = current_user);
    

  5. Run the following commands to establish two tenant users and grant them the necessary permissions.
    CREATE USER tenant1_dbuser WITH LOGIN;
    CREATE USER tenant2_dbuser WITH LOGIN;
    GRANT rds_iam TO tenant1_dbuser;
    GRANT rds_iam TO tenant2_dbuser;
    
    GRANT select, insert, update, delete ON tenant_metadata to tenant1_dbuser, tenant2_dbuser;
    

  6. Run the following commands to verify the newly created tenant users.
    SELECT usename AS role_name,
      CASE
         WHEN usesuper AND usecreatedb THEN
           CAST('superuser, create database' AS pg_catalog.text)
         WHEN usesuper THEN
            CAST('superuser' AS pg_catalog.text)
         WHEN usecreatedb THEN
            CAST('create database' AS pg_catalog.text)
         ELSE
            CAST('' AS pg_catalog.text)
      END role_attributes
    FROM pg_catalog.pg_user
    WHERE usename LIKE (‘tenant%’)
    ORDER BY role_name desc;
    

    Figure 9: Verify the newly created tenant users output

    Figure 9: Verify the newly created tenant users output

Step 6: Set up the AWS Lambda functions

Next, you’ll create two Lambda functions for Amazon S3 and Aurora PostgreSQL-Compatible. You also need to create a Lambda layer for the Python package PG8000.

To set up the Lambda function for Amazon S3

  1. Navigate to the Lambda console, and choose Create function.
  2. Choose Author from scratch. For Function name, enter sts-ti-demo-s3-lambda.
  3. For Runtime, choose Python 3.7.
  4. Change the default execution role to Use an existing role, and then select sts-ti-demo-lambda-role from the drop-down list.
  5. Keep Advanced settings as the default value, and then choose Create function.
  6. Copy the following Python code into the lambda_function.py file that is created in your Lambda function.
    import json
    import os
    import time 
    
    def lambda_handler(event, context):
        import boto3
        bucket_name     =   os.environ['s3_bucket_name']
    
        try:
            login_tenant_id =   event['login_tenant_id']
            data_tenant_id  =   event['s3_tenant_home']
        except:
            return {
                'statusCode': 400,
                'body': 'Error in reading parameters'
            }
    
        prefix_of_role  =   'assumeRole'
        file_name       =   'tenant.info' + '-' + data_tenant_id
    
        # create an STS client object that represents a live connection to the STS service
        sts_client = boto3.client('sts')
        account_of_role = sts_client.get_caller_identity()['Account']
        role_to_assume  =   'arn:aws:iam::' + account_of_role + ':role/' + prefix_of_role + '-' + login_tenant_id
    
        # Call the assume_role method of the STSConnection object and pass the role
        # ARN and a role session name.
        RoleSessionName = 'AssumeRoleSession' + str(time.time()).split(".")[0] + str(time.time()).split(".")[1]
        try:
            assumed_role_object = sts_client.assume_role(
                RoleArn         = role_to_assume, 
                RoleSessionName = RoleSessionName, 
                DurationSeconds = 900) #15 minutes
    
        except:
            return {
                'statusCode': 400,
                'body': 'Error in assuming the role ' + role_to_assume + ' in account ' + account_of_role
            }
    
        # From the response that contains the assumed role, get the temporary 
        # credentials that can be used to make subsequent API calls
        credentials=assumed_role_object['Credentials']
        
        # Use the temporary credentials that AssumeRole returns to make a connection to Amazon S3  
        s3_resource=boto3.resource(
            's3',
            aws_access_key_id=credentials['AccessKeyId'],
            aws_secret_access_key=credentials['SecretAccessKey'],
            aws_session_token=credentials['SessionToken']
        )
    
        try:
            obj = s3_resource.Object(bucket_name, data_tenant_id + "/" + file_name)
            return {
                'statusCode': 200,
                'body': obj.get()['Body'].read()
            }
        except:
            return {
                'statusCode': 400,
                'body': 'error in reading s3://' + bucket_name + '/' + data_tenant_id + '/' + file_name
            }
    

  7. Under Basic settings, edit Timeout to increase the timeout to 29 seconds.
  8. Edit Environment variables to add a key called s3_bucket_name, with the value set to the name of your S3 bucket.
  9. Configure a new test event with the following JSON document, and save it as testEvent.
    {
      "login_tenant_id": "tenant1",
      "s3_tenant_home": "tenant1_home"
    }
    

  10. Choose Test to test the Lambda function with the newly created test event testEvent. You should see status code 200, and the body of the results should contain the data for tenant1.

    Figure 10: The result of running the sts-ti-demo-s3-lambda function

    Figure 10: The result of running the sts-ti-demo-s3-lambda function

Next, create another Lambda function for Aurora PostgreSQL-Compatible. To do this, you first need to create a new Lambda layer.

To set up the Lambda layer

  1. Use the following commands to create a .zip file for Python package pg8000.

    Note: This example is created by using an Amazon EC2 instance running the Amazon Linux 2 Amazon Machine Image (AMI). If you’re using another version of Linux or don’t have the Python 3 or pip3 packages installed, install them by using the following commands.

    sudo yum update -y 
    sudo yum install python3 
    sudo pip3 install pg8000 -t build/python/lib/python3.8/site-packages/ 
    cd build 
    sudo zip -r pg8000.zip python/
    

  2. Download the pg8000.zip file you just created to your local desktop machine or into an S3 bucket location.
  3. Navigate to the Lambda console, choose Layers, and then choose Create layer.
  4. For Name, enter pgdb, and then upload pg8000.zip from your local desktop machine or from the S3 bucket location.

    Note: For more details, see the AWS documentation for creating and sharing Lambda layers.

  5. For Compatible runtimes, choose python3.6, python3.7, and python3.8, and then choose Create.

To set up the Lambda function with the newly created Lambda layer

  1. In the Lambda console, choose Function, and then choose Create function.
  2. Choose Author from scratch. For Function name, enter sts-ti-demo-pgdb-lambda.
  3. For Runtime, choose Python 3.7.
  4. Change the default execution role to Use an existing role, and then select sts-ti-demo-lambda-role from the drop-down list.
  5. Keep Advanced settings as the default value, and then choose Create function.
  6. Choose Layers, and then choose Add a layer.
  7. Choose Custom layer, select pgdb with Version 1 from the drop-down list, and then choose Add.
  8. Copy the following Python code into the lambda_function.py file that was created in your Lambda function.
    import boto3
    import pg8000
    import os
    import time
    import ssl
    
    connection = None
    assumed_role_object = None
    rds_client = None
    
    def assume_role(event):
        global assumed_role_object
        try:
            RolePrefix  = os.environ.get("RolePrefix")
            LoginTenant = event['login_tenant_id']
        
            # create an STS client object that represents a live connection to the STS service
            sts_client      = boto3.client('sts')
            # Prepare input parameters
            role_to_assume  = 'arn:aws:iam::' + sts_client.get_caller_identity()['Account'] + ':role/' + RolePrefix + '-' + LoginTenant
            RoleSessionName = 'AssumeRoleSession' + str(time.time()).split(".")[0] + str(time.time()).split(".")[1]
        
            # Call the assume_role method of the STSConnection object and pass the role ARN and a role session name.
            assumed_role_object = sts_client.assume_role(
                RoleArn         =   role_to_assume, 
                RoleSessionName =   RoleSessionName,
                DurationSeconds =   900) #15 minutes 
            
            return assumed_role_object['Credentials']
        except Exception as e:
            print({'Role assumption failed!': {'role': role_to_assume, 'Exception': 'Failed due to :{0}'.format(str(e))}})
            return None
    
    def get_connection(event):
        global rds_client
        creds = assume_role(event)
    
        try:
            # create an RDS client using assumed credentials
            rds_client = boto3.client('rds',
                aws_access_key_id       = creds['AccessKeyId'],
                aws_secret_access_key   = creds['SecretAccessKey'],
                aws_session_token       = creds['SessionToken'])
    
            # Read the environment variables and event parameters
            DBEndPoint   = os.environ.get('DBEndPoint')
            DatabaseName = os.environ.get('DatabaseName')
            DBUserName   = event['dbuser']
    
            # Generates an auth token used to connect to a database with IAM credentials.
            pwd = rds_client.generate_db_auth_token(
                DBHostname=DBEndPoint, Port=5432, DBUsername=DBUserName, Region='us-west-2'
            )
    
            ssl_context             = ssl.SSLContext()
            ssl_context.verify_mode = ssl.CERT_REQUIRED
            ssl_context.load_verify_locations('rds-ca-2019-root.pem')
    
            # create a database connection
            conn = pg8000.connect(
                host        =   DBEndPoint,
                user        =   DBUserName,
                database    =   DatabaseName,
                password    =   pwd,
                ssl_context =   ssl_context)
            
            return conn
        except Exception as e:
            print ({'Database connection failed!': {'Exception': "Failed due to :{0}".format(str(e))}})
            return None
    
    def execute_sql(connection, query):
        try:
            cursor = connection.cursor()
            cursor.execute(query)
            columns = [str(desc[0]) for desc in cursor.description]
            results = []
            for res in cursor:
                results.append(dict(zip(columns, res)))
            cursor.close()
            retry = False
            return results    
        except Exception as e:
            print ({'Execute SQL failed!': {'Exception': "Failed due to :{0}".format(str(e))}})
            return None
    
    
    def lambda_handler(event, context):
        global connection
        try:
            connection = get_connection(event)
            if connection is None:
                return {'statusCode': 400, "body": "Error in database connection!"}
    
            response = {'statusCode':200, 'body': {
                'db & user': execute_sql(connection, 'SELECT CURRENT_DATABASE(), CURRENT_USER'), \
                'data from tenant_metadata': execute_sql(connection, 'SELECT * FROM tenant_metadata')}}
            return response
        except Exception as e:
            try:
                connection.close()
            except Exception as e:
                connection = None
            return {'statusCode': 400, 'statusDesc': 'Error!', 'body': 'Unhandled error in Lambda Handler.'}
    

  9. Add a certificate file called rds-ca-2019-root.pem into the Lambda project root by downloading it from https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem.
  10. Under Basic settings, edit Timeout to increase the timeout to 29 seconds.
  11. Edit Environment variables to add the following keys and values.

    Key Value
    DBEndPoint Enter the database cluster endpoint URL
    DatabaseName Enter the database name
    RolePrefix assumeRole
    Figure 11: Example of environment variables display

    Figure 11: Example of environment variables display

  12. Configure a new test event with the following JSON document, and save it as testEvent.
    {
      "login_tenant_id": "tenant1",
      "dbuser": "tenant1_dbuser"
    }
    

  13. Choose Test to test the Lambda function with the newly created test event testEvent. You should see status code 200, and the body of the results should contain the data for tenant1.

    Figure 12: The result of running the sts-ti-demo-pgdb-lambda function

    Figure 12: The result of running the sts-ti-demo-pgdb-lambda function

Step 7: Perform negative testing of tenant isolation

You already performed positive tests of tenant isolation during the Lambda function creation steps. However, it’s also important to perform some negative tests to verify the robustness of the tenant isolation controls.

To perform negative tests of tenant isolation

  1. In the Lambda console, navigate to the sts-ti-demo-s3-lambda function. Update the test event to the following, to mimic a scenario where tenant1 attempts to access other tenants’ objects.
    {
      "login_tenant_id": "tenant1",
      "s3_tenant_home": "tenant2_home"
    }
    

  2. Choose Test to test the Lambda function with the updated test event. You should see status code 400, and the body of the results should contain an error message.

    Figure 13: The results of running the sts-ti-demo-s3-lambda function (negative test)

    Figure 13: The results of running the sts-ti-demo-s3-lambda function (negative test)

  3. Navigate to the sts-ti-demo-pgdb-lambda function and update the test event to the following, to mimic a scenario where tenant1 attempts to access other tenants’ data elements.
    {
      "login_tenant_id": "tenant1",
      "dbuser": "tenant2_dbuser"
    }
    

  4. Choose Test to test the Lambda function with the updated test event. You should see status code 400, and the body of the results should contain an error message.

    Figure 14: The results of running the sts-ti-demo-pgdb-lambda function (negative test)

    Figure 14: The results of running the sts-ti-demo-pgdb-lambda function (negative test)

Cleaning up

To de-clutter your environment, remove the roles, policies, Lambda functions, Lambda layers, Amazon S3 prefixes, database users, and the database table that you created as part of this exercise. You can choose to delete the S3 bucket, as well as the Aurora PostgreSQL-Compatible database cluster that we mentioned in the Prerequisites section, to avoid incurring future charges.

Update the security group of the Aurora PostgreSQL-Compatible database cluster to remove the inbound rule that you added to allow a PostgreSQL TCP connection (Port 5432) from anywhere (0.0.0.0/0).

Conclusion

By taking advantage of attribute-based access control (ABAC) in IAM, you can more efficiently implement tenant isolation in SaaS applications. The solution we presented here helps to achieve tenant isolation in Amazon S3 and Aurora PostgreSQL-Compatible databases by using ABAC with the pool model of data partitioning.

If you run into any issues, you can use Amazon CloudWatch and AWS CloudTrail to troubleshoot. If you have feedback about this post, submit comments in the Comments section below.

To learn more, see these AWS Blog and AWS Support articles:

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Ashutosh Upadhyay

Ashutosh works as a Senior Security, Risk, and Compliance Consultant in AWS Professional Services (GFS) and is based in Singapore. Ashutosh started off his career as a developer, and for the past many years has been working in the Security, Risk, and Compliance field. Ashutosh loves to spend his free time learning and playing with new technologies.

Author

Chirantan Saha

Chirantan works as a DevOps Engineer in AWS Professional Services (GFS) and is based in Singapore. Chirantan has 20 years of development and DevOps experience working for large banking, financial services, and insurance organizations. Outside of work, Chirantan enjoys traveling, spending time with family, and helping his children with their science projects.

How to implement SaaS tenant isolation with ABAC and AWS IAM

Post Syndicated from Michael Pelts original https://aws.amazon.com/blogs/security/how-to-implement-saas-tenant-isolation-with-abac-and-aws-iam/

Multi-tenant applications must be architected so that the resources of each tenant are isolated and cannot be accessed by other tenants in the system. AWS Identity and Access Management (IAM) is often a key element in achieving this goal. One of the challenges with using IAM, however, is that the number and complexity of IAM policies you need to support your tenants can grow rapidly and impact the scale and manageability of your isolation model. The attribute-based access control (ABAC) mechanism of IAM provides developers with a way to address this challenge.

In this blog post, we describe and provide detailed examples of how you can use ABAC in IAM to implement tenant isolation in a multi-tenant environment.

Choose an IAM isolation method

IAM makes it possible to implement tenant isolation and scope down permissions in a way that is integrated with other AWS services. By relying on IAM, you can create strong isolation foundations in your system, and reduce the risk of developers unintentionally introducing code that leads to a violation of tenant boundaries. IAM provides an AWS native, non-invasive way for you to achieve isolation for those cases where IAM supports policies that align with your overall isolation model.

There are several methods in IAM that you can use for isolating tenants and restricting access to resources. Choosing the right method for your application depends on several parameters. The number of tenants and the number of role definitions are two important dimensions that you should take into account.

Most applications require multiple role definitions for different user functions. A role definition refers to a minimal set of privileges that users or programmatic components need in order to do their job. For example, business users and data analysts would typically have different set of permissions to allow minimum necessary access to resources that they use.

In software-as-a-service (SaaS) applications, in addition to functional boundaries, there are also boundaries between tenant resources. As a result, the entire set of role definitions exists for each individual tenant. In highly dynamic environments (e.g., collaboration scenarios with cross-tenant access), new role definitions can be added ad-hoc. In such a case, the number of role definitions and their complexity can grow significantly as the system evolves.

There are three main tenant isolation methods in IAM. Let’s briefly review them before focusing on the ABAC in the following sections.

Figure 1: IAM tenant isolation methods

Figure 1: IAM tenant isolation methods

RBAC – Each tenant has a dedicated IAM role or static set of IAM roles that it uses for access to tenant resources. The number of IAM roles in RBAC equals to the number of role definitions multiplied by the number of tenants. RBAC works well when you have a small number of tenants and relatively static policies. You may find it difficult to manage multiple IAM roles as the number of tenants and the complexity of the attached policies grows.

Dynamically generated IAM policies – This method dynamically generates an IAM policy for a tenant according to user identity. Choose this method in highly dynamic environments with changing or frequently added role definitions (e.g., tenant collaboration scenario). You may also choose dynamically generated policies if you have a preference for generating and managing IAM policies by using your code rather than relying on built-in IAM service features. You can find more details about this method in the blog post Isolating SaaS Tenants with Dynamically Generated IAM Policies.

ABAC – This method is suitable for a wide range of SaaS applications, unless your use case requires support for frequently changed or added role definitions, which are easier to manage with dynamically generated IAM policies. Unlike Dynamically generated IAM policies, where you manage and apply policies through a self-managed mechanism, ABAC lets you rely more directly on IAM.

ABAC for tenant isolation

ABAC is achieved by using parameters (attributes) to control tenant access to resources. Using ABAC for tenant isolation results in temporary access to resources, which is restricted according to the caller’s identity and attributes.

One of the key advantages of the ABAC model is that it scales to any number of tenants with a single role. This is achieved by using tags (such as the tenant ID) in IAM polices and a temporary session created specifically for accessing tenant data. The session encapsulates the attributes of the requesting entity (for example, a tenant user). At policy evaluation time, IAM replaces these tags with session attributes.

Another component of ABAC is the assignation of attributes to tenant resources by using special naming conventions or resource tags. The access to a resource is granted when session and resource attributes match (for example, a session with the TenantID: yellow attribute can access a resource that is tagged as TenantID: yellow).

For more information about ABAC in IAM, see What is ABAC for AWS?

ABAC in a typical SaaS architecture

To demonstrate how you can use ABAC in IAM for tenant isolation, we will walk you through an example of a typical microservices-based application. More specifically, we will focus on two microservices that implement a shipment tracking flow in a multi-tenant ecommerce application.

Our sample tenant, Yellow, which has many users in many roles, has exclusive access to shipment data that belongs to this particular tenant. To achieve this, all microservices in the call chain operate in a restricted context that prevents cross-tenant access.

Figure 2: Sample shipment tracking flow in a SaaS application

Figure 2: Sample shipment tracking flow in a SaaS application

Let’s take a closer look at the sequence of events and discuss the implementation in detail.

A shipment tracking request is initiated by an authenticated Tenant Yellow user. The authentication process is left out of the scope of this discussion for the sake of brevity. The user identity expressed in the JSON Web Token (JWT) includes custom claims, one of which is a TenantID. In this example, TenantID equals yellow.

The JWT is delivered from the user’s browser in the HTTP header of the Get Shipment request to the shipment service. The shipment service then authenticates the request and collects the required parameters for getting the shipment estimated time of arrival (ETA). The shipment service makes a GetShippingETA request using the parameters to the tracking service along with the JWT.

The tracking service manages shipment tracking data in a data repository. The repository stores data for all of the tenants, but each shipment record there has an attached TenantID resource tag, for instance yellow, as in our example.

An IAM role attached to the tracking service, called TrackingServiceRole in this example, determines the AWS resources that the microservice can access and the actions it can perform.

Note that TrackingServiceRole itself doesn’t have permissions to access tracking data in the data store. To get access to tracking records, the tracking service temporarily assumes another role called TrackingAccessRole. This role remains valid for a limited period of time, until credentials representing this temporary session expire.

To understand how this works, we need to talk first about the trust relationship between TrackingAccessRole and TrackingServiceRole. The following trust policy lists TrackingServiceRole as a trusted entity.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account-id>:role/TrackingServiceRole"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account-id>:role/TrackingServiceRole"
      },
      "Action": "sts:TagSession",
      "Condition": {
        "StringLike": {
          "aws:RequestTag/TenantID": "*"
        }
      }
    }
  ]
}

This policy needs to be associated with TrackingAccessRole. You can do that on the Trust relationships tab of the Role Details page in the IAM console or via the AWS CLI update-assume-role-policy method. That association is what allows the tracking service with the attached TrackingServiceRole role to assume TrackingAccessRole. The policy also allows TrackingServiceRole to attach the TenantID session tag to the temporary sessions it creates.

Session tags are principal tags that you specify when you request a session. This is how you inject variables into the request context for API calls executed during the session. This is what allows IAM policies evaluated in subsequent API calls to reference TenantID with the aws:PrincipalTag context key.

Now let’s talk about TrackingAccessPolicy. It’s an identity policy attached to TrackingAccessRole. This policy makes use of the aws:PrincipalTag/TenantID key to dynamically scope access to a specific tenant.

Later in this post, you can see examples of such data access policies for three different data storage services.

Now the stage is set to see how the tracking service creates a temporary session and injects TenantID into the request context. The following Python function does that by using AWS SDK for Python (Boto3). The function gets the TenantID (extracted from the JWT) and the TrackingAccessRole Amazon Resource Name (ARN) as parameters and returns a scoped Boto3 session object.

import boto3

def create_temp_tenant_session(access_role_arn, session_name, tenant_id, duration_sec):
    """
    Create a temporary session
    :param access_role_arn: The ARN of the role that the caller is assuming
    :param session_name: An identifier for the assumed session
    :param tenant_id: The tenant identifier the session is created for
    :param duration_sec: The duration, in seconds, of the temporary session
    :return: The session object that allows you to create service clients and resources
    """
    sts = boto3.client('sts')
    assume_role_response = sts.assume_role(
        RoleArn=access_role_arn,
        DurationSeconds=duration_sec,
        RoleSessionName=session_name,
        Tags=[
            {
                'Key': 'TenantID',
                'Value': tenant_id
            }
        ]
    )
    session = boto3.Session(aws_access_key_id=assume_role_response['Credentials']['AccessKeyId'],
                    aws_secret_access_key=assume_role_response['Credentials']['SecretAccessKey'],
                    aws_session_token=assume_role_response['Credentials']['SessionToken'])
    return session

Use these parameters to create temporary sessions for a specific tenant with a duration that meets your needs.

access_role_arn – The assumed role with an attached templated policy. The IAM policy must include the aws:PrincipalTag/TenantID tag key.

session_name – The name of the session. Use the role session name to uniquely identify a session. The role session name is used in the ARN of the assumed role principal and included in the AWS CloudTrail logs.

tenant_id – The tenant identifier that describes which tenant the session is created for. For better compatibility with resource names in IAM policies, it’s recommended to generate non-guessable alphanumeric lowercase tenant identifiers.

duration_sec – The duration of your temporary session.

Note: The details of token management can be abstracted away from the application by extracting the token generation into a separate module, as described in the blog post Isolating SaaS Tenants with Dynamically Generated IAM Policies. In that post, the reusable application code for acquiring temporary session tokens is called a Token Vending Machine.

The returned session can be used to instantiate IAM-scoped objects such as a storage service. After the session is returned, any API call performed with the temporary session credentials contains the aws:PrincipalTag/TenantID key-value pair in the request context.

When the tracking service attempts to access tracking data, IAM completes several evaluation steps to determine whether to allow or deny the request. These include evaluation of the principal’s identity-based policy, which is, in this example, represented by TrackingAccessPolicy. It is at this stage that the aws:PrincipalTag/TenantID tag key is replaced with the actual value, policy conditions are resolved, and access is granted to the tenant data.

Common ABAC scenarios

Let’s take a look at some common scenarios with different data storage services. For each example, a diagram is included that illustrates the allowed access to tenant data and how the data is partitioned in the service.

These examples rely on the architecture described earlier and assume that the temporary session context contains a TenantID parameter. We will demonstrate different versions of TrackingAccessPolicy that are applicable to different services. The way aws:PrincipalTag/TenantID is used depends on service-specific IAM features, such as tagging support, policy conditions and ability to parameterize resource ARN with session tags. Examples below illustrate these techniques applied to different services.

Pooled storage isolation with DynamoDB

Many SaaS applications rely on a pooled data partitioning model where data from all tenants is combined into a single table. The tenant identifier is then introduced into each table to identify the items that are associated with each tenant. Figure 3 provides an example of this model.

Figure 3: DynamoDB index-based partitioning

Figure 3: DynamoDB index-based partitioning

In this example, we’ve used Amazon DynamoDB, storing each tenant identifier in the table’s partition key. Now, we can use ABAC and IAM fine-grained access control to implement tenant isolation for the items in this table.

The following TrackingAccessPolicy uses the dynamodb:LeadingKeys condition key to restrict permissions to only the items whose partition key matches the tenant’s identifier as passed in a session tag.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:BatchGetItem",
        "dynamodb:Query"
      ],
      "Resource": [
        "arn:aws:dynamodb:<region>:<account-id>:table/TrackingData"
      ],
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:LeadingKeys": [
            "${aws:PrincipalTag/TenantID}"
          ]
        }
      }
    }
  ]
}

This example uses the dynamodb:LeadingKeys condition key in the policy to describe how you can control access to tenant resources. You’ll notice that we haven’t bound this policy to any specific tenant. Instead, the policy relies on the aws:PrincipalTag tag to resolve the TenantID parameter at runtime.

This approach means that you can add new tenants without needing to create any new IAM constructs. This reduces the maintenance burden and limits your chances that any IAM quotas will be exceeded.

Siloed storage isolation with Amazon Elasticsearch Service

Let’s look at another example that illustrates how you might implement tenant isolation of Amazon Elasticsearch Service resources. Figure 4 illustrates a silo data partitioning model, where each tenant of your system has a separate Elasticsearch index for each tenant.

Figure 4: Elasticsearch index-per-tenant strategy

Figure 4: Elasticsearch index-per-tenant strategy

You can isolate these tenant resources by creating a parameterized identity policy with the principal TenantID tag as a variable (similar to the one we created for DynamoDB). In the following example, the principal tag is a part of the index name in the policy resource element. At access time, the principal tag is replaced with the tenant identifier from the request context, yielding the Elasticsearch index ARN as a result.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "es:ESHttpGet",
        "es:ESHttpPut"
      ],
      "Resource": [
        "arn:aws:es:<region>:<account-id>:domain/test/${aws:PrincipalTag/TenantID}*/*"
      ]
    }
  ]
}

In the case where you have multiple indices that belong to the same tenant, you can allow access to them by using a wildcard. The preceding policy allows es:ESHttpGet and es:ESHttpPut actions to be taken on documents if the documents belong to an index with a name that matches the pattern.

Important: In order for this to work, the tenant identifier must follow the same naming restrictions as indices.

Although this approach scales the tenant isolation strategy, you need to keep in mind that this solution is constrained by the number of indices your Elasticsearch cluster can support.

Amazon S3 prefix-per-tenant strategy

Amazon Simple Storage Service (Amazon S3) buckets are commonly used as shared object stores with dedicated prefixes for different tenants. For enhanced security, you can optionally use a dedicated customer master key (CMK) per tenant. If you do so, attach a corresponding TenantID resource tag to a CMK.

By using ABAC and IAM, you can make sure that each tenant can only get and decrypt objects in a shared S3 bucket that have the prefixes that correspond to that tenant.

Figure 5: S3 prefix-per-tenant strategy

Figure 5: S3 prefix-per-tenant strategy

In the following policy, the first statement uses the TenantID principal tag in the resource element. The policy grants s3:GetObject permission, but only if the requested object key begins with the tenant’s prefix.

The second statement allows the kms:Decrypt operation on a KMS key that the requested object is encrypted with. The KMS key must have a TenantID resource tag attached to it with a corresponding tenant ID as a value.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::sample-bucket-12345678/${aws:PrincipalTag/TenantID}/*"
    },
    {
      "Effect": "Allow",
      "Action": "kms:Decrypt",
       "Resource": "arn:aws:kms:<region>:<account-id>:key/*",
       "Condition": {
           "StringEquals": {
           "aws:PrincipalTag/TenantID": "${aws:ResourceTag/TenantID}"
        }
      }
    }
  ]
}

Important: In order for this policy to work, the tenant identifier must follow the S3 object key name guidelines.

With the prefix-per-tenant approach, you can support any number of tenants. However, if you choose to use a dedicated customer managed KMS key per tenant, you will be bounded by the number of KMS keys per AWS Region.

Conclusion

The ABAC method combined with IAM provides teams who build SaaS platforms with a compelling model for implementing tenant isolation. By using this dynamic, attribute-driven model, you can scale your IAM isolation policies to any practical number of tenants. This approach also makes it possible for you to rely on IAM to manage, scale, and enforce isolation in a way that’s integrated into your overall tenant identity scheme. You can start experimenting with IAM ABAC by using either the examples in this blog post, or this resource: IAM Tutorial: Define permissions to access AWS resources based on tags.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS IAM forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Michael Pelts

As a Senior Solutions Architect at AWS, Michael works with large ISV customers, helping them create innovative solutions to address their cloud challenges. Michael is passionate about his work, enjoys the creativity that goes into building solutions in the cloud, and derives pleasure from passing on his knowledge.

Author

Oren Reuveni

Oren is a Principal Solutions Architect and member of the SaaS Factory team. He helps guide and assist AWS partners with building their SaaS products on AWS. Oren has over 15 years of experience in the modern IT and Cloud domains. He is passionate about shaping the right dynamics between technology and business.