Tag Archives: Enterprise governance and control

Data governance in the age of generative AI

Post Syndicated from Krishna Rupanagunta original https://aws.amazon.com/blogs/big-data/data-governance-in-the-age-of-generative-ai/

Data is your generative AI differentiator, and a successful generative AI implementation depends on a robust data strategy incorporating a comprehensive data governance approach. Working with large language models (LLMs) for enterprise use cases requires the implementation of quality and privacy considerations to drive responsible AI. However, enterprise data generated from siloed sources combined with the lack of a data integration strategy creates challenges for provisioning the data for generative AI applications. The need for an end-to-end strategy for data management and data governance at every step of the journey—from ingesting, storing, and querying data to analyzing, visualizing, and running artificial intelligence (AI) and machine learning (ML) models—continues to be of paramount importance for enterprises.

In this post, we discuss the data governance needs of generative AI application data pipelines, a critical building block to govern data used by LLMs to improve the accuracy and relevance of their responses to user prompts in a safe, secure, and transparent manner. Enterprises are doing this by using proprietary data with approaches like Retrieval Augmented Generation (RAG), fine-tuning, and continued pre-training with foundation models.

Data governance is a critical building block across all these approaches, and we see two emerging areas of focus. First, many LLM use cases rely on enterprise knowledge that needs to be drawn from unstructured data such as documents, transcripts, and images, in addition to structured data from data warehouses. Unstructured data is typically stored across siloed systems in varying formats, and generally not managed or governed with the same level of rigor as structured data. Second, generative AI applications introduce a higher number of data interactions than conventional applications, which requires that the data security, privacy, and access control policies be implemented as part of the generative AI user workflows.

In this post, we cover data governance for building generative AI applications on AWS with a lens on structured and unstructured enterprise knowledge sources, and the role of data governance during the user request-response workflows.

Use case overview

Let’s explore an example of a customer support AI assistant. The following figure shows the typical conversational workflow that is initiated with a user prompt.

The workflow includes the following key data governance steps:

  1. Validate the prompt user against access control and security policies.
  2. Apply access policies to retrieve only the data the prompt user is permitted to see, and filter results based on the user's role and permissions.
  3. Enforce data privacy policies, such as redaction of personally identifiable information (PII).
  4. Enforce fine-grained access control.
  5. Apply the user role's permissions for sensitive information and compliance policies.

To provide a response that includes the enterprise context, each user prompt needs to be augmented with a combination of insights from structured data from the data warehouse and unstructured data from the enterprise data lake. On the backend, the batch data engineering processes refreshing the enterprise data lake need to expand to ingest, transform, and manage unstructured data. As part of the transformation, the objects need to be treated to ensure data privacy (for example, PII redaction). Finally, access control policies also need to be extended to the unstructured data objects and to vector data stores.
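
To make this concrete, the following is a minimal Python (boto3) sketch of assembling an augmented prompt, using the Amazon Bedrock Converse API as one possible option. The retrieval inputs, prompt wording, and model ID are illustrative assumptions rather than a prescribed implementation.

    import boto3

    bedrock = boto3.client("bedrock-runtime")

    def answer(user_prompt: str, structured_facts: str, retrieved_passages: list[str]) -> str:
        # Combine structured insights (for example, from the data warehouse) with
        # unstructured passages (for example, from a vector search over the data lake)
        context = "\n".join(retrieved_passages)
        augmented = (
            "Use only the context below to answer the question.\n"
            f"Structured data:\n{structured_facts}\n"
            f"Documents:\n{context}\n\n"
            f"Question: {user_prompt}"
        )
        # Model ID is an example; substitute the foundation model your application uses
        response = bedrock.converse(
            modelId="anthropic.claude-3-sonnet-20240229-v1:0",
            messages=[{"role": "user", "content": [{"text": augmented}]}],
        )
        return response["output"]["message"]["content"][0]["text"]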

Let’s look at how data governance can be applied to the enterprise knowledge source data pipelines and the user request-response workflows.

Enterprise knowledge: Data management

The following figure summarizes data governance considerations for data pipelines and the workflow for applying data governance.

Data governance steps in data pipelines

In the above figure, the data engineering pipelines include the following data governance steps:

  1. Create and update a catalog through data evolution.
  2. Implement data privacy policies.
  3. Implement data quality by data type and source.
  4. Link structured and unstructured datasets.
  5. Implement unified fine-grained access controls for structured and unstructured datasets.

Let’s look at some of the key changes in the data pipelines, namely data cataloging, data quality, and vector embedding security, in more detail.

Data discoverability

Unlike structured data, which is managed in well-defined rows and columns, unstructured data is stored as objects. For users to be able to discover and comprehend the data, the first step is to build a comprehensive catalog using the metadata that is generated and captured in the source systems. This starts with the objects (such as documents and transcript files) being ingested from the relevant source systems into the raw zone in the data lake in Amazon Simple Storage Service (Amazon S3) in their respective native formats (as illustrated in the preceding figure). From here, object metadata (such as file owner, creation date, and confidentiality level) is extracted and queried using Amazon S3 capabilities. Metadata can vary by data source, and it’s important to examine the fields and, where required, derive the necessary fields to complete all the necessary metadata. For instance, if an attribute like content confidentiality is not tagged at a document level in the source application, this may need to be derived as part of the metadata extraction process and added as an attribute in the data catalog. The ingestion process needs to capture object updates (changes, deletions) in addition to new objects on an ongoing basis. For detailed implementation guidance, refer to Unstructured data management and governance using AWS AI/ML and analytics services. To further simplify the discovery and introspection between business glossaries and technical data catalogs, you can use Amazon DataZone for business users to discover and share data stored across data silos.
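
As a simplified sketch, the following Python (boto3) snippet shows how object metadata could be read at ingestion time and a missing confidentiality attribute derived and persisted as S3 object tags; the bucket, key, and field names are hypothetical.

    import boto3

    s3 = boto3.client("s3")
    bucket = "enterprise-data-lake-raw"            # hypothetical raw zone bucket
    key = "transcripts/2024/claims-call-0001.txt"  # hypothetical object key

    # Read the system and user-defined metadata captured at ingestion
    head = s3.head_object(Bucket=bucket, Key=key)
    user_metadata = head.get("Metadata", {})

    derived = {
        "owner": user_metadata.get("owner", "unknown"),
        "created": head["LastModified"].isoformat(),
        # Derive a confidentiality level when the source system did not tag one
        "confidentiality": user_metadata.get("confidentiality", "internal"),
    }

    # Persist the derived attributes as object tags for downstream cataloging and access policies
    s3.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": [{"Key": k, "Value": v} for k, v in derived.items()]},
    )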

Data privacy

Enterprise knowledge sources often contain PII and other sensitive data (such as addresses and Social Security numbers). Based on your data privacy policies, these elements need to be treated (masked, tokenized, or redacted) from the sources before they can be used for downstream use cases. From the raw zone in Amazon S3, the objects need to be processed before they can be consumed by downstream generative AI models. A key requirement here is PII identification and redaction, which you can implement with Amazon Comprehend. It’s important to remember that it will not always be feasible to strip away all the sensitive data without impacting the context of the data. Semantic context is one of the key factors that drive the accuracy and relevance of generative AI model outputs, and it’s critical to work backward from the use case and strike the necessary balance between privacy controls and model performance.
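
As an illustration, the following Python (boto3) sketch uses Amazon Comprehend to detect PII entities in a short transcript and redact them by entity type; the sample text is made up, and production pipelines would more likely use asynchronous PII redaction jobs for large document sets.

    import boto3

    comprehend = boto3.client("comprehend")
    text = "My name is Jane Doe and my Social Security number is 123-45-6789."

    response = comprehend.detect_pii_entities(Text=text, LanguageCode="en")

    # Redact from the end of the string so earlier character offsets remain valid
    redacted = text
    for entity in sorted(response["Entities"], key=lambda e: e["BeginOffset"], reverse=True):
        redacted = (
            redacted[: entity["BeginOffset"]]
            + f"[{entity['Type']}]"
            + redacted[entity["EndOffset"]:]
        )

    print(redacted)  # for example: "My name is [NAME] and my Social Security number is [SSN]."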

Data enrichment

Additional metadata may also need to be extracted from the objects. Amazon Comprehend provides capabilities for entity recognition (for example, identifying domain-specific data like policy numbers and claim numbers) and custom classification (for example, categorizing a customer care chat transcript based on the issue description). Furthermore, you may need to combine the unstructured and structured data to create a holistic picture of key entities, like customers. For example, in an airline loyalty scenario, there would be significant value in linking unstructured data captured from customer interactions (such as customer chat transcripts and customer reviews) with structured data signals (such as ticket purchases and miles redemption) to create a more complete customer profile that can then enable the delivery of better and more relevant trip recommendations. AWS Entity Resolution is an ML service that helps in matching and linking records. This service helps link related sets of information to create deeper, more connected data about key entities like customers, products, and so on, which can further improve the quality and relevance of LLM outputs. After these transformations, the enriched data is available in the transformed zone in Amazon S3 and can be made available in the curated zone in Amazon S3, ready to be consumed downstream for vector stores, fine-tuning, or training of LLMs.
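
A hedged sketch of the enrichment step in Python (boto3) follows: built-in entity detection plus classification against a custom Amazon Comprehend endpoint. The transcript text and endpoint ARN are placeholders, and a custom entity recognizer would be needed for truly domain-specific fields like policy numbers.

    import boto3

    comprehend = boto3.client("comprehend")
    transcript = "Customer called about claim CLM-20483 on policy POL-99122 regarding a delayed payout."

    # Built-in entity recognition (dates, quantities, organizations, and so on)
    entities = comprehend.detect_entities(Text=transcript, LanguageCode="en")

    # Custom classification against a previously trained endpoint (placeholder ARN)
    classification = comprehend.classify_document(
        Text=transcript,
        EndpointArn="arn:aws:comprehend:us-east-1:111122223333:document-classifier-endpoint/issue-categories",
    )

    # For a multi-class classifier, the predicted categories are returned under "Classes"
    issue_labels = [c["Name"] for c in classification["Classes"]]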

Data quality

Realizing the full potential of generative AI depends on the quality of the data used to train the models as well as the data used to augment and enhance the model response to a user input. The accuracy, bias, and reliability of the models and their outcomes are directly proportional to the quality of the data used to build and train them.

Amazon SageMaker Model Monitor provides proactive detection of model data quality drift and model quality metrics drift. It also monitors bias drift in your model’s predictions and feature attribution drift. For more details, refer to Monitoring in-production ML models at large scale using Amazon SageMaker Model Monitor. Detecting bias in your model is a fundamental building block of responsible AI, and Amazon SageMaker Clarify helps detect potential bias that can produce a negative or a less accurate result. To learn more, see Learn how Amazon SageMaker Clarify helps detect bias.

A newer area of focus in generative AI is the use and quality of data in prompts from enterprise and proprietary data stores. An emerging best practice to consider here is shift-left, which puts a strong emphasis on early and proactive quality assurance mechanisms. In the context of data pipelines designed to process data for generative AI applications, this implies identifying and resolving data quality issues earlier upstream to mitigate the potential impact of data quality issues later. AWS Glue Data Quality not only measures and monitors the quality of your data at rest in your data lakes, data warehouses, and transactional databases, but also allows early detection and correction of quality issues for your extract, transform, and load (ETL) pipelines to ensure your data meets the quality standards before it is consumed. For more details, refer to Getting started with AWS Glue Data Quality from the AWS Glue Data Catalog.
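
As a sketch of the shift-left idea, the following Python (boto3) snippet registers an AWS Glue Data Quality ruleset against a Data Catalog table so quality checks can run before the data is consumed; the database, table, and rules are illustrative assumptions.

    import boto3

    glue = boto3.client("glue")

    # The ruleset is expressed in Glue's Data Quality Definition Language (DQDL)
    glue.create_data_quality_ruleset(
        Name="customer-transcripts-quality",
        TargetTable={
            "DatabaseName": "enterprise_curated",   # illustrative database
            "TableName": "customer_transcripts",    # illustrative table
        },
        Ruleset=(
            'Rules = ['
            ' IsComplete "document_id",'
            ' ColumnValues "confidentiality" in ["public", "internal", "restricted"],'
            ' Completeness "transcript_text" > 0.95'
            ' ]'
        ),
    )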

Vector store governance

Embeddings in vector databases elevate the intelligence and capabilities of generative AI applications by enabling features such as semantic search and reducing hallucinations. Embeddings typically contain private and sensitive data, and encrypting the data is a recommended step in the user input workflow. Amazon OpenSearch Serverless stores and searches your vector embeddings, and encrypts your data at rest with AWS Key Management Service (AWS KMS). For more details, see Introducing the vector engine for Amazon OpenSearch Serverless, now in preview. Similarly, additional vector engine options on AWS, including Amazon Kendra and Amazon Aurora, encrypt your data at rest with AWS KMS. For more information, refer to Encryption at rest and Protecting data using encryption.

As embeddings are generated and stored in a vector store, controlling access to the data with role-based access control (RBAC) becomes a key requirement to maintaining overall security. Amazon OpenSearch Service provides fine-grained access controls (FGAC) features with AWS Identity and Access Management (IAM) rules that can be associated with Amazon Cognito users. Corresponding user access control mechanisms are also provided by OpenSearch Serverless, Amazon Kendra, and Aurora. To learn more, refer to Data access control for Amazon OpenSearch Serverless, Controlling user access to documents with tokens, and Identity and access management for Amazon Aurora, respectively.
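
To make the encryption and access-control points concrete, here is a hedged Python (boto3) sketch for an Amazon OpenSearch Serverless vector collection. The collection name, KMS key ARN, and role ARN are placeholders, and the exact policy document fields should be checked against the current API documentation.

    import json
    import boto3

    aoss = boto3.client("opensearchserverless")

    # Encrypt the collection that stores vector embeddings with a customer managed KMS key
    aoss.create_security_policy(
        name="rag-embeddings-encryption",
        type="encryption",
        policy=json.dumps({
            "Rules": [{"ResourceType": "collection", "Resource": ["collection/rag-embeddings"]}],
            "KmsARN": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
        }),
    )

    # Grant a retrieval role read-only access to the collection's indexes
    aoss.create_access_policy(
        name="rag-embeddings-readers",
        type="data",
        policy=json.dumps([{
            "Rules": [{
                "ResourceType": "index",
                "Resource": ["index/rag-embeddings/*"],
                "Permission": ["aoss:ReadDocument", "aoss:DescribeIndex"],
            }],
            "Principal": ["arn:aws:iam::111122223333:role/rag-retrieval-role"],
        }]),
    )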

User request-response workflows

Controls in the data governance plane need to be integrated into the generative AI application as part of the overall solution deployment to ensure compliance with data security (based on role-based access controls) and data privacy (based on role-based access to sensitive data) policies. The following figure illustrates the workflow for applying data governance.

Data governance in user prompt workflow

The workflow includes the following key data governance steps:

  1. Validate the input prompt for alignment with compliance policies (for example, bias and toxicity).
  2. Generate a query by mapping prompt keywords to the data catalog.
  3. Apply FGAC policies based on user role.
  4. Apply RBAC policies based on user role.
  5. Apply data and content redaction to the response based on user role permissions and compliance policies.

As part of the prompt cycle, the user prompt must be parsed and keywords extracted to ensure alignment with compliance policies using a service like Amazon Comprehend (see New for Amazon Comprehend – Toxicity Detection) or Guardrails for Amazon Bedrock (preview). When that is validated, if the prompt requires structured data to be extracted, the keywords can be used against the data catalog (business or technical) to extract the relevant data tables and fields and construct a query from the data warehouse. The user permissions are evaluated using AWS Lake Formation to filter the relevant data. In the case of unstructured data, the search results are restricted based on the user permission policies implemented in the vector store. As a final step, the output response from the LLM needs to be evaluated against user permissions (to ensure data privacy and security) and compliance with safety (for example, bias and toxicity guidelines).
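
The following Python (boto3) sketch illustrates the prompt-validation step with Amazon Comprehend toxicity detection; the threshold is an assumption to be tuned against your compliance policy, and Guardrails for Amazon Bedrock can serve the same purpose.

    import boto3

    comprehend = boto3.client("comprehend")
    prompt = "How do I check the status of my insurance claim?"

    result = comprehend.detect_toxic_content(
        TextSegments=[{"Text": prompt}],
        LanguageCode="en",
    )

    toxicity_score = result["ResultList"][0]["Toxicity"]
    if toxicity_score > 0.5:  # threshold is an assumption; tune per compliance policy
        raise ValueError("Prompt rejected by compliance policy")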

Although this process is described for a RAG implementation, it is also applicable to other LLM implementation strategies, with the following additional controls:

  • Prompt engineering – Access to the prompt templates that can be invoked needs to be restricted based on access controls augmented by business logic.
  • Fine-tuning models and training foundation models – In cases where objects from the curated zone in Amazon S3 are used as training data for fine-tuning the foundation models, the permissions policies need to be configured with Amazon S3 identity and access management at the bucket or object level based on the requirements.

Summary

Data governance is critical to enabling organizations to build enterprise generative AI applications. As enterprise use cases continue to evolve, there will be a need to expand the data infrastructure to govern and manage new, diverse, unstructured datasets to ensure alignment with privacy, security, and quality policies. These policies need to be implemented and managed as part of data ingestion, storage, and management of the enterprise knowledge base along with the user interaction workflows. This makes sure that the generative AI applications not only minimize the risk of sharing inaccurate or wrong information, but also protect from bias and toxicity that can lead to harmful or libelous outcomes. To learn more about data governance on AWS, see What is Data Governance?

In subsequent posts, we will provide implementation guidance on how to expand the governance of the data infrastructure to support generative AI use cases.


About the Authors

Krishna Rupanagunta leads a team of Data and AI Specialists at AWS. He and his team work with customers to help them innovate faster and make better decisions using Data, Analytics, and AI/ML. He can be reached via LinkedIn.

Imtiaz (Taz) Sayed is the WW Tech Leader for Analytics at AWS. He enjoys engaging with the community on all things data and analytics. He can be reached via LinkedIn.

Raghvender Arni (Arni) leads the Customer Acceleration Team (CAT) within AWS Industries. The CAT is a global cross-functional team of customer facing cloud architects, software engineers, data scientists, and AI/ML experts and designers that drives innovation via advanced prototyping, and drives cloud operational excellence via specialized technical expertise.

Deploy CloudFormation Hooks to an Organization with service-managed StackSets

Post Syndicated from Kirankumar Chandrashekar original https://aws.amazon.com/blogs/devops/deploy-cloudformation-hooks-to-an-organization-with-service-managed-stacksets/

This post demonstrates using AWS CloudFormation StackSets to deploy CloudFormation Hooks from a centralized delegated administrator account to all accounts within an Organizational Unit (OU). It provides step-by-step guidance to deploy controls at scale to your AWS Organization as Hooks using StackSets. By following this post, you will learn how to deploy a hook to hundreds of AWS accounts in minutes.

AWS CloudFormation StackSets help deploy CloudFormation stacks to multiple accounts and regions with a single operation. Using service-managed permissions, StackSets automatically generate the IAM roles required to deploy stack instances, eliminating the need for manual creation in each target account prior to deployment. StackSets provide auto-deploy capabilities to deploy stacks to new accounts as they’re added to an Organizational Unit (OU) in AWS Organizations. With StackSets, you can deploy AWS well-architected multi-account solutions organization-wide in a single click and target stacks to selected accounts in OUs. You can also leverage StackSets to auto-deploy foundational stacks like networking, policies, security, monitoring, disaster recovery, billing, and analytics to new accounts. This ensures consistent security and governance reflecting AWS best practices.

AWS CloudFormation Hooks allow customers to invoke custom logic to validate resource configurations before a CloudFormation stack create/update/delete operation. This helps enforce infrastructure-as-code policies by preventing non-compliant resources. Hooks enable policy-as-code to support consistency and compliance at scale. Without hooks, controlling CloudFormation stack operations centrally across accounts is more challenging because governance checks and enforcement have to be implemented through disjointed workarounds across disparate services after the resources are deployed. Other options like Config rules evaluate resource configurations on a timed basis rather than on stack operations. And SCPs manage account permissions but don’t include custom logic tailored to granular resource configurations. In contrast, CloudFormation Hooks allow customer-defined automation to validate each resource as new stacks are deployed or existing ones updated. This enables stronger compliance guarantees and rapid feedback compared to asynchronous or indirect policy enforcement via other mechanisms.

Follow the later sections of this post, which provide a step-by-step implementation for deploying hooks across accounts in an organizational unit (OU) with a StackSet, including:

  1. Configure service-managed permissions to automatically create IAM roles
  2. Create the StackSet in the delegated administrator account
  3. Target the OU to distribute hook stacks to member accounts

This shows how to easily enable a policy-as-code framework organization-wide.

I will show you how to register a custom CloudFormation hook as a private extension, restricting permissions and usage to internal administrators and automation. Registering the hook as a private extension limits discoverability and access. Only approved accounts and roles within the organization can invoke the hook, following security best practices of least privilege.

StackSets Architecture

As depicted in the following AWS StackSets architecture diagram, a dedicated Delegated Administrator Account handles creation, configuration, and management of the StackSet that defines the template for standardized provisioning. In addition, these centrally managed StackSets deploy a private CloudFormation hook into all member accounts that belong to the given Organizational Unit. Registering this as a private CloudFormation hook enables administrative control over the deployment lifecycle events it can respond to. Private hooks prevent public usage, ensuring the hook can only be invoked by approved accounts, roles, or resources inside your organization.

Architecture for deploying CloudFormation Hooks to accounts in an Organization

Diagram 1: StackSets Delegated Administration and Member Account Diagram

In the above architecture, member accounts join the StackSet through their inclusion in a central Organizational Unit. By joining, these accounts receive deployed instances of the StackSet template, which provisions resources consistently across accounts, including the controlled private hook for administrative visibility and control.

The delegation of StackSet administration responsibilities to the Delegated Admin Account follows security best practices. Rather than having the sensitive central Management Account handle deployment logistics, delegation isolates these controls to an admin account with purpose-built permissions. The Management Account representing the overall AWS Organization focuses more on high-level compliance governance and organizational oversight. The Delegated Admin Account translates broader guardrails and policies into specific infrastructure automation leveraging StackSets capabilities. This separation of duties ensures administrative privileges are restricted through delegation while also enabling an organization-wide StackSet solution deployment at scale.

Centralized StackSets facilitate account governance through code-based infrastructure management rather than manual account-by-account changes. In summary, the combination of account delegation roles, StackSet administration, and joining through Organization Units creates an architecture to allow governed, infrastructure-as-code deployments across any number of accounts in an AWS Organization.

Sample Hook Development and Deployment

In this section, we will develop a hook on a workstation using the AWS CloudFormation CLI, package it, and upload it to the hooks package S3 bucket. Then we will deploy a CloudFormation stack that in turn deploys the hook across member accounts within an Organizational Unit (OU) using StackSets.

The sample hook used in this blog post enforces that server-side encryption must be enabled for any S3 buckets and SQS queues created or updated on a CloudFormation stack. This policy requires that all S3 buckets and SQS queues be configured with server-side encryption when provisioned, ensuring security is built into our infrastructure by default. By enforcing encryption at the CloudFormation level, we prevent data from being stored unencrypted and minimize risk of exposure. Rather than manually enabling encryption post-resource creation, our developers simply enable it as a basic CloudFormation parameter. Adding this check directly into provisioning stacks leads to a stronger security posture across environments and applications. This example hook demonstrates functionality for mandating security best practices on infrastructure-as-code deployments.
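
As a rough illustration of the check this hook performs (not the actual handler code, which is generated with the CloudFormation CLI Python plugin), the following Python sketch validates the BucketEncryption properties of an S3 bucket resource:

    def validate_s3_encryption(resource_properties: dict) -> tuple[bool, str]:
        """Return (compliant, reason) for an AWS::S3::Bucket resource's properties."""
        encryption = resource_properties.get("BucketEncryption", {})
        rules = encryption.get("ServerSideEncryptionConfiguration", [])
        for rule in rules:
            algorithm = rule.get("ServerSideEncryptionByDefault", {}).get("SSEAlgorithm")
            if algorithm in ("aws:kms", "AES256"):
                return True, "Server-side encryption is enabled"
        return False, "S3 bucket must have server-side encryption enabled"

    # The non-compliant bucket from the test template later in this post fails the check
    print(validate_s3_encryption({}))  # (False, "S3 bucket must have server-side encryption enabled")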

Prerequisites

On the AWS Organization:

On the workstation where the hooks will be developed:

In the Delegated Administrator account:

Create a hooks package S3 bucket within the delegated administrator account. Upload the hooks package and CloudFormation templates that StackSets will deploy. Ensure the S3 bucket policy allows access from the AWS accounts within the OU. This access lets AWS CloudFormation access the hooks package objects and CloudFormation template objects in the S3 bucket from the member accounts during stack deployment.

Follow these steps to deploy a CloudFormation template that sets up the S3 bucket and permissions:

  1. Click here to download the admin-cfn-hook-deployment-s3-bucket.yaml template file into your local workstation.
    Note: Make sure you model the S3 bucket and IAM policies with as much least privilege as possible. For the above S3 bucket policy, you can add a list of IAM Role ARNs created by the StackSets service-managed permissions instead of AWS: “*”, which allows S3 bucket access to all IAM entities from the accounts in the OU. The ARN of this role will be “arn:aws:iam:::role/stacksets-exec-” in every member account within the OU. For more information about applying least privilege access to IAM policies and S3 bucket policies, refer to the IAM Policies and Bucket Policies and ACLs! Oh, My! (Controlling Access to S3 Resources) blog post.
  2. Execute the following command to deploy the template admin-cfn-hook-deployment-s3-bucket.yaml using AWS CLI. For more information see Creating a stack using the AWS Command Line Interface. If using AWS CloudFormation console, see Creating a stack on the AWS CloudFormation console.
    To get the OU Id, see Viewing the details of an OU. OU Id starts with “ou-”. To get the Organization Id, see Viewing details about your organization. Organization Id starts with “o-”.

    aws cloudformation create-stack \
    --stack-name hooks-asset-stack \
    --template-body file://admin-cfn-hook-deployment-s3-bucket.yaml \
    --parameters ParameterKey=OrgId,ParameterValue="<Org_id>" \
    ParameterKey=OUId,ParameterValue="<OU_id>"
  3. After deploying the stack, note down the AWS S3 bucket name from the CloudFormation Outputs.

Hook Development

In this section, you will develop a sample CloudFormation hook package that will enforce encryption for S3 buckets and SQS queues within the preCreate and preDelete hook. Follow the steps in the walkthrough to develop a sample hook and generate a zip package for deploying and enabling it in all the accounts within an OU. While following the walkthrough, within the Registering hooks section, make sure that you stop right after executing the cfn submit --dry-run command. The --dry-run option makes sure that your hook is built and packaged without registering it with CloudFormation in your account. If, while initiating the hook project, you created a new directory with the name mycompany-testing-mytesthook, the hook package will be generated as a zip file with the name mycompany-testing-mytesthook.zip at the root of your hooks project.

Upload mycompany-testing-mytesthook.zip file to the hooks package S3 bucket within the Delegated Administrator account. The packaged zip file can then be distributed to enable the encryption hooks across all accounts in the target OU.

Note: If you are using your own hooks project rather than following the tutorial, you should still make sure that you execute the cfn submit command with the --dry-run option. This ensures you have a hooks package that can be distributed and reused across multiple accounts.

Hook Deployment using CloudFormation Stack Sets

In this section, deploy the sample hook developed previously across all accounts within an OU. Use a centralized CloudFormation stack deployed from the delegated administrator account via StackSets.

Deploying hooks via CloudFormation requires these key resources:

  1. AWS::CloudFormation::HookVersion: Publishes a new hook version to the CloudFormation registry
  2. AWS::CloudFormation::HookDefaultVersion: Specifies the default hook version for the AWS account and region
  3. AWS::CloudFormation::HookTypeConfig: Defines the hook configuration
  4. AWS::IAM::Role #1: Task execution role that grants the hook permissions
  5. AWS::IAM::Role #2: (Optional) role for CloudWatch logging that CloudFormation will assume to send log entries during hook execution
  6. AWS::Logs::LogGroup: (Optional) Enables CloudWatch error logging for hook executions

Follow these steps to deploy CloudFormation Hooks to accounts within the OU using StackSets:

  1. Click here to download the hooks-template.yaml template file into your local workstation and upload it into the Hooks package S3 bucket in the Delegated Administrator account.
  2. Deploy the hooks CloudFormation template hooks-template.yaml to all accounts within an OU using StackSets. Leverage service-managed permissions for automatic IAM role creation across the OU.
    To deploy the hooks template hooks-template.yaml across the OU using StackSets, click here to download the CloudFormation StackSets template hooks-stack-sets-template.yaml locally, and upload it to the hooks package S3 bucket in the delegated administrator account. This StackSets template contains an AWS::CloudFormation::StackSet resource that will deploy the necessary hooks resources from hooks-template.yaml to all accounts in the target OU. Using the SERVICE_MANAGED permissions model automatically handles provisioning the required IAM execution roles per account within the OU.
  3. Execute the following command to deploy the template hooks-stack-sets-template.yaml using AWS CLI. For more information see Creating a stack using the AWS Command Line Interface. If using AWS CloudFormation console, see Creating a stack on the AWS CloudFormation console. To get the S3 HTTPS URL for the hooks template, hooks package, and StackSets template, log in to the Amazon S3 console, select the respective object, and choose Copy URL, as shown in the following screenshot:
    Diagram 2: S3 HTTPS URL

    To get the OU Id, see Viewing the details of an OU. OU Id starts with “ou-“.
    Make sure to replace the <S3BucketName> and then <OU_Id> accordingly in the following command:

    aws cloudformation create-stack --stack-name hooks-stack-set-stack \
    --template-url https://<S3BucketName>.s3.us-west-2.amazonaws.com/hooks-stack-sets-template.yaml \
    --parameters ParameterKey=OuId,ParameterValue="<OU_Id>" \
    ParameterKey=HookTypeName,ParameterValue="MyCompany::Testing::MyTestHook" \
    ParameterKey=s3TemplateURL,ParameterValue="https://<S3BucketName>.s3.us-west-2.amazonaws.com/hooks-template.yaml" \
    ParameterKey=SchemaHandlerPackageS3URL,ParameterValue="https://<S3BucketName>.s3.us-west-2.amazonaws.com/mycompany-testing-mytesthook.zip"
  4. Check the progress of the stack deployment using the aws cloudformation describe-stacks command. Move to the next section when the stack status is CREATE_COMPLETE.
    aws cloudformation describe-stacks --stack-name hooks-stack-set-stack
  5. If you navigate to the AWS CloudFormation Service’s StackSets section in the console, you can view the stack instances deployed to the accounts within the OU. Alternatively, you can execute the AWS CloudFormation list-stack-instances CLI command below to list the deployed stack instances:
    aws cloudformation list-stack-instances --stack-set-name MyTestHookStackSet

Testing the deployed hook

Deploy the following sample templates into any AWS account that is within the OU where the hook was deployed and activated. Follow the steps in Creating a stack on the AWS CloudFormation console. If using the AWS CloudFormation CLI, follow the steps in Creating a stack using the AWS Command Line Interface.

  1. Provision a non-compliant stack without server-side encryption using the following template:
    AWSTemplateFormatVersion: 2010-09-09
    Description: |
      This CloudFormation template provisions an S3 Bucket
    Resources:
      S3Bucket:
        Type: 'AWS::S3::Bucket'
        Properties: {}

    The stack deployment will not succeed and will return the following error message:

    The following hook(s) failed: [MyCompany::Testing::MyTestHook], with the hook status reason shown in the following screenshot:

    Diagram 3: S3 Bucket creation failure with hooks execution

  2. Provision a stack using the following template that has server-side encryption for the S3 Bucket.
    AWSTemplateFormatVersion: 2010-09-09
    Description: |
      This CloudFormation template provisions an encrypted S3 Bucket. **WARNING** This template creates an Amazon S3 bucket and a KMS key that you will be charged for. You will be billed for the AWS resources used if you create a stack from this template.
    Resources:
      EncryptedS3Bucket:
        Type: "AWS::S3::Bucket"
        Properties:
          BucketName: !Sub "encryptedbucket-${AWS::Region}-${AWS::AccountId}"
          BucketEncryption:
            ServerSideEncryptionConfiguration:
              - ServerSideEncryptionByDefault:
                  SSEAlgorithm: "aws:kms"
                  KMSMasterKeyID: !Ref EncryptionKey
                BucketKeyEnabled: true
      EncryptionKey:
        Type: "AWS::KMS::Key"
        DeletionPolicy: Retain
        UpdateReplacePolicy: Retain
        Properties:
          Description: KMS key used to encrypt the resource type artifacts
          EnableKeyRotation: true
          KeyPolicy:
            Version: 2012-10-17
            Statement:
              - Sid: Enable full access for owning account
                Effect: Allow
                Principal:
                  AWS: !Ref "AWS::AccountId"
                Action: "kms:*"
                Resource: "*"
    Outputs:
      EncryptedBucketName:
        Value: !Ref EncryptedS3Bucket

    The deployment will succeed as it will pass the hook validation with the following hook status reason as shown in the following screenshot:

    Diagram 4: S3 Bucket creation success with hooks execution

Updating the hooks package

To update the hooks package, follow the same steps described in the Hook Development section to change the hook code accordingly. Then, execute the cfn submit --dry-run command to build and generate the hooks package file without registering the type with the CloudFormation registry. Make sure to rename the zip file with a unique name compared to what was previously used. Otherwise, when updating the CloudFormation StackSets stack, CloudFormation will not see any changes in the template and thus will not deploy updates. The best practice is to use a CI/CD pipeline to manage the hook package. Typically, it is good to assign unique version numbers to the hooks packages so that CloudFormation stacks with the new changes get deployed.

Cleanup

Navigate to the AWS CloudFormation console on the Delegated Administrator account, and note down the Hooks package S3 bucket name and empty its contents. Refer to Emptying the Bucket for more information.

Delete the CloudFormation stacks in the following order:

  1. Test stack that failed
  2. Test stack that passed
  3. StackSets CloudFormation stack. This stack has a DeletionPolicy set to Retain; update the stack by removing the DeletionPolicy and then initiate a stack deletion via CloudFormation, or delete the StackSet instances and the StackSet from the console or CLI by following: 1. Delete stack instances from your stack set 2. Delete a stack set
  4. Hooks asset CloudFormation stack

Refer to the following documentation to delete CloudFormation Stacks: Deleting a stack on the AWS CloudFormation console or Deleting a stack using AWS CLI.

Conclusion

Throughout this blog post, you have explored how AWS StackSets enable the scalable and centralized deployment of CloudFormation hooks across all accounts within an Organization Unit. By implementing hooks as reusable code templates, StackSets provide consistency benefits and slash the administrative labor associated with fragmented and manual installs. As organizations aim to fortify governance, compliance, and security through hooks, StackSets offer a turnkey mechanism to efficiently reach hundreds of accounts. By leveraging the described architecture of delegated StackSet administration and member account joining, organizations can implement a single hook across hundreds of accounts rather than manually enabling hooks per account. Centralizing your hook code-base within StackSets templates facilitates uniform adoption while also simplifying maintenance. Administrators can update hooks in one location instead of attempting fragmented, account-by-account changes. By enclosing new hooks within reusable StackSets templates, administrators benefit from infrastructure-as-code descriptiveness and version control instead of one-off scripts. Once configured, StackSets provide automated hook propagation without overhead. The delegated administrator merely needs to include target accounts through their Organization Unit alignment rather than handling individual permissions. New accounts added to the OU automatically receive hook deployments through the StackSet orchestration engine.

About the Author

kirankumar.jpeg

Kirankumar Chandrashekar is a Sr. Solutions Architect for Strategic Accounts at AWS. He focuses on leading customers in architecting DevOps, modernization using serverless, containers and container orchestration technologies like Docker, ECS, EKS to name a few. Kirankumar is passionate about DevOps, Infrastructure as Code, modernization and solving complex customer issues. He enjoys music, as well as cooking and traveling.

How organizations are modernizing for cloud operations

Post Syndicated from Adam Keller original https://aws.amazon.com/blogs/devops/how-organizations-are-modernizing-for-cloud-operations/

Over the past decade, we’ve seen a rapid evolution in how IT operations teams and application developers work together. In the early days, there was a clear division of responsibilities between the two teams, with one team focused on providing and maintaining the servers and various components (e.g., storage, DNS, networking) for the application to run, while the other primarily focused on developing the application’s features, fixing bugs, and packaging up their artifacts for the operations team to deploy. Ultimately, this division led to a siloed approach that presented glaring challenges. These silos hindered communication between the teams, which would often result in developers being ready to ship code and passing it over to the operations teams with little to no prior collaboration. In turn, operations teams were often left scrambling trying to deliver on the requirements at the last minute. This would lead to bottlenecks in software delivery, delaying features and bug fixes from being shipped. Aside from software delivery, operations teams were primarily responsible for handling on-call duties, which encompassed addressing issues arising from both applications and infrastructure. Consequently, when incidents occurred, the operations teams were the ones receiving alerts, irrespective of the source of the problem. This raised the question: what motivates the software developers to create resilient and dependable software? Terms such as “throw it over the wall” and “it works on my laptop” were coined because of this and are still commonly referenced in discussion today.

The DevOps movement emerged in response to these challenges, aiming to build a bridge between developers and operations teams. DevOps focuses on collaboration between the two teams through communication and integration by fostering a culture of shared responsibility. This approach promotes automating infrastructure and application code, leveraging continuous integration (CI) and continuous delivery (CD), microservices architectures, and visibility through monitoring, logging, and tracing. The end result of operating in a DevOps model is quicker and more reliable release cycles. While the ideology is well-intentioned, implementing a DevOps practice is not easy, as organizations struggle to adapt and adhere to the cultural expectations. In addition, teams can struggle to find the right balance between speed and stability, which oftentimes results in reverting to old behaviors due to fear of downtime and instability of their environments. While DevOps is very focused on culture through collaboration and automation, not all developers want to be involved in operations and vice versa. This poses the question: how do organizations centralize a frictionless developer experience, with guardrails and best practices baked in, while providing a golden path for developers to self-serve? This is where platform engineering comes in.

Platform engineering has emerged as a critical discipline for organizations, driving the next evolution of infrastructure and operations while simultaneously empowering developers to create and deliver robust, scalable applications. It aims to improve developer experience by providing self-service mechanisms that offer some level of abstraction for provisioning resources, with good practices baked in. This builds on top of DevOps practices by enabling the developer to have full control of their resources through self-service, without having to throw it over the wall. There are various ways that platform engineering teams implement these self-service interfaces, from leveraging a GitOps-focused strategy to building Internal Developer Platforms with a UI and/or API. With the increasing demand for faster and more agile development, many organizations are adopting this model to streamline their operations, gain visibility, reduce costs, and lower the friction of onboarding new applications.

In this blog post, we will explore the common operational models used within organizations today, where platform engineering fits within these models, the common patterns used to build and develop these self-service platforms, and what lies ahead for this emerging field.

Operational Models

It’s important for us to start by understanding how we see technology teams operate today and the various ways they support development teams from instantiating infrastructure to defining pipelines and deploying application code. In the below diagram we highlight the four common operational models and will discuss each to understand the benefits and challenges they bring. This is also critical in understanding where platform teams fit, and where they don’t.

This image shows a sliding scale of the various provisioning models. For each model it shows the interaction between developers and the platform team.

Centralized Provisioning

In a centralized provisioning model, the responsibility for architecting, deploying, and managing infrastructure falls primarily on a centralized team. Organizations assign enforcement of controls to specific roles with narrow scope, including release management, process management, and segmentation of siloed teams (networking, compute, pipelines, etc.). The request model generally requires a ticket or request to be sent to the central or dedicated siloed team, the ticket enters a backlog, and the developers wait until resources can be provisioned on their behalf. In an ideal world, the central teams can quickly provision the resources and pipelines to get the developers up and running; but, in reality, these teams are busy and have to prioritize accordingly, which oftentimes leaves development teams waiting or having to predict what they need well in advance.

While this model provides central control over resource provisioning, it introduces bottlenecks into the delivery process and generally results in slower deployment cycles and feedback loops. This model becomes especially challenging when supporting a large number of development teams with varying requirements and use cases. Ultimately, this model can lead to frustration and friction between teams, which is why organizations eventually look to move away from operating in it. This leads us to the next model: the Platform-enabled Golden Path.

Platform-enabled Golden Path

The platform-enabled golden path model is an approach that allows developers to have some form of customization while still maintaining consistency by following a set of standards. In this model, platform engineers clearly lay out “preferred” standards with sane defaults, guardrails, and good practices based on common architectures that development teams can use as-is. Sophisticated platform teams may implement their own customizations on top of this framework.

The platform engineering team is responsible for creating and updating the templates, with maintenance responsibilities typically being shared. This approach strikes a balance between consistency and flexibility, allowing for some customization while still maintaining standards. However, it can be challenging to maintain visibility across the organization, as development teams have more freedom to customize their infrastructure. This becomes especially challenging when platform teams want a change to propagate across resources deployed by the various development teams building on top of these patterns.

Embedded DevOps

Embedded DevOps is a model in which DevOps engineers are directly aligned with development teams to define, provision, and maintain their infrastructure. There are a couple of common patterns around how organizations use this model.

  • Floating model: A central DevOps team can leverage a floating model where a DevOps engineer will be directly embedded onto a development team early in the development process to help build out the required pipelines and infrastructure resources, and jump to another team once everything is up and running.
  • Permanent embedded model: Alternatively, a development team can have a permanent DevOps engineer on the team to help support early iterations as well as maintenance as the application evolves. The DevOps engineer is ideally there from the beginning of the project and continues to support and improve the infrastructure and automation based on feature requests and bug fixes.

A central platform and/or architecture team may define the acceptable configurations and resources, while DevOps engineers decide how to best use them to meet the needs of their development team. Individual teams are responsible for maintaining and updating the templates and pipelines. This model offers greater agility and flexibility, but also requires the funding to hire DevOps engineers per development team, which can become costly as development teams scale. When operating in this model, it’s important to maintain collaboration between members of the DevOps team to ensure that best practices can be shared.

Decentralized DevOps

Lastly, the decentralized DevOps model gives development teams full end-to-end ownership and responsibility for defining and managing their infrastructure and pipelines. A central team may be focused on building out guardrails and boundaries to ensure that they limit the blast radius within the boundaries. They can also create a process to ensure that infrastructure deployed meets company standards, while ensuring development teams are free to make design decisions and remain autonomous. This approach offers the greatest agility and flexibility, but also the highest risk of inconsistency, errors, and security vulnerabilities. Additionally, this model requires a cultural shift in the organization because the development teams now own the entire stack, which results in more responsibility. This model can be a deterrent to developers, especially if they are unfamiliar with building resources in the cloud and/or don’t want to do it.

Overall, each model has its strengths and weaknesses, and the purpose of this blog is to educate on the patterns that are emerging. Ultimately the right approach depends on the organization’s specific needs and goals as well as their willingness to shift culturally. Of the above patterns, the two that are emerging as the most common are Platform-enabled Golden Path and Decentralized DevOps. Furthermore, we’re seeing that more often than not platform teams are finding themselves going back and forth between the two patterns within the same organization. This is in part due to technology making infrastructure creation in the cloud more accessible through abstraction and automation (think of tools like the AWS Cloud Development Kit (CDK), AWS Serverless Application Model (SAM) CLI, AWS Copilot, Serverless framework, etc). Let’s now look at the technology patterns that are emerging to support these use cases.

Emerging patterns

Of the trends that are on the rise, Internal Developer Platforms and GitOps practices are becoming increasingly popular in the industry due to their ability to streamline the software development process and improve collaboration between development and platform teams. Internal Developer Platforms provide a centralized platform for developers to access resources and tools needed to build, test, deploy, and monitor applications and associated infrastructure resources. By providing a self-service interface with pre-approved patterns (via UI, API, or Git), internal developer platforms empower development teams to work independently and collaborate with one another more effectively. This reduces the burden on IT and operations teams while also increasing the agility and speed of development as developers aren’t required to wait in line to get resources provisioned. The paradigm shifts with Internal Developer Platforms because the platform teams are focused on building the blueprints and defining the standards for backend resources that development teams centrally consume via the provided interfaces. The platform team should view the internal developer platform as a product and look at developers as their customer.

While internal developer platforms provide a lot of value and abstraction through a UI and APIs, some organizations prefer to use Git as the center of deployment orchestration, and this is where leveraging GitOps can help. GitOps is a methodology that leverages Git as the source of orchestrating and managing the deployment of infrastructure and applications. With GitOps, infrastructure is defined declaratively as code, and changes are tracked in Git, allowing for a more standardized and automated deployment process. Using Git for deployment orchestration is not new, but there are some concepts with GitOps that take Git orchestration to a new level.

Let’s look at the principles of GitOps, as defined by OpenGitOps:

  • Declarative
    • A system managed by GitOps must have its desired state expressed declaratively.
  • Versioned and Immutable
    • Desired state is stored in a way that enforces immutability, versioning and retains a complete version history.
  • Pulled Automatically
    • Software agents automatically pull the desired state declarations from the source.
  • Continuously Reconciled
    • Software agents continuously observe actual system state and attempt to apply the desired state.

GitOps helps to reduce the risk of errors and improve consistency across the organization because all change is tracked centrally. Additionally, this provides developers with a familiar interface in Git as well as the ability to store the desired state of their infrastructure and applications in one place. Lastly, GitOps is focused on ensuring that the desired state in Git is always maintained, and if drift occurs, an external process will reconcile the state of the resources. GitOps was born in the Kubernetes ecosystem using tools like Flux and ArgoCD.

The final emerging trend to discuss is particularly relevant to teams functioning within a decentralized DevOps model, possessing end-to-end responsibility for the stack, encompassing infrastructure and application delivery. The amount of cognitive load required to connect the underlying cloud resources together while also being an expert in building out business logic for the application is extremely high, which is why teams look to harness the power of abstraction and automation for infrastructure provisioning. While this may appear analogous to previously mentioned practices, the key distinction lies in the utilization of tools specifically designed to enhance the developer experience. By abstracting various components (such as networking, identity, and stitching everything together), these tools eliminate the necessity for interaction with centralized teams, empowering developers to operate autonomously and assume complete ownership of the infrastructure. This trend is exemplified by the adoption of innovative tools such as AWS Application Composer, AWS CodeCatalyst, SAM CLI, AWS Copilot CLI, and the AWS Cloud Development Kit (CDK).

Looking ahead

If there is one thing we can ascertain, it’s that the journey to successful developer enablement is ongoing, and it’s clear that finding the balance of speed, security, and flexibility can be difficult to achieve. Throughout all of these evolutionary trends in technology, Git has remained the nucleus of infrastructure and application deployment automation. This is not new; however, the processes being built around Git, such as GitOps, are. The industry continues to gravitate towards this model, and at AWS we are looking at ways to enable builders to leverage Git as the source of truth with simple integrations. For example, AWS Proton has built integrations with Git for central template storage with a feature called template sync and recently released a feature called service sync, which allows developers to configure and deploy their Proton services using Git. These features empower the platform team and developers to seamlessly store their templates and desired infrastructure resource states within Git, requiring no additional effort beyond the initial setup.

We also see that interest in building internal developer platforms is on a sharp incline, and it’s still in the early days. With tools like AWS Proton, AWS Service Catalog, Backstage, and other SaaS providers, platform teams are able to define patterns centrally for developers to self-serve via a library or “shopping cart”. As mentioned earlier, it’s vital that the teams building out the internal developer platforms think of ways to enable the developer to deploy supplemental resources that aren’t defined in the central templates. While the developer platform can solve the majority of the use cases, it’s nearly impossible to solve them all. If you can’t enable developers to deploy resources on top of their platform-deployed services, you’ll find that you’re back to the original problem statement outlined at the beginning of this blog, which can ultimately result in a failed implementation. AWS Proton solves this through a feature we call components, which enables developers to bring their own IaC templates to deploy on top of their services deployed through Proton.

The rising popularity of the aforementioned patterns reveals an unmet need for developers who seek to tailor their cloud resources according to the specific requirements of their applications and the demands of platform/central teams that require governance. This is particularly prevalent in serverless workloads, where developers often integrate their application and infrastructure code, utilizing services such as AWS Step Functions to transfer varying degrees of logic from the application layer to the managed service itself. Centralizing these resources becomes increasingly challenging due to their dynamic nature, which adapts to the evolving requirements of business logic. Consequently, it is nearly impossible to consolidate these patterns into a universally applicable blueprint for reuse across diverse business scenarios.

As the distinction between cloud resources and application code becomes increasingly blurred, developers are compelled to employ tools that streamline the underlying logic, enabling them to achieve their desired outcomes swiftly and securely. In this context, it is crucial for platform teams to identify and incorporate these tools, ensuring that organizational safeguards and expectations are upheld. By doing so, they can effectively bridge the gap between developers’ preferences and the essential governance required by the platform or central team.

Wrapping up

We’ve explored the various operating models and emerging trends designed to facilitate these models. Platform Engineering represents the ongoing evolution of DevOps, aiming to enhance the developer experience for rapid and secure deployments. It is crucial to recognize that developers possess varying skill sets and preferences, even within the same organization. As previously discussed, some developers prefer complete ownership of the entire stack, while others concentrate solely on writing code without concerning themselves with infrastructure. Consequently, the platform engineering practice must continuously adapt to accommodate these patterns in a manner that fosters enablement rather than posing as obstacles. To achieve this, the platform must be treated as a product, with developers as its customers, ensuring that their needs and preferences are prioritized and addressed effectively.

To determine where your organization fits within the discussed operational models, we encourage you to initiate a self-assessment and have internal discussions. Evaluate your current infrastructure provisioning, deployment processes, and development team support. Consider the benefits and challenges of each model and how they align with your organization’s specific needs, goals, and cultural willingness to shift.

To facilitate this process, gather key stakeholders from various teams, including leadership, platform engineering, development, and DevOps, for a collaborative workshop. During this workshop, review the four operational models (Centralized Provisioning, Platform-enabled Golden Path, Embedded DevOps, and Decentralized DevOps) and discuss the following:

  • How closely does each model align with your current organizational structure and processes?
  • What are the potential benefits and challenges of adopting or transitioning to each model within your organization?
  • What challenges are you currently facing with the model that you operate under?
  • How can technology be leveraged to optimize infrastructure creation and deployment automation?

By conducting this self-assessment and engaging in open dialogue, your organization can identify the most suitable operational model and develop a strategic plan to optimize collaboration, efficiency, and agility within your technology teams. If a more guided approach is preferred, reach out to our solutions architects and/or AWS partners to assist.

Adam Keller

Adam is a Senior Developer Advocate @ AWS working on all things related to IaC, Platform Engineering, DevOps, and modernization. Reach out to him on Twitter @realadamjkeller.

New for AWS Control Tower – Comprehensive Controls Management (Preview)

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-aws-control-tower-comprehensive-controls-management-preview/

Today, customers in regulated industries face the challenge of defining and enforcing controls needed to meet compliance and security requirements while empowering engineers to make their design choices. In addition to addressing risk, reliability, performance, and resiliency requirements, organizations may also need to comply with frameworks and standards such as PCI DSS and NIST 800-53.

Building controls that account for service relationships and their dependencies is time-consuming and expensive. Sometimes customers restrict engineering access to AWS services and features until their cloud architects identify risks and implement their own controls.

To make that easier, today we are launching comprehensive controls management in AWS Control Tower. You can use it to apply managed preventative, detective, and proactive controls to accounts and organizational units (OUs) by service, control objective, or compliance framework. AWS Control Tower does the mapping between them on your behalf, saving time and effort.

With this new capability, you can now also use AWS Control Tower to turn on AWS Security Hub detective controls across all accounts in an OU. In this way, Security Hub controls are enabled in every AWS Region that AWS Control Tower governs.

Let’s see how this works in practice.

Using AWS Control Tower Comprehensive Controls Management
In the AWS Control Tower console, there is a new Controls library section. There, I choose All controls. There are now more than three hundred controls available. For each control, I see which AWS service it is related to, the control objective this control is part of, the implementation (such as AWS Config rule or AWS CloudFormation Guard rule), the behavior (preventive, detective, or proactive), and the frameworks it maps to (such as NIST 800-53 or PCI DSS).

Console screenshot.

In the Find controls search box, I look for a preventive control called CT.CLOUDFORMATION.PR.1. This control uses a service control policy (SCP) to protect controls that use CloudFormation hooks and is required by the control that I want to turn on next. Then, I choose Enable control.

Console screenshot.

Then, I select the OU for which I want to enable this control.

Console screenshot.

Now that I have set up this control, let’s see how controls are presented in the console in categories. I choose Categories in the navigation pane. There, I can browse controls grouped as Frameworks, Services, and Control objectives. By default, the Frameworks tab is selected.

Console screenshot.

I select a framework (for example, PCI DSS version 3.2.1) to see all the related controls and control objectives. To implement a control, I can select the control from the list and choose Enable control.

Console screenshot.

I can also manage controls by AWS service. When I select the Services tab, I see a list of AWS services and the related control objectives and controls.

Console screenshot.

I choose Amazon DynamoDB to see the controls that I can turn on for this service.

Console screenshot.

I select the Control objectives tab. When I need to assess a control objective, this is where I have access to the list of related controls to turn on.

Console screenshot.

I choose Encrypt data at rest to see and search through the available controls for that control objective. I can also check which services are covered in this specific case. I type RDS in the search bar to find the controls related to Amazon Relational Database Service (RDS) for this control objective.

I choose CT.RDS.PR.16 – Require an Amazon RDS database cluster to have encryption at rest configured and then Enable control.

Console screenshot.

I select the OU for which I want to enable the control, and I proceed. All the AWS accounts in this organization’s OU will have this control enabled in all the Regions that AWS Control Tower governs.

Console screenshot.

After a few minutes, the AWS Control Tower setup is updated. Now, the accounts in this OU have proactive control CT.RDS.PR.16 turned on. When an account in this OU deploys a CloudFormation stack, any Amazon RDS database cluster has to have encryption at rest configured. Because this control is proactive, it’ll be checked by a CloudFormation hook before the deployment starts. This saves a lot of time compared to a detective control that would find the issue only when the CloudFormation deployment is in progress or has terminated. This also improves my security posture by preventing something that’s not allowed as opposed to reacting to it after the fact.
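
If you prefer to script this instead of using the console, AWS Control Tower also exposes the same capability through its EnableControl API. The following is a minimal sketch using the Python SDK (boto3); the control and OU ARNs are illustrative placeholders, so substitute the identifiers from your own Controls library and organization.

```python
import boto3

controltower = boto3.client("controltower")

# Enable a control on an OU. Both ARNs below are placeholders: look up the
# control identifier in the Controls library and the target identifier in
# AWS Organizations.
response = controltower.enable_control(
    controlIdentifier="arn:aws:controltower:us-east-1::control/CT.RDS.PR.16",
    targetIdentifier=(
        "arn:aws:organizations::111122223333:ou/o-exampleorgid/ou-examplerootid-exampleouid"
    ),
)

# The call is asynchronous; the operation identifier can be polled for status.
print(response["operationIdentifier"])
```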

Availability and Pricing
Comprehensive controls management is available in preview today in all AWS Regions where AWS Control Tower is offered. These enhanced control capabilities reduce the time it takes you to vet AWS services from months or weeks to minutes. They help you use AWS by taking on the heavy burden of defining, mapping, and managing the controls required to meet the most common control objectives and regulations.

There is no additional charge to use these new capabilities during the preview. However, when you set up AWS Control Tower, you will begin to incur costs for AWS services configured to set up your landing zone and mandatory controls. For more information, see AWS Control Tower pricing.

Simplify how you implement compliance and security requirements with AWS Control Tower.

Danilo

Building AWS Lambda governance and guardrails

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-aws-lambda-governance-and-guardrails/

When building serverless applications using AWS Lambda, there are a number of considerations regarding security, governance, and compliance. This post highlights how Lambda, as a serverless service, simplifies cloud security and compliance so you can concentrate on your business logic. It covers controls that you can implement for your Lambda workloads to ensure that your applications conform to your organizational requirements.

The Shared Responsibility Model

The AWS Shared Responsibility Model distinguishes between what AWS is responsible for and what customers are responsible for with cloud workloads. AWS is responsible for “Security of the Cloud” where AWS protects the infrastructure that runs all the services offered in the AWS Cloud. Customers are responsible for “Security in the Cloud”, managing and securing their workloads. When building traditional applications, you take on responsibility for many infrastructure services, including operating systems and network configuration.

Traditional application shared responsibility

One major benefit when building serverless applications is shifting more responsibility to AWS so you can concentrate on your business applications. AWS handles managing and patching the underlying servers, operating systems, and networking as part of running the services.

Serverless application shared responsibility

For Lambda, AWS manages the application platform where your code runs, which includes patching and updating the managed language runtimes. This reduces the attack surface while making cloud security simpler. You are responsible for the security of your code and AWS Identity and Access Management (IAM) to the Lambda service and within your function.

Lambda is SOC, HIPAA, PCI, and ISO-compliant. For more information, see Compliance validation for AWS Lambda and the latest Lambda certification and compliance readiness services in scope.

Lambda isolation

Lambda functions run in separate isolated AWS accounts that are dedicated to the Lambda service. Lambda invokes your code in a secure and isolated runtime environment within the Lambda service account. A runtime environment is a collection of resources running in a dedicated hardware-virtualized micro virtual machine (MVM) on a Lambda worker node.

Lambda workers are bare metal EC2 Nitro instances, which are managed and patched by the Lambda service team. They have a maximum lease lifetime of 14 hours to keep the underlying infrastructure secure and fresh. MVMs are created by Firecracker, an open source virtual machine monitor (VMM) that uses Linux’s Kernel-based Virtual Machine (KVM) to create and manage MVMs securely at scale.

MVMs maintain a strong separation between runtime environments at the virtual machine hardware level, which increases security. Runtime environments are never reused across functions, function versions, or AWS accounts.

Isolation model for AWS Lambda workers

Network security

Lambda functions always run inside secure Amazon Virtual Private Clouds (VPCs) owned by the Lambda service. This gives the Lambda function access to AWS services and the public internet. There is no direct network inbound access to Lambda workers, runtime environments, or Lambda functions. All inbound access to a Lambda function only comes via the Lambda Invoke API, which sends the event object to the function handler.

You can configure a Lambda function to connect to private subnets in a VPC in your account if necessary, which you can control with IAM condition keys. The Lambda function still runs inside the Lambda service VPC but sends all network traffic through your VPC. Function outbound traffic comes from your own network address space.

AWS Lambda service VPC with VPC-to-VPC NAT to customer VPC

To give your VPC-connected function access to the internet, route outbound traffic to a NAT gateway in a public subnet. Connecting a function to a public subnet doesn’t give it internet access or a public IP address, as the function is still running in the Lambda service VPC and then routing network traffic into your VPC.

All internal AWS traffic uses the AWS Global Backbone rather than traversing the internet. You do not need to connect your functions to a VPC to keep traffic to AWS services off the internet. VPC-connected functions allow you to control and audit outbound network access.

You can use security groups to control outbound traffic for VPC-connected functions and network ACLs to block access to CIDR IP ranges or ports. VPC endpoints allow you to enable private communications with supported AWS services without internet access.

You can use VPC Flow Logs to audit traffic going to and from network interfaces in your VPC.
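
As noted above, IAM condition keys can be used to control how builders configure VPC connectivity for their functions. The following is a hedged sketch of an identity policy, expressed as a Python dictionary, that denies creating or reconfiguring a function unless it is attached to an approved VPC; the VPC ID, policy name, and account details are illustrative, and the lambda:VpcIds condition key is assumed from the documented Lambda condition keys for VPC settings.

```python
import json

import boto3

# Deny CreateFunction/UpdateFunctionConfiguration unless the request attaches
# the function to the approved VPC. Because negated operators also match when
# the key is absent, requests with no VPC configuration are denied as well,
# which effectively requires VPC attachment.
require_approved_vpc = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RequireApprovedVpc",
            "Effect": "Deny",
            "Action": [
                "lambda:CreateFunction",
                "lambda:UpdateFunctionConfiguration",
            ],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"lambda:VpcIds": "vpc-0123456789abcdef0"}
            },
        }
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="require-approved-lambda-vpc",  # illustrative name
    PolicyDocument=json.dumps(require_approved_vpc),
)
```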

Runtime environment re-use

Each runtime environment processes a single request at a time. After Lambda finishes processing the request, the runtime environment is ready to process an additional request for the same function version. For more information on how Lambda manages runtime environments, see Understanding AWS Lambda scaling and throughput.

Data can persist in the local temporary filesystem path, in globally scoped variables, and in environment variables across subsequent invocations of the same function version. Ensure that you only handle sensitive information within individual invocations of the function by processing it in the function handler, or using local variables. Do not re-use files in the local temporary filesystem to process unencrypted sensitive data. Do not put sensitive or confidential information into Lambda environment variables, tags, or other freeform fields such as Name fields.
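
As a minimal sketch of that guidance, the handler below fetches a secret inside the function handler and keeps it in a local variable, while only long-lived, non-sensitive objects such as SDK clients live in global scope. The secret name is an illustrative placeholder.

```python
import json

import boto3

# Safe to reuse across invocations of a warm runtime environment:
# SDK clients and other non-sensitive, long-lived objects.
secrets = boto3.client("secretsmanager")


def handler(event, context):
    # Fetch and use the sensitive value inside the handler, in a local
    # variable, so it does not persist in globals or the /tmp filesystem
    # between invocations. The secret name is a placeholder.
    secret = secrets.get_secret_value(SecretId="my-app/db-credentials")
    credentials = json.loads(secret["SecretString"])

    # ... use credentials for this request only ...
    return {"statusCode": 200}
```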

For more Lambda security information, see the Lambda security whitepaper.

Multiple accounts

AWS recommends using multiple accounts to isolate your resources because they provide natural boundaries for security, access, and billing. Use AWS Organizations to manage and govern individual member accounts centrally. You can use AWS Control Tower to automate many of the account build steps and apply managed guardrails to govern your environment. These include preventative guardrails to limit actions and detective guardrails to detect and alert on non-compliance resources for remediation.

Lambda access controls

Lambda permissions define what a Lambda function can do, and who or what can invoke the function. Consider the following areas when applying access controls to your Lambda functions to ensure least privilege:

Execution role

Lambda functions have permission to access other AWS resources through an execution role. This is an IAM role that the Lambda service assumes, which grants permissions through the identity policy statements attached to the role. The Lambda service uses this role to fetch and cache temporary security credentials, which are then available as environment variables during a function’s invocation. Lambda may reuse them across different runtime environments that use the same execution role.

Ensure that each function has its own unique role with the minimum set of permissions.
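
The following is a hedged sketch of creating such a per-function execution role with boto3: the role trusts the Lambda service and is granted only the permissions this one function needs. All names and ARNs are illustrative placeholders.

```python
import json

import boto3

iam = boto3.client("iam")

# Trust policy that lets the Lambda service assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

iam.create_role(
    RoleName="orders-api-function-role",  # one role per function
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Grant only what this function needs: writing its own logs and reading one table.
least_privilege = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "arn:aws:logs:us-east-1:111122223333:log-group:/aws/lambda/orders-api:*",
        },
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:us-east-1:111122223333:table/orders",
        },
    ],
}

iam.put_role_policy(
    RoleName="orders-api-function-role",
    PolicyName="orders-api-least-privilege",
    PolicyDocument=json.dumps(least_privilege),
)
```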

Identity/user policies

IAM identity policies are attached to IAM users, groups, or roles. These policies allow users or callers to perform operations on Lambda functions. You can restrict who can create functions, or control what functions particular users can manage.

Resource policies

Resource policies define what identities have fine-grained inbound access to managed services. For example, you can restrict which Lambda function versions can add events to a specific Amazon EventBridge event bus. You can use resource-based policies on Lambda resources to control what AWS IAM identities and event sources can invoke a specific version or alias of your function. You also use a resource-based policy to allow an AWS service to invoke your function on your behalf. To see which services support resource-based policies, see “AWS services that work with IAM”.
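
As a minimal sketch, the call below adds a resource-based policy statement that allows one EventBridge rule to invoke only the prod alias of a function; the function name, alias, and rule ARN are illustrative placeholders.

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.add_permission(
    FunctionName="orders-api",           # placeholder function name
    Qualifier="prod",                    # scope the grant to one alias
    StatementId="allow-orders-schedule",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn="arn:aws:events:us-east-1:111122223333:rule/orders-schedule",
)
```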

Attribute-based access control (ABAC)

With attribute-based access control (ABAC), you can use tags to control access to your Lambda functions. With ABAC, you can scale an access control strategy by setting granular permissions with tags without requiring permissions updates for every new user or resource as your organization scales. You can also use tag policies with AWS Organizations to standardize tags across resources.
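
A hedged sketch of an ABAC identity policy is shown below as a Python dictionary: callers may manage only the Lambda functions whose team resource tag matches their own team principal tag. The tag key and set of actions are illustrative.

```python
# Allow management actions only when the function's "team" tag matches the
# caller's "team" principal tag, so new functions and users are covered by
# tagging rather than by new policy statements.
abac_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lambda:UpdateFunctionCode",
                "lambda:UpdateFunctionConfiguration",
                "lambda:DeleteFunction",
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/team": "${aws:PrincipalTag/team}"
                }
            },
        }
    ],
}
```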

Permissions boundaries

Permissions boundaries are a way to delegate permission management safely. The boundary places a limit on the maximum permissions that a policy can grant. For example, you can use boundary permissions to limit the scope of the execution role to allow only read access to databases. A builder with permission to manage a function or with write access to the application’s code repository cannot escalate the permissions beyond the boundary to allow write access.
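
The sketch below, with illustrative names and ARNs, attaches an existing permissions boundary policy when the execution role is created, so policies added to the role later can never grant more than the boundary allows.

```python
import json

import boto3

iam = boto3.client("iam")

# Assumes a boundary managed policy (allowing, say, only read access to
# DynamoDB plus CloudWatch Logs writes) has already been created.
BOUNDARY_ARN = "arn:aws:iam::111122223333:policy/lambda-execution-boundary"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Any identity policy later attached to this role is capped by the boundary.
iam.create_role(
    RoleName="reports-function-role",  # illustrative name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    PermissionsBoundary=BOUNDARY_ARN,
)
```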

Service control policies

When using AWS Organizations, you can use Service control policies (SCPs) to manage permissions in your organization. These provide guardrails for what actions IAM users and roles within the organization root or OUs can do. For more information, see the AWS Organizations documentation, which includes example service control policies.
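
As a hedged illustration, the following SCP document (shown as a Python dictionary) denies all Lambda management actions outside an approved list of Regions for every account under the OU it is attached to; the Region list is illustrative.

```python
# Attach this SCP to an OU with AWS Organizations to block Lambda actions
# outside the approved Regions, regardless of the identity policies in the
# member accounts.
scp_restrict_lambda_regions = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyLambdaOutsideApprovedRegions",
            "Effect": "Deny",
            "Action": "lambda:*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestedRegion": ["us-east-1", "eu-west-1"]
                }
            },
        }
    ],
}
```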

Code signing

As you are responsible for the code that runs in your Lambda functions, you can ensure that only trusted code runs by using code signing with the AWS Signer service. AWS Signer digitally signs your code packages and Lambda validates the code package before accepting the deployment, which can be part of your automated software deployment process.
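
A minimal sketch of wiring this together with boto3 follows: create a signing profile, reference it from a code signing configuration that enforces signature validation, and attach that configuration to a function. The profile and function names are illustrative placeholders.

```python
import boto3

signer = boto3.client("signer")
lambda_client = boto3.client("lambda")

# 1. Create a signing profile for the Lambda signing platform.
profile = signer.put_signing_profile(
    profileName="release_profile",          # illustrative name
    platformId="AWSLambda-SHA384-ECDSA",
)

# 2. Create a code signing configuration that rejects unsigned artifacts.
config = lambda_client.create_code_signing_config(
    AllowedPublishers={
        "SigningProfileVersionArns": [profile["profileVersionArn"]]
    },
    CodeSigningPolicies={"UntrustedArtifactOnDeployment": "Enforce"},
)

# 3. Attach the configuration to the function so deployments are validated.
lambda_client.put_function_code_signing_config(
    FunctionName="orders-api",              # illustrative name
    CodeSigningConfigArn=config["CodeSigningConfig"]["CodeSigningConfigArn"],
)
```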

Auditing Lambda configuration, permissions and access

You should audit access and permissions regularly to ensure that your workloads are secure. Use the IAM console to view when an IAM role was last used.

IAM last used

IAM access advisor

Use IAM access advisor on the Access Advisor tab in the IAM console to review when an AWS service was last used by a specific IAM user or role. You can use this information to remove unused IAM policies and access from your IAM roles.

IAM access advisor

AWS CloudTrail

AWS CloudTrail helps you monitor, log, and retain account activity to provide a complete event history of actions across your AWS infrastructure. You can monitor Lambda API actions to ensure that only appropriate actions are made against your Lambda functions. These include CreateFunction, DeleteFunction, CreateEventSourceMapping, AddPermission, UpdateEventSourceMapping, UpdateFunctionConfiguration, and UpdateFunctionCode.

AWS CloudTrail
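
As a minimal sketch, the snippet below uses the CloudTrail LookupEvents API to list who changed function code in the last week. Lookup covers only recent management events, and CloudTrail may record some Lambda actions with an API version suffix (for example, UpdateFunctionCode20150331v2), so the event name here is an assumption to adjust for your trail.

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudtrail = boto3.client("cloudtrail")

# Look up recent UpdateFunctionCode calls; adjust the event name if your
# trail records it with an API version suffix.
response = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "EventName", "AttributeValue": "UpdateFunctionCode"}
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(days=7),
)

for event in response["Events"]:
    print(event["EventTime"], event.get("Username"), event["EventName"])
```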

IAM Access Analyzer

You can validate policies using IAM Access Analyzer, which provides over 100 policy checks with security warnings for overly permissive policies. To learn more about policy checks provided by IAM Access Analyzer, see “IAM Access Analyzer policy validation”.

You can also generate IAM policies based on access activity from CloudTrail logs, which contain the permissions that the role used in your specified date range.

IAM Access Analyzer
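
The snippet below is a minimal sketch of running those policy checks programmatically with the IAM Access Analyzer ValidatePolicy API before a policy is attached; the deliberately broad example document is illustrative.

```python
import json

import boto3

analyzer = boto3.client("accessanalyzer")

candidate_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "lambda:*", "Resource": "*"}
    ],
}

# Run the policy checks and print any findings before attaching the policy.
result = analyzer.validate_policy(
    policyDocument=json.dumps(candidate_policy),
    policyType="IDENTITY_POLICY",
)

for finding in result["findings"]:
    print(finding["findingType"], finding["issueCode"], finding["findingDetails"])
```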

AWS Config

AWS Config provides you with a record of the configuration history of your AWS resources. AWS Config monitors resource configurations and includes rules that alert when resources fall into a non-compliant state.

For Lambda, you can track and alert on changes to your function configuration, along with the IAM execution role. This allows you to gather Lambda function lifecycle data for potential audit and compliance requirements. For more information, see the Lambda Operators Guide.

AWS Config includes Lambda managed config rules such as lambda-concurrency-check, lambda-dlq-check, lambda-function-public-access-prohibited, lambda-function-settings-check, and lambda-inside-vpc. You can also write your own rules.
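
As a minimal sketch, the call below turns on one of those managed rules with the AWS Config API; the rule name mirrors the managed rule identifier but can be anything you choose.

```python
import boto3

config = boto3.client("config")

# Enable the managed rule that flags functions whose resource-based policy
# allows public access.
config.put_config_rule(
    ConfigRule={
        "ConfigRuleName": "lambda-function-public-access-prohibited",
        "Source": {
            "Owner": "AWS",
            "SourceIdentifier": "LAMBDA_FUNCTION_PUBLIC_ACCESS_PROHIBITED",
        },
    }
)
```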

There are a number of other AWS services to help with security compliance.

  1. AWS Audit Manager: Collect evidence to help you audit your use of cloud services.
  2. Amazon GuardDuty: Detect unexpected and potentially unauthorized activity in your AWS environment.
  3. Amazon Macie: Evaluate your content to identify business-critical or potentially confidential data.
  4. AWS Trusted Advisor: Identify opportunities to improve stability, save money, or help close security gaps.
  5. AWS Security Hub: Get security checks and recommendations across your organization.

Conclusion

Lambda makes cloud security simpler by taking on more responsibility using the AWS Shared Responsibility Model. Lambda implements strict workload security at scale to isolate your code and prevent network intrusion to your functions. This post provides guidance on assessing and implementing best practices and tools for Lambda to improve your security, governance, and compliance controls. These include permissions, access controls, multiple accounts, and code security. Learn how to audit your function permissions, configuration, and access to ensure that your applications conform to your organizational requirements.

For more serverless learning resources, visit Serverless Land.

Creating computing quotas on AWS Outposts rack with EC2 Capacity Reservations sharing

Post Syndicated from Rachel Zheng original https://aws.amazon.com/blogs/compute/creating-computing-quotas-on-aws-outposts-rack-with-ec2-capacity-reservation-sharing/

This post is written by Yi-Kang Wang, Senior Hybrid Specialist SA.

AWS Outposts rack is a fully managed service that delivers the same AWS infrastructure, AWS services, APIs, and tools to virtually any on-premises datacenter or co-location space for a truly consistent hybrid experience. AWS Outposts rack is ideal for workloads that require low latency access to on-premises systems, local data processing, data residency, and migration of applications with local system interdependencies. In addition to these benefits, we have started to see many of you need to share Outposts rack resources across business units and projects within your organization. This blog post will discuss how you can share Outposts rack resources by creating computing quotas on Outposts with Amazon Elastic Compute Cloud (Amazon EC2) Capacity Reservations sharing.

In AWS Regions, you can set up and govern a multi-account AWS environment using AWS Organizations and AWS Control Tower. The natural boundaries of accounts provide some built-in security controls, and AWS provides additional governance tooling to help you achieve your goals of managing a secure and scalable cloud environment. And while Outposts can consistently use organizational structures for security purposes, Outposts introduces another layer to consider in designing that structure. When an Outpost is shared within an Organization, the utilization of the purchased capacity also needs to be managed and tracked within the organization. The account that owns the Outpost resource can use AWS Resource Access Manager (RAM) to create resource shares for member accounts within their organization. An Outposts administrator (admin) can share the ability to launch instances on the Outpost itself, access to the local gateways (LGW) route tables, and/or access to customer-owned IPs (CoIP). Once the Outpost capacity is shared, the admin needs a mechanism to control the usage and prevent over utilization by individual accounts. With the introduction of Capacity Reservations on Outposts, we can now set up a mechanism for computing quotas.

Concept of computing quotas on Outposts rack

In the AWS Regions, Capacity Reservations enable you to reserve compute capacity for your Amazon EC2 instances in a specific Availability Zone for any duration you need. On May 24, 2021, Capacity Reservations were enabled for Outposts rack. They support not only EC2 but also Outposts services that run on EC2 instances, such as Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Container Service (Amazon ECS), and Amazon EMR, so the computing power of these services can be included in your resource planning as well. For example, suppose you’d like to launch an EKS cluster with two self-managed worker nodes for high availability. You can reserve two instances with Capacity Reservations to secure computing power for that requirement.

Here I’ll describe a method for thinking about resource pools that an admin can use to manage resource allocation. I’ll use three resource pools, which I’ve named the reservation pool, the bulk pool, and the attic, to manage Outpost capacity effectively and extensibly.

A reservation pool is a resource pool reserved for a specified member account. The admin creates a Capacity Reservation to match the member account’s needs and shares the Capacity Reservation with the member account through AWS RAM.

A bulk pool is an unreserved resource pool that is used when member accounts run out of compute capacity, whether for EC2, EKS, or other services that run on EC2. All compute capacity in the bulk pool can be requested for launches until it is exhausted. Compute capacity that is not under a reservation pool belongs to the bulk pool by default.

An attic is a resource pool created to hold compute capacity that the admin doesn’t want to share with member accounts. This capacity remains under the admin’s control and can be released to the bulk pool when needed. The admin creates a Capacity Reservation for the attic and owns that reservation.

The following diagram shows how the admin uses Capacity Reservations with AWS RAM to manage computing quotas for two member accounts on an Outpost equipped with twenty-four m5.xlarge instances. Here, I’ll break the idea into several pieces to make it easier to follow.

  1. There are three Capacity Reservations created by the admin. CR #1 reserves eight m5.xlarge instances for the attic, CR #2 reserves four m5.xlarge instances for account A, and CR #3 reserves six m5.xlarge instances for account B.
  2. The admin shares Capacity Reservations CR #2 and CR #3 with accounts A and B, respectively.
  3. Since eighteen m5.xlarge instances are reserved, the remaining compute capacity in the bulk pool will be six m5.xlarge instances.
  4. Both accounts A and B can launch instances beyond their Capacity Reservation amounts by using the instances available to everyone in the bulk pool.

Concept of defining computing quotas

  5. Once the bulk pool is exhausted, accounts A and B won’t be able to launch extra instances from the bulk pool.
  6. The admin can release more compute capacity from the attic to refill the bulk pool, or directly share more capacity with CR #2 and CR #3. The following diagram demonstrates how it works.

Concept of refilling bulk pool

Based on this concept, we can see that compute capacity can be securely and efficiently allocated among multiple AWS accounts. Reservation pools allow every member account to have sufficient resources to meet consistent demand. Keeping the bulk pool empty indirectly sets the maximum quota for each member account. The attic acts as a provider that can release compute capacity into the bulk pool for temporary demand. Here are the major benefits of computing quotas.

  • Centralized compute capacity management
  • Reserving minimum compute capacity for consistent demand
  • Resizable bulk pool for temporary demand
  • Limiting maximum compute capacity to avoid resource congestion

Configuration process

To take you through the process of configuring computing quotas in the AWS console, I have simplified the environment to the following architecture. There are four m5.4xlarge instances in total. The admin account holds two of the m5.4xlarge instances in the attic, and the member account gets the other two for its reservation pool, which leaves no extra instances in the bulk pool for temporary demand.

Prerequisites

  • The admin and the member account are within the same AWS Organization.
  • The Outpost ID, LGW and CoIP have been shared with the member account.

Architecture for configuring computing quotas

  1. Creating a Capacity Reservation for the member account

Sign in to the AWS console of the admin account and navigate to the AWS Outposts page. Select the Outpost ID you want to share with the member account, choose Actions, and then select Create Capacity Reservation. In this case, reserve two m5.4xlarge instances.

Create a capacity reservation

In the Reservation details, you can choose to end the Capacity Reservation manually or at a specific time. For Instance eligibility, the first option automatically counts launched instances against the Capacity Reservation without a reservation ID having to be specified. To avoid misconfiguration by member accounts, I suggest selecting Any instance with matching details in most use cases.

Reservation details

  2. Sharing the Capacity Reservation through AWS RAM

Go to the AWS RAM page and choose Create resource share on the Resource shares page. Search for and select the Capacity Reservation you just created for the member account.

Specify resource sharing details

For the principal, enter the AWS account ID of the member account.

Choose principals that are allowed to access

  3. Creating a Capacity Reservation for the attic

Create a Capacity Reservation as in step 1, without sharing it with anyone. This reservation is owned solely by the admin account. After that, check Capacity Reservations on the EC2 page; you’ll see the two Capacity Reservations there, each with two m5.4xlarge instances available.

3.	Creating a Capacity Reservation for attic

  4. Launching EC2 instances

Log in to the member account, select the Outpost ID the admin shared in step 2, then choose Actions and select Launch instance. Follow the AWS Outposts User Guide to launch two m5.4xlarge instances on the Outpost. When the two instances are in the Running state, you can see a Capacity Reservation ID on the Details page. In this case, it’s cr-0381467c286b3d900.

Create EC2 instances

At this point, the member account has used up the two m5.4xlarge instances that the admin reserved for it. If you try to launch a third m5.4xlarge instance, the following failure message shows that there is not enough capacity.

Launch status

  5. Allocating more compute capacity in the bulk pool

Go back to the admin console, select the Capacity Reservation ID of the attic on the EC2 page, and choose Edit. Modify the value of Quantity from 2 to 1 and choose Save, which releases one m5.4xlarge instance from the attic into the bulk pool.

Instance details

  6. Launching more instances from the bulk pool

Switch to the member account console and repeat step 4, but launch only one more m5.4xlarge instance. With the capacity released in step 5, the member account successfully gets a third instance. The compute capacity comes from the bulk pool, so when you check the Details page of the third instance, the Capacity Reservation ID is blank.

6.	Launching more instances from bulk pool
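
If you would rather automate these steps than use the console, the following is a hedged sketch of the admin-side calls with boto3: create a Capacity Reservation on the Outpost, share it with the member account through AWS RAM, and later shrink the attic reservation to release capacity into the bulk pool. All ARNs, IDs, account numbers, and the Availability Zone are illustrative placeholders.

```python
import boto3

ec2 = boto3.client("ec2")
ram = boto3.client("ram")

# Step 1: reserve two m5.4xlarge instances on the Outpost for the member account.
reservation = ec2.create_capacity_reservation(
    InstanceType="m5.4xlarge",
    InstancePlatform="Linux/UNIX",
    AvailabilityZone="us-west-2a",          # AZ the Outpost is anchored to
    InstanceCount=2,
    InstanceMatchCriteria="open",           # "Any instance with matching details"
    EndDateType="unlimited",
    OutpostArn="arn:aws:outposts:us-west-2:111122223333:outpost/op-0123456789abcdef0",
)
reservation_arn = reservation["CapacityReservation"]["CapacityReservationArn"]

# Step 2: share the reservation with the member account through AWS RAM.
ram.create_resource_share(
    name="member-a-reservation-pool",
    resourceArns=[reservation_arn],
    principals=["222233334444"],            # member account ID
)

# Step 5: release one instance from the attic reservation into the bulk pool.
ec2.modify_capacity_reservation(
    CapacityReservationId="cr-0attic0000000000",  # attic reservation ID
    InstanceCount=1,
)
```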

Cleaning up

  1. Terminate the three EC2 instances in the member account.
  2. Unshare the Capacity Reservation in RAM and delete it in the admin account.
  3. Unshare the Outpost ID, LGW and CoIP in RAM to get the Outposts resources back to the admin.

Conclusion

In this blog post, I showed how an admin can dynamically adjust compute capacity allocation on Outposts rack for purpose-built member accounts within an AWS Organization. The bulk pool offers flexibility in resource planning among member accounts when the maximum instance need per member account is unpredictable. By contrast, if resource forecasting is feasible, the admin can size both the reservation pool and the attic to set a hard limit per member account without using the bulk pool. In addition, I only showed you how to create a Capacity Reservation of m5.4xlarge instances for the member account, but an admin can create multiple Capacity Reservations with various instance types or sizes for a member account to customize its reservation pool. Lastly, if you would like to securely share Amazon S3 on Outposts with your member accounts, check out Amazon S3 on Outposts now supports sharing across multiple accounts to get more details.