Tag Archives: AWS CloudFormation Stackset

Take fine-grained control of your AWS CloudFormation StackSets Deployment with StackSet Dependencies

Post Syndicated from Tanvi Ravindra Malali original https://aws.amazon.com/blogs/devops/take-fine-grained-control-of-your-aws-cloudformation-stacksets-deployment-with-stackset-dependencies/

Introduction

AWS CloudFormation StackSets enable you to deploy CloudFormation stacks across multiple AWS accounts and regions with a single operation, providing centralized management of infrastructure at scale through AWS Organizations integration. In enterprise environments, multiple StackSet often need to deploy in a specific order. For example, networking infrastructure must be ready before applications can deploy successfully.

Architecture diagram showing an Administrator account with a Stack set, and many target accounts with their own stacks, which in turn control other stacks. Demonstrating how a multi account, multi stack architecture can get complicated.

Figure 1: Example of a multi-region AWS CloudFormation StackSet architecture with an administrative account and target accounts

Previously, when multiple StackSets had auto-deployment enabled, they operated independently without coordination. This could cause deployment failures when dependent infrastructure wasn’t ready, forcing customers to implement complex workarounds or disable auto-deployment entirely.

We are announcing StackSets dependencies, a new feature that gives you fine-grained control over the deployment order of your auto-deployed StackSets, elegantly solving these orchestration challenges.

Feature Overview

This new feature introduces the ability to define dependencies between StackSets using the new DependsOn parameter in the AutoDeployment configuration. When accounts move between Organizational Units or are added to your organization, StackSets automatically orchestrates deployments according to your defined sequence, ensuring foundational infrastructure deploys before dependent applications.

Key capabilities include:

  • Dependency Management: Define up to 10 dependencies per StackSet, with up to 100 dependencies per account. For example, if you have 5 StackSets with 5 dependencies each, you have 25 dependencies counting towards the 100 dependency limit. You can request a limit increase through the service quota console.
  • Cycle Detection: Built-in validation prevents circular dependencies with error messages.
  • Cross-Region Support: Dependencies work across regions.
  • Automatic Cleanup: Dependencies are removed when StackSets are deleted or Organizations are deactivated.

How it works

Let’s walk through this feature with a practical example. Consider an infrastructure setup where you have: A central Infrastructure StackSet that creates IAM roles and networking components and multiple Application StackSets that depend on these foundational resources.

With StackSets dependencies, you can make sure the Infrastructure StackSet completes deployment before any Application StackSets begin, preventing deployment failures due to missing dependencies.

Implementation Scenarios

Let’s explore three common scenarios where StackSets Dependencies provides value:

Scenario 1: Foundation-First Deployment

Use Case: You have a foundational Infrastructure StackSet that creates IAM roles and networking components, and multiple Application StackSets that depend on these resources.

Setup:

  • Infrastructure StackSet ARNs (creates IAM roles, VPCs, security groups)
  • App1 StackSet (web application requiring IAM roles)
  • App2 StackSet (API service requiring networking components)
  • No additional permissions are required to use this feature.

Console Experience

The CloudFormation console provides an intuitive interface for managing StackSet dependencies. Log into the AWS console with your credentials, with an IAM user or administrative user, according to your access. Navigate to the Cloudformation service and create a new Stack or add a YAML/JSON template, where you will be configuring dependencies. In the Step 4 of the Create StackSet wizard, you’ll find a new “StackSet dependencies” form field in the Auto-deployment options section. You can use the attribute editor to add StackSet ARNs for dependencies. The console includes input validation for ARN format and helpful alerts about dependency behavior.

Console view showing options to Activate or Deactivate Automatic deployment, and whether to Delete or Retain stacks, and the new feature, Stack set dependencies, and a space to designate a dependent stack set.

Figure 2: CloudFormation StackSets Console – Auto-deployment options view

AWS CLI Implementation:

  1. Create the foundational Infrastructure StackSet:

aws cloudformation create-stack-set \
  --stack-set-name Infrastructure \
  --permission-model SERVICE_MANAGED \
  --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=true \
  --template-body file://infrastructure-template.yaml \
  --region us-east-1

2. Create App1 with dependency on Infrastructure:

aws cloudformation create-stack-set \
  --stack-set-name App1 \
  --permission-model SERVICE_MANAGED \
  --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=true,\
  DependsOn=arn:aws:cloudformation:us-east-1:123456789012:StackSet/Infrastructure:uuid \
  --template-body file://app1-template.yaml \
  --region us-east-1

3. Create App2 with dependency on Infrastructure:

aws cloudformation create-stack-set \
  --stack-set-name App2 \
  --permission-model SERVICE_MANAGED \
  --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=true,DependsOn=arn:aws:cloudformation:us-east-1:123456789012:StackSet/Infrastructure:uuid \
  --template-body file://app2-template.yaml \
  --region us-west-2

Now, when accounts are added to your organization, Infrastructure deploys first, then App1 and App2 deploy in parallel after Infrastructure completes.

Scenario 2: Multi-Dependency Application

Use Case: Your application requires both networking and security components to be ready before deployment.

Setup:

  • Networking StackSet (VPCs, subnets, route tables)
  • Security StackSet (security groups, NACLs, IAM policies)
  • Application StackSet (requires both networking and security)

Implementation:

  1. Create Networking StackSet

aws cloudformation create-stack-set \
  --stack-set-name Networking \
  --permission-model SERVICE_MANAGED \
  --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=true \
  --template-body file://networking-template.yaml \
  --region us-east-1

2. Create Security StackSet

aws cloudformation create-stack-set \
  --stack-set-name Security \
  --permission-model SERVICE_MANAGED \
  --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=true \
  --template-body file://security-template.yaml \
  --region us-east-1

3. Create Application with dependencies on both Networking and Security

aws cloudformation create-stack-set \
  --stack-set-name Application \
  --permission-model SERVICE_MANAGED \
  --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=true,DependsOn=arn:aws:cloudformation:us-east-1:123456789012:StackSet/Networking:uuid,arn:aws:cloudformation:us-east-1:123456789012:Stackset/Security:uuid \
  --template-body file://application-template.yaml \
  --region us-east-1

As a result, Networking and Security StackSets deploy in parallel, and Application waits for both to complete before starting.

Scenario 3: Resolving Dependency Conflicts

Use Case: You need to update existing StackSets to fix incorrect dependency relationships.

Problem: You have App1 and App2 StackSets. There is an existing dependency that App2 has on App1, but you realize App1 should depend on App2, not the other way around.

Implementation:

First, try to set App1 to depend on App2 (this will fail due to cycle):

aws cloudformation update-stack-set \
  --stack-set-name App1 \
  --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=true,DependsOn=arn:aws:cloudformation:us-east-1:123456789012:StackSet/App2:uuid \
  --use-previous-template

This action will result in error: “Detected cycle(s) between auto-deployment dependencies”. If dependency validation cannot be completed, you’ll receive appropriate error messages to help troubleshoot configuration issues.

Now let’s remove the existing dependency from App2:

aws cloudformation update-stack-set \
  --stack-set-name App2 \
  --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=true \
  --use-previous-template

Now successfully set App1 to depend on App2:

aws cloudformation update-stack-set \
  --stack-set-name App1 \
  --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=true,DependsOn=arn:aws:cloudformation:us-east-1:123456789012:StackSet/App2:uuid \
  --use-previous-template

This scenario demonstrates cycle detection and how to resolve dependency conflicts.

Getting Started

StackSet dependencies is available now in all AWS Regions where CloudFormation StackSets are supported. To get started:

  1. Identify Dependencies: Determine which StackSets should deploy first in your infrastructure.
  2. Configure Relationships: Use the CloudFormation console or AWS CLI to set up dependencies using StackSet ARNs.
  3. Test Your Sequence: Validate your dependency configuration in a test environment.
  4. Monitor Deployments: Use CloudFormation events to track sequenced deployments.

Log into your account in the console and visit the AWS CloudFormation StackSets console or use the AWS CLI/SDK with AWS credentials configured to start controlling StackSet dependencies today.

Authors


Tanvi Ravindra Malali

Tanvi Ravindra Malali is an Associate Delivery Consultant in the AWS A2C team in ProServe. She is based in New York City. She handles customer projects and codebases, specializing in AI/ML, Data Engineering and Infrastructure as Code. Outside of work, she loves to paint landscapes, DJing her favorite songs, and dances Tango.

Idriss Louali Abdou
Idriss Laouali Abdou

Idriss Laouali Abdou is a Sr. Product Manager Technical on the AWS Infrastructure-as-Code team based in Seattle. He focuses on improving developer productivity through CloudFormation and StackSets Infrastructure provisioning experiences. Outside of work, you can find him creating educational content for thousands of students, cooking, or dancing.

StackSets Deployment Strategies: Balancing Speed, Safety, and Scale to Optimize Deployments for Different Organizational Needs

Post Syndicated from Amar Meriche original https://aws.amazon.com/blogs/devops/stacksets-deployment-strategies-balancing-speed-safety-and-scale-to-optimize-deployments-for-different-organizational-needs/

AWS CloudFormation StackSets enables organizations to deploy infrastructure consistently across multiple AWS accounts and regions. However, success depends on choosing the right deployment strategy that balances three critical factors: deployment speed, operational safety, and organizational scale. This guide explores proven StackSets deployment strategies specifically designed for multi-account infrastructure management.

Understanding StackSets Deployment Fundamentals

What are StackSets Actually Used For?

Unlike single-account AWS CloudFormation templates, StackSets are specifically designed for multi-account infrastructure governance. Common use cases include Security baselines (deploying IAM policies, security groups, and access controls across all accounts), Compliance controls (rolling out AWS Config rules, AWS CloudTrail configurations, and audit requirements), Organizational standards (establishing consistent VPC configurations, tagging policies, and naming conventions), Shared services (deploying monitoring solutions, logging infrastructure, and backup policies) or Cost management (implementing budget controls, cost allocation tags, and resource optimization policies)

The Multi-Account Challenge

Managing infrastructure across dozens or hundreds of AWS accounts presents unique challenges:

Single Account (CFN Template)     Multi-Account (StackSets)
      App A                           Org Unit A (50 accounts)
        |                                     |
   [Deploy Once]               [Deploy consistently across all]
        |                                     |
    Success/Fail                Complex success/failure matrix

Multi account and multi region Cloudformation deployment complexity

The Speed-Safety-Scale Triangle

Every StackSets deployment strategy involves trade-offs: Speed (how quickly changes propagate across your organization), Safety (risk mitigation and failure containment) and Scale (ability to manage hundreds of accounts efficiently)

Prerequisites

Before implementing any of the deployment strategies described in this guide, ensure you have:

  1. AWS CLI Installation
    1. Install the latest version of AWS CLI by following the AWS CLI installation guide
    2. Verify installation with: aws –version
  2. AWS Profile Configuration
    1. Configure your AWS credentials using: aws configure
    2. For details on configuration, see AWS CLI configuration basics
    3. Ensure your profile has appropriate permissions for CloudFormation StackSets operations as described in AWS StackSets prerequisites
  3. Proper Account Access The commands in this guide must be executed from either:
    1. The management account of your AWS Organization
    2. OR a delegated administrator account for CloudFormation

For information on setting up a delegated administrator, see Register a delegated administrator

Note: StackSets deployments using service-managed permissions cannot be performed from standalone accounts.

Verify you’re using the correct account with:

bash
# For management account
aws organizations describe-organization
# For delegated admin
aws cloudformation list-stack-sets —call-as DELEGATED_ADMIN

AWS CLI to check the usage of an Organization and not a Standalone account

Core Deployment Strategies

As explained in the StackSet documentation:

  • “For a more conservative deployment, set Maximum Concurrent Accounts to 1, and Failure Tolerance to 0. Set your lowest-impact region to be first in the Region Order Start with one region.”
  • “For a faster deployment, increase the values of Maximum Concurrent Accounts and Failure Tolerance as needed. ”

Based on the above, we are proposing below several deployment strategies, depending on the speed, safety and scale you want to achieve.

1. Sequential Deployment: Maximum Safety

Use Case : Critical security updates, compliance requirements, first-time organizational rollouts

Below are listed some possible use cases:

  • Security baseline updates: New IAM policies affecting root access
  • Compliance rollouts: SOX, HIPAA, or PCI-DSS control implementations
  • Critical infrastructure changes: VPC security group modifications
  • Organizational policy changes: New AWS Config rules for audit compliance

Implementation Example:

For this example, we will download the following template ConfigRuleCloudtrailEnabled.yml from the Cloudformation sample library in the AWS documentation to configure an AWS Config rule to determine if AWS CloudTrail is enabled and follow the next steps:

Step 1: Create the StackSet

With the AWS CLI:

# Create Stackset for security baseline
# StackSet operation managed from us-east-1
aws cloudformation create-stack-set \
  --stack-set-name security-baseline \
  --template-body file://ConfigRuleCloudtrailEnabled.yml \
  --capabilities CAPABILITY_NAMED_IAM \
  --permission-model SERVICE_MANAGED \
  --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false \
  --region us-east-1

AWS CLI to create a security-baseline Stackset

The expected response should be similar to the following :

{"StacksetId": "security-baseline: ...."}

Step 2: Create Stack Instances

Before you launch the below command, you need to adjust the values of the following parameters:

  • OrganizationalUnitIds: you must change the value “ou-test” in the below command line to the name of the target OU you want to deploy to. I recommend creating a new test OU in the console or via the CLI for the purpose of this test.
  • regions: if needed, change the “us-east-1 eu-west-1” value, here you need to list all the regions you want to deploy to. AWS Config must be active in the accounts/regions that you choose, otherwise you’ll get an error when deploying the Stack.

# Deploy security baseline to production accounts
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1 and eu-west-1
# SEQUENTIAL = One region at a time, sequentially
# MaxConcurrentPercentage = Deploy to 5% of accounts at once
# FailureTolerancePercentage = Stop on first failure
aws cloudformation create-stack-instances \
  --stack-set-name security-baseline \
  --deployment-targets OrganizationalUnitIds=ou-test\
  --regions us-east-1 eu-west-1 \
  --region us-east-1 \
  --operation-preferences RegionConcurrencyType=SEQUENTIAL,MaxConcurrentPercentage=5,FailureTolerancePercentage=0

AWS CLI to create security-baseline Stack Instances sequentially for maximum safety

The CLI output should look like the following:

{"OperationId": ....}

Or create the StackSet and add the Stacks with the AWS Console:

In the CloudFormation Console, click “Create StackSet”

AWS CloudFormation Console: create a security-baseline Stackset

AWS CloudFormation Console: create a security-baseline Stackset

Upload your template from S3 or from your computer and click Next:

AWS CloudFormation Console: specify a template

AWS CloudFormation Console: specify a template

Specify the StackSet name and parameters and click Next:

AWS CloudFormation Console: specify the StackSet name and parameters

AWS CloudFormation Console: specify the StackSet name and parameters

Configure StackSet options and click Next:

AWS CloudFormation Console: configure the StackSet options

AWS CloudFormation Console: configure the StackSet options

Set deployment options and click Next:

AWS CloudFormation Console: set deployment options

AWS CloudFormation Console: set deployment options

AWS CloudFormation Console: set deployment options

AWS CloudFormation Console: set more deployment options

Then Review and Submit.

Not to overweight this blog, we’ll provide only this example of CLI output and Console screenshot, but the “Parallel Deployment” and “Balanced Approach” will be similar to this example. You just need to update the parameters for the different StackSet Operations options.

A real-world example would be a financial services company deploying new MFA requirements across 200 production accounts. They could use sequential deployment with 5 concurrency to ensure each batch was validated before proceeding.

2. Parallel Deployment: Maximum Speed

The Parallel Deployment is best for non-critical updates, development environments, routine maintenance

Here are some possible use cases:

  • Development account standardization: Rolling out new development tools
  • Monitoring infrastructure: Deploying Amazon CloudWatch dashboards and alarms
  • Cost optimization: Implementing automated resource cleanup policies
  • Non-production updates: Updating development and staging environments

Implementation Example:

For this example, we will copy paste the .yml template from this Re:Post article about monitoring IAM events in a file called “monitoring-baseline.yml”, and use it in the following command lines.

Step 1: Create the StackSet

# Create Stackset for monitoring baseline
# StackSet operation managed from us-east-1
aws cloudformation create-stack-set \
--stack-set-name monitoring-baseline \
--template-body file://monitoring-baseline.yml \
--capabilities CAPABILITY_NAMED_IAM \
--permission-model SERVICE_MANAGED \
--auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false \
--region us-east-1

AWS CLI to create a monitoring-baseline Stackset

Step 2: Create Stack Instances

Just like in the previous example, before you launch the below command, you need to adjust the values of the OrganizationalUnitIds and regions parameters.

# Deploy monitoring baseline to dev and sandbox accounts
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1 and eu-west-1
# PARALLEL = Deployment in parallel
# MaxConcurrentPercentage = Deploy to 80% of accounts at once
# FailureTolerancePercentage = Tolerate failures in 20% of accounts
aws cloudformation create-stack-instances \
--stack-set-name monitoring-baseline \
--deployment-targets OrganizationalUnitIds=ou-development,ou-sandbox \
--regions us-east-1 eu-west-1 \
--region us-east-1 \
--operation-preferences RegionConcurrencyType=PARALLEL,MaxConcurrentPercentage=80,FailureTolerancePercentage=20

AWS CLI to create monitoring-baseline Stack Instances in parallel with high value for max concurrent percentage for maximum speed

3. Progressive Deployment: Balanced Approach or Multi Phase Approach (Recommended)

For most production scenarios with moderate risk tolerance, it is recommended to use a Balanced Approach, or Multi-Phase Implementation.

Balanced Approach

For this example, to make it easier, you can create a copy of “monitoring-baseline.yml” created previously, and name it “balanced-template.yml”.

cp monitoring-baseline.yml balanced-template.yml

bash command to copy the monitoring-baseline.yml file to balanced-template.yml

Then you can use it in the following command lines.

Step 1: Create the StackSet

# Create Stackset for a balanced creation
# StackSet operation managed from us-east-1
aws cloudformation create-stack-set \
--stack-set-name balanced-deployment \
--template-body file://balanced-template.yml \
--capabilities CAPABILITY_NAMED_IAM \
--permission-model SERVICE_MANAGED \
--auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false \
--region us-east-1

AWS CLI to create a balanced-deployment Stackset

Step 2: Create Stack Instances

You need to adjust the values of the OrganizationalUnitIds and regions parameters.

# Deploy monitoring baseline to production accounts
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1, eu-west-1 and ap-southeast-1
# PARALLEL = Deployment in parallel
# MaxConcurrentPercentage = Deploy to 25% of accounts at once
# FailureTolerancePercentage = Tolerate failures in 8% of accounts
aws cloudformation create-stack-instances \
--stack-set-name balanced-deployment \
--deployment-targets OrganizationalUnitIds=ou-development,ou-sandbox \
--regions us-east-1 eu-west-1 ap-southeast-1 \
--region us-east-1 \
--operation-preferences RegionConcurrencyType=PARALLEL,MaxConcurrentPercentage=25,FailureTolerancePercentage=8

AWS CLI to create balanced-deployment Stack Instances in parallel with low max concurrent percentage for a balanced deployment

Multi-Phase Implementation:

Step 1: Create the StackSet

# Create Stackset for a balanced creation
# StackSet operation managed from us-east-1
aws cloudformation create-stack-set \
--stack-set-name balanced-deployment \
--template-body file://balanced-template.yml \
--capabilities CAPABILITY_NAMED_IAM \
--permission-model SERVICE_MANAGED \
--auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false \
--region us-east-1

AWS CLI to create a balanced-deployment Stackset

Phase 1: Pilot Accounts (10% of target)

Phase 1: Create Pilot Stack Instances

You need to adjust the values of the OrganizationalUnitIds and regions parameters.

# Deploy monitoring baseline to production accounts
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1
# SEQUENTIAL = Deployment in sequence
# MaxConcurrentPercentage = 100% Deploy full speed for small pilot
# FailureTolerancePercentage = Zero tolerance in pilot
aws cloudformation create-stack-instances \
--stack-set-name balanced-deployment \
--deployment-targets Accounts=pilot-account-1,pilot-account-2 \
--regions us-east-1 \
--region us-east-1 \
--operation-preferences RegionConcurrencyType=SEQUENTIAL,MaxConcurrentPercentage=100,FailureTolerancePercentage=0

AWS CLI to create balanced-deployment Stack Instances sequentially for maximum safety in Pilot accounts

Wait for Pilot validation before proceeding to Phase 2

Phase 2: Early Adopter OUs (30% of target)

Phase 2: Create Early Adopter Stack Instances

You need to adjust the values of the OrganizationalUnitIds and regions parameters.

# Deploy monitoring baseline to production accounts
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1, eu-west-1
# PARALLEL = Deployment in parallel
# MaxConcurrentPercentage = Deploy to 25% of accounts at once
# FailureTolerancePercentage = Tolerate failures in 5% of accounts
aws cloudformation create-stack-instances \
--stack-set-name balanced-deployment \
--deployment-targets OrganizationalUnitIds=ou-early-adopter \
--regions us-east-1 \
--region us-east-1 eu-west-1 \
--operation-preferences RegionConcurrencyType=PARALLEL,MaxConcurrentPercentage=25,FailureTolerancePercentage=5

AWS CLI to create balanced-deployment Stack Instances in parallel with low max concurrent percentage for a balanced deployment in Early Adopter OU

Wait for Early Adopter validation before proceeding to Phase 3

Phase 3: Full Deployment (Remaining 60%)

Phase 3: Full Deployment

You need to adjust the values of the OrganizationalUnitIds and regions parameters.

# Deploy monitoring baseline to production accounts
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1, eu-west-1 and ap-southeast-1
# PARALLEL = Deployment in parallel
# MaxConcurrentPercentage = Deploy to 40% of accounts at once for higher speed after validation
# FailureTolerancePercentage = Tolerate failures in 10% of accounts for moderate tolerance
aws cloudformation create-stack-instances \
--stack-set-name balanced-deployment \
--deployment-targets OrganizationalUnitIds=ou-standard-prod,ou-legacy-prod \
--regions us-east-1 \
--region us-east-1 eu-west-1 ap-southeast-1 \
--operation-preferences RegionConcurrencyType=PARALLEL,MaxConcurrentPercentage=25,FailureTolerancePercentage=5

AWS CLI to create balanced-deployment Stack Instances in parallel with low max concurrent percentage for a balanced deployment in the remaining OUs

Using Step Functions for Orchestration

AWS Step Functions provides a serverless workflow service that can orchestrate StackSets deployments with advanced control flow, error handling, and state management capabilities. This approach enhances your multi-account deployments with features not available through standard StackSets operations alone.

Some of the Key Benefits include:

  • Advanced Deployment Orchestration: Coordinate multi-phase rollouts with validation gates
  • Human Approval Workflows: Implement manual approval steps for critical changes
  • Enhanced Error Handling: Define sophisticated retry policies and fallback mechanisms
  • Visual Monitoring: Track deployment progress through the Step Functions visual console

Real-World Use Case: Compliance Control Rollout

In regulated industries, AWS Step Functions enables a phased approach that combines automation with necessary governance. For instance, you can:

  1. Deploy compliance controls to test accounts
  2. Run automated validation and generate compliance reports
  3. Obtain manual approval from compliance team
  4. Deploy to production accounts with comprehensive monitoring

This approach ensures consistent governance while maintaining the complete audit trail required for regulatory compliance.

Monitoring and Optimization

AWS CloudFormation StackSets do not have extensive built-in Amazon CloudWatch metrics specifically designed for monitoring StackSet operations and health. This is actually why the monitoring implementation in our blog post is valuable.

Here’s what AWS does and doesn’t provide out of the box:

What AWS provides natively:

  • Basic AWS API call metrics via AWS CloudTrail (which show that operations happened but don’t track success rates or performance)
  • General service quotas and throttling metrics for CloudFormation as a whole
  • CloudFormation provides some metrics for individual stacks, but not consolidated StackSet-specific metrics

What requires custom implementation (as in our blog post):

  • Success rate metrics for StackSet operations across accounts
  • Deployment completion time tracking
  • Configuration drift detection and monitoring
  • Account-specific failure analysis
  • Comprehensive dashboards that show StackSet health across your organization

The code in our blog post demonstrates how to implement the success rate custom metrics by:

  1. Gathering data from the CloudFormation API about StackSet operations
  2. Calculating the success rate metrics for StackSet deployments
  3. Creating custom Amazon CloudWatch metrics in a custom namespace (like “StackSetMonitoring”)
  4. Setting up alerts for issues

This explains why organizations need to implement custom monitoring solutions like the one shown in our blog post rather than relying solely on built-in metrics.

Automated Monitoring Implementation: example of a custom metric to monitor the StackSet operations success rate

The following AWS Cloudformation template provides real-time monitoring and alerting for AWS CloudFormation StackSet operations through automated infrastructure deployment. This solution creates a complete monitoring system using a AWS Lambda function, Amazon EventBridge rules, Amazon SNS notifications, and Amazon CloudWatch dashboards to track StackSet success and failure rates. The core Lambda function named StackSetMonitor continuously monitors all active StackSets in your account, calculating success rates and publishing custom metrics to Amazon CloudWatch under the StackSetMonitoring namespace.

Below you’ll find a few example of possible custom metrics that could be implemented based on this AWS Cloudformation template:

  • Count of all operations (CREATE, UPDATE, DELETE) per StackSet over time periods
  • Number of stack instances with configuration drift (requires additional API calls)
  • Average time taken for StackSet operations to complete
  • Rate of StackSet operations to identify peak usage times
  • Number of individual stack instances that failed during operations
  • Number of retried operations (indicates infrastructure issues)

Here’s the StackSetMonitor.yml CloudFormation Template:

# StackSetMonitor.yml 
# CFN template for monitoring AWS CloudFormation StackSet operations with real-time alerts, metrics, and dashboards.

AWSTemplateFormatVersion: '2010-09-09'
Description: 'CloudFormation template for StackSet operation monitoring using CloudWatch and SNS'

Parameters:
  StackSetName:
    Type: String
    Description: 'Name of the StackSet to monitor'
    Default: 'security-baseline'
    MinLength: 1
    MaxLength: 128
    AllowedPattern: '[a-zA-Z][-a-zA-Z0-9]*'
    ConstraintDescription: 'Must be a valid StackSet name (1-128 characters, alphanumeric and hyphens, must start with a letter)'
  
  VpcId:
    Type: String
    Description: 'VPC ID where the Lambda function will be deployed (leave empty to create new VPC)'
    Default: ''
  
  SubnetIds:
    Type: CommaDelimitedList
    Description: 'List of subnet IDs for the Lambda function (leave empty to create new subnets)'
    Default: ''
    
  SecurityGroupIds:
    Type: CommaDelimitedList
    Description: 'List of security group IDs for the Lambda function (leave empty to create new security group)'
    Default: ''

Conditions:
  CreateVPC: !Equals [!Ref VpcId, '']
  CreateVPCAndSubnets: !And [!Equals [!Ref VpcId, ''], !Equals [!Join [',', !Ref SubnetIds], '']]
  HasCustomSecurityGroups: !Not [!Equals [!Join [',', !Ref SecurityGroupIds], '']]
  
Resources:
  # KMS Key for CloudWatch Logs encryption
  LogsKMSKey:
    Type: AWS::KMS::Key
    DeletionPolicy: Delete
    UpdateReplacePolicy: Delete
    Properties:
      Description: 'KMS Key for StackSet Monitor CloudWatch Logs and Lambda environment variable encryption'
      EnableKeyRotation: true
      KeyPolicy:
        Version: '2012-10-17'
        Statement:
          - Sid: Enable IAM User Permissions
            Effect: Allow
            Principal:
              AWS: !Sub 'arn:${AWS::Partition}:iam::${AWS::AccountId}:root'
            Action: 'kms:*'
            Resource: '*'
          - Sid: Allow CloudWatch Logs
            Effect: Allow
            Principal:
              Service: !Sub 'logs.${AWS::Region}.amazonaws.com'
            Action:
              - 'kms:Encrypt'
              - 'kms:Decrypt'
              - 'kms:ReEncrypt*'
              - 'kms:GenerateDataKey*'
              - 'kms:DescribeKey'
            Resource: '*'
            Condition:
              ArnEquals:
                'kms:EncryptionContext:aws:logs:arn': 
                  - !Sub 'arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/StackSetMonitor'
                  - !Sub 'arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/cloudformation/stacksets'
          - Sid: Allow Lambda Service
            Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action:
              - 'kms:Encrypt'
              - 'kms:Decrypt'
              - 'kms:ReEncrypt*'
              - 'kms:GenerateDataKey*'
              - 'kms:DescribeKey'
            Resource: '*'

  LogsKMSKeyAlias:
    Type: AWS::KMS::Alias
    Properties:
      AliasName: alias/stackset-monitor-logs
      TargetKeyId: !Ref LogsKMSKey

  # VPC Resources (created when no existing VPC is provided)
  StackSetMonitorVPC:
    Type: AWS::EC2::VPC
    Condition: CreateVPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: StackSetMonitor-VPC
        - Key: Purpose
          Value: VPC for StackSet Monitor Lambda function


  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      Tags:
        - Key: Name
          Value: StackSetMonitor-Private-Subnet-1
        - Key: Purpose
          Value: Private subnet for StackSet Monitor Lambda

  PrivateSubnet2:
    Type: AWS::EC2::Subnet
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      CidrBlock: 10.0.2.0/24
      AvailabilityZone: !Select [1, !GetAZs '']
      Tags:
        - Key: Name
          Value: StackSetMonitor-Private-Subnet-2
        - Key: Purpose
          Value: Private subnet for StackSet Monitor Lambda

  PrivateRouteTable1:
    Type: AWS::EC2::RouteTable
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      Tags:
        - Key: Name
          Value: StackSetMonitor-Private-RT-1

  PrivateRouteTable2:
    Type: AWS::EC2::RouteTable
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      Tags:
        - Key: Name
          Value: StackSetMonitor-Private-RT-2

  PrivateSubnet1RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Condition: CreateVPC
    Properties:
      RouteTableId: !Ref PrivateRouteTable1
      SubnetId: !Ref PrivateSubnet1

  PrivateSubnet2RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Condition: CreateVPC
    Properties:
      RouteTableId: !Ref PrivateRouteTable2
      SubnetId: !Ref PrivateSubnet2

  # VPC Endpoints for AWS Services (no internet access needed)
  CloudFormationVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.cloudformation
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - cloudformation:ListStackSets
              - cloudformation:ListStackSetOperations
              - cloudformation:ListStackInstances
              - cloudformation:DescribeStackInstance
              - cloudformation:DescribeStacks
              - cloudformation:GetTemplate
            Resource: '*'

  CloudWatchVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.monitoring
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - cloudwatch:PutMetricData
            Resource: '*'

  SNSVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.sns
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - sns:Publish
            Resource: '*'

  EventsVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.events
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - events:PutEvents
            Resource: '*'

  LogsVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.logs
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - logs:CreateLogGroup
              - logs:CreateLogStream
              - logs:PutLogEvents
            Resource: '*'

  SQSVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.sqs
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - sqs:SendMessage
            Resource: '*'

  STSVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.sts
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - sts:AssumeRole
              - sts:GetCallerIdentity
              - sts:AssumeRoleWithWebIdentity
            Resource: '*'

  # Security Group for Lambda function
  LambdaSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for StackSet Monitor Lambda function
      VpcId: !If
        - CreateVPC
        - !Ref StackSetMonitorVPC
        - !Ref VpcId
      SecurityGroupEgress:
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 10.0.0.0/16
          Description: HTTPS to VPC Endpoints
        - IpProtocol: tcp
          FromPort: 53
          ToPort: 53
          CidrIp: 10.0.0.0/16
          Description: DNS TCP to VPC for name resolution
        - IpProtocol: udp
          FromPort: 53
          ToPort: 53
          CidrIp: 10.0.0.0/16
          Description: DNS UDP to VPC for name resolution
      Tags:
        - Key: Name
          Value: StackSetMonitor-Lambda-SG
        - Key: Purpose
          Value: Security group for StackSet Monitor Lambda

  VPCEndpointSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Condition: CreateVPC
    Properties:
      GroupDescription: Security group for VPC Endpoints
      VpcId: !Ref StackSetMonitorVPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          SourceSecurityGroupId: !Ref LambdaSecurityGroup
          Description: HTTPS from Lambda security group
        - IpProtocol: tcp
          FromPort: 53
          ToPort: 53
          SourceSecurityGroupId: !Ref LambdaSecurityGroup
          Description: DNS TCP from Lambda security group
        - IpProtocol: udp
          FromPort: 53
          ToPort: 53
          SourceSecurityGroupId: !Ref LambdaSecurityGroup
          Description: DNS UDP from Lambda security group
      SecurityGroupEgress:
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 10.0.0.0/16
          Description: HTTPS outbound within VPC
        - IpProtocol: tcp
          FromPort: 53
          ToPort: 53
          CidrIp: 10.0.0.0/16
          Description: DNS TCP outbound within VPC
        - IpProtocol: udp
          FromPort: 53
          ToPort: 53
          CidrIp: 10.0.0.0/16
          Description: DNS UDP outbound within VPC
      Tags:
        - Key: Name
          Value: StackSetMonitor-VPCEndpoint-SG
        - Key: Purpose
          Value: Security group for VPC Endpoints

  # Dead Letter Queue for Lambda function
  StackSetMonitorDLQ:
    Type: AWS::SQS::Queue
    DeletionPolicy: Delete
    UpdateReplacePolicy: Delete
    Properties:
      QueueName: StackSetMonitor-DLQ
      MessageRetentionPeriod: 1209600  # 14 days
      KmsMasterKeyId: alias/aws/sqs
      Tags:
        - Key: Purpose
          Value: Dead Letter Queue for StackSet Monitor Lambda

  StackSetAlertsTopic:
    Type: AWS::SNS::Topic
    Properties: 
      TopicName: StackSetAlerts
      DisplayName: StackSet Monitoring Alerts
      KmsMasterKeyId: alias/aws/sns
  
  StackSetLogGroup:
    Type: AWS::Logs::LogGroup
    DeletionPolicy: Delete
    UpdateReplacePolicy: Delete
    Properties: 
      LogGroupName: /aws/cloudformation/stacksets
      RetentionInDays: 30
      KmsKeyId: !GetAtt LogsKMSKey.Arn

  LambdaLogGroup:
    Type: AWS::Logs::LogGroup
    DeletionPolicy: Delete
    UpdateReplacePolicy: Delete
    Properties:
      LogGroupName: /aws/lambda/StackSetMonitor
      RetentionInDays: 30
      KmsKeyId: !GetAtt LogsKMSKey.Arn
  
  StackSetMonitoringDashboard:
    Type: AWS::CloudWatch::Dashboard
    Properties:
      DashboardName: StackSetMonitoring
      DashboardBody: !Sub |
        {
          "widgets": [
            {
              "type": "metric",
              "width": 24,
              "height": 8,
              "properties": {
                "metrics": [
                  [ "StackSetMonitoring", "SuccessRate", "StackSetName", "${StackSetName}" ]
                ],
                "region": "${AWS::Region}",
                "title": "StackSet Operations",
                "period": 300,
                "stat": "Average"
              }
            },
            {
              "type": "log",
              "width": 24,
              "height": 6,
              "properties": {
                "query": "SOURCE '/aws/lambda/StackSetMonitor' | fields @timestamp, @message\n| sort @timestamp desc\n| limit 20",
                "region": "${AWS::Region}",
                "title": "Latest StackSet Monitor Logs",
                "view": "table"
              }
            }
          ]
        }
  
  # Consolidated rule to catch ALL StackSet events for comprehensive monitoring
  AllStackSetOperationsRule:
    Type: AWS::Events::Rule
    Properties:
      Name: AllStackSetOperationsRule
      Description: "Rule for monitoring all CloudFormation StackSet operations with failure notifications"
      EventPattern: {source: ["aws.cloudformation"], detail-type: ["CloudFormation StackSet Operation Status Change"]}
      State: ENABLED
      Targets:
        - Id: ProcessAllEvents
          Arn: !GetAtt StackSetMonitorLambda.Arn
        - Id: NotifyFailure
          Arn: !Ref StackSetAlertsTopic
          InputTransformer:
            InputPathsMap:
              "stackSetId": "$.detail.stack-set-id"
              "operationId": "$.detail.operation-id"
              "status": "$.detail.status"
              "time": "$.time"
            InputTemplate: '"StackSet Event: ID: <stackSetId>, Op: <operationId>, Status: <status>, Time: <time>"'

  StackSetMonitorLambda:
    Type: AWS::Lambda::Function
    DependsOn: LambdaLogGroup
    Properties:
      FunctionName: StackSetMonitor
      Handler: index.lambda_handler
      Role: !GetAtt StackSetMonitorRole.Arn
      Runtime: python3.12
      Timeout: 300
      MemorySize: 512
      ReservedConcurrentExecutions: 1
      DeadLetterConfig:
        TargetArn: !GetAtt StackSetMonitorDLQ.Arn
      VpcConfig:
        SecurityGroupIds: !If
          - HasCustomSecurityGroups
          - !Ref SecurityGroupIds
          - - !Ref LambdaSecurityGroup
        SubnetIds: !If
          - CreateVPCAndSubnets
          - - !Ref PrivateSubnet1
            - !Ref PrivateSubnet2
          - !Ref SubnetIds
      KmsKeyArn: !GetAtt LogsKMSKey.Arn
      Code:
        ZipFile: |
          import boto3
          import json
          import os
          import logging
          import time
          import datetime
          from typing import Dict, Any, Optional
          
          # Custom JSON encoder to handle datetime objects
          class DateTimeEncoder(json.JSONEncoder):
              def default(self, obj):
                  if isinstance(obj, datetime.datetime):
                      return obj.isoformat()
                  return super().default(obj)
          
          # Set up logging with more details
          logger = logging.getLogger()
          logger.setLevel(logging.INFO)
          
          # Log initialization to verify Lambda is loading correctly
          print("StackSetMonitor Lambda initializing...")
          
          def validate_event(event: Dict[str, Any]) -> bool:
              """Validate the incoming event structure"""
              if not isinstance(event, dict):
                  logger.error("Event must be a dictionary")
                  return False
              
              # If it's an EventBridge event, validate required fields
              if 'detail' in event:
                  detail = event.get('detail', {})
                  if not isinstance(detail, dict):
                      logger.error("Event detail must be a dictionary")
                      return False
                  
                  # Validate StackSet event structure
                  if 'stack-set-id' in detail:
                      stack_set_id = detail.get('stack-set-id')
                      if not isinstance(stack_set_id, str) or not stack_set_id.strip():
                          logger.error("stack-set-id must be a non-empty string")
                          return False
                      
                      # Validate operation-id if present
                      operation_id = detail.get('operation-id')
                      if operation_id is not None and not isinstance(operation_id, str):
                          logger.error("operation-id must be a string if provided")
                          return False
                      
                      # Validate status if present
                      status = detail.get('status')
                      if status is not None and not isinstance(status, str):
                          logger.error("status must be a string if provided")
                          return False
              
              return True
          
          def validate_context(context: Any) -> bool:
              """Validate the Lambda context object"""
              if context is None:
                  logger.error("Context cannot be None")
                  return False
              
              # Check for required context attributes
              required_attrs = ['function_name', 'function_version', 'invoked_function_arn', 'memory_limit_in_mb']
              for attr in required_attrs:
                  if not hasattr(context, attr):
                      logger.error(f"Context missing required attribute: {attr}")
                      return False
              
              return True
          
          def sanitize_string(value: str, max_length: int = 255) -> str:
              """Sanitize and truncate string inputs"""
              if not isinstance(value, str):
                  return str(value)[:max_length]
              return value.strip()[:max_length]
          
          def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
              """Main Lambda handler function for StackSet monitoring with input validation"""
              
              # Input validation
              if not validate_event(event):
                  return {
                      "statusCode": 400,
                      "body": json.dumps({
                          "status": "error",
                          "message": "Invalid event structure"
                      }, cls=DateTimeEncoder)
                  }
              
              if not validate_context(context):
                  return {
                      "statusCode": 400,
                      "body": json.dumps({
                          "status": "error",
                          "message": "Invalid context object"
                      }, cls=DateTimeEncoder)
                  }
              
              # Log the validated event for debugging
              logger.info(f"Event received: {json.dumps(event, cls=DateTimeEncoder)}")
              logger.info(f"Function: {context.function_name}, Version: {context.function_version}")
              
              try:
                  cf = boto3.client('cloudformation')
                  cw = boto3.client('cloudwatch')
                  
                  # Log that we're starting processing
                  logger.info(f"Starting StackSet monitoring at {time.time()}")
                  
                  # Check if this is an event from EventBridge
                  if 'detail' in event and 'stack-set-id' in event.get('detail', {}):
                      detail = event['detail']
                      stack_set_id = sanitize_string(detail['stack-set-id'])
                      operation_id = sanitize_string(detail.get('operation-id', 'N/A'))
                      status = sanitize_string(detail.get('status', 'N/A'))
                      
                      # Validate stack_set_id format
                      if not stack_set_id or len(stack_set_id) > 128:
                          logger.error(f"Invalid stack_set_id: {stack_set_id}")
                          return {
                              "statusCode": 400,
                              "body": json.dumps({
                                  "status": "error",
                                  "message": "Invalid stack_set_id format"
                              }, cls=DateTimeEncoder)
                          }
                      
                      # Log the StackSet operation with additional context
                      logger.info(f"Processing StackSet event - ID: {stack_set_id}, Op: {operation_id}, Status: {status}")
                      
                      # Extract stack set name from the ID
                      stack_set_name = stack_set_id.split('/')[-1] if '/' in stack_set_id else stack_set_id
                      stack_set_name = sanitize_string(stack_set_name, 128)
                      logger.info(f"Extracted StackSet name: {stack_set_name}")
                  
                  # Always gather metrics regardless of event type
                  # Get all active StackSets
                  stack_sets_response = cf.list_stack_sets(Status='ACTIVE')
                  stack_sets = stack_sets_response.get('Summaries', [])
                  
                  if not isinstance(stack_sets, list):
                      logger.error("Invalid response from list_stack_sets")
                      return {
                          "statusCode": 500,
                          "body": json.dumps({
                              "status": "error",
                              "message": "Invalid CloudFormation API response"
                          }, cls=DateTimeEncoder)
                      }
                  
                  logger.info(f"Found {len(stack_sets)} active StackSets")
                  
                  for stack_set in stack_sets:
                      if not isinstance(stack_set, dict) or 'StackSetName' not in stack_set:
                          logger.warning(f"Skipping invalid stack_set entry: {stack_set}")
                          continue
                      
                      stack_set_name = sanitize_string(stack_set['StackSetName'], 128)
                      logger.info(f"Processing StackSet: {stack_set_name}")
                      
                      try:
                          operations = cf.list_stack_set_operations(StackSetName=stack_set_name, MaxResults=5)
                          
                          # Validate operations response
                          if not isinstance(operations, dict):
                              logger.error(f"Invalid operations response for {stack_set_name}")
                              continue
                          
                          # Calculate success rate
                          successes = 0
                          operations_list = operations.get('Summaries', [])
                          
                          if not isinstance(operations_list, list):
                              logger.error(f"Invalid operations list for {stack_set_name}")
                              continue
                          
                          total_ops = len(operations_list)
                          logger.info(f"Found {total_ops} recent operations for {stack_set_name}")
                          
                          for op in operations_list:
                              if isinstance(op, dict) and op.get('Status') == 'SUCCEEDED':
                                  successes += 1
                          
                          success_rate = (successes / total_ops * 100) if total_ops > 0 else 100
                          
                          # Validate success_rate is within expected bounds
                          if not (0 <= success_rate <= 100):
                              logger.error(f"Invalid success_rate calculated: {success_rate}")
                              continue
                          
                          # Publish metrics to CloudWatch
                          cw.put_metric_data(
                              Namespace='StackSetMonitoring',
                              MetricData=[
                                  {'MetricName': 'SuccessRate', 'Value': success_rate, 
                                   'Dimensions': [{'Name': 'StackSetName', 'Value': stack_set_name}]}
                              ]
                          )
                          
                          logger.info(f"Published metrics for {stack_set_name}: Success Rate = {success_rate}%")
                      except Exception as e:
                          logger.error(f"Error processing StackSet {stack_set_name}: {str(e)}")
                  
                  return {
                      "statusCode": 200,
                      "body": json.dumps({
                          "status": "completed",
                          "message": f"Processed {len(stack_sets)} StackSets"
                      }, cls=DateTimeEncoder)
                  }
                  
              except Exception as e:
                  logger.error(f"Error in Lambda function: {str(e)}")
                  # Return a proper response even on error
                  return {
                      "statusCode": 500,
                      "body": json.dumps({
                          "status": "error",
                          "message": str(e)
                      }, cls=DateTimeEncoder)
                  }
  
  # Managed IAM Policies
  CloudFormationAccessPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: 'Policy for CloudFormation and CloudWatch access for StackSet Monitor'
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - cloudformation:ListStackSets
              - cloudformation:ListStackSetOperations
              - cloudformation:ListStackInstances
              - cloudformation:DescribeStackInstance
            Resource: 
              - !Sub "arn:${AWS::Partition}:cloudformation:${AWS::Region}:${AWS::AccountId}:stackset/*"
              - !Sub "arn:${AWS::Partition}:cloudformation:${AWS::Region}:${AWS::AccountId}:stackset-target/*"
          - Effect: Allow
            Action:
              - cloudwatch:PutMetricData
            Resource: "*"
            Condition:
              StringEquals:
                "cloudwatch:namespace": "StackSetMonitoring"
          - Effect: Allow
            Action:
              - sns:Publish
            Resource: !Ref StackSetAlertsTopic

  EventsAccessPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: 'Policy for EventBridge access for StackSet Monitor'
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - events:PutEvents
            Resource: !Sub "arn:${AWS::Partition}:events:${AWS::Region}:${AWS::AccountId}:event-bus/default"

  LogsAccessPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: 'Policy for CloudWatch Logs access for StackSet Monitor'
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - logs:CreateLogGroup
              - logs:CreateLogStream
              - logs:PutLogEvents
            Resource: 
              - !Sub "arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/StackSetMonitor"
              - !Sub "arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/StackSetMonitor:*"
              - !Sub "arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/cloudformation/stacksets"
              - !Sub "arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/cloudformation/stacksets:*"

  DLQAccessPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: 'Policy for Dead Letter Queue access for StackSet Monitor'
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - sqs:SendMessage
            Resource: !GetAtt StackSetMonitorDLQ.Arn

  StackSetMonitorRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole
        - !Ref CloudFormationAccessPolicy
        - !Ref EventsAccessPolicy
        - !Ref LogsAccessPolicy
        - !Ref DLQAccessPolicy

  # Permissions for event rules to invoke Lambda
  AllOperationsRuleLambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref StackSetMonitorLambda
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt AllStackSetOperationsRule.Arn
  
  # Using a one minute schedule for testing, but you can change this value
  StackSetMonitorSchedule:
    Type: AWS::Events::Rule
    Properties:
      Name: RegularStackSetMonitoring
      Description: "Triggers Lambda function every 1 minute to check StackSet operations"
      ScheduleExpression: "rate(1 minute)"
      State: ENABLED
      Targets:
        - Id: RunMonitor
          Arn: !GetAtt StackSetMonitorLambda.Arn
  
  ScheduleLambdaInvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref StackSetMonitorLambda
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt StackSetMonitorSchedule.Arn
  
  StackSetSuccessRateAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: "Alarm when StackSet operation success rate is low"
      MetricName: SuccessRate
      Namespace: "StackSetMonitoring"
      Statistic: Average
      Period: 300
      EvaluationPeriods: 3
      DatapointsToAlarm: 2
      Threshold: 80
      ComparisonOperator: LessThanThreshold
      AlarmActions: [!Ref StackSetAlertsTopic]
      Dimensions: [{Name: StackSetName, Value: !Ref StackSetName}]

Outputs:
  SNSTopicArn: 
    Description: The ARN of the SNS topic for alerts
    Value: !Ref StackSetAlertsTopic
  DashboardURL: 
    Description: URL to the CloudWatch Dashboard
    Value: !Sub https://console.aws.amazon.com/cloudwatch/home?region=${AWS::Region}#dashboards:name=StackSetMonitoring
  LambdaLogGroupName:
    Description: Name of the CloudWatch Log Group for Lambda logs
    Value: !Ref LambdaLogGroup
  DeadLetterQueueArn:
    Description: ARN of the Dead Letter Queue for Lambda function failures
    Value: !GetAtt StackSetMonitorDLQ.Arn
  DeadLetterQueueURL:
    Description: URL of the Dead Letter Queue for monitoring failed Lambda executions
    Value: !Ref StackSetMonitorDLQ
  TestLambdaCommand:
    Description: Command to manually test the Lambda function
    Value: !Sub "aws lambda invoke --function-name ${StackSetMonitorLambda} --payload '{}' response.json && cat response.json"
  LambdaFunctionArn:
    Description: ARN of the Lambda function configured with VPC
    Value: !GetAtt StackSetMonitorLambda.Arn
  LambdaSecurityGroupId:
    Description: Security Group ID created for the Lambda function
    Value: !Ref LambdaSecurityGroup
  VpcConfiguration:
    Description: VPC configuration summary for the Lambda function
    Value: !Sub 
      - "VPC: ${VpcId}, Subnets: ${SubnetList}, Security Groups: ${LambdaSecurityGroup}"
      - SubnetList: !Join [',', !Ref SubnetIds]

You need to run the following CLI command to deploy the CloudFormation stacks. You can change the ParameterValue of StackSetName“your-stackset-name” by the name of the StackSet you want to monitor. The default value is “security-baseline”. Your CLI profile should use region=“us-east-1“.

aws cloudformation create-stack --stack-name stackset-monitor --template-body file://StackSetMonitor.yml --parameters ParameterKey=StackSetName,ParameterValue="security-baseline" --capabilities CAPABILITY_IAM

AWS CLI to deploy the StackSetMonitor.yml CloudFormation template

The CLI output should look like the following:

{"StackId": "arn:aws:cloudformation:...."}

Here’s the expected output for the CloudFormation template:

StackSetMonitor Console output

StackSetMonitor Console output

And an example of Amazon CloudWatch Dashboard and Alarm screen:

Amazon CloudWatch Dashboard screenshot for StackSetMonitor stack to track StackSet operations success rate

Amazon CloudWatch Dashboard screenshot for StackSetMonitor stack to track StackSet operations success rate

Amazon CloudWatch Alarm screenshot for StackSetMonitor stack to track StackSet operations success rate

Amazon CloudWatch Alarm screenshot for StackSetMonitor stack to track StackSet operations success rate

SNS subscription setup involves retrieving the topic ARN from stack outputs and configuring notifications for email or SMS endpoints (below example CLI for email subscription):

aws sns subscribe --topic-arn $SNS_TOPIC_ARN --protocol email --notification-endpoint [email protected]

AWS CLI to subscribe to the topic providing the user email

Cost:

The estimated monthly expenses ranges between 5 and 15 USD depending on StackSet activity levels, with approximately 2,880 Lambda executions per day (each minute) under the default monitoring schedule.

The solution supports customization of monitoring frequency by modifying the ScheduleExpression from the default one-minute interval. The cost will decrease if the monitoring is less frequent.

Cleanup:

For cleanup, you can run the following command lines:

  • To cleanup the Stack Instances and StackSets created in the Core Deployment Strategies section:

aws cloudformation delete-stack-instances --stack-set-name security-baseline --deployment-targets OrganizationalUnitIds=ou-xxx --regions us-east-1 eu-west-1 --region us-east-1 --no-retain-stack

AWS CLI to delete the Stack Instances

You need to change the parameter OrganizationalUnitIds value with the name of the OU, the parameter regions with the list of regions where you want to delete your stack instances, and the value of the stack-set-name parameter (security-baseline, monitoring-baseline, balanced-deployment…).

Then you can delete the StackSet:

aws cloudformation delete-stack-set --stack-set-name security-baseline

AWS CLI to delete the StackSet

You can change the value of the stack-set-name parameter.

  • To cleanup the stackset-monitor stack

aws cloudformation delete-stack --stack-name stackset-monitor

AWS CLI to delete the stackset-monitor Stack

You can also remove any IAM roles/policies that you specifically created for this blog that you might not need anymore

Conclusion

Throughout this guide, we’ve explored the nuanced approaches to AWS CloudFormation StackSets deployments across large-scale environments. The key takeaways include:

  • Balance is Critical: Every deployment strategy requires careful consideration of the trade-offs between speed, safety, and scale based on your organizational needs.
  • Progressive Adoption Works: For most organizations, a progressive deployment approach with validation gates provides the optimal balance of safety and efficiency.
  • Organizational Context Matters: Enterprise, startup, and regulated industry patterns demonstrate that deployment strategies should be tailored to your specific business requirements and risk tolerance.
  • Monitoring is Essential: As organizations scale to hundreds of accounts, comprehensive monitoring becomes critical for maintaining visibility and ensuring compliance.

These different approaches will help you adopt the right strategy for your AWS CloudFormation Stacksets deployments in your AWS Organization.

You can now test these different approaches on your sandbox environment, before adapting them for your specific needs, in order to balance Speed, Safety and Scale to optimize your deployments.

Amar Meriche

Amar is a Sr Cloud Operations Architect at AWS in Paris. He helps his customers improve their operational posture through advocacy and guidance, and is an active member of the DevOps and IaC community at AWS. He’s passionate about helping customers use the various IaC tools available at AWS following best practices. When he’s not working with customers, Amar can be found on the mountain trails with his family or playing basketball with his team.

Idriss Laouali Abdou

Idriss is a Sr. Product Manager Technical for AWS Infrastructure-as-Code based in Seattle. He focuses on improving developer productivity through StackSets and CloudFormation Infrastructure provisioning experiences. Outside of work, you can find him creating educational content for thousands of students, cooking, or dancing.

Moeve: Controlling resource deployment at scale with AWS CloudFormation Guard Hooks

Post Syndicated from Pablo Sánchez Carmona original https://aws.amazon.com/blogs/devops/moeve-controlling-resource-deployment-at-scale-with-aws-cloudformation-guard-hooks/

This post is co-written with Rayco Martínez Hernández, Head of Cloud Governance at Moeve.

Moeve, formerly known as Cepsa, is a global integrated energy company with over 90 years of experience and more than 11,000 employees. Moeve is committed to driving Europe’s energy transition and accelerating decarbonization efforts. The company has embraced digital transformation to enhance energy efficiency, safety, and sustainability, focusing on investments in green hydrogen, second-generation biofuels, and ultra-fast electric vehicle charging infrastructure.

At Moeve, we decided to make AWS Control Tower our central governance tool and the foundation of our landing zone at the end of 2022. However, as an organization that wants to ensure that all deployed resources comply with the established requirements, it was challenging for us to remediate errors or vulnerabilities that arise when resources were deployed without compliance with our security definitions. The foundation of controls should be proactive. This is where AWS CloudFormation Hooks, along with other AWS measures like Service Control Policies (SCPs), play a differential role.

We have become familiar with CloudFormation Hooks thanks to the Guard Rules that we deploy as part of our proactive deployment policy on AWS. There are times when you want to block the deployment of Amazon API Gateway without security, Amazon VPC security groups with source 0.0.0.0/0, or with an ALL port range open. In these and other cases, we want to take a step further and create our own controls that are more in line with our own policies, and now we can do so in a simple and agile way, using the managed hooks launched in November 2024.

To be able to use these tools, it is essential, among other things, to ensure that resource deployments are only done through Infrastructure as Code (IaC) tools.

Would you like to know how we achieved it? Let’s get to it!

Background

At Moeve, we ensure that all our deployments within our organization are done through IaC. We enforce this by requiring all deployments to go through CloudFormation, which also allows us to enforce organizational policies using CloudFormation Guard Hooks.

However, for teams with more advanced technical expertise, we allow the use of the AWS Cloud Development Kit (CDK). The AWS CDK enables developers to define infrastructure using general-purpose programming languages, which facilitates code reuse, modular design, and better integration with existing development workflows. It provides high-level abstractions that accelerate the definition of common AWS patterns, while also allowing low-level control when needed. Even though it introduces an abstraction layer, the CDK synthesizes into standard CloudFormation templates, maintaining full compatibility with our governance and compliance mechanisms based on CloudFormation Guard Hooks.

We have several ways to perform deployments: directly launching actions against CloudFormation from the AWS Command Line Interface (CLI), through pipelines, or even by executing actions from code. However, the common basis for these deployments is that they cannot be done with Permission Sets associated with individuals. Users do not have access to deploy resources directly; they have read access and can assume a role that can deploy resources.

To make this more user-friendly, we have a small tool that assumes the role with enough permissions and deploys the template code we specify, with just a call like this:

cloudformation-deployer test.yml

It is crucial to make these controls easy for developers if you want them to comply with the established security measures.

Solution

At Moeve, as a best practice, we have delegated the management of Cloudformation StackSets to an AWS account different from the management account. In this account, we deploy an Amazon S3 bucket where we will store all the files for the CloudFormation Guard hooks or the AWS Lambda functions if they are Lambda type. In Figure 1, you can see a simplified version of our multi-account architecture when deploying resources using StackSets.

Multi-Account structure when building CloudFormation resources using CloudFormation StackSets. There are two central accounts (Management & Delegated Admin) and two child Accounts.

Figure 1. AWS CloudFormation StackSets configuration in a multi-Account environment

An example of a stack that manages the buckets can be seen below:

AWSTemplateFormatVersion: "2010-09-09"

Description: |
  This template creates an Amazon S3 bucket that you can use to deploy an AWS CloudFormation hook.

Parameters:
  GuardHooksBucketName:
    Type: String
    Description: Name for S3 bucket storing the CloudFormation Guard hooks
  OrgId:
    Description: Organization Id which is in the format o-xyzabcdefg
    Type: String
    AllowedPattern: ^o-[a-z0-9]{10,32}$
  
Resources:
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub ${GuardHooksBucketName}-${AWS::Region}-${AWS::AccountId}
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
      LifecycleConfiguration:
        Rules:
          - Id: MultipartClean
            Status: Enabled
            AbortIncompleteMultipartUpload:
              DaysAfterInitiation: 5
      Tags:
        - Key: project
          Value: shared
      VersioningConfiguration:
        Status: Enabled

  S3BucketPolicy:
    Type: AWS::S3::BucketPolicy
    Properties:
      Bucket: !Ref S3Bucket
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Sid: AllowSSLRequestsOnly
            Effect: Deny
            Action: s3:*
            Resource:
              - !Sub ${S3Bucket.Arn}/* # "arn:aws:s3:::arn:aws:s3:::<bucket-name>/<path>/*"
              - !Sub ${S3Bucket.Arn} # "arn:aws:s3:::arn:aws:s3:::<bucket-name>/<path>/"
            Principal: '*'
            Condition:
              Bool:
                aws:SecureTransport: 'false'
          - Sid: AllowOrgAccountsDeployAccess
            Effect: Allow
            Principal:
              AWS: "*"
            Action: "s3:GetObject"
            Resource: !Join
              - ""
              - - "arn:aws:s3:::"
                - !Ref S3Bucket
                - /*
            Condition:
              ForAnyValue:StringLike:
                "aws:PrincipalOrgPaths": !Sub "${OrgId}/*"

With this, we achieve having a centralized S3 bucket with an access policy according to which anyone within our organization can access and retrieve the objects. Versioning is configured to keep previous versions of the hooks we deploy. Then, in the bucket, we will store the files of our hooks, differentiating them by folders.

Child Accounts

For the child accounts, we will deploy a StackSet in the central account over the Organizational Units (OUs) that are defined. Auto-deployment will be configured so that all new Accounts added to the OU acquire these same hooks.

Check the example below where an S3 bucket will be deployed to store the logs of the hooks with the IAM role that the hooks will use, and two hooks: to evaluate the creation and update of API Gateway.

AWSTemplateFormatVersion: "2010-09-09"

Description: |
  Registers the hook in the AWS CloudFormation Private Registry and bucket to logs.

Parameters:
  CustomHooksLogBucketName:
    Type: String
    Description: Name for S3 bucket storing the CloudFormation Guard hook logs
  GuardHooksBucketName:
    Type: String
    Description: Name for S3 bucket storing the CloudFormation Guard hooks

Resources:
  # S3 bucket used to store logs generated by CloudFormation Guard hooks
  LogBucket:
    Type: AWS::S3::Bucket
    Properties:
     # Bucket name is built dynamically with account ID and region
      BucketName: !Sub ${CustomHooksLogBucketName}-${AWS::AccountId}-${AWS::Region}
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256	# Enforce AES256 encryption
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
      LifecycleConfiguration:		# Manage bucket lifecycle
        Rules:
          - Id: MultipartClean
            Status: Enabled
            AbortIncompleteMultipartUpload:
              DaysAfterInitiation: 5	# Abort incomplete uploads after 5 days
          - Id: ExpireAfterOneWeek
            Status: Enabled
            Prefix: ""
            ExpirationInDays: 7	# Expire objects after 7 days
      Tags:
        - Key: project
          Value: shared
  # Bucket policy that enforces SSL-only access to the log bucket
  S3BucketPolicy:
    Type: AWS::S3::BucketPolicy
    Properties:
      Bucket: !Ref LogBucket
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Sid: AllowSSLRequestsOnly
            Effect: Deny
            Action: s3:*
            Resource:
              - !Sub ${LogBucket.Arn}/* # "arn:aws:s3:::arn:aws:s3:::<bucket-name>/<path>/*" # Apply to all objects in the bucket
              - !Sub ${LogBucket.Arn} # "arn:aws:s3:::arn:aws:s3:::<bucket-name>/<path>/" # Apply to the bucket itself
            Principal: '*'
            Condition:
              Bool:
                aws:SecureTransport: 'false' 	# Deny if not using HTTPS
   # IAM Role that CloudFormation Guard hooks assume to interact with S3
  GuardHookRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:		# Trust policy for CloudFormation hooks
        Statement:
          - Action: sts:AssumeRole
            Condition:
              StringEquals:
                aws:SourceAccount: !Ref 'AWS::AccountId'
              StringLike:
                'aws:SourceArn': !Sub arn:${AWS::Partition}:cloudformation:${AWS::Region}:${AWS::AccountId}:type/hook/Moeve-*/*
            Effect: Allow
            Principal:
              Service:
                - hooks.cloudformation.amazonaws.com
                - resources.cloudformation.amazonaws.com
        Version: "2012-10-17"
      MaxSessionDuration: 8400
      Path: /
      Policies:
        - PolicyName: HookS3Policy		# Policy granting permissions to write/read from the log bucket
          PolicyDocument:
            Statement:
              - Action:
                - 's3:GetEncryptionConfiguration'
                - 's3:ListAllMyBuckets'
                - 's3:ListBucket'
                - 's3:GetObject'
                - 's3:PutObject'
                Effect: Allow
                Resource:
                  - !Sub arn:${AWS::Partition}:s3:::${LogBucket}
                  - !Sub arn:${AWS::Partition}:s3:::${LogBucket}/*
            Version: "2012-10-17"
        - PolicyName: HookGeneralS3Policy	# Policy granting read access to the bucket containing Guard rules
          PolicyDocument:
            Statement:
              - Action:
                - 's3:GetEncryptionConfiguration'
                - 's3:ListBucket'
                - 's3:GetObject'
                Effect: Allow
                Resource:
                  - !Sub arn:${AWS::Partition}:s3:::${GuardHooksBucketName}
                  - !Sub arn:${AWS::Partition}:s3:::${GuardHooksBucketName}/*
            Version: "2012-10-17"
      Tags:
        - Key: project
          Value: shared

   # Guard hook that validates API Gateway methods before provisioning
  GuardHookApiGateway: 
    Type: AWS::CloudFormation::GuardHook
    Properties:
      ExecutionRole: !GetAtt GuardHookRole.Arn
      LogBucket: !Ref LogBucket
      Alias: Moeve::ApiGatewayAuthorization::Coe
      FailureMode: FAIL	# Fail the operation if validation fails
      HookStatus: ENABLED
      TargetOperations:
        - RESOURCE	# Applies at the resource level
      TargetFilters:
        Actions:
          - CREATE		# Triggered on CREATE operations
        TargetNames:
          - AWS::ApiGateway::Method
        InvocationPoints:
          - PRE_PROVISION		# Runs before resource provisioning
      RuleLocation:
        Uri: !Sub
          - s3://${GuardS3Bucket}/${GuardS3File}	# Location of Guard rules in S3
          - GuardS3Bucket: !Ref GuardHooksBucketName
            GuardS3File: APIGATEWAY/ApiGatewaySecureMethod.guard
      StackFilters:
        FilteringCriteria: ALL
        StackNames:
          Exclude:
            - !Ref AWS::StackName	# Exclude the current stack from evaluation

   # Guard hook that validates API Gateway at stack UPDATE level
  GuardHookApiGatewaySR:
    Type: AWS::CloudFormation::GuardHook
    Properties:
      ExecutionRole: !GetAtt GuardHookRole.Arn
      LogBucket: !Ref LogBucket
      Alias: Moeve::ApiGatewayAuthorizationwithSR::Coe
      FailureMode: FAIL
      HookStatus: ENABLED
      TargetOperations:
        - STACK		# Applies at the stack level
      TargetFilters:
        Actions:
          - UPDATE		# Triggered on UPDATE operations
      RuleLocation:
        Uri: !Sub
          - s3://${GuardS3Bucket}/${GuardS3File}
          - GuardS3Bucket: !Ref GuardHooksBucketName
            GuardS3File: APIGATEWAY/ApiGatewaySecureMethodwithSR.guard
      StackFilters:
        FilteringCriteria: ALL
        StackNames:
          Exclude:
            - !Ref AWS::StackName

It is very important to configure backdoors to avoid blocking our own deployments when working with hooks deployed with CloudFormation StackSets across our organization or in OUs with multiple accounts. For this, we will configure a filter in the hook so that it does not activate on the stack that manages them. If this filter is not applied, it may happen that due to a misconfiguration, the stack cannot be updated and has to be deleted entirely.

Examples

In the following examples, we are going to ensure that no one can deploy an unsecured API Gateway, but at the same time, we do not want to break the current deployments. For this, we have defined two hooks.

When the custom hooks are triggered during deployment, they analyze the CloudFormation template and targets resources of type AWS::ApiGateway::Method, excluding methods that use the OPTIONS HTTP verb. The hooks apply two sequential validation rules to ensure that all API operations are properly secured.

The first rule checks whether each method defines either the ApiKeyRequired or the AuthorizationType property. If neither is present, the hook fails with a clear message: “Fallo en el paso 1 porque no existe ApiKeyRequired o AuthorizationType” (“Step 1 failed because there is no ApiKeyRequired or AuthorizationType”). If the first condition is satisfied, the second rule verifies whether the values themselves enforce security. Specifically, it checks that ApiKeyRequired is set to true, or that AuthorizationType is defined and not set to NONE. If not, the deployment is blocked again with the message: “Fallo en el paso 2, porque existe AuthorizationType o ApiKeyRequired, pero no son valores válidos.” (“Step 2 failed, because AuthorizationType or ApiKeyRequired exists, but they are not valid values.”).

If any of these rules fail, CloudFormation immediately stops the deployment before any resources are created. This avoids partial or insecure configurations and ensures consistency with organizational security standards. The error appears directly in the CloudFormation console or in CI/CD logs and includes the custom message defined in the hook, helping developers quickly identify the issue.

The development team receives immediate feedback during deployment, whether through their CI/CD pipeline (like CodePipeline or GitHub Actions) or the AWS console. The hook’s clear, custom messages make it easy to pinpoint and fix issues. In more mature environments, these failures can also trigger alerts, update dashboards, or create automated tickets, ensuring security enforcement without slowing down delivery.

The responsibility to fix the issue usually lies with the same team that wrote the template, since the error occurs before any infrastructure is provisioned. This approach allows teams to move quickly while still respecting the compliance and security controls in place. In cases where the issue is tied to a broader policy update, the resolution might involve collaboration with the platform or security team

Hook 1: Activates when an API Gateway is created. This hook checks at the resource level for Resources.Type == 'AWS::ApiGateway::Method', and if it does not meet the requirements, this resource cannot be deployed.

let api_gateway_method = Resources.*[ Type == 'AWS::ApiGateway::Method'
  Properties.HttpMethod != /(?i)options/  
]

#
# Primary Rules
#

rule api_gw_authorization_method_check when %api_gateway_method !empty {
    %api_gateway_method{
        Properties.ApiKeyRequired exists or
        Properties.AuthorizationType exists
        <<Fallo en el paso 1 Porque no existe ApiKeyRequired o AuthorizationType>>
    }
}

rule api_gw_authorization_method_check_2 when api_gw_authorization_method_check {
    %api_gateway_method{
        Properties.ApiKeyRequired == true or
        Properties.AuthorizationType != /(?i)none/
        <<Fallo en el paso 2, porque existe AuthorizationType o ApiKeyRequired, pero no son valores validos. >>
    }
}

Hook 2: Activates when an API Gateway is updated. This hook checks at the resource level for Resources.Type == 'AWS::ApiGateway::Method', and if it does not meet the requirements, this resource cannot be deployed.

let api_gateway_method = Resources.*[ Type == 'AWS::ApiGateway::Method'
    Properties.HttpMethod != /(?i)options/  
    Metadata.guard.SuppressedRules not exists or
    Metadata.guard.SuppressedRules.* != "API_GW_METHOD_AUTHORIZATION_TYPE_RULE"
]

#
# Primary Rules
#

rule api_gw_authorization_method_check when %api_gateway_method !empty {
    %api_gateway_method{
        Properties.ApiKeyRequired exists or
        Properties.AuthorizationType exists
        <<Fallo en el paso 1 Porque no existe ApiKeyRequired o AuthorizationType>>
    }
}


rule api_gw_authorization_method_check_2 when api_gw_authorization_method_check {
    %api_gateway_method{
        Properties.ApiKeyRequired == true or
        Properties.AuthorizationType != /(?i)none/
        <<Fallo en el paso 2, porque existe AuthorizationType o ApiKeyRequired, pero no son valores validos. >>
    }
}

To avoid breaking existing deployments, a backdoor is configured so that if specific metadata is applied, the hook will not activate, and deployments can continue without modifying the code beyond the metadata.

AWSTemplateFormatVersion: '2010-09-09'

Description: |
  This template creates an Amazon API Gateway REST API. It shows an example on how to suppress specific checks from CloudFormation Guard.

Resources:
  Api:
    Type: AWS::ApiGateway::RestApi
    Properties:
      Name: myAPI
  MyApiGatewayMethod:
    Type: AWS::ApiGateway::Method
    Metadata:
      guard:
        SuppressedRules:
          - API_GW_METHOD_AUTHORIZATION_TYPE_RULE
    Properties:
      RestApiId: !Ref Api
      ResourceId: !GetAtt Api.RootResourceId
      HttpMethod: GET
      AuthorizationType: 'NONE'
      ApiKeyRequired: false
      Integration:
        IntegrationHttpMethod: 'POST'
        Type: 'MOCK'

  OptionsMethod:
    Type: 'AWS::ApiGateway::Method'
    Properties:
      AuthorizationType: 'NONE'
      HttpMethod: 'OPTIONS'
      RestApiId: !Ref Api
      ResourceId: !GetAtt Api.RootResourceId
      Integration:
        IntegrationHttpMethod: 'POST'
        Type: 'MOCK'

Conclusion

The implementation of our own hook controls in CloudFormation has significantly improved infrastructure management and deployment within Moeve. This capability has allowed us to achieve a balance between flexibility, autonomy, and governance at scale. One of the key benefits is the automation of validations and specific controls, which helps us reduce manual errors and ensure greater consistency in deployments. This also enhances traceability and simplifies infrastructure maintenance, ensuring that our configurations adhere to established best practices.

From a security perspective, having custom rules allows us to strengthen regulatory compliance and minimize risks. We can ensure that only secure configurations aligned with our operational needs are implemented, reducing vulnerabilities and improving our cloud security posture. Preventing incorrect configurations from the start of the resource lifecycle also contributes to greater system stability and resilience. By avoiding infrastructure failures, we create more reliable environments that are prepared for growth.

Additionally, by ensuring optimal configurations during resource deployment, we optimize resource usage and avoid bottlenecks, resulting in better performance. This, in turn, helps us manage costs more effectively by preventing the use of unnecessary or oversized resources. Also, automating controls reduces manual intervention and minimizes waste, making our operations more efficient and sustainable over time.

In summary, the creation and management of our own hooks in CloudFormation provide us with full control over our infrastructure, ensuring an optimal balance between security, scalability, and operational efficiency. This strengthens our ability to innovate without compromising governance, enabling us to operate more agilely and securely in the cloud.

About the author(s)

Rayco Martínez Hernández

Rayco is Head of Cloud Governance at Moeve, building secure, compliant, and efficient cloud environments. He holds a degree in Computer Science and currently pursuing a Master’s degree in Cybersecurity. Passionate about cloud strategy and governance, Rayco focuses on adopting AWS with confidence and control. When not working on cloud initiatives, he enjoys exploring new technologies and spending time outdoors in the Canary Islands.

Pablo Sánchez Carmona

Pablo is a Senior Network Specialist Solutions Architect at AWS, where he helps customers to design secure, resilient and cost-effective networks. When not talking about Networking, Pablo can be found playing basketball or video-games. He holds a MSc in Electrical Engineering from the Royal Institute of Technology (KTH), and a Master’s degree in Telecommunications Engineering from the Polytechnic University of Catalonia (UPC).

Ignacio Rodríguez García

Ignacio is a Technical Account Manager at AWS. In his role, he provides advocacy and strategical technical guidance in order to help customers plan and build solutions using AWS best practices. He holds a MSc in Engineering from Télécom Paris, and a Master’s degree in Telecommunications Engineering from the Polytechnic University of Madrid (UPM). In his free time, Ignacio enjoys playing sports, spending time with his friends, and traveling.

Deploy CloudFormation Hooks to an Organization with service-managed StackSets

Post Syndicated from Kirankumar Chandrashekar original https://aws.amazon.com/blogs/devops/deploy-cloudformation-hooks-to-an-organization-with-service-managed-stacksets/

This post demonstrates using AWS CloudFormation StackSets to deploy CloudFormation Hooks from a centralized delegated administrator account to all accounts within an Organization Unit(OU). It provides step-by-step guidance to deploy controls at scale to your AWS Organization as Hooks using StackSets. By following this post, you will learn how to deploy a hook to hundreds of AWS accounts in minutes.

AWS CloudFormation StackSets help deploy CloudFormation stacks to multiple accounts and regions with a single operation. Using service-managed permissions, StackSets automatically generate the IAM roles required to deploy stack instances, eliminating the need for manual creation in each target account prior to deployment. StackSets provide auto-deploy capabilities to deploy stacks to new accounts as they’re added to an Organizational Unit (OU) in AWS Organization. With StackSets, you can deploy AWS well-architected multi-account solutions organization-wide in a single click and target stacks to selected accounts in OUs. You can also leverage StackSets to auto deploy foundational stacks like networking, policies, security, monitoring, disaster recovery, billing, and analytics to new accounts. This ensures consistent security and governance reflecting AWS best practices.

AWS CloudFormation Hooks allow customers to invoke custom logic to validate resource configurations before a CloudFormation stack create/update/delete operation. This helps enforce infrastructure-as-code policies by preventing non-compliant resources. Hooks enable policy-as-code to support consistency and compliance at scale. Without hooks, controlling CloudFormation stack operations centrally across accounts is more challenging because governance checks and enforcement have to be implemented through disjointed workarounds across disparate services after the resources are deployed. Other options like Config rules evaluate resource configurations on a timed basis rather than on stack operations. And SCPs manage account permissions but don’t include custom logic tailored to granular resource configurations. In contrast, CloudFormation hooks allows customer-defined automation to validate each resource as new stacks are deployed or existing ones updated. This enables stronger compliance guarantees and rapid feedback compared to asynchronous or indirect policy enforcement via other mechanisms.

Follow the later sections of this post that provide a step-by-step implementation for deploying hooks across accounts in an organization unit (OU) with a StackSet including:

  1. Configure service-managed permissions to automatically create IAM roles
  2. Create the StackSet in the delegated administrator account
  3. Target the OU to distribute hook stacks to member accounts

This shows how to easily enable a policy-as-code framework organization-wide.

I will show you how to register a custom CloudFormation hook as a private extension, restricting permissions and usage to internal administrators and automation. Registering the hook as a private extension limits discoverability and access. Only approved accounts and roles within the organization can invoke the hook, following security best practices of least privilege.

StackSets Architecture

As depicted in the following AWS StackSets architecture diagram, a dedicated Delegated Administrator Account handles creation, configuration, and management of the StackSet that defines the template for standardized provisioning. In addition, these centrally managed StackSets are deploying a private CloudFormation hook into all member accounts that belong to the given Organization Unit. Registering this as a private CloudFormation hook enables administrative control over the deployment lifecycle events it can respond to. Private hooks prevent public usage, ensuring the hook can only be invoked by approved accounts, roles, or resources inside your organization.

Architecture for deploying CloudFormation Hooks to accounts in an Organization

Diagram 1: StackSets Delegated Administration and Member Account Diagram

In the above architecture, Member accounts join the StackSet through their inclusion in a central Organization Unit. By joining, these accounts receive deployed instances of the StackSet template which provisions resources consistently across accounts, including the controlled private hook for administrative visibility and control.

The delegation of StackSet administration responsibilities to the Delegated Admin Account follows security best practices. Rather than having the sensitive central Management Account handle deployment logistics, delegation isolates these controls to an admin account with purpose-built permissions. The Management Account representing the overall AWS Organization focuses more on high-level compliance governance and organizational oversight. The Delegated Admin Account translates broader guardrails and policies into specific infrastructure automation leveraging StackSets capabilities. This separation of duties ensures administrative privileges are restricted through delegation while also enabling an organization-wide StackSet solution deployment at scale.

Centralized StackSets facilitate account governance through code-based infrastructure management rather than manual account-by-account changes. In summary, the combination of account delegation roles, StackSet administration, and joining through Organization Units creates an architecture to allow governed, infrastructure-as-code deployments across any number of accounts in an AWS Organization.

Sample Hook Development and Deployment

In the section, we will develop a hook on a workstation using the AWS CloudFormation CLI, package it, and upload it to the Hook Package S3 Bucket. Then we will deploy a CloudFormation stack that in turn deploys a hook across member accounts within an Organization Unit (OU) using StackSets.

The sample hook used in this blog post enforces that server-side encryption must be enabled for any S3 buckets and SQS queues created or updated on a CloudFormation stack. This policy requires that all S3 buckets and SQS queues be configured with server-side encryption when provisioned, ensuring security is built into our infrastructure by default. By enforcing encryption at the CloudFormation level, we prevent data from being stored unencrypted and minimize risk of exposure. Rather than manually enabling encryption post-resource creation, our developers simply enable it as a basic CloudFormation parameter. Adding this check directly into provisioning stacks leads to a stronger security posture across environments and applications. This example hook demonstrates functionality for mandating security best practices on infrastructure-as-code deployments.

Prerequisites

On the AWS Organization:

On the workstation where the hooks will be developed:

In the Delegated Administrator account:

Create a hooks package S3 bucket within the delegated administrator account. Upload the hooks package and CloudFormation templates that StackSets will deploy. Ensure the S3 bucket policy allows access from the AWS accounts within the OU. This access lets AWS CloudFormation access the hooks package objects and CloudFormation template objects in the S3 bucket from the member accounts during stack deployment.

Follow these steps to deploy a CloudFormation template that sets up the S3 bucket and permissions:

  1. Click here to download the admin-cfn-hook-deployment-s3-bucket.yaml template file in to your local workstation.
    Note: Make sure you model the S3 bucket and IAM policies as least privilege as possible. For the above S3 Bucket policy, you can add a list of IAM Role ARNs created by the StackSets service managed permissions instead of AWS: “*”, which allows S3 bucket access to all the IAM entities from the accounts in the OU. The ARN of this role will be “arn:aws:iam:::role/stacksets-exec-” in every member account within the OU. For more information about equipping least privilege access to IAM policies and S3 Bucket Policies, refer IAM Policies and Bucket Policies and ACLs! Oh, My! (Controlling Access to S3 Resources) blog post.
  2. Execute the following command to deploy the template admin-cfn-hook-deployment-s3-bucket.yaml using AWS CLI. For more information see Creating a stack using the AWS Command Line Interface. If using AWS CloudFormation console, see Creating a stack on the AWS CloudFormation console.
    To get the OU Id, see Viewing the details of an OU. OU Id starts with “ou-“. To get the Organization Id, see Viewing details about your organization. Organization Id starts with “o-

    aws cloudformation create-stack \
    --stack-name hooks-asset-stack \
    --template-body file://admin-cfn-deployment-s3-bucket.yaml \
    --parameters ParameterKey=OrgId,ParameterValue="&lt;Org_id&gt;" \
    ParameterKey=OUId,ParameterValue="&lt;OU_id&gt;"
  3. After deploying the stack, note down the AWS S3 bucket name from the CloudFormation Outputs.

Hook Development

In this section, you will develop a sample CloudFormation hook package that will enforce encryption for S3 Buckets and SQS queues within the preCreate and preDelete hook. Follow the steps in the walkthrough to develop a sample hook and generate a zip package for deploying and enabling them in all the accounts within an OU. While following the walkthrough, within the Registering hooks section, make sure that you stop right after executing the cfn submit --dry-run command. The --dry-run option will make sure that your hook is built and packaged your without registering it with CloudFormation on your account. While initiating a Hook project if you created a new directory with the name mycompany-testing-mytesthook, the hook package will be generated as a zip file with the name mycompany-testing-mytesthook.zip at the root your hooks project.

Upload mycompany-testing-mytesthook.zip file to the hooks package S3 bucket within the Delegated Administrator account. The packaged zip file can then be distributed to enable the encryption hooks across all accounts in the target OU.

Note: If you are using your own hooks project and not doing the tutorial, irrespective of it, you should make sure that you are executing the cfn submit command with the --dry-run option. This ensures you have a hooks package that can be distributed and reused across multiple accounts.

Hook Deployment using CloudFormation Stack Sets

In this section, deploy the sample hook developed previously across all accounts within an OU. Use a centralized CloudFormation stack deployed from the delegated administrator account via StackSets.

Deploying hooks via CloudFormation requires these key resources:

  1. AWS::CloudFormation::HookVersion: Publishes a new hook version to the CloudFormation registry
  2. AWS::CloudFormation::HookDefaultVersion: Specifies the default hook version for the AWS account and region
  3. AWS::CloudFormation::HookTypeConfig: Defines the hook configuration
  4. AWS::IAM::Role #1: Task execution role that grants the hook permissions
  5. AWS::IAM::Role #2: (Optional) role for CloudWatch logging that CloudFormation will assume to send log entries during hook execution
  6. AWS::Logs::LogGroup: (Optional) Enables CloudWatch error logging for hook executions

Follow these steps to deploy CloudFormation Hooks to accounts within the OU using StackSets:

  1. Click here to download the hooks-template.yaml template file into your local workstation and upload it into the Hooks package S3 bucket in the Delegated Administrator account.
  2. Deploy the hooks CloudFormation template hooks-template.yaml to all accounts within an OU using StackSets. Leverage service-managed permissions for automatic IAM role creation across the OU.
    To deploy the hooks template hooks-template.yaml across OU using StackSets, click here to download the CloudFormation StackSets template hooks-stack-sets-template.yaml locally, and upload it to the hooks package S3 bucket in the delegated administrator account. This StackSets template contains an AWS::CloudFormation::StackSet resource that will deploy the necessary hooks resources from hooks-template.yaml to all accounts in the target OU. Using SERVICE_MANAGED permissions model automatically handle provisioning the required IAM execution roles per account within the OU.
  3. Execute the following command to deploy the template hooks-stack-sets-template.yaml using AWS CLI. For more information see Creating a stack using the AWS Command Line Interface. If using AWS CloudFormation console, see Creating a stack on the AWS CloudFormation console.To get the S3 Https URL for the hooks template, hooks package and StackSets template, login to the AWS S3 service on the AWS console, select the respective object and click on Copy URL button as shown in the following screenshot:s3 download https url
    Diagram 2: S3 Https URL

    To get the OU Id, see Viewing the details of an OU. OU Id starts with “ou-“.
    Make sure to replace the <S3BucketName> and then <OU_Id> accordingly in the following command:

    aws cloudformation create-stack --stack-name hooks-stack-set-stack \
    --template-url https://<S3BucketName>.s3.us-west-2.amazonaws.com/hooks-stack-sets-template.yaml \
    --parameters ParameterKey=OuId,ParameterValue="<OU_Id>" \
    ParameterKey=HookTypeName,ParameterValue="MyCompany::Testing::MyTestHook" \
    ParameterKey=s3TemplateURL,ParameterValue="https://<S3BucketName>.s3.us-west-2.amazonaws.com/hooks-template.yaml" \
    ParameterKey=SchemaHandlerPackageS3URL,ParameterValue="https://<S3BucketName>.s3.us-west-2.amazonaws.com/mycompany-testing-mytesthook.zip"
  4. Check the progress of the stack deployment using the aws cloudformation describe-stack command. Move to the next section when the stack status is CREATE_COMPLETE.
    aws cloudformation describe-stacks --stack-name hooks-stack-set-stack
  5. If you navigate to the AWS CloudFormation Service’s StackSets section in the console, you can view the stack instances deployed to the accounts within the OU. Alternatively, you can execute the AWS CloudFormation list-stack-instances CLI command below to list the deployed stack instances:
    aws cloudformation list-stack-instances --stack-set-name MyTestHookStackSet

Testing the deployed hook

Deploy the following sample templates into any AWS account that is within the OU where the hooks was deployed and activated. Follow the steps in the Creating a stack on the AWS CloudFormation console. If using AWS CloudFormation CLI, follow the steps in the Creating a stack using the AWS Command Line Interface.

  1. Provision a non-compliant stack without server-side encryption using the following template:
    AWSTemplateFormatVersion: 2010-09-09
    Description: |
      This CloudFormation template provisions an S3 Bucket
    Resources:
      S3Bucket:
        Type: 'AWS::S3::Bucket'
        Properties: {}

    The stack deployment will not succeed and will give the following error message

    The following hook(s) failed: [MyCompany::Testing::MyTestHook] and the hook status reason as shown in the following screenshot:

    stack deployment failure due to hooks execution
    Diagram 3: S3 Bucket creation failure with hooks execution

  2. Provision a stack using the following template that has server-side encryption for the S3 Bucket.
    AWSTemplateFormatVersion: 2010-09-09
    Description: |
      This CloudFormation template provisions an encrypted S3 Bucket. **WARNING** This template creates an Amazon S3 bucket and a KMS key that you will be charged for. You will be billed for the AWS resources used if you create a stack from this template.
    Resources:
      EncryptedS3Bucket:
        Type: "AWS::S3::Bucket"
        Properties:
          BucketName: !Sub "encryptedbucket-${AWS::Region}-${AWS::AccountId}"
          BucketEncryption:
            ServerSideEncryptionConfiguration:
              - ServerSideEncryptionByDefault:
                  SSEAlgorithm: "aws:kms"
                  KMSMasterKeyID: !Ref EncryptionKey
                BucketKeyEnabled: true
      EncryptionKey:
        Type: "AWS::KMS::Key"
        DeletionPolicy: Retain
        UpdateReplacePolicy: Retain
        Properties:
          Description: KMS key used to encrypt the resource type artifacts
          EnableKeyRotation: true
          KeyPolicy:
            Version: 2012-10-17
            Statement:
              - Sid: Enable full access for owning account
                Effect: Allow
                Principal:
                  AWS: !Ref "AWS::AccountId"
                Action: "kms:*"
                Resource: "*"
    Outputs:
      EncryptedBucketName:
        Value: !Ref EncryptedS3Bucket

    The deployment will succeed as it will pass the hook validation with the following hook status reason as shown in the following screenshot:

    stack deployment pass due to hooks executionDiagram 4: S3 Bucket creation success with hooks execution

Updating the hooks package

To update the hooks package, follow the same steps described in the Hooks Development section to change the hook code accordingly. Then, execute the cfn submit --dry-run command to build and generate the hooks package file with the registering the type with the CloudFormation registry. Make sure to rename the zip file with a unique name compared to what was previously used. Otherwise, while updating the CloudFormation StackSets stack, it will not see any changes in the template and thus not deploy updates. The best practice is to use a CI/CD pipeline to manage the hook package. Typically, it is good to assign unique version numbers to the hooks packages so that CloudFormation stacks with the new changes get deployed.

Cleanup

Navigate to the AWS CloudFormation console on the Delegated Administrator account, and note down the Hooks package S3 bucket name and empty its contents. Refer to Emptying the Bucket for more information.

Delete the CloudFormation stacks in the following order:

  1. Test stack that failed
  2. Test stack that passed
  3. StackSets CloudFormation stack. This has a DeletionPolicy set to Retain, update the stack by removing the DeletionPolicy and then initiate a stack deletion via CloudFormation or physically delete the StackSet instances and StackSets from the Console or CLI by following: 1. Delete stack instances from your stack set 2. Delete a stack set
  4. Hooks asset CloudFormation stack

Refer to the following documentation to delete CloudFormation Stacks: Deleting a stack on the AWS CloudFormation console or Deleting a stack using AWS CLI.

Conclusion

Throughout this blog post, you have explored how AWS StackSets enable the scalable and centralized deployment of CloudFormation hooks across all accounts within an Organization Unit. By implementing hooks as reusable code templates, StackSets provide consistency benefits and slash the administrative labor associated with fragmented and manual installs. As organizations aim to fortify governance, compliance, and security through hooks, StackSets offer a turnkey mechanism to efficiently reach hundreds of accounts. By leveraging the described architecture of delegated StackSet administration and member account joining, organizations can implement a single hook across hundreds of accounts rather than manually enabling hooks per account. Centralizing your hook code-base within StackSets templates facilitates uniform adoption while also simplifying maintenance. Administrators can update hooks in one location instead of attempting fragmented, account-by-account changes. By enclosing new hooks within reusable StackSets templates, administrators benefit from infrastructure-as-code descriptiveness and version control instead of one-off scripts. Once configured, StackSets provide automated hook propagation without overhead. The delegated administrator merely needs to include target accounts through their Organization Unit alignment rather than handling individual permissions. New accounts added to the OU automatically receive hook deployments through the StackSet orchestration engine.

About the Author

kirankumar.jpeg

Kirankumar Chandrashekar is a Sr. Solutions Architect for Strategic Accounts at AWS. He focuses on leading customers in architecting DevOps, modernization using serverless, containers and container orchestration technologies like Docker, ECS, EKS to name a few. Kirankumar is passionate about DevOps, Infrastructure as Code, modernization and solving complex customer issues. He enjoys music, as well as cooking and traveling.

Using CloudFormation events to build custom workflows for post provisioning management

Post Syndicated from Vivek Kumar original https://aws.amazon.com/blogs/devops/using-cloudformation-events-to-build-custom-workflows-for-post-provisioning-management/

Over one million active customers manage application resources with AWS CloudFormation every week. CloudFormation is a service that helps you model, provision, and manage your cloud resources by treating Infrastructure as Code (IaC). It can simplify infrastructure management, quickly replicate your environment to multiple AWS regions with a single turn-key solution, and let you easily control and track changes in your infrastructure.

You can create various AWS resources using CloudFormation to setup an environment for your workloads. You continue to interact with and manage those resources throughout the workload lifecycle to make sure the resource configuration is aligned with business objectives such as adhering to security compliance standards, meeting required reliability targets, and aligning with budget requirements. The inability to perform a hand-off between resource provisioning actions in CloudFormation and resource management actions in other relevant AWS and non-AWS services poses a challenge. For example, after provisioning of resources, customers might need to perform additional tasks to manage these resources such as adding cost allocation tags, populating resource inventory database or trigger downstream processes.

While they are able to obtain the logical resource grouping that is tied to a workload or a workload component with a CloudFormation stack, that context does not extend beyond CloudFormation for the most part when they use various AWS and non-AWS services to conduct post-provisioning resource management. These AWS and non-AWS services typically offer a resource level view, or in some cases offer basic aggregated views such as supporting a tag group, or an account level abstraction to see all resources in a given account. For a CloudFormation customer, the inability to not have the context of a stack beyond resource provisioning provides a disjointed experience given there is no hand-off between resource provisioning actions in CloudFormation and resource management actions in other relevant AWS and non-AWS services. The various management actions customers take with their workload resources through out their lifecycle are

CloudFormation events provide a robust way to track the status of individual resources during the lifecycle of a stack. You can send CloudFormation events to Amazon EventBridge whenever a create, update,  or drift detection action is performed on your stack. Then you can set up additional workflows based on those events from EventBridge. For example, by tagging the resources automatically, you can reference that tag group when using AWS Trusted Advisor, and continue your resource management experience post-provisioning. CloudFormation sends these events to EventBridge automatically so that you don’t need to do anything. One real-world use case is to use these events to create actionable tasks for your teams to troubleshoot issues. CloudFormation events published to EventBridge can be used to create OpsItems within AWS Systems Manager OpsCenter. OpsItems are the work items created in OpsCenter for engineers to view, investigate and remediate tasks/issues. This enables teams to respond and resolve any issues more efficiently.

Walkthrough

To set up the EventBridge rule, go to the AWS console and navigate to EventBridge. Select on Create Rule to get started. Enter Name, description and select Next:

Create Rule

On the next screen, select AWS events in the Event source section.

This sample event is for the CREATE_COMPLETE event. It contains the source, AWS account number, AWS region, event type, resources and details about the event.

On the same page in the Event pattern section:

Select Custom patterns (JSON editor) and enter the following event pattern. This will match any events when a resource fails to create, update, or delete. Learn more about EventBridge event patterns.

{
    "source": [
        "aws.cloudformation"
    ],
    "detail-type": [
        "CloudFormation Resource Status Change"
    ],
    "detail": {
        "status-details": {
            "status": [
                "CREATE_FAILED",
                "UPDATE_FAILED",
                "DELETE_FAILED"
            ]
        }
    }
}

Custom patterns - JSON editor

Select Next. On the Target screen, select AWS service, then select System Manager OpsItem as the target for this rule.

Target 1

Add a second target – an Amazon Simple Notification Service (SNS) Topic – to notify the Ops team whenever a failure occurs and an OpsItem has been created.

Target 2

Select Next and optionally add tags.

Select next to review the selections, and select Create rule.

Now your rule is created and whenever a stack failure occurs, an OpsItem gets created and a notification is sent out for the operators to troubleshoot and fix the issue. The OpsItem contains operational data, such as the resource that failed, the reason for failure, as well as the stack to which it belongs, which is useful for troubleshooting the issue. Operators can take manual actions or use runbooks codified as Systems Manager Documents to take corrective actions. From the AWS Console you can go to OpsCenter to see the events:

operational data

Once the issues have been addressed, operators can mark the OpsItem as resolved, and retry the stack operation that failed, resulting in a swift resolution of the issue, and preventing duplication of efforts.

This walkthrough is for the Console but you can use AWS Command Line Interface (AWS CLI), AWS SDK or even CloudFormation to accomplish all of this. Refer to AWS CLI documentation for more information on creating EventBridge rules through CLI. Furthermore, refer to AWS SDK documentation for creating EventBridge rules through AWS SDK. You can use following CloudFormation template to deploy the EventBridge rules example used as part of the walkthrough in this blog post:

{
	"Parameters": {
		"SNSTopicARN": {
			"Type": "String",
			"Description": "Enter the ARN of the SNS Topic where you want stack failure notifications to be sent."
		}
	},
	"Resources": {
		"CFNEventsRule": {
			"Type": "AWS::Events::Rule",
			"Properties": {
				"Description": "Event rule to capture CloudFormation failure events",
				"EventPattern": {
					"source": [
						"aws.cloudformation"
					],
					"detail-type": [
						"CloudFormation Resource Status Change"
					],
					"detail": {
						"status-details": {
							"status": [
								"CREATE_FAILED",
								"UPDATE_FAILED",
								"DELETE_FAILED"
							]
						}
					}
				},
				"Name": "cfn-stack-failure-test",
				"State": "ENABLED",
				"Targets": [
					{
						"Arn": {
							"Fn::Sub": "arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:opsitem"
						},
						"Id": "opsitems",
						"RoleArn": {
							"Fn::GetAtt": [
								"TargetInvocationRole",
								"Arn"
							]
						}
					},
					{
						"Arn": {
							"Ref": "SNSTopicARN"
						},
						"Id": "sns"
					}
				]
			}
		},
		"TargetInvocationRole": {
			"Type": "AWS::IAM::Role",
			"Properties": {
				"AssumeRolePolicyDocument": {
					"Version": "2012-10-17",
					"Statement": [
						{
							"Effect": "Allow",
							"Principal": {
								"Service": [
									"events.amazonaws.com"
								]
							},
							"Action": [
								"sts:AssumeRole"
							]
						}
					]
				},
				"Path": "/",
				"Policies": [
					{
						"PolicyName": "createopsitem",
						"PolicyDocument": {
							"Version": "2012-10-17",
							"Statement": [
								{
									"Effect": "Allow",
									"Action": [
										"ssm:CreateOpsItem"
									],
									"Resource": "*"
								}
							]
						}
					}
				]
			}
		},
		"AllowSNSPublish": {
			"Type": "AWS::SNS::TopicPolicy",
			"Properties": {
				"PolicyDocument": {
					"Statement": [
						{
							"Sid": "grant-eventbridge-publish",
							"Effect": "Allow",
							"Principal": {
								"Service": "events.amazonaws.com"
							},
							"Action": [
								"sns:Publish"
							],
							"Resource": {
								"Ref": "SNSTopicARN"
							}
						}
					]
				},
				"Topics": [
					{
						"Ref": "SNSTopicARN"
					}
				]
			}
		}
	}
}

Summary

Responding to CloudFormation stack events becomes easy with the integration between CloudFormation and EventBridge. CloudFormation events can be used to perform post-provisioning actions on workload resources. With the variety of targets available to EventBridge rules, various actions such as adding tags and, troubleshooting issues can be performed. This example above uses Systems Manager and Amazon SNS but you can have numerous targets including, Amazon API gateway, AWS Lambda, Amazon Elastic Container Service (Amazon ECS) task, Amazon Kinesis services, Amazon Redshift, Amazon SageMaker pipeline, and many more. These events are available for free in EventBridge.

Learn more about Managing events with CloudFormation and EventBridge.

About the Author

Vivek is a Solutions Architect at AWS based out of New York. He works with customers providing technical assistance and architectural guidance on various AWS services. He brings more than 25 years of experience in software engineering and architecture roles for various large-scale enterprises.

 

 

Mahanth is a Solutions Architect at Amazon Web Services (AWS). As part of the AWS Well-Architected team, he works with customers and AWS Partner Network partners of all sizes to help them build secure, high-performing, resilient, and efficient infrastructure for their applications. He spends his free time playing with his pup Cosmo, learning more about astronomy, and is an avid gamer.

 

 

Sukhchander is a Solutions Architect at Amazon Web Services. He is passionate about helping startups and enterprises adopt the cloud in the most scalable, secure, and cost-effective way by providing technical guidance, best practices, and well architected solutions.

Building a CI/CD pipeline to update an AWS CloudFormation StackSets

Post Syndicated from Karim Afifi original https://aws.amazon.com/blogs/devops/building-a-ci-cd-pipeline-to-update-an-aws-cloudformation-stacksets/

AWS CloudFormation StackSets can extend the functionality of CloudFormation Stacks by enabling you to create, update, or delete one or more stack across multiple accounts. As a developer working in a large enterprise or for a group that supports multiple AWS accounts, you may often find yourself challenged with updating AWS CloudFormation StackSets. If you’re building a CI/CD pipeline to automate the process of updating CloudFormation stacks, you can do so natively. AWS CodePipeline can initiate a workflow that builds and tests a stack, and then pushes it to production. The workflow can either create or manipulate an existing stack; however, working with AWS CloudFormation StackSets is currently not a supported action at the time of this writing.

You can update an existing CloudFormation stack using one of two methods:

  • Directly updating the stack – AWS immediately deploys the changes that you submit. You can use this method when you want to quickly deploy your updates.
  • Running change sets – You can preview the changes AWS CloudFormation will make to the stack, and decide whether to proceed with the changes.

You have several options when building a CI/CD pipeline to automate creating or updating a stack. You can create or update a stack, delete a stack, create or replace a change set, or run a change set. Creating or updating a CloudFormation StackSet, however, is not a supported action.

The following screenshot shows the existing actions supported by CodePipeline against AWS CloudFormation on the CodePipeline console.

CodePipeline console

This post explains how to use CodePipeline to update an existing CloudFormation StackSet. For this post, we update the StackSet’s parameters. Parameters enable you to input custom values to your template each time you create or update a stack.

Overview of solution

To implement this solution, we walk you through the following high-level steps:

  1. Update a parameter for a StackSet by passing a parameter key and its associated value via an AWS CodeCommit
  2. Create an AWS CodeBuild
  3. Build a CI/CD pipeline.
  4. Run your pipeline and monitor its status.

After completing all the steps in this post, you will have a fully functional CI/CD that updates the CloudFormation StackSet parameters. The pipeline starts automatically after you apply the intended changes into the CodeCommit repository.

The following diagram illustrates the solution architecture.

Solution Architecture

The solution workflow is as follows:

  1. Developers integrate changes into a main branch hosted within a CodeCommit repository.
  2. CodePipeline polls the source code repository and triggers the pipeline to run when a new version is detected.
  3. CodePipeline runs a build of the new revision in CodeBuild.
  4. CodeBuild runs the changes in the yml file, which includes the changes against the StackSets. (To update all the stack instances associated with this StackSet, do not specify DeploymentTargets or Regions in the buildspec.yml file.)
  5. Verify that the changes were applied successfully.

Prerequisites

To complete this tutorial, you should have the following prerequisites:

Retrieving your StackSet parameters

Your first step is to verify that you have a StackSet in the AWS account you intend to use. If not, create one before proceeding. For this post, we use an existing StackSet called StackSet-Test.

  1. Sign in to your AWS account.
  2. On the CloudFormation console, choose StackSets.
  3. Choose your StackSet.

StackSet

For this post, we modify the value of the parameter with the key KMSId.

  1. On the Parameters tab, note the value of the key KMSId.

Parameters

Creating a CodeCommit repository

To create your repository, complete the following steps:

  1. On the CodeCommit console, choose Repositories.
  2. Choose Create repository.

Repositories name

  1. For Repository name, enter a name (for example, Demo-Repo).
  2. Choose Create.

Repositories Description

  1. Choose Create file to populate the repository with the following artifacts.

Create file

A buildspec.yml file informs CodeBuild of all the actions that should be taken during a build run for our application. We divide the build run into separate predefined phases for logical organization, and list the commands that run on the provisioned build server performing a build job.

  1. Enter the following code in the code editor:

YAML

phases:

  pre_build:

    commands:

      - aws cloudformation update-stack-set --stack-set-name StackSet-Test --use-previous-template --parameters ParameterKey=KMSId,ParameterValue=newCustomValue

The preceding AWS CloudFormation command updates a StackSet with the name StackSet-Test. The command results in updating the parameter value of the parameter key KMSId to newCustomValue.

  1. Name the file yml.
  2. Provide an author name and email address.
  3. Choose Commit changes.

Creating a CodeBuild project

To create your CodeBuild project, complete the following steps:

  1. On the CodeBuild console, choose Build projects.
  2. Choose Create build project.

create build project

  1. For Project name, enter your project name (for example, Demo-Build).
  2. For Description, enter an optional description.

project name

  1. For Source provider, choose AWS CodeCommit.
  2. For Repository, choose the CodeCommit repository you created in the previous step.
  3. For Reference type, keep default selection Branch.
  4. For Branch, choose master.

Source configuration

To set up the CodeBuild environment, we use a managed image based on Amazon Linux 2.

  1. For Environment Image, select Managed image.
  2. For Operating system, choose Amazon Linux 2.
  3. For Runtime(s), choose Standard.
  4. For Image, choose amazonlinux2-aarch64-standard:1.0.
  5. For Image version, choose Always use the latest for this runtime version.

Environment

  1. For Service role¸ select New service role.
  2. For Role name, enter your service role name.

Service Role

  1. Chose Create build project.

Creating a CodePipeline pipeline

To create your pipeline, complete the following steps:

  1. On the CodePipeline console, choose Pipelines.
  2. Choose Create pipeline

Code Pipeline

  1. For Pipeline name, enter a name for the pipeline (for example, DemoPipeline).
  2. For Service role, select New service role.
  3. For Role name, enter your service role name.

Pipeline name

  1. Choose Next.
  2. For Source provider, choose AWS CodeCommit.
  3. For Repository name, choose the repository you created.
  4. For Branch name, choose master.

Source Configurations

  1. Choose Next.
  2. For Build provider, choose AWS CodeBuild.
  3. For Region, choose your Region.
  4. For Project name, choose the build project you created.

CodeBuild

  1. Choose Next.
  2. Choose Skip deploy stage.
  3. Choose Skip
  4. Choose Create pipeline.

The pipeline is now created successfully.

Running and monitoring your pipeline

We use the pipeline to release changes. By default, a pipeline starts automatically when it’s created and any time a change is made in a source repository. You can also manually run the most recent revision through your pipeline, as in the following steps:

  1. On the CodePipeline console, choose the pipeline you created.
  2. On the pipeline details page, choose Release change.

The following screenshot shows the status of the run from the pipeline.

Release change

  1. Under Build, choose Details to view build logs, phase details, reports, environment variables, and build details.

Build details

  1. Choose the Build logs tab to view the logs generated as a result of the build in more detail.

The following screenshot shows that we ran the AWS CloudFormation command that was provided in the buildspec.yml file. It also shows that all phases of the build process are successfully complete.

 

Phase Details

The StackSet parameter KMSId has been updated successfully with the new value newCustomValue as a result of running the pipeline.  Please note that we used the parameter KMSId as an example for demonstration purposes. Any other parameter that is part of your StackSet could have been used instead.

Cleaning up

You may delete the resources that you created during this post:

  • AWS CloudFormation StackSet.
  • AWS CodeCommit repository.
  • AWS CodeBuild project.
  • AWS CodePipeline.

Conclusion

In this post, we explored how to use CodePipeline, CodeBuild, and CodeCommit to update an existing CloudFormation StackSet. Happy coding!

About the author

Karim Afifi is a Solutions Architect Leader with Amazon Web Services. He is part of the Global Life Sciences Solution Architecture team. team. He is based out of New York, and enjoys helping customers throughout their journey to innovation.