Streamline your AWS infrastructure development with AI-powered documentation search, validation, and troubleshooting
Introduction
Today, we’re excited to introduce the AWS Infrastructure-as-Code (IaC) MCP Server, a new tool that bridges the gap between AI assistants and your AWS infrastructure development workflow. Built on the Model Context Protocol (MCP), this server enables AI assistants like Kiro CLI, Claude or Cursor to help you search AWS CloudFormation and Cloud Development Kit (CDK) documentation, validate templates, troubleshoot deployments, and follow best practices – all while maintaining the security of local execution.
Whether you’re writing AWS CloudFormation templates or AWS Cloud Development Kit (CDK) code, the IaC MCP Server acts as an intelligent companion that understands your infrastructure needs and provides contextual assistance throughout your development lifecycle.
The Model Context Protocol (MCP) is an open standard that enables AI assistants to securely connect to external data sources and tools. Think of it as a universal adapter that lets AI models interact with your development tools while keeping sensitive operations local and under your control.
The IaC MCP Server provides nine specialized tools organized into two categories:
Remote Documentation Search Tools
These tools connect to the AWS Knowledge MCP backend to retrieve relevant, up-to-date information:
search_cdk_documentation: Search the AWS CDK knowledge base for APIs, concepts, and implementation guidance.
search_cdk_samples_and_constructs: Discover pre-built AWS CDK constructs and patterns from the AWS Construct Library.
search_cloudformation_documentation: Query CloudFormation documentation for resource types, properties, and intrinsic functions.
read_cdk_documentation_page: Retrieve and read full documentation pages returned from searches or provided URLs.
Local Validation and Troubleshooting Tools
These tools run entirely on your machine:
cdk_best_practices: Access a curated collection of AWS CDK best practices and design principles.
validate_cloudformation_template: Perform syntax and schema validation using cfn-lint to catch errors before deployment.
check_cloudformation_template_compliance: Run security and compliance checks against your templates using AWS Guard rules and cfn-guard.
troubleshoot_cloudformation_deployment: Analyze CloudFormation stack deployment failures with integrated CloudTrail event analysis. This tool will use your AWS credentials to analyze your stack status.
get_cloudformation_pre_deploy_validation_instructions: Returns instructions for CloudFormation’s pre-deployment validation feature, which validates templates during change set creation.
Key Use Cases
1. Intelligent Documentation Assistant
Instead of manually searching through documentation, ask your AI assistant natural language questions:
“How do I create an S3 bucket with encryption enabled in CDK?”
The server searches CDK best practices and samples, returning relevant code examples and explanations.
2. Proactive Template Validation
Before deploying infrastructure changes:
User: “Validate my CloudFormation template and check for security issues”
AI Agent: [Uses validate_cloudformation_template and check_cloudformation_template_compliance]
“Found 2 issues: Missing encryption on EBS volumes,
and S3 bucket lacks public access block configuration”
3. Rapid Deployment Troubleshooting
When a stack deployment fails:
User: “My stack ‘stack_03’ in us-east-1 failed to deploy. What happened?”
AI Agent: [Uses troubleshoot_cloudformation_deployment with CloudTrail integration]
“The deployment failed due to insufficient IAM permissions.
CloudTrail shows AccessDenied for ec2:CreateVpc.
You need to add VPC permissions to your deployment role.”
4. Learning and Exploration
New to AWS CDK? The server helps you discover constructs and patterns:
User: “Show me how to build a serverless API”
AI Agent: [Searches CDK constructs and samples]
“Here are three approaches using API Gateway + Lambda…”
Architecture and Security
Security Design
Local Execution: The MCP server runs entirely on your local machine using uv (the fast Python package manager). No code or templates are sent to external services except for documentation searches.
AWS Credentials: The server uses your existing AWS credentials (from ~/.aws/credentials, environment variables, or IAM roles) to access CloudFormation and CloudTrail APIs. This follows the same security model as the AWS CLI.
stdio Communication: The server communicates with AI assistants over standard input/output (stdio), with no network ports opened.
Minimal Permissions: For full functionality, the server requires read-only access to CloudFormation stacks and CloudTrail events—no write permissions needed for validation and troubleshooting workflows.
Getting Started
Prerequisites
Python 3.10 or later
uv package manager
AWS credentials configured locally
MCP-compatible AI client (e.g., Kiro CLI, Claude Desktop)
Configuration
Configure the MCP server in your MCP client configuration. This post focuses on Kiro CLI. Edit .kiro/settings/mcp.json:
Privacy Notice: This MCP server executes AWS API calls using your credentials and shares the response data with your third-party AI model provider (e.g., Amazon Q, Claude Desktop, Cursor, VS Code). You are responsible for understanding your AI provider’s data handling practices and for ensuring compliance with your organization’s security and privacy requirements when using this tool with AWS resources.
IAM Permissions
The MCP server requires the following AWS permissions:
For Template Validation and Compliance:
No AWS permissions required (local validation only)
For Deployment Troubleshooting:
cloudformation:DescribeStacks
cloudformation:DescribeStackEvents
cloudformation:DescribeStackResources
cloudtrail:LookupEvents (for CloudTrail deep links)
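As an illustration, the troubleshooting permissions listed above could be expressed as a read-only IAM policy document. The sketch below builds one in JavaScript. The function name buildTroubleshootingPolicy, the Sid value, and the "*" resource scope are assumptions made for this example rather than anything the server prescribes, and you would normally narrow the resource scope where the actions allow it.

```javascript
// Hypothetical helper: builds a read-only IAM policy document containing
// only the four actions listed above for deployment troubleshooting.
function buildTroubleshootingPolicy() {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "IacMcpTroubleshootingReadOnly", // assumed name, pick your own
        Effect: "Allow",
        Action: [
          "cloudformation:DescribeStacks",
          "cloudformation:DescribeStackEvents",
          "cloudformation:DescribeStackResources",
          "cloudtrail:LookupEvents",
        ],
        // Assumption: "*" keeps the sketch short; scope this down where possible.
        Resource: "*",
      },
    ],
  };
}

// Print the policy as JSON so it can be pasted into an IAM policy editor.
console.log(JSON.stringify(buildTroubleshootingPolicy(), null, 2));
```

Template validation and compliance checks run locally, so they need no statement in this policy.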
IMPORTANT: Ensure you have satisfied all prerequisites before attempting these commands.
1. With the mcp.json file correctly set, try a sample prompt. In your terminal, run kiro-cli chat to start a Kiro CLI chat session.
Figure 1: Kiro CLI with AWS IaC MCP server
Scenarios:
“What are the CDK best practices for Lambda functions?”
Figure 2: Search the CDK best practices for Lambda functions
“Search for CDK samples that use DynamoDB with Lambda”
Figure 3: Search for CDK samples that use DynamoDB with Lambda
“Validate my CloudFormation template at ./template.yaml”
Figure 4: Validate my CloudFormation template with AWS IaC MCP Server
“Check if my template complies with security best practices”
Figure 5: Check if my template complies with security best practices with AWS IaC MCP Server
Best Practices
Start with Documentation Search: Before writing code, search for existing constructs and patterns
Validate Early and Often: Run validation tools before attempting deployment
Check Compliance: Use check_cloudformation_template_compliance to catch security issues during development
Leverage CloudTrail: When troubleshooting, the CloudTrail integration provides detailed failure context
Follow CDK Best Practices: Use the cdk_best_practices tool to align with AWS recommendations
What’s Next?
The IaC MCP Server represents a new paradigm in AI-assisted infrastructure development, one where AI assistants understand your tools, help you navigate complex documentation, and provide intelligent assistance throughout the development lifecycle.
Feedback: We welcome issues and pull requests! Or respond to our IaC survey here.
Ready to supercharge your infrastructure as code development? Install the IaC MCP Server today and experience AI-powered assistance for your AWS CDK and CloudFormation workflows.
Have questions or feedback? Reach out to the blog authors on the AWS Developer Forums.
You can now develop AWS Lambda functions using Node.js 24, either as a managed runtime or using the container base image. Node.js 24 is in active LTS status and ready for production use. It is expected to be supported with security patches and bugfixes until April 2028.
The Lambda runtime for Node.js 24 includes a new implementation of the Runtime Interface Client (RIC), which integrates your function code with the Lambda service. Written in TypeScript, the new RIC streamlines and simplifies Node.js support in Lambda, removing several legacy features. In particular, callback-based function handlers are no longer supported.
Node.js 24 includes several additions to the language, such as Explicit Resource Management, as well as changes to the runtime implementation and the standard library. With this release, Node.js developers can take advantage of these new features and enhancements when creating serverless applications on Lambda.
This blog post highlights important changes to the Node.js runtime, notable Node.js language updates, and how you can use the new Node.js 24 runtime in your serverless applications.
Node.js 24 runtime changes
The Lambda Runtime for Node.js 24 includes the following changes relative to the Node.js 22 and earlier runtimes.
Removing support for callback-based function handlers
Starting with the Node.js 24 runtime, Lambda no longer supports the callback-based handler signature for asynchronous operations. Callback-based handlers take three parameters, with the third parameter a callback. For example:
export const handler = (event, context, callback) => {
  try {
    // Some processing...
    // Success case
    // First parameter (error) is null, second is the result
    callback(null, {
      statusCode: 200,
      body: JSON.stringify({
        message: "Operation completed successfully"
      })
    });
  } catch (error) {
    // Error case
    // First parameter contains the error
    callback(error);
  }
};
The modern approach to asynchronous programming in Node.js is to use the async/await pattern. Lambda introduced support for async handlers with the Node.js 8 runtime, launched in 2018. Here’s how the above function looks when using an async handler:
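A minimal sketch of such a handler, assuming the same response shape as the callback example: returning a value takes the place of callback(null, result), and throwing an error takes the place of callback(error). The export keyword is left off so the snippet also runs as a standalone script; in a function package you would export handler as in the other examples.

```javascript
// Async handler sketch: the returned value becomes the invocation result,
// and a thrown error (or rejected promise) marks the invocation as failed.
const handler = async (event, context) => {
  // Some processing...

  // Success case: return the result instead of calling callback(null, result).
  return {
    statusCode: 200,
    body: JSON.stringify({
      message: "Operation completed successfully",
    }),
  };
  // Error case: let errors propagate (or throw) instead of calling callback(error).
};
```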
And Node.js 24 still supports response streaming, enabling more responsive applications by accelerating the time-to-first-byte:
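import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// awslambda is a global provided by the Lambda Node.js runtime
export const handler = awslambda.streamifyResponse(async (event, responseStream, context) => {
  // Convert event to a readable stream
  const requestStream = Readable.from(Buffer.from(JSON.stringify(event)));
  // Stream the response using pipeline
  await pipeline(requestStream, responseStream);
});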
This change to remove support for callback-based function handlers only affects Node.js 24 (and later) runtimes. Existing runtimes for Node.js 22 and earlier continue to support callback-based function handlers. When migrating functions that use callback-based handlers to Node.js 24, you need to modify your code to use one of the supported function handler signatures.
As part of this change, context.callbackWaitsForEmptyEventLoop is removed. In addition, the previously deprecated context.succeed, context.fail, and context.done methods have also been removed. This aligns the runtime with modern Node.js patterns for clearer, more consistent error and result handling.
Harmonizing streaming and non-streaming behavior for unresolved promises
The Node.js 24 runtime also resolves a previous inconsistency in how unresolved promises were handled. Previously, Lambda would not wait for unresolved promises once the handler returns except when using response streaming. Starting with Node.js 24, the response streaming behavior is now consistent with non-streaming behavior, and Lambda no longer waits for unresolved promises once your handler returns or the response stream ends. Any background work (for example, pending timers, fetches, or queued callbacks) is not awaited implicitly. If your response depends on additional asynchronous operations, ensure you await them in your handler or integrate them into the streaming pipeline before closing the stream or returning, so the response only completes after all required work has finished.
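To make this concrete, here is a small sketch of a handler that awaits its background work before returning. The recordMetric helper and the recorded array are hypothetical and stand in for any asynchronous side effect, such as a fetch or a queue write. As a standalone script the snippet omits export; a deployed function would export handler.

```javascript
// Hypothetical async side effect; stands in for a fetch, SDK call, or queue write.
const recorded = [];
const recordMetric = (name) =>
  new Promise((resolve) => {
    setTimeout(() => {
      recorded.push(name);
      resolve();
    }, 10);
  });

const handler = async (event) => {
  // Start the background work without blocking the main processing.
  const pending = recordMetric("invocation");

  const result = { statusCode: 200, body: JSON.stringify({ ok: true }) };

  // Await the pending work explicitly. Without this line the promise may
  // still be unresolved when the handler returns, and Lambda does not wait for it.
  await pending;
  return result;
};
```

Starting the promise early and awaiting it just before the return keeps the side effect off the critical path while still guaranteeing that it finishes within the invocation.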
Experimental Node.js features
Node.js enables certain experimental features by default in the upstream language releases. Such features include support for importing modules using require() in ECMAScript modules (ES modules) and automatically detecting ES vs CommonJS modules. As they are experimental, these features may be unstable or undergo breaking changes in future Node.js updates. To provide a stable experience, Lambda disables these features by default in the corresponding Lambda runtimes.
Lambda allows you to re-enable these features by adding the --experimental-require-module flag or the --experimental-detect-module flag to the NODE_OPTIONS environment variable. Enabling experimental Node.js features may affect performance and stability, and these features can change or be removed in future Node.js releases; such issues are not covered by AWS Support or the Lambda SLA.
ES modules in CloudFormation inline functions
With AWS CloudFormation inline functions, you provide your function code directly in the CloudFormation template. They’re particularly useful when deploying custom resources. With inline functions, the code filename is always index.js, which by default Node.js interprets as a CommonJS module. With the Node.js 24 runtime, you can use ES modules when authoring inline functions by passing the --experimental-detect-module flag via the NODE_OPTIONS environment variable. Previously, you needed a zip or container package to use ES modules. With Node.js 24, you can write inline functions using standard ESM syntax (import/export and top‑level await), which simplifies small utilities and bootstrap logic without requiring a packaging step.
Node.js 24 language features
Node.js 24 introduces several language updates and features that enhance developer productivity and improve application performance.
Node.js 24 includes Undici 7, a newer version of the HTTP client that powers global fetch. This version brings performance improvements and broader protocol capabilities. Network‑heavy Lambda functions that call AWS services or external APIs can benefit from better connection management and throughput, especially when reusing clients or using HTTP/2 where supported. Most applications should work without changes, but you should validate behavior for advanced scenarios, such as custom headers or streaming bodies, and continue to define HTTP clients outside of the handler to maximize connection reuse across invocations.
The JavaScript Explicit Resource Management syntax (using and await using) enables deterministic clean-up of resources when a block completes. For Lambda handlers, this makes it easier to ensure short‑lived objects, such as streams, temporary buffers, or file handles, are disposed of promptly, which reduces the risk of resource leaks across warm invocations. You should continue to define long‑lived clients, for example SDK clients or database pools, outside the handler to benefit from connection reuse, and apply explicit disposal only to resources you want to tear down at the end of each invocation.
Finally, the AsyncLocalStorage API now uses AsyncContextFrame by default, improving the performance and reliability of async context propagation. This benefits common serverless patterns such as correlating logs and managing tracing IDs and request‑scoped metadata across async and await boundaries, timers, and streams, without manual parameter threading. If you already use AsyncLocalStorage‑based libraries for logging or observability, you may see lower overhead and more consistent context propagation in Node.js 24.
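As a rough illustration of the pattern, the sketch below propagates a request ID to a logger across an await and a timer without passing it as a parameter. The handleWithContext function and its log helper are hypothetical names for this example. The dynamic import is used only so the snippet runs under either module system; application code would normally import AsyncLocalStorage once at module level.

```javascript
// Sketch: propagate a request ID across async boundaries with AsyncLocalStorage.
async function handleWithContext(requestId) {
  // Dynamic import so the snippet runs as either an ES module or a CommonJS script.
  const { AsyncLocalStorage } = await import("node:async_hooks");
  const requestContext = new AsyncLocalStorage();

  // Hypothetical logger: reads the request ID from the store, not from a parameter.
  const lines = [];
  const log = (message) => {
    const store = requestContext.getStore();
    lines.push(`[${store ? store.requestId : "no-context"}] ${message}`);
  };

  // Code called inside run(), including code after an await or a timer,
  // sees the same store.
  await requestContext.run({ requestId }, async () => {
    log("start");
    await new Promise((resolve) => setTimeout(resolve, 5));
    log("after timer");
  });

  // Outside run() there is no active store.
  log("outside run");
  return lines;
}
```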
At launch, new Lambda runtimes receive less usage than existing established runtimes. This can result in longer cold start times due to reduced cache residency within internal Lambda sub-systems. Cold start times typically improve in the weeks following launch as usage increases. As a result, AWS recommends not drawing conclusions from side-by-side performance comparisons with other Lambda runtimes until the performance has stabilized. Since performance is highly dependent on workload, customers with performance-sensitive workloads should conduct their own testing, instead of relying on generic test benchmarks.
Builders should continue to measure and test function performance and optimize function code and configuration for any impact. To learn more about how to optimize Node.js performance in Lambda, see our blog post Optimizing Node.js dependencies in AWS Lambda.
Migration from earlier Node.js runtimes
We’ve already discussed changes that are new to the Node.js 24 runtime, such as removing support for callback-based function handlers. As a reminder, we’ll recap some previous changes for customers upgrading from older Node.js functions.
The Node.js 24 runtime is based on the provided.al2023 runtime, which is based on the Amazon Linux 2023 minimal container image. The Amazon Linux 2023 minimal image uses microdnf as a package manager, symlinked as dnf. This replaces the yum package manager used in Node.js 18 and earlier AL2-based images. If you deploy your Lambda function as a container image, you must update your Dockerfile to use dnf instead of yum when upgrading to the Node.js 24 base image from Node.js 18 or earlier.
Finally, we’ll review how to configure your functions to use Node.js 24, using a range of deployment tools.
AWS Management Console
When using the AWS Lambda Console, you can choose Node.js 24.x in the Runtime dropdown when creating a function:
Creating Node.js function in the AWS Management Console
To update an existing Lambda function to Node.js 24, navigate to the function in the Lambda console, click Edit in the Runtime settings panel, then choose Node.js 24.x from the Runtime dropdown:
Editing Node.js function runtime
AWS Lambda container image
Change the Node.js base image version by modifying the FROM statement in your Dockerfile.
FROM public.ecr.aws/lambda/nodejs:24
# Copy function code
COPY lambda_handler.mjs ${LAMBDA_TASK_ROOT}
AWS Serverless Application Model
In AWS SAM, set the Runtime attribute to nodejs24.x to use this version:
AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: lambda_function.lambda_handler
      Runtime: nodejs24.x
      CodeUri: my_function/.
      Description: My Node.js Lambda Function
AWS SAM supports generating this template with Node.js 24 for new serverless applications using the sam init command. For more information, refer to the AWS SAM documentation.
AWS Cloud Development Kit (AWS CDK)
In AWS CDK, set the runtime attribute to Runtime.NODEJS_24_X to use this version.
import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as path from "path";
import { Construct } from "constructs";

export class CdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    // The code that defines your stack goes here
    // The Node.js 24 enabled Lambda Function
    const lambdaFunction = new lambda.Function(this, "node24LambdaFunction", {
      runtime: lambda.Runtime.NODEJS_24_X,
      code: lambda.Code.fromAsset(path.join(__dirname, "/../lambda")),
      handler: "index.handler",
    });
  }
}
Conclusion
AWS Lambda now supports Node.js 24 as a managed runtime and container base image. This release uses a new runtime interface client, removes support for callback-based function handlers, and includes several other changes to streamline and simplify Node.js support in Lambda.
You can build and deploy functions using Node.js 24 using the AWS Management Console, AWS CLI, AWS SDK, AWS SAM, AWS CDK, or your choice of infrastructure as code tool. You can also use the Node.js 24 container base image if you prefer to build and deploy your functions using container images.
AWS CloudFormation makes it easy to model and provision your cloud application infrastructure as code. CloudFormation templates can be written directly in JSON or YAML, or they can be generated by tools like the AWS Cloud Development Kit (CDK). Resources are created and managed by CloudFormation as units called Stacks. Additionally, change sets enable you to preview the stack changes before deployment.
CloudFormation now offers powerful new features that transform how you develop and troubleshoot infrastructure as code: pre-deployment validation that catches errors in seconds, enhanced operation tracking, and simplified failure debugging. These capabilities shift infrastructure code validation left, helping you prevent infrastructure deployment failures that impact development velocity.
In this blog post, we’ll explore how these new features accelerate development cycles by catching common errors during change set creation and providing precise troubleshooting through operation tracking and failure filtering. Whether you’re a platform engineer managing complex multi-service deployments or a developer iterating on infrastructure templates, we’ll show you how to:
Validate resource properties and detect naming conflicts before deployment
Prevent deployment failures by checking S3 bucket emptiness before deletion operations
Track operations with unique IDs for focused troubleshooting
Quickly identify root causes using the new describe-events API
This guide walks through real-world scenarios demonstrating how these capabilities can turn hours of debugging failed deployments into seconds of validation, helping you deliver cloud infrastructure faster and more reliably.
Key Capabilities
Pre-deployment Validation: Catch template errors instantly instead of discovering them after resource provisioning attempts. These include pre-deployment validation for resource property syntax errors, resource naming conflicts for existing resources in your account, and S3 bucket emptiness constraint violations on delete operations.
Operation Tracking: Say goodbye to long debugging sessions. Each stack action now comes with a unique Operation ID, transforming the “needle in haystack” troubleshooting experience into precise, targeted problem-solving.
Streamlined Events API for simplified Debugging: Use the new describe-events API and FailedEvents=true filter to instantly pinpoint issues. One command tells you exactly what went wrong, eliminating the need to scroll through endless logs.
Immediate Feedback: Transform your CI/CD pipeline from a potential bottleneck into a rapid iteration engine. Get immediate feedback on common deployment issues, allowing your team to fix and deploy faster than ever before.
How It Works
Pre-deployment Validation
The following scenarios show how you can leverage CloudFormation pre-deployment validation to detect property syntax errors, resource naming conflicts, and constraint violations during change set creation.
Understanding Validation Modes
CloudFormation pre-deployment validation operates in two modes that determine how validation failures are handled.
FAIL mode prevents change set execution when validation detects errors, ensuring problematic templates cannot proceed to deployment. This applies to property syntax errors and resource naming conflicts.
WARN mode allows change set creation to succeed despite validation failures, providing warnings that developers can review and address before execution. This applies to constraint violations like S3 bucket emptiness that may be resolvable through manual intervention.
Understanding these modes helps you anticipate whether validation issues will block your deployment workflow or simply require attention before execution.
Let’s walk through some practical scenarios:
Scenario 1: Validate Resource Property Syntax
CloudFormation evaluates each resource property definition or value before provisioning begins. The following example illustrates several common resource property errors:
The “AWS::Lambda::Function” Role property requires an ARN pattern.
The “AWS::Lambda::Function” Timeout property expects an integer instead of a string.
The “AWS::Lambda::Function” TracingConfig.Mode nested property ENUM value is invalid.
The “AWS::Lambda::Alias” Name property is required but not defined.
The “AWS::Lambda::Alias” extra property Description in the nested path RoutingConfig.AdditionalVersionWeights.0 is not supported.
Prior to this launch, these resource configuration errors were detected only at resource provisioning time. With the pre-deployment validation feature, they can be identified ahead of the deployment phase, streamlining the development-test lifecycle and minimizing rollbacks during deployments.
You can see the status of the change set is failed with a detailed status reason. You can now proceed to review the change set validation results.
Step 3: Review validation results
Console
With the console, you can review multiple validation errors in a single interface. When you click on a validation, CloudFormation pinpoints the location of the invalid property error in your template.
Figure 3: Pre-deployment validations view
Use Case: Invalid ENUM value for nested property
Catching invalid configuration values before deployment. This demonstrates validation of nested properties like TracingConfig.Mode. The tool helpfully shows the supported values “Active” and “PassThrough” as well as the provided invalid value “DISABLED”.
Figure 4: Validation of Invalid ENUM value for nested property
Use Case: Lambda Function Timeout property type mismatch
Preventing type-related deployment failures. Shows how validation catches string values (“30s”) where integers are required, saving developers from runtime errors.
Figure 5: Validation of Lambda Function Timeout property type mismatch
Use Case: Lambda Function Role property pattern mismatch
Validating ARN format requirements. Demonstrates pattern validation ensuring Role properties match required ARN format.
Figure 6: Lambda Function Role property pattern mismatch
Use Case: Undefined required Lambda Alias Name property
Catching missing required properties. Shows validation detecting absent mandatory fields, preventing incomplete resource definitions from reaching deployment.
Figure 7: Validation of undefined required Lambda Alias Name property
Notice how the validation Path field (e.g., “/Resources/MyLambdaFunction/Properties/TracingConfig/Mode”) pinpoints the exact template location of each error. This eliminates manual searching through hundreds of lines of infrastructure code – a common time sink that can take minutes in complex templates.
Use Case: Unsupported property
Shows how CloudFormation validation catches unsupported properties. In this example, the AWS::Lambda::Alias resource had an unsupported extra property Description in a nested path RoutingConfig.AdditionalVersionWeights.0.
Figure 8: CloudFormation validation of unsupported resource property
CLI command
You can also use the new describe-events API to review the validation responses.
Scenario 2: Resource Name Conflict Validation
Resource name conflict validation makes sure that new resources added to a template are not already present in your AWS account or globally (e.g., Amazon S3, Amazon Route 53 DNS), preventing deployment errors caused by resource name conflicts.
After reviewing the property validation exceptions, let’s assume that you resolved all the issues and successfully deployed the stack. Next, you decide to include an S3 bucket resource in the template. You name the bucket “dev-thumbnails” but don’t verify whether a bucket with this name already exists. If it does, the CreateChangeSet operation will fail, reporting that the bucket already exists.
Step 2: Review Deployment Validations
Use the CloudFormation change set console to review the validation response, or use the new DescribeEvents API in the CLI.
Scenario 3: S3 bucket not empty
Because Amazon S3 does not allow you to delete buckets that still contain objects, the new pre-deployment validations warn you if you try to delete a bucket that is not empty.
Resuming our journey, let’s assume that you fix the name conflict issue by renaming the bucket to “dev-test-tumbnails”, and then update the stack. After you test the Lambda function’s integration with S3, the development cycle has generated a few thumbnail objects in the S3 bucket.
Later, you decide to fix the bucket name because you notice a typo: “dev-test-tumbnails” should be “dev-test-thumbnails” (missing “h”). When you update the template to use the corrected name, CloudFormation will need to create the new bucket then delete the old one during the clean-up phase.
{
  "OperationEvents": [
    {
      "EventId": "24920e0f-1941-45a5-9177-786bc805b724",
      "StackId": "arn:aws:cloudformation:us-west-2:123456789012:stack/dev-lambda-stack/2d2c3240-bb59-11f0-b080-0613dc96740d",
      "OperationId": "8fef2b60-b411-4d0e-920e-7ec7c7aa39f2",
      "OperationType": "CREATE_CHANGESET",
      "OperationStatus": "SUCCEEDED",
      "EventType": "STACK_EVENT",
      "Timestamp": "2025-11-06T22:52:26.355000+00:00",
      "StartTime": "2025-11-06T22:52:21.071000+00:00",
      "EndTime": "2025-11-06T22:52:26.355000+00:00"
    },
    {
      "EventId": "c117e02d-a652-4755-9586-6d4ccb0f6504",
      "StackId": "arn:aws:cloudformation:us-west-2:123456789012:stack/dev-lambda-stack/2d2c3240-bb59-11f0-b080-0613dc96740d",
      "OperationId": "8fef2b60-b411-4d0e-920e-7ec7c7aa39f2",
      "OperationType": "CREATE_CHANGESET",
      "EventType": "VALIDATION_ERROR",
      "LogicalResourceId": "MyDevThumbnailsBucket",
      "PhysicalResourceId": "",
      "ResourceType": "AWS::S3::Bucket",
      "Timestamp": "2025-11-06T22:52:25.960000+00:00",
      "ValidationFailureMode": "WARN",
      "ValidationName": "BUCKET_EMPTINESS_VALIDATION",
      "ValidationStatus": "FAILED",
      "ValidationStatusReason": "The bucket 'dev-tumbnails' is not empty. You must either delete all objects and versions or use the deletion policy to retain it, otherwise the delete operation will fail.",
      "ValidationPath": "/Resources/MyDevThumbnailsBucket"
    },
    {
      "EventId": "6c66ff53-6751-4b4c-96b8-d1a33fc43b4f",
      "StackId": "arn:aws:cloudformation:us-west-2:123456789012:stack/dev-lambda-stack/2d2c3240-bb59-11f0-b080-0613dc96740d",
      "OperationId": "8fef2b60-b411-4d0e-920e-7ec7c7aa39f2",
      "OperationType": "CREATE_CHANGESET",
      "OperationStatus": "IN_PROGRESS",
      "EventType": "STACK_EVENT",
      "Timestamp": "2025-11-06T22:52:21.071000+00:00",
      "StartTime": "2025-11-06T22:52:21.071000+00:00"
    }
  ]
}
Bucket emptiness validation uses WARN mode, which allows change set creation to succeed even when the validation check fails. This gives you time to review and empty the bucket before execution. However, if you execute the change set without emptying the bucket, the delete operation will fail.
Notice in the output above:
ValidationStatus: "FAILED" – The emptiness check detected objects in the bucket
ValidationFailureMode: "WARN" – This is a warning, not a blocking error
OperationStatus: "SUCCEEDED" – Change set creation completed successfully despite the warning
This design allows you to review the warning, take corrective action (such as emptying the bucket), and then proceed with execution.
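As a sketch of how this output might be consumed in a script or pipeline step, the function below splits failed validation events into blocking errors and warnings. It relies only on field names visible in the output above (EventType, ValidationStatus, ValidationFailureMode); the FAIL value is assumed from the validation mode names described earlier, and the function name summarizeValidationEvents and the abbreviated sample data are made up for illustration.

```javascript
// Hypothetical helper: splits failed validation events from a describe-events
// response into blocking errors (FAIL mode) and warnings (WARN mode).
function summarizeValidationEvents(response) {
  const failed = (response.OperationEvents || []).filter(
    (e) => e.EventType === "VALIDATION_ERROR" && e.ValidationStatus === "FAILED"
  );
  return {
    blocking: failed.filter((e) => e.ValidationFailureMode === "FAIL"),
    warnings: failed.filter((e) => e.ValidationFailureMode === "WARN"),
  };
}

// Abbreviated sample based on the output shown above.
const sample = {
  OperationEvents: [
    { EventType: "STACK_EVENT", OperationStatus: "SUCCEEDED" },
    {
      EventType: "VALIDATION_ERROR",
      LogicalResourceId: "MyDevThumbnailsBucket",
      ValidationFailureMode: "WARN",
      ValidationName: "BUCKET_EMPTINESS_VALIDATION",
      ValidationStatus: "FAILED",
      ValidationPath: "/Resources/MyDevThumbnailsBucket",
    },
  ],
};

const summary = summarizeValidationEvents(sample);
for (const w of summary.warnings) {
  console.log(`WARN ${w.ValidationName} at ${w.ValidationPath}`);
}
console.log(`blocking errors: ${summary.blocking.length}`);
```

A pipeline step could, for example, fail the build when blocking is non-empty and merely surface the warnings for review before the change set is executed.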
Beyond catching errors early, these capabilities also transform how you troubleshoot failed deployments with enhanced operation tracking and filtering.
New DescribeEvents API with Operation IDs and root cause filtering
The new DescribeEvents API retrieves CloudFormation events based on flexible query criteria. It groups stack operations by operation ID, enabling you to focus specifically on individual stack operations involved during your stack deployment.
Operation: An operation is any action performed on a stack, including stack lifecycle actions (Create, Update, Delete, Rollback), change set creation, nested stack creation, and automatic rollbacks triggered by failures. Each operation has a unique identifier and represents a discrete change attempt on the stack.
Figure 11: Stack Events grouped by Operation Id
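The grouping idea can be sketched in a few lines. The helper below, whose name groupByOperationId is an assumption for this example, buckets already-retrieved events by their OperationId field. The sample events are made up and abbreviated, including the operation IDs and the UPDATE_STACK operation type.

```javascript
// Hypothetical helper: groups events by OperationId, preserving input order.
function groupByOperationId(events) {
  const groups = new Map();
  for (const event of events) {
    const key = event.OperationId;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(event);
  }
  return groups;
}

// Made-up, abbreviated events for illustration only.
const events = [
  { OperationId: "op-1", OperationType: "CREATE_CHANGESET", EventType: "STACK_EVENT" },
  { OperationId: "op-2", OperationType: "UPDATE_STACK", EventType: "STACK_EVENT" },
  { OperationId: "op-1", OperationType: "CREATE_CHANGESET", EventType: "VALIDATION_ERROR" },
];

for (const [operationId, group] of groupByOperationId(events)) {
  console.log(`${operationId}: ${group.length} event(s)`);
}
```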
Scenario
Suppose an update operation on an existing stack fails and results in a rollback, and you want to understand the reason behind the failure. Using the operation ID obtained from the update stack response or from the describe stacks response, you can call describe events to get details on the failure.
The stack description available via the describe-stacks API now includes LastOperations information showing recent operation IDs and their types. This enables you to quickly identify which operations occurred and their current status without parsing through event logs.
Figure 11: CloudFormation Stack Info page showing new operation IDs
Step 3: Review operation status with the DescribeEvents API and the operation ID. Using the operation ID from the previous step, you can now query specific operation events to understand exactly what happened during that operation. This targeted approach eliminates the need to search through all stack events to find relevant information.
Figure 12: New CloudFormation stack operation page
Step 4: Identify failure root cause(s) with the FailedEvents filter. The new failure root cause filter instantly surfaces only the events that caused the operation to fail. This eliminates the need to manually scan through progress events to identify the root cause of deployment failures.
The FailedEvents=true filter transforms troubleshooting from parsing dozens of progress events to instantly seeing only what matters. This can make diagnosing issues during an incident much easier.
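To make the idea of operation-scoped troubleshooting concrete, here is a small client-side sketch that groups already-retrieved events by operation and keeps only failures. The field name `OperationId` and the `_FAILED` suffix heuristic are assumptions for illustration. The service-side filter identifies root causes, which this local heuristic does not attempt to replicate.

```typescript
interface StackEvent {
  OperationId: string;           // assumed field name
  LogicalResourceId: string;
  ResourceStatus: string;        // e.g. "UPDATE_FAILED"
  ResourceStatusReason?: string;
}

// Bucket events by the operation that produced them.
function groupByOperation(events: StackEvent[]): Record<string, StackEvent[]> {
  const groups: Record<string, StackEvent[]> = {};
  for (const event of events) {
    const bucket = groups[event.OperationId] ?? [];
    bucket.push(event);
    groups[event.OperationId] = bucket;
  }
  return groups;
}

// Heuristic stand-in for a failure filter: keep statuses ending in "_FAILED".
function failedOnly(events: StackEvent[]): StackEvent[] {
  return events.filter((e) => /_FAILED$/.test(e.ResourceStatus));
}

const sample: StackEvent[] = [
  { OperationId: "op-1", LogicalResourceId: "Bucket", ResourceStatus: "UPDATE_IN_PROGRESS" },
  { OperationId: "op-1", LogicalResourceId: "Bucket", ResourceStatus: "UPDATE_FAILED", ResourceStatusReason: "example reason" },
  { OperationId: "op-2", LogicalResourceId: "Queue", ResourceStatus: "CREATE_COMPLETE" },
];

const byOperation = groupByOperation(sample);
console.log(failedOnly(byOperation["op-1"] ?? []));
```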
Real-World Impact
These features improve your infrastructure development experience with CloudFormation:
Template syntax errors: Previously discovered after minutes of provisioning, now caught in seconds
Resource conflicts: No more failed deployments due to existing resources
Debugging complexity: Transform troubleshooting sessions into faster targeted fixes
CI/CD reliability: Reduce pipeline failures and improve deployment confidence
Getting Started
These capabilities are available today in all AWS Regions where CloudFormation is supported. Pre-deployment validation is automatically enabled for all change set operations, with no configuration required.
Try it now:
Create any change set from the CloudFormation console or via SDK or CLI with aws cloudformation create-change-set
Use `aws cloudformation describe-events --change-set-name <your-changeset-arn>` to see validation results
Filter failure root causes instantly via the console, or via the CLI with `aws cloudformation describe-events --operation-id <id> --filters FailedEvents=true`
Best Practices
Always use change sets: Even for simple updates, change sets now provide validation feedback
Leverage Operation IDs: Use the unique identifiers for focused troubleshooting
Filter events strategically: Use --filters FailedEvents=true to focus on problems
Automate validation: Integrate the describe-events API into your CI/CD pipelines
Use the console: The CloudFormation console provides a visual experience with error source mapping to the specific line in your template.
Conclusion
Start using these features today in your development workflow. Whether you’re building new infrastructure or maintaining existing stacks, early validation and enhanced troubleshooting will accelerate your deployment cycles and make it easier to manage infrastructure.
Ready to experience faster CloudFormation development? Create your first change set and see validation in action.
Organizations operating at scale on AWS often need to manage resources across multiple accounts and regions. Whether it’s deploying security controls, compliance configurations, or shared services, maintaining consistency can be challenging.
AWS CloudFormation StackSets (StackSets) has been helping organizations deploy resources across multiple accounts and regions since its launch. While the service is powerful on its own, combining it with Infrastructure as Code (IaC) tools and implementing automated deployments can significantly enhance its capabilities.
In this post, we’ll show you how to leverage AWS CloudFormation StackSets at scale using AWS CDK and implement a robust CI/CD pipeline for automated deployments with AWS CodePipeline.
StackSets key concepts
AWS CloudFormation StackSets allows you to create, update, or delete CloudFormation stacks across multiple AWS accounts and regions with a single operation. It’s essentially a way to manage infrastructure at scale across your AWS organization. Using an administrator account, you define and manage a CloudFormation template, and use the template as the basis for provisioning stacks into selected target accounts across specified AWS Regions:
Figure 1. StackSets overview.
The administrator account is the AWS account where you create and manage StackSets, and the target accounts are the AWS accounts where the stack instances are deployed.
Stack instances are the individual stacks created from the StackSet template and deployed to specific account-region combinations.
StackSets operations are the create, update, and delete actions performed on stack instances. These operations can be applied concurrently or sequentially.
Sequential Deployment:
Account-by-account deployment
Region-by-region within accounts
Configurable failure thresholds
Parallel Deployment:
Concurrent account deployments
Maximum concurrent account setting
Region priority configuration
Hybrid Deployment:
Combine sequential and parallel
Account group-based deployment
Regional deployment strategies
The power of StackSets
The use of StackSets allows us to extend AWS CloudFormation’s capabilities in several important ways:
Governance
StackSets gives you centralized management: a single point of control with consistent deployment patterns and automated stack instance management across AWS accounts and regions.
With the drift detection feature, you can identify whether any stack instance of your StackSet differs from its expected configuration. It detects changes made outside CloudFormation as well as changes made to a stack instance through CloudFormation directly, without using the StackSet.
Flexible Deployment
You also have flexible deployment options with controlled rollout. For example, with concurrent deployments you can deploy to multiple accounts within each region simultaneously while controlling deployment order. It also includes failure tolerance with automated retry of failed operations.
Operational Efficiency
It reduces manual effort in managing multi-account and multi-region environments while minimizing human error in deployments.
Cost Management
It delivers comprehensive resource organization and streamlined tracking of resources across the accounts and regions that contain stack instances. Centralized management simplifies resource tracking and organization, giving you:
unified visibility: view all related stacks from a single StackSet console (with their deployment status)
consistent tagging: apply standardized tags across all stack instances for cost allocation and resource grouping
drift detection: run drift detection across all stack instances simultaneously
operations tracking: track all operations (create, update and delete) across account/regions from one place
Built-in Safety
You can establish maximum concurrent operation limits, failure tolerance thresholds, and automatic retry mechanisms. You also have recovery capabilities through update operations. Together, these features form built-in safety mechanisms that prevent widespread failures.
Say you have 100 target accounts. With a maximum concurrency limit, you can deploy a change to only 10 accounts at a time. With a failure threshold, you set how many failures you allow before the operation stops automatically (for example, stop if more than 5 accounts fail). This way you can roll out and test your templates gradually on a small group of accounts instead of affecting every stack at once, which prevents mass failures.
When an operation fails, AWS CloudFormation rolls back the affected stack instances to the previous working template. You still need to correct the template and apply it again to all the stack instances. With StackSets, you can fix the issues in the template and run an update across all the stacks again, using the concurrency limit and failure threshold mentioned above to safely test the fix.
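The interplay between a concurrency limit and a failure threshold can be illustrated with a simplified simulation. This sketch is a toy model, not the scheduling logic StackSets actually uses: it processes accounts in fixed batches and checks the failure count after each batch. The `deploy` callback and all names are hypothetical.

```typescript
interface RolloutResult {
  attempted: string[];
  failed: string[];
  stoppedEarly: boolean;
}

// Deploy in batches of `maxConcurrent` accounts and stop once the number of
// failed accounts exceeds `failureTolerance`.
function simulateRollout(
  accounts: string[],
  maxConcurrent: number,
  failureTolerance: number,
  deploy: (account: string) => boolean // hypothetical: returns true on success
): RolloutResult {
  if (maxConcurrent < 1) {
    throw new Error("maxConcurrent must be at least 1");
  }
  const attempted: string[] = [];
  const failed: string[] = [];
  for (let start = 0; start < accounts.length; start += maxConcurrent) {
    const batch = accounts.slice(start, start + maxConcurrent);
    for (const account of batch) {
      attempted.push(account);
      if (!deploy(account)) {
        failed.push(account);
      }
    }
    if (failed.length > failureTolerance) {
      return { attempted, failed, stoppedEarly: true };
    }
  }
  return { attempted, failed, stoppedEarly: false };
}

// 100 accounts, batches of 10, stop if more than 5 accounts fail.
const accounts: string[] = [];
for (let i = 0; i < 100; i++) {
  accounts.push("acct-" + i);
}
// Pretend the first 7 accounts fail (a broken template shows up immediately).
const result = simulateRollout(accounts, 10, 5, (a) => Number(a.split("-")[1]) >= 7);
console.log(result.attempted.length, result.failed.length, result.stoppedEarly);
```

In this scenario the rollout stops after the first batch, so the remaining 90 accounts are never touched by the faulty change.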
Security and Compliance management
This security-focused approach with StackSets helps organizations maintain a strong security posture across their AWS environment while reducing the operational overhead of managing security at scale.
You can use StackSets to deploy standardized security policies across accounts, enforce security baselines automatically, and implement security guardrails organization-wide. For example, you can deploy detective controls and their configuration, such as Amazon GuardDuty or Amazon Macie, in all your accounts. You can also deploy preventive controls such as SCPs, AWS Firewall Manager, or AWS Shield Advanced. For example, you can deploy the following CloudFormation template through StackSets in each target account to block certain actions in a region:
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Service Control Policy to block access to specific AWS regions'

Parameters:
  PolicyName:
    Type: String
    Default: 'RegionDenyPolicy'
    Description: 'Name for the Service Control Policy'

  PolicyDescription:
    Type: String
    Default: 'Blocks access to Singapore region (ap-southeast-1) while allowing global services'
    Description: 'Description for the Service Control Policy'

  BlockedRegion:
    Type: String
    Default: 'ap-southeast-1'
    Description: 'AWS Region to block access to'
    AllowedValues:
      - 'ap-southeast-1'
      - 'ap-southeast-2'
      - 'eu-west-3'
      - 'us-west-1'
      - 'ca-central-1'

  TargetOUId:
    Type: String
    Description: 'Organizational Unit ID to attach the policy to (e.g., ou-root-xxxxxxxxxx)'

Resources:
  RegionDenySCP:
    Type: AWS::Organizations::Policy
    Properties:
      Name: !Ref PolicyName
      Description: !Ref PolicyDescription
      Type: SERVICE_CONTROL_POLICY
      Content:
        Version: '2012-10-17'
        Statement:
          - Sid: DenyAccessToSpecificRegion
            Effect: Deny
            NotAction:
              - 'route53:*'
              - 'cloudfront:*'
              - 'sts:*'
            Resource: '*'
            Condition:
              StringEquals:
                'aws:RequestedRegion':
                  - !Ref BlockedRegion
      TargetIds:
        - !Ref TargetOUId
      Tags:
        - Key: Purpose
          Value: RegionCompliance
        - Key: ManagedBy
          Value: CloudFormation

Outputs:
  PolicyId:
    Description: 'ID of the created Service Control Policy'
    Value: !Ref RegionDenySCP
    Export:
      Name: !Sub '${AWS::StackName}-PolicyId'

  PolicyArn:
    Description: 'ARN of the created Service Control Policy'
    Value: !GetAtt RegionDenySCP.Arn
    Export:
      Name: !Sub '${AWS::StackName}-PolicyArn'
StackSets also lets you deploy compliance-related resources consistently, maintain audit trails of security configurations, and ensure regulatory requirements are met across all accounts. For example, you can enable AWS CloudTrail and deploy AWS Config rules across all the stack instances managed by the StackSet.
For both Security and Compliance incidents you can use StackSets to deploy automated response workflows, configure event notifications and implement remediation actions across your accounts and regions.
Import existing stacks into StackSets
A stack import operation can import existing stacks into new or existing StackSets, so that you can migrate existing stacks to a StackSet in one operation.
Solution Overview
This solution includes an AWS CodePipeline stack that creates a CI/CD pipeline to deploy our StackSet. This pipeline deploys an application stack containing the AWS CloudFormation StackSet with a monitoring dashboard in Amazon CloudWatch.
Figure 2. Solution overview
The following Amazon CloudWatch dashboard is an example of what you will see in the target accounts after the StackSet is deployed:
Figure 3. Dashboard example
In the CI/CD pipeline, before running the deployment commands, it applies Python security and code quality checks, and runs cdk-nag to check the code against AWS Well-Architected best practices. You can find more details about these checks in the README.md file in the solution repository.
The solution includes two AWS CloudFormation stacks defined in the AWS CDK application and a template for the StackSet that will be deployed in the target accounts and regions. The StackSet template contains the monitoring dashboard that will be deployed in the target regions of each target account as a single unit.
The idea of using AWS CodePipeline with IaC is that development teams can define and share “pipelines-as-code” patterns for deploying their applications making it easy to add stages. This way, security and quality code testing can run any time you change the source code.
Figure 4. Pipeline overview
The best practice is to shift left: add these checks to the earlier stages of the SDLC. You can accomplish this by complementing your CI/CD pipeline with Git hooks or IDE plugins. For example, with the Amazon Q Developer IDE extension you can use the review function to analyze the security of your code locally.
To use the CI/CD pipeline, create a repository with any of the Git providers supported by AWS CodeConnections and add the contents of the folder. All details are included in the README.md so you can always get the latest version of the code and how it works.
Conclusion
In this post, we showed how to use AWS CDK to deploy AWS CloudFormation StackSets to reduce operational overhead and ensure consistency, compliance and security across multiple regions and accounts. We also learned how to create a CI/CD pipeline to guarantee a robust DevSecOps cycle for our Infrastructure as Code.
Now that we’ve explored the main concepts together, you can clone the example repository from the walkthrough section, follow the setup instructions, and customize the implementation to enhance AWS resources management across accounts and regions. Whether you’re managing a single account or multiple organizations, these practices can be adapted to your specific needs.
As organizations adopt multi-account strategies for improved security features and governance, AWS CloudFormation StackSets enables organizations to deploy infrastructure across multiple accounts and regions. However, monitoring and tracking these distributed deployments across multiple accounts presents operational challenges. When a critical security baseline deployed across 50 accounts suddenly starts failing, teams face the daunting task of logging into each account individually to understand what went wrong and which accounts were affected.
This operational overhead scales exponentially with organization growth, requiring platform teams to spend countless hours switching between accounts and manually correlating deployment events. The lack of centralized visibility slows incident response and makes it difficult to identify patterns or implement proactive monitoring. In this blog post, we’ll explore a solution that centralizes AWS CloudFormation logs from multiple accounts into a single management account, making it easier to monitor and troubleshoot StackSets deployments.
Solution Architecture
Our solution creates a centralized logging system that collects AWS CloudFormation events from all target accounts and forwards them to a central management account. This approach provides a single pane of glass for monitoring and troubleshooting AWS CloudFormation deployments across your entire organization.
Figure 1. Architecture diagram showing event flow from member accounts to management account through EventBridge and CloudWatch Logs.
The architecture consists of four main components:
Management Account Setup: Creates a central event bus, log group, and necessary permissions in the organization’s management account.
Target Account Configuration: Deployed via StackSets to configure event rules that forward AWS CloudFormation events to the management account.
Resource Deployment: Uses StackSets to deploy common resources across target accounts, generating the events we want to monitor.
Monitoring and Visualization: Provides dashboards and queries for operational insights.
Event Capture: Amazon EventBridge rules in each target account capture these AWS CloudFormation events based on defined patterns.
Cross-Account Forwarding: Events are forwarded to a custom event bus in the management account using cross-account permissions.
Centralized Logging: The central event bus routes all events to an Amazon CloudWatch Log Group with structured logging.
Monitoring and Alerting: Administrators can view consolidated logs, create custom queries, and set up alerts from a single location.
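The event capture step relies on pattern matching against the event envelope. The sketch below shows the idea with a deliberately simplified matcher that supports only exact-value lists on top-level fields. The `source` and `detail-type` values are the ones EventBridge uses for CloudFormation events, but the exact pattern used by this solution's templates is not shown in the text, so the pattern here is an assumption.

```typescript
// Simplified EventBridge envelope; only the fields used here are modeled.
interface CfnEvent {
  source: string;
  "detail-type": string;
  account: string;
  region: string;
}

// A pattern maps a top-level field to the list of accepted values.
type Pattern = { [K in keyof CfnEvent]?: string[] };

// Assumed pattern: capture stack and resource status changes.
const cloudFormationPattern: Pattern = {
  source: ["aws.cloudformation"],
  "detail-type": [
    "CloudFormation Stack Status Change",
    "CloudFormation Resource Status Change",
  ],
};

// An event matches when every field named in the pattern has an accepted value.
function matches(pattern: Pattern, event: CfnEvent): boolean {
  const keys = Object.keys(pattern) as (keyof CfnEvent)[];
  return keys.every((key) => {
    const accepted = pattern[key];
    return accepted === undefined || accepted.indexOf(event[key]) !== -1;
  });
}

const stackEvent: CfnEvent = {
  source: "aws.cloudformation",
  "detail-type": "CloudFormation Stack Status Change",
  account: "111122223333",
  region: "us-east-1",
};
console.log(matches(cloudFormationPattern, stackEvent));
```

Real EventBridge patterns also support nested `detail` fields and operators such as prefix matching, which this sketch leaves out.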
Prerequisites
Before implementing this solution, ensure you have the following prerequisites in place:
AWS account: Ensure you have a valid AWS account.
AWS Organizations: You must have an AWS Organization structure set up with a primary management account and several member accounts under the management account.
Appropriate Permissions: You must have access to the management account or be configured as a delegated administrator to create and manage StackSets. For detailed information about permissions and security considerations when using StackSets with AWS Organizations, please review the Prerequisites in the AWS CloudFormation StackSets documentation.
Implementation Deep Dive
The solution is implemented using two AWS CloudFormation templates that work together to create a comprehensive monitoring system:
This template establishes the central logging infrastructure in the management account by creating a custom Amazon EventBridge event bus with cross-account access policies and an encrypted Amazon CloudWatch Log Group using a customer-managed AWS Key Management Service (AWS KMS) key. A key feature is the included stack set resource that automatically deploys the target account configuration to all member accounts, eliminating manual setup and ensuring consistent configuration across the entire organization.
This template creates a service-managed stack set that deploys common resources to all accounts in specified organizational units. The StackSet is configured with auto-deployment enabled to automatically provision new accounts added to the organization and includes operation preferences for parallel regional deployment with fault tolerance settings.
On the Stacks page, choose Create stack at top right, and then choose With new resources (standard).
On the Create stack page, select Upload a template file, and then choose Choose File to select a template file from your local computer.
Choose Next to continue and to validate the template.
On the Specify stack details page, type a stack name in the Stack name box.
In the Parameters section, specify values for the parameters that were defined in the template.
Choose Next to continue creating the stack.
Acknowledge capabilities and transforms.
Choose Next to continue.
Choose Submit to launch your stack.
This creates a stack set that deploys Amazon Simple Storage Service (Amazon S3) infrastructure to all target accounts, generating AWS CloudFormation events that will be captured by your centralized logging system.
Figure 3: Screenshot showing successful deployment of common-resources-stackset.yaml template for target accounts
Step 4: Validation and Testing
Confirm event flow and monitoring functionality by viewing the log streams in the ‘central-cloudformation-logs’ log group.
Monitoring and Visualization
The centralized logging solution provides advanced monitoring capabilities through Amazon CloudWatch Logs Insights and custom dashboards.
You can customize your queries to get:
Recent AWS CloudFormation events across all accounts.
Failed stack operations for quick troubleshooting.
Successful deployments for verification.
Event distribution by account and region.
Status breakdown of all AWS CloudFormation operations.
The following query helps you analyze CloudFormation events across your organization by showing:
You can customize your queries to filter for specific conditions such as failed deployment status, particular resource types, or specific accounts to quickly identify and troubleshoot issues across your organization’s AWS CloudFormation deployments.
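The breakdowns listed above (by account and region, and by status) can also be computed client-side over exported log records. The following is a minimal sketch. The record shape with `account`, `region`, and `status` fields is a hypothetical flattening of the logged events, not the exact structure written to the log group.

```typescript
// Hypothetical flattened log record.
interface LoggedEvent {
  account: string;
  region: string;
  status: string; // e.g. "CREATE_COMPLETE", "UPDATE_FAILED"
}

// Count events by an arbitrary key derived from each record.
function countBy(
  events: LoggedEvent[],
  keyOf: (event: LoggedEvent) => string
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const event of events) {
    const key = keyOf(event);
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

const logged: LoggedEvent[] = [
  { account: "111111111111", region: "us-east-1", status: "CREATE_COMPLETE" },
  { account: "111111111111", region: "us-east-1", status: "UPDATE_FAILED" },
  { account: "222222222222", region: "eu-west-1", status: "CREATE_COMPLETE" },
];

const byLocation = countBy(logged, (e) => e.account + "/" + e.region);
const byStatus = countBy(logged, (e) => e.status);
console.log(byLocation, byStatus);
```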
Cost Implications
When implementing this centralized monitoring solution, you should consider the following cost components:
Amazon EventBridge pricing – Costs associated with events being published across accounts to the central event bus
Amazon CloudWatch pricing – Storage costs for the centralized log group storing CloudFormation events from all accounts. Query costs when analyzing the centralized logs
To clean up the resources created in this solution, follow these steps:
First, delete the common resources stack set (common-resources-stackset) from the AWS CloudFormation console in your management account. This will remove all the resources deployed across your member accounts.
After the stack set operations are complete, delete the management account logging setup stack (log-setup-management) to remove the centralized logging infrastructure, including the event bus, log groups, and associated IAM roles.
Note: Make sure all stack set operations are complete before deleting the management account logging setup to ensure proper cleanup of all resources.
Conclusion
Managing infrastructure across multiple AWS accounts doesn’t have to be complex. By centralizing AWS CloudFormation logs, you can gain visibility into your multi-account deployments, troubleshoot issues more efficiently, and help achieve consistent resource deployment across your organization.
This solution demonstrates how AWS services like AWS CloudFormation StackSets, Amazon EventBridge, and Amazon CloudWatch Logs can be combined to create a powerful monitoring system for your infrastructure as code deployments.
Get started today by implementing this solution in your AWS Organization to gain immediate visibility into your multi-account deployments. Download the templates from our GitHub repository and follow the step-by-step guide to enhance your cloud operations.
This post is cowritten by Danilo Tommasina and Lalit Kumar B from Thomson Reuters.
Large organizations often struggle with infrastructure management challenges including compliance issues, development bottlenecks and errors from inconsistent AWS resource creation across teams. Without standardized naming, tagging and policy enforcement, teams face repeated boilerplate code and difficulty accessing centrally-managed resources.
In this post, we will show you how Thomson Reuters developed an extension of the AWS Cloud Development Kit (CDK) to automate compliance, standardization and policy enforcement in Infrastructure as Code (IaC) scripts. We will explore the strategic reasoning behind this initiative, outline foundational design principles, and provide technical details on TR’s journey from concept to implementation. The solution accelerates and standardizes cloud infrastructure deployment and management through seamless integration between TR’s custom library and AWS CDK.
Thomson Reuters (TR) is one of the world’s leading information organizations for businesses and professionals. TR provides companies with the intelligence, technology, and human expertise they need to find trusted answers, enabling them to make better decisions more quickly. TR’s customers span the financial, risk, legal, tax, accounting, and media industries.
Overview
In a large organization that offers a variety of customer products, it is essential to manage numerous cloud resources effectively. This involves overseeing multiple AWS accounts, implementing access control or addressing financial tracking challenges. These tasks require the application of centrally defined standards and conventions, with additional requirements tailored to specific sub-organizations.
Infrastructure as Code (IaC) is an effective method for managing cloud resources. However, utilizing vanilla AWS CloudFormation for extensive and intricate infrastructure can pose challenges. It requires careful attention to naming conventions, tagging standards, security, and best practices for infrastructure deployments. Additionally, repeating infrastructure patterns across various services and products often leads to excessive use of copy-paste and dealing with boilerplate code. When projects require configurable and dynamic components – including conditionals, loops, repeatable patterns, and distribution to a large user base – delivering CloudFormation scripts can become quite cumbersome and prone to errors.
AWS CDK addresses these challenges by enabling IaC development in high-level programming languages like TypeScript, JavaScript, Python, Java. AWS CDK Level 2 and 3 constructs simplify and reduce the amount of code to be written to manage complex infrastructure. It allows TR to create custom libraries that extend the vanilla AWS CDK with additional patterns and utilities. The extension libraries can also be distributed for multiple programming languages and package managers thanks to JSII. JSII enables TypeScript libraries to be automatically compiled and packaged for native consumption in each target language, allowing CDK libraries to be written once but used in many different programming environments.
Solution to optimize the process
In a medium to large company, different teams provide the fundamental infrastructure services (e.g. authentication and authorization, networking, security, financial tracking and optimization, base infrastructure provisioning, etc.) to enable use of the cloud for a large community of developers.
Figure 1 illustrates the conventional method involving teams producing documentation that outlines the usage of pre-deployed infrastructure. This includes naming and tagging standards, required security boundaries, default settings and other relevant guidelines. Subsequently, the implementation team reviews these documents and integrates the established rules into their tool chain consistently, often working in isolation. This results in inefficiencies, misinterpretation risks and maintenance challenges when specifications change.
Figure 1: The traditional approach
TR’s optimized approach replaces documentation with working code as shown in Figure 2.
Figure 2: The optimized approach
Infrastructure teams contribute their specifications into an extension library for AWS CDK, while the implementation teams can also contribute common patterns back into the central extension. The central extension library is released as polyglot packages allowing the implementation teams to pick the programming language that fits best to their knowledge.
With this approach, TR introduces a “shift-left” in the development and delivery lifecycle. Standards and best practices are introduced early, things are done right by default, and TR minimizes the risk of deploying inappropriately configured resources, which reduces the number of governance and security incidents. Implementation delivery teams can share well-architected patterns for re-use by other teams to improve overall effectiveness.
Implementation
Design principles
Key factors for the adoption of a framework are:
Simplicity, ease-of-use, self-service, and fast onboarding
Low maintenance effort and cost
Controlled roll-out, ability to quickly roll-back
With the above in mind, TR delivered a minimally invasive framework that can be enabled with a tiny set of custom code on top of vanilla AWS CDK code.
Using the TR-AWS CDK core library is straightforward – users simply import the package and adapt their entry point. From there, they can leverage standard AWS CDK code and documentation for most development tasks. There’s no need to learn custom construct classes or follow extensive specialized tutorials – vanilla AWS CDK knowledge is sufficient for most requirements. Additionally, developers can quickly incorporate open-source construct libraries through standard package managers. These third-party libraries integrate seamlessly with the TR implementation, automatically conforming to company standards without requiring additional configuration.
By managing distribution of the library following standard software packaging and release procedures, TR enables consumers to adopt new capabilities in a controlled way, with the ability to roll back to previous versions if something goes wrong during an update.
All this together allows TR to tick off the key factors listed above.
The monorepo approach
TR created a monorepo (monolithic repository) which is a version control strategy where multiple projects or packages are stored in a single repository. This approach offers several advantages over maintaining separate repositories for each package: unified versioning, simplified dependency management, consistent tooling, atomic changes across packages and improved collaboration.
This setup mirrors the configuration used by AWS CDK itself.
TR organized their monorepo following this structure:
repo/package.json: Defines dev dependencies and global scripts used by all packages
repo/packages: contains the different modules
repo/packages/core/package.json: deps of core module and scripts for core module
repo/packages/core/lib/*: typescript code that composes the core module
repo/packages/core/lib/augmentation/*: module augmentations for AWS CDK core components
repo/packages/constructs-pattern-X: define multiple reusable and independent level 3 constructs
repo/packages/tr-cdk-lib/package.json: assembly module that defines scripts to assemble the final mono package that will be shared via a npm repository
Figure 3: Repo structure
This structure enables TR to maintain a collection of related, but distinct CDK constructs while making sure they work together seamlessly.
The modules are assembled and released into one single versioned package which simplifies the end-user’s consumption.
The core module: Foundation of TR AWS CDK library
The core module is the foundation of TR’s CDK extension library. It consists of several key components that work together to “TR-ify” AWS resources and offer simplified access to centrally managed infrastructure resources that are provided by TR’s AWS landing zone teams.
TR uses “TR-ification” to refer to the process of dynamically adapting AWS CDK constructs to meet their standards and best practices. From a user perspective, the process is minimally invasive: most of the time the user is coding with vanilla AWS CDK components, while having access to shortcuts to a variety of TR-specific resources.
The core module serves several critical purposes:
Standardization: makes sure the AWS resources follow TR naming conventions and tagging standards
Simplification: abstracts away complex configurations required for TR compliance
Integration: provides seamless access to TR-managed resources like VPCs, security groups, and Route53 hosted zones
Policy Enforcement: automatically applies custom security and financial optimization policies
The “TR-ification” process runs on every construct in a consistent order. For each construct, it will:
If applicable, set a name following a consistent pattern
Apply custom initialization logic (e.g. set IAM permission boundary)
Apply security and financial optimization defaults (if not set)
Perform custom validations
Verify security and financial optimization policies
Tag resources
TR uses a single root-level Aspect instead of multiple Aspects to avoid complex resource type checking and improve maintainability:
// This is the entrypoint that triggers the trification process on all CDK constructs
// we apply all TR specific transformations at this point
Aspects.of(this).add({
  visit: (node: IConstruct) => {
    node.getTRifier().trify();
  },
});
Careful readers will scream at this point: Wait a moment! node.getTRifier().trify() won’t compile!
That is absolutely correct… unless you know about a TypeScript feature called module augmentation. In TR’s case, they augment the IConstruct interface and the Construct class as follows:
/** Defines the set of functionality needed when trifying resources */
export interface ITRifier {
  trify(): void;
  readonly name: string | undefined;
  readonly nameFromTree: string;
}
declare module 'constructs/lib/construct' {
  interface IConstruct {
    /** Obtain the ITRifier responsible to add TR specific features to this CDK IConstruct */
    getTRifier(): ITRifier;
    trContext(): AppContext | StageContext | StackContext;
  }
  interface Construct extends IConstruct {
    /** Build the ITRifier responsible to add TR specific features to this CDK IConstruct */
    buildTRifier(): ITRifier;
  }
}
Then provide default implementations for the generic Construct:
Construct.prototype.getTRifier = function () {
  // Lazy getter, build the TRifier only when needed and cache it
  return ObjectUtils.lazyGetFrom(this, 'trifier', () => this.buildTRifier());
};
Construct.prototype.buildTRifier = function () {
  return new ConstructTRifier(this); // Default dummy implementation
};
Construct.prototype.trContext = function (): StackContext {
  return Stack.of(this).trContext() as StackContext;
};
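ObjectUtils.lazyGetFrom belongs to TR's library and is not shown in the post. As a rough, dependency-free sketch of the same pattern (declaration merging to add methods to an existing class, plus a helper that is built lazily and cached), the following uses a stand-in Gadget class instead of CDK's Construct. The names Gadget, GadgetHelper, and this lazyGetFrom implementation are hypothetical and only illustrate the idea; they are not TR's code.

```typescript
// Hypothetical stand-in for a library class we do not own (CDK's Construct in the post).
class Gadget {
  constructor(public readonly id: string) {}
}

// Helper object that we only want to build on demand.
interface GadgetHelper {
  describe(): string;
}

// Declaration merging: an interface with the same name as the class adds members to its type.
// Across files, the same effect is achieved with `declare module '...' { interface ... }`.
interface Gadget {
  getHelper(): GadgetHelper;
  buildHelper(): GadgetHelper;
}

// Assumed shape of a lazy getter: build the value once and cache it on the owner under `key`.
function lazyGetFrom<T>(owner: object, key: string, factory: () => T): T {
  const cache = owner as Record<string, unknown>;
  if (!Object.prototype.hasOwnProperty.call(cache, key)) {
    cache[key] = factory();
  }
  return cache[key] as T;
}

let buildCount = 0; // only here to show that the factory runs once per instance

// Default implementations attached to the prototype, so every instance gets them.
Gadget.prototype.buildHelper = function (this: Gadget): GadgetHelper {
  buildCount += 1;
  const id = this.id;
  return { describe: () => `helper for ${id}` };
};

Gadget.prototype.getHelper = function (this: Gadget): GadgetHelper {
  return lazyGetFrom(this, '__helper', () => this.buildHelper());
};

const gadget = new Gadget('a');
const firstHelper = gadget.getHelper();
const secondHelper = gadget.getHelper(); // served from the cache, buildHelper is not called again
```

Because the methods live on the prototype, a subclass (or a specific resource type, as in the Lambda example) can replace buildHelper without touching getHelper.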
Because AWS CDK constructs implement the IConstruct interface and extend the Construct class, the “TR-ification” process automatically becomes available for many types of constructs. All that remains is to inject custom logic for each resource type that needs customization and to make sure the augmentation module is loaded. For example, a Lambda function uses:
lambda.CfnFunction.prototype.buildTRifier = function () {
  return new CfnResourceTRifierLambda.CfnFunction(
    this,
    () => { // Accessor for retrieving the lambda function name
      return this.functionName;
    },
    (name: string) => { // Accessor for setting the lambda function name
      this.functionName = name;
    },
    () => {
      // Our own stuff to set defaults for financial optimizations
      const policyChecker = FinOps.Lambda.Defaults.apply(this);
      this.node.addValidation({
        validate: () => {
          // Inject a custom validation logic to check compliance with financial policies
          return policyChecker.addErrorIfNotCompliant(this);
        }
      });
    }
  );
};
TR targets L1 (Cfn) constructs like CfnFunction because the higher-level L2 and L3 constructs internally create L1 constructs when they are instantiated. This architectural decision makes sure TR-ification is applied universally: whether users write new lambda.Function() or new lambda.CfnFunction(), both are TR-ified. The approach provides complete coverage with a single implementation point while remaining transparent to library users, who can keep using their preferred abstraction level without being aware of this internal mechanism.
Naming standardization
TR uses standardized naming to support IAM policy filtering and consistent resource management. In order to support a broad range of use-cases, TR defined the resource name pattern as follows: <segregationPrefix>[-appPrefix]-<resourceName>[-region]-<envSuffix> where the elements mean:
segregationPrefix: A prefix used for grouping resources for a specific asset. It implies that a segregated administrative group is responsible for this resource. Where applicable, it is used for ARN-based IAM resource filtering.
appPrefix: Optional. A prefix used to map a resource to a specific application or service; it is shared across stacks within a CDK app.
resourceName: The name of a resource indicating its purpose.
region: Optional, applied only to resources that are global but are part of a CDK stack that is bound to a specific region.
envSuffix: A suffix used to segregate different deployment environments, e.g. development, continuous integration, quality assurance, production.
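To make the pattern concrete, here is a minimal sketch of a function that assembles a name from these elements. The NameParts interface, the buildResourceName function, and the sample values are hypothetical; the post does not show TR's actual implementation, which likely also handles length limits and per-service character rules.

```typescript
// Hypothetical input shape; the field names mirror the pattern elements described above.
interface NameParts {
  segregationPrefix: string;
  appPrefix?: string; // optional
  resourceName: string;
  region?: string; // optional, only for global resources in a region-bound stack
  envSuffix: string;
}

// Joins the parts that are present with '-' in the order given by the pattern:
// <segregationPrefix>[-appPrefix]-<resourceName>[-region]-<envSuffix>
function buildResourceName(parts: NameParts): string {
  const ordered = [
    parts.segregationPrefix,
    parts.appPrefix,
    parts.resourceName,
    parts.region,
    parts.envSuffix,
  ];
  return ordered
    .filter((part): part is string => part !== undefined && part.length > 0)
    .join('-');
}

// All elements present.
const fullName = buildResourceName({
  segregationPrefix: 'a123',
  appPrefix: 'billing',
  resourceName: 'compute-stats',
  region: 'us-east-1',
  envSuffix: 'dev',
});

// Optional elements omitted.
const minimalName = buildResourceName({
  segregationPrefix: 'a123',
  resourceName: 'compute-stats',
  envSuffix: 'prod',
});
```

Here fullName evaluates to a123-billing-compute-stats-us-east-1-dev and minimalName to a123-compute-stats-prod.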
Traditional approaches require developers to manually construct these names, propagating prefixes and suffixes throughout their code:
new lambda.Function(stack, 'foo', {
  runtime: lambda.Runtime.NODEJS_LATEST,
  handler: 'index.handler',
  code: new lambda.InlineCode('bar'),
  functionName: `${segregationPrefix}-${appPrefix}-compute-stats-${envSuffix}`,
});
With TR AWS CDK extension, the code is simplified to:
new lambda.Function(stack, 'MyFunction', {
  runtime: lambda.Runtime.NODEJS_LATEST,
  handler: 'index.handler',
  code: new lambda.InlineCode('foo'),
  functionName: 'compute-stats',
});
The functionName describes what the function does without “noise”. TR AWS CDK transparently generates the full name and injects it into the synthesized CloudFormation template, matching the specification. Note that functionName is optional: TR-CDK will either TR-ify a provided name or automatically generate a valid one if the user omits it, making sure CloudFormation receives a properly formatted name.
Access to “Landing Zone” resources
TR’s central AWS Landing Zone team is responsible for inflating a set of standard resources (e.g. VPC, subnets, security groups, Route 53 zones, golden AMIs, etc.) into the AWS accounts that are made available to application development teams.
Through module augmentation (shown earlier), the TR-ifier defines the function trContext(), which provides access to a context-aware utility. When this function is called on a resource that resides within a Stack, it returns an object that implements the StackContext interface.
export interface StackContext extends StageContext {
  /** Get access to the TR IVpc */
  readonly vpc: IVpc;
  /** Provides access to standard security groups that are available in all TR accounts */
  readonly securityGroups: trparams.ISecurityGroupsResolver;
  /** Provides access to private and public hosted zones (with numeric digits) that are available in all TR accounts */
  readonly route53: trparams.IRoute53Resolver;
  /** Provides access to TR golden AMIs that are available in all TR accounts */
  readonly goldenAMI: TRGoldenAMI;
}
The readonly attributes are accessors for the AWS Landing Zone resources listed above. Calls like the following give you simple access to the standard VPC, subnet selections, the Route 53 private hosted zone, and so on:
// Get the IVpc:
const trVpc: IVpc = stack.trContext().vpc;
// Get the private subnets as array
const privateSubnets: ISubnet[] = trVpc.privateSubnets;
// Get the private subnets as SubnetSelection
const privateSubSel: SubnetSelection = trVpc.selectSubnets({
  subnetType: SubnetType.PRIVATE_WITH_EGRESS,
});
// Get the private Route53 hosted zone
const privateHZ = stack.trContext().route53.privateHostedZone;
You might now wonder how TR resolves the resources and obtains objects implementing IVpc, ISubnet, ISecurityGroup, …
Instead of using hard-coded resource attributes (e.g. ID, ARN, …) or complex lookups, TR uses CloudFormation’s ability to resolve Systems Manager parameters at execution time. As part of the initial inflation of an AWS account, Systems Manager parameters are registered along with the resources. The parameter names are the same across TR’s AWS accounts, and each value contains, for example, the ID of the matching AWS Landing Zone standard resource: /landing-zone/vpc/vpc-id, /landing-zone/vpc/subnets/private-1-id, /landing-zone/vpc/subnets/private-2-id, …
TR then defined custom IVpc, ISubnet, IHostedZone… implementations in which each function resolves resource attributes dynamically via Systems Manager parameters. With this approach, TR obtains portable code that runs on any AWS account initialized via the TR inflation process. There are no hard-coded resource identifiers, and there is no need for lookups via the AWS SDK during synthesis.
As users of the TR AWS CDK library, TR developers interact with an object implementing the IVpc interface and do not have to care about how to obtain, for example, the VPC ID and subnet IDs. The same principle applies to Route 53 hosted zones, golden AMI IDs, etc.
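The post does not say which CloudFormation mechanism TR uses to resolve the parameters. One option is a CloudFormation dynamic reference of the form {{resolve:ssm:parameter-name}}, which CloudFormation replaces with the parameter value during stack operations. The sketch below assumes that mechanism and uses the parameter names quoted above; VpcLike, LandingZoneVpc, and ssmRef are hypothetical names, and a real implementation would implement CDK's much larger IVpc interface.

```typescript
// Parameter names taken from the examples in the text; everything else is an assumption.
const LZ_PARAMS = {
  vpcId: '/landing-zone/vpc/vpc-id',
  privateSubnetIds: [
    '/landing-zone/vpc/subnets/private-1-id',
    '/landing-zone/vpc/subnets/private-2-id',
  ],
};

// Builds a CloudFormation dynamic reference. The value is resolved by CloudFormation
// at deployment time, so no AWS SDK lookup is needed during synthesis.
function ssmRef(parameterName: string): string {
  return `{{resolve:ssm:${parameterName}}}`;
}

// Hypothetical, heavily reduced stand-in for CDK's IVpc.
interface VpcLike {
  readonly vpcId: string;
  readonly privateSubnetIds: string[];
}

// Every attribute is a deploy-time reference instead of a hard-coded identifier.
class LandingZoneVpc implements VpcLike {
  get vpcId(): string {
    return ssmRef(LZ_PARAMS.vpcId);
  }
  get privateSubnetIds(): string[] {
    return LZ_PARAMS.privateSubnetIds.map(ssmRef);
  }
}

const landingZoneVpc: VpcLike = new LandingZoneVpc();
const vpcIdToken = landingZoneVpc.vpcId;
```

The same code synthesizes identically for every account, because only the parameter values differ between accounts, not the parameter names.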
Application initialization
As mentioned previously, one key design principle is to minimize the custom code that a user of TR AWS CDK is required to use compared to using vanilla AWS CDK. This approach leverages existing AWS CDK and reduces the learning curve for developers.
This is how TR developers initialize an App with vanilla CDK, compared to how they initialize it with TR AWS CDK.
From this point on, developers can continue using vanilla AWS CDK code. The value returned by TRCdk.newApp(…) is an instance of an extension of CDK’s App class and is fully compatible with it. In addition, it injects the TR-ification aspect, manages the tagging process, and initializes contextual information.
Here and there, for example when they need to pass the VPC into a construct, developers call TR AWS CDK code via the trContext() entry point that is exposed on CDK constructs through TypeScript’s module augmentation feature, but that’s it! 99% of the code is vanilla AWS CDK code.
The segregationId, namingProps, and deploymentEnv attributes are used for multiple purposes like formatting resource names and tagging resources.
Standardized Tagging
TR defines tagging standards with mandatory tags (e.g. for attribution to a specific product asset and for tracking resource ownership) and optional tags (e.g. for specifying resources that belong to different services within the same product asset).
The segregationId, resourceOwner, and deploymentEnv attributes are used to set the mandatory tags using CDK’s built-in tagging functionality. TR also defines a standardized set of optional tags that can be passed into the application context or set ad hoc on individual constructs.
This approach keeps tag names and values consistent. Tagging happens automatically behind the scenes and is applied to all taggable constructs. No copy-pasting of tag definitions as in AWS CloudFormation, no dealing with CloudFormation’s inconsistent syntax for tag declarations, no forgetting to tag resources.
Conclusion
In this post, we discussed how the monorepo approach to AWS CDK development, centered around the core module, has significantly improved the infrastructure management at Thomson Reuters. By providing well-architected L3 constructs, standardizing and simplifying AWS resource creation, they’ve reduced errors, enhanced governance, and accelerated development.
The core module’s ability to enforce policies, standardize naming and tagging, and provide access to TR-managed resources makes it an invaluable tool for teams working with AWS infrastructure at Thomson Reuters.
To get started with AWS CDK and build your CDK solutions, check out the AWS CDK Developer Guide.
AWS CloudFormation StackSets enables organizations to deploy infrastructure consistently across multiple AWS accounts and regions. However, success depends on choosing the right deployment strategy that balances three critical factors: deployment speed, operational safety, and organizational scale. This guide explores proven StackSets deployment strategies specifically designed for multi-account infrastructure management.
Understanding StackSets Deployment Fundamentals
What are StackSets Actually Used For?
Unlike single-account AWS CloudFormation templates, StackSets are specifically designed for multi-account infrastructure governance. Common use cases include security baselines (deploying IAM policies, security groups, and access controls across all accounts), compliance controls (rolling out AWS Config rules, AWS CloudTrail configurations, and audit requirements), organizational standards (establishing consistent VPC configurations, tagging policies, and naming conventions), shared services (deploying monitoring solutions, logging infrastructure, and backup policies), and cost management (implementing budget controls, cost allocation tags, and resource optimization policies).
The Multi-Account Challenge
Managing infrastructure across dozens or hundreds of AWS accounts presents unique challenges:
Single Account (CFN Template)    Multi-Account (StackSets)
App A                            Org Unit A (50 accounts)
|                                |
[Deploy Once]                    [Deploy consistently across all]
|                                |
Success/Fail                     Complex success/failure matrix
Multi-account and multi-region CloudFormation deployment complexity
The Speed-Safety-Scale Triangle
Every StackSets deployment strategy involves trade-offs between speed (how quickly changes propagate across your organization), safety (risk mitigation and failure containment), and scale (the ability to manage hundreds of accounts efficiently).
Prerequisites
Before implementing any of the deployment strategies described in this guide, ensure you have:
“For a more conservative deployment, set Maximum Concurrent Accounts to 1, and Failure Tolerance to 0. Set your lowest-impact region to be first in the Region Order. Start with one region.”
“For a faster deployment, increase the values of Maximum Concurrent Accounts and Failure Tolerance as needed.”
Based on this guidance, we propose several deployment strategies below, depending on the speed, safety, and scale you want to achieve.
1. Sequential Deployment: Maximum Safety
Use case: critical security updates, compliance requirements, first-time organizational rollouts.
Some possible use cases are listed below:
Security baseline updates: New IAM policies affecting root access
Compliance rollouts: SOX, HIPAA, or PCI-DSS control implementations
Critical infrastructure changes: VPC security group modifications
Organizational policy changes: New AWS Config rules for audit compliance
Implementation Example:
For this example, we download the template ConfigRuleCloudtrailEnabled.yml from the CloudFormation sample library in the AWS documentation. It configures an AWS Config rule that determines whether AWS CloudTrail is enabled. Then we follow these steps:
The expected response should be similar to the following:
{"StackSetId": "security-baseline: ...."}
Step 2: Create Stack Instances
Before you launch the below command, you need to adjust the values of the following parameters:
OrganizationalUnitIds: you must change the value “ou-test” in the command below to the ID of the target OU you want to deploy to. I recommend creating a new test OU in the console or via the CLI for the purpose of this test.
regions: if needed, change the “us-east-1 eu-west-1” value to the list of all the regions you want to deploy to. AWS Config must be active in the accounts and regions that you choose; otherwise, you’ll get an error when deploying the stack.
# Deploy security baseline to production accounts
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1 and eu-west-1
# SEQUENTIAL = One region at a time, sequentially
# MaxConcurrentPercentage = Deploy to 5% of accounts at once
# FailureTolerancePercentage = Stop on first failure
aws cloudformation create-stack-instances \
  --stack-set-name security-baseline \
  --deployment-targets OrganizationalUnitIds=ou-test \
  --regions us-east-1 eu-west-1 \
  --region us-east-1 \
  --operation-preferences RegionConcurrencyType=SEQUENTIAL,MaxConcurrentPercentage=5,FailureTolerancePercentage=0
AWS CLI to create security-baseline Stack Instances sequentially for maximum safety
The CLI output should look like the following:
{"OperationId": ....}
Or create the StackSet and add the Stacks with the AWS Console:
In the CloudFormation Console, click “Create StackSet”
AWS CloudFormation Console: create a security-baseline Stackset
Upload your template from S3 or from your computer and click Next:
AWS CloudFormation Console: specify a template
Specify the StackSet name and parameters and click Next:
AWS CloudFormation Console: specify the StackSet name and parameters
Configure StackSet options and click Next:
AWS CloudFormation Console: configure the StackSet options
Set deployment options and click Next:
AWS CloudFormation Console: set deployment options
AWS CloudFormation Console: set more deployment options
Then Review and Submit.
To keep this post concise, we provide CLI output and console screenshots for this example only. The “Parallel Deployment” and “Balanced Approach” examples follow the same steps; you only need to update the StackSet operation preferences.
A real-world example would be a financial services company deploying new MFA requirements across 200 production accounts. They could use sequential deployment with a maximum concurrency of 5% so that each batch is validated before proceeding.
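To see what these percentages mean in account numbers, here is a small arithmetic sketch. It is based on our reading of the StackSets operation preferences documentation: percentages are rounded down (with a minimum of one account for the concurrency), and in the default STRICT_FAILURE_TOLERANCE concurrency mode the initial concurrency is capped at the failure tolerance plus one. The function names are hypothetical, and you should verify the rules against the current API reference before relying on them.

```typescript
type ConcurrencyMode = 'STRICT_FAILURE_TOLERANCE' | 'SOFT_FAILURE_TOLERANCE';

// Accounts per region that MaxConcurrentPercentage translates to:
// rounded down, but never less than one account.
function maxConcurrentAccounts(totalAccounts: number, maxConcurrentPercentage: number): number {
  return Math.max(1, Math.floor((totalAccounts * maxConcurrentPercentage) / 100));
}

// Failed accounts per region that FailureTolerancePercentage allows (rounded down).
function toleratedFailures(totalAccounts: number, failureTolerancePercentage: number): number {
  return Math.floor((totalAccounts * failureTolerancePercentage) / 100);
}

// Initial concurrency, assuming the strict mode caps it at (tolerated failures + 1).
function initialConcurrency(
  totalAccounts: number,
  maxConcurrentPercentage: number,
  failureTolerancePercentage: number,
  mode: ConcurrencyMode = 'STRICT_FAILURE_TOLERANCE',
): number {
  const maxAccounts = maxConcurrentAccounts(totalAccounts, maxConcurrentPercentage);
  if (mode === 'SOFT_FAILURE_TOLERANCE') {
    return maxAccounts;
  }
  return Math.min(maxAccounts, toleratedFailures(totalAccounts, failureTolerancePercentage) + 1);
}

// 200 accounts, MaxConcurrentPercentage=5, FailureTolerancePercentage=0
const configuredBatch = maxConcurrentAccounts(200, 5); // 10 accounts
const strictBatch = initialConcurrency(200, 5, 0); // 1 account at a time
const softBatch = initialConcurrency(200, 5, 0, 'SOFT_FAILURE_TOLERANCE'); // 10 accounts
```

Under these assumptions, a failure tolerance of 0 in strict mode effectively deploys one account at a time regardless of the concurrency percentage, which is the most conservative behavior and also the slowest.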
2. Parallel Deployment: Maximum Speed
Parallel deployment is best for non-critical updates, development environments, and routine maintenance.
Here are some possible use cases:
Development account standardization: Rolling out new development tools
Monitoring infrastructure: Deploying Amazon CloudWatch dashboards and alarms
Non-production updates: Updating development and staging environments
Implementation Example:
For this example, we copy the .yml template from this re:Post article about monitoring IAM events into a file called “monitoring-baseline.yml” and use it in the following commands.
Just like in the previous example, before you run the command below, you need to adjust the values of the OrganizationalUnitIds and regions parameters.
# Deploy monitoring baseline to dev and sandbox accounts
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1 and eu-west-1
# PARALLEL = Deployment in parallel
# MaxConcurrentPercentage = Deploy to 80% of accounts at once
# FailureTolerancePercentage = Tolerate failures in 20% of accounts
aws cloudformation create-stack-instances \
  --stack-set-name monitoring-baseline \
  --deployment-targets OrganizationalUnitIds=ou-development,ou-sandbox \
  --regions us-east-1 eu-west-1 \
  --region us-east-1 \
  --operation-preferences RegionConcurrencyType=PARALLEL,MaxConcurrentPercentage=80,FailureTolerancePercentage=20
AWS CLI to create monitoring-baseline Stack Instances in parallel with high value for max concurrent percentage for maximum speed
3. Progressive Deployment: Balanced Approach or Multi Phase Approach (Recommended)
For most production scenarios with moderate risk tolerance, it is recommended to use a Balanced Approach, or Multi-Phase Implementation.
Balanced Approach
For this example, to make it easier, you can create a copy of “monitoring-baseline.yml” created previously, and name it “balanced-template.yml”.
cp monitoring-baseline.yml balanced-template.yml
bash command to copy the monitoring-baseline.yml file to balanced-template.yml
Then you can use it in the following command lines.
You need to adjust the values of the OrganizationalUnitIds and regions parameters.
# Deploy monitoring baseline to pilot accounts
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1
# SEQUENTIAL = Deployment in sequence
# MaxConcurrentPercentage = 100% Deploy full speed for small pilot
# FailureTolerancePercentage = Zero tolerance in pilot
aws cloudformation create-stack-instances \
  --stack-set-name balanced-deployment \
  --deployment-targets Accounts=pilot-account-1,pilot-account-2 \
  --regions us-east-1 \
  --region us-east-1 \
  --operation-preferences RegionConcurrencyType=SEQUENTIAL,MaxConcurrentPercentage=100,FailureTolerancePercentage=0
AWS CLI to create balanced-deployment Stack Instances sequentially for maximum safety in Pilot accounts
Wait for Pilot validation before proceeding to Phase 2
Phase 2: Early Adopter OUs (30% of target)
Phase 2: Create Early Adopter Stack Instances
You need to adjust the values of the OrganizationalUnitIds and regions parameters.
# Deploy monitoring baseline to early adopter accounts
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1, eu-west-1
# PARALLEL = Deployment in parallel
# MaxConcurrentPercentage = Deploy to 25% of accounts at once
# FailureTolerancePercentage = Tolerate failures in 5% of accounts
aws cloudformation create-stack-instances \
  --stack-set-name balanced-deployment \
  --deployment-targets OrganizationalUnitIds=ou-early-adopter \
  --regions us-east-1 eu-west-1 \
  --region us-east-1 \
  --operation-preferences RegionConcurrencyType=PARALLEL,MaxConcurrentPercentage=25,FailureTolerancePercentage=5
AWS CLI to create balanced-deployment Stack Instances in parallel with low max concurrent percentage for a balanced deployment in Early Adopter OU
Wait for Early Adopter validation before proceeding to Phase 3
Phase 3: Full Deployment (Remaining 60%)
Phase 3: Full Deployment
You need to adjust the values of the OrganizationalUnitIds and regions parameters.
# Deploy monitoring baseline to production accounts
# StackSet operation managed from us-east-1
# Deployed to regions us-east-1, eu-west-1 and ap-southeast-1
# PARALLEL = Deployment in parallel
# MaxConcurrentPercentage = Deploy to 40% of accounts at once for higher speed after validation
# FailureTolerancePercentage = Tolerate failures in 10% of accounts for moderate tolerance
aws cloudformation create-stack-instances \
  --stack-set-name balanced-deployment \
  --deployment-targets OrganizationalUnitIds=ou-standard-prod,ou-legacy-prod \
  --regions us-east-1 eu-west-1 ap-southeast-1 \
  --region us-east-1 \
  --operation-preferences RegionConcurrencyType=PARALLEL,MaxConcurrentPercentage=40,FailureTolerancePercentage=10
AWS CLI to create balanced-deployment Stack Instances in parallel with a higher max concurrent percentage for the full deployment in the remaining OUs
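The three phases differ only in their targets, regions, and operation preferences, so they can be described as data and turned into CLI arguments by one function. The following is a minimal sketch of that idea: the Phase interface and createStackInstancesArgs function are hypothetical, the flag names are the ones used in the commands above, and the phase values mirror the comments in those commands. Note that --regions lists the target regions while --region selects the region the operation is managed from.

```typescript
type RegionConcurrency = 'SEQUENTIAL' | 'PARALLEL';

// Hypothetical description of one rollout phase.
interface Phase {
  label: string;
  targets: string; // value passed to --deployment-targets
  regions: string[]; // target regions, passed to --regions
  regionConcurrency: RegionConcurrency;
  maxConcurrentPercentage: number;
  failureTolerancePercentage: number;
}

// Builds the argument list for `aws cloudformation create-stack-instances`.
function createStackInstancesArgs(stackSetName: string, homeRegion: string, phase: Phase): string[] {
  const preferences = [
    `RegionConcurrencyType=${phase.regionConcurrency}`,
    `MaxConcurrentPercentage=${phase.maxConcurrentPercentage}`,
    `FailureTolerancePercentage=${phase.failureTolerancePercentage}`,
  ].join(',');
  return [
    'cloudformation', 'create-stack-instances',
    '--stack-set-name', stackSetName,
    '--deployment-targets', phase.targets,
    '--regions', ...phase.regions,
    '--region', homeRegion,
    '--operation-preferences', preferences,
  ];
}

const rolloutPhases: Phase[] = [
  { label: 'pilot', targets: 'Accounts=pilot-account-1,pilot-account-2', regions: ['us-east-1'],
    regionConcurrency: 'SEQUENTIAL', maxConcurrentPercentage: 100, failureTolerancePercentage: 0 },
  { label: 'early-adopter', targets: 'OrganizationalUnitIds=ou-early-adopter', regions: ['us-east-1', 'eu-west-1'],
    regionConcurrency: 'PARALLEL', maxConcurrentPercentage: 25, failureTolerancePercentage: 5 },
  { label: 'full', targets: 'OrganizationalUnitIds=ou-standard-prod,ou-legacy-prod',
    regions: ['us-east-1', 'eu-west-1', 'ap-southeast-1'],
    regionConcurrency: 'PARALLEL', maxConcurrentPercentage: 40, failureTolerancePercentage: 10 },
];

// One argument list per phase; run them one at a time, validating between phases.
const phaseCommands = rolloutPhases.map((phase) =>
  createStackInstancesArgs('balanced-deployment', 'us-east-1', phase),
);
```

Keeping the phases in one place makes it harder for the comments and the actual flag values to drift apart.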
Using Step Functions for Orchestration
AWS Step Functions provides a serverless workflow service that can orchestrate StackSets deployments with advanced control flow, error handling, and state management capabilities. This approach enhances your multi-account deployments with features not available through standard StackSets operations alone.
Key benefits include:
Advanced Deployment Orchestration: Coordinate multi-phase rollouts with validation gates
Human Approval Workflows: Implement manual approval steps for critical changes
Enhanced Error Handling: Define sophisticated retry policies and fallback mechanisms
Visual Monitoring: Track deployment progress through the Step Functions visual console
Real-World Use Case: Compliance Control Rollout
In regulated industries, AWS Step Functions enables a phased approach that combines automation with necessary governance. For instance, you can:
Deploy compliance controls to test accounts
Run automated validation and generate compliance reports
Obtain manual approval from compliance team
Deploy to production accounts with comprehensive monitoring
This approach ensures consistent governance while maintaining the complete audit trail required for regulatory compliance.
Monitoring and Optimization
AWS CloudFormation StackSets does not provide extensive built-in Amazon CloudWatch metrics for monitoring StackSet operations and health, which is why the custom monitoring implementation in this post is useful.
Here’s what AWS does and doesn’t provide out of the box:
What AWS provides natively:
Records of AWS API calls via AWS CloudTrail (which show that operations happened but don’t track success rates or performance)
General service quotas and throttling metrics for CloudFormation as a whole
Some metrics for individual stacks, but no consolidated StackSet-specific metrics
What requires custom implementation (as in this post):
Success rate metrics for StackSet operations across accounts
Deployment completion time tracking
Configuration drift detection and monitoring
Account-specific failure analysis
Comprehensive dashboards that show StackSet health across your organization
The code in this post demonstrates how to implement the success rate custom metric by:
Gathering data from the CloudFormation API about StackSet operations
Calculating the success rate metrics for StackSet deployments
Creating custom Amazon CloudWatch metrics in a custom namespace (like “StackSetMonitoring”)
Setting up alerts for issues
This is why organizations need custom monitoring solutions like the one shown in this post rather than relying solely on built-in metrics.
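The calculation step can be sketched independently of the Lambda function. The example below assumes the operation summaries have already been fetched (for instance with the ListStackSetOperations API, whose status values are used here) and computes a success rate over finished operations. The metric name SuccessRate and the function names are hypothetical; the namespace StackSetMonitoring is the one mentioned above, and the datum follows the shape accepted by CloudWatch PutMetricData.

```typescript
type OperationStatus = 'RUNNING' | 'SUCCEEDED' | 'FAILED' | 'STOPPING' | 'STOPPED' | 'QUEUED';

interface OperationSummary {
  operationId: string;
  status: OperationStatus;
}

// Success rate in percent over operations that have finished.
// Returns undefined when nothing has finished yet, so no misleading 0% is published.
function successRate(operations: OperationSummary[]): number | undefined {
  const finished = operations.filter(
    (op) => op.status === 'SUCCEEDED' || op.status === 'FAILED' || op.status === 'STOPPED',
  );
  if (finished.length === 0) {
    return undefined;
  }
  const succeeded = finished.filter((op) => op.status === 'SUCCEEDED').length;
  return (succeeded / finished.length) * 100;
}

// One metric datum, to be sent with PutMetricData to the StackSetMonitoring namespace.
function toMetricDatum(stackSetName: string, rate: number) {
  return {
    MetricName: 'SuccessRate', // hypothetical metric name
    Dimensions: [{ Name: 'StackSetName', Value: stackSetName }],
    Unit: 'Percent',
    Value: rate,
  };
}

const sampleOperations: OperationSummary[] = [
  { operationId: 'op-1', status: 'SUCCEEDED' },
  { operationId: 'op-2', status: 'SUCCEEDED' },
  { operationId: 'op-3', status: 'FAILED' },
  { operationId: 'op-4', status: 'SUCCEEDED' },
  { operationId: 'op-5', status: 'RUNNING' }, // ignored: not finished yet
];

const sampleRate = successRate(sampleOperations); // 75
const sampleDatum = sampleRate === undefined ? undefined : toMetricDatum('security-baseline', sampleRate);
```

Excluding running and queued operations is a design choice: it keeps the metric stable while a long rollout is still in progress.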
Automated Monitoring Implementation: example of a custom metric to monitor the StackSet operations success rate
The following AWS CloudFormation template provides real-time monitoring and alerting for AWS CloudFormation StackSet operations through automated infrastructure deployment. This solution creates a complete monitoring system using an AWS Lambda function, Amazon EventBridge rules, Amazon SNS notifications, and Amazon CloudWatch dashboards to track StackSet success and failure rates. The core Lambda function, named StackSetMonitor, continuously monitors all active StackSets in your account, calculating success rates and publishing custom metrics to Amazon CloudWatch under the StackSetMonitoring namespace.
Below you’ll find a few examples of custom metrics that could be implemented based on this AWS CloudFormation template:
Count of all operations (CREATE, UPDATE, DELETE) per StackSet over time periods
Number of stack instances with configuration drift (requires additional API calls)
Average time taken for StackSet operations to complete
Rate of StackSet operations to identify peak usage times
Number of individual stack instances that failed during operations
Number of retried operations (indicates infrastructure issues)
…
Here’s the StackSetMonitor.yml CloudFormation Template:
# StackSetMonitor.yml
# CFN template for monitoring AWS CloudFormation StackSet operations with real-time alerts, metrics, and dashboards.
AWSTemplateFormatVersion: '2010-09-09'
Description: 'CloudFormation template for StackSet operation monitoring using CloudWatch and SNS'
Parameters:
  StackSetName:
    Type: String
    Description: 'Name of the StackSet to monitor'
    Default: 'security-baseline'
    MinLength: 1
    MaxLength: 128
    AllowedPattern: '[a-zA-Z][-a-zA-Z0-9]*'
    ConstraintDescription: 'Must be a valid StackSet name (1-128 characters, alphanumeric and hyphens, must start with a letter)'
  VpcId:
    Type: String
    Description: 'VPC ID where the Lambda function will be deployed (leave empty to create new VPC)'
    Default: ''
  SubnetIds:
    Type: CommaDelimitedList
    Description: 'List of subnet IDs for the Lambda function (leave empty to create new subnets)'
    Default: ''
  SecurityGroupIds:
    Type: CommaDelimitedList
    Description: 'List of security group IDs for the Lambda function (leave empty to create new security group)'
    Default: ''
Conditions:
  CreateVPC: !Equals [!Ref VpcId, '']
  CreateVPCAndSubnets: !And [!Equals [!Ref VpcId, ''], !Equals [!Join [',', !Ref SubnetIds], '']]
  HasCustomSecurityGroups: !Not [!Equals [!Join [',', !Ref SecurityGroupIds], '']]
Resources:
  # KMS Key for CloudWatch Logs encryption
  LogsKMSKey:
    Type: AWS::KMS::Key
    DeletionPolicy: Delete
    UpdateReplacePolicy: Delete
    Properties:
      Description: 'KMS Key for StackSet Monitor CloudWatch Logs and Lambda environment variable encryption'
      EnableKeyRotation: true
      KeyPolicy:
        Version: '2012-10-17'
        Statement:
          - Sid: Enable IAM User Permissions
            Effect: Allow
            Principal:
              AWS: !Sub 'arn:${AWS::Partition}:iam::${AWS::AccountId}:root'
            Action: 'kms:*'
            Resource: '*'
          - Sid: Allow CloudWatch Logs
            Effect: Allow
            Principal:
              Service: !Sub 'logs.${AWS::Region}.amazonaws.com'
            Action:
              - 'kms:Encrypt'
              - 'kms:Decrypt'
              - 'kms:ReEncrypt*'
              - 'kms:GenerateDataKey*'
              - 'kms:DescribeKey'
            Resource: '*'
            Condition:
              ArnEquals:
                'kms:EncryptionContext:aws:logs:arn':
                  - !Sub 'arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/StackSetMonitor'
                  - !Sub 'arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/cloudformation/stacksets'
          - Sid: Allow Lambda Service
            Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action:
              - 'kms:Encrypt'
              - 'kms:Decrypt'
              - 'kms:ReEncrypt*'
              - 'kms:GenerateDataKey*'
              - 'kms:DescribeKey'
            Resource: '*'
  LogsKMSKeyAlias:
    Type: AWS::KMS::Alias
    Properties:
      AliasName: alias/stackset-monitor-logs
      TargetKeyId: !Ref LogsKMSKey
  # VPC Resources (created when no existing VPC is provided)
  StackSetMonitorVPC:
    Type: AWS::EC2::VPC
    Condition: CreateVPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: StackSetMonitor-VPC
        - Key: Purpose
          Value: VPC for StackSet Monitor Lambda function
  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      Tags:
        - Key: Name
          Value: StackSetMonitor-Private-Subnet-1
        - Key: Purpose
          Value: Private subnet for StackSet Monitor Lambda
  PrivateSubnet2:
    Type: AWS::EC2::Subnet
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      CidrBlock: 10.0.2.0/24
      AvailabilityZone: !Select [1, !GetAZs '']
      Tags:
        - Key: Name
          Value: StackSetMonitor-Private-Subnet-2
        - Key: Purpose
          Value: Private subnet for StackSet Monitor Lambda
  PrivateRouteTable1:
    Type: AWS::EC2::RouteTable
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      Tags:
        - Key: Name
          Value: StackSetMonitor-Private-RT-1
  PrivateRouteTable2:
    Type: AWS::EC2::RouteTable
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      Tags:
        - Key: Name
          Value: StackSetMonitor-Private-RT-2
  PrivateSubnet1RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Condition: CreateVPC
    Properties:
      RouteTableId: !Ref PrivateRouteTable1
      SubnetId: !Ref PrivateSubnet1
  PrivateSubnet2RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Condition: CreateVPC
    Properties:
      RouteTableId: !Ref PrivateRouteTable2
      SubnetId: !Ref PrivateSubnet2
  # VPC Endpoints for AWS Services (no internet access needed)
  CloudFormationVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.cloudformation
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - cloudformation:ListStackSets
              - cloudformation:ListStackSetOperations
              - cloudformation:ListStackInstances
              - cloudformation:DescribeStackInstance
              - cloudformation:DescribeStacks
              - cloudformation:GetTemplate
            Resource: '*'
  CloudWatchVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.monitoring
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - cloudwatch:PutMetricData
            Resource: '*'
  SNSVPCEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Condition: CreateVPC
    Properties:
      VpcId: !Ref StackSetMonitorVPC
      ServiceName: !Sub com.amazonaws.${AWS::Region}.sns
      VpcEndpointType: Interface
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref VPCEndpointSecurityGroup
      PrivateDnsEnabled: true
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: '*'
            Action:
              - sns:Publish
            Resource: '*'
EventsVPCEndpoint:
Type: AWS::EC2::VPCEndpoint
Condition: CreateVPC
Properties:
VpcId: !Ref StackSetMonitorVPC
ServiceName: !Sub com.amazonaws.${AWS::Region}.events
VpcEndpointType: Interface
SubnetIds:
- !Ref PrivateSubnet1
- !Ref PrivateSubnet2
SecurityGroupIds:
- !Ref VPCEndpointSecurityGroup
PrivateDnsEnabled: true
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal: '*'
Action:
- events:PutEvents
Resource: '*'
LogsVPCEndpoint:
Type: AWS::EC2::VPCEndpoint
Condition: CreateVPC
Properties:
VpcId: !Ref StackSetMonitorVPC
ServiceName: !Sub com.amazonaws.${AWS::Region}.logs
VpcEndpointType: Interface
SubnetIds:
- !Ref PrivateSubnet1
- !Ref PrivateSubnet2
SecurityGroupIds:
- !Ref VPCEndpointSecurityGroup
PrivateDnsEnabled: true
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal: '*'
Action:
- logs:CreateLogGroup
- logs:CreateLogStream
- logs:PutLogEvents
Resource: '*'
SQSVPCEndpoint:
Type: AWS::EC2::VPCEndpoint
Condition: CreateVPC
Properties:
VpcId: !Ref StackSetMonitorVPC
ServiceName: !Sub com.amazonaws.${AWS::Region}.sqs
VpcEndpointType: Interface
SubnetIds:
- !Ref PrivateSubnet1
- !Ref PrivateSubnet2
SecurityGroupIds:
- !Ref VPCEndpointSecurityGroup
PrivateDnsEnabled: true
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal: '*'
Action:
- sqs:SendMessage
Resource: '*'
STSVPCEndpoint:
Type: AWS::EC2::VPCEndpoint
Condition: CreateVPC
Properties:
VpcId: !Ref StackSetMonitorVPC
ServiceName: !Sub com.amazonaws.${AWS::Region}.sts
VpcEndpointType: Interface
SubnetIds:
- !Ref PrivateSubnet1
- !Ref PrivateSubnet2
SecurityGroupIds:
- !Ref VPCEndpointSecurityGroup
PrivateDnsEnabled: true
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal: '*'
Action:
- sts:AssumeRole
- sts:GetCallerIdentity
- sts:AssumeRoleWithWebIdentity
Resource: '*'
  # Security Group for Lambda function
  LambdaSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for StackSet Monitor Lambda function
      VpcId: !If
        - CreateVPC
        - !Ref StackSetMonitorVPC
        - !Ref VpcId
      SecurityGroupEgress:
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 10.0.0.0/16
          Description: HTTPS to VPC Endpoints
        - IpProtocol: tcp
          FromPort: 53
          ToPort: 53
          CidrIp: 10.0.0.0/16
          Description: DNS TCP to VPC for name resolution
        - IpProtocol: udp
          FromPort: 53
          ToPort: 53
          CidrIp: 10.0.0.0/16
          Description: DNS UDP to VPC for name resolution
      Tags:
        - Key: Name
          Value: StackSetMonitor-Lambda-SG
        - Key: Purpose
          Value: Security group for StackSet Monitor Lambda
  VPCEndpointSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Condition: CreateVPC
    Properties:
      GroupDescription: Security group for VPC Endpoints
      VpcId: !Ref StackSetMonitorVPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          SourceSecurityGroupId: !Ref LambdaSecurityGroup
          Description: HTTPS from Lambda security group
        - IpProtocol: tcp
          FromPort: 53
          ToPort: 53
          SourceSecurityGroupId: !Ref LambdaSecurityGroup
          Description: DNS TCP from Lambda security group
        - IpProtocol: udp
          FromPort: 53
          ToPort: 53
          SourceSecurityGroupId: !Ref LambdaSecurityGroup
          Description: DNS UDP from Lambda security group
      SecurityGroupEgress:
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 10.0.0.0/16
          Description: HTTPS outbound within VPC
        - IpProtocol: tcp
          FromPort: 53
          ToPort: 53
          CidrIp: 10.0.0.0/16
          Description: DNS TCP outbound within VPC
        - IpProtocol: udp
          FromPort: 53
          ToPort: 53
          CidrIp: 10.0.0.0/16
          Description: DNS UDP outbound within VPC
      Tags:
        - Key: Name
          Value: StackSetMonitor-VPCEndpoint-SG
        - Key: Purpose
          Value: Security group for VPC Endpoints
  # Dead Letter Queue for Lambda function
  StackSetMonitorDLQ:
    Type: AWS::SQS::Queue
    DeletionPolicy: Delete
    UpdateReplacePolicy: Delete
    Properties:
      QueueName: StackSetMonitor-DLQ
      MessageRetentionPeriod: 1209600 # 14 days
      KmsMasterKeyId: alias/aws/sqs
      Tags:
        - Key: Purpose
          Value: Dead Letter Queue for StackSet Monitor Lambda
  StackSetAlertsTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: StackSetAlerts
      DisplayName: StackSet Monitoring Alerts
      KmsMasterKeyId: alias/aws/sns
  StackSetLogGroup:
    Type: AWS::Logs::LogGroup
    DeletionPolicy: Delete
    UpdateReplacePolicy: Delete
    Properties:
      LogGroupName: /aws/cloudformation/stacksets
      RetentionInDays: 30
      KmsKeyId: !GetAtt LogsKMSKey.Arn
  LambdaLogGroup:
    Type: AWS::Logs::LogGroup
    DeletionPolicy: Delete
    UpdateReplacePolicy: Delete
    Properties:
      LogGroupName: /aws/lambda/StackSetMonitor
      RetentionInDays: 30
      KmsKeyId: !GetAtt LogsKMSKey.Arn
  StackSetMonitoringDashboard:
    Type: AWS::CloudWatch::Dashboard
    Properties:
      DashboardName: StackSetMonitoring
      DashboardBody: !Sub |
        {
          "widgets": [
            {
              "type": "metric",
              "width": 24,
              "height": 8,
              "properties": {
                "metrics": [
                  [ "StackSetMonitoring", "SuccessRate", "StackSetName", "${StackSetName}" ]
                ],
                "region": "${AWS::Region}",
                "title": "StackSet Operations",
                "period": 300,
                "stat": "Average"
              }
            },
            {
              "type": "log",
              "width": 24,
              "height": 6,
              "properties": {
                "query": "SOURCE '/aws/lambda/StackSetMonitor' | fields @timestamp, @message\n| sort @timestamp desc\n| limit 20",
                "region": "${AWS::Region}",
                "title": "Latest StackSet Monitor Logs",
                "view": "table"
              }
            }
          ]
        }
  # Consolidated rule to catch ALL StackSet events for comprehensive monitoring
  AllStackSetOperationsRule:
    Type: AWS::Events::Rule
    Properties:
      Name: AllStackSetOperationsRule
      Description: "Rule for monitoring all CloudFormation StackSet operations with failure notifications"
      EventPattern: {source: ["aws.cloudformation"], detail-type: ["CloudFormation StackSet Operation Status Change"]}
      State: ENABLED
      Targets:
        - Id: ProcessAllEvents
          Arn: !GetAtt StackSetMonitorLambda.Arn
        - Id: NotifyFailure
          Arn: !Ref StackSetAlertsTopic
          InputTransformer:
            InputPathsMap:
              "stackSetId": "$.detail.stack-set-id"
              "operationId": "$.detail.operation-id"
              "status": "$.detail.status"
              "time": "$.time"
            InputTemplate: '"StackSet Event: ID: <stackSetId>, Op: <operationId>, Status: <status>, Time: <time>"'
  StackSetMonitorLambda:
    Type: AWS::Lambda::Function
    DependsOn: LambdaLogGroup
    Properties:
      FunctionName: StackSetMonitor
      Handler: index.lambda_handler
      Role: !GetAtt StackSetMonitorRole.Arn
      Runtime: python3.12
      Timeout: 300
      MemorySize: 512
      ReservedConcurrentExecutions: 1
      DeadLetterConfig:
        TargetArn: !GetAtt StackSetMonitorDLQ.Arn
      VpcConfig:
        SecurityGroupIds: !If
          - HasCustomSecurityGroups
          - !Ref SecurityGroupIds
          - - !Ref LambdaSecurityGroup
        SubnetIds: !If
          - CreateVPCAndSubnets
          - - !Ref PrivateSubnet1
            - !Ref PrivateSubnet2
          - !Ref SubnetIds
      KmsKeyArn: !GetAtt LogsKMSKey.Arn
      Code:
        ZipFile: |
          import boto3
          import json
          import os
          import logging
          import time
          import datetime
          from typing import Dict, Any, Optional
          # Custom JSON encoder to handle datetime objects
          class DateTimeEncoder(json.JSONEncoder):
              def default(self, obj):
                  if isinstance(obj, datetime.datetime):
                      return obj.isoformat()
                  return super().default(obj)
          # Set up logging with more details
          logger = logging.getLogger()
          logger.setLevel(logging.INFO)
          # Log initialization to verify Lambda is loading correctly
          print("StackSetMonitor Lambda initializing...")
          def validate_event(event: Dict[str, Any]) -> bool:
              """Validate the incoming event structure"""
              if not isinstance(event, dict):
                  logger.error("Event must be a dictionary")
                  return False
              # If it's an EventBridge event, validate required fields
              if 'detail' in event:
                  detail = event.get('detail', {})
                  if not isinstance(detail, dict):
                      logger.error("Event detail must be a dictionary")
                      return False
                  # Validate StackSet event structure
                  if 'stack-set-id' in detail:
                      stack_set_id = detail.get('stack-set-id')
                      if not isinstance(stack_set_id, str) or not stack_set_id.strip():
                          logger.error("stack-set-id must be a non-empty string")
                          return False
                      # Validate operation-id if present
                      operation_id = detail.get('operation-id')
                      if operation_id is not None and not isinstance(operation_id, str):
                          logger.error("operation-id must be a string if provided")
                          return False
                      # Validate status if present
                      status = detail.get('status')
                      if status is not None and not isinstance(status, str):
                          logger.error("status must be a string if provided")
                          return False
              return True
          def validate_context(context: Any) -> bool:
              """Validate the Lambda context object"""
              if context is None:
                  logger.error("Context cannot be None")
                  return False
              # Check for required context attributes
              required_attrs = ['function_name', 'function_version', 'invoked_function_arn', 'memory_limit_in_mb']
              for attr in required_attrs:
                  if not hasattr(context, attr):
                      logger.error(f"Context missing required attribute: {attr}")
                      return False
              return True
          def sanitize_string(value: str, max_length: int = 255) -> str:
              """Sanitize and truncate string inputs"""
              if not isinstance(value, str):
                  return str(value)[:max_length]
              return value.strip()[:max_length]
          def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
              """Main Lambda handler function for StackSet monitoring with input validation"""
              # Input validation
              if not validate_event(event):
                  return {
                      "statusCode": 400,
                      "body": json.dumps({
                          "status": "error",
                          "message": "Invalid event structure"
                      }, cls=DateTimeEncoder)
                  }
              if not validate_context(context):
                  return {
                      "statusCode": 400,
                      "body": json.dumps({
                          "status": "error",
                          "message": "Invalid context object"
                      }, cls=DateTimeEncoder)
                  }
              # Log the validated event for debugging
              logger.info(f"Event received: {json.dumps(event, cls=DateTimeEncoder)}")
              logger.info(f"Function: {context.function_name}, Version: {context.function_version}")
              try:
                  cf = boto3.client('cloudformation')
                  cw = boto3.client('cloudwatch')
                  # Log that we're starting processing
                  logger.info(f"Starting StackSet monitoring at {time.time()}")
                  # Check if this is an event from EventBridge
                  if 'detail' in event and 'stack-set-id' in event.get('detail', {}):
                      detail = event['detail']
                      stack_set_id = sanitize_string(detail['stack-set-id'])
                      operation_id = sanitize_string(detail.get('operation-id', 'N/A'))
                      status = sanitize_string(detail.get('status', 'N/A'))
                      # Validate stack_set_id format
                      if not stack_set_id or len(stack_set_id) > 128:
                          logger.error(f"Invalid stack_set_id: {stack_set_id}")
                          return {
                              "statusCode": 400,
                              "body": json.dumps({
                                  "status": "error",
                                  "message": "Invalid stack_set_id format"
                              }, cls=DateTimeEncoder)
                          }
                      # Log the StackSet operation with additional context
                      logger.info(f"Processing StackSet event - ID: {stack_set_id}, Op: {operation_id}, Status: {status}")
                      # Extract stack set name from the ID
                      stack_set_name = stack_set_id.split('/')[-1] if '/' in stack_set_id else stack_set_id
                      stack_set_name = sanitize_string(stack_set_name, 128)
                      logger.info(f"Extracted StackSet name: {stack_set_name}")
                  # Always gather metrics regardless of event type
                  # Get all active StackSets
                  stack_sets_response = cf.list_stack_sets(Status='ACTIVE')
                  stack_sets = stack_sets_response.get('Summaries', [])
                  if not isinstance(stack_sets, list):
                      logger.error("Invalid response from list_stack_sets")
                      return {
                          "statusCode": 500,
                          "body": json.dumps({
                              "status": "error",
                              "message": "Invalid CloudFormation API response"
                          }, cls=DateTimeEncoder)
                      }
                  logger.info(f"Found {len(stack_sets)} active StackSets")
                  for stack_set in stack_sets:
                      if not isinstance(stack_set, dict) or 'StackSetName' not in stack_set:
                          logger.warning(f"Skipping invalid stack_set entry: {stack_set}")
                          continue
                      stack_set_name = sanitize_string(stack_set['StackSetName'], 128)
                      logger.info(f"Processing StackSet: {stack_set_name}")
                      try:
                          operations = cf.list_stack_set_operations(StackSetName=stack_set_name, MaxResults=5)
                          # Validate operations response
                          if not isinstance(operations, dict):
                              logger.error(f"Invalid operations response for {stack_set_name}")
                              continue
                          # Calculate success rate
                          successes = 0
                          operations_list = operations.get('Summaries', [])
                          if not isinstance(operations_list, list):
                              logger.error(f"Invalid operations list for {stack_set_name}")
                              continue
                          total_ops = len(operations_list)
                          logger.info(f"Found {total_ops} recent operations for {stack_set_name}")
                          for op in operations_list:
                              if isinstance(op, dict) and op.get('Status') == 'SUCCEEDED':
                                  successes += 1
                          success_rate = (successes / total_ops * 100) if total_ops > 0 else 100
                          # Validate success_rate is within expected bounds
                          if not (0 <= success_rate <= 100):
                              logger.error(f"Invalid success_rate calculated: {success_rate}")
                              continue
                          # Publish metrics to CloudWatch
                          cw.put_metric_data(
                              Namespace='StackSetMonitoring',
                              MetricData=[
                                  {'MetricName': 'SuccessRate', 'Value': success_rate,
                                   'Dimensions': [{'Name': 'StackSetName', 'Value': stack_set_name}]}
                              ]
                          )
                          logger.info(f"Published metrics for {stack_set_name}: Success Rate = {success_rate}%")
                      except Exception as e:
                          logger.error(f"Error processing StackSet {stack_set_name}: {str(e)}")
                  return {
                      "statusCode": 200,
                      "body": json.dumps({
                          "status": "completed",
                          "message": f"Processed {len(stack_sets)} StackSets"
                      }, cls=DateTimeEncoder)
                  }
              except Exception as e:
                  logger.error(f"Error in Lambda function: {str(e)}")
                  # Return a proper response even on error
                  return {
                      "statusCode": 500,
                      "body": json.dumps({
                          "status": "error",
                          "message": str(e)
                      }, cls=DateTimeEncoder)
                  }
  # Managed IAM Policies
  CloudFormationAccessPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: 'Policy for CloudFormation and CloudWatch access for StackSet Monitor'
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - cloudformation:ListStackSets
              - cloudformation:ListStackSetOperations
              - cloudformation:ListStackInstances
              - cloudformation:DescribeStackInstance
            Resource:
              - !Sub "arn:${AWS::Partition}:cloudformation:${AWS::Region}:${AWS::AccountId}:stackset/*"
              - !Sub "arn:${AWS::Partition}:cloudformation:${AWS::Region}:${AWS::AccountId}:stackset-target/*"
          - Effect: Allow
            Action:
              - cloudwatch:PutMetricData
            Resource: "*"
            Condition:
              StringEquals:
                "cloudwatch:namespace": "StackSetMonitoring"
          - Effect: Allow
            Action:
              - sns:Publish
            Resource: !Ref StackSetAlertsTopic
  EventsAccessPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: 'Policy for EventBridge access for StackSet Monitor'
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - events:PutEvents
            Resource: !Sub "arn:${AWS::Partition}:events:${AWS::Region}:${AWS::AccountId}:event-bus/default"
  LogsAccessPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: 'Policy for CloudWatch Logs access for StackSet Monitor'
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - logs:CreateLogGroup
              - logs:CreateLogStream
              - logs:PutLogEvents
            Resource:
              - !Sub "arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/StackSetMonitor"
              - !Sub "arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/StackSetMonitor:*"
              - !Sub "arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/cloudformation/stacksets"
              - !Sub "arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/cloudformation/stacksets:*"
  DLQAccessPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: 'Policy for Dead Letter Queue access for StackSet Monitor'
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - sqs:SendMessage
            Resource: !GetAtt StackSetMonitorDLQ.Arn
  StackSetMonitorRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole
        - !Ref CloudFormationAccessPolicy
        - !Ref EventsAccessPolicy
        - !Ref LogsAccessPolicy
        - !Ref DLQAccessPolicy
  # Permissions for event rules to invoke Lambda
  AllOperationsRuleLambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref StackSetMonitorLambda
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt AllStackSetOperationsRule.Arn
  # Using a one minute schedule for testing, but you can change this value
  StackSetMonitorSchedule:
    Type: AWS::Events::Rule
    Properties:
      Name: RegularStackSetMonitoring
      Description: "Triggers Lambda function every 1 minute to check StackSet operations"
      ScheduleExpression: "rate(1 minute)"
      State: ENABLED
      Targets:
        - Id: RunMonitor
          Arn: !GetAtt StackSetMonitorLambda.Arn
  ScheduleLambdaInvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref StackSetMonitorLambda
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt StackSetMonitorSchedule.Arn
  StackSetSuccessRateAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: "Alarm when StackSet operation success rate is low"
      MetricName: SuccessRate
      Namespace: "StackSetMonitoring"
      Statistic: Average
      Period: 300
      EvaluationPeriods: 3
      DatapointsToAlarm: 2
      Threshold: 80
      ComparisonOperator: LessThanThreshold
      AlarmActions: [!Ref StackSetAlertsTopic]
      Dimensions: [{Name: StackSetName, Value: !Ref StackSetName}]
Outputs:
  SNSTopicArn:
    Description: The ARN of the SNS topic for alerts
    Value: !Ref StackSetAlertsTopic
  DashboardURL:
    Description: URL to the CloudWatch Dashboard
    Value: !Sub https://console.aws.amazon.com/cloudwatch/home?region=${AWS::Region}#dashboards:name=StackSetMonitoring
  LambdaLogGroupName:
    Description: Name of the CloudWatch Log Group for Lambda logs
    Value: !Ref LambdaLogGroup
  DeadLetterQueueArn:
    Description: ARN of the Dead Letter Queue for Lambda function failures
    Value: !GetAtt StackSetMonitorDLQ.Arn
  DeadLetterQueueURL:
    Description: URL of the Dead Letter Queue for monitoring failed Lambda executions
    Value: !Ref StackSetMonitorDLQ
  TestLambdaCommand:
    Description: Command to manually test the Lambda function
    Value: !Sub "aws lambda invoke --function-name ${StackSetMonitorLambda} --payload '{}' response.json && cat response.json"
  LambdaFunctionArn:
    Description: ARN of the Lambda function configured with VPC
    Value: !GetAtt StackSetMonitorLambda.Arn
  LambdaSecurityGroupId:
    Description: Security Group ID created for the Lambda function
    Value: !Ref LambdaSecurityGroup
  VpcConfiguration:
    Description: VPC configuration summary for the Lambda function
    Value: !Sub
      - "VPC: ${VpcId}, Subnets: ${SubnetList}, Security Groups: ${LambdaSecurityGroup}"
      - SubnetList: !Join [',', !Ref SubnetIds]
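The success-rate metric that the Lambda function publishes, and that the alarm compares against its threshold of 80, can be checked locally without any AWS access. The following is a minimal sketch that mirrors the calculation in the handler; the function names `success_rate` and `breaches_threshold` are illustrative and do not appear in the template.

```python
from typing import Iterable


def success_rate(statuses: Iterable[str]) -> float:
    """Percentage of operations with status SUCCEEDED.

    Mirrors the handler: a StackSet with no recent operations counts as 100.
    """
    statuses = list(statuses)
    if not statuses:
        return 100.0
    succeeded = sum(1 for status in statuses if status == "SUCCEEDED")
    return succeeded / len(statuses) * 100


def breaches_threshold(rate: float, threshold: float = 80.0) -> bool:
    """True when a datapoint is below the alarm threshold (LessThanThreshold)."""
    return rate < threshold


if __name__ == "__main__":
    # Hypothetical statuses for the five most recent operations of one StackSet.
    recent = ["SUCCEEDED", "FAILED", "SUCCEEDED", "SUCCEEDED", "STOPPED"]
    rate = success_rate(recent)
    print(f"success rate: {rate:.1f}%, breaching: {breaches_threshold(rate)}")
```

Because the handler only looks at the five most recent operations per StackSet, a single failure moves the metric by 20 percentage points.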
Run the following CLI command to deploy the CloudFormation stack. You can replace the ParameterValue of StackSetName (“your-stackset-name”) with the name of the StackSet you want to monitor; the default value is “security-baseline”. Your CLI profile should use region=“us-east-1”.
AWS CLI to deploy the StackSetMonitor.yml CloudFormation template
The CLI output should look like the following:
{"StackId": "arn:aws:cloudformation:...."}
Here’s the expected output for the CloudFormation template:
StackSetMonitor Console output
And an example of Amazon CloudWatch Dashboard and Alarm screen:
Amazon CloudWatch Dashboard screenshot for StackSetMonitor stack to track StackSet operations success rate
Amazon CloudWatch Alarm screenshot for StackSetMonitor stack to track StackSet operations success rate
To set up an SNS subscription, retrieve the topic ARN from the stack outputs and subscribe an email or SMS endpoint to it (the example CLI below creates an email subscription):
AWS CLI to subscribe to the topic providing the user email
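The same subscription can be scripted with boto3. The sketch below is one possible approach, not the command used in this post: it reads the `SNSTopicArn` output defined in the template and calls `subscribe` with the `email` protocol. The stack name `StackSetMonitor` and the email address in the usage comment are assumptions.

```python
from typing import Any, Dict, List


def find_output(outputs: List[Dict[str, str]], key: str) -> str:
    """Return the value of a CloudFormation stack output by its OutputKey."""
    for output in outputs:
        if output.get("OutputKey") == key:
            return output["OutputValue"]
    raise KeyError(f"Stack output not found: {key}")


def subscribe_email(cf_client: Any, sns_client: Any, stack_name: str, email: str) -> str:
    """Subscribe an email address to the alerts topic exposed by the stack."""
    stacks = cf_client.describe_stacks(StackName=stack_name)["Stacks"]
    topic_arn = find_output(stacks[0].get("Outputs", []), "SNSTopicArn")
    response = sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    # For email endpoints this is "pending confirmation" until the recipient confirms.
    return response["SubscriptionArn"]


# Usage (requires boto3 and AWS credentials; stack name and address are assumed):
#   import boto3
#   subscribe_email(
#       boto3.client("cloudformation"),
#       boto3.client("sns"),
#       "StackSetMonitor",
#       "ops@example.com",
#   )
```

Passing the clients in as arguments keeps the function easy to test with stubs and leaves region and profile selection to the caller.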
Cost:
The estimated monthly expense ranges between 5 and 15 USD depending on StackSet activity levels, with approximately 1,440 scheduled Lambda executions per day (one per minute) under the default monitoring schedule, plus one execution for each StackSet operation event.
The solution supports customization of monitoring frequency by modifying the ScheduleExpression from the default one-minute interval. The cost will decrease if the monitoring is less frequent.
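As a rough check on how the schedule drives invocation volume, the following sketch counts scheduled invocations for a given `rate(N minutes)` expression. It covers only the scheduled rule; invocations triggered by StackSet operation events come on top, and no pricing is assumed here. The helper name `scheduled_invocations` is illustrative.

```python
def scheduled_invocations(rate_minutes: int, days: int = 1) -> int:
    """Number of scheduled invocations for a rate(N minutes) rule over a period."""
    if rate_minutes <= 0:
        raise ValueError("rate_minutes must be a positive integer")
    minutes_in_period = 24 * 60 * days
    return minutes_in_period // rate_minutes


if __name__ == "__main__":
    for minutes in (1, 5, 15):
        per_day = scheduled_invocations(minutes)
        per_month = scheduled_invocations(minutes, days=30)
        print(f"rate({minutes} min): {per_day} per day, {per_month} per 30 days")
```

Moving from a one-minute to a five-minute schedule cuts scheduled invocations by a factor of five, at the price of metrics that can lag a failed operation by up to five minutes.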
Cleanup:
For cleanup, run the following commands:
To clean up the stack instances and StackSets created in the Core Deployment Strategies section:
Replace the OrganizationalUnitIds value with the ID of your OU, the regions parameter with the list of Regions from which you want to delete your stack instances, and the stack-set-name value with the name of each StackSet (security-baseline, monitoring-baseline, balanced-deployment…).
You can also remove any IAM roles or policies that you created specifically for this post and no longer need.
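The cleanup order matters: a StackSet can only be deleted once its stack instances are gone. The sketch below shows that sequence with boto3 calls (`delete_stack_instances`, `describe_stack_set_operation`, `delete_stack_set`). It is an illustration rather than the commands used in this post; the OU ID and Regions in the usage comment are placeholders, and a delegated administrator account may also need `CallAs="DELEGATED_ADMIN"` on each call.

```python
import time
from typing import Any, Callable, List

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED"}


def delete_instances_then_stack_set(
    cf_client: Any,
    stack_set_name: str,
    ou_ids: List[str],
    regions: List[str],
    sleep: Callable[[float], None] = time.sleep,
    poll_seconds: float = 15,
) -> str:
    """Delete all stack instances in the given OUs and Regions, then the StackSet.

    Returns the final status of the instance-deletion operation. The StackSet
    itself is deleted only if that operation succeeded.
    """
    operation_id = cf_client.delete_stack_instances(
        StackSetName=stack_set_name,
        DeploymentTargets={"OrganizationalUnitIds": ou_ids},
        Regions=regions,
        RetainStacks=False,  # also delete the underlying stacks
    )["OperationId"]

    while True:
        status = cf_client.describe_stack_set_operation(
            StackSetName=stack_set_name, OperationId=operation_id
        )["StackSetOperation"]["Status"]
        if status in TERMINAL_STATUSES:
            break
        sleep(poll_seconds)

    if status == "SUCCEEDED":
        cf_client.delete_stack_set(StackSetName=stack_set_name)
    return status


# Usage (requires boto3 and AWS credentials; the OU ID and Regions are placeholders):
#   import boto3
#   delete_instances_then_stack_set(
#       boto3.client("cloudformation"),
#       "security-baseline",
#       ["ou-xxxx-xxxxxxxx"],
#       ["us-east-1"],
#   )
```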
Conclusion
Throughout this guide, we’ve explored the nuanced approaches to AWS CloudFormation StackSets deployments across large-scale environments. The key takeaways include:
Balance is Critical: Every deployment strategy requires careful consideration of the trade-offs between speed, safety, and scale based on your organizational needs.
Progressive Adoption Works: For most organizations, a progressive deployment approach with validation gates provides the optimal balance of safety and efficiency.
Organizational Context Matters: Enterprise, startup, and regulated industry patterns demonstrate that deployment strategies should be tailored to your specific business requirements and risk tolerance.
Monitoring is Essential: As organizations scale to hundreds of accounts, comprehensive monitoring becomes critical for maintaining visibility and ensuring compliance.
These approaches will help you choose the right strategy for your AWS CloudFormation StackSets deployments in your AWS Organization.
You can now test them in your sandbox environment before adapting them to your specific needs, balancing speed, safety, and scale to optimize your deployments.
This is a guest post written by Ramanathan Nachiappan from GoDaddy.
In the world of infrastructure as code, the AWS Cloud Development Kit (AWS CDK) has revolutionized how teams define and provision cloud resources. Central to its operation is the bootstrapping process, which ensures all required resources and permissions are in place to enable secure and scalable deployments.
At GoDaddy, our cloud journey has always prioritized governance, compliance, and a great developer experience. As our AWS footprint expanded across hundreds of teams and thousands of deployments, we faced a classic engineering dilemma: how do we uphold rigorous governance standards without compromising developer velocity?
AWS CDK’s default bootstrapping process—while essential—often clashed with our governance model, creating friction, workarounds, and wasted cycles. This post details how we evolved beyond that friction, eliminating the explicit bootstrap step entirely and replacing it with a seamless, zero-touch experience. The result: a “bootstrapless” CDK deployment flow that enforces governance invisibly and empowers developers to deploy with a single command.
The Governance Imperative: Security by Design
GoDaddy’s governance model isn’t just a checkbox for compliance; it’s the foundation of our cloud security posture. Our approach requires all AWS resource modifications to flow through AWS CloudFormation, with each deployment evaluated against our rule sets covering:
Our CloudFormation hooks evaluate every resource against these rules pre-deployment, helping to reduce the likelihood of non-compliant resources being created. This proactive approach is designed to support governance from day one, rather than retroactively detecting violations.
The CDK Bootstrap Challenge
AWS CDK v1 vs. AWS CDK v2:
AWS CDK v1: Used the active AWS CLI credentials for all deployments.
AWS CDK v2: Introduced a new bootstrap template with five new AWS Identity and Access Management (IAM) roles, designed primarily for CDK Pipelines. These roles must be assumed or passed by the AWS CLI. It’s worth noting that AWS CDK v2 still fully supports the legacy synthesizer, allowing users to maintain their existing v1-style workflows.
When AWS CDK v2 arrived, its bootstrap process introduced crucial changes designed to standardize authentication across multiple deployment tools and scenarios (CLI, cross-account deployments, pipelines, etc.). The standard cdk bootstrap command creates several essential components:
# Creates the default bootstrap stack with resources
cdk bootstrap
A collection of five IAM roles enabling AWS CDK’s deployment capabilities
Here’s where things got interesting for GoDaddy. While the default AWS CDK setup includes security measures (like encrypted Amazon S3 buckets), our enterprise governance requirements had additional specifications that created some difficulty with the default bootstrap resources:
Amazon S3 buckets needed additional encryption, logging, and compliance settings beyond the defaults
IAM roles required alignment with our specific permission boundaries and organizational policies
Amazon ECR repositories needed mandatory GoDaddy tags and access configurations
Additional compliance requirements around resource naming, backup policies, and monitoring
These GoDaddy-specific governance requirements meant the default bootstrap resources did not pass our validation checks, which slowed deployments for developers and increased support overhead for GoDaddy’s governance platform as teams worked around the governance failures.
Phase 1: Custom Bootstrap Templates
Our first step toward enhancing the developer experience was creating a customized bootstrap approach using two key components:
1. The GDStack and Conformers
We developed a specialized CDK construct called GDStack extending the native CDK Stack. This custom stack framework used CDK Aspects to automatically ensure governance compliance:
Automatic Resource Conformers: We built a system of “conformers” that apply company-wide governance standards to every resource automatically. For example, our S3Conformer ensures all buckets have required encryption, logging, and access settings.
CDK Aspects Under the Hood: These conformers use AWS CDK’s powerful Aspects system—a visitor pattern that traverses all constructs in a stack and applies transformations. This allowed us to inspect and modify any non-compliant resources during synthesis without requiring developers to learn complicated rules.
Seamless Governance: When developers added resources to a GDStack, these aspects would automatically transform the resources to align with our governance rules before deployment—all invisible to the developer.
This approach dramatically reduced turnaround time for developers, who previously had to manually correct violations in their application-specific CloudFormation stacks after failed deployments. Instead, the system fixed issues before they became deployment failures.
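The conformer idea can be illustrated with a small, dependency-free sketch of the visitor pattern. This is not GoDaddy's implementation and does not use the AWS CDK library: `Construct`, `S3Conformer`, `apply_aspects`, and the required settings are all hypothetical stand-ins that show how an aspect walks a construct tree and rewrites non-compliant resources during synthesis.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Construct:
    """Simplified stand-in for a node in a CDK construct tree."""
    id: str
    type: str = "Group"
    props: Dict[str, object] = field(default_factory=dict)
    children: List["Construct"] = field(default_factory=list)


class Aspect(Protocol):
    def visit(self, node: Construct) -> None: ...


class S3Conformer:
    """Forces hypothetical governance settings onto every bucket it visits."""
    REQUIRED = {"encryption": "aws:kms", "access_logging": True, "block_public_access": True}

    def visit(self, node: Construct) -> None:
        if node.type != "AWS::S3::Bucket":
            return
        for key, value in self.REQUIRED.items():
            node.props[key] = value  # overwrite any non-compliant value


def apply_aspects(root: Construct, aspects: List[Aspect]) -> None:
    """Depth-first traversal: every aspect visits every construct once."""
    for aspect in aspects:
        aspect.visit(root)
    for child in root.children:
        apply_aspects(child, aspects)


if __name__ == "__main__":
    stack = Construct("MyStack", children=[
        Construct("MyBucket", "AWS::S3::Bucket", {"encryption": "none"}),
        Construct("MyQueue", "AWS::SQS::Queue"),
    ])
    apply_aspects(stack, [S3Conformer()])
    for child in stack.children:
        print(child.id, child.props)
```

In the real CDK the traversal is driven by the framework once an aspect is registered on a scope, so developers never call a function like `apply_aspects` themselves.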
2. CliCredentialsStackSynthesizer
Instead of using AWS CDK’s default deployment roles, we used the CliCredentialsStackSynthesizer to:
Use the developer’s CLI credentials directly for deployments
Eliminate the need for complex cross-account role assumptions
This approach worked well, but still required teams to run a bootstrap step with precise GoDaddy-specific parameters. Although our platform documentation was extensive, some users still ran the native cdk bootstrap command instead of the custom command, likely out of the habit formed by the native AWS CDK workflow. As a result, our support teams still spent time troubleshooting bootstrap issues. We needed a more elegant solution!
Phase 2: The Revolutionary Bootstrapless Approach
As AWS CDK evolved, so did our thinking. The introduction of the AppStagingSynthesizer opened new possibilities, leading us to develop a completely bootstrapless solution.
The Factory Pattern Solution
We engineered an elegant chain of specialized components:
Bootstrapless CDK Factory Pattern Design
Each component plays a crucial role:
1. GDStack: The Developer Interface
This is the only component developers interact with directly:
// Developer simply extends GDStack instead of Stack
export class MyApplicationStack extends GDStack {
  constructor(scope: Construct, id: string, props: GDStackProps) {
    super(scope, id, props);
    // Normal CDK resource definitions
    new s3.Bucket(this, 'MyBucket', { ... });
  }
}
2. GDStackSynthesizerFactory: The Orchestrator
This factory connects our custom components with CDK’s synthesis system:
4. GDStagingStack: The Resource Producer for App-Level bootstrapping
This stack implements IStagingResources and creates rule-compliant assets on demand:
export class GDStagingStack extends cdk.Stack implements IStagingResources {
  constructor(scope: Construct, id: string, props: GDStagingStackProps) {
    super(scope, id, {
      ...props,
      // The magic ingredient - BootstraplessCliSynthesizer
      synthesizer: new BootstraplessCliSynthesizer(),
      description: `This stack includes resources needed to deploy the AWS CDK app ${props.appId} into this environment`,
    });
    // Apply governance conformers to everything
    this.applyGovernanceConformers();
    // Create compliant resources
    const bucket = new s3.Bucket(this, "CdkStagingBucket", {
      bucketName: `cdk-${this.appId}-staging-${this.account}-${this.region}`,
      // Conformers ensure encryption, logging, and other requirements
    });
    // Additional resource creation...
  }
}
The Secret Sauce: BootstraplessCliSynthesizer
The cornerstone of our solution is a custom synthesizer BootstraplessCliSynthesizer that combines the best aspects of AWS CDK’s built-in synthesizers BootstraplessSynthesizer and CliCredentialsStackSynthesizer.
It brings together key features from both AWS CDK synthesizers while adding our own innovations:
From CliCredentialsStackSynthesizer: Uses the CLI credentials directly for all operations
From BootstraplessSynthesizer: Eliminates the need for bootstrap resources
Our custom approach: Purpose-built for the GDStagingStack, it explicitly rejects assets, because GDStagingStack itself creates the required asset resources on demand for the CDK application.
This synthesizer:
Requires no bootstrapping in any region
Uses AWS CLI credentials directly for all operations
Maintains a minimal implementation focused solely on template generation
export class BootstraplessCliSynthesizer extends cdk.StackSynthesizer {
  constructor() {
    super();
  }
  // Prevent asset uploads to enforce governance compliance
  public addFileAsset(_asset: cdk.FileAssetSource): cdk.FileAssetLocation {
    throw new Error(
      "Cannot add assets to a Stack that uses the BootstraplessCliSynthesizer",
    );
  }
  public addDockerImageAsset(
    _asset: cdk.DockerImageAssetSource,
  ): cdk.DockerImageAssetLocation {
    throw new Error(
      "Cannot add assets to a Stack that uses the BootstraplessCliSynthesizer",
    );
  }
  // Minimal synthesis - just template generation and artifact emission
  public synthesize(session: cdk.ISynthesisSession): void {
    // Same as LegacySynthesizer
    this.synthesizeTemplate(session);
    this.emitArtifact(session);
  }
}
Our innovation was creating a synthesizer used only for the GDStagingStack that works in concert with our factory pattern. Rather than assuming pre-existing bootstrap resources, it enables the staging stack itself to create the required asset resources on demand, achieving enhanced bootstrapless deployments while maintaining governance compliance.
The Elegant Workflow: Dynamic Asset Management
Our solution transformed the developer experience through intelligent, on-demand resource provisioning:
# Deploy directly - compliant staging resources created automatically when needed
npx cdk deploy
The key advantage is our intelligent asset management:
On-Demand Resource Creation: Staging resources (Amazon S3 buckets, Amazon ECR repositories) are created automatically when needed, rather than requiring pre-provisioning
Governance Integration: All staging resources are created with full compliance built-in from the start
Simplified Credential Flow: Uses existing CLI credentials without complex role assumption chains
Multi-Account Scalability: Works seamlessly across any number of AWS accounts and regions
Behind the scenes, our architecture:
Creates governance-compliant staging resources dynamically as applications require them
Uses the developer’s existing CLI credentials for all operations
Applies security and compliance requirements transparently
Eliminates the need to manage bootstrap stacks across environments
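The on-demand behavior described above can be pictured as lazy, idempotent provisioning keyed by application, account, and Region. The sketch below is a simplified illustration in plain Python, not GoDaddy's implementation: `StagingResources` and the injected `create` callback are hypothetical, while the bucket naming scheme follows the `cdk-<appId>-staging-<account>-<region>` pattern shown in the staging stack, checked here against the 3 to 63 character limit for S3 bucket names.

```python
import re
from typing import Callable, Dict

# Simplified check: 3 to 63 characters, lowercase letters, digits, dots, hyphens.
BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def staging_bucket_name(app_id: str, account: str, region: str) -> str:
    """Build the staging bucket name and reject names S3 would not accept."""
    name = f"cdk-{app_id}-staging-{account}-{region}"
    if not BUCKET_NAME_RE.match(name):
        raise ValueError(f"Invalid S3 bucket name: {name}")
    return name


class StagingResources:
    """Creates staging resources the first time they are requested, then reuses them."""

    def __init__(self, app_id: str, account: str, region: str, create: Callable[[str], str]):
        self._app_id = app_id
        self._account = account
        self._region = region
        self._create = create  # hypothetical provisioning callback
        self._created: Dict[str, str] = {}

    def bucket(self) -> str:
        name = staging_bucket_name(self._app_id, self._account, self._region)
        if name not in self._created:
            self._created[name] = self._create(name)
        return self._created[name]


if __name__ == "__main__":
    staging = StagingResources("myapp", "111122223333", "us-east-1", create=lambda name: name)
    print(staging.bucket())
```

An application that never uses file assets never calls `bucket()`, so no staging bucket is created for it.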
Evolution of Approaches
| Approach | Bootstrap Required | Security Model | Asset Management | GoDaddy Governance | Developer Workflow |
| --- | --- | --- | --- | --- | --- |
| AWS CDK v2 Default | Yes (one-time) | 5 deployment roles | Pre-provisioned bootstrap stack | Failed validation checks | Standard setup + deploy |
| Custom Template + CliCredentialsStackSynthesizer | Yes (one-time) | CLI credentials | Compliant bootstrap stack via custom template | Passes all checks | Setup + deploy |
| GDStagingStack + BootstraplessCliSynthesizer | No | CLI credentials | Compliant staging resources created dynamically on-demand | Passes all checks | Deploy only |
Business Impact: GoDaddy’s Transformation
The business value of our bootstrapless approach has been significant for GoDaddy’s infrastructure teams:
Streamlined developer focus: Our teams now focus entirely on writing infrastructure implementation logic, with AWS CDK bootstrapping fully abstracted and automated. Developers no longer need to work with bootstrap configurations, even though it was a one-time setup per environment previously.
Automated compliance: Deployments automatically meet GoDaddy’s governance requirements without developer intervention, addressing the validation failures we experienced with default bootstrap resources.
Simplified support model: Our platform support team handles fewer bootstrap-related configuration requests, allowing them to focus on broader platform improvements.
Broader CDK adoption: The streamlined workflow has encouraged more teams at GoDaddy to move from native CloudFormation YAML to AWS CDK for their infrastructure management.
This bootstrapless approach has worked well for GoDaddy’s specific governance requirements and development workflow preferences, demonstrating one way to integrate enterprise compliance seamlessly into AWS CDK deployments.
Conclusion: The Invisible Framework
The evolution from bootstrap-dependent to bootstrapless CDK deployments represents more than a technical improvement: it demonstrates a pathway to eliminate friction while strengthening organization-specific governance. Our implementation at GoDaddy shows that enterprise compliance and developer productivity can be achieved simultaneously.
Organizations seeking to implement similar solutions should begin by evaluating the AppStagingSynthesizer capabilities within their current AWS CDK deployment patterns. This assessment will reveal opportunities to reduce bootstrap dependencies while maintaining security and compliance standards. For comparison, teams can also examine the BootstraplessSynthesizer to understand alternative approaches to eliminating traditional bootstrap resources.
The implementation approach we’ve outlined leverages established AWS CDK patterns, including the CliCredentialsStackSynthesizer for credential management and dynamic resource provisioning interfaces. These core AWS CDK interfaces — IStagingResourcesFactory and IStagingResources — form the foundation for creating governance-compliant, bootstrapless deployment workflows that scale across enterprise environments.
The future of infrastructure as code lies in systems that enforce governance invisibly while empowering developers to focus on business logic. As AWS CDK continues to evolve, the patterns we’ve demonstrated at GoDaddy provide a foundation for organizations to build their own invisible frameworks—where compliance becomes a catalyst for velocity rather than an obstacle to innovation.
The content and opinions in this blog are those of the third-party author and AWS is not responsible for the content or accuracy of this blog.
Today, organizations are heavily using Apache Spark for their big data processing needs. However, managing the entire development lifecycle of Spark applications, from local development to production deployment, can be complex and time-consuming. Managing the entire code base, including application code, infrastructure provisioning, and continuous integration and delivery (CI/CD) pipelines, is sometimes not fully automated and is a responsibility shared across multiple teams, which slows down release cycles. This undifferentiated heavy lifting diverts valuable resources away from core business objectives: deriving value from data.
In this post, we explore how to use Amazon EMR, the AWS Cloud Development Kit (AWS CDK), and the Data Solutions Framework (DSF) on AWS to streamline the development process, from setting up a local development environment to deploying serverless Spark infrastructure, and implementing a CI/CD pipeline for automated testing and deployment.
By adopting this approach, developers gain full control over their code and the infrastructure responsible for running it, alleviating the need for cross-team dependency. Developers can customize the infrastructure to meet specific business needs and optimize performance. Additionally, they can customize CI/CD stages to facilitate comprehensive testing, using the self-mutation capability of AWS CDK Pipelines to automatically update and refine the deployment process. This level of control not only accelerates development cycles but also enhances the reliability and efficiency of the entire application lifecycle, so developers can focus more on innovation and less on manual infrastructure management.
Solution overview
The solution consists of the following key components:
The local development environment to develop and test your Spark code locally
The infrastructure as code (IaC) that will run your Spark application in AWS environments
The CI/CD pipeline running end-to-end tests and deploying into the different AWS environments
In the following sections, we discuss how to set up these components.
Prerequisites
To set up this solution, you must have an AWS account with appropriate permissions, Docker, and the AWS CDK CLI.
Set up the local development environment
Developing Spark applications locally can be a challenging task due to the need for a consistent and efficient environment that mirrors your production setup. With Amazon EMR, Docker, and the Amazon EMR toolkit extension for Visual Studio Code, you can quickly set up a local development environment for Spark applications, developing and testing Spark code locally, and seamlessly port it to the cloud.
The Amazon EMR toolkit for VS Code includes an “EMR: Create Local Spark Environment” command that generates a development container. This container is based on an Amazon EMR on Amazon EKS image corresponding to the Amazon EMR version you select. You can develop Spark and PySpark code locally, with full compatibility with your remote Amazon EMR environment. Additionally, the toolkit provides helpers to make it straightforward to connect to the AWS Cloud, including an Amazon EMR explorer, an AWS Glue Data Catalog explorer, and commands to run Amazon EMR Serverless jobs from VS Code.
To set up your local environment, complete the following steps:
Now you can launch your dev container using the VS Code command Dev Containers: Rebuild and Reopen in container.
The container will install the latest operating system packages and run a local Spark history server on port 18080.
The container provides spark-shell, spark-sql, and pyspark from the terminal and a Jupyter Python kernel for connecting a Jupyter notebook to execute interactive Spark code.
Using the Amazon EMR Toolkit, you can develop your Spark application and test it locally using Pytest—for example, to validate the business logic. You can also connect to other AWS accounts where you have your development environment.
Build the AWS CDK application with DSF on AWS
After you validate the business logic in your local Spark application, you can implement the infrastructure responsible for running your application. DSF provides AWS CDK L3 Constructs that simplify the creation of Spark-based data pipelines on EMR Serverless or Amazon EMR on EKS.
DSF provides the capability to package your local PySpark application, including the Python dependencies, into artifacts that can be consumed by EMR Serverless jobs. The PySparkApplicationPackage is a construct that uses a Dockerfile to package the dependencies into a Python virtual environment archive and then uploads the archive and the PySpark entrypoint file into a secured Amazon Simple Storage Service (Amazon S3) bucket. The following diagram illustrates this architecture.
See the following example code:
spark_app = dsf.processing.PySparkApplicationPackage(
self,
"SparkApp",
entrypoint_path="./../spark/src/agg_trip_distance.py",
application_name="TaxiAggregation",
# Path of the Dockerfile used to package the dependencies as a Python venv
dependencies_folder='./../spark',
# Path of the venv archive in the docker image
venv_archive_path="/venv-package/pyspark-env.tar.gz",
removal_policy=RemovalPolicy.DESTROY)
You just need to provide the paths for the following:
The PySpark entrypoint. This is the main Python script of your Spark application.
The Dockerfile containing the logic for packaging a virtual environment into an archive.
The path of the resulting archive in the container file system.
DSF provides helpers to connect the application package to the EMR Serverless job. The PySparkApplicationPackage construct exposes properties that can be used directly as SparkEmrServerlessJob construct parameters. This construct simplifies the configuration of a batch job using an AWS Step Functions state machine. The following diagram illustrates this architecture.
The following code is an example of an EMR Serverless job:
spark_job = dsf.processing.SparkEmrServerlessJob(
self,
"SparkProcessingJob",
dsf.processing.SparkEmrServerlessJobProps(
name=f"taxi-agg-job-{Names.unique_resource_name(self)}",
# ID of the previously created EMR Serverless runtime
application_id=spark_runtime.application.attr_application_id,
# The IAM role used by the EMR Job with permissions required by the application
execution_role=processing_exec_role,
spark_submit_entry_point=spark_app.entrypoint_uri,
# Add the Spark parameters from the PySpark package to configure the dependencies (using venv)
spark_submit_parameters=spark_app.spark_venv_conf + spark_params,
removal_policy=RemovalPolicy.DESTROY,
schedule=schedule))
Note the two parameters of SparkEmrServerlessJob that are provided by PySparkApplicationPackage:
entrypoint_uri, which is the S3 URI of the entrypoint file
spark_venv_conf, which contains the Spark submit parameters for using the Python virtual environment
DSF also provides a SparkEmrServerlessRuntime to simplify the creation of the EMR Serverless application responsible for running the job.
Deploy the Spark application using CI/CD
The final step is to implement a CI/CD pipeline that tests your Spark code and promotes it through the dev, test, and stage environments and then to production. DSF provides an L3 Construct that simplifies the creation of the CI/CD pipeline for your Spark applications. DSF’s implementation of the Spark CI/CD pipeline construct uses the AWS CDK built-in pipeline functionality. A key capability of an AWS CDK pipeline is self-mutation: it can update itself whenever you change its definition, avoiding the traditional chicken-and-egg problem of pipeline updates and helping developers fully control their CI/CD pipeline.
When the pipeline runs, it follows a carefully orchestrated sequence. First, it retrieves your code from your repository and synthesizes it into AWS CloudFormation templates. Before doing anything else, it examines these templates to see if you’ve made any changes to the pipeline’s own structure. If the pipeline detects that its definition has changed, it will pause its normal operation and update itself first. After the pipeline has updated itself, it will continue with its regular stages, such as deploying your application.
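The sequence described above can be summarized in a few lines of Python. This is a simplified model of the control flow only, not the CDK Pipelines implementation: run_pipeline and the dictionary shape with a "stages" key are assumptions made for illustration.

```python
def run_pipeline(deployed: dict, synthesized: dict) -> list:
    """Return the ordered actions of one pipeline run.

    `deployed` is the pipeline definition currently running;
    `synthesized` is the definition produced from the latest commit.
    """
    actions = ["source", "synth"]
    if synthesized != deployed:
        # The pipeline's own structure changed: update it before anything else.
        actions.append("self-update")
        deployed = synthesized
    # Continue with the regular stages of the (possibly updated) definition.
    actions += [f"deploy:{stage}" for stage in deployed["stages"]]
    return actions


unchanged = run_pipeline({"stages": ["staging"]}, {"stages": ["staging"]})
changed = run_pipeline({"stages": ["staging"]}, {"stages": ["staging", "prod"]})
```

In the second call, the newly added prod stage is deployed in the same run because the self-update happens before the regular stages.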
DSF provides an opinionated implementation of CDK Pipelines for Spark applications, where the PySpark code is automatically unit tested using Pytest and where the configuration is simplified. You only need to configure four components:
The CI/CD stages (testing, staging, production, and so on). This includes the AWS account ID and Region where these environments reside.
The AWS CDK stack that is deployed in each environment.
(Optional) The integration test script that you want to run against the deployed stack.
The SparkEmrCICDPipeline AWS CDK construct.
The following diagram illustrates how everything works together.
Let’s dive into each of these components.
Define cross-account deployment and CI/CD stages
With the SparkEmrCICDPipeline construct, you can deploy your Spark application stack across different AWS accounts. For example, you can have a separate account for your CI/CD processes and different accounts for your staging and production environments.
To set this up, first bootstrap the various AWS accounts (staging, production, and so on):
This step sets up the necessary resources in the environment accounts and creates a trust relationship between those accounts and the CI/CD account where the pipeline will run.
Next, choose between two options to define the environments (both options require the relevant configuration in the cdk.context.json file).
The first option is to use pre-defined environments, which are defined as follows:
Now that the environments have been bootstrapped and configured, let’s look at the actual stack that contains the resources that will be deployed in the various environments. Two classes must be implemented:
A class that extends the stack – This is where the resources that are going to be deployed in each of the environments are defined. This can be a normal AWS CDK stack, but it can be deployed in another AWS account depending on the environment configuration defined in the previous section.
A class that extends ApplicationStackFactory – This is DSF specific, and makes it possible to configure and then return the stack that is created.
ApplicationStackFactory supports customization of the stack before returning the initialized object to be deployed by the CI/CD pipeline. You can customize your stack behavior by passing the current stage to your stack. For example, you can skip scheduling the Spark application in the integration tests stage because the integration tests trigger it manually as part of the CI/CD pipeline. For the production stage, the scheduling facilitates automatic execution of the Spark application.
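The stage-dependent customization described above can be sketched without any AWS dependency. The names below (Stage, AppStackConfig, SparkAppStackFactory) are hypothetical stand-ins, not the DSF classes; the sketch only shows the decision a factory might make when it receives the current stage.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Hypothetical stand-in for the stage passed to the factory."""
    INTEGRATION = "integration"
    PROD = "prod"


@dataclass
class AppStackConfig:
    """Hypothetical stand-in for the stack the factory returns."""
    stage: Stage
    schedule: Optional[str]  # None means the job is not scheduled


class SparkAppStackFactory:
    def __init__(self, prod_schedule: str):
        self.prod_schedule = prod_schedule

    def create_stack(self, stage: Stage) -> AppStackConfig:
        # Integration tests trigger the job themselves, so only
        # production gets an automatic schedule in this sketch.
        schedule = self.prod_schedule if stage is Stage.PROD else None
        return AppStackConfig(stage=stage, schedule=schedule)


factory = SparkAppStackFactory(prod_schedule="rate(1 day)")
integ = factory.create_stack(Stage.INTEGRATION)
prod = factory.create_stack(Stage.PROD)
```

Passing the stage into the factory keeps a single stack definition while letting each environment differ only where it has to.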
Write the integration test script
The integration test script is a bash script that is triggered after the main application stack has been deployed. Inputs to the bash script can come from the AWS CloudFormation outputs of the main application stack. These outputs are mapped into environment variables that the bash script can access directly.
In the Spark CI/CD example, the application stack uses the SparkEmrServerlessJob CDK construct. This construct uses a Step Functions state machine to manage the execution and monitoring of the Spark job. The following is an example integration test bash script that we use to test that the deployed stack can run the associated Spark job successfully:
#!/bin/bash
# Start the state machine and capture the execution ARN
EXECUTION_ARN=$(aws stepfunctions start-execution --state-machine-arn "$STEP_FUNCTION_ARN" | jq -r '.executionArn')
# Poll every 10 seconds until the execution reaches a terminal state
while true
do
    STATUS=$(aws stepfunctions describe-execution --execution-arn "$EXECUTION_ARN" | jq -r '.status')
    if [ "$STATUS" = "SUCCEEDED" ]; then
        exit 0
    elif [ "$STATUS" = "FAILED" ] || [ "$STATUS" = "TIMED_OUT" ] || [ "$STATUS" = "ABORTED" ]; then
        exit 1
    else
        sleep 10
        continue
    fi
done
The integration test scripts are executed within an AWS CodeBuild project. As part of the IntegrationTestStack, we’ve included a custom resource that periodically checks the status of the integration test script as it runs. Failure of the CodeBuild execution causes the parent pipeline (residing in the pipeline account) to fail. This helps teams only promote changes that pass all the required testing.
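The polling loop in the script above runs until the execution reaches a terminal state. If you prefer an explicit upper bound, the same logic can be sketched in Python as follows. This is an illustrative variant, not part of the DSF example: wait_for_execution and its parameters are hypothetical, and describe stands in for whatever call returns the current execution status (for example, a wrapper around aws stepfunctions describe-execution).

```python
import time
from typing import Callable

TERMINAL_FAILURES = {"FAILED", "TIMED_OUT", "ABORTED"}


def wait_for_execution(
    describe: Callable[[], str],
    timeout_s: float = 1800.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the execution finishes; return a shell-style exit code."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = describe()
        if status == "SUCCEEDED":
            return 0
        if status in TERMINAL_FAILURES:
            return 1
        sleep(poll_s)
    return 1  # still running at the deadline: treat as a failure


# Example with a fake status source and no real sleeping
statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
exit_code = wait_for_execution(lambda: next(statuses), sleep=lambda s: None)
```

Injecting describe and sleep keeps the function easy to exercise locally before wiring it to a real status call.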
Bring all the components together
When you have your components ready, you can use the SparkEmrCICDPipeline to bring them together. See the following example code:
dsf.processing.SparkEmrCICDPipeline(
self,
"SparkCICDPipeline",
spark_application_name="SparkTest",
# The Spark image to use in the CICD unit tests
spark_image=dsf.processing.SparkImage.EMR_7_5,
# The factory class to dynamically pass the Application Stack
application_stack_factory=SparkApplicationStackFactory(),
# Path of the CDK python application to be used by the CICD build and deploy phases
cdk_application_path="infra",
# Path of the Spark application to be built and unit tested in the CICD
spark_application_path="spark",
# Path of the bash script responsible to run integration tests
integ_test_script='./infra/resources/integ-test.sh',
# Environment variables used by the integration test script, value is the CFN output name
integ_test_env={
"STEP_FUNCTION_ARN": "ProcessingStateMachineArn"
},
# Additional permissions to give to the CICD to run the integration tests
integ_test_permissions=[
PolicyStatement(
actions=["states:StartExecution", "states:DescribeExecution"
],
resources=["*"]
)
],
source= CodePipelineSource.connection("your/repo", "branch",
connection_arn="arn:aws:codeconnections:us-east-1:222222222222:connection/7d2469ff-514a-4e4f-9003-5ca4a43cdc41"
),
removal_policy=RemovalPolicy.DESTROY,
)
The following elements of the code are worth highlighting:
With the integ_test_env parameter, you can define the environment variable mapping with the output of your application stack that’s defined in the application_stack_factory parameter
The integ_test_permissions parameter specifies the AWS Identity and Access Management (IAM) permissions that are attached to the CodeBuild project where the integration test script runs
CDK Pipelines needs an AWS code connection Amazon Resource Name (ARN) to connect to your Git repository when you host your code
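To make the integ_test_env mapping concrete, here is a minimal sketch of how environment variable names could be resolved against stack outputs. It is an illustration of the mapping only, not DSF's internal code: resolve_integ_test_env is a hypothetical helper, and the ARN value is a placeholder.

```python
def resolve_integ_test_env(mapping: dict, stack_outputs: dict) -> dict:
    """Turn {ENV_VAR: output name} into {ENV_VAR: output value}."""
    missing = [name for name in mapping.values() if name not in stack_outputs]
    if missing:
        # Fail early instead of running the test script with empty variables.
        raise KeyError(f"stack outputs not found: {missing}")
    return {env: stack_outputs[name] for env, name in mapping.items()}


# Placeholder output value for illustration
outputs = {
    "ProcessingStateMachineArn": "arn:aws:states:us-east-1:111111111111:stateMachine:example"
}
env = resolve_integ_test_env({"STEP_FUNCTION_ARN": "ProcessingStateMachineArn"}, outputs)
```

The keys of the mapping are the variable names the bash script reads, and the values are the CloudFormation output names of the application stack.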
Now you can deploy the stack containing the CI/CD pipeline. This is a one-time operation because the CI/CD pipeline will dynamically be updated based on code changes that impact the CI/CD pipeline itself:
cd infra
cdk deploy CICDPipeline
Then you can commit and push the code into the source code repository defined in the source parameter. This step triggers the pipeline and deploys the application in the configured environments. You can check the pipeline definition and status on the AWS CodePipeline console.
Follow the readme guide to delete the resources created by the solution.
Conclusion
By using Amazon EMR, the AWS CDK, DSF on AWS, and the Amazon EMR toolkit, developers can now streamline their Spark application development process. The solution described in this post helps developers gain full control over their code and infrastructure, making it possible to set up local development environments, implement automated CI/CD pipelines, and deploy serverless Spark infrastructure across multiple environments.
Last week, Strands Agents, the AWS open source SDK for agentic AI, hit 1 million downloads and earned 3,000+ GitHub stars less than 4 months after launching as a preview in May 2025. With Strands Agents, you can build production-ready, multi-agent AI systems in a few lines of code.
We’ve continuously improved features including support for multi-agent patterns, A2A protocol, and Amazon Bedrock AgentCore. You can use a collection of sample implementations to help you get started with building intelligent agents using Strands Agents. We always welcome your contribution and feedback to our project including bug reports, new features, corrections, or additional documentation.
Here is the latest research article from Amazon Science about the future of agentic AI and the questions that scientists are asking about agent-to-agent communications, contextual understanding, common sense reasoning, and more. It explains the technical topic of agentic AI with relatable examples, including one about our personal habits of leaving doors open or closed, locked or unlocked.
Last week’s launches
Here are some launches that got my attention:
Amazon EC2 M4 and M4 Pro Mac instances – New M4 Mac instances offer up to 20% better application build performance compared to M2 Mac instances, while M4 Pro Mac instances deliver up to 15% better application build performance compared to M2 Pro Mac instances. These instances are ideal for building and testing applications for Apple platforms such as iOS, macOS, iPadOS, tvOS, watchOS, visionOS, and Safari.
LocalStack integration in Visual Studio Code (VS Code) – You can use LocalStack to locally emulate and test your serverless applications using the familiar VS Code interface without switching between tools or managing complex setup, thus simplifying your local serverless development process.
AWS CloudTrail MCP Server – New AWS CloudTrail MCP server allows AI assistants to analyze API calls, track user activities, and perform advanced security analysis across your AWS environment through natural language interactions. You can explore more AWS MCP servers for working with AWS service resources.
Amazon CloudFront support for IPv6 origins – Your applications can send IPv6 traffic all the way to their origins, allowing them to meet their architectural and regulatory requirements for IPv6 adoption. End-to-end IPv6 support improves network performance for end users connecting over IPv6 networks, and also removes concerns for IPv4 address exhaustion for origin infrastructure.
For a full list of AWS announcements, be sure to keep an eye on the What’s New with AWS? page.
Other AWS news
Here are some additional news items that you might find interesting:
A city in the palm of your hand – Check out this interactive feature that explains how our AWS Trainium chip designers think like city planners, optimizing every nanometer to move data at near light speed.
Measuring the effectiveness of software development tools and practices – Read how Amazon developers who identified specific challenges before adopting AI tools cut costs by 15.9% year-over-year using our cost-to-serve-software framework (CTS-SW). They deployed more frequently and reduced manual interventions by 30.4% by focusing on the right problems first.
Become an AWS Cloud Club Captain – Join a growing network of student cloud enthusiasts by becoming an AWS Cloud Club Captain! As a Captain, you’ll get to organize events and build cloud communities while developing leadership skills. The application window is open September 1-28, 2025.
Upcoming AWS events
Check your calendars and sign up for these upcoming AWS events as well as AWS re:Invent and AWS Summits:
AWS AI Agent Global Hackathon – This is your chance to dive deep into our powerful generative AI stack and create something truly awesome. From September 8 to October 20, you have the opportunity to create AI agents using the AWS suite of AI services, competing for over $45,000 in prizes and exclusive go-to-market opportunities.
AWS Gen AI Lofts – You can learn about AWS AI products and services in exclusive sessions, meet industry-leading experts, and network with investors and peers. Register in your nearest city: Mexico City (September 30–October 2), Paris (October 7–21), London (October 13–21), and Tel Aviv (November 11–19).
AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Aotearoa and Poland (September 18), South Africa (September 20), Bolivia (September 20), Portugal (September 27), Germany (October 7), and Hungary (October 16).
We are excited to announce a new AWS Cloud Development Kit (CDK) feature that makes it easier and safer to refactor your infrastructure as code. CDK Refactor aims to preserve your AWS resources as you rename constructs, move resources between stacks, and reorganize your CDK applications – operations that previously risked resource replacement.
When writing infrastructure as code with the CDK, developers occasionally need to rename Constructs or move them between Stacks or directories. Whether they need to better organize their code, adhere to coding best practices, or take advantage of object-oriented programming patterns like class inheritance, these changes can be risky in environments with deployed resources, because they change the CDK-generated logical ID of those resources. During a CDK deploy, AWS CloudFormation interprets these changes as new resources, which often requires deletion of the existing resource and creation of a new resource with the new logical ID. For stateful resources, this could cause downtime and even data loss. To mitigate this effect of ID changes, developers had to stage their changes to create new resources, create a data or network migration plan, and then delete the old resources. Sometimes, developers decide that the risk of these changes outweighs the benefit of the refactor and choose not to perform the refactor at all.
Today, developers can use the new CDK refactor command to detect, review, confirm, and safely apply refactored changes to their resources without resource replacement. This feature leverages the recently-launched AWS CloudFormation refactor feature, but the CDK automatically computes the mappings that CloudFormation needs to redefine the refactored resources, providing a layer of abstraction that allows developers to focus on code rather than resource configuration. Let’s walk through an example to demonstrate the benefits of this refactor capability.
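To build intuition for what "computing the mappings" involves, here is one plausible approach sketched in Python: match resources whose definition is unchanged but whose location (stack and logical ID) differs. This is an assumption-laden illustration, not the CDK's actual algorithm; digest, compute_mappings, and the "Stack.LogicalId" location strings are hypothetical.

```python
import hashlib
import json


def digest(resource: dict) -> str:
    """Stable fingerprint of a resource definition (type plus properties)."""
    payload = json.dumps(resource, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def _group_by_digest(resources: dict) -> dict:
    groups = {}
    for location, resource in resources.items():
        groups.setdefault(digest(resource), []).append(location)
    return groups


def compute_mappings(deployed: dict, refactored: dict) -> dict:
    """Map old locations to new ones for unchanged resources that moved.

    Both arguments map a location string ("Stack.LogicalId") to a
    resource definition. Ambiguous matches are skipped.
    """
    old_groups = _group_by_digest(deployed)
    new_groups = _group_by_digest(refactored)
    mappings = {}
    for key, old_locations in old_groups.items():
        new_locations = new_groups.get(key, [])
        if len(old_locations) == 1 and len(new_locations) == 1:
            old, new = old_locations[0], new_locations[0]
            if old != new:
                mappings[old] = new
    return mappings


table = {"Type": "AWS::DynamoDB::Table", "Properties": {"BillingMode": "PAY_PER_REQUEST"}}
moved = compute_mappings({"Monolith.UsersTable1234": table}, {"Users.UsersTableABCD": table})
```

In this sketch, two identical resources cannot be told apart, so they are left unmapped rather than guessed.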
Prerequisites
Along with the usual CDK prerequisites, if you bootstrapped your CDK project before this launch, you need to re-bootstrap your environment to obtain the new permissions associated with the CDK refactor capabilities before attempting your refactor.
Monolith to micro-service example
For this example, let’s say that we have a legacy CDK App that deploys a monolithic Stack with Amazon DynamoDB tables for users, products, and orders, and an AWS Lambda function that implements CRUD operations on all entities.
Monolithic application
function monolithApp() {
const monolith = new CdkAppStack(app, monolithStackName, {env});
const usersTable = makeTable(monolith, 'users');
const productsTable = makeTable(monolith, 'products');
const ordersTable = makeTable(monolith, 'orders');
// We have a single Lambda function in our application
const func = new Function(monolith, `MonolithFunction`, {
code: Code.fromInline(`Some code that accesses all three tables`),
runtime: Runtime.NODEJS_22_X,
handler: 'index.handler',
});
usersTable.grantReadWriteData(func);
productsTable.grantReadWriteData(func);
ordersTable.grantReadWriteData(func);
// This function creates a REST API, resources, methods, and links
// everything together to the functions. Right now, we are passing
// the same function in three places.
makeApi(monolith, {
usersFunction: func,
productsFunction: func,
ordersFunction: func,
});
}
monolithApp();
We’ve been asked to adhere to Well Architected Framework best practices and break up the monolith into separate Lambda functions so they can scale independently. Because they’re so similar, we’re also going to create an inheritable Lambda class that we can reuse to improve readability and maintainability of the code, and avoid having to re-define Lambda configuration settings that are consistent across all of the functions.
Finally, the monolith uses only L1 CDK Constructs. To further abstract our code and take advantage of helper functions, we’re going to start using L2 CDK Constructs for DynamoDB, Lambda, and API Gateway. This change will allow the IAM Roles and permissions to be defined automatically, further simplifying our code.
Proposed refactored application into separate stacks for each domain.
Without the refactor feature, CloudFormation would delete and re-create the Lambda and DynamoDB resources, which would cause all of the data in the latter to be lost. Alternatively, you could create net-new Lambdas and Amazon DynamoDB tables in one deployment, execute an out-of-band, point-in-time and streaming data migration from the old tables to the new ones, update the API Gateway configuration to target the new Lambdas in a second deploy, and turn off the streaming migration process.
With the refactor feature, we can move the resource definitions to new files, update them to L2 Constructs, and leave the stateful resources in place!
Replace stateless resources
First, let’s refactor our CDK code to break the monolithic Lambda into 3 domain-specific Lambdas. CloudFormation’s refactor capability doesn’t support creating new resources or updating configuration of existing resources, so we will deploy these changes as usual, without using the new refactor feature. All resources will stay inside the monolithic stack for now.
Refactor stateless single lambda function into 3 functions as a prerequisite to the refactor of stateful DynamoDB tables.
function singleStackMicroservicesApp() {
// We still have a single stack
const monolith = new CdkAppStack(app, monolithStackName, {env});
// makeFunctionAndTable creates a different Lambda function and a DynamoDB table
// for each domain that is passed as a parameter.
// In a real CDK application, you would probably define each of them independently.
makeApi(monolith, {
usersFunction: makeFunctionAndTable(monolith, 'users'),
productsFunction: makeFunctionAndTable(monolith, 'products'),
ordersFunction: makeFunctionAndTable(monolith, 'orders'),
});
}
singleStackMicroservicesApp();
Refactor stateful resources
Now we can refactor the stateful DynamoDB tables and their respective Lambdas to their own stacks, using cdk refactor to map their new IDs without replacing the resources.
Before refactoring, though, we need to create the new stacks that will receive the functions and tables:
singleStackMicroservicesApp();
const usersStack = new Stack(app, 'Users', {env});
const productsStack = new Stack(app, 'Products', {env});
const ordersStack = new Stack(app, 'Orders', {env});
Refactored Lambda functions and DynamoDB tables into their own separate stacks.
function fullMicroservicesApp() {
const monolith = new Stack(app, monolithStackName, {env});
const usersStack = new Stack(app, 'Users', {env});
const productsStack = new Stack(app, 'Products', {env});
const ordersStack = new Stack(app, 'Orders', {env});
makeApi(monolith, {
// Now each pair function + table is in its own stack
usersFunction: makeFunctionAndTable(usersStack, 'users'),
productsFunction: makeFunctionAndTable(productsStack, 'products'),
ordersFunction: makeFunctionAndTable(ordersStack, 'orders'),
});
}
fullMicroservicesApp();
Running cdk refactor --unstable=refactor starts the process. (The unstable flag is required as this feature is still subject to breaking changes.) The CDK will compare the current state of your application (the deployed monolithic app) with the new state (the output of your refactored CDK application).
CDK refactor confirmation dialog
As expected, it shows a table of resources that were moved from the Monolith stack to their respective refactored stacks. By default, the CLI asks for confirmation before proceeding. You can bypass the confirmation by passing the --force flag, or confirm the changes and execute the refactor. All resources, including the stateful tables, were safely moved to other stacks, and we now have our well-architected application.
CDK refactor results
Conclusion
With the CDK refactor feature, developers can take full advantage of the object-oriented definition of AWS resources, including the ability to change the structure and layers of abstraction without orchestrating complex migration mechanisms or scheduled downtime. Since the CDK is open source, you can learn more about how the CDK automatically determines what resources need to be refactored via the README. Understanding when resources need to be replaced and refactored will help you plan your infrastructure as code roadmap and when you should use this new refactoring capability.
If you’ve got a refactor that you’ve been waiting to execute, read more about the feature set in the CDK refactor documentation and start refactoring your own CDK App today!
On November 30th, 2025, the AWS Cloud Development Kit (CDK) will no longer support Node.js 18.x, which reached end of life on April 30, 2025. This change applies to all AWS CDK components that depend on Node.js, including the AWS CDK CLI, the Construct Library, and broader CDK ecosystem projects such as JSII, Projen, and CDK8s.
We encourage you to upgrade to a Node.js Active Long Term Support (LTS) version, which is Node.js 22.x as of July 6, 2025. Given that Node.js 18.x is past end of life, we recommend migrating your CDK projects to newer Node.js LTS versions as soon as possible.
Why are we doing this?
Node.js 18.x reached its End of Life support on April 30, 2025, per the Node.js Release Schedule. This means the Node.js community no longer provides bug fixes or security updates for this version. By dropping support for end-of-life versions, we ensure that AWS CDK users benefit from the latest security patches, performance improvements, and modern Node.js capabilities. This approach aligns with AWS’s commitment to security best practices and our standard policy of supporting only actively maintained runtime versions.
What’s changing?
Starting December 1, 2025, AWS CDK will officially end support for Node.js 18.x. While your existing CDK deployments may continue to function, we will no longer address issues, provide bug fixes, or offer technical support for problems that occur specifically with Node.js 18.x. Any support cases or bug reports related to Node.js 18.x will require reproduction on a supported Node.js version (20.x or 22.x as of June 2025) before we can assist.
Key points
Moving forward, projects that remain on Node.js 18.x will gradually lose access to new AWS CDK capabilities as we develop features using modern Node.js APIs that are not available in older versions. This creates a growing compatibility gap that will make it increasingly challenging to leverage CDK innovations and improvements. The security implications are equally concerning, as any vulnerabilities discovered in the unsupported Node.js 18.x runtime will not receive patches or workarounds from our development team, potentially exposing your infrastructure to known security risks.
The challenges extend throughout the development lifecycle. Without regular compatibility testing against Node.js 18.x, we cannot ensure reliable CDK behavior, and you may encounter unexpected issues in production environments. When problems do arise, our support team will need to reproduce any reported issues on supported Node.js versions before providing assistance, which could delay resolution during critical incidents. Additionally, the broader CDK ecosystem, including third-party libraries and tools your projects depend on, will likely follow similar deprecation schedules, creating compounding compatibility challenges that become more difficult to resolve over time.
Timeline
We’re announcing this change in July 2025, to provide you with a five-month transition period before support officially ends on November 30th, 2025. During this transition window, our team will continue to address issues that arise with Node.js 18.x, giving you time to plan, test, and execute your upgrade strategy without immediate pressure. This period is designed to help you thoroughly validate your CDK projects against newer Node.js versions and ensure smooth deployments in your production environment.
Beginning December 1st, 2025, AWS CDK will officially discontinue support for Node.js 18.x across all components and ecosystem projects. From this point forward, all bug fixes, security patches, and new feature development will target only supported Node.js versions, currently 20.x and 22.x as of June 2025. We strongly recommend using this transition period to migrate to Node.js 22.x, the current Active Long Term Support version, which will provide the longest runway for future compatibility as the Node.js release cycle continues.
Version validation and update steps
Begin your migration by checking which Node.js version you’re currently running across all environments where you deploy CDK projects. Run `node -v` in your local development environment, CI/CD pipelines, and any automated deployment systems to get a complete picture of your current setup.
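If you want to automate this check, for example as an early step in a pipeline, the idea can be sketched as follows. This is a minimal sketch, assuming the supported major versions are the 20.x and 22.x lines mentioned in this post; the names SUPPORTED_MAJORS, parseMajor, and isSupportedNodeVersion are hypothetical and not part of the CDK.

```typescript
// Major versions listed as supported in this post (assumption: adjust as the timeline changes).
const SUPPORTED_MAJORS: number[] = [20, 22];

// Parses a version string such as "v18.20.4" (the format printed by `node -v`)
// and returns the major version, or undefined if the string is not recognized.
function parseMajor(version: string): number | undefined {
  const match = /^v?(\d+)\./.exec(version.trim());
  return match ? Number(match[1]) : undefined;
}

// Returns true only when the major version is in the supported list.
function isSupportedNodeVersion(version: string): boolean {
  const major = parseMajor(version);
  return major !== undefined && SUPPORTED_MAJORS.indexOf(major) !== -1;
}

console.log(isSupportedNodeVersion("v18.20.4"));
console.log(isSupportedNodeVersion("v22.4.0"));
```

In a Node.js script you could pass process.version to isSupportedNodeVersion and fail the build when it returns false.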
Once you’ve identified all instances of Node.js 18.x, update your runtime to a supported version using either a version manager like nvm or by downloading the official installer from nodejs.org. We recommend upgrading directly to Node.js 22.x since it’s the current Active Long Term Support version and will provide the longest compatibility runway. After updating your runtime, thoroughly test your CDK projects in non-production environments to ensure your deployment scripts and third-party dependencies work correctly with the new version. Pay particular attention to any custom constructs or complex deployment workflows that may be sensitive to changes in Node.js versions.
Finally, establish a process for staying current with future Node.js releases by bookmarking the AWS CDK Node.js Version Support Timeline, which provides up-to-date information on runtime compatibility and upcoming deprecations. This proactive approach will help you anticipate future changes and plan your upgrade strategies well in advance, avoiding the pressure of last-minute migrations when support windows close.
Conclusion
This deprecation is part of our ongoing commitment to provide a secure, high-quality experience for AWS CDK users. By migrating to a Node.js Active Long Term Support (LTS) version, you will benefit from enhanced performance, ongoing security updates, and continued AWS CDK advancements. If you have any questions or concerns about this deprecation, please reach out and open an issue in our GitHub repo.
Today, we’re announcing the release of aws-eks-v2 construct, a new alpha version of AWS Cloud Development Kit (CDK) L2 construct for Amazon Elastic Kubernetes Service (EKS). This construct represents a significant change in how developers can define and manage their EKS environments using infrastructure as code. While maintaining the powerful capabilities of its predecessor library for creating and managing EKS clusters, this alpha release introduces key architectural improvements that enhance both flexibility and maintainability.
The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework that enables you to define your cloud infrastructure using familiar programming languages and deploy it through AWS CloudFormation. The CDK uses constructs – a layered abstraction concept where Layer 1 (L1) constructs map directly to CloudFormation resources, while Layer 2 (L2) constructs provide intuitive APIs, helper functions, best-practice defaults, and generate a lot of the boilerplate code and glue logic for you. This layered approach means you can seamlessly move between high-level abstractions for common use cases and low-level resource definitions when you need fine-grained control. The result is an Infrastructure as Code (IaC) experience that helps you maintain productivity while ensuring you have access to the full power of AWS services when you need it. You can read more about constructs and their benefits in the CDK user guide.
In this post we’ll explore:
The reasoning behind the creation of a new L2 construct for EKS and the improvements introduced by this new library
How to use the new EKS v2 construct
Background
Amazon EKS is a managed Kubernetes service that makes it easy to run Kubernetes on AWS without needing to manage the control plane or nodes. EKS automatically handles critical tasks like patching, node provisioning, and upgrades. You can run EKS using EC2 instances for worker nodes, AWS Fargate for serverless containers, or a combination of both, providing the flexibility to choose the right compute option for your workloads.
While the existing EKS L2 construct has served customers well, we identified opportunities to further enhance the developer experience and operational efficiency based on their feedback. The new aws-eks-v2 construct delivers significant improvements through native AWS CloudFormation resources, modern Access Entry-based authentication, and enhanced architectural flexibility. Key benefits include reduced deployment overhead, simplified cluster access management, support for multiple EKS clusters within a single stack, and granular control over resource creation with features like the optional kubectl Lambda handler. These improvements help customers build and manage their EKS infrastructure more efficiently while maintaining the robust functionality they expect from AWS CDK constructs.
Using the L2
Given that this construct is in the alpha stage, you’ll need to install and import the construct using the experimental construct libraries process. During the alpha stage, the CDK team is actively gathering customer feedback and iterating on the implementation. Once the construct meets our bar for general availability, we’ll integrate it directly into the AWS CDK core library, making it as easily accessible as our other L1 and L2 constructs. This approach allows us to rapidly deliver new capabilities while ensuring they meet the high standards our customers expect.
Deploying EKS Cluster with Default Configuration
Let’s explore how to create an Amazon EKS cluster using AWS CDK aws-eks-v2 construct with minimal configuration requirements. The following example demonstrates the most straightforward way to define an EKS cluster, leveraging the power of CDK’s opinionated defaults. Creating a new cluster is done using the Cluster construct. The only required property is the Kubernetes version.
import * as eksv2 from '@aws-cdk/aws-eks-v2-alpha';
// Creating an EKS Cluster with default properties
const eksCluster = new eksv2.Cluster(this, 'EksCluster', {
version: eksv2.KubernetesVersion.V1_32
});
This translates into the following architecture, as shown in Figure 1:
Amazon Virtual Private Cloud (VPC) – A logically isolated section of the AWS Cloud that spans across two Availability Zones, equipped with an Internet Gateway to enable secure communication with the internet. This multi-AZ design helps ensure your applications remain available even if an Availability Zone experiences issues.
Amazon EKS Control Plane – A fully managed Kubernetes control plane deployed in an AWS-managed VPC, providing high availability and automatic version management for the Kubernetes control plane components.
Public Subnet Infrastructure – Two public subnets, each with its own NAT Gateway, enabling your cluster components to securely access the internet for essential operations like pulling container images and downloading updates. These NAT Gateways provide a secure outbound path while protecting your workloads from direct internet exposure.
Private Subnet Configuration – Two private subnets optimized for running your EKS worker nodes, offering enhanced security by isolating your workloads from direct internet access while maintaining the ability to communicate with AWS services and the internet through the NAT Gateways.
IAM Security Foundation – A comprehensive set of IAM roles and policies that implement the principle of least privilege:
Control plane service role that enables EKS to manage AWS resources on your behalf
Node IAM role that allows worker nodes to interact with other AWS services and join the EKS cluster
You can also use FargateCluster to provision a cluster that uses only Fargate workers.
import * as eksv2 from '@aws-cdk/aws-eks-v2-alpha';
// Creating an EKS Cluster with default properties and Fargate workers
const eksFargateCluster = new eksv2.FargateCluster(this, 'EksFargateCluster', {
version: eksv2.KubernetesVersion.V1_32,
});
To help our customers maintain better control over their cluster access patterns, the Kubectl Handler is not automatically deployed with the default configuration. You can easily enable this functionality by configuring the kubectlProviderOptions property when you need kubectl access management as shown below.
import * as eksv2 from '@aws-cdk/aws-eks-v2-alpha';
import { KubectlV32Layer } from '@aws-cdk/lambda-layer-kubectl-v32'
// Creating an EKS Cluster with default properties and kubectl handler
const eksCluster = new eksv2.Cluster(this, 'EksCluster', {
version: eksv2.KubernetesVersion.V1_32,
kubectlProviderOptions: {
kubectlLayer: new KubectlV32Layer(this, 'KubectlLayer')
},
});
Deploying EKS Cluster with AutoMode
EKS Auto Mode represents a significant advancement in how Amazon EKS manages compute capacity for Kubernetes clusters. This intelligent capacity management system automatically provisions and scales node groups based on workload demands, removing the need for manual capacity planning.
When you create a new cluster with the aws-eks-v2 construct, EKS Auto Mode is enabled by default: DefaultCapacityType.AUTOMODE is the default capacity type for the EKS cluster. If you prefer, you can set defaultCapacityType to AUTOMODE explicitly:
import * as eksv2 from '@aws-cdk/aws-eks-v2-alpha';
// Creating an EKS Cluster with AutoMode
const eksCluster = new eksv2.Cluster(this, 'EksCluster', {
version: eksv2.KubernetesVersion.V1_32,
defaultCapacityType: eksv2.DefaultCapacityType.AUTOMODE, // default value
});
After deploying the Stack containing the construct instance, in the EKS Console you’ll be able to see that an EKS Cluster has been created with AutoMode enabled:
Figure 2 – EKS Cluster Deployed with Automode
Auto Mode enhances your Amazon EKS experience by automatically configuring two strategically designed node pools out of the box:
A system node pool optimized for running critical cluster system components and add-ons, ensuring reliable cluster operations.
A general node pool specifically tuned for your application workloads, providing the flexibility needed for diverse containerized applications.
You can configure which node pools to enable through the compute property:
import * as eksv2 from '@aws-cdk/aws-eks-v2-alpha';
// Creating an EKS Cluster with Automode and selecting nodePools
const eksCluster = new eksv2.Cluster(this, 'EksCluster', {
version: eksv2.KubernetesVersion.V1_32,
defaultCapacityType: eksv2.DefaultCapacityType.AUTOMODE,
compute: {
nodePools: ['system', 'general-purpose'],
},
});
Deploying EKS Cluster with Managed Node Groups
Amazon EKS Managed Node Groups deliver a seamless compute management experience for your Kubernetes clusters. This powerful capability eliminates operational complexity by automating the end-to-end lifecycle of Amazon EC2 instances that power your containerized applications. Behind the scenes, Amazon EKS managed node groups orchestrate node updates and terminations, ensuring zero disruption to your applications through graceful node draining. The service automatically leverages the latest Amazon EKS-optimized AMIs, providing a secure and optimized foundation for your workloads.
By setting defaultCapacityType to NODEGROUP, customers can leverage the traditional managed node group management approach:
import * as eksv2 from '@aws-cdk/aws-eks-v2-alpha';
// Creating an EKS Cluster with Managed Node Groups and default instance types
const eksCluster = new eksv2.Cluster(this, 'EksCluster', {
version: eksv2.KubernetesVersion.V1_32,
defaultCapacityType: eksv2.DefaultCapacityType.NODEGROUP,
});
By default, when using DefaultCapacityType.NODEGROUP, this library will allocate a managed node group with two m5.large instances. After deploying the above code, you can check the EKS Console to see that an EKS Cluster has been deployed as shown in figure 3:
Figure 3 – EKS Cluster Deployed with Managed Node Groups
You can also check the Compute tab and see the Managed Node Group Configuration as shown in figure 4:
Figure 4 – EKS Cluster Managed Node Group Default Configuration
If you want control over the instance types of a Managed Node Group, you can specify the default EC2 instance type as a property of the construct:
import * as eksv2 from '@aws-cdk/aws-eks-v2-alpha';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
// Creating an EKS Cluster with Managed Node Groups and specific instance types
const eksCluster = new eksv2.Cluster(this, 'EksCluster', {
version: eksv2.KubernetesVersion.V1_32,
defaultCapacityType: eksv2.DefaultCapacityType.NODEGROUP,
defaultCapacity: 5,
// m5.xlarge (the M5 family has no "small" size, so m5.small is not a valid instance type)
defaultCapacityInstance: ec2.InstanceType.of(ec2.InstanceClass.M5, ec2.InstanceSize.XLARGE),
});
You can also specify additional customizations after the EKS cluster declaration, via the addNodegroupCapacity method:
import * as eksv2 from '@aws-cdk/aws-eks-v2-alpha';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
// Creating an EKS Cluster without default capacity, then adding a custom Managed Node Group
const eksCluster = new eksv2.Cluster(this, 'EksCluster', {
version: eksv2.KubernetesVersion.V1_32,
defaultCapacityType: eksv2.DefaultCapacityType.NODEGROUP,
defaultCapacity: 0,
});
eksCluster.addNodegroupCapacity('custom-node-group', {
instanceTypes: [new ec2.InstanceType('m5.large')],
minSize: 4,
diskSize: 100,
});
Managing Permissions through Access Entries
The new aws-eks-v2 construct transitions away from the previous ConfigMap-based authentication (which is deprecated in EKS) in favor of the Access Entries Authentication mode. This change introduces Access Entry as the standardized method for managing cluster permissions, offering a more streamlined and secure approach to granting cluster access to IAM users and roles.
You can define Access Policies through the AccessPolicy construct and you can adjust the scope of the Access Policy to the entire EKS cluster or to specific EKS Namespaces:
import * as eksv2 from '@aws-cdk/aws-eks-v2-alpha';
// AmazonEKSClusterAdminPolicy with `cluster` scope
eksv2.AccessPolicy.fromAccessPolicyName('AmazonEKSClusterAdminPolicy', {
accessScopeType: eksv2.AccessScopeType.CLUSTER,
});
// AmazonEKSAdminPolicy with `namespace` scope
eksv2.AccessPolicy.fromAccessPolicyName('AmazonEKSAdminPolicy', {
accessScopeType: eksv2.AccessScopeType.NAMESPACE,
namespaces: ['foo', 'bar']
});
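The difference between the two scopes can also be modelled in plain TypeScript, independent of the CDK. The following is a minimal sketch, assuming that a namespace scope must list at least one namespace while a cluster scope takes none; the AccessScope type and describeScope function are hypothetical names used only for illustration.

```typescript
// Hypothetical model of the two access scopes described above.
type AccessScope =
  | { type: "cluster" }
  | { type: "namespace"; namespaces: string[] };

// Returns a human-readable summary and rejects an empty namespace scope.
function describeScope(scope: AccessScope): string {
  if (scope.type === "cluster") {
    return "policy applies to the entire cluster";
  }
  if (scope.namespaces.length === 0) {
    // Assumption: a namespace scope without namespaces is not meaningful.
    throw new Error("a namespace scope needs at least one namespace");
  }
  return "policy applies to namespaces: " + scope.namespaces.join(", ");
}

console.log(describeScope({ type: "cluster" }));
console.log(describeScope({ type: "namespace", namespaces: ["foo", "bar"] }));
```

Modelling the scope as a discriminated union lets the compiler reject a cluster scope that carries a namespaces list.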
You can then grant access to specific IAM Roles using the grantAccess method:
import * as eksv2 from '@aws-cdk/aws-eks-v2-alpha';
import * as iam from 'aws-cdk-lib/aws-iam';
// Defining an IAM Role
const clusterAdminRole = new iam.Role(this, 'ClusterAdminRole', {
assumedBy: new iam.ArnPrincipal('arn_for_trusted_principal'),
});
// Creating an EKS Cluster with AutoMode
const eksCluster = new eksv2.Cluster(this, 'EksCluster', {
version: eksv2.KubernetesVersion.V1_32,
defaultCapacityType: eksv2.DefaultCapacityType.AUTOMODE,
});
// Cluster Admin role for this cluster
eksCluster.grantAccess('clusterAdminAccess', clusterAdminRole.roleArn, [
eksv2.AccessPolicy.fromAccessPolicyName('AmazonEKSClusterAdminPolicy', {
accessScopeType: eksv2.AccessScopeType.CLUSTER,
}),
]);
When the principal assumes the ClusterAdminRole, it gains access to the EKS cluster through the access entry created for that role. This access is governed by the AmazonEKSClusterAdminPolicy, which is associated with the access entry linked to the IAM Role.
Conclusion
In this post, we introduced the new AWS CDK L2 construct (aws-eks-v2) for Amazon EKS, demonstrating how it simplifies cluster deployment while offering enhanced flexibility and operational efficiency. Through practical examples, we showcased how customers can leverage the construct’s intelligent defaults and customization options to build production-ready Kubernetes environments on AWS.
The new L2 construct for Amazon EKS delivers significant improvements that help customers accelerate their container adoption journey:
Enhanced Performance: Eliminates dependency on Custom Resources and AWS Lambda functions by utilizing native AWS CloudFormation resources, resulting in faster and more reliable deployments.
Modern Authentication: Implements Access Entry-based authentication, replacing the deprecated ConfigMap approach with a more secure and programmable solution.
Improved Scalability: Removes the single-cluster-per-stack limitation and eliminates nested stacks, enabling more flexible architectural patterns.
Optimized Resource Creation: Makes the kubectl Lambda handler optional, giving customers fine-grained control over their infrastructure components.
Streamlined Operations: Provides automated node group management with intelligent defaults while maintaining full customer control when needed.
To get started with the new EKS L2 construct, visit the AWS CDK documentation. If you have specific features you’d like to see added, we encourage you to submit a feature request in the aws-cdk GitHub repository. Your feedback helps us continue innovating on your behalf.
Today we’re announcing the general availability (GA) of the Amazon EventBridge Scheduler and Targets Level 2 (L2) constructs in the AWS Cloud Development Kit (AWS CDK) construct library. EventBridge Scheduler is a serverless scheduler that enables users to schedule tasks and events at scale. Prior to the launch of these L2 constructs, developers had to define all relevant properties (via L1 constructs) across schedules and provide the glue logic between resources when defining their AWS CDK applications. The graduated constructs make it easier for users to configure EventBridge schedules, groups, and targets for AWS service integrations. They follow the AWS CDK L2 higher-level API design simplifications and provide a backwards-compatible guarantee across minor versions. Developers can use those alongside other existing stable AWS CDK constructs ready for production use.
Background
The AWS Cloud Development Kit (CDK) is an open-source software development framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation. It contains pre-written modular and reusable cloud components known as constructs. Constructs are the basic building blocks representing one or more AWS CloudFormation resources and their configuration. They are available in different abstraction levels. L1 constructs are the lowest-level constructs which map directly to AWS CloudFormation resources without abstractions. L2 constructs are thoughtfully developed and provide a higher-level abstraction through an intuitive intent-based API. They leverage default property configurations, best practice security policies, and convenience methods that make it simpler and quicker to define and deploy resources.
Amazon EventBridge Scheduler is a serverless scheduler that allows users to create, run, and manage tasks from one central, managed service. With EventBridge Scheduler, users can create schedules using cron and rate expressions for recurring patterns, or configure one-time invocations. EventBridge supports templated and universal targets. Templated targets include common API operations across a group of core AWS services, such as publishing a message to an Amazon Simple Notification Service (Amazon SNS) topic or invoking an AWS Lambda function. Universal targets are customized triggers supporting more than 270 AWS services and over 6,000 API operations on a schedule. Users can use schedule groups to organize their schedules.
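The three schedule types map to three expression formats: at(...) for one-time invocations, rate(...) for fixed intervals, and cron(...) for recurring patterns. The following is a minimal sketch of how such expression strings are put together; the helper names atExpression, rateExpression, and cronExpression are hypothetical, and in a CDK application the ScheduleExpression class of the L2 construct produces these strings for you.

```typescript
// Formats a one-time expression: at(yyyy-mm-ddThh:mm:ss).
// The month is zero-based, as in the JavaScript Date constructor, so 10 means November.
function atExpression(year: number, monthIndex: number, day: number, hour = 0, minute = 0): string {
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  return `at(${year}-${pad(monthIndex + 1)}-${pad(day)}T${pad(hour)}:${pad(minute)}:00)`;
}

// Formats a rate expression such as rate(1 hour) or rate(5 minutes).
function rateExpression(value: number, unit: "minute" | "hour" | "day"): string {
  return `rate(${value} ${value === 1 ? unit : unit + "s"})`;
}

interface CronFields {
  minute?: string;
  hour?: string;
  dayOfMonth?: string;
  month?: string;
  dayOfWeek?: string;
  year?: string;
}

// Formats cron(minutes hours day-of-month month day-of-week year).
// When one of day-of-month and day-of-week is set, the other is rendered as "?".
function cronExpression(f: CronFields): string {
  const dayOfWeek = f.dayOfWeek ?? "?";
  const dayOfMonth = f.dayOfMonth ?? (f.dayOfWeek === undefined ? "*" : "?");
  return `cron(${f.minute ?? "*"} ${f.hour ?? "*"} ${dayOfMonth} ${f.month ?? "*"} ${dayOfWeek} ${f.year ?? "*"})`;
}

console.log(atExpression(2025, 10, 1));
console.log(rateExpression(5, "minute"));
console.log(cronExpression({ minute: "0", hour: "23" }));
```

The time zone is not part of the expression string; it is a separate setting on the schedule.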
With the L2 constructs for Amazon EventBridge Scheduler and Targets, it becomes even simpler for users to configure and integrate those resources into their CDK applications. Let’s explore the benefits by looking at some examples.
Using the L2 EventBridge Scheduler construct
We introduce two use cases for the EventBridge Scheduler and Targets L2 constructs to demonstrate their usage within common scenarios. Each example is equipped with sample code, emphasizing the simplifications achieved by the L2 constructs.
Example 1 – One time reminder through Amazon SNS
In the first use case, users want to configure one-time notifications to receive reminders of their favorite conferences at a specific time. For example, a user may want to set a reminder one month before the start of AWS re:Invent to be reminded of their participation.
The example below uses the EventBridge Scheduler construct with a templated Amazon SNS target. The target applies a one-time schedule configuration and is configured with an Amazon Simple Queue Service (Amazon SQS) dead-letter queue to capture and retry failed events. The schedule payload is encrypted using a customer-managed AWS Key Management Service (AWS KMS) key.
const snsTarget = new targets.SnsPublish(topic, {
input: ScheduleTargetInput.fromObject({
message: "Reminder: AWS re:Invent starts in one month.",
}),
deadLetterQueue: deadLetterQueue,
});
const schedule = new Schedule(this, "ReminderSchedule", {
description:
"This schedule publishes a one-time notification to an Amazon SNS topic.",
schedule: ScheduleExpression.at(
new Date(2025, 10, 1), // Nov 01, 2025
cdk.TimeZone.AMERICA_LOS_ANGELES
),
target: snsTarget,
key: key,
});
From the code example, we can see that the well-defined interfaces for ScheduleTargetInput and ScheduleExpression make it easy to select matching configuration values.
The SnsPublish target and Schedule constructs seamlessly integrate with the existing L2 constructs for Amazon SNS, Amazon SQS, and AWS KMS. They abstract away the glue logic used to configure the target API operation, dead-letter queue, and encryption settings with correct references. Instead of manually crafting permissions, the construct generates an AWS Identity and Access Management (IAM) execution role with the minimum necessary permissions to interact with the templated target, as shown in the policy below.
The construct sets default properties. For example, it applies default configurations for the retry policy if not explicitly stated. As shown in Figure 1, the schedule defined above is configured with a maximum event retention time of 1 day and a maximum of 185 retries.
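The way such defaults combine with explicit settings can be sketched in plain TypeScript. This is a minimal sketch, assuming the defaults stated above (1 day, expressed as 86,400 seconds, and 185 retries) and using the property names of the CloudFormation retry policy; the resolveRetryPolicy function is a hypothetical helper, not part of the construct's API.

```typescript
// Shape of the retry policy as it appears on the schedule target in CloudFormation.
interface RetryPolicy {
  MaximumEventAgeInSeconds: number;
  MaximumRetryAttempts: number;
}

// Defaults as described in the text: 1 day of event retention and 185 retries.
const DEFAULT_RETRY_POLICY: RetryPolicy = {
  MaximumEventAgeInSeconds: 24 * 60 * 60,
  MaximumRetryAttempts: 185,
};

// Hypothetical helper: any value not set explicitly falls back to the default.
function resolveRetryPolicy(overrides: Partial<RetryPolicy> = {}): RetryPolicy {
  return {
    MaximumEventAgeInSeconds:
      overrides.MaximumEventAgeInSeconds ?? DEFAULT_RETRY_POLICY.MaximumEventAgeInSeconds,
    MaximumRetryAttempts:
      overrides.MaximumRetryAttempts ?? DEFAULT_RETRY_POLICY.MaximumRetryAttempts,
  };
}

console.log(resolveRetryPolicy());
console.log(resolveRetryPolicy({ MaximumRetryAttempts: 3 }));
```

An explicit value of 0 retries is kept as is, because the fallback only applies to values that are not set.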
Example 2 – Start / Stop EC2 instance during business hours
In the second scenario, a recurring cron schedule is used to automatically stop Amazon EC2 instances outside the business hours of a specific time zone.
The example below uses the EventBridge Scheduler construct with a universal target to perform the Amazon EC2 stopInstances API operation. It creates a custom schedule group to organize the schedules by time zone and allows an AWS Lambda function to read all schedules in it for administrative purposes.
const group = new ScheduleGroup(this, "ScheduleGroup", {
scheduleGroupName: "Europe-London",
});
new Schedule(this, "Schedule", {
schedule: ScheduleExpression.cron({
minute: "0",
hour: "23",
timeZone: cdk.TimeZone.EUROPE_LONDON,
}),
target: new targets.Universal({
service: "ec2",
action: "stopInstances",
input: ScheduleTargetInput.fromObject({
InstanceIds: [ec2Instance.instanceId],
}),
}),
scheduleGroup: group,
});
group.grantReadSchedules(lambdaFunction);
Similar to the first example, ScheduleExpression and ScheduleTargetInput help users define the correct input types. The universal target is one of the options offered by the scheduler-targets constructs; it lets users perform SDK API operations on AWS services such as Amazon EC2.
The ScheduleGroup construct is used to create the group, which is used as a property on the Schedule construct. The group implements convenience methods that allow simplified permissions management. The example above grants read permissions for the schedule group to an AWS Lambda function, which is applied to the resources without additional configuration.
Community Shout-Outs
The CDK team would like to give a huge shout-out to the awesome members of the community that contributed to this construct to help get it where it is today! Thank you to:
In this post, we introduced the general availability of the AWS CDK L2 construct for Amazon EventBridge Scheduler and Targets. We showcased practical implementations of the new construct, leveraging two example use cases. For more details on the EventBridge Scheduler L2 construct and examples of its use, see the Scheduler CDK Documentation.
If you’re new to AWS CDK and want to get started, we highly recommend checking out the CDK documentation and the CDK workshop.
Welcome to the 28th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. At the end of a quarter, we share the most recent product launches, feature enhancements, blog posts, videos, live streams, and other interesting things that you might have missed!
In case you missed our last ICYMI, check out what happened in Q4 2024 here.
Serverless calendar Q1 2025
AWS Step Functions
The AWS Step Functions team continues to improve developer experience. Workflow Studio is now available within Visual Studio Code (VS Code) through the AWS Toolkit extension.
AWS Step Functions in IDE
You can now design, test, and deploy your Step Functions workflows without leaving your IDE. The extension provides a drag-and-drop interface with all the familiar Workflow Studio capabilities, making it even easier to build state machines locally.
Step Functions private integrations now allow you to integrate applications seamlessly across private networks, on-premises infrastructure, and cloud platforms. Learn more in a blog post and explanation video.
Step Functions has increased the default quota for state machines and activities from 10,000 to 100,000 per AWS account. This tenfold increase means you can create more workflows to automate your business processes without worrying about hitting quota limits.
Distributed Map is expanding capabilities by adding support for JSON Lines (JSONL) format. JSONL, a highly efficient text-based format, stores structured data as individual JSON objects separated by newlines, making it particularly suitable for processing large datasets.
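To illustrate the format itself, here is a minimal sketch of reading JSON Lines in TypeScript: each non-empty line is parsed as an independent JSON document. The parseJsonLines function and the sample records are hypothetical and only show the file format, not how Distributed Map processes it.

```typescript
// Parses JSON Lines text: one JSON value per line, separated by newlines.
function parseJsonLines(text: string): unknown[] {
  const records: unknown[] = [];
  for (const line of text.split(/\r?\n/)) {
    // Skip blank lines, such as the one produced by a trailing newline.
    if (line.trim() === "") {
      continue;
    }
    records.push(JSON.parse(line));
  }
  return records;
}

// Hypothetical sample data: two order records, one per line.
const sample =
  '{"orderId": 1, "status": "shipped"}\n' +
  '{"orderId": 2, "status": "pending"}\n';

const records = parseJsonLines(sample);
console.log(records.length);
```

Because every line stands on its own, a large file can be split at line boundaries and the pieces processed independently.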
You no longer need to switch between your IDE and external resources when building serverless architectures. Browse, search, and implement pre-built serverless patterns directly in VS Code.
Example Serverless Pattern
AWS Lambda
Learn how AWS Lambda handles billions of invocations.
AWS Lambda asynchronous invocations
This blog post provides recommendations and insights for implementing highly distributed applications based on the Lambda service team’s experience building its robust asynchronous event processing system. It dives into challenges you might face, solution techniques, and best practices for handling noisy neighbors.
Amazon CloudWatch Application Signals for Java and .NET AWS Lambda runtimes
This provides deep visibility into your function’s performance, including method-level tracing, memory profiling, and automated anomaly detection.
Amazon Bedrock features
Multi-agent collaboration is now available in Bedrock as a preview, enabling you to create systems where multiple AI agents work together to solve complex problems. Agents can specialize in different domains, share context, and coordinate their actions to achieve goals that would be difficult for a single agent.
RAG evaluation is now generally available. This provides metrics to assess and improve your retrieval augmented generation pipelines. GraphRAG for Bedrock Knowledge Bases is now generally available, allowing you to enhance retrievals with graph-based context.
Amazon Bedrock Flows now supports multi-turn conversations, allowing you to build dynamic AI applications that maintain context across multiple user interactions. Bedrock data automation is now generally available, streamlining the process of preparing, ingesting, and maintaining data for your GenAI applications. Bedrock now offers LLM-as-a-judge capability for model evaluation, providing automated assessment of model outputs without requiring human reviewers. Compare different models or prompt strategies against your specific criteria at scale.
Bedrock’s capabilities are now integrated into the Amazon SageMaker Unified Studio, creating a seamless experience for machine learning practitioners who want to incorporate foundation models into their workflows. Access Bedrock models, fine-tuning, and evaluation directly from SageMaker.
Amazon Nova is a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry leading price-performance. Nova has expanded its tool use and converse API capabilities, making it easier for developers to build AI assistants that can use external tools to complete tasks.
Amazon Bedrock Guardrails image content filters are now generally available. Define and enforce boundaries for your AI applications with controls for both text and image content, ensuring outputs align with your organization’s policies.
Bedrock Knowledge Bases now supports using your existing OpenSearch clusters as the vector storage backend. This integration allows you to leverage your investments in OpenSearch while benefiting from the managed RAG capabilities of Bedrock.
New Amazon Bedrock models
Anthropic’s Claude 3.7 Sonnet hybrid reasoning allows you to toggle between standard and extended thinking modes. In standard mode, it functions as an upgraded version of Claude 3.5 Sonnet. In extended thinking mode, it employs self-reflection to achieve improved results across a wide range of tasks.
DeepSeek R1, an advanced model specialized in research and scientific reasoning, excels at complex problem-solving tasks and technical content generation.
Cohere Embed 3 models are now available in both multilingual and English-specific versions. These embedding models support text and images, providing more accurate representation for multimodal content and improving retrieval augmented generation (RAG) applications.
Ray2, Luma AI’s new visual AI model, is capable of creating realistic visuals with fluid, natural movement. You can use it for image understanding, 3D scene reconstruction, and visual content generation, opening new possibilities for immersive and visual applications.
Bedrock now supports fine-tuning of Meta’s latest Llama 3.2 models. These upgraded models deliver improved performance across reasoning, coding, and multilingual tasks while being more efficient with computational resources.
Amazon Q Developer
Amazon Q Developer is now available as a CLI agent, bringing AI-assisted development to the command line. Get contextual recommendations, generate shell commands, and solve coding problems without leaving your terminal.
Amazon Q CLI
Amazon Q Developer transformation now supports upgrading Java applications using Maven to Java 21. It offers enhanced code suggestions, refactoring, and optimization recommendations for applications using the latest Java features, like virtual threads and pattern matching.
AWS AppSync
AWS AppSync Events now supports events publishing for WebSocket APIs, enabling real-time publish-subscribe functionality. This feature makes it easier to build applications requiring instant updates, like chat applications, collaborative tools, and real-time dashboards.
AWS AppSync Events
There are new AWS Cloud Development Kit (AWS CDK) L2 constructs for AppSync WebSocket APIs. These make it simpler to define and deploy real-time APIs using infrastructure as code. These high-level constructs handle the details of WebSocket connections, authorization, and messaging patterns.
Amazon SNS
Amazon SNS now supports high-throughput mode for SNS FIFO topics, with default throughput matching SNS standard topics. When you enable high-throughput mode, SNS FIFO topics maintain order within each message group, while reducing the de-duplication scope to the message-group level.
Amazon EventBridge
The EventBridge console now features event source discovery, making it easier to find and visualize available event sources in your AWS environment. This tool helps you identify potential event producers and understand the event schemas they emit.
AWS Amplify
AWS Amplify now offers a TypeScript data client optimized for server-side Lambda functions, providing type-safe access to your data sources. This client reduces code complexity and improves reliability when working with databases and APIs in server environments.
The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.
You can also follow the Developer Advocacy team members who work on Serverless to see the latest news, follow conversations, and interact with the team.
Today, we are launching IPv6 support for Amazon API Gateway across all endpoint types, custom domains, and management APIs, in all commercial and AWS GovCloud (US) Regions. You can now configure REST, HTTP, and WebSocket APIs, and custom domains, to accept calls from IPv6 clients alongside the existing IPv4 support. You can also call API Gateway management APIs from dual-stack (IPv6 and IPv4) clients. As organizations globally confront growing IPv4 address scarcity and increasing costs, implementing IPv6 becomes critical for future-proofing network infrastructure. This dual-stack approach helps organizations maintain future network compatibility and expand global reach. To learn more about dualstack in the Amazon Web Services (AWS) environment, see the IPv6 on AWS documentation.
When creating a new API or domain name in the console, select IPv4 only or dualstack (IPv4 and IPv6) for the IP address type.
You can select the dualstack option when creating a new REST API, and you can configure dualstack for custom domain names in the same way.
If you need to revert to IPv4-only for any reason, you can modify the IP address type setting, with no need to redeploy your API for the update to take effect.
REST APIs of all endpoint types (EDGE, REGIONAL and PRIVATE) support dualstack. Private REST APIs only support dualstack configuration.
AWS CDK
With AWS CDK, start by configuring a dual-stack REST API and domain name.
const api = new apigateway.RestApi(this, "Api", {
  restApiName: "MyDualStackAPI",
  endpointConfiguration: {ipAddressType: "dualstack"}
});
const domain_name = new apigateway.DomainName(this, "DomainName", {
  regionalCertificateArn: 'arn:aws:acm:us-east-1:111122223333:certificate/a1b2c3d4-5678-90ab',
  domainName: 'dualstack.example.com',
  endpointConfiguration: {
    types: ['Regional'],
    ipAddressType: 'dualstack'
  },
  securityPolicy: 'TLS_1_2'
});
const basepathmapping = new apigateway.BasePathMapping(this, "BasePathMapping", {
  domainName: domain_name,
  restApi: api
});
IPv6 Source IP and authorization
When your API begins receiving IPv6 traffic, client source IPs will be in IPv6 format. If you use resource policies, Lambda authorizers, or AWS Identity and Access Management (IAM) policies that reference source IP addresses, make sure they’re updated to accommodate IPv6 address formats.
For example, a resource policy can permit traffic from a specific IPv6 range by listing that range in an aws:SourceIp condition.
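As a rough illustration, the sketch below builds such a policy document as a plain TypeScript object. The helper name allowFromRanges is hypothetical, and the CIDR blocks are the reserved documentation ranges (192.0.2.0/24 for IPv4, 2001:db8::/32 for IPv6), so substitute your own ranges before using anything like this.

```typescript
// Minimal shape of the policy document used in this sketch.
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Principal: string;
  Action: string;
  Resource: string;
  Condition: { IpAddress: { "aws:SourceIp": string[] } };
}

interface PolicyDocument {
  Version: string;
  Statement: PolicyStatement[];
}

// Hypothetical helper: allow invocation only from the given CIDR ranges.
// IPv4 and IPv6 ranges can sit side by side in the same condition.
function allowFromRanges(cidrs: string[]): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: "*",
        Action: "execute-api:Invoke",
        Resource: "execute-api:/*",
        Condition: { IpAddress: { "aws:SourceIp": cidrs } },
      },
    ],
  };
}

// Documentation-only ranges; replace with the ranges your clients use.
const policy = allowFromRanges(["192.0.2.0/24", "2001:db8::/32"]);
console.log(JSON.stringify(policy, null, 2));
```

Keeping the existing IPv4 range next to the new IPv6 range means clients on either protocol continue to be authorized during the transition.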
API Gateway dual-stack support helps manage IPv4 address scarcity and costs, comply with government and industry mandates, and prepare for the future of networking. The dualstack implementation provides a smooth transition path by supporting both IPv4 and IPv6 clients simultaneously.
To get started with API Gateway dual-stack support, visit the Amazon API Gateway documentation. You can configure dualstack for new APIs or update existing APIs with minimal configuration changes.
Special thanks to Ellie Frank (elliesf), Anjali Gola (anjaligl), and Pranika Kakkar (pranika) for providing resources, answering questions, and offering valuable feedback during the writing process. This blog post was made possible through the collaborative support of the service and product management teams.
On May 30th, 2025, the AWS Cloud Development Kit (CDK) will no longer support Node.js 14.x and 16.x, which reached end of life on 4/30/2023 (14.x) and 9/11/2023 (16.x). This change applies to all AWS CDK components that depend on Node.js, including the AWS CDK CLI, the Construct Library, and broader CDK ecosystem projects such as JSII, Projen, and CDK8s.
We encourage you to upgrade to a Node.js Active Long Term Support (LTS) version, which is Node.js 22.x as of March 11th, 2025. Given that Node.js 14.x and 16.x are past end of life, we recommend migrating your CDK projects to newer Node.js LTS versions as soon as possible.
Why are we doing this?
Node.js 14.x and 16.x are past their end of life and are no longer supported by the Node.js community. This means that these versions no longer receive bug fixes or security updates. To make sure that we are providing up-to-date and secure libraries, we will drop support for these versions.
What’s changing?
After May 30th, 2025, the AWS CDK will no longer support Node.js 14.x and 16.x. While your existing deployments may continue to work, we will not address issues specific to these versions. Any bug reports or support cases that stem from using Node.js 14.x or 16.x will require reproducing the issue on a supported version of Node.js (18.x, 20.x, 22.x – as of February 26th, 2025) before further assistance can be provided.
Key points
New features for the AWS CDK may rely on APIs or functionalities only available in supported versions of Node.js.
Critical security patches and fixes related to Node.js 14.x or 16.x will not be backported.
Compatibility testing will no longer be performed for Node.js 14.x and 16.x, making it difficult to guarantee the CDK’s behavior with those runtimes.
Timeline
March 11, 2025 through May 30, 2025
We will continue to address issues that arise for Node.js 14.x and 16.x during this period.
May 30, 2025
The AWS CDK officially drops support for Node.js 14.x and 16.x.
Any bug fixes or security patches will target only supported versions of Node.js (18.x, 20.x, 22.x – as of March 11th, 2025).
Version validation and update steps
Check your current Node.js version
Run node -v in your environment or CI/CD pipelines to see which version of Node.js you’re currently using.
Update your environment
Install or switch to a supported version of Node.js using a version manager (e.g., nvm) or by downloading an official installer from nodejs.org.
Validate your AWS CDK projects
Ensure your deployment scripts and any third-party dependencies work correctly under a supported version. Test thoroughly in non-production environments.
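If you want to automate the version check in a build script, something along these lines could work. This is only a sketch: the function names are hypothetical, and the list of supported major versions is copied from this post (18.x, 20.x, 22.x), so it will need updating as the support window moves.

```typescript
// Supported major versions as listed in this post; update as support changes.
const SUPPORTED_MAJORS: number[] = [18, 20, 22];

// Extract the major version from a string such as "v22.3.0" (the format printed by `node -v`).
function parseMajor(version: string): number {
  const match = /^v?(\d+)\./.exec(version.trim());
  if (!match) {
    throw new Error(`Unrecognized Node.js version: ${version}`);
  }
  return Number(match[1]);
}

// True when the version's major number is in the supported list.
function isSupportedNode(version: string): boolean {
  const major = parseMajor(version);
  return SUPPORTED_MAJORS.some((supported) => supported === major);
}

// Example inputs; in a real script you would pass in the output of `node -v`.
for (const version of ["v16.20.2", "v22.3.0"]) {
  const verdict = isSupportedNode(version) ? "supported" : "upgrade needed";
  console.log(`${version}: ${verdict}`);
}
```

Failing the build early on an unsupported runtime is usually easier to diagnose than an unexpected error partway through a deployment.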
Looking ahead
For more information on our deprecation strategy moving forward, please see this RFC, which provides more details.
Conclusion
This deprecation is part of our ongoing commitment to provide a secure, high-quality experience for AWS CDK users. By moving to a Node.js Active Long Term Support (LTS) version, you’ll benefit from improved performance, ongoing security patches, and continued AWS CDK innovations. If you have any questions or concerns about this deprecation, please reach out and open an issue in our GitHub repo.
The AWS Cloud Development Kit (CDK) is an open source framework that enables developers to define cloud infrastructure using a familiar programming language. Additionally, CDK provides higher level abstractions (Constructs), which reduce the complexity required to define and integrate AWS services together when building on AWS. CDK also provides core functionality like CDK Assets, which gives users the ability to bundle application assets into their CDK applications. These assets can be local files (main.py), directories (python_app/), or Docker images (Dockerfile). CDK Assets are stored in an Amazon Simple Storage Service (Amazon S3) Bucket or Amazon Elastic Container Registry (Amazon ECR) Repository that is created during CDK bootstrapping.
CDK developers who leverage assets at scale may notice over time that the bootstrapped bucket or repository accumulates old or unused data. Until now, CDK didn’t provide a clear way for users who wanted to clean up this data themselves to determine which data is safe to delete. To solve this problem, we are excited to announce the preview launch of CDK Garbage Collection, a new feature of the CDK that automatically deletes old assets in your bootstrapped Amazon S3 Bucket and Amazon ECR Repository, saving users time and money. This feature is available starting in AWS CDK version 2.165.0.
We expect CDK Garbage Collection to help AWS CDK customers save on storage costs associated with using the product while not affecting how customers use CDK.
Quickstart
CDK Garbage Collection is exposed as a CDK CLI command named gc. To use CDK Garbage Collection in its default configuration, run the following command on a terminal in your CDK application.
cdk gc --unstable=gc
The --unstable flag is meant to acknowledge that CDK Garbage Collection is in preview mode. This indicates that the scope and API of the feature might still change, but otherwise the feature is generally production ready and fully supported.
Walkthrough
CDK Garbage Collection works at the environment level, so it will attempt to delete isolated assets in the AWS account / region that you call it in. For the purposes of this walkthrough, you will be re-bootstrapping the environment with a custom qualifier so that you do not delete isolated assets before you are ready.
You now have a new bootstrap template under the name CDKToolkitDemo and bootstrap resources associated with it. Next, set up a CDK application with both Amazon S3 and Amazon ECR assets:
mkdir garbage-collection-demo && cd garbage-collection-demo
cdk init -l typescript app
Your next step is to replace the existing code in lib/garbage-collection-demo-stack.ts with the following CDK Stack:
import * as path from 'path';
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';

export class GarbageCollectionDemoStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Lambda function whose code is uploaded as an Amazon S3 asset
    const fn1 = new lambda.Function(this, 'my-function-s3', {
      code: lambda.Code.fromAsset(path.join(__dirname, '..', 'lambda')),
      runtime: lambda.Runtime.NODEJS_LATEST,
      handler: 'index.handler',
    });

    // Lambda function whose code is built and pushed as an Amazon ECR image asset
    const fn2 = new lambda.Function(this, 'my-function-ecr', {
      code: lambda.Code.fromAssetImage(path.join(__dirname, '..', 'docker')),
      runtime: lambda.Runtime.FROM_IMAGE,
      handler: lambda.Handler.FROM_IMAGE,
    });
  }
}
This creates two AWS Lambda functions, one that uses an Amazon S3 asset as its source code and one that uses an Amazon ECR image as its source code. You need to add the referenced assets to your CDK application. In lambda/index.js add a simple Lambda function:
At this point you can check to make sure that assets have been correctly added into the bootstrapped Amazon S3 bucket and Amazon ECR repository:
Two objects exist in the bootstrapped Amazon S3 Bucket after the initial AWS CDK Deploy.
One image exists in the bootstrapped Amazon ECR Repository after the initial AWS CDK Deploy.
The output shows that you have the data you expect in both bootstrapped resources. The Amazon S3 Bucket also stores the JSON file of the AWS CloudFormation template that was generated when you ran cdk deploy.
You can now simulate a typical CDK development cycle by updating both assets. Add a small change to the Amazon S3 asset that lives in lambda/index.js:
FROM public.ecr.aws/docker/library/alpine:latest
CMD echo 'Hello World'
You can now run cdk deploy again, and both assets should be re-uploaded under a new hash.
Four objects exist in the bootstrapped Amazon S3 Bucket after the second AWS CDK Deploy.
Two images exist in the bootstrapped Amazon ECR Repository after the second AWS CDK Deploy.
This output confirms that everything is as expected and the new assets have been added. Because you are using new bootstrapped resources, you can still tell which resources are currently isolated and which are not. Right now, only the zipfile prefixed with 50f409b9 is referenced in AWS CloudFormation, and in Amazon ECR, only the image prefixed a5801b5b is referenced. That means that every other asset (3 objects in Amazon S3 and 1 image in Amazon ECR) is isolated and can be deleted.
One item to note is the additional files in Amazon S3 that are not your local assets — these are AWS CloudFormation templates that are uploaded to Amazon S3 as an intermediary step before being sent to AWS CloudFormation. They are not needed after being copied over and are a perfect candidate for deletion via CDK Garbage Collection.
Here is where CDK Garbage Collection comes in. With the right parameters, you are able to clean up the isolated objects while not disturbing the assets that are actively in use.
Because you want to delete assets immediately, and not tag them for deletion later, set rollback-buffer-days to 0. You also want to delete assets that were just created, so be sure to set created-buffer-days to 0 as well. The default for created-buffer-days is 1.
⏳ Garbage Collecting environment aws://912331974472/us-east-1...
Found 3 objects to delete based off of the following criteria:
- objects have been isolated for > 0 days
- objects were created > 0 days ago
Delete this batch (yes/no/delete-all)?
CDK Garbage Collection found three assets to be deleted from Amazon S3, which is to be expected. It prompts you to verify that you want to delete, which you do, so enter yes. You will then get this response:
Found 1 image to delete based off of the following criteria:
- images have been isolated for > 0 days
- images were created > 0 days ago
Delete this batch (yes/no/delete-all)?
Once again, this is to be expected for Amazon ECR, so you enter yes again. You then get the response:
At this point, CDK Garbage Collection is finished.
Details
CDK Garbage Collection exposes some parameters to help you customize the experience to your specific scenario. These options help you determine how aggressive you want your garbage collection to be.
rollback-buffer-days: the number of days an asset has to be marked as isolated before it is eligible for deletion.
created-buffer-days: the number of days an asset must exist before it is eligible for deletion.
Rollback Buffer Days should be considered when you are not using cdk deploy and instead use a deployment method that operates on templates only, like a pipeline. If your pipeline can roll back without any involvement of the CDK CLI, this parameter will help ensure that assets are not prematurely deleted. When used, instead of deleting unused objects, cdk gc tags them with the current date. Subsequent runs of cdk gc will check this tag and delete the asset only after it has been tagged for longer than the specified buffer days.
Created Buffer Days should be considered if you want to be extra safe about assets that have been recently uploaded. When used, cdk gc filters out any assets that have not persisted that number of days. Note that this may not include assets that have been shared across multiple CDK apps. CDK reuses assets that are identical, and it’s possible that a recent deploy of a CDK app references an asset that was uploaded earlier.
For example, if you want to ensure that only assets that are over a month old and have been isolated for a week are deleted, you can specify:
cdk gc --unstable=gc --rollback-buffer-days 7 --created-buffer-days 30
Decision flow diagram of an asset as it gets audited for garbage collection.
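To make the interaction between the two buffers more concrete, here is a simplified model of the decision an asset goes through, written as a small TypeScript function. It is a sketch based only on the behavior described in this section, not the actual cdk gc implementation. The type names, the field names, and the extra "wait" outcome (tagged, but not yet past the rollback buffer) are assumptions made for illustration.

```typescript
type Decision = "keep" | "tag" | "wait" | "delete";

interface Asset {
  referenced: boolean;   // still referenced by a deployed stack template
  createdAt: Date;       // when the asset was uploaded
  isolatedSince?: Date;  // date tag written by an earlier run, if any
}

interface GcOptions {
  rollbackBufferDays: number;
  createdBufferDays: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function daysBetween(from: Date, to: Date): number {
  return (to.getTime() - from.getTime()) / MS_PER_DAY;
}

// Simplified decision flow for a single asset (assumed model, see lead-in).
function decide(asset: Asset, opts: GcOptions, now: Date): Decision {
  // Assets that are in use are never candidates.
  if (asset.referenced) return "keep";
  // Recently created assets are filtered out by created-buffer-days.
  if (daysBetween(asset.createdAt, now) < opts.createdBufferDays) return "keep";
  // With no rollback buffer, isolated assets are deleted right away.
  if (opts.rollbackBufferDays === 0) return "delete";
  // First sighting of an isolated asset: tag it with the current date.
  if (!asset.isolatedSince) return "tag";
  // Delete only after the tag is older than the rollback buffer.
  return daysBetween(asset.isolatedSince, now) > opts.rollbackBufferDays
    ? "delete"
    : "wait";
}

const now = new Date("2025-03-01T00:00:00Z");
const monthAndWeek: GcOptions = { rollbackBufferDays: 7, createdBufferDays: 30 };
const oldAsset: Asset = {
  referenced: false,
  createdAt: new Date("2025-01-01T00:00:00Z"),
  isolatedSince: new Date("2025-02-15T00:00:00Z"),
};
console.log(decide(oldAsset, monthAndWeek, now));
```

In this model, setting both buffers to 0 collapses the flow to a single question: is the asset still referenced by a stack template or not.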
Limitations of CDK Garbage Collection
During CDK Garbage Collection, we collect all stack templates to see what assets are in use. If garbage collection runs between the asset upload and stack deployment, there is a chance that it does not pick up the latest stack deployment, but it does pick up the latest asset. In this scenario, CDK Garbage Collection may delete those assets.
We recommend not deploying stacks while running CDK Garbage Collection. If that is unavoidable, setting --created-buffer-days will help as garbage collection will avoid deleting assets that are recently created. Finally, if you do experience a failed deployment, the mitigation is to redeploy, as the asset upload step will be able to re-upload the missing asset. In practice, this race condition is only for a specific edge case and unlikely to happen. However, we are working on a new method of storing CDK Assets to reduce the risk of this race condition. That work is being tracked in this issue.
Conclusion
CDK Garbage Collection helps users manage the lifecycle of unused CDK Assets in their AWS account. As users continue to scale with the CDK, tools like CDK Garbage Collection will play a crucial role in maintaining clean, efficient, and cost-effective cloud environments. We encourage CDK users to explore this feature, provide feedback, and incorporate it into their workflows to optimize their AWS resource management.
AWS CloudFormation enables you to model and provision your cloud application infrastructure as code-based templates. Whether you prefer writing templates directly in JSON or YAML, or using programming languages like Python, Java, and TypeScript with the AWS Cloud Development Kit (CDK), CloudFormation and CDK provide the flexibility you need. For organizations adopting multi-account strategies, CloudFormation StackSets offers a powerful capability to deploy resources across multiple regions and accounts in parallel.
Last year, we delivered a broad set of enhancements that accelerated the development cycle, simplified troubleshooting, and introduced new deployment safety and configuration governance capabilities. Let’s dive into the key launches that shaped CloudFormation in 2024.
Development cycle improvements
Deploy stacks up to 40% faster with optimistic stabilization and configuration complete
In March, we introduced optimistic stabilization with the new CONFIGURATION_COMPLETE event, delivering up to 40% faster stack creation times. This new event signals that CloudFormation has created the resource and applied the configuration as defined in the stack template, allowing us to begin parallel creation of dependent resources. For example, if your stack contains resource B that depends on resource A, CloudFormation will now start provisioning resource B when resource A reaches the CONFIGURATION_COMPLETE state, rather than waiting for full stabilization. Read How we sped up AWS CloudFormation deployments with optimistic stabilization to learn more.
Figure 1: CloudFormation’s old and new deployment strategy
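To see why starting dependent resources earlier shortens a deployment, consider a toy model of a stack with resource B depending on resource A. The sketch below is only an illustration with made-up durations. It assumes each resource has a configuration phase followed by a stabilization phase, and that the stack finishes once both resources have stabilized. It is not a description of CloudFormation internals.

```typescript
// Hypothetical timings for one resource, in seconds.
interface ResourceTiming {
  configureSec: number; // time until the configuration is applied
  stabilizeSec: number; // extra time until the resource is fully stable
}

// Old strategy: B starts only after A has fully stabilized.
function sequentialFinish(a: ResourceTiming, b: ResourceTiming): number {
  return a.configureSec + a.stabilizeSec + b.configureSec + b.stabilizeSec;
}

// Optimistic strategy: B starts as soon as A reaches CONFIGURATION_COMPLETE.
// The stack is done when both A and B have finished stabilizing.
function optimisticFinish(a: ResourceTiming, b: ResourceTiming): number {
  const aStable = a.configureSec + a.stabilizeSec;
  const bStable = a.configureSec + b.configureSec + b.stabilizeSec;
  return Math.max(aStable, bStable);
}

// Made-up durations, chosen only to make the arithmetic easy to follow.
const resourceA: ResourceTiming = { configureSec: 60, stabilizeSec: 120 };
const resourceB: ResourceTiming = { configureSec: 30, stabilizeSec: 30 };

console.log(sequentialFinish(resourceA, resourceB)); // 60 + 120 + 30 + 30 = 240
console.log(optimisticFinish(resourceA, resourceB)); // max(180, 120) = 180
```

The gain in this model comes entirely from overlapping A's stabilization with B's provisioning, so the actual improvement for a given stack depends on how long its resources take to stabilize.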
Catch template errors before deployment with early validation
In March, we launched early resource properties validation checks. This feature validates your stack operation upfront for invalid resource property errors, helping you fail fast and minimize the steps required for a successful deployment. Previously, you had to wait until CloudFormation attempted to provision a resource before discovering property-related errors. Now, we validate your template before deploying the first resource and provide clear error messages upfront.
Figure 2: CloudFormation’s early template properties validation feature
Safely clean up failed stacks with enhanced deletion controls
In May, we enhanced the DeleteStack API with a new DeletionMode parameter, allowing you to safely delete stacks that are in DELETE_FAILED state. By passing the FORCE_DELETE_STACK value to this parameter, you can now resolve stuck stacks more efficiently during your development and testing cycles.
Accelerate feedback loops with CloudFormation custom resource timeout controls
In June, we introduced the ServiceTimeout property for custom resources. This new capability allows you to set custom timeout values for your custom resource logic execution. Previously, custom resources had a fixed one-hour timeout, which could lead to long wait times when debugging custom resource logic. Now, you can set appropriate timeout values to accelerate your development feedback loops. Refer to the custom resources documentation to learn more about the ServiceTimeout property.
Figure 3: CloudFormation’s ServiceTimeout property for Custom resource
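As a small sketch of how this might look in a template, the helper below builds a custom resource definition as a plain TypeScript object with a ServiceTimeout value. The helper name, the Custom::Example type, and the Lambda ARN are hypothetical. The sketch also assumes the timeout is expressed in seconds, passed as a string, and capped at the previous one-hour value, so check the custom resources documentation for the exact constraints.

```typescript
interface CustomResourceDefinition {
  Type: string;
  Properties: {
    ServiceToken: string;
    ServiceTimeout: string;
  };
}

// Assumed upper bound: the former fixed timeout of one hour, in seconds.
const MAX_TIMEOUT_SEC = 3600;

// Hypothetical helper that builds a custom resource with an explicit timeout.
function customResource(
  serviceToken: string,
  timeoutSec: number
): CustomResourceDefinition {
  if (!Number.isInteger(timeoutSec) || timeoutSec < 1 || timeoutSec > MAX_TIMEOUT_SEC) {
    throw new RangeError(`timeoutSec must be an integer from 1 to ${MAX_TIMEOUT_SEC}`);
  }
  return {
    Type: "Custom::Example", // placeholder custom resource type
    Properties: {
      ServiceToken: serviceToken,
      ServiceTimeout: String(timeoutSec), // assumed to be seconds, as a string
    },
  };
}

// Placeholder ARN; a two-minute timeout keeps debugging loops short.
const resource = customResource(
  "arn:aws:lambda:us-east-1:111122223333:function:example-handler",
  120
);
console.log(JSON.stringify({ Resources: { MyCustomResource: resource } }, null, 2));
```

A short timeout like this is most useful while you are iterating on the handler. You can raise it again once the logic is stable.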
Streamlined Troubleshooting Experience
Resolve deployment issues faster with one-click CloudTrail access
In May, we launched integration with AWS CloudTrail in the Events tab of the CloudFormation console. Troubleshooting some failed stack operations can be time-consuming, so we have streamlined this process by providing direct links from stack operation events to relevant CloudTrail events. When you click ‘Detect Root Cause’ in the CloudFormation Console, you’ll now see a pre-configured CloudTrail deep-link to the API events generated by your stack operation, eliminating multiple manual steps from the troubleshooting process.
Figure 4: CloudFormation troubleshooting with CloudTrail integration
Visualize your entire deployment process with timeline view
In November, we launched deployment timeline view. It gives you unprecedented visibility into your stack operations. This visual tool shows the sequence of actions CloudFormation takes during a deployment, helping you understand resource dependencies and provisioning duration. You can see which resources are being created in parallel, track their status through color-coding, and quickly identify bottlenecks in your deployments.
Get instant troubleshooting help with Amazon Q Developer
We integrated Amazon Q Developer to provide AI-powered assistance for troubleshooting. When you encounter a failed stack operation, you can now click “Diagnose with Q” to receive a clear, human-readable analysis of the error. Need more help? The “Help me resolve” button provides actionable steps tailored to your specific scenario.
Figure 6: CloudFormation troubleshooting with Q feature
We’ve also improved how change sets handle references. When referenced values are available before deployment, change sets can now resolve them to their expected values, giving you a more accurate preview of your planned changes.
Figure 7: CloudFormation’s change sets feature
Easy onboarding to Infrastructure-as-Code (IaC)
Eliminate weeks of manual effort with IaC Generator
In February, we launched the CloudFormation IaC Generator, a capability addressing one of our customers’ biggest challenges: onboarding existing cloud resources to CloudFormation. This feature makes it easier to generate CloudFormation templates for existing AWS resources. You can now onboard workloads to IaC in minutes instead of spending weeks writing templates manually.
The IaC generator supports over 600 AWS resource types and provides recommendations for related resources. For instance, when you select an S3 bucket, it automatically suggests including associated bucket policies. You can use the generated templates to import resources into CloudFormation or download them for deployment.
Figure 8: CloudFormation’s IaC Generator
In August, we enhanced the IaC Generator with two improvements. First, we added a graphical summary view that helps you quickly find resources after the account scan completes. Second, we integrated with AWS Infrastructure Composer to visualize your application architecture, making it easier to understand resource relationships and configurations.
Figure 9: IaC generator resource scan
Proactive Control Improvements
In November, we launched major enhancements to CloudFormation Hooks, giving you easier ways to author proactive configuration controls and more points to enforce them with your cloud infrastructure provisioning.
CloudFormation Hooks for stack and change set target invocation points
First, we introduced stack and change set target invocation points for CloudFormation Hooks. This extends Hooks beyond individual resource validation, allowing you to run validation checks against entire templates and examine resource relationships. For example, you can now create hooks that validate architectural patterns across multiple resources or enforce team-specific deployment standards. With the change set invocation point, you can automate your change set reviews and reduce the time needed to resolve compliance issues. Refer to the Hooks developer guide to learn more.
Figure 10: CloudFormation’s Hooks for stack and change set target invocation points
Managed hooks for the CloudFormation Guard domain specific language
We introduced managed hooks for authoring configuration controls using the CloudFormation Guard domain-specific language. This simplifies the hook creation process: you can now write hooks by providing your Guard rule set stored as an S3 object. This is particularly valuable if you’re already using Guard for static template validation, as you can extend these rules to dynamic checks before deployments. To learn more about the Guard hook, check out the AWS DevOps Blog or refer to the Guard Hook User Guide.
Figure 11: CloudFormation Hooks’ Guard language feature
Figure 12: CloudFormation Hooks’ Lambda function feature
CloudFormation Hooks for AWS Cloud Control API target invocation points
Lastly, we extended Hooks to support AWS Cloud Control API (CCAPI) resource configurations. This means your existing resource Hooks can now evaluate configurations from CCAPI create and update operations, allowing you to standardize your proactive control evaluation regardless of your IaC tool. If you’re already using pre-built Lambda or Guard hooks, you simply need to specify “Cloud_Control” as a target in your hooks’ configuration to extend their coverage. Learn more about this feature in this AWS DevOps Blog.
Figure 13: CloudFormation Hooks for AWS Cloud Control API target invocation point
Additional Platform Improvements
StackSets ListStackSetAutoDeploymentTargets API
In March, we enhanced StackSets with the ListStackSetAutoDeploymentTargets API. This new capability gives you better visibility into your auto-deployment configurations by allowing you to list existing target Organizational Units (OUs) and AWS Regions for a given stack set. Instead of logging into individual accounts to understand your deployment scope, you can now get this information in a single API call.
CloudFormation Git sync with request review support
In September, we improved CloudFormation Git sync with pull request workflow support. When you create or update a pull request in a linked repository, CloudFormation automatically posts change set information as PR comments. This integration provides a clear overview of proposed changes within your familiar Git workflow, allowing team members to review infrastructure changes alongside code changes. Visit our user guide and launch blog to learn more.
Figure 14: CloudFormation Git sync with request review support feature
Early 2025 improvements
Reshape your AWS CloudFormation stacks seamlessly with stack refactoring
In February 2025, CloudFormation introduced a new capability called stack refactoring that makes it easy to reorganize cloud resources across your CloudFormation stacks. Stack refactoring enables you to move resources from one stack to another, split monolithic stacks into smaller components, and rename the logical name of resources within a stack. This enables you to adapt your stacks to meet architectural patterns, operational needs, or business requirements. To explore an example scenario, read Introducing AWS CloudFormation Stack Refactoring.
Learn more
Here are some resources to help you get started learning and using CloudFormation to manage your cloud infrastructure:
As we are starting 2025, our focus remains on making infrastructure deployment faster, safer, and more manageable. These enhancements reflect our commitment to solving real customer challenges and improving the CloudFormation experience. We are excited about the roadmap ahead and look forward to bringing you more innovations in 2025.
We encourage you to try these new features and share your feedback. For more detailed information about any of these launches, visit our documentation or check out the AWS DevOps Blog.
Today, we’re announcing the release of the new AWS Cloud Development Kit (CDK) L2 construct for AWS Glue. This construct simplifies the correct configuration of Glue jobs, workflows, and triggers. Reviewing Glue documentation and examples of the valid parameters for each job type and language takes time, and having to rely on synth, deploy, and run-time error handling to verify configuration choices can be a frustrating developer experience. With this new construct, developers can leverage constructors that are specific to job type. The new constructors default to opinionated best-practice configuration and leverage convenience functions that reduce the time to build repeatable ETL solutions. The new Glue CDK L2 construct is available in alpha stage and will be rolled into the core CDK library after stabilization.
Background
The AWS CDK is an open-source software development framework to define cloud infrastructure in code using modern programming languages and provision it through AWS CloudFormation. It uses layering through Constructs to provide different levels of abstraction for using cloud components. Layering ensures that you never have to write too much code or have too little access to resource properties when you deploy your infrastructure as code (IaC) stacks. Layer 1 (L1) constructs map directly to CloudFormation primitives, while Layer 2 (L2) constructs provide helper functions and best practice defaults that improve the developer experience and make it easier to do the right thing.
Defining Glue resources at scale presents several challenges that this L2 construct resolves. First, developers must reference documentation to determine the valid combinations of job type, Glue version, worker type, language version, and other parameters, because only certain combinations are valid. Additionally, developers must already know or look up the networking constraints for data source connections, and there is ambiguity about how to securely store secrets for JDBC connections. Finally, developers want prescriptive guidance via best-practice defaults for throughput parameters like number of workers and batching.
The new Glue L2 construct has convenience methods and constructors that work backwards from common use cases and set required parameters to defaults that align with recommended best practices for each job type. It also balances flexibility, via optional parameter overrides, with opinionated interfaces that discourage anti-patterns, which reduces the time needed to develop and deploy new resources.
Using the L2
The L2 construct exposes only the parameters that apply to each job and workflow type through their respective constructors. For instance, Python and Ray jobs don’t need to configure the Scala job parameters of extra jar files or a main class from which to start execution. In the construct, the language-specific and job-specific configuration elements live in their own property interfaces, while the configuration elements that are common across all jobs, like job name, job description, and CloudWatch metrics, stay in the parent job class. The construct also aligns with the same best-practice defaults that the Glue Studio console provides, which gives a consistent experience across the console and the CDK.
Figure 1 – Hierarchy of Glue job type and language support showing the configuration options
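The split between common and language-specific properties can be sketched in plain TypeScript. This is only an illustrative model, not the construct's source: the interface names (CommonJobProps, PySparkEtlProps, ScalaSparkEtlProps), the describeJob helper, and the S3 paths are all hypothetical.

```typescript
// Hypothetical model of the property hierarchy. These names are illustrative
// and are not the types exported by the Glue alpha module.

// Settings shared by every job type live in one base interface.
interface CommonJobProps {
  jobName: string;
  description?: string;
  enableMetrics?: boolean;
}

// Python jobs carry no Scala-specific settings.
interface PySparkEtlProps extends CommonJobProps {
  language: "python";
  script: string;
}

// Scala jobs must name a main class and may ship extra jar files.
interface ScalaSparkEtlProps extends CommonJobProps {
  language: "scala";
  script: string;
  className: string;
  extraJars?: string[];
}

type EtlJobProps = PySparkEtlProps | ScalaSparkEtlProps;

// Summarize a job. The compiler narrows the union on the `language` tag,
// so `className` is only reachable in the Scala branch.
function describeJob(props: EtlJobProps): string {
  if (props.language === "scala") {
    const jars = props.extraJars?.length ?? 0;
    return `${props.jobName}: scala, main class ${props.className}, ${jars} extra jar(s)`;
  }
  return `${props.jobName}: python, script ${props.script}`;
}

const pythonJob: PySparkEtlProps = {
  jobName: "nightly-load",
  language: "python",
  script: "s3://example-bucket/load.py",
};

const scalaJob: ScalaSparkEtlProps = {
  jobName: "nightly-aggregate",
  language: "scala",
  script: "s3://example-bucket/aggregate.scala",
  className: "com.example.Aggregate",
  extraJars: ["s3://example-bucket/lib/helpers.jar"],
};

// Uncommenting the next line is a compile-time error, because `className`
// is not part of the Python job interface:
// const bad: PySparkEtlProps = { jobName: "x", language: "python", script: "s.py", className: "Main" };
```

Because the Scala-only fields exist only on the Scala interface, an editor can flag a misplaced parameter while the code is being typed, well before synth or deploy.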
The new construct automatically sets the Glue job type that maps to the constructor the developer used to create the job, and sets the Glue version and language version to the latest supported version for the service. In addition, it sets defaults for parameters that the developer would otherwise have to experiment with such as timeout, number of workers, and max retries.
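As a rough sketch of how a job-type-specific constructor can fix the job type and fill in defaults, consider the following self-contained TypeScript. Everything here is an assumption made for illustration: the function and type names are hypothetical, and the default values are placeholders, not the values the Glue construct actually applies.

```typescript
// Hypothetical sketch of constructor defaults. The default values below are
// placeholders chosen for illustration, not the construct's real defaults.

type JobType = "etl" | "streaming" | "python-shell";

// Everything a caller may override; all fields are optional.
interface JobOverrides {
  timeoutMinutes?: number;
  numberOfWorkers?: number;
  maxRetries?: number;
}

// The fully resolved configuration; no field is optional any more.
interface ResolvedJobConfig {
  jobType: JobType;
  timeoutMinutes: number;
  numberOfWorkers: number;
  maxRetries: number;
}

const PLACEHOLDER_DEFAULTS = {
  timeoutMinutes: 120,
  numberOfWorkers: 10,
  maxRetries: 0,
};

// Merge caller overrides over the defaults, one field at a time.
function resolveJobConfig(jobType: JobType, overrides: JobOverrides = {}): ResolvedJobConfig {
  return {
    jobType,
    timeoutMinutes: overrides.timeoutMinutes ?? PLACEHOLDER_DEFAULTS.timeoutMinutes,
    numberOfWorkers: overrides.numberOfWorkers ?? PLACEHOLDER_DEFAULTS.numberOfWorkers,
    maxRetries: overrides.maxRetries ?? PLACEHOLDER_DEFAULTS.maxRetries,
  };
}

// One factory per job type fixes the job type, so callers never pass it.
function pySparkEtlJob(overrides?: JobOverrides): ResolvedJobConfig {
  return resolveJobConfig("etl", overrides);
}

// No arguments: every value comes from the defaults.
const defaultJob = pySparkEtlJob();

// One override: only the number of workers changes.
const smallJob = pySparkEtlJob({ numberOfWorkers: 4 });
```

The caller states only what differs from the defaults, which is the same trade-off the paragraph above describes: sensible values out of the box, with optional overrides when a workload needs them.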
These interfaces, inheritance, and default values allow us to create constructors that require only a few parameters to create a complete job, as opposed to the nearly two dozen options, and the even more numerous valid and invalid combinations of them, that a developer could end up experimenting with in the L1 construct. Enforcing values via interfaces also means that the developer gets fail-fast feedback on the allowed configuration before synth or deploy time, through autocomplete and Q Developer code recommendations.
While the construct is in the alpha stage, you’ll need to follow the process for using experimental construct libraries. After stabilization, the library will be rolled into the core CDK library and you can use it just like any other L1 or L2 construct.
Creating a new Python Spark ETL Glue Job in TypeScript
The following example shows how to create a new Python Spark ETL Glue Job in TypeScript.
The new construct also simplifies the way workflows and triggers are provisioned, leveraging the existing Schedule class to define the correct frequency of execution. It also provides helper functions to add different types of triggers.
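The following example shows how to create an on-demand workflow trigger.
    import * as glue from '@aws-cdk/aws-glue-alpha';

    // glue_job is assumed to be a Glue job defined earlier in the same stack.
    const myWorkflow = new glue.Workflow(this, 'GlueWorkflow', {
      name: 'MyOnDemandWorkflow',
      description: 'New On Demand Workflow',
    });

    // Add a trigger that starts the job when the workflow is run on demand.
    myWorkflow.addOnDemandTrigger(this, 'TriggerJobOnDemand', {
      actions: [{ glue_job }],
    });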
The new construct retains its existing functionality for Connections, Databases, Tables, and Job Run Queueing, since these are consistent across all job types. It also enables CloudWatch logging and (if applicable) SparkUI logging by default, so using this construct gives you those best-practice observability features unless you explicitly turn them off.
If you’re currently using an older version of Glue or of the language your Glue job supports, we recommend that you consider using this construct launch as part of an upgrade plan to take advantage of the performance, functionality, and language security enhancements for newer versions. If you prefer to stay on older versions of the service or language, we recommend you migrate to the L1 construct which isn’t opinionated about enforcing the latest versions by default.
Conclusion
The AWS CDK Glue L2 construct will migrate from its current alpha state to the AWS CDK core library after it completes the stabilization phase, which usually takes about three months. For more details on the new Glue L2 construct and examples of its use, see the Glue CDK documentation. As always, if you have any feedback on the new construct or the CDK in general, you can create an issue in the AWS CDK GitHub repository.
If you’re new to AWS CDK and want to get started, we highly recommend checking out the CDK documentation and the CDK workshop.