Tag Archives: AWS Cloud Development Kit

Node.js 22 runtime now available in AWS Lambda

2024-11-22 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/node-js-22-runtime-now-available-in-aws-lambda/

This post is written by Julian Wood, Principal Developer Advocate, and Andrea Amorosi, Senior SA Engineer.

You can now develop AWS Lambda functions using the Node.js 22 runtime, which is in active LTS status and ready for production use. Node.js 22 includes a number of additions to the language, including require()ing ES modules, as well as changes to the runtime implementation and the standard library. With this release, Node.js developers can take advantage of these new features and enhancements when creating serverless applications on Lambda.

You can develop Node.js 22 Lambda functions using the AWS Management Console, AWS Command Line Interface (AWS CLI), AWS SDK for JavaScript, AWS Serverless Application Model (AWS SAM), AWS Cloud Development Kit (AWS CDK), and other infrastructure as code tools.

To use this new version, specify a runtime parameter value of nodejs22.x when creating or updating functions or by using the appropriate container base image.

You can use Node.js 22 with Powertools for AWS Lambda (TypeScript), a developer toolkit to implement serverless best practices and increase developer velocity. Powertools for AWS Lambda includes libraries to support common tasks such as observability, AWS Systems Manager Parameter Store integration, idempotency, batch processing, and more. You can also use Node.js 22 with Lambda@Edge to customize low-latency content delivered through Amazon CloudFront.

This blog post highlights important changes to the Node.js runtime, notable Node.js language updates, and how you can use the new Node.js 22 runtime in your serverless applications.

Node.js 22 language updates

Node.js 22 introduces several language updates and features that enhance developer productivity and improve application performance.

This release adds support for loading ECMAScript modules (ESM) using require(). You can enable this feature using the --experimental-require-module flag by configuring the NODE_OPTIONS environment variable. require() support for synchronous ESM graphs bridges the gap between CommonJS and ESM, providing more flexibility in module loading. It is important to note that this feature is currently experimental and may change in future releases.

WebSocket support which was previously available behind the --experimental-websocket flag is now enabled by default in Node.js 22. This brings a browser-compatible WebSocket client implementation to Node.js with no need for external dependencies. Native support simplifies building real-time applications and enhances the overall WebSocket experience in Node.js environments.

The new runtime also includes performance improvements to AbortSignal creation. This makes network operations faster and more efficient for the Fetch API and test runner. The Fetch API is also now considered stable in Node.js 22.

For TypeScript users, Node.js 22 introduces experimental support for transforming TypeScript-only syntax into JavaScript code. By using the --experimental-transform-types flag, you can enable this feature to support TypeScript syntax such as Enum and namespace directly. While you can enable the feature in Lambda, your function entrypoint (i.e. index.mjs or app.cjs) cannot currently be written using TypeScript as the runtime expects a file with a JavaScript extension. You can use TypeScript for any other module imported within your codebase.

For a detailed overview of Node.js 22 language features, see the Node.js 22 release blog post and the Node.js 22 changelog.

Experimental features that are unavailable

Node.js 22 includes an experimental feature to detect the module syntax automatically (CommonJS or ES Modules). This feature must be enabled when the Node.js runtime is compiled. Since the Lambda-provided Node.js 22 runtime is intended for production workloads, this experimental feature is not enabled in the Lambda build and cannot be enabled via an execution-time flag. To use this feature in Lambda, you need to deploy your own Node.js runtime using a custom runtime or container image with experimental module syntax detection enabled.

Performance considerations

At launch, new Lambda runtimes receive less usage than existing established runtimes. This can result in longer cold start times due to reduced cache residency within internal Lambda sub-systems. Cold start times typically improve in the weeks following launch as usage increases. As a result, AWS recommends not drawing conclusions from side-by-side performance comparisons with other Lambda runtimes until the performance has stabilized. Since performance is highly dependent on workload, customers with performance-sensitive workloads should conduct their own testing, instead of relying on generic test benchmarks.

Builders should continue to measure and test function performance and optimize function code and configuration for any impact. To learn more about how to optimize Node.js performance in Lambda, see Performance optimization in the Lambda Operator Guide, and our blog post Optimizing Node.js dependencies in AWS Lambda.

Migration from earlier Node.js runtimes

AWS SDK for JavaScript

Up until Node.js 16, Lambda’s Node.js runtimes included the AWS SDK for JavaScript version 2. This has since been superseded by the AWS SDK for JavaScript version 3, which was released in December 2022. Starting with Node.js 18, and continuing with Node.js 22, the Lambda Node.js runtimes include version 3. When upgrading from Node.js 16 or earlier runtimes and using the included version 2, you must upgrade your code to use the v3 SDK.

For optimal performance, and to have full control over your code dependencies, we recommend bundling and minifying the AWS SDK in your deployment package, rather than using the SDK included in the runtime. For more information, see Optimizing Node.js dependencies in AWS Lambda.

Amazon Linux 2023

The Node.js 22 runtime is based on the provided.al2023 runtime, which is based on the Amazon Linux 2023 minimal container image. The Amazon Linux 2023 minimal image uses microdnf as a package manager, symlinked as dnf. This replaces the yum package manager used in Node.js 18 and earlier AL2-based images. If you deploy your Lambda function as a container image, you must update your Dockerfile to use dnf instead of yum when upgrading to the Node.js 22 base image from Node.js 18 or earlier.

Additionally AL2 includes curl and gnupg2 as their minimal versions curl-minimal and gnupg2-minimal.

Learn more about the provided.al2023 runtime in the blog post Introducing the Amazon Linux 2023 runtime for AWS Lambda and the Amazon Linux 2023 launch blog post.

Using the Node.js 22 runtime in AWS Lambda

AWS Management Console

To use the Node.js 22 runtime to develop your Lambda functions, specify a runtime parameter value Node.js 22.x when creating or updating a function. The Node.js 22 runtime version is now available in the Runtime dropdown on the Create function page in the AWS Lambda console:

Creating Node.js function in AWS Management Console

To update an existing Lambda function to Node.js 22, navigate to the function in the Lambda console, then choose Node.js 22.x in the Runtime settings panel. The new version of Node.js is available in the Runtime dropdown:

Changing a function to Node.js 22

AWS Lambda container image

Change the Node.js base image version by modifying the FROM statement in your Dockerfile.

FROM public.ecr.aws/lambda/nodejs:22
# Copy function code
COPY lambda_handler.xx ${LAMBDA_TASK_ROOT}

AWS Serverless Application Model (AWS SAM)

In AWS SAM, set the Runtime attribute to node22.x to use this version:

AWSTemplateFormatVersion: "2210-09-09"
Transform: AWS::Serverless-2216-10-31

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: lambda_function.lambda_handler
      Runtime: nodejs22.x
      CodeUri: my_function/.
      Description: My Node.js Lambda Function

When you add function code directly in an AWS SAM or AWS CloudFormation template as an inline function, it is seen as common.js.

AWS SAM supports generating this template with Node.js 22 for new serverless applications using the sam init command. Refer to the AWS SAM documentation.

AWS Cloud Development Kit (AWS CDK)

In AWS CDK, set the runtime attribute to Runtime.NODEJS_22_X to use this version.

import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as path from "path";
import { Construct } from "constructs";

export class CdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The code that defines your stack goes here

    // The Node.js 22 enabled Lambda Function
    const lambdaFunction = new lambda.Function(this, "node22LambdaFunction", {
      runtime: lambda.Runtime.NODEJS_22_X,
      code: lambda.Code.fromAsset(path.join(__dirname, "/../lambda")),
      handler: "index.handler",
    });
  }
}

Conclusion

Lambda now supports Node.js 22 as a managed language runtime. This release uses the Amazon Linux 2023 OS as well as other improvements detailed in this blog post.

You can build and deploy functions using Node.js 22 using the AWS Management Console, AWS CLI, AWS SDK, AWS SAM, AWS CDK, or your choice of infrastructure as code tool. You can also use the Node.js 22 container base image if you prefer to build and deploy your functions using container images.

The Node.js 22 runtime helps developers build more efficient, powerful, and scalable serverless applications. Read about the Node.js programming model in the Lambda documentation to learn more about writing functions in Node.js 22. Try the Node.js runtime in Lambda today.

For more serverless learning resources, visit Serverless Land.

Python 3.13 runtime now available in AWS Lambda

2024-11-14 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/python-3-13-runtime-now-available-in-aws-lambda/

This post is written by Julian Wood, Principal Developer Advocate, and Leandro Cavalcante Damascena, Senior Solutions Architect Engineer.

AWS Lambda now supports Python 3.13 as both a managed runtime and container base image. Python is a popular language for building serverless applications. The Python 3.13 release includes a number of changes to the language, the implementation, and the standard library. With this release, Python developers can now take advantage of these new features and enhancements when creating serverless applications on Lambda. Python 3.13 also includes experimental support for a number of features, which are not available in Lambda.

You can develop Lambda functions in Python 3.13 using the AWS Management Console, AWS Command Line Interface (AWS CLI), AWS SDK for Python (Boto3), AWS Serverless Application Model (AWS SAM), AWS Cloud Development Kit (AWS CDK), and other infrastructure as code tools.

The Python 3.13 runtime allows you to implement serverless best practices using Powertools for AWS Lambda (Python). This is a developer toolkit that includes observability, batch processing, AWS Systems Manager Parameter Store integration, idempotency, feature flags, Amazon CloudWatch Metrics, structured logging, and more.

Lambda@Edge allows you to use Python 3.13 to customize low-latency content delivered through Amazon CloudFront.

Lambda runtime changes

Amazon Linux 2023

As with the Python 3.12 runtime, the Python 3.13 runtime is based on the provided.al2023 runtime, which is based on the Amazon Linux 2023 minimal container image. The Amazon Linux 2023 minimal image uses microdnf as a package manager, symlinked as dnf. This replaces the yum package manager used in Python 3.11 and earlier AL2-based images. If you deploy your Lambda functions as container images, you must update your Dockerfiles to use dnf instead of yum when upgrading to the Python 3.13 base image from Python 3.11 or earlier base images.

Learn more about the provided.al2023 runtime in the blog post Introducing the Amazon Linux 2023 runtime for AWS Lambda and the Amazon Linux 2023 launch blog post.

New Python features

Data model improvements

There are improvements to the Python data model. __static_attributes__ stores the names of attributes accessed through self.X in any function in a class body.

Typing changes

With the implementation of PEP 702, you can now use the new warnings.deprecated() decorator to mark deprecations in the type system and at runtime.

Python 3.13 also adds PEP 696, which introduces default values for type parameters. This enhancement allows developers to specify default types for TypeVar, ParamSpec, and TypeVarTuple when omitting type arguments.

Standard library

The standard library includes improvements for a new PythonFinalizationError exception, raised when an operation is blocked during finalization.

The new functions base64.z85encode() and base64.z85decode() support encoding and decoding Z85 data.

The copy module now has a copy.replace() function, with support for many built-in types and any class defining the __replace__() method.

The os module has a suite of new functions for working with Linux’s timer notification file descriptors.

There is a change to the defined mutation semantics for locals().

Experimental features that are unavailable

Python 3.13 includes a number of experimental features which are not enabled for the Lambda managed runtime or base images. These features must be enabled when the Python runtime is compiled. Since the Lambda-provided Python 3.13 runtime is intended for production workloads, these features are not enabled in the Lambda build of Python 3.13 and cannot be enabled via an execution-time flag. To use these features in Lambda, you can deploy your own Python runtime using a custom runtime or container image with these features enabled.

Free-threaded CPython

You can not enable the experimental support for running Python in a free-threaded mode, with the global interpreter lock (GIL) disabled.

Just-in-time (JIT) compiler

You can also not enable the experimental JIT compiler within the Lambda managed runtime or base image.

Performance considerations

Using Python 3.13 in Lambda

AWS Management Console

To use the Python 3.13 runtime to develop your Lambda functions, specify a runtime parameter value Python 3.13 when creating or updating a function. The Python 3.13 version is available in the Runtime dropdown in the Create Function page:

Creating Python function in AWS Management Console

To update an existing Lambda function to Python 3.13, navigate to the function in the Lambda console and choose Edit in the Runtime settings panel. The new version of Python is available in the Runtime dropdown.

Changing a function to Python 3.13

You may need to check your code and dependencies for compatibility with Python 3.13, and update as necessary.

AWS Lambda container image

Change the Python base image version by modifying the FROM statement in your Dockerfile

FROM public.ecr.aws/lambda/python:3.13
# Copy function code
COPY lambda_handler.py ${LAMBDA_TASK_ROOT}

AWS Serverless Application Model (AWS SAM)

In AWS SAM set the Runtime attribute to python3.13 to use this version.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Simple Lambda Function
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: My Python Lambda Function
      CodeUri: my_function/
      Handler: lambda_function.lambda_handler
      Runtime: python3.13

AWS SAM supports generating this template with Python 3.13 for new serverless applications using the sam init command. Refer to the AWS SAM documentation.

AWS Cloud Development Kit (AWS CDK)

In AWS CDK, set the runtime attribute to Runtime.PYTHON_3_13 to use this version. In Python CDK:

from constructs import Construct 
from aws_cdk import ( App, Stack, aws_lambda as _lambda )

class SampleLambdaStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)
        
        base_lambda = _lambda.Function(self, 'python313LambdaFunction', 
                                       handler='lambda_handler.handler', 
                                    runtime=_lambda.Runtime.PYTHON_3_13, 
                                 code=_lambda.Code.from_asset('lambda'))

In TypeScript CDK:

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda'
import * as path from 'path';
import { Construct } from 'constructs';

export class SampleLambdaStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The code that defines your stack goes here

    // The python3.13 enabled Lambda Function
    const lambdaFunction = new lambda.Function(this, 'python313LambdaFunction', {
      runtime: lambda.Runtime.PYTHON_3_13,
      memorySize: 512,
      code: lambda.Code.fromAsset(path.join(__dirname, '/../lambda')),
      handler: 'lambda_handler.handler'
    })
  }
}

Conclusion

Lambda now supports Python 3.13 as a managed language runtime. This release uses the Amazon Linux 2023 OS and includes Python 3.13 language additions including data model improvements, typing changes, and updates to the standard library. This release does not support the experimental option to disable the global interpreter lock or the experimental JIT compiler.

You can build and deploy functions using Python 3.13 using the AWS Management Console, AWS CLI, AWS SDK, AWS SAM, AWS CDK, or your choice of infrastructure as code tool. You can also use the Python 3.13 container base image if you prefer to build and deploy your functions using container images.

Python 3.13 runtime support helps developers to build more efficient, powerful, and scalable serverless applications. Try the Python 3.13 runtime in Lambda today and experience the benefits of this updated language version.

For more serverless learning resources, visit Serverless Land.

A new AWS CDK L2 construct for Amazon CloudFront Origin Access Control (OAC)

2024-10-31 Josh DeMuth

Post Syndicated from Josh DeMuth original https://aws.amazon.com/blogs/devops/a-new-aws-cdk-l2-construct-for-amazon-cloudfront-origin-access-control-oac/

Recently, we launched a new AWS Cloud Development Kit (CDK) L2 construct for Amazon CloudFront Origin Access Control (OAC). This construct simplifies the configuration and maintenance of securing Amazon Simple Storage Service (Amazon S3) CloudFront origins with CDK. Launched in 2022, OAC is the recommended way to secure your CloudFront distributions due to additional security features compared to the legacy Origin Access Identity (OAI). This new construct makes it easier for you to use the latest origin access best practices to build and manage your CloudFront distributions.

CDK is an open-source software development framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation. A primary part of CDK is the AWS CDK Construct Library which is a collection of pre-written constructs. Constructs are the basic building blocks of CDK applications. They help reduce the complexity required to define and integrate AWS services together.

There are different levels of constructs, starting with Level 1 (L1) which map directly to a single CloudFormation resource and offer no abstraction. L1 constructs are auto-generated, which means you can build any CloudFormation resource using CDK. The power of the CDK starts with Level 2 (L2) and higher constructs. L2 constructs, also known as curated constructs, are developed by the CDK team and provide a higher-level abstraction through an intuitive intent-based API. You can read more about constructs and their benefits in the CDK user guide.

In this post we’ll explore:

The reasoning behind the creation of a new L2 construct for OAC
How to use the new OAC construct
How to migrate from the legacy OAI construct to the new OAC construct

Background

Amazon CloudFront is a global content delivery network that reduces latency by delivering data to viewers anywhere in the world. CloudFront can connect to different types of locations or origins, such as S3, AWS Lambda function URLs, and custom origins. A full list of supported origins can be found in the CloudFront user guide.

At launch, the new L2 construct supports OAC with S3 origins. Using OAC with S3 allows you to keep your S3 bucket private, yet accessible, through CloudFront. This forces users to access content only through CloudFront where other security features can be applied, like AWS WAF.

There are two ways to restrict buckets to only CloudFront, using OAI (legacy) or OAC (recommended). Both OAI and OAC allow you to secure your buckets, but OAC offers additional benefits, including support for:

Any new AWS Regions launched after December 2022
Amazon S3 server-side encryption with AWS Key Management Service (SSE-KMS)
Dynamic requests (PUT and DELETE) to Amazon S3
Enhanced security practices like short term credentials, frequent credential rotations, and resource-based policies

Prior to this release, customers had to piece together L1 constructs as well as use escape hatches in order to implement OAC.

The introduction of the new L2 construct simplifies the process by enhancing abstraction and reducing complexity. It makes it easy to use OAC while still offering the flexibility to customize all existing properties available by building with L1’s.

Let’s see the construct in action!

Using the L2

We modified the existing CloudFront Origins L2 to add OAC support. With this change, the S3Origin class has been deprecated in favor of S3BucketOrigin for standard S3 origins, and S3StaticWebsiteOrigin for static website S3 origins.

Using OAC with a standard S3 origin is as simple as passing your bucket as a parameter to the new S3BucketOrigin class using the withOriginAccessControl method.

In the example below, we define a private bucket and let the new L2 construct handle all the OAC configuration for us.

const s3bucket = new s3.Bucket(this, "myBucket", {
    bucketName: `${cdk.Stack.of(this).stackName.toLowerCase()}-oacbucket`,
    blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
    accessControl: s3.BucketAccessControl.PRIVATE,
    enforceSSL: true,
});

const distribution = new cloudfront.Distribution(this, "myDist", {
    defaultBehavior: {
        origin: origins.S3BucketOrigin.withOriginAccessControl(s3bucket),
    }
});

This code defines a S3 bucket configured to block all public access, enforces SSL, and uses private access control. It also defines a CloudFront distribution with the S3 bucket as its origin using the default OAC settings from the OAC construct. By default, the signing behavior is set to “always” and the signing protocol to “sigv4” as is shown below:

Showing the default signing behavior settings for Amazon CloudFront when using the default properties for the L2 construct

Figure 1 – Default OAC Settings for the S3 Origin

In typical CDK fashion, the L2 provides sane defaults out-of-the-box but allows you to customize. Some examples of customizing with the L2, include:

Changing the signing behavior
Granting permissions for dynamic requests (write/delete access)
Migrating to the new construct but continuing to use OAI

S3 Origin with Customer Managed KMS

It is a recommended security best practice to encrypt S3 objects at rest. As detailed in the S3 user guide, using SSE-KMS gives you additional flexibility to meet encryption-related compliance requirements.

If using SSE-KMS, CloudFront must have permission to at least decrypt objects using your AWS KMS key. With the new changes, simply configure your bucket to use an AWS KMS key and the construct will take care of the permissions updates.

The following example shows how to create an SSE-KMS encrypted S3 bucket and use it as a CloudFront origin with OAC:

Create an AWS KMS Key.
Create the S3 Bucket with the encryptionKey property set to the AWS KMS key and encryption as KMS.
Create the CloudFront distribution and use the new S3BucketOrigin class with OAC.

const kmsKey = new kms.Key(this, "myKey");

const myBucket = new s3.Bucket(this, 'myEncryptedBucket', {
  encryption: s3.BucketEncryption.KMS,
  encryptionKey: kmsKey,
  objectOwnership: s3.ObjectOwnership.BUCKET_OWNER_ENFORCED,
}); 

new cloudfront.Distribution(this, 'myDist', {
  defaultBehavior: { 
    origin: origins.S3BucketOrigin.withOriginAccessControl(myBucket) 
  },
});

Due to circular dependencies between the bucket, the KMS key, and the CloudFront distribution, when we synthesize the above code, a warning message similar to the following may appear:

To avoid a circular dependency between the KMS key, Bucket, and Distribution during the initial deployment, a wildcard is used in the Key policy condition to match all Distribution IDs.

After the initial deployment, it is recommended to further restrict the policy to adhere to security best practices. Here is an example of how to use an escape hatch to update the policy:

Note: update the existing KMS Key policy to include the statements in scopedDownKeyPolicy

const scopedDownKeyPolicy = {
    Version: "2012-10-17",
    Statement: [
        {
            Effect: "Allow",
            Principal: {
                AWS: `arn:aws:iam::${this.account}:root`,
            },
            Action: "kms:*",
            Resource: "*",
        },
        {
            Effect: "Allow",
            Principal: {
                Service: "cloudfront.amazonaws.com",
            },
            Action: ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey*"],
            Resource: "*",
            Condition: {
                StringEquals: {
                    "AWS:SourceArn": `arn:aws:cloudfront::${this.account}:distribution/<Distribution ID>`//replace <Distribution ID> with the ID of the deployed CloudFront Distribution 
                },
            },
        },
    ],
};

const cfnKey = kmsKey.node.defaultChild as kms.CfnKey;
cfnKey.addOverride("Properties.KeyPolicy", scopedDownKeyPolicy);

For detailed instructions on how to update the AWS KMS key policy, please refer to the “Scoping down the key policy” section in the CDK API docs.

Considerations when migrating to the new construct

If you have an existing OAI implementation using the now-deprecated S3Origin class, switching to OAC has potential for application downtime. CloudFront could temporarily lose access to the S3 bucket while the CloudFormation is deploying.

To avoid downtime, it is recommended to perform the upgrade across multiple deployments. This will explicitly give CloudFront permissions to both OAC and OAI in the S3 bucket policy before the migration is performed.

At a high level, the three steps are:

Deployment 1: update the bucket policy to explicitly allow CloudFront access
Deployment 2: switch to the new construct
Deployment 3: optionally remove the code that updated the bucket policy in step 1

For detailed instructions, see the Migrating from OAI to OAC section of the CDK API docs.

Conclusion

In this post, we introduced the new AWS CDK L2 construct for Amazon CloudFront Origin Access Control (OAC), highlighting the advantages of using OAC instead of OAI to secure your Amazon S3 CloudFront origins. We showcased practical implementations of the new construct, focusing on using defaults of the L2 construct along with how to customize for your use case.

To summarize, the new L2 construct and OAC offers these benefits:

Simplified configuration: the L2 construct simplifies the process of configuring OAC in your CDK application to set up secure access controls between CloudFront and S3 buckets
Using SSE-KMS encryption: the L2 construct will automatically add the IAM policy statement to the AWS KMS key to allow access to OAC
Ability to customize: the L2 construct offers properties to override defaults, like changing the signing behavior
Easily migrate from OAI to OAC: the L2 construct offers options to migrate from OAI to OAC based on your application’s downtime tolerance

At launch, the construct supports an Amazon S3 origin. If there is a particular origin that you are looking to see added to the construct library for OAC, please submit a feature request issue in the aws-cdk GitHub repo. For example, if you are interested in Lambda support for OAC, add your feedback to this feature request.

If you’re new to CDK and want to get started, we highly recommend checking out the CDK documentation and the CDK workshop.

Diving Deeper into Projen: Exploring Advanced Features

2024-10-29 Michael Tran

Post Syndicated from Michael Tran original https://aws.amazon.com/blogs/devops/diving-deeper-into-projen-exploring-advanced-features/

We will be highlighting Projen’s powerful features that cater to various aspects of project management and development. We’ll examine how Projen enhances polyglot programming within Amazon Web Services (AWS) Cloud Development Kit constructs. We’ll also touch on its built-in support for common development tools and practices.

In our previous blog, we introduced you to the basics of getting started with Projen. Projen is a powerful project generator that simplifies the management of complex software configurations. In our prior blog, we discussed developing a new AWS cloud development kit (CDK) construct library project. For consistency, we will continue using this construct library project as our example while exploring linting, dependency management, and test coverage. It’s important to note that these practices are equally applicable to CDK applications and other project types.

AWS CDK Polyglot Construct Library

The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework that allows developers to define cloud infrastructure using familiar programming languages. In a CDK application, constructs serve as the foundational elements, allowing developers to represent either a single AWS resource or a complex combination of resources. These constructs are not only reusable but can be incorporated into other AWS CDK projects, promoting efficient and scalable development practices.

Projen and Polyglot Programming

Projen leverages the power of the JSII library, enabling developers to write constructs once and generate equivalent constructs across multiple programming languages. This feature streamlines the development process, especially when working with teams that have expertise in different languages.

Automated Publishing with Projen

With its publisher module, Projen automates the distribution of c
ructs to various package managers. This process can be integrated into a GitHub workflow, such as a build job, which triggers the publication of the library to the designated package managers.

Starting with Projen

Initiating an AWS CDK construct library project is straightforward through the Projen command npx projen new <project_type>. By executing the command npx projen new awscdk-construct, you initialize a new project complete with a projenrc file. This file contains the essential configuration for a CDK construct library, setting the stage for further customization and development.

import { awscdk } from 'projen';
const project = new awscdk.AwsCdkConstructLibrary({
  author: 'github username',
  authorAddress: 'github email',
  cdkVersion: '2.1.0',
  defaultReleaseBranch: 'main',
  jsiiVersion: '~5.0.0',
  name: 'cdkconstruct',
  projenrcTs: true,
  repositoryUrl: 'https://github.com/*****/cdkconstruct.git',

  // deps: [],                /* Runtime dependencies of this module. */
  // description: undefined,  /* The description is just a string that helps people understand the purpose of the package. */
  // devDeps: [],             /* Build dependencies for this module. */
  // packageName: undefined,  /* The "name" in package.json. */
});
project.synth();

A release.yml file is generated by projen under the github>workflow directory. This file has the details of the public registry where the construct needs to be published. By default, it will add the details for npm.

release_npm:
    name: Publish to npm

The construct can be developed in typescript under src/main.ts, our previous blog shows how to create one. If the construct needs to be published to other public registries (such as Maven for java, Pypi for python), then a projenrc file can be updated to synthesize a new release.yml file.

For example, to publish a construct developed in typescript to Maven (so that it can be used in a java application) add publishToMaven API to the projenrc file.

const project = new awscdk.AwsCdkConstructLibrary({
  author: 'github username',
  authorAddress: 'github email',
  cdkVersion: '2.1.0',
  defaultReleaseBranch: 'main',
  jsiiVersion: '~5.0.0',
  name: 'cdkconstruct',
  projenrcTs: true,
  repositoryUrl: 'https://github.com/*****/cdkconstruct.git',
  publishToMaven: {
    javaPackage: 'com.cdk.hello',
    mavenArtifactId: 'cdk-construct-jsii',
    mavenGroupId: 'com.cdk.hello',
    mavenServerId: 'github',
    mavenRepositoryUrl: 'https://maven.pkg.github.com/example/hello-jsii',
  },
});

Run npx projen and the release.yml will be updated with Maven central details.

release_maven:
    name: Publish to Maven Central
    needs: release
    ....

Similarly, it can be published to other registries.

publishToPypi: 
publishToMaven:
publishToNuGet:
publishToGo:

This way the construct is built once and published to multiple registries with different programming languages.

Running Projen build runs a variety of processes.

Figure 1: High-level Architecture showing publication to multiple public registries

Linting, Dependency Management & Test Coverage

Projen streamlines the setup process by generating a comprehensive package.json file. This file includes pre-configured dependencies for ESLint and Jest, enabling developers to maintain coding standards and ensure robust test coverage right from the start. ESLint, a widely adopted static code analysis utility, empowers developers to enforce consistent coding practices by analyzing the source code and identifying potential errors, bugs, and stylistic issues. Additionally, Jest equips developers with a comprehensive suite of tools for writing and executing unit tests, facilitating comprehensive test coverage for their codebase. While Projen provides Jest as the default testing framework, it offers developers the flexibility to incorporate alternative testing frameworks based on their project requirements.

Following with the awscdk-construct from the previous section, under test>main.test.ts a default test file is created, which can be updated for writing test cases. A default package.json is generated in the root directory.

{
  "name": "projen_hello",
  "scripts": {
    "build": "npx projen build",
    "bundle": "npx projen bundle",
    "clobber": "npx projen clobber",
    "compile": "npx projen compile",
    "default": "npx projen default",
    "deploy": "npx projen deploy",
    "destroy": "npx projen destroy",
    "diff": "npx projen diff",
    "eject": "npx projen eject",
    "eslint": "npx projen eslint",
    "package": "npx projen package",
    "post-compile": "npx projen post-compile",
    "post-upgrade": "npx projen post-upgrade",
    "pre-compile": "npx projen pre-compile",
    "synth": "npx projen synth",
    "synth:silent": "npx projen synth:silent",
    "test": "npx projen test",
    "test:watch": "npx projen test:watch",
    "upgrade": "npx projen upgrade",
    "watch": "npx projen watch",
    "projen": "npx projen"
  },
  "devDependencies": {
    "@types/jest": "^29.5.4",
    "@types/node": "^16",
    "@typescript-eslint/eslint-plugin": "^6",
    "@typescript-eslint/parser": "^6",
    "aws-cdk": "^2.1.0",
    "esbuild": "^0.19.2",
    "eslint": "^8",
    "eslint-import-resolver-node": "^0.3.9",
    "eslint-import-resolver-typescript": "^3.6.0",
    "eslint-plugin-import": "^2.28.1",
    "jest": "^29.7.0",
    "jest-junit": "^15",
    "npm-check-updates": "^16",
    "projen": "^0.73.17",
    "ts-jest": "^29.1.1",
    "ts-node": "^10.9.1",
    "typescript": "^5.2.2",
    "webpack": "5.88.2"
  },
  "dependencies": {
    "aws-cdk-lib": "^2.1.0",
    "constructs": "^10.0.5"
  },
  "license": "Apache-2.0",
  "version": "0.0.0",
  "jest": {
    "testMatch": [
      "<rootDir>/src/**/__tests__/**/*.ts?(x)",
      "<rootDir>/(test|src)/**/*(*.)@(spec|test).ts?(x)"
    ],
    "clearMocks": true,
    "collectCoverage": true,
    "coverageReporters": [
      "json",
      "lcov",
      "clover",
      "cobertura",
      "text"
    ],
    "coverageDirectory": "coverage",
    "coveragePathIgnorePatterns": [
      "/node_modules/"
    ],
    "testPathIgnorePatterns": [
      "/node_modules/"
    ],
    "watchPathIgnorePatterns": [
      "/node_modules/"
    ],
    "reporters": [
      "default",
      [
        "jest-junit",
        {
          "outputDirectory": "test-reports"
        }
      ]
    ],
    "preset": "ts-jest",
    "globals": {
      "ts-jest": {
        "tsconfig": "tsconfig.dev.json"
      }
    }
  },
  "//": "~~ Generated by projen. To modify, edit .projenrc.ts and run \"npx projen\"."
}

Projen can be extensively configured. For example, if you need to configure webpack as a module bundler, then you need to add a webpack.config.js file and update the projenrc file project.

The other dependencies can be updated in package.json by adding deps in the projenrc.ts file.

const project = new awscdk.AwsCdkTypeScriptApp({
  cdkVersion: '2.1.0',
  defaultReleaseBranch: 'main',
  name: 'projen_hello',
  projenrcTs: true,
  
  deps:[
   "express",
  ],
  
  // add webpack dependencies
  devDeps:[
    "webpack",
    "webpack-cli",
    "ts-loader",
  ]
});
  
// update pre-configured build tasks and execute webpack
project.buildTask.reset
project.buildTask.exec('npx projen');
project.buildTask.exec('npx projen test');
project.buildTask.exec('npx webpack');

Run npx projen build to synthesize a package.json.

Continuous Integration and Continuous Delivery (CI/CD)

When you create a project using Projen, it comes equipped with an automated build process that triggers upon the submission of a pull request. This is one of the key, “out-of-the-box” features that streamlines development workflows.

Projen orchestrates this process through GitHub Actions, utilizing a sequence of tasks predefined in the project’s base ‘Project’ class.

When a build is initiated, it systematically carries out several sub-tasks:

Synthesis: It starts by synthesizing all the project files, ensuring they are up-to-date and correctly configured.
Bundling: Next, it bundles the necessary assets for the project.
Compilation: The project’s code is then compiled.
Testing: Following compilation, Projen runs the suite of tests defined for the project.
Packaging: Finally, it packages everything together, preparing it for deployment or distribution.

Projen manages these steps by auto-generating a build.yml file, which it places within the workflow directory of your project’s structure. This YAML file contains all the instructions for the GitHub Actions to execute the build process.

For instance, when you run the command npx projen new awscdk-app-ts, Projen sets up a TypeScript application for AWS CDK. It automatically creates a ‘build.yml’ file through the default projenrc file, which can be found in the github/workflow folder of your project repository. This automated process is designed to save time and reduce manual errors, making it an essential feature for efficient project management.

 .github       
   workflow    
    build.yml

A Projen build is self-mutating because files generated by Projen are part of the source directory. To ensure that a pull request branch always represents the final state of the repository, you can enable the mutableBuild option in your project configuration (currently only supported for projects derived from NodeProject).

The build process can be customized by adding any task in the project class, which can execute a shell command.

const buildproject = project.addTask('build'); 
buildproject.exec('npm run build');

You can spawn a subtask as well.

const buildproject = project.addTask('world');
buildproject.exec('echo world!');

const testproject = project.addTask('test');
testproject.exec('npm test');
testproject.spawn(buildproject);

The Task also supports the condition option that determines if the condition is true before running the task.

const hello = project.addTask('hello', {
  condition: '[ -n "$CI" ]', // only execute if the CI environment variable is defined
  exec: 'echo running in a CI environment'
});

Releases and Versioning

Projen uses Conventional Commits to generate semantic versioning of the releases automatically. This means that based on the commit message format, it can create the release version automatically.

Initially, the project is released under version 0.0.0. Anything may change at any time and public APIs should not be considered stable. Commits marked as a breaking change will increase the minor version. All other commits will increase the patch version.

You need to manually promote the major version to 1 once your project is considered stable. For major versions 1 and above, if a release includes fix commits only, it will increase the patch version. If a release includes any feat commits, then the new version will be a minor version.

Commit Messages                     Release versions         

feat: <Message>                     1.0.X (Patch)            
fix: <Message>                      1.X.0 (Minor)            
BREAKING CHANGE: <Message>          X.0 (Major)

API Documentation

One of the nice, out-of-the-box features that comes with Projen for AWS CDK constructs is the creation of API documentation for your constructs. By leveraging jsii-docgen, Projen’s build step will generate API documentation (API.md) from the comments in your code.

This feature is powerful for several reasons. Firstly, it ensures that documentation is kept up-to-date with the codebase, as the API documentation is generated directly from the source code comments. This reduces the risk of discrepancies between the code and its documentation, which can lead to misunderstandings and errors in usage.

Secondly, it streamlines the development process by automating a task that is often tedious and time-consuming. Developers can focus more on writing code and less on updating documentation manually.

Thirdly, it promotes better coding practices, as developers are encouraged to write clear and detailed comments in their code. This not only benefits the generation of documentation, but also helps any new developers who may work on the codebase in the future to understand the code more quickly and thoroughly.

Moreover, having readily available and accurate documentation can significantly enhance the developer experience. It makes it more straightforward for users of the CDK constructs to understand the functionality, parameters, return types, and the structure of the code they are working with.

In the context of team collaboration and open-source projects, this feature is especially beneficial. It ensures that anyone who contributes to the codebase is able to generate and view the latest documentation without any additional setup or configuration, facilitating smoother collaboration and integration processes.

Let’s recap all of the features that Projen can introduce into your project right out of the box:

Projen’s automation for linting and testing to maintain high code quality from the beginning.
Automated API documentation feature to keep your project’s documentation synchronized with the latest code changes.
Polyglot capabilities to cater to a diverse development team, ensuring flexibility in language preference.
The publisher module to streamline the release process across multiple package managers, saving time and reducing the scope for human error.
A list of awesome projects developed with Projen for inspiration or use as a template.

Conclusion

As we wrap up our deep dive into some of the advanced features of Projen within AWS CDK, it’s clear that Projen helps alleviate a lot of the pain points of a new greenfield project. By leveraging Projen, developers can navigate the complexities of polyglot programming, automate the mundane tasks of publishing and documentation, and ensure consistent code quality through linting and testing. Projen elevates the development workflow to a level where efficiency and scalability are the norms, not the exception.

What’s more compelling is Projen’s commitment to developer empowerment. Through its automated systems, it encourages developers to adhere to best practices without the overhead of manual enforcement. Its ability to seamlessly integrate with various package managers and generate detailed API documentation from inline comments signifies a leap in developer tooling.

Contact an AWS Representative to know how we can help accelerate your business.

Further Reading

AWS Weekly Roundup: Mithra, Amazon Titan Image Generator v2, AWS GenAI Lofts, and more (August 12, 2024)

2024-08-12 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-mithra-amazon-titan-image-generator-v2-aws-genai-lofts-and-more-august-12-2024/

When Dr. Swami Sivasubramanian, VP of AI and Data, was an intern at Amazon in 2005, Dr. Werner Vogels, CTO of Amazon, was his first manager. Nineteen years later, the two shared a stage at the VivaTech Conference to reflect on Amazon’s history of innovation—from pioneering the pay-as-you-go model with Amazon Web Services (AWS) to transforming customer experiences using “good old-fashioned AI”—as well as what really keeps them up at night in the age of generative artificial intelligence (generative AI).

Asked if competitors ever kept him up at night, Dr. Werner insisted that listening to customer needs—such as guardrails, security, and privacy—and building products based on those needs is what drives success at Amazon. Dr. Swami said he viewed Amazon SageMaker and Amazon Bedrock as prime examples of successful products that have emerged as a result of this customer-first approach. “If you end up chasing your competitors, you are going to end up building what they are building,” he added. “If you actually listen to your customers, you are actually going to lead the way in innovation.” To learn four more lessons on customer-obsessed innovation, visit our AWS Careers blog.

For example, for customer-obsessed security, we build and use Mithra, a powerful neural network model to detect and respond to cyber threats. It analyzes up to 200 trillion internet domain requests daily from the AWS global network, identifying an average of 182,000 new malicious domains with remarkable accuracy. Mithra is just one example of how AWS uses global scale, advanced artificial intelligence and machine learning (AI/ML) technology, and constant innovation to lead the way in cloud security, making the internet safer for everyone. To learn more, visit the blog post of Chief Information Security Officer at Amazon CJ Moses, How AWS tracks the cloud’s biggest security threats and helps shut them down.

Last week’s launches
Here are some launches that got my attention:

Amazon Titan Image Generator v2 in Amazon Bedrock – With the new Amazon Titan Image Generator v2 model, you can guide image creation using a text prompt and reference images, control the color palette of generated images, remove backgrounds, and customize the model to maintain brand style and subject consistency. To learn more, visit my blog post, Amazon Titan Image Generator v2 is now available in Amazon Bedrock.

Regional expansion of Anthropic’s Claude models in Amazon Bedrock – The Claude 3.5 Sonnet, Anthropic’s latest high-performance AI model, is now available in US West (Oregon), Europe (Frankfurt), Asia Pacific (Tokyo), and Asia Pacific (Singapore) Regions in Amazon Bedrock. The Claude 3 Haiku, Anthropic’s compact and affordable AI model, is now available in Asia Pacific (Tokyo) and Asia Pacific (Singapore) Regions in Amazon Bedrock.

Private IPv6 addressing for VPCs and subnets – You can now address private IPv6 for VPCs and subnets with Amazon VPC IP Address Manager (IPAM). Within IPAM, you can configure private IPv6 addresses in a private scope, provision Unique Local IPv6 Unicast Addresses (ULA) and Global Unicast Addresses (GUA), and use them to create VPCs and subnets for private access. To learn more, visit see the Understanding IPv6 addressing on AWS and designing a scalable addressing plan and VPC documentation,

Up to 30 GiB/s of read throughput in Amazon EFS – We are increasing the read throughput to 30 GiB/s, extending simple, fully elastic, and provisioning-free experience of Amazon EFS to support throughput-intensive AI and ML workloads for model training, inference, financial analytics, and genomic data analysis.

Large language models (LLMs) in Amazon Redshift ML – You can use pre-trained publicly available LLMs in Amazon SageMaker JumpStart as part of Amazon Redshift ML. For example, you can use LLMs to summarize feedback, perform entity extraction, and conduct sentiment analysis on data in your Amazon Redshift table, so you can bring the power of generative AI to your data warehouse.

Data products in Amazon DataZone – You can create data products in Amazon DataZone, which enable the grouping of data assets into well-defined, self-contained packages tailored for specific business use cases. For example, a marketing analysis data product can bundle various data assets such as marketing campaign data, pipeline data, and customer data. To learn more, visit this AWS Big Data blog post.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional news items that you might find interesting:

AWS Goodies by Jeff Barr – Want to discover more exciting news about AWS? Jeff Barr is always in catch-up mode, doing his best to share all of the interesting things that he finds or that are shared with him. You can find his goodies once a week. Follow his LinkedIn page.

AWS and Multicloud – You might have missed a great article about the existing capabilities AWS has and the continued enhancements we’ve made in multicloud environments. In the post, Jeff covers the AWS approach to multicloud, provides you with some real-world examples, and reviews some of the newest multicloud and hybrid capabilities found across the lineup of AWS services.

Code transformation in Amazon Q Developer – At Amazon, we asked a small team to use Amazon Q Developer Agent for code transformation to migrate more than 30,000 production applications from older Java versions to Java 17. By using Amazon Q Developer to automate these upgrades, the team saved over 4,500 developer years of effort compared to what it would have taken to do all of these upgrades manually and saved the company $260 million in annual savings by moving to the latest Java version.

Contributing to AWS CDK – AWS Cloud Development Kit (AWS CDK) is an open source software development framework to model and provision your cloud application resources using familiar programming languages. Contributing to AWS CDK not only helps you deepen your knowledge of AWS services but also allows you to give back to the community and improve a tool you rely on.

Upcoming AWS events
Check your calendars and sign up for these AWS events:

AWS re:Invent 2024 – Dive into the first-round session catalog. Explore all the different learning opportunities at AWS re:Invent this year and start building your agenda today. You’ll find sessions for all interests and learning styles.

AWS Innovate Migrate, Modernize, Build – Learn about proven strategies and practical steps for effectively migrating workloads to the AWS Cloud, modernizing applications, and building cloud-native and AI-enabled solutions. Don’t miss this opportunity to learn with the experts and unlock the full potential of AWS. Register now for Asia Pacific, Korea, and Japan (September 26).

AWS Summits – The 2024 AWS Summit season is almost wrapping up! Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: São Paulo (August 15), Jakarta (September 5), and Toronto (September 11).

AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: New Zealand (August 15), Colombia (August 24), New York (August 28), Belfast (September 6), and Bay Area (September 13).

AWS GenAI Lofts – Meet AWS AI experts and attend talks, workshops, fireside chats, and Q&As with industry leaders. All lofts are free and are carefully curated to offer something for everyone to help you accelerate your journey with AI. There are lofts scheduled in San Francisco (August 14–September 27), São Paulo (September 2–November 20), London (September 30–October 25), Paris (October 8–November 25), and Seoul (November).

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Channy

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

How to use Amazon Q Developer to deploy a Serverless web application with AWS CDK

2024-08-07 Riya Dani

Post Syndicated from Riya Dani original https://aws.amazon.com/blogs/devops/how-to-use-amazon-q-developer-to-deploy-a-serverless-web-application-with-aws-cdk/

Did you know that Amazon Q Developer, a new type of Generative AI-powered (GenAI) assistant, can help developers and DevOps engineers accelerate Infrastructure as Code (IaC) development using the AWS Cloud Development Kit (CDK)?

IaC is a practice where infrastructure components such as servers, networks, and cloud resources are defined and managed using code. Instead of manually configuring and deploying infrastructure, with IaC, the desired state of the infrastructure is specified in a machine-readable format, like YAML, JSON, or modern programming languages. This allows for consistent, repeatable, and scalable infrastructure management, as changes can be easily tracked, tested, and deployed across different environments. IaC reduces the risk of human errors, increases infrastructure transparency, and enables the application of DevOps principles, such as version control, testing, and automated deployment, to the infrastructure itself.

There are different IaC tools available to manage infrastructure on AWS. To manage infrastructure as code, one needs to understand the DSL (domain-specific language) of each IaC tool and/or construct interface and spend time defining infrastructure components using IaC tools. With the use of Amazon Q Developer, developers can minimize time spent on this undifferentiated task and focus on business problems. In this post, we will go over how Amazon Q Developer can help deploy a fully functional three-tier web application infrastructure on AWS using CDK. AWS CDK is an open-source software development framework to define cloud infrastructure in modern programming languages and provision it through AWS CloudFormation.

Amazon Q Developer is a generative artificial intelligence (AI)-powered conversational assistant that can help you understand, build, extend, and operate AWS applications. You can ask questions about AWS architecture, your AWS resources, best practices, documentation, support, and more. Amazon Q Developer is constantly updating its capabilities so your questions get the most contextually relevant and actionable answers.

In the following sections, we will take a real-world three-tier web application that uses serverless architecture and showcase how you can accelerate AWS CDK code development using Amazon Q Developer as an AI coding companion and thus improve developer productivity.

Prerequisites

To begin using Amazon Q Developer, the following are required:

An AWS Account
An AWS Builder ID or an AWS Identity Center login controlled by your organization
Visual Studio Code or supported JetBrains IDEs
Install AWS CDK
How to set up and chat with Amazon Q Developer

Application Overview

You are a DevOps engineer at a software company and have been tasked with building and launching a new customer-facing web application using a serverless architecture. It will have three tiers, as shown below, consisting of the presentation layer, application layer, and data layer. You have decided to utilize Amazon Q Developer to deploy the application components using AWS CDK.

Three-Tier Web Application Architecture Overview

Figure 1 – Serverless Application Architecture

Accelerating application deployment using Amazon Q Developer as an AI coding companion

Let’s dive into how Amazon Q Developer can be used as an expert companion to accelerate the deployment of the above serverless application resources using AWS CDK.

1. Deploy Presentation Layer Resources

Creating a secured Amazon S3 bucket to host static assets and front it using Amazon CloudFront

When building modern serverless web applications that host large static content, a key architecture consideration is how to efficiently and securely serve static assets such as images, CSS, and JavaScript files. Simply serving these from your application servers can lead to scaling and performance bottlenecks with increased resource utilization (e.g., CPU, I/O, network) on servers. This is where leveraging AWS services like Amazon Simple Storage Service (Amazon S3) and Amazon CloudFront can be a game-changer. By hosting your static content in a secured S3 bucket, you unlock several powerful benefits. First and foremost, you get robust security controls through S3 bucket policies and CloudFront Origin Access Control (OAC) to ensure only authorized access. This is critical for protecting your assets. Secondly, you take load off your application servers by having CloudFront directly serve static assets from its globally distributed edge locations. This improves application performance and reduces operational costs. AWS CDK helps to simplify the infrastructure provisioning by allowing developers to define S3 bucket and CloudFront resource configurations in a modern programming language using CDK constructs that enhance security and include best practices recommendations.

In this application architecture, we will use Amazon Q Developer to develop AWS CDK code to provision presentation layer resources, which include a secured S3 bucket with public access disabled and an Origin Access Control (OAC) that is used to grant CloudFront access to the S3 bucket to securely serve the static assets of the application.

Prompt: Create a cdk stack with python that creates an s3 bucket for cloudfront/s3 static asset, ensure it is secured by using Origin Access Control (OAC)

Using Amazon Q to Generate python CDK code for the presentation layer resources

Developers can customize these configurations using Amazon Q Developer based on your specific security requirements, such as implementing access controls through IAM policies or enabling bucket logging for audit trails. This approach ensures that the S3 bucket is configured securely, aligning with best practices for data protection and access management.

Lets look at an example of adding CloudTrail logging to the S3 bucket:

Prompt: Update the code to include cloudtrail logging to the S3 bucket created

Using Amazon Q Developer to add CloudTrail logging to the S3 bucket

2. Deploy Application Layer Resources

Provision AWS Lambda and Amazon API Gateway to serve end-user requests

Amazon Q Developer makes it easy to provision serverless application backend infrastructure such as AWS Lambda and Amazon API Gateway using AWS CDK. In the above architecture, you can deploy the Lambda function hosting application code, with just a few lines of CDK code along with Lambda configuration such as function name, runtime, handler, timeouts, and environment variables. This Lambda function is fronted using Amazon API Gateway to serve user requests. Anything from a simple micro-service to a complex serverless application can be defined through code in CDK using Amazon Q assistance and deployed repeatedly through CI/CD pipelines. This enables infrastructure automation and consistent governance for applications on AWS.

Prompt: Create a CDK stack that creates a AWS Lambda function that is invoked by Amazon API Gateway

Using Amazon Q Developer to generate CDK code to create an AWS Lambda function that is invoked by Amazon API Gateway

3. Deploy Data Layer Resources

Provision Amazon DynamoDB tables to host application data

By leveraging Amazon Q Developer, we can generate CDK code to provision DynamoDB tables using CDK constructs that offer AWS default best practice recommendations. With Amazon Q Developer, using the CDK construct library, we can define DynamoDB table names, attributes, secondary indexes, encryption, and auto-scaling in just a few lines of CDK code in our programming language of choice. With CDK, this table definition is synthesized into an AWS CloudFormation template that is deployed as a stack to provision the DynamoDB table with all the desired settings. Any data layer resources can be defined this way as Infrastructure as Code (IAC) using Amazon Q Developer. Overall, Amazon Q Developer drastically simplifies deploying managed data backends on AWS through CDK while enforcing best practices around data security, access control, and scalability leveraging CDK constructs.

Prompt: Create a CDK stack of a DynamoDB table with 100 read capacity units and 100 write capacity units.

Using Amazon Q Developer to generate CDK code to create a DynamoDB with 100 read capacity units and 100 write capacity units

4. Monitoring the Application Components

Monitoring using Amazon CloudWatch

Once the application infrastructure stack has been provisioned, it’s important to setup observability to monitor key metrics, detect any issues proactively, and alert operational teams to troubleshoot and fix issues to minimize application downtime. To get started with observability, developers can leverage Amazon CloudWatch, a fully managed monitoring service. With the use of AWS CDK, it is easy to codify CloudWatch components such as dashboards, metrics, log groups, and alarms alongside the application infrastructure and deploy them in an automated and repeatable way leveraging the AWS CDK construct library. Developers can customize these metrics and alarms to meet their workload requirements. All the monitoring configuration gets deployed as part of the infrastructure stack.

Developers can use Amazon Q Developer to assist with setting up application monitoring in AWS using CloudWatch and CDK. With Q, you can describe the resources you want to monitor, such as EC2 instances, Lambda functions, and RDS databases. Q will then generate the necessary CDK code to provision the appropriate CloudWatch alarms, metrics, and dashboards to monitor the resources. By describing what you want to monitor in natural language, Q handles the underlying complexity of generating code.

Prompt: Create a CDK stack of a Cloudwatch event rule to stop web instance EC2 instance every day at 15:00 UTC

Using Amazon Q Developer to generate CDK code to create a CloudWatch event rule to stop an EC2 instance at 15:00 UTC

5. Automate CI/CD of the application

Build a CDK Pipeline for Continuous Integration(CI) and Continuous Deployment(CD) of the Infrastructure

As you iterate on your serverless application, you’ll want a smooth, automated way to reliably deploy infrastructure changes using a CI/CD pipeline. This is where implementing a CDK pipeline, an automated deployment pipeline, becomes useful. As we’ve seen, AWS Cloud Development Kit (CDK) allows you to define your entire multi-tier infrastructure as a reusable, version-controlled code construct. From S3 buckets to CloudFront, API Gateway, Lambda functions, databases and more, all of these can be deployed with IaC. This CI/CD pipeline streamlines the process of deploying both infrastructure and application code, integrating seamlessly with CI/CD best practices.

Here’s how you can leverage Amazon Q Developer to streamline the process of creating a CDK pipeline.

Prompt: Using python, create a CDK pipeline that deploys a three tier serverless application.

Using Amazon Q Developer to generate CDK pipeline using Python

By leveraging Amazon Q Developer in your IDE for CDK pipeline creation, you can speed up adoption of CI/CD best practices, and receive real-time guidance on CDK deployment patterns, making the development process smoother. This CI/CD integration accelerates your CDK development experience, and allows you to focus on building robust and scalable AWS applications.

Conclusion

In this post, you have learned how developers can leverage Amazon Q Developer, a generative-AI powered assistant, as a true expert AI companion in assisting with accelerating Infrastructure As Code (IaC) Development using AWS CDK and seen how to deploy and manage AWS resources of a three-tier serverless application using AWS CDK. In addition to Infrastructure As Code, Amazon Q Developer can be leveraged to accelerate software development, minimize time spent on undifferentiated coding tasks, and help with troubleshooting, so that developers can focus on creative business problems to delight end-users. See the Amazon Q Developer documentation to get started.

Happy Building with Amazon Q Developer!

Balance deployment speed and stability with DORA metrics

2024-07-31 Rostislav Markov

Post Syndicated from Rostislav Markov original https://aws.amazon.com/blogs/devops/balance-deployment-speed-and-stability-with-dora-metrics/

Development teams adopt DevOps practices to increase the speed and quality of their software delivery. The DevOps Research and Assessment (DORA) metrics provide a popular method to measure progress towards that outcome. Using four key metrics, senior leaders can assess the current state of team maturity and address areas of optimization.

This blog post shows you how to make use of DORA metrics for your Amazon Web Services (AWS) environments. We share a sample solution which allows you to bootstrap automatic metric collection in your AWS accounts.

Benefits of collecting DORA metrics

DORA metrics offer insights into your development teams’ performance and capacity by measuring qualitative aspects of deployment speed and stability. They also indicate the teams’ ability to adapt by measuring the average time to recover from failure. This helps product owners in defining work priorities, establishing transparency on team maturity, and developing a realistic workload schedule. The metrics are appropriate for communication with senior leadership. They help commit leadership support to resolve systemic issues inhibiting team satisfaction and user experience.

Use case

This solution is applicable to the following use case:

Development teams have a multi-account AWS setup including a tooling account where the CI/CD tools are hosted, and an operations account for log aggregation and visualization.
Developers use GitHub code repositories and AWS CodePipeline to promote code changes across application environment accounts.
Tooling, operations, and application environment accounts are member accounts in AWS Control Tower or workload accounts in the Landing Zone Accelerator on AWS solution.
Service impairment resulting from system change is logged as OpsItem in AWS Systems Manager OpsCenter.

Overview of solution

The four key DORA metrics

The ‘four keys’ measure team performance and ability to react to problems:

Deployment Frequency measures the frequency of successful change releases in your production environment.
Lead Time For Changes measures the average time for committed code to reach production.
Change Failure Rate measures how often changes in production lead to service incidents/failures, and is complementary to Mean Time Between Failure.
Mean Time To Recovery measures the average time from service interruption to full recovery.

The first two metrics focus on deployment speed, while the other two indicate deployment stability (Figure 1). We recommend organizations to set their own goals (that is, DORA metric targets) based on service criticality and customer needs. For a discussion of prior DORA benchmark data and what it reveals about the performance of development teams, consult How DORA Metrics Can Measure and Improve Performance.

Figure 1. Overview of DORA metrics

Consult the GitHub code repository Balance deployment speed and stability with DORA metrics for a detailed description of the metric calculation logic. Any modifications to this logic should be made carefully.

For example, the Change Failure Rate focuses on changes that impair the production system. Limiting the calculation to tags (such as hotfixes) on pull requests would exclude issues related to the build process. It’s important to match system change records that lead to actual impairments in production. Limiting the calculation to the number of failed deployments from the deployment pipeline only considers deployments that didn’t reach production. We use AWS Systems Manager OpsCenter as the system of records for change-related outages, rather than relying solely on data from CI/CD tools.

Similarly, Mean Time To Recovery measures the duration from a service impairment in production to a successful pipeline run. We encourage teams to track both pipeline status and recovery time, as frequent pipeline failure can indicate insufficient local testing and potential pipeline engineering issues.

Gathering DORA events

Our metric calculation process runs in four steps:

In the tooling account, we send events from CodePipeline to the default event bus of Amazon EventBridge.
Events are forwarded to custom event buses which process them according to the defined metrics and any filters we may have set up.
The custom event buses call AWS Lambda functions which forward metric data to Amazon CloudWatch. CloudWatch gives us an aggregated view of each of the metrics. From Amazon CloudWatch, you can send the metrics to another designated dashboard like Amazon Managed Grafana.
As part of the data collection, the Lambda function will also query GitHub for the relevant commit to calculate the lead time for changes metric. It will query AWS Systems Manager for OpsItem data for change failure rate and mean time to recovery metrics. You can create OpsItems manually as part of your change management process or configure CloudWatch alarms to create OpsItems automatically.

Figure 2 visualizes these steps. This setup can be replicated to a group of accounts of one or multiple teams.

This figure visualizes the aforementioned four steps of our metric calculation process. AWS Lambda functions process all events and publish custom metrics in Amazon CloudWatch.

Figure 2. DORA metric setup for AWS CodePipeline deployments

Walkthrough

Follow these steps to deploy the solution in your AWS accounts.

Prerequisites

For this walkthrough, you should have the following prerequisites:

AWS accounts for tooling, operations, and application environments
Install Python version 3.9 or later
AWS Cloud Development Kit (AWS CDK) v2 installed
Set up an AWS CDK pipeline
Access to AWS CodePipeline, AWS Systems Manager OpsCenter and GitHub

Deploying the solution

Clone the GitHub code repository Balance deployment speed and stability with DORA metrics.

Before you start deploying or working with this code base, there are a few configurations you need to complete in the constants.py file in the cdk/ directory. Open the file in your IDE and update the following constants:

TOOLING_ACCOUNT_ID & TOOLING_ACCOUNT_REGION: These represent the AWS account ID and AWS region for AWS CodePipeline (that is, your tooling account).
OPS_ACCOUNT_ID & OPS_ACCOUNT_REGION: These are for your operations account (used for centralized log aggregation and dashboard).
TOOLING_CROSS_ACCOUNT_LAMBDA_ROLE: The IAM Role for cross-account access that allows AWS Lambda to post metrics from your tooling account to your operations account/Amazon CloudWatch dashboard.
DEFAULT_MAIN_BRANCH: This is the default branch in your code repository that’s used to deploy to your production application environment. It is set to “main” by default, as we assumed feature-driven development (GitFlow) on the main branch; update if you use a different naming convention.
APP_PROD_STAGE_NAME: This is the name of your production stage and set to “DeployPROD” by default. It’s reserved for teams with trunk-based development.

Setting up the environment

To set up your environment on MacOS and Linux:

Create a virtual environment:
```
$ python3 -m venv .venv
```
Activate the virtual environment: On MacOS and Linux:
```
$ source .venv/bin/activate
```

Alternatively, to set up your environment on Windows:

Create a virtual environment:
```
% .venv\Scripts\activate.bat
```
Install the required Python packages:
```
$ pip install -r requirements.txt
```

To configure the AWS Command Line Interface (AWS CLI):

Follow the configuration steps in the AWS CLI User Guide.
```
$ aws configure sso
```
Configure your user profile (for example, Ops for operations account, Tooling for tooling account). You can check user profile names in the credentials file.

Deploying the CloudFormation stacks

Switch directory
```
$ cd cdk
```
Bootstrap CDK
```
$ cdk bootstrap –-profile Ops
```
Synthesize the AWS CloudFormation template for this project:
```
$ cdk synth
```
To deploy a specific stack (see Figure 3 for an overview), specify the stack name and AWS account number(s) in the following command:
```
$ cdk deploy <Stack-Name> --profile {Tooling, Ops}
```
To launch the DoraToolingEventBridgeStack stack in the Tooling account:
```
$ cdk deploy DoraToolingEventBridgeStack --profile Tooling
```
To launch the other stacks in the Operations account (including DoraOpsGitHubLogsStack, DoraOpsDeploymentFrequencyStack, DoraOpsLeadTimeForChangeStack, DoraOpsChangeFailureRateStack, DoraOpsMeanTimeToRestoreStack, DoraOpsMetricsDashboardStack):
```
$ cdk deploy DoraOps* --profile Ops
```

The following figure shows the resources you’ll launch with each CloudFormation stack. This includes six AWS CloudFormation stacks in operations account. The first stack sets up log integration for GitHub commit activity. Four stacks contain a Lambda function which creates one of the DORA metrics. The sixth stack creates the consolidated dashboard in Amazon CloudWatch.

Figure 3. Resources provisioned with this solution

Testing the deployment

To run the provided tests:

$ pytest

Understanding what you’ve built

Deployed resources in tooling account

The DoraToolingEventBridgeStack includes Amazon EventBridge rules with a target of the central event bus in the operations account, plus an AWS IAM role with cross-account access to put events in the operations account. The event pattern for invoking our EventBridge rules listens for deployment state changes in AWS CodePipeline:

{
  "detail-type": ["CodePipeline Pipeline Execution State Change"],
  "source": ["aws.codepipeline"]
}

Deployed resources in operations account

The Lambda function for Deployment Frequency tracks the number of successful deployments to production, and posts the metric data to Amazon CloudWatch. You can add a dimension with the repository name in Amazon CloudWatch to filter on particular repositories/teams.
The Lambda function for the Lead Time For Change metric calculates the duration from the first commit to successful deployment in production. This covers all factors contributing to lead time for changes, including code reviews, build, test, as well as the deployment itself.
The Lambda function for Change Failure Rate keeps track of the count of successful deployments and the count of system impairment records (OpsItems) in production. It publishes both as metrics to Amazon CloudWatch and the latter calculates the ratio, as shown in below example.
The Lambda function for Mean Time To Recovery keeps track of all deployments with status SUCCEEDED in production and whose repository branch name references an existing OpsItem ID. For every matching event, the function gets the creation time of the OpsItem record and posts the duration between OpsItem creation and successful re-deployment to the CloudWatch dashboard.

All Lambda functions publish metric data to Amazon CloudWatch using the PutMetricData API. The final calculation of the four keys is performed on the CloudWatch dashboard. The solution includes a simple CloudWatch dashboard so you can validate the end-to-end data flow and confirm that it has deployed successfully:

This simple CloudWatch dashboard displays the four DORA metrics for three reporting periods: per day, per week, and per month.

Cleaning up

Remember to delete example resources if you no longer need them to avoid incurring future costs.

You can do this via the CDK CLI:

$ cdk destroy <Stack-Name> --profile {Tooling, Ops}

Alternatively, go to the CloudFormation console in each AWS account, select the stacks related to DORA and click on Delete. Confirm that the status of all DORA stacks is DELETE_COMPLETE.

Conclusion

DORA metrics provide a popular method to measure the speed and stability of your deployments. The solution in this blog post helps you bootstrap automatic metric collection in your AWS accounts. The four keys help you gain consensus on team performance and provide data points to back improvement suggestions. We recommend using the solution to gain leadership support for systemic issues inhibiting team satisfaction and user experience. To learn more about developer productivity research, we encourage you to also review alternative frameworks including DevEx and SPACE.

Further resources

If you enjoyed this post, you may also like:

Author bio

Streamline your data governance by deploying Amazon DataZone with the AWS CDK

2024-07-23 Bandana Das

Post Syndicated from Bandana Das original https://aws.amazon.com/blogs/big-data/streamline-your-data-governance-by-deploying-amazon-datazone-with-the-aws-cdk/

Managing data across diverse environments can be a complex and daunting task. Amazon DataZone simplifies this so you can catalog, discover, share, and govern data stored across AWS, on premises, and third-party sources.

Many organizations manage vast amounts of data assets owned by various teams, creating a complex landscape that poses challenges for scalable data management. These organizations require a robust infrastructure as code (IaC) approach to deploy and manage their data governance solutions. In this post, we explore how to deploy Amazon DataZone using the AWS Cloud Development Kit (AWS CDK) to achieve seamless, scalable, and secure data governance.

Overview of solution

By using IaC with the AWS CDK, organizations can efficiently deploy and manage their data governance solutions. This approach provides scalability, security, and seamless integration across all teams, allowing for consistent and automated deployments.

The AWS CDK is a framework for defining cloud IaC and provisioning it through AWS CloudFormation. Developers can use any of the supported programming languages to define reusable cloud components known as constructs. A construct is a reusable and programmable component that represents AWS resources. The AWS CDK translates the high-level constructs defined by you into equivalent CloudFormation templates. AWS CloudFormation provisions the resources specified in the template, streamlining the usage of IaC on AWS.

Amazon DataZone core components are the building blocks to create a comprehensive end-to-end solution for data management and data governance. The following are the Amazon DataZone core components. For more details, see Amazon DataZone terminology and concepts.

Amazon DataZone domain – You can use an Amazon DataZone domain to organize your assets, users, and their projects. By associating additional AWS accounts with your Amazon DataZone domains, you can bring together your data sources.
Data portal – The data portal is outside the AWS Management Console. This is a browser-based web application where different users can catalog, discover, govern, share, and analyze data in a self-service fashion.
Business data catalog – You can use this component to catalog data across your organization with business context and enable everyone in your organization to ﬁnd and understand data quickly.
Projects – In Amazon DataZone, projects are business use case-based groupings of people, assets (data), and tools used to simplify access to AWS analytics.
Environments – Within Amazon DataZone projects, environments are collections of zero or more configured resources on which a given set of AWS Identity and Access Management (IAM) principals (for example, users with a contributor permissions) can operate.
Amazon DataZone data source – In Amazon DataZone, you can publish an AWS Glue Data Catalog data source or Amazon Redshift data source.
Publish and subscribe workﬂows – You can use these automated workﬂows to secure data between producers and consumers in a self-service manner and make sure that everyone in your organization has access to the right data for the right purpose.

We use an AWS CDK app to demonstrate how to create and deploy core components of Amazon DataZone in an AWS account. The following diagram illustrates the primary core components that we create.

In addition to the core components deployed with the AWS CDK, we provide a custom resource module to create Amazon DataZone components such as glossaries, glossary terms, and metadata forms, which are not supported by AWS CDK constructs (at the time of writing).

Prerequisites

The following local machine prerequisites are required before starting:

An AWS account (with AWS IAM Identity Center enabled).
Either Bash or ZSH terminal.
The AWS Command Line Interface (AWS CLI) v2 installed.
Python version 3.10 or higher.
The AWS SDK for Python version 1.34.87 or higher.
Node version v18.17.* or higher.
NPM version v10.2.* or higher.
An AWS Glue table to be registered as a sample data source in an Amazon DataZone project.
As part of this post, we want to publish AWS Glue tables from an AWS Glue database that already exists. For this, you must explicitly provide Amazon DataZone with the permissions to access tables in this existing AWS Glue database. For more information, refer to Configure Lake Formation permissions for Amazon DataZone.
- Remove the IAMAllowedPrincipals permissions from the AWS Lake Formation tables for which Amazon DataZone handles permissions.
- Make sure you have disabled the default permissions under the Data Catalog settings in Lake Formation (see the following screenshot).

Deploy the solution

Complete the following steps to deploy the solution:

Clone the GitHub repository and go to the root of your downloaded repository folder:

git clone https://github.com/aws-samples/amazon-datazone-cdk-example.git
cd amazon-datazone-cdk-example

Install local dependencies:

$ npm ci ### this will install the packages configured in package-lock.json

Sign in to your AWS account using the AWS CLI by configuring your credential file (replace <PROFILE_NAME> with the profile name of your deployment AWS account):
```
$ export AWS_PROFILE=<PROFILE_NAME>
```
Bootstrap the AWS CDK environment (this is a one-time activity and not needed if your AWS account is already bootstrapped):
```
$ npm run cdk bootstrap
```
Run the script to replace the placeholders for your AWS account and AWS Region in the config files:
```
$ ./scripts/prepare.sh <<YOUR_AWS_ACCOUNT_ID>> <<YOUR_AWS_REGION>>
```

The preceding command will replace the AWS_ACCOUNT_ID_PLACEHOLDER and AWS_REGION_PLACEHOLDER values in the following config files:

lib/config/project_config.json
lib/config/project_environment_config.json
lib/constants.ts

Next, you configure your Amazon DataZone domain, project, business glossary, metadata forms, and environments with your data source.

Go to the file lib/constants.ts. You can keep the DOMAIN_NAME provided or update it as needed.
Go to the file lib/config/project_config.json. You can keep the example values for projectName and projectDescription or update them. An example value for projectMembers has also been provided (as shown in the following code snippet). Update the value of the memberIdentifier parameter with an IAM role ARN of your choice that you would like to be the owner of this project.
```
"projectMembers": [
            {
                "memberIdentifier": "arn:aws:iam::AWS_ACCOUNT_ID_PLACEHOLDER:role/Admin",
                "memberIdentifierType": "UserIdentifier"
            }
        ]
```
Go to the file lib/config/project_glossary_config.json. An example business glossary and glossary terms are provided for the projects; you can keep them as is or update them with your project name, business glossary, and glossary terms.
Go to the lib/config/project_form_config.json file. You can keep the example metadata forms provided for the projects or update your project name and metadata forms.
Go to the lib/config/project_enviornment_config.json file. Update EXISTING_GLUE_DB_NAME_PLACEHOLDER with the existing AWS Glue database name in the same AWS account where you are deploying the Amazon DataZone core components with the AWS CDK. Make sure you have at least one existing AWS Glue table in this AWS Glue database to publish as a data source within Amazon DataZone. Replace DATA_SOURCE_NAME_PLACEHOLDER and DATA_SOURCE_DESCRIPTION_PLACEHOLDER with your choice of Amazon DataZone data source name and description. An example of a cron schedule has been provided (see the following code snippet). This is the schedule for your data source run; you can keep the same or update it.
```
"Schedule":{
   "schedule":"cron(0 7 * * ? *)"
}
```

Next, you update the trust policy of the AWS CDK deployment IAM role to deploy a custom resource module.

On the IAM console, update the trust policy of the IAM role for your AWS CDK deployment that starts with cdk-hnb659fds-cfn-exec-role- by adding the following permissions. Replace ${ACCOUNT_ID} and ${REGION} with your specific AWS account and Region.

     {
         "Effect": "Allow",
         "Principal": {
             "Service": "lambda.amazonaws.com"
         },
         "Action": "sts:AssumeRole",
         "Condition": {
             "ArnLike": {
                 "aws:SourceArn": [
                     
                     "arn:aws:lambda:${REGION}:{ACCOUNT_ID}:function:DataZonePreqStack-GlossaryLambda*",
                     "arn:aws:lambda:${REGION}:{ACCOUNT_ID}:function:DataZonePreqStack-GlossaryTermLambda*",
                     "arn:aws:lambda:${REGION}:{ACCOUNT_ID}:function:DataZonePreqStack-FormLambda*"
                 ]
             }
         }
     }

Now you can configure data lake administrators in Lake Formation.

On the Lake Formation console, choose Administrative roles and tasks in the navigation pane.
Under Data lake administrators, choose Add and add the IAM role for AWS CDK deployment that starts with cdk-hnb659fds-cfn-exec-role- as an administrator.

This IAM role needs permissions in Lake Formation to create resources, such as an AWS Glue database. Without these permissions, the AWS CDK stack deployment will fail.

Deploy the solution:
```
$ npm run cdk deploy --all
```
During deployment, enter y if you want to deploy the changes for some stacks when you see the prompt Do you wish to deploy these changes (y/n)?.
After the deployment is complete, sign in to your AWS account and navigate to the AWS CloudFormation console to verify that the infrastructure deployed.

You should see a list of the deployed CloudFormation stacks, as shown in the following screenshot.

Open the Amazon DataZone console in your AWS account and open your domain.
Open the data portal URL available in the Summary section.
Find your project in the data portal and run the data source job.

This is a one-time activity if you want to publish and search the data source immediately within Amazon DataZone. Otherwise, wait for the data source runs according to the cron schedule mentioned in the preceding steps.

Troubleshooting

If you get the message "Domain name already exists under this account, please use another one (Service: DataZone, Status Code: 409, Request ID: 2d054cb0-0 fb7-466f-ae04-c53ff3c57c9a)" (RequestToken: 85ab4aa7-9e22-c7e6-8f00-80b5871e4bf7, HandlerErrorCode: AlreadyExists), change the domain name under lib/constants.ts and try to deploy again.

If you get the message "Resource of type 'AWS::IAM::Role' with identifier 'CustomResourceProviderRole1' already exists." (RequestToken: 17a6384e-7b0f-03b3 -1161-198fb044464d, HandlerErrorCode: AlreadyExists), this means you’re accidentally trying to deploy everything in the same account but a different Region. Make sure to use the Region you configured in your initial deployment. For the sake of simplicity, the DataZonePreReqStack is in one Region in the same account.

If you get the message “Unmanaged asset” Warning in the data asset on your datazone project, you must explicitly provide Amazon DataZone with Lake Formation permissions to access tables in this external AWS Glue database. For instructions, refer to Configure Lake Formation permissions for Amazon DataZone.

Clean up

To avoid incurring future charges, delete the resources. If you have already shared the data source using Amazon DataZone, then you have to remove those manually first in the Amazon DataZone data portal because the AWS CDK isn’t able to automatically do that.

Unpublish the data within the Amazon DataZone data portal.
Delete the data asset from the Amazon DataZone data portal.
From the root of your repository folder, run the following command:
```
$ npm run cdk destroy --all
```
Delete the Amazon DataZone created databases in AWS Glue. Refer to the tips to troubleshoot Lake Formation permission errors in AWS Glue if needed.
Remove the created IAM roles from Lake Formation administrative roles and tasks.

Conclusion

Amazon DataZone offers a comprehensive solution for implementing a data mesh architecture, enabling organizations to address advanced data governance challenges effectively. Using the AWS CDK for IaC streamlines the deployment and management of Amazon DataZone resources, promoting consistency, reproducibility, and automation. This approach enhances data organization and sharing across your organization.

Ready to streamline your data governance? Dive deeper into Amazon DataZone by visiting the Amazon DataZone User Guide. To learn more about the AWS CDK, explore the AWS CDK Developer Guide.

About the Authors

Bandana Das is a Senior Data Architect at Amazon Web Services and specializes in data and analytics. She builds event-driven data architectures to support customers in data management and data-driven decision-making. She is also passionate about enabling customers on their data management journey to the cloud.

Gezim Musliaj is a Senior DevOps Consultant with AWS Professional Services. He is interested in various things CI/CD, data, and their application in the field of IoT, massive data ingestion, and recently MLOps and GenAI.

Sameer Ranjha is a Software Development Engineer on the Amazon DataZone team. He works in the domain of modern data architectures and software engineering, developing scalable and efficient solutions.

Sindi Cali is an Associate Consultant with AWS Professional Services. She supports customers in building data-driven applications in AWS.

Bhaskar Singh is a Software Development Engineer on the Amazon DataZone team. He has contributed to implementing AWS CloudFormation support for Amazon DataZone. He is passionate about distributed systems and dedicated to solving customers’ problems.

Refactoring to Serverless: From Application to Automation

2024-07-03 Sindhu Pillai

Post Syndicated from Sindhu Pillai original https://aws.amazon.com/blogs/devops/refactoring-to-serverless-from-application-to-automation/

Serverless technologies not only minimize the time that builders spend managing infrastructure, they also help builders reduce the amount of application code they need to write. Replacing application code with fully managed cloud services improves both the operational characteristics and the maintainability of your applications thanks to a cleaner separation between business logic and application topology. This blog post shows you how.

Serverless isn’t a runtime; it’s an architecture

Since the launch of AWS Lambda in 2014, serverless has evolved to be more than just a cloud runtime. The ability to easily deploy and scale individual functions, coupled with per-millisecond billing, has led to the evolution of modern application architectures from monoliths towards loosely-coupled applications. Functions typically communicate through events, an interaction model that’s supported by a combination of serverless integration services, such as Amazon EventBridge and Amazon SNS, and Lambda’s asynchronous invocation model.

Modern distributed architectures with independent runtime elements (like Lambda functions or containers) have a distinct topology graph that represents which elements talk to others. In the diagram below, Amazon API Gateway, Lambda, EventBridge, and Amazon SQS interact to process an order in a typical Order Processing System. The topology has a major influence on the application’s runtime characteristics like latency, throughput, or resilience.

The role of cloud automation evolves

Cloud automation languages, commonly referred to as IaC (Infrastructure as Code), date back to 2011 with the launch of CloudFormation, which allowed users to declare a set of cloud resources in configuration files instead of issuing a series of API calls or CLI commands. Initial document-oriented automation languages like AWS CloudFormation and Terraform were soon complemented by frameworks like AWS Cloud Development Kit (CDK), CDK for Terraform, and Pulumi that introduced the ability to write cloud automation code in popular general-purpose languages like TypeScript, Python, or Java.

The role of cloud automation evolved alongside serverless application architectures. Because serverless technologies free builders from having to manage infrastructure, there really isn’t any “I” in serverless IaC anymore. Instead, serverless cloud automation primarily defines the application’s topology by connecting Lambda functions with event sources or targets, which can be other Lambda functions. This approach more closely resembles “AaC” – Architecture as Code – as the automation now defines the application’s architecture instead of provisioning infrastructure elements.

Improving serverless applications with automation code

By utilizing AWS serverless runtime features, automation code can frequently achieve the same functionality as your application code.

For example, the Lambda function below, written in TypeScript, sends a message to EventBridge:

export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => { 
    const result = // some logic
    const eventParam = new PutEventsCommand({
        Entries: [
            {
              Detail: JSON.stringify(result),
              DetailType: 'OrderCreated',
              EventBusName: process.env.EVENTBUS_NAME,
            }
          ]
    });
    await eventBridgeClient.send(eventParam);     return {
       statusCode: 200,
       body: JSON.stringify({ message: 'Order created', result }),
    };
};

You can achieve the same behavior using AWS Lambda Destinations, which instructs the Lambda runtime to publish an event after the completion of the function. You can configure Lambda destinations via below AWS CDK code, also written in TypeScript:

import {EventBridgeDestination} from "aws-cdk-lib/aws-lambda-destinations"

const createOrderLambda = new Function(this,'createOrderLambda', {
    functionName: `OrderService`,
    runtime: Runtime.NODEJS_20_X,
    code: Code.fromAsset('lambda-fns/send-message-using-destination'),
    handler: 'OrderService.handler',
 onSuccess: new EventBridgeDestination(eventBus)
});

With the AWS CDK, you can use the same programming languages for both application and automation code, allowing you to switch easily between the two.

The Lambda function can now focus on the business logic and doesn’t contain any reference to message sending or EventBridge. This separation of concerns is a best practice because changes to the business logic do not run the risk of breaking the architecture and vice versa.

export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
    const result = //some logic
    return {
        statusCode: 200,
        body: JSON.stringify({ message: 'Order created', result }),
     };
};

Instructing the serverless Lambda runtime to send the event has several advantages over hand-coding it inside the application code

It decouples application logic from topology. The message destination, consisting of the type of the service (e.g., EventBridge vs. another Lambda Function) and the destination’s ARN, define the application’s architecture (or topology). Embedding message sending in the application code mixes architecture with business logic. Handling the sending of the message in the runtime separates concerns and avoids having to touch the application code for a topology change.
It makes the composition explicit. If application code sends a message, it will likely read the destination from an environment variable, which is passed to the Lambda function. The name of the variable that is used for this purpose is buried in the application code, forcing you to rely on naming conventions. Defining all dependencies between service instances in automation code keeps them in a central location, and allows you to use code analysis and refactoring tools to reason about your architecture or make changes to it.
It avoids simple mistakes. Redundant code can lead to mistakes. For example, debugging a Lambda function that accidentally swapped day and month in the message’s date field took hours. Letting the runtime send messages avoids such errors.
Higher-level constructs simplify permission grants. Cloud automation libraries like CDK allow the creation of higher-level constructs, which can combine multiple resources and include necessary IAM permissions. You’ll write less code and avoid debugging cycles.
The runtime is more robust. Delegating message sending to the serverless runtime takes care of any required retries, ensuring the message to be sent and freeing builders from having to write extra code for such undifferentiated heavy lifting.

In summary, letting the managed service handle message passing makes your serverless application cleaner and more robust. We also like to say that it becomes “serverless-native” because it fully utilizes the native services available to the application.

Refactoring to serverless-native

Shifting code from application to automation is what we call “Refactoring to Serverless”. Refactoring is a term popularized by Martin Fowler in the late 90s to describe the restructuring of source code to alter its structure without changing its external behavior. Code refactoring can be as simple as extracting code into a separate method or more sophisticated like replacing conditional expressions with polymorphism.

Developers refactor their code to improve its readability and maintainability. A common approach in Test-Driven Development (TDD) is the so-called red-green-refactor cycle: write a test, which will be red because the functionality isn’t implemented, then write the code to make the test green, and finally refactor to counteract the growing entropy in the codebase.

Serverless refactoring takes inspiration from this concept but augments it to the context of serverless automation:

Serverless refactoring: A controlled technique for improving the design of serverless applications by replacing application code with equivalent automation code.

Let’s explore how serverless refactoring can enhance the design and runtime characteristics of a serverless application. The diagram below shows an AWS Step Functions workflow that performs a quality check through image recognition. An early implementation, shown on the left, would use an intermediate AWS Lambda function to call the Amazon Rekognition service. Thanks to the launch of Step Functions’ AWS SDK service integrations in 2021, you can refactor the workflow to directly call the Rekognition API. This refactored design, seen on the right, eliminates the Lambda function (assuming it didn’t perform any additional tasks), thereby reducing costs and runtime complexity.

Replacing Lambda with Service Integration in Step Function workflow

See the AWS CDK implementation for this refactoring, in TypeScript, on GitHub.

Refactoring Limitations

The initial example of replacing application code to send a message to SQS via Lambda Destinations reveals that refactoring from application to automation code isn’t 100% behavior-preserving.

First, Lambda Destinations are only triggered when the function is invoked asynchronously. For synchronous invocations, the function passes the results back to the caller, and does not invoke the destination. Second, the serverless runtime wraps the data returned from the function inside a message envelope, affecting how the message recipient parses the JSON object. The message data is placed inside the responsePayload field if sending to another Lambda function or the detail field if sending to an EventBridge destination. Last, Lambda Destinations sends a message after the function completes, whereas application code could send the message at any point during the execution.

Lambda Destination Execution

The last change in behavior will be transparent to well-architected asynchronous applications because they won’t depend on the timing of message delivery. If a Lambda function continues processing after sending a message (for example, to EventBridge), that code can’t assume that the message has been processed because delivery is asynchronous. A rare exception could be a loop waiting for the results from the downstream message processing, but such loops violate the principles of asynchronous integration and also waste compute resources (Amazon Step Functions is a great choice for asynchronous callbacks). If such behavior is required, it can be achieved by splitting the Lambda function into two parts.

Can Serverless Refactoring be Automated?

Traditional code refactoring like “Extract Method” is automated thanks to built-in support by many code editors. Serverless refactoring isn’t (yet) a fully automatic, 100%-equivalent code transformation because it translates application code into automation code (or vice versa). While AI-powered tools like Amazon Q Developer are getting us closer to that vision, we consider serverless refactoring primarily as a design technique for developers to better utilize the AWS runtime. Improved code design and runtime characteristics outweigh behavior differences, especially if your application includes automated tests.

Incorporating refactoring into your team structures

If a single team owns both the application and the automation code, refactoring takes place inside the team. However, serverless refactoring can cross team boundaries when separate teams develop business logic versus managing the underlying infrastructure, configuration, and deployment.

In such a model, AWS recommends that the development team be responsible for both the application code and the application-specific automation, such as the CDK code to configure Lambda Destinations, Step Functions workflows, or EventBridge routing. Splitting application and application-specific automation across teams would make the development team dependent on the platform team for each refactoring and introduce unnecessary friction.

If both teams use the same Infrastructure-as-Code (IaC) tool, say AWS CDK, the platform team can build reusable templates and constructs that encapsulate organizational requirements and guardrails, such as CDK constructs for S3 buckets with encryption enabled. Development teams can easily consume those resources across CDK stacks.

However, teams could use different IaC tools, for example, the infrastructure team prefers CloudFormation but the development team prefers AWS CDK. In this setup, development teams can build their automation on top of the CFN Modules provided by the infrastructure team. However, they won’t benefit from the same high-level programming abstractions as they do with CDK.

Collaboration in a split-team model

Continuous Refactoring

Just like traditional code refactoring, refactoring to serverless isn’t a one-time activity but an essential aspect of your software delivery. Because adding functionality increases your application’s complexity, regular refactoring can help keep complexity at bay and maintain your development velocity. Like with Continuous Delivery, you can improve your software delivery with Continuous Refactoring.

Teams who encounter difficulties with serverless refactoring might be lacking automated test coverage or cloud automation. So, refactoring can become a useful forcing function for teams to exercise software delivery hygiene, for example by implementing automated tests.

Getting Started

The refactoring samples discussed here are a subset of an extensive catalog of open source code examples, which you can find along with AWS CDK implementation examples at refactoringserverless.com. You can also dive deeper into how serverless refactoring can make your application architecture more loosely coupled in a separate blog post.

Use the examples to accelerate your own refactoring effort. Now Go Refactor!

How Swisscom automated Amazon Redshift as part of their One Data Platform solution using AWS CDK – Part 2

2024-06-12 Asad Bin Imtiaz

Post Syndicated from Asad Bin Imtiaz original https://aws.amazon.com/blogs/big-data/how-swisscom-automated-amazon-redshift-as-part-of-their-one-data-platform-solution-using-aws-cdk-part-2/

In this series, we talk about Swisscom’s journey of automating Amazon Redshift provisioning as part of the Swisscom One Data Platform (ODP) solution using the AWS Cloud Development Kit (AWS CDK), and we provide code snippets and the other useful references.

In Part 1, we did a deep dive on provisioning a secure and compliant Redshift cluster using the AWS CDK and the best practices of secret rotation. We also explained how Swisscom used AWS CDK custom resources to automate the creation of dynamic user groups that are relevant for the AWS Identity and Access Management (IAM) roles matching different job functions.

In this post, we explore using the AWS CDK and some of the key topics for self-service usage of the provisioned Redshift cluster by end-users as well as other managed services and applications. These topics include federation with the Swisscom identity provider (IdP), JDBC connections, detective controls using AWS Config rules and remediation actions, cost optimization using the Redshift scheduler, and audit logging.

Scheduled actions

To optimize cost-efficiency for provisioned Redshift cluster deployments, Swisscom implemented a scheduling mechanism. This functionality is driven by the user configuration of the cluster, as described in Part 1 of this series, wherein the user may enable dynamic pausing and resuming of clusters based on specified cron expressions:

redshift_options:
...
  use_scheduler: true                                         # Whether to use Redshift scheduler
  scheduler_pause_cron: "cron(00 18 ? * MON-FRI *)"           # Cron expression for scheduler pause
  scheduler_resume_cron: "cron(00 08 ? * MON-FRI *)"          # Cron expression for scheduler resume
...

This feature allows Swisscom to reduce operational costs by suspending cluster activity during off-peak hours. This leads to significant cost savings by pausing and resuming clusters at appropriate times. The scheduling is achieved using the AWS CloudFormation action CfnScheduledAction. The following code illustrates how Swisscom implemented this scheduling:

if config.use_scheduler:
    cfn_scheduled_action_pause = aws_redshift.CfnScheduledAction(
        scope, "schedule-pause-action",
        # ...
        schedule=config.scheduler_pause_cron,
        # ...
        target_action=aws_redshift.CfnScheduledAction.ScheduledActionTypeProperty(
                         pause_cluster=aws_redshift.CfnScheduledAction.ResumeClusterMessageProperty(
                            cluster_identifier='cluster-identifier'
                         )
                      )
    )

    cfn_scheduled_action_resume = aws_redshift.CfnScheduledAction(
        scope, "schedule-resume-action",
        # ...
        schedule=config.scheduler_resume_cron,
        # ...
        target_action=aws_redshift.CfnScheduledAction.ScheduledActionTypeProperty(
                         resume_cluster=aws_redshift.CfnScheduledAction.ResumeClusterMessageProperty(
                            cluster_identifier='cluster-identifier'
                         )
                      )
    )

JDBC connections

The JDBC connectivity for Amazon Redshift clusters was also very flexible, adapting to user-defined subnet types and security groups in the configuration:

redshift_options:
...
  subnet_type: "routable-private"         # 'routable-private' OR 'non-routable-private'
  security_group_id: "sg-test_redshift"   # Security Group ID for Amazon Redshift (referenced group must exists in Account)
...

As illustrated in the ODP architecture diagram in Part 1 of this series, a considerable part of extract, transform, and load (ETL) processes is anticipated to operate outside of Amazon Redshift, within the serverless AWS Glue environment. Given this, Swisscom needed a mechanism for AWS Glue to connect to Amazon Redshift. This connectivity to Redshift clusters is provided through JDBC by creating an AWS Glue connection within the AWS CDK code. This connection allows ETL processes to interact with the Redshift cluster by establishing a JDBC connection. The subnet and security group defined in the user configuration guide the creation of JDBC connectivity. If no security groups are defined in the configuration, a default one is created. The connection is configured with details of the data product from which the Redshift cluster is being provisioned, like ETL user and default database, along with network elements like cluster endpoint, security group, and subnet to use, providing secure and efficient data transfer. The following code snippet demonstrates how this was achieved:

jdbc_connection = glue.Connection(
    scope, "redshift-glue-connection",
    type=ConnectionType("JDBC"),
    connection_name="redshift-glue-connection",
    subnet=connection_subnet,
    security_groups=connection_security_groups,
    properties={
        "JDBC_CONNECTION_URL": f"jdbc:redshift://{cluster_endpoint}/{database_name}",
        "USERNAME": etl_user.username,
        "PASSWORD": etl_user.password.to_string(),
        "redshiftTmpDir": f"s3://{data_product_name}-redshift-work"
    }
)

By doing this, Swisscom made sure that serverless ETL workflows in AWS Glue can securely communicate with newly provisioned Redshift cluster running within a secured virtual private cloud (VPC).

Identity federation

Identity federation allows a centralized system (the IdP) to be used for authenticating users in order to access a service provider like Amazon Redshift. A more general overview of the topic can be found in Identity Federation in AWS.

Identity federation not only enhances security due to its centralized user lifecycle management and centralized authentication mechanism (for example, supporting multi-factor authentication), but also improves the user experience and reduces the overall complexity of identity and access management and thereby also its governance.

In Swisscom’s setup, Microsoft Active Directory Services are used for identity and access management. At the initial build stages of ODP, Amazon Redshift offered two different options for identity federation:

IAM-based SAML 2.0 IdP federation, as outlined in Federate Amazon Redshift access with Microsoft Azure AD single sign-on. See also Using IAM authentication to generate database user credentials.
Native IdP federation, as outlined in Integrate Amazon Redshift native IdP federation with Microsoft Azure AD using a SQL client. See also Native identity provider (IdP) federation for Amazon Redshift.

In Swisscom’s context, during the initial implementation, Swisscom opted for IAM-based SAML 2.0 IdP federation because this is a more general approach, which can also be used for other AWS services, such as Amazon QuickSight (see Setting up IdP federation using IAM and QuickSight).

At 2023 AWS re:Invent, AWS announced a new connection option to Amazon Redshift based on AWS IAM Identity Center. IAM Identity Center provides a single place for workforce identities in AWS, allowing the creation of users and groups directly within itself or by federation with standard IdPs like Okta, PingOne, Microsoft Entra ID (Azure AD), or any IdP that supports SAML 2.0 and SCIM. It also provides a single sign-on (SSO) experience for Redshift features and other analytics services such as Amazon Redshift Query Editor V2 (see Integrate Identity Provider (IdP) with Amazon Redshift Query Editor V2 using AWS IAM Identity Center for seamless Single Sign-On), QuickSight, and AWS Lake Formation. Moreover, a single IAM Identity Center instance can be shared with multiple Redshift clusters and workgroups with a simple auto-discovery and connect capability. It makes sure all Redshift clusters and workgroups have a consistent view of users, their attributes, and groups. This whole setup fits well with ODP’s vision of providing self-service analytics across the Swisscom workforce with necessary security controls in place. At the time of writing, Swisscom is actively working towards using IAM Identity Center as the standard federation solution for ODP. The following diagram illustrates the high-level architecture for the work in progress.

Audit logging

Amazon Redshift audit logging is useful for auditing for security purposes, monitoring, and troubleshooting. The logging provides information, such as the IP address of the user’s computer, the type of authentication used by the user, or the timestamp of the request. Amazon Redshift logs the SQL operations, including connection attempts, queries, and changes, and makes it straightforward to track the changes. These logs can be accessed through SQL queries against system tables, saved to a secure Amazon Simple Storage Service (Amazon S3) location, or exported to Amazon CloudWatch.

Amazon Redshift logs information in the following log files:

Connection log – Provides information to monitor users connecting to the database and related connection information like their IP address.
User log – Logs information about changes to database user definitions.
User activity log – Tracks information about the types of queries that both the users and the system perform in the database. It’s useful primarily for troubleshooting purposes.

With the ODP solution, Swisscom wanted to write all the Amazon Redshift logs to CloudWatch. This is currently not directly supported by the AWS CDK, so Swisscom implemented a workaround solution using the AWS CDK custom resources option, which invokes the SDK on the Redshift action enableLogging. See the following code:

    custom_resources.AwsCustomResource(self, f"{self.cluster_identifier}-custom-sdk-logging",
           on_update=custom_resources.AwsSdkCall(
               service="Redshift",
               action="enableLogging",
               parameters={
                   "ClusterIdentifier": self.cluster_identifier,
                   "LogDestinationType": "cloudwatch",
                   "LogExports": ["connectionlog","userlog","useractivitylog"],
               },
               physical_resource_id=custom_resources.PhysicalResourceId.of(
                   f"{self.account}-{self.region}-{self.cluster_identifier}-logging")
           ),
           policy=custom_resources.AwsCustomResourcePolicy.from_sdk_calls(
               resources=[f"arn:aws:redshift:{self.region}:{self.account}:cluster:{self.cluster_identifier}"]
           )
        )

AWS Config rules and remediation

After a Redshift cluster has been deployed, Swisscom needed to make sure that the cluster meets the governance rules defined in every point in time after creation. For that, Swisscom decided to use AWS Config.

AWS Config provides a detailed view of the configuration of AWS resources in your AWS account. This includes how the resources are related to one another and how they were configured in the past so you can see how the configurations and relationships change over time.

An AWS resource is an entity you can work with in AWS, such as an Amazon Elastic Compute Cloud (Amazon EC2) instance, Amazon Elastic Block Store (Amazon EBS) volume, security group, or Amazon VPC.

The following diagram illustrates the process Swisscom implemented.

If an AWS Config rule isn’t compliant, a remediation can be applied. Swisscom defined the pause cluster action as default in case of a non-compliant cluster (based on your requirements, other remediation actions are possible). This is covered using an AWS Systems Manager automation document (SSM document).

Automation, a capability of Systems Manager, simplifies common maintenance, deployment, and remediation tasks for AWS services like Amazon EC2, Amazon Relational Database Service (Amazon RDS), Amazon Redshift, Amazon S3, and many more.

The SSM document is based on the AWS document AWSConfigRemediation-DeleteRedshiftCluster. It looks like the following code:

description: | 
  ### Document name - PauseRedshiftCluster-WithCheck 

  ## What does this document do? 
  This document pauses the given Amazon Redshift cluster using the [PauseCluster](https://docs.aws.amazon.com/redshift/latest/APIReference/API_PauseCluster.html) API. 

  ## Input Parameters 
  * AutomationAssumeRole: (Required) The ARN of the role that allows Automation to perform the actions on your behalf. 
  * ClusterIdentifier: (Required) The identifier of the Amazon Redshift Cluster. 

  ## Output Parameters 
  * PauseRedshiftClusterWithoutSnapShot.Response: The standard HTTP response from the PauseCluster API. 
  * PauseRedshiftClusterWithSnapShot.Response: The standard HTTP response from the PauseCluster API. 
schemaVersion: '0.3' 
assumeRole: '{{ AutomationAssumeRole }}' 
parameters: 
  AutomationAssumeRole: 
    type: String 
    description: (Required) The ARN of the role that allows Automation to perform the actions on your behalf. 
    allowedPattern: '^arn:aws[a-z0-9-]*:iam::\d{12}:role\/[\w-\/.@+=,]{1,1017}$' 
  ClusterIdentifier: 
    type: String 
    description: (Required) The identifier of the Amazon Redshift Cluster. 
    allowedPattern: '[a-z]{1}[a-z0-9_.-]{0,62}' 
mainSteps: 
  - name: GetRedshiftClusterStatus 
    action: 'aws:executeAwsApi' 
    inputs: 
      ClusterIdentifier: '{{ ClusterIdentifier }}' 
      Service: redshift 
      Api: DescribeClusters 
    description: |- 
      ## GetRedshiftClusterStatus 
      Gets the status for the given Amazon Redshift Cluster. 
    outputs: 
      - Name: ClusterStatus 
        Selector: '$.Clusters[0].ClusterStatus' 
        Type: String 
    timeoutSeconds: 600 
  - name: Condition 
    action: 'aws:branch' 
    inputs: 
      Choices: 
        - NextStep: PauseRedshiftCluster 
          Variable: '{{ GetRedshiftClusterStatus.ClusterStatus }}' 
          StringEquals: available 
      Default: Finish 
  - name: PauseRedshiftCluster 
    action: 'aws:executeAwsApi' 
    description: | 
      ## PauseRedshiftCluster 
      Makes PauseCluster API call using Amazon Redshift Cluster identifier and pauses the cluster without taking any final snapshot. 
      ## Outputs 
      * Response: The standard HTTP response from the PauseCluster API. 
    timeoutSeconds: 600 
    isEnd: false 
    nextStep: VerifyRedshiftClusterPause 
    inputs: 
      Service: redshift 
      Api: PauseCluster 
      ClusterIdentifier: '{{ ClusterIdentifier }}' 
    outputs: 
      - Name: Response 
        Selector: $ 
        Type: StringMap 
  - name: VerifyRedshiftClusterPause 
    action: 'aws:assertAwsResourceProperty' 
    timeoutSeconds: 600 
    isEnd: true 
    description: | 
      ## VerifyRedshiftClusterPause 
      Verifies the given Amazon Redshift Cluster is paused. 
    inputs: 
      Service: redshift 
      Api: DescribeClusters 
      ClusterIdentifier: '{{ ClusterIdentifier }}' 
      PropertySelector: '$.Clusters[0].ClusterStatus' 
      DesiredValues: 
        - pausing 
  - name: Finish 
    action: 'aws:sleep' 
    inputs: 
      Duration: PT1S 
    isEnd: true

The SSM automations document is deployed with the AWS CDK:

from aws_cdk import aws_ssm as ssm  

ssm_document_content = #read yaml document as dict  

document_id = 'automation_id'   
document_name = 'automation_name' 

document = ssm.CfnDocument(scope, id=document_id, content=ssm_document_content,  
                           document_format="YAML", document_type='Automation', name=document_name) 

To run the automation document, AWS Config needs the right permissions. You can create an IAM role for this purpose:

from aws_cdk import iam 

#Create role for the automation 
role_name = 'role-to-pause-redshift'
automation_role = iam.Role(scope, 'role-to-pause-redshift-cluster', 
                           assumed_by=iam.ServicePrincipal('ssm.amazonaws.com'), 
                           role_name=role_name) 

automation_policy = iam.Policy(scope, "policy-to-pause-cluster", 
                               policy_name='policy-to-pause-cluster', 
                               statements=[ 
                                   iam.PolicyStatement( 
                                       effect=iam.Effect.ALLOW, 
                                       actions=['redshift:PauseCluster', 
                                                'redshift:DescribeClusters'], 
                                       resources=['*'] 
                                   ) 
                               ]) 

automation_role.attach_inline_policy(automation_policy)

Swisscom defined the rules to be applied following AWS best practices (see Security Best Practices for Amazon Redshift). These are deployed as AWS Config conformance packs. A conformance pack is a collection of AWS Config rules and remediation actions that can be quickly deployed as a single entity in an AWS account and AWS Region or across an organization in AWS Organizations.

Conformance packs are created by authoring YAML templates that contain the list of AWS Config managed or custom rules and remediation actions. You can also use SSM documents to store your conformance pack templates on AWS and directly deploy conformance packs using SSM document names.

This AWS conformance pack can be deployed using the AWS CDK:

from aws_cdk import aws_config  
  
conformance_pack_template = # read yaml file as str 
conformance_pack_content = # substitute `role_arn_for_substitution` and `document_for_substitution` in conformance_pack_template

conformance_pack_id = 'conformance-pack-id' 
conformance_pack_name = 'conformance-pack-name' 


conformance_pack = aws_config.CfnConformancePack(scope, id=conformance_pack_id, 
                                                 conformance_pack_name=conformance_pack_name, 
                                                 template_body=conformance_pack_content)

Conclusion

Swisscom is building its next-generation data-as-a-service platform through a combination of automated provisioning processes, advanced security features, and user-configurable options to cater for diverse data handling and data products’ needs. The integration of the Amazon Redshift construct in the ODP framework is a significant stride in Swisscom’s journey towards a more connected and data-driven enterprise landscape.

In Part 1 of this series, we demonstrated how to provision a secure and compliant Redshift cluster using the AWS CDK as well as how to deal with the best practices of secret rotation. We also showed how to use AWS CDK custom resources in automating the creation of dynamic user groups that are relevant for the IAM roles matching different job functions.

In this post, we showed, through the usage of the AWS CDK, how to address key Redshift cluster usage topics such as federation with the Swisscom IdP, JDBC connections, detective controls using AWS Config rules and remediation actions, cost optimization using the Redshift scheduler, and audit logging.

The code snippets in this post are provided as is and will need to be adapted to your specific use cases. Before you get started, we highly recommend speaking to an Amazon Redshift specialist.

About the Authors

Asad bin Imtiaz is an Expert Data Engineer at Swisscom, with over 17 years of experience in architecting and implementing enterprise-level data solutions.

Jesús Montelongo Hernández is an Expert Cloud Data Engineer at Swisscom. He has over 20 years of experience in IT systems, data warehousing, and data engineering.

Samuel Bucheli is a Lead Cloud Architect at Zühlke Engineering AG. He has over 20 years of experience in software engineering, software architecture, and cloud architecture.

Srikanth Potu is a Senior Consultant in EMEA, part of the Professional Services organization at Amazon Web Services. He has over 25 years of experience in Enterprise data architecture, databases and data warehousing.

How Swisscom automated Amazon Redshift as part of their One Data Platform solution using AWS CDK – Part 1

2024-06-12 Asad Bin Imtiaz

Post Syndicated from Asad Bin Imtiaz original https://aws.amazon.com/blogs/big-data/how-swisscom-automated-amazon-redshift-as-part-of-their-one-data-platform-solution-using-aws-cdk-part-1/

Swisscom is a leading telecommunications provider in Switzerland. Swisscom’s Data, Analytics, and AI division is building a One Data Platform (ODP) solution that will enable every Swisscom employee, process, and product to benefit from the massive value of Swisscom’s data.

In a two-part series, we talk about Swisscom’s journey of automating Amazon Redshift provisioning as part of the Swisscom ODP solution using the AWS Cloud Development Kit (AWS CDK), and we provide code snippets and the other useful references.

In this post, we deep dive into provisioning a secure and compliant Redshift cluster using the AWS CDK and discuss the best practices of secret rotation. We also explain how Swisscom used AWS CDK custom resources in automating the creation of dynamic user groups that are relevant for the AWS Identity and Access management (IAM) roles matching different job functions.

In Part 2 of this series, we explore using the AWS CDK and some of the key topics for self-service usage of the provisioned Redshift cluster by end-users as well as other managed services and applications. These topics include federation with the Swisscom identity provider (IdP), JDBC connections, detective controls using AWS Config rules and remediation actions, cost optimization using the Redshift scheduler, and audit logging.

Amazon Redshift is a fast, scalable, secure, fully managed, and petabyte scale data warehousing service empowering organizations and users to analyze massive volumes of data using standard SQL tools. Amazon Redshift benefits from seamless integration with many AWS services, such as Amazon Simple Storage Service (Amazon S3), AWS Key Management Service (AWS KMS), IAM, and AWS Lake Formation, to name a few.

The AWS CDK helps you build reliable, scalable, and cost-effective applications in the cloud with the considerable expressive power of a programming language. The AWS CDK supports TypeScript, JavaScript, Python, Java, C#/.Net, and Go. Developers can use one of these supported programming languages to define reusable cloud components known as constructs. A data product owner in Swisscom can use the ODP AWS CDK libraries with a simple config file to provision ready-to-use infrastructure, such as S3 buckets; AWS Glue ETL (extract, transform, and load) jobs, Data Catalog databases, and crawlers; Redshift clusters; JDBC connections; and more, with all the needed permissions in just a few minutes.

One Data Platform

The ODP architecture is based on the AWS Well Architected Framework Analytics Lens and follows the pattern of having raw, standardized, conformed, and enriched layers as described in Modern data architecture. By using infrastructure as code (IaC) tools, ODP enables self-service data access with unified data management, metadata management (data catalog), and standard interfaces for analytics tools with a high degree of automation by providing the infrastructure, integrations, and compliance measures out of the box. At the same time, the ODP will also be continuously evolving and adapting to the constant stream of new additional features being added to the AWS analytics services. The following high-level architecture diagram shows ODP with different layers of the modern data architecture. In this series, we specifically discuss the components specific to Amazon Redshift (highlighted in red).

Harnessing Amazon Redshift for ODP

A pivotal decision in the data warehousing migration process involves evaluating the extent of a lift-and-shift approach vs. re-architecture. Balancing system performance, scalability, and cost while taking into account the rigid system pieces requires a strategic solution. In this context, Amazon Redshift has stood out as a cloud-centered data warehousing solution, especially with its straightforward and seamless integration into the modern data architecture. Its straightforward integration and fluid compatibility with AWS services like Amazon QuickSight, Amazon SageMaker, and Lake Formation further solidifies its choice for forward-thinking data warehousing strategies. As a columnar database, it’s particularly well suited for consumer-oriented data products. Consequently, Swisscom chose to provide a solution wherein use case-specific Redshift clusters are provisioned using IaC, specifically using the AWS CDK.

A crucial aspect of Swisscom’s strategy is the integration of these data domain and use case-oriented individual clusters into a virtually single and unified data environment, making sure that data ingestion, transformation, and eventual data product sharing remains convenient and seamless. This is achieved by custom provisioning of the Redshift clusters based on user or use case needs, in a shared virtual private cloud (VPC), with data and system governance policies and remediation, IdP federation, and Lake Formation integration already in place.

Although many controls for governance and security were put in place in the AWS CDK construct, Swisscom users also have the flexibility to customize their clusters based on what they need. The cluster configurator allows users to define the cluster characteristics based on individual use case requirements while remaining within the bounds of defined best practices. The key configurable parameters include node types, sizing, subnet types for routing based on different security policies per user case, enabling scheduler, integration with IdP setup, and any additional post-provisioning setup, like the creation of specific schemas and group-level access on it. This flexibility in configuration is achieved for the Amazon Redshift AWS CDK construct through a Python data class, which serves as a template for users to specify aspects like subnet types, scheduler cron expressions, and specific security groups for the cluster, among other configurations. Users are also able to select the type of subnets (routable-private or non-routable-private) to adhere to network security policies and architectural standards. See the following data class options:

class RedShiftOptions:
    node_type: NodeType
    number_of_nodes: int
    vpc_id: str
    security_group_id: Optional[str]
    subnet_type: SubnetType
    use_redshift_scheduler: bool
    scheduler_pause_cron: str
    scheduler_resume_cron: str
    maintenance_window: str
    # Additional configuration options ...

The separation of configuration in the RedShiftOptions data class from the cluster provisioning logic in the RedShiftCluster AWS CDK construct is in line with AWS CDK best practices, wherein both constructs and stacks should accept a property object to allow for full configurability completely in code. This separates the concerns of configuration and resource creation, enhancing the readability and maintainability. The data class structure reflects the user configuration from a configuration file, making it straightforward for users to specify their requirements. The following code shows what the configuration file for the Redshift construct looks like:

# ===============================
# Amazon Redshift Options
# ===============================
# The enriched layer is based on Amazon Redshift.
# This section has properties for Amazon Redshift.
#
redshift_options:
  provision_cluster: true                                     # Skip provisioning Amazon Redshift in enriched layer (required)
  number_of_nodes: 2                                          # Number of nodes for redshift cluster to provision (optional) (default = 2)
  node_type: "ra3.xlplus"                                     # Type of the cluster nodes (optional) (default = "ra3.xlplus")
  use_scheduler: true                                        # Whether to use the Amazon Redshift scheduler (optional)
  scheduler_pause_cron: "cron(00 18 ? * MON-FRI *)"           # Cron expression for scheduler pause (optional)
  scheduler_resume_cron: "cron(00 08 ? * MON-FRI *)"          # Cron expression for scheduler resume (optional)
  maintenance_window: "sun:23:45-mon:00:15"                   # Maintenance window for Amazon Redshift (optional)
  subnet_type: "routable-private"                             # 'routable-private' OR 'non-routable-private' (optional)
  security_group_id: "sg-test-redshift"                       # Security group ID for Amazon Redshift (optional) (reference must exist)
  user_groups:                                                # User groups and their privileges on default DB
    - group_name: dba
      access: [ 'ALL' ]
    - group_name: data_engineer
      access: [ 'SELECT' , 'INSERT' , 'UPDATE' , 'DELETE' , 'TRUNCATE' ]
    - group_name: qa_engineer
      access: [ 'SELECT' ]
  integrate_all_groups_with_idp: false

Admin user secret rotation

As part of the cluster deployment, an admin user is created with its credentials stored in AWS Secrets Manager for database management. This admin user is used for automating several setup operations, such as the setup of database schemas and integration with Lake Formation. For the admin user, as well as other users created for Amazon Redshift, Swisscom used AWS KMS for encryption of the secrets associated with cluster users. The use of Secrets Manager made it simple to adhere to IAM security best practices by supporting the automatic rotation of credentials. Such a setup can be quickly implemented on the AWS Management Console or may be integrated in AWS CDK code with friendly methods in the aws_redshift_alpha module. This module provides higher-level constructs (specifically, Layer 2 constructs), including convenience and helper methods, as well as sensible default values. This module is experimental and under active development and may have changes that aren’t backward compatible. See the following admin user code:

admin_secret_kms_key_options = KmsKeyOptions(
    ...
    key_name='redshift-admin-secret',
    service="secretsmanager"
)
admin_secret_kms_key = aws_kms.Key(
    scope, 'AdminSecretKmsKey,
    # ...
)

# ...

cluster = aws_redshift_alpha.Cluster(
            scope, cluster_identifier,
            # ...
            master_user=aws_redshift_alpha.Login(
                master_username='admin',
                encryption_key=admin_secret_kms_key
                ),
            default_database_name=database_name,
            # ...
        )

See the following code for secret rotation:

self.cluster.add_rotation_single_user(aws_cdk.Duration.days(60))

Methods such as add_rotation_single_user internally rely on a serverless application hosted in the AWS Serverless Application Model repository, which may be in a different AWS Region outside of the organization’s permission boundary. To effectively use such functions, make sure access to this serverless repository within the organization’s service control policies. If the access is not feasible, consider implementing solutions such as custom AWS Lambda functions replicating these functionalities (within your organization’s permission boundary).

AWS CDK custom resource

A key challenge Swisscom faced was automating the creation of dynamic user groups tied to specific IAM roles at deployment time. As an initial and simple solution, Swisscom’s approach was creating an AWS CDK custom resource using the admin user to submit and run SQL statements. This allowed Swisscom to embed the logic for the database schema, user group assignments, and Lake Formation-specific configurations directly within AWS CDK code, making sure that these crucial steps are automatically handled during cluster deployment. See the following code:

sql = get_rendered_stacked_sqls()

custom_resources.AwsCustomResource(scope, 'RedshiftSQLCustomResource',
                                           on_update=custom_resources.AwsSdkCall(
                                               service='RedshiftData',
                                               action='executeStatement',
                                               parameters={
                                                   'ClusterIdentifier': cluster_identifier,
                                                   'SecretArn': secret_arn,
                                                   'Database': database_name,
                                                   'Sql': f'{sqls}',
                                               },
                                               physical_resource_id=custom_resources.PhysicalResourceId.of(
                                                   f'{account}-{region}-{cluster_identifier}-groups')
                                           ),
                                           policy=custom_resources.AwsCustomResourcePolicy.from_sdk_calls(
                                               resources=[f'arn:aws:redshift:{region}:{account}:cluster:{cluster_identifier}']
                                           )
                                        )


cluster.secret.grant_read(groups_cr)

This method of dynamic SQL, embedded within the AWS CDK code, provides a unified deployment and post-setup of the Redshift cluster in a convenient manner. Although this approach unifies the deployment and post-provisioning configuration with SQL-based operations, it remains an initial strategy. It is tailored for convenience and efficiency in the current context. As ODP further evolves, Swisscom will iterate this solution to streamline SQL operations during cluster provisioning. Swisscom remains open to integrating external schema management tools or similar approaches where they add value.

Another aspect of Swisscom’s architecture is the dynamic creation of IAM roles tailored for the user groups for different job functions within the Amazon Redshift environment. This IAM role generation is also driven by the user configuration, acting as a blueprint for dynamically defining user role to policy mappings. This allowed them to quickly adapt to evolving requirements. The following code illustrates the role assignment:

policy_mappings = {
    "role1": ["Policy1", "Policy2"],
    "role2": ["Policy3", "Policy4"],
    ...
    # Example:
    # "dba-role": ["AmazonRedshiftFullAccess", "CloudWatchFullAccess"],
    # ...
}

def create_redshift_role(role_name, policy_list):
   # Implementation to create Redshift role with provided policies
   ...

redshift_role_1 = create_redshift_role(
    data_product_name, "role1", policy_names=policy_mappings["role1"])
redshift_role_1 = create_redshift_role(
    data_product_name, "role1", policy_names=policy_mappings["role1"])
# Example:
# redshift_dba_role = create_redshift_role(
#   data_product_name, "dba-role", policy_names=policy_mappings["dba-role"])
...

Conclusion

Swisscom is building its data-as-a-service platform, and Amazon Redshift has a crucial role as part of the solution. In this post, we discussed the aspects that need to be covered in your IaC best practices to deploy secure and maintainable Redshift clusters using the AWS CDK. Although Amazon Redshift supports industry-leading security, there are aspects organizations need to adjust to their specific requirements. It is therefore important to define the configurations and best practices that are right for your organization and bring it to your IaC to make it available for your end consumers.

We also discussed how to provision a secure and compliant Redshift cluster using the AWS CDK and deal with the best practices of secret rotation. We also showed how to use AWS CDK custom resources in automating the creation of dynamic user groups that are relevant for the IAM roles matching different job functions.

In Part 2 of this series, we will delve into enhancing self-service capabilities for end-users. We will cover topics like integration with the Swisscom IdP, setting up JDBC connections, and implementing detective controls and remediation actions, among others.

The code snippets in this post are provided as is and will need to be adapted to your specific use cases. Before you get started, we highly recommend speaking to an Amazon Redshift specialist.

About the Authors

Asad bin Imtiaz is an Expert Data Engineer at Swisscom, with over 17 years of experience in architecting and implementing enterprise-level data solutions.

Jesús Montelongo Hernández is an Expert Cloud Data Engineer at Swisscom. He has over 20 years of experience in IT systems, data warehousing, and data engineering.

Samuel Bucheli is a Lead Cloud Architect at Zühlke Engineering AG. He has over 20 years of experience in software engineering, software architecture, and cloud architecture.

Driving Development Forward: How the PGA TOUR speeds up Development with the AWS CDK

2024-03-17 Evgeny Karasik

Post Syndicated from Evgeny Karasik original https://aws.amazon.com/blogs/devops/driving-development-forward-how-the-pga-tour-speeds-up-development-with-the-aws-cdk/

This post is written by Jeff Kammerer, Senior Solutions Architect.

The PGA TOUR is the world’s premier membership organization for touring professional golfers, co-sanctioning tournaments on the PGA TOUR along with several other developmental, senior, and international tournament series.

The PGA TOUR is passionate about bringing its fans closer to the players, tournaments, and courses. They developed a new mobile app and the PGATOUR.com website to give fans immersive, enhanced, and personalized access to near-real-time leaderboards, shot-by-shot data, video highlights, sports news, statistics, and 3D shot tracking. It is critical for PGA TOUR, which operates in a highly competitive space, to keep up with fans’ demands and deliver engaging content. The maturing DevOps culture, partnered with accelerating the development process, was crucial to the PGA TOUR’s fan engagement transformation.

The PGA TOUR’s fans want near real-time and highly accurate data. To deliver and evolve engaging fan experiences, the PGA TOUR needed to empower their team of developers to quickly release new updates and features. However, the TOUR’s previous architecture required separate code bases for their website and mobile app in a monolithic technology stack. Each update required changes in both code bases, causing feature turnaround time of a minimum of two weeks. The cost and time required to deliver features fans wanted to see in both the app and website were not sustainable. As a result, the TOUR redesigned their mobile app and website using AWS native services and a microservice based architecture to alleviate these pain points.

Accelerating Development with Infrastructure as Code (IaC)

The TOUR’s cloud infrastructure team used AWS CloudFormation for several years to model, provision, and manage their cloud infrastructure. However, the app and web development team within the PGA Tour were not familiar with and did not want to use the JSON and YAML templates that CloudFormation requires, and preferred the coding languages that the AWS Cloud Development Kit (CDK) supports. The developers use TypeScript to develop the new mobile app and website using services like AWS AppSync, AWS Lambda, AWS Step Functions, and AWS Batch. Additionally, the PGA TOUR wanted to simplify how they assigned the correct and minimal IAM permissions needed. As a result, the TOUR developers started using the CDK for IaC because it offered a natural extension to how they were already writing code.

The TOUR leverages all three layers of the AWS CDK Construct Library. They take advantage of higher-layer Pattern Constructs for key services like AWS Lambda and AWS Elastic Container Service (Amazon ECS). The CDK pattern constructs provide a reference architecture or design patterns intended to help complete common tasks. The pattern constructs for AWS Lambda, Amazon ECS, and existing patterns saved the TOUR hours and weeks of development time. They also use the lower-level Layer 2 and Layer 1 Constructs for services like Amazon DynamoDB and AWS AppSync.

PGA TOUR’s New Mobile App

Figure1. Welcome to the PGA TOUR’s New App

PGA TOUR Benefits From Using AWS CDK

Using AWS CDK enabled and empowered the platform and development teams and changed how the PGA TOUR operates their technical environments. They create and de-provision environments as needed to build, test, and deploy new features into production. Automating changes in their underlying infrastructure has become very easy for the PGA TOUR. As an example, the TOUR wanted to update their Lambda runtimes to release 18. With AWS CDK, this change was implemented with a single-line change in their Lambda Common stack and pushed to the over 300 Lambda functions they deployed.

The CDK provides flexibility and agility, which helps the TOUR manage constant change in the appearance of their mobile app and website content given they run different tournaments each week. The TOUR uses the CDK to provision parallel environments where they prepare for the next tournament with unique functions and content without risking impact to services during the current tournament. Once the current tournament is complete, they can flip to the new stack and de-provision the old. The CDK has allowed the TOUR to move from a bi-weekly three-hour maintenance window release schedule to multiple as needed releases per day that take approximately 7 minutes. It has enabled the TOUR to push production releases and fixes, even in the middle of tournament play which previously had been deemed too risky under the prior monolithic technology stack. In one case, the TOUR developers could go from identifying a bug to coding a fix with push through User Acceptance Testing (UAT) and into production in 42 minutes. This is a process that was previously measured in hours or days.

High level AWS CDK/App Architecture

Figure2. High level AWS CDK/App Architecture

Expressing the organizational capability change AWS CDK facilitates for the PGA TOUR Digital team in context of the widely accepted DevOps Research & Assessment (DORA) metrics which assesses organizational maturity in DevOps:

DORA Metrics

One of the best benefits the TOUR realized using AWS CDK was how much it helped reduce complexity of managing AWS Identity and Access Management (IAM) permissions. The TOUR understands how important it is to maintain granular control of IAM trust policies, especially when working in a serverless architecture. David Provan, shared “AWS CDK encourages security by design and you end up considering security through the entire project rather than coming back to do security hardening after development”. AWS CDK automates the necessary IAM permissions at an atomic level in a manner where they are set and managed correctly. When the PGA TOUR takes resources down, AWS CDK removes the IAM permissions.

Lessons Learned and Looking Forward

The steepest learning curve for the PGA TOUR was in the granularity of their CDK Stacks. They initially started with a single large stack, but found that breaking the application into smaller stacks allowed them to be more surgical with granular deployments and updates. They found some services like AWS Lambda update very quickly, whereas DynamoDB deployed with global tables across multiple regions takes longer and benefit from being in their own stack. This balance is something the TOUR is still working on as they iterate after the initial launch.

Looking forward, the PGA TOUR sees longer-range benefits where the CDK will allow them to reuse their stacks and accelerate development for other departments or entities in the future. They also see benefit for reusing code and patterns across different workloads entirely.

Conclusion

The AWS Cloud Development Kit has been transformational to how the PGA TOUR is deploying their services on AWS and working to bring exciting and immersive experiences to fans. To learn more, review the AWS CDK Developer Guide to read about best practices for developing cloud applications, and review this blog that provides an overview of Working with the AWS Cloud Development kit and AWS Construct Library. Also, explore what CDK can do for you.

Import entire applications into AWS CloudFormation

2024-02-02 Dan Blanco

Post Syndicated from Dan Blanco original https://aws.amazon.com/blogs/devops/import-entire-applications-into-aws-cloudformation/

AWS Infrastructure as Code (IaC) enables customers to manage, model, and provision infrastructure at scale. You can declare your infrastructure as code in YAML or JSON by using AWS CloudFormation, in a general purpose programming language using the AWS Cloud Development Kit (CDK), or visually using Application Composer. IaC configurations can then be audited and version controlled in a version control system of your choice. Finally, deploying AWS IaC enables deployment previews using change sets, automated rollbacks, proactive enforcement of resource compliance using hooks, and more. Millions of customers enjoy the safety and reliability of AWS IaC products.

Not every resource starts in IaC, however. Customers create non-IaC resources for various reasons: they didn’t know about IaC, or they prefer to work in the CLI or management console. In 2019, we introduced the ability to import existing resources into CloudFormation. While this feature proved integral for bringing resources into IaC on an individual basis, the process of manually creating templates to match those resources wasn’t ideal. Customers were required to look up documentation on resources and painstakingly copy values manually. Customers also told us they traditionally engaged with applications (that is, groupings of related resources), so dealing with individual resources didn’t match that experience. We set out to create a more holistic flow for managing resources and their relations.

Recently, we announced the IaC generator and CDK Migrate, an end-to-end experience that enables customers to create an IaC configuration based off a resource as well as its relationships. This works by scanning an AWS account and using the CloudFormation resource type schema to find relationships between resources. Once this configuration is created, you can use it to either import those resources into an existing stack, or create a brand new stack from scratch. It’s now possible to bring entire applications into a managed CloudFormation stack without having to recreate any resources!

In this post, I’ll explore a common use case we’ve seen and expect the IaC generator to solve: an existing network architecture, created outside of any IaC tool, needs to be managed by CloudFormation.

IaC generator in Action

Consider the following scenario:

As a new hire to an organization that’s just starting its cloud adoption journey, you’ve been tasked with continuing the development of the team’s shared Amazon Virtual Private Cloud (VPC) resources. These are actively in use by the development teams. As you dig around, you find out that these resources were created without any form of IaC. There’s no documentation, and the person who set it up is no longer with the team. Confounding the problem, you have multiple VPCs and their related resources, such subnets, route tables, and internet gateways.

You understand the benefits of IaC – repeatability, reliability, auditability, and safety. Bringing these resources under CloudFormation management will extend these benefits to your existing resources. You’ve imported resources into CloudFormation before, so you set about the task of finding all related resources manually to create a template. You quickly discover, however, that this won’t be a simple task. VPCs don’t store relations to items; instead, relations are reversed – items know which VPC they belong to, but VPCs don’t know which items belong to them. In order to find all the resources that are related to a VPC, you’ll have to manually go through all the VPC-related resources and scan to see which vpc-id they belong to. You’ll have to be diligent, as it’s very easy to miss a resource because you weren’t aware that it existed or it may even be different class of resource altogether! For example, some resources may use an elastic network interface (ENI) to attach to the VPC, like an Amazon Relational Database Service instance.

You, however, recently learned about the IaC generator. The generator works by running a scan of your account and creating an up-to-date inventory of resources. CloudFormation will then leverage the resource type schema to find relationships between resources. For example, it can determine that a subnet has a relationship to a VPC via a vpc-id property. Once these relationships have been determined, you can then select the top-level resources you want to generate a template for. Finally, you’ll be able to leverage the wizard to create a stack from this existing template.

You can navigate to the IaC generator page in the Amazon Management Console and start a scan on your account. Scans last for 30 days, and you can run three scans per day in an account.

Scan account button and status

Once the scan completes, you create a template by selecting the Create Template button. After selecting Start from a new template, you fill out the relevant details about the stack, including the Template name and any stack policies. In this case, you leave it as Retain.

Create template section with "Start from a new template" selected

On the next page, you’ll see all the scanned resources. You can add filters to the resource such as tags to view a subset of scanned resources. This example will only use a Resource type prefix filter. More information on filters can be found here. Once you find the VPC, you can select it from the list.

A VPC selected in the scanned resources list]

On the next page, you’ll see the list of resources that CloudFormation has determined to have a link to this VPC. You see this includes a myriad of networking related resource. You keep these all selected to create a template from them.

A list of related resources, all selected

At this point, you select Create template and CloudFormation will generate a template from the existing resources. Since you don’t have an existing stack to import these resource into, you must create a new stack. You now select this template and then select the Import to stack button.

The template detail page with an import to stack button

After entering the Stack name, you can then enter any Parameters your template needs.

The specify stack details page, with a stack name of "networking" entered

CloudFormation will create a change set for your new stack. Change sets allow you to see the changes CloudFormation will apply to a stack. In this example, all of the resources will have the Import status. You see the resources CloudFormation found, and once you’re satisfied, you create the stack.

A change set indicating the previously found resources will be created

At this point, the create stack operation will proceed as normal, going through each resource and importing it into the stack. You can report back to your team that you have successfully imported your entire networking stack! As next steps, you should source this template in a version control system. We recently announced a new feature to keep CloudFormation templates synced with popular version control systems. Finally, make sure to make any changes through CloudFormation to avoid a configuration drift between the stated configuration and the existing configuration.

This example was primarily CloudFormation-based, but CDK customers can use CDK Migrate to import this configuration into a CDK application.

Available Now

The IaC generator is now available in all regions where CloudFormation is supported. You can access the IaC generator using the console, CLI, and SDK.

Conclusion

In this post, we explored the new IaC generator feature of CloudFormation. We walked through a scenario of needing to manage previously existing resources and using the IaC generator’s provided wizard flow to generate a CloudFormation template. We then used that template and created a stack to manage these resources. These resources will now enjoy the safety and repeatability that IaC provides. Though this is just one example, we foresee other use cases for this feature, such as enabling a console-first development experience. We’re really excited to hear your thoughts about the feature. Please let us know how you feel!

Announcing CDK Migrate: A single command to migrate to the AWS CDK

2024-02-02 Adam Keller

Post Syndicated from Adam Keller original https://aws.amazon.com/blogs/devops/announcing-cdk-migrate-a-single-command-to-migrate-to-the-aws-cdk/

Today we’re excited to announce the general availability of CDK Migrate, a component of the AWS Cloud Development Kit (CDK). This feature enables users to migrate AWS CloudFormation templates, previously deployed CloudFormation stacks, or resources created outside of Infrastructure as Code (IaC) into a CDK application. This feature is being launched in tandem with the CloudFormation IaC Generator, which helps customers import resources created outside of CloudFormation into a template, and into a newly generated, fully managed CloudFormation stack. To read more on this feature, check out the launch post.

There are various ways to create and manage resources in AWS, whether that be via “ClickOps” (creating and updating via the AWS Console), via AWS API’s, or using Infrastructure as Code (IaC). While it’s a good and recommended practice to manage the lifecycle of resources using IaC, there can be an on-ramp to getting started. For those that aren’t ready to use IaC, it is likely that they use the console to create the resources and update them accordingly. While this can be acceptable for smaller use cases or for testing out a new service, it becomes more challenging as the complexity of the environment grows. This is further exacerbated when there is a need to re-deploy the exact configuration to other accounts, environments, or regions, as the process becomes very error prone when trying to replicate it. IaC is built to help solve this problem by allowing users to define once and deploy everywhere. For those who have been putting off the move to IaC, now is the time to take the plunge with the IaC generator functionality and CDK migrate, which can accelerate and simplify the move.

Getting Started

The first step when migrating resources into the AWS CDK is to understand the best mechanism for how the users would prefer to interact with their IaC.

For users that are looking to define their IaC declaratively (manage resources via a configuration language like YAML), it is recommended that they look at IaC generator, which can generate a CloudFormation template as well as manage the existing resources in a CloudFormation stack.
For users that are looking to manage their IaC via a higher level programming language as well as build on top of those templates with higher level abstractions and automation, the AWS Cloud Development Kit and CDK migrate serve as an excellent option,

There is also functionality in the CDK CLI to import resources into an existing CDK application. Let’s review the use cases for when to use CDK migrate vs when to use CDK import.

CDK Migrate

Users are looking to migrate one or many resources into a new CDK application.
- Examples of existing resources in the AWS region to be migrated:
  - Resources created outside of IaC
  - A deployed CloudFormation Stack
Users want to migrate from CloudFormation templates into a new CDK application
Users are looking for a managed experience to generate CDK code from existing resources and/or CloudFormation templates.
While the CDK migrate feature is designed to help accelerate those users looking to use the AWS CDK, it’s important to understand that there are limitations. For more information on the limitations, please review the documentation.

CDK Import

Users have an existing CDK application and want to import one or many resources that were created outside of the CDK.
- Examples of existing resources in the AWS region to be migrated:
  - Resources created outside of IaC (via ClickOps)
  - A deployed CloudFormation Stack
- The user must define the resources in their CDK app on their own, and ensure that the resources defined in the CDK code map directly to the resource as it exists in the account. There is a multi-step process to follow when using this feature, for more information see here.

This post will walk through an example of how to take a local CloudFormation template and convert it into a new CDK application.

Walkthrough

To start, take the CloudFormation template below that will be converted to a CDK application. The template creates an AWS Lambda Function, AWS Identity and Access Management (IAM) role, and an Amazon S3 Bucket along with some parameters to help make some of the inputs dynamic. Below is the template in full:

AWSTemplateFormatVersion: "2010-09-09"
Description: AWS CDK Migrate Demo Template
Parameters:
  FunctionResponse:
    Description: Response message from the Lambda function
    Type: String
    Default: Hello World
  BucketTag:
    Description: The tag value of the S3 bucket
    Type: String
    Default: ChangeMe
Resources:
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
  HelloWorldFunction:
    Type: AWS::Lambda::Function
    Properties:
      Role: !GetAtt LambdaExecutionRole.Arn
      Code:
        ZipFile: |
          import os
          def lambda_handler(event, context):
            function_response = os.getenv('FUNCTION_RESPONSE')
            return {
              "statusCode": 200,
              "body": function_response
            }
      Handler: index.lambda_handler
      Runtime: python3.11
      Environment:
        Variables:
          FUNCTION_RESPONSE: !Ref FunctionResponse
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      Tags:
        - Key: Application
          Value: Git-Sync-Demo
        - Key: DynamicTag
          Value: !Ref BucketTag
Outputs:
  S3BucketName:
    Description: The name of the S3 bucket
    Value: !Ref S3Bucket
    Export:
      Name: !Sub ${AWS::StackName}-S3BucketName

This is the template that you will use when running the migration command. As a reminder, this demo migrates a CloudFormation template to a CDK application, but you can also migrate a previously deployed stack or non IaC created resources.

Migrate

The migration from the CloudFormation template to the CDK is done with a single command: cdk migrate. Simply point to the local CloudFormation template file (let’s call it demo-template.yaml), and watch as the CLI converts the template into a CDK application. The output and result from running the command will be a directory comprised of the CDK code and dependencies, but will not deploy the stack.

cdk migrate --stack-name CDK-Local-Template-Migrate-Demo --language typescript --from-path ../demoTemplate.yaml

CDK Migrate command

In the above command, you’re instructing the CDK CLI to consume the CloudFormation template file using the --from-path parameter, and choose the language as the output for the CDK application. The CDK CLI will convert the template as well as create a project folder along with the required dependencies for the CDK application.

When the migration is complete, the CDK application along with the project structure and files are available and ready to use, but have not yet been deployed. Below is the file structure of what was generated:

cdk app directory structure

The above output represents the scaffold for your CDK Typescript application, ready for deployment. The two directories that house the CDK code are bin and lib. Within the bin directory you’ll find the code that creates our CDK app and calls the CDK Stack class. The name of the files will match the input that was passed into the –stack-name parameter when running the migrate command, so in this case the file is named: bin/cdk-local-template-migrate-demo.ts. Below is the generated code:

CDK App Code

The CdkLocalTemplateMigrateDemoStack is imported and then instantiated. This is where the code that was converted from the existing CloudFormation template (or stack, or resources) resides. Again, similar to how the file was named above, the filename and location for the CDK stack code is lib/cdk-local-template-migrate-demo-stack.ts. Let’s look at the code that was converted.

CDK Stack Code

Comparing the above auto generated code to the original CloudFormation template, the definitions of the resources look similar. This is because the migrate command is generating the CDK code using L1 constructs, which represent all resources available in CloudFormation. For more information on CDK constructs and the various levels of abstraction they offer, check out this video.

The CloudFormation parameters were converted to properties inside of an interface, which are passed in to the Stack class. Inside of the Stack class code, it honors the defaults set in the properties based on the defaults were set in the original CloudFormation parameters. If you wanted to override those defaults, you could pass those properties into the CDK stack as follows:

CDK App Code Cleaned Up

With your newly created CDK application, you’re ready to deploy it to your AWS account.

Deploy

If this is the first time that you are using the CDK in the account and region, you will need to run the cdk bootstrap command, which creates assets required for the CDK to properly deploy resources to the region and account. For more information see here. Assuming the bootstrap process has happened, you can proceed to deployment.

The Infrastructure as Code is ready to deploy, but prior to deploying you should run a cdk diff to see what will be deployed. Running the diff command creates a change set and surfaces the changes being proposed (in this case it is a brand new stack with new resources).

Cdk Diff command

From the output you can see that all new resources are being created. If the cdk diff command was run against existing resources or stacks, assuming nothing changed (like above where I updated the properties), the diff would show no changes to the existing resources.

Next, deploy the stack (by running the cdk deploy command) and once the deployment is complete, head over to the AWS console and find your Lambda function. Run a test on your lambda function, and the response should match the functionResponse property that was updated as “CDK Migrate Demo Blog”. Lambda test execution output

Wrapping up

In this post, we discussed how the CDK migrate command can help you move your resources to the CDK to manage your infrastructure as code, whether it’s from a CloudFormation template, previously deployed CloudFormation stack, or from importing resources via the CloudFormation IaC generator feature. As always, we encourage you to test this feature and provide feedback and/or feature requests in our GitHub repo. In addition, if you’re new to the CDK there are some resources that can help you get started.

CDK hands on workshop
CDK Live! (Youtube channel with live streams and walkthroughs)
CDK Best Practices Guide

A new and improved AWS CDK construct for Amazon DynamoDB tables

2024-02-01 Anirudh Sharma

Post Syndicated from Anirudh Sharma original https://aws.amazon.com/blogs/devops/a-new-and-improved-aws-cdk-construct-for-amazon-dynamodb-tables/

Recently, we launched a new AWS Cloud Development Kit (CDK) construct for Amazon DynamoDB tables, known as TableV2. This construct provides a number of new features in addition to what the original construct offered, enabling CDK authors to create global tables, simplifying the configuration of global secondary indexes and auto scaling, as well as supporting AWS CloudFormation drift detection and import operations. We believe that this new construct will make it easier for organizations to build and manage their DynamoDB tables at scale, in addition to providing more flexibility and control over the configuration of tables.

AWS CDK is a framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation. Developers can use any of the supported programming languages to define reusable cloud components known as Constructs. A construct is a reusable and programmable component that represents AWS resources. CDK translates the high-level constructs defined by you into equivalent AWS CloudFormation templates. CloudFormation provisions the resources specified in the template, streamlining the usage of Infrastructure as a Code (IaC) on AWS.

In this post we’ll explore:

The reasoning behind the creation of a new L2 construct for DynamoDB tables.
Features of new L2 constructs along with examples.
The benefits of leveraging this new construct in terms of scalability, flexibility, and simplicity.

By understanding the reasons behind its development and exploring its capabilities through practical examples, you will gain a comprehensive understanding of how this new L2 construct can enhance their DynamoDB experience. Let’s dive in.

Background

The original DynamoDB L2 Table construct is a powerful and versatile tool for creating and managing DynamoDB tables. It allows you to easily define the schema of your table, as well as the provisioned throughput and replicas. It also supports features like global tables, secondary indexes, and streams.

However, the Table construct uses a custom resource to add replicas to the primary table. This means that a separate Lambda function is created as the resource provider in addition to the Table resources (primary table and any replicas). This can be cumbersome to manage and can lead to drift detection issues.

The new TableV2 construct is an abstraction built on top of the GlobalTable L1 construct. It uses the CloudFormation resource AWS::DynamoDB::GlobalTable to create and manage DynamoDB tables. This has two important benefits:

CloudFormation is in control and aware of all replicas that make up the Global Table, which means you will experience drift detection across all the replicas. With the original table construct, CloudFormation was not aware of any replicas since this was being handled through the Lambda function being used as a resource provider.
No extra resource (Lambda function) is created when replicas are configured with TableV2. This eliminates the need to manage an extra resource and the risk of troubleshooting issues that may arise with the custom resource. TableV2 simplifies the setup and maintenance of DynamoDB tables by using native CloudFormation constructs to directly manage replicas, without the need for a Lambda function. This results in a more efficient and streamlined experience for users.

The new TableV2 construct provides more fine-grained control to customers over the replicas created as part of the Global Table. Specifically, customers can specify properties like contributor insights, deletion protection, point-in-time recovery, table class, read capacity, and global secondary index options on a per-replica basis.

This means that customers can tailor their table setup to meet their specific needs and optimize their overall experience with the Global Table feature. For example, a customer might want to enable contributor insights for all replicas, but only enable deletion protection for the primary replica. Or, a customer might want to use a different table class for each replica, depending on the expected workload.

The new TableV2 construct also offers greater flexibility and customization options by allowing customers to specify these properties on a per-replica basis. This can be helpful for customers who need to have different configurations for their replicas, or who want to fine-tune the performance and availability of their tables.

In the next section, we will explore each of these properties in more detail and how they can be specified in the new construct.

Features Walk-through

The new TableV2 construct is the recommended CDK DynamoDB construct for creating both single tables and global tables. In this section, we will review some specific aspects of the TableV2 construct and how they can be implemented. The walkthrough will cover features like Replicas, Billing, and Encryption, providing a comprehensive understanding of its capabilities.

Replicas

One of the most important benefits of the new L2 construct is the ability to configure properties on a per-replica basis. For example, the following code creates a global DynamoDB table with contributor insights and point-in-time recovery enabled for the table:

import * as cdk from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';

const app = new cdk.App();
const stack = new cdk.Stack(app, 'Stack', { env: { region: 'us-west-2' } });

const globalTable = new dynamodb.TableV2(stack, 'GlobalTable', {
  partitionKey: { name: 'pk', type: dynamodb.AttributeType.STRING },
  contributorInsights: true,
  pointInTimeRecovery: true,
  replicas: [
    {
      region: 'us-east-1',
      tableClass: dynamodb.TableClass.STANDARD_INFREQUENT_ACCESS,
      pointInTimeRecovery: false,
    },
    {
      region: 'us-east-2',
      contributorInsights: false,
    },
  ],
});

// This is an ITableV2 instance for the replica table in us-east-1
const replica = globalTable.replica('us-east-1');

This code creates two replicas, one in the us-east-1 region and one in the us-east-2 region. For the replica in the us-east-1 region, we disable point-in-time recovery and set the table class to STANDARD_INFREQUENT_ACCESS. For the replica in the us-east-2 region, we disable contributor insights. The TableV2 construct also enables users to work with individual instances of the replicas in a global table via the replica() method. We see how this can be utilized from the above code where an ITableV2 instance representing the replica in us-east-1 is returned.

This is particularly useful for the grant() and metric() methods. For example, the following code gives a user write access to a replica in us-east-1 region:

import { Construct } from 'constructs';
import { App, Stack, StackProps } from 'aws-cdk-lib';
import { ITableV2, TableV2 } from 'aws-cdk-lib/aws-dynamodb';
import { AttributeType } from 'aws-cdk-lib/aws-dynamodb';
import * as iam from 'aws-cdk-lib/aws-iam';


class FooStack extends Stack {
  public readonly globalTable: TableV2;

  public constructor(scope: Construct, id: string, props: StackProps) {
    super(scope, id, props);

    this.globalTable = new TableV2(this, 'GlobalTable', {
      partitionKey: { name: 'pk', type: AttributeType.STRING },
      replicas: [
        { region: 'us-east-1' },
        { region: 'us-east-2' },
      ],
    });
  }
}

interface BarStackProps extends StackProps {
  readonly replicaTable: ITableV2;
}

class BarStack extends Stack {
  public constructor(scope: Construct, id: string, props: BarStackProps) {
    super(scope, id, props);
    const user = new iam.User(this, 'User')

    // user is given grantWriteData permissions to replica in us-east-1
    props.replicaTable.grantWriteData(user);
  }
}

const app = new App();

const fooStack = new FooStack(app, 'FooStack', { env: { region: 'us-west-2', account: process.env.CDK_DEFAULT_ACCOUNT } });
const barStack = new BarStack(app, 'BarStack', {
  replicaTable: fooStack.globalTable.replica('us-east-1'),
  env: { region: 'us-east-1', account: process.env.CDK_DEFAULT_ACCOUNT },
});

Before the replica() method was introduced, grant methods on the original Table construct applied to the primary table and all replicas. This was because there was no way to pull out a specific replica. This limited a user’s ability to grant a specific principal read, write, or read/write permission to a specific replica. The replica() method enables granting specific permissions to individual replicas in a global table. It maintains consistent behavior across all methods in the ITableV2 interface, including grants and metrics.

Billing

Table billing is easily configured using the onDemand() or provisioned() static methods of the Billing class. If provisioned billing is configured, the user must provide read and write capacity, which can be easily configured using the fixed() or autoscaled() static methods of the Capacity class.

For example, to configure on-demand billing:

import * as cdk from 'aws-cdk-lib';
import { AttributeType, Billing, TableClass, TableV2 } from 'aws-cdk-lib/aws-dynamodb';
import { Construct } from 'constructs';


export class DynamodbStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    new TableV2(this, 'DynamoDBTable', {
      partitionKey: { name: 'id', type: AttributeType.STRING},
      replicas: [
        {region: 'us-east-2'},
        {region: 'us-west-1'}
      ],
      billing: Billing.onDemand(),
      tableClass: TableClass.STANDARD
    })
  }
}

To configure provisioned billing:

import * as cdk from 'aws-cdk-lib';
import { AttributeType, Billing, Capacity, TableClass, TableV2 } from 'aws-cdk-lib/aws-dynamodb';
import { Construct } from 'constructs';

export class DynamodbStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    new TableV2(this, 'DynamoDBTable', {
      partitionKey: { name: 'id', type: AttributeType.STRING},
      replicas: [
        {region: 'us-east-2'},
        {region: 'us-west-1'}
      ],
      billing: Billing.provisioned({
        readCapacity: Capacity.fixed(5),
        writeCapacity: Capacity.autoscaled({maxCapacity: 10})
      }),
      tableClass: TableClass.STANDARD
    })
  }
}

Note that with the previous Table construct, users had to set a billingMode property and configure readCapacity and writeCapacity as separate properties. Additionally, configuring autoscaled capacity required calling the autoScaleReadCapacity() or autoScaleWriteCapacity() method on an instance of the Table construct. Lastly, since readCapacity, writeCapacity, and billingMode were all individual properties, a user had to know not to provision read and write capacity for a table with PAY_PER_REQUEST billing mode. With the new Billing class, the user is guided into providing necessary properties via the onDemand() and provisioned() static methods.

Encryption

The TableEncryptionV2 class allows you to provide your own KMS keys for each replica instead of using the default AWS owned keys, thus encrypting every replica with a custom KMS key. This provides more granular control over the encryption of your DynamoDB tables.

Here is an example of how to use the TableEncryptionV2 class to encrypt each replica of a global table with a custom KMS key:

import * as cdk from 'aws-cdk-lib';
import { AttributeType, Billing, BillingMode, Capacity, TableBaseV2, TableEncryptionV2, TableV2 } from 'aws-cdk-lib/aws-dynamodb';
import { IKey, Key } from 'aws-cdk-lib/aws-kms';
import { Construct } from 'constructs';

interface KMSkeys extends cdk.StackProps {
  kmsuswest1: IKey;
  kmsuseast2: IKey;
}

export class GlobalTableStack extends cdk.Stack {
  //public readonly globalTable: TableV2;
  constructor(scope: Construct, id: string, props: KMSkeys) {
    super(scope, id, props);

    const replicaTableKeys = {
      "us-west-1": props.kmsuswest1.keyArn,
      "us-east-2": props.kmsuseast2.keyArn
    }
    const TableKMSKey=new Key(this, 'TableKMSKey', {
      alias: 'KMSuswest2Stack',
    }
    )

    new TableV2(this, 'GlobalTable', {
    tableName: 'FooTableFour',
    encryption: TableEncryptionV2.customerManagedKey(TableKMSKey,replicaTableKeys),

    partitionKey: {
    name: 'FooHashKey',
    type: AttributeType.STRING,
    },
    replicas: [
    {
      region: 'us-west-1',  
    },
    {
      region: 'us-east-2',
    },
  ],
    })
  }
}

The ability to provide custom KMS keys for each replica can help to improve the security of your DynamoDB tables. It also gives you more control over the encryption of your data. This can help you to meet specific compliance requirements.

Conclusion

In this post, I introduced the new AWS CDK TableV2 construct, highlighting its advantages over the original construct. Notably, TableV2 enables drift detection for replica tables and eliminates the need for an extra Lambda function custom resource. I delved into practical implementations, focusing on three key aspects: Replicas, Billing, and Encryption.

To summarize, TableV2 marks a substantial improvement over the original construct. Its user experience provides significant improvement over the original construct in several ways, such as:

Direct support for global tables: TableV2 makes it easy to create and manage global DynamoDB tables.
Easier configuration of global secondary indexes and Autoscaling: TableV2 provides a simplified and streamlined process for configuring global secondary indexes and Autoscaling.
More granular control over replicas: TableV2 allows you to configure properties on a per-replica basis, giving you more control over the performance and availability of your tables.
Improved API design and user experience: TableV2 improves the API design and user experience by implementing new classes for billing, capacity, and encryption.

Overall, TableV2 is a powerful and flexible construct that makes it easier to build and manage DynamoDB tables at scale. It is the preferred CDK DynamoDB construct for creating both single tables and global tables. If you are looking for a powerful and flexible way to build and manage DynamoDB tables, TableV2 is the perfect choice for you.

If you’re new to CDK and eager to get started, we highly recommend checking out the CDK documentation and the CDK workshop.

Best practices for scaling AWS CDK adoption within your organization

2024-01-03 David Hessler

Post Syndicated from David Hessler original https://aws.amazon.com/blogs/devops/best-practices-for-scaling-aws-cdk-adoption-within-your-organization/

Enterprises are constantly seeking ways to accelerate their journey to the cloud. Infrastructure as code (IaC) is crucial for automating and managing cloud resources efficiently. The AWS Cloud Development Kit (AWS CDK) lets you define your cloud infrastructure as code in your favorite programming language and deploy it using AWS CloudFormation. In this post, we will discuss strategies and best practices for accelerating CDK adoption within your organization. Our discussion begins after your organization has successfully completed a pilot. In this post, you will learn how to scale the lessons learned from the pilot project across your organization through platform engineering. You will learn how to reduce complexity through building reusable components, deploy with speed and safety via builder tooling, and accelerate project startup with an internal developer portal (IDP). We will conclude by discussing ways to participate in and benefit from the broader CDK community.

Before we dive in, let’s briefly discuss a new trend in technology: Platform Engineering. DevOps practices have helped IT organizations deliver software to customers more frequently and with higher quality. A recent evolution in DevOps is the introduction of platform engineering teams to build services, toolchains, and documentation to support workload teams. An important responsibility of the platform engineering team is governance of the software delivery process.

At Amazon, we have a long and storied history of leveraging platform engineering to accelerate deployments. This is why we are able to maintain 143 different compliance certifications and attestations while deploying 150 million times per year. Platform engineering increases productivity, reduces friction between ideas and implementation, and improves agility by accelerating the delivery of workloads via a secure, scalable, and reusable set of resources and components through self-service portals and developer tools. Platform Engineering is comprised of seven capabilities: Platform Architecture, Data Architecture, Platform Product Engineering, Data Engineering, Provisioning & Orchestration, Modern App Development and CI/CD. For more information on platform engineering visit the AWS Cloud Adoption Framework.

Establishing these capabilities takes several platform and workload teams working together. From an operating model standpoint, a workload team interacts with Platform Engineering in one of the three following ways (for more information, see Building a Cloud Operating Model):

Reduce Builder Complexity and Cognitive load with Reusable Components

So, how can the platform team incorporate CDK to accomplish their goals? One of the common objectives of the Platform Engineering team is to publish and curate reusable patterns called Constructs. Constructs provide a mechanism to create reusable, extensible, and common components that can be shared across multiple teams and projects.

Many customers write their own implementations for constructs to enforce security best practices such as encryption and specific AWS Identity and Access Management policies. For example, you might create a MyCompanyBucket that implements your organizations security requirements in place of the default Amazon S3 Bucket construct. This bucket configuration can be implemented and extended by multiple teams to ensure they are using components that are validated by your security and compliance teams.

For customers focused on data governance, CDK constructs can automatically add in best practices for recovery time objectives and recovery point objectives by ensuring backups and architecture meet an organization’s resilience policies. For advance customers looking to enforce data lifecycle policies, create uniform access controls, or emit required KPIs, CDK constructs can provide avenues to create safe and secure configuration by default. Applying CDK constructs to DataOps, customers can benefit from templated ETL pipelines that ensure data lineage metadata is maintained and data cleansing occurs.

Customers also build constructs for non-AWS resources. Teams can build Constructs for third-party builder tooling, observability systems, testing apparatuses and more. In this way, workload teams can codify AWS and non-AWS resources in one code base. There is a balance required when writing your own constructs between ensuring standardization and providing the freedom and flexibility of taking advantage of the growing ecosystems of CDK packages. Examples of this balance include AWS Solutions Constructs, as these are typically built upon standard constructs. Without extending standard constructs, the constructs you build will be harder for consumer to integrate with the larger CDK ecosystem since it uses standardize interfaces.

Construct Hub is a central destination for discovering and sharing cloud application design patterns and reference architectures defined for CDK, that are built and published by the AWS community. While AWS provides a public Construct Hub, enterprises can maintain their private Construct Hub inside their own AWS accounts (see construct-hub, the GitHub repository, or the CDK Workshop for more details). The primary objective in either case remains consistent: to provide shared libraries that can be readily utilized by different workload teams. This approach ensures enhanced consistency, reusability, and ultimately leads to cost reduction and faster development timelines.

One of the pitfalls customers often have with leveraging this approach is that Platform Engineering cannot keep up building reusable components to leverage the latest technology enhancements. This is where leveraging the lessons learned from a pilot really can help. A pilot team works with platform engineering to research and implement security best practices. Some customers have the platform engineering team act as approvers for new constructs in addition to authors of new constructs. In this model, a pilot team works to build construct(s) for a new technology. The platform engineers approve the new construct(s). Platform engineers ensure the pilot team meets required standards such as enforcing encryption at rest, encryption in transit, and least privilege. When approval occurs, the pilot team can publish the new construct(s) to Construct Hub. In this way, platform engineering can enable experimentation and innovation, rather than become a gatekeeper. Additionally, platform engineering teams can encourage and curate an inner-sourcing model for construct creation rather than being the sole creator of constructs.

Deploy Applications Using DevSecOps Best Practices

Application builders are most productive when their expertise is channeled towards writing code that directly addresses business challenges. While creating applications is a skill well within the grasp of many software developers, the complex task of deploying and operating these applications in line with organizational standards can be overwhelming, especially for those new to a team. This complexity often acts as a bottleneck, slowing down the experimentation process and delaying the realization of value from new application initiatives.

A solution to this challenge lies in automating the deployment pipeline and operational model. By employing thoroughly tested CDK (Cloud Development Kit) components that are shared across teams and validated through a robust CI/CD (Continuous Integration/Continuous Deployment) process, the burden on developers is significantly reduced. They no longer need to delve into the complexities of the organization’s deployment strategies, allowing them to concentrate on writing unique, innovative code. This approach not only streamlines the development process but also bridges the gap between development and operations, leading to more cohesive teams and faster, more efficient releases.

One key to high-quality software delivery is to have a proper Continuous Integration and Continuous Delivery (CI/CD) process in place. You can see CDK Pipelines: Continuous delivery for AWS CDK applications for practical examples. This high-level construct, powered by AWS CodePipeline, comes in handy when you need to go beyond test deployments with the cdk deploy command and build automated pipelines for production deployments to multiple environments in different regions and/or accounts.

Whenever you commit your AWS CDK app’s source code into AWS CodeCommit, GitHub, GitLab, BitBucket, or Amazon CodeCatalyst source repository, AWS CDK Pipelines automatically builds, tests, and deploys a new version of the application. This pipeline automatically reconfigures itself to deploy as the resources in stacks changes or the environments being deployed to change. For GitHub Actions users, see CDK Pipelines for GitHub Workflows.

A number of teams are extending these pipelines and adding their own stages to ensure deployed code meets the organization’s quality, security, risk, compliance and cloud financial management criteria. For best practices of what automation to put inside the pipeline, see the AWS Deployment Pipeline Reference Architecture. By creating fully functional pipelines, platform engineering teams can reduce the cognitive load place on development teams and increase the developer experience. This strategy has two implementations: QuickStart pipelines and golden pipelines.

In QuickStart pipelines, these pipelines are created as a construct in your Construct Hub and treated similar to the above discussion on reusable components. While these pipelines offer simplified interfaces and a reduction in cognitive load, workload teams remain in control of the pipeline and are free to modify it. As a result, quality gates such as security or compliance tooling can be disabled by workload teams and controls inside the pipeline aren’t provable. This is suboptimal for organizations looking to reduce costs of compliance and audit. As the number of versions of the construct grows, teams can have difficulty governing which versions are used to ensure teams consume.

In golden pipelines, the pipelines are created as constructs, but deployed via a centralized team. Workload teams cannot control or modify these pipelines, so quality gates such as security and compliance tooling cannot be disabled. These controls become provable to stakeholders in security, risk and compliance such as auditors. Removing permissions from workload teams comes with costs. With golden pipelines, platform engineering teams often spend a majority of their time troubleshooting workload teams’ deployments. With so much time spent on troubleshooting, teams have little time to introduce new tooling to raise the security and quality standard, improve environment setup and organizational consistency, or improve audit evidence and enforcement.

Two mechanisms can augment these strategies. Traditional change control boards (CCB) can provide provability in situations where gathering evidence and enforcement are difficult. CCBs can benefit from CDK constructs that integrate IT Service Management (ITSM) approvals and fleet management processes into the pipeline and account creation processes. Alternatively, there is an emerging story with Software Supply Chain Level Artifacts (SLSA). These artifacts can be used as digital proof. In the Kubernetes space, we see this pattern with tools like Tekton chains where attestations associated with OCI images and Kyverno is used for to enforcement the presence of attestations (see Protect the pipe! Secure CI/CD pipelines with a policy-based approach using Tekton and Kyverno for details).

Multi-account and cross-region deployment with CDK

DevOps best practices suggest multiple stages of deployment and testing before deploying to production. On top of that, AWS recommends a dedicated account for each stage to simplify resource isolation and access control. This multi-account strategy helps organizations make best use of AWS resources and provides fine-grain controls (see Recommended OUs and accounts).

Often, you will have a designated AWS account, where all CI/CD pipelines reside. A deployment is executed by these pipelines to publish to other AWS accounts, which may correspond to development, staging, or production stages. For more information about a cross-account strategy in reference to CI/CD pipelines on AWS, see Building a Secure Cross-Account Continuous Delivery Pipeline.

Automated Governance

Many enterprise customers leverage CDK to enforce security controls and policies and can prevent security issues before deployment with tooling to analyze code as part of the deployment pipeline. Using the industry standard tooling of cdk-nag, many teams check applications for best practices using a combination of available rule packs. We are also seeing enterprises build their own Aspects to enforce additional requirements such as tagging requirements to manage and organize their deployed resources.

Customers can create CDK synthesized CloudFormation and add additional checkpoints with CloudFormation Guard to verify the output using policy-as-code domain-specific language (DSL) rules. Platform Engineering teams can build the rules and workload team can consume rules and run CloudFormation Guard inside the pipeline. There is an official construct that supports makes it easy to add CloudFormation Guard checks to your application.

With AWS CDK, infrastructure is code. So, the standard tooling you already use to ensure quality and improve the builder experience should be used with CDK. If your organization has a code quality program, treat CDK applications no differently than web applications or microservices. Similarly, with Amazon CodeGuru Security and Amazon CodeWhisperer, builders can get actionable recommendations on how to improve both the security and quality on their CDK code as they would with any other type of application.

With Aspects, cdk-nag, and code quality tools, organizations can prevent security issues before they are deployed. However, it is also important to create controls that work after a deployment occurs. AWS CloudFormation Hooks allow customers to inspect resources prior to create, update, or delete CloudFormation Stacks or CDK Applications. With CloudFormation Hooks, Platform Engineering teams can provide warnings or prevent provisioning resources for non-compliant resources. These hooks can be created via CDK (see Build and Deploy CloudFormation Hooks using A CI/CD Pipeline for details).

Finally, you can deploy AWS Config’s conformance packs via CDK. These collections of rules you’re your organization insist on security standards at scale. If your organization wishes to build custom rules, teams can build reactive controls using higher level constructs for AWS Config Rules. While many of these patterns existed prior to CDK, CDK helps accelerate building and deploying cloud applications and controls by leveraging reusable components that are shared within the enterprise or by the community at large.

Operate the Application using Observability

The open-source community provides high-level construct libraries that expand basic monitoring capabilities for CDK applications. The cdk-monitoring-constructs project makes it easy to monitor CDK apps. Similarly, Cdk-wakeful takes that a step further, adding many additional services and provides easily configurable interfaces to automatically be notified by AWS System Manager Incident Manager, AWS Chatbot, or Amazon Simple Notification Service. By leveraging prebuilt solutions from the open-source community, you can focus on creating custom metrics and thresholds around your business logic. Platform Engineering teams can modify and extends 1open-source projects to help workload teams simplify their operations and emit health and status to centralized systems.

Accelerate New Project Startup with an Internal Developer Platform

An Internal Developer Platform (IDP) is built by platform engineering teams to build golden paths and enable developer self-service. These golden paths are expressed as a series of templates that the structure of a source control repository and files stored inside the repository. When the IDP uses these templates to create source code repositories, the resultant repository contains the following:

A getting-started tutorial (usually in a README.md)
Reference documentation
Skeleton source code
Dependency Management
CI/CD pipeline template
IaC template
Observability configuration

With CDK, the CI/CD pipeline, IaC template, and observability configuration can all be a part of a single CDK application.

Platform engineering teams build golden paths and expose them using tools like Backstage, Humanitec, or Port. When building golden paths, there are two common approaches to the underlying project structure. Some organizations choose the approach where their IaC code repository is separate from the application code. Others choose to include everything in one repository. There is a healthy tension between how much to place inside a golden path vs a reusable component. In both strategies, platform engineering teams can avoid code duplication by leveraging CDK. The approach your organization chooses will dictate how you organize your reusable components. Below, we will walk through both options and the implications on reusable constructs.

Option 1: Everything in one repository

In this approach, all the code is contained in one repository: infrastructure, application, configuration, and deployment. This approach enables builders to collaborate, build features, and innovate together quickly, which is why it is the recommended approach. For more details, refer to the Best practices documentation. For examples, see AWS Deployment Reference Architecture for Applications.

This approach works best in teams that are “value-stream aligned.” Value-stream aligned teams have development and operations capabilities within the same team. These teams are organized around solving problems for customers rather than technical capabilities. Within the project, teams can organize around logical units such as application tier (API, database, etc.) or business capabilities (order management, product catalog, delivery services, etc.). In organizations that are value stream aligned, larger, highly conventionalized reusable components are better. An extreme example of this type of constructs is a single construct that contains all the code for an entire microservice. In these teams, the cognitive load focuses on the customer problem, so reducing the complexity of developing applications is critical to success.

Option 2: Separated application code pipeline

In this alternative approach, you can decouple your application code from your infrastructure by storing them in separate repositories and having separate pipelines. Separating the pipelines often leads to siloes and less collaboration between workload builders, who shift focus to developing features, and infrastructure engineers, who limit their efforts to building the infrastructure on which those applications run.

This approach works best in teams that are “matrixed.” A matrix organization is structured around technical capabilities (development, operations, security, business, etc.). In these cases, more modular constructs work better than constructs that are highly conventionalized. Experts from each organization can use CDK constructs as mechanisms to share their expertise across the entire organization. Examples of these types of constructs are monitoring, alerting, or security constructs prebuilt with hooks to plug in to centralized monitoring.

Building a Community of Practice with Platform Engineering

Scaling any new technology within a large organization requires the creation and enablement of a community that fosters collaboration, establishes best practices, and stays up to date with the changes in the ecosystem. In order to enable the creation of these communities of practice within your organization, AWS supports multiple public communities centered around the creation of content to educate and enable CDK users. Members of your organization’s community of practice can connect with other CDK development teams around the world through these public AWS supported communities.

Communities of Practice

A Community of Practice (CoP) is a group of people with shared interest who come together to learn, collaborate and develop expertise in a specific domain through informal interactions and knowledge sharing. Within your organization, establishing communities of practice around CDK has been proven to enable mentorship, problem solving, and reusable assets. To get started, your platform engineering team – the creators of reusable constructs and builder tooling with CDK – become early content creators for the community of practice. This establishes a feedback loop where CDK creators publicize their achievements via the CoP and consumers can ask questions and provide direct guidance to creators. Once the CoP has sustainably expanded by the initial group that established it, the CoP can start to add hack-a-thons or game days within your organization, which can bring innovation and solve organization-wide challenges. Fully mature communities of practices own curated wikis or databases of knowledge. They use mechanisms such as townhalls, office hours, newsletters, and chat channels to keep the community up to date. In this way, CDK expertise is diffused across the organization. At AWS, this diffusion of expertise has led to teams other than platform engineering becoming creators of reusable constructs. By expanding who can create reusable constructs, we are able to accelerate our own innovation.

Communities

There is a growing community that supports CDK, with many different platforms available providing content, code, examples and meetups. CDK is currently maintained by AWS with support from the community on AWS CDK GitHub page where you can contribute to the platform, raise issues, see the backlog and join discussions with active community members.

CDK.dev is the community driven hub around the CDK ecosystem. This site brings together all the latest blogs, videos, and educational content. It also provides links to join the community Slack platform.

CDK Patterns houses an open source collection of AWS Serverless architecture patterns built with CDK for developers to use. These patterns are sources via AWS Community Builders / AWS Heroes.

Finally, AWS re:Post provides a question-and-answer portal for the community to resolve.

The AWS Community Builders program offers technical resources, education, and networking opportunities to AWS technical enthusiasts and emerging thought leaders who are passionate about sharing knowledge and connecting with the technical community.

Communities of practice can leverage AWS public communities like cdk.dev to fill gaps in knowledge. Townhalls can benefit from speakers from AWS Heroes or community builders, frequent contributors to GitHub or re:Post, or speakers from CDK Day. Newsletters can aggregate and summarize the latest news from across all AWS channels. Once your community of practice establishes CDK competencies, this collaboration can also be bidirectional. For example, experts in your organization’s community can become AWS Heroes. Success stories can be shared via CDK Day, guest blog posts, and you might even speak at one of our major events such as AWS Summits, AWS re:Invent, AWS re:Inforce, or AWS re:Mars.

Final Thoughts

As we’ve said throughout this blog, with CDK, Infrastructure is code. This has enabled a paradigm shift in the infrastructure management space. Today, we see many customers such as Liberty Mutual, Scenario, Checkmarx, and Registers of Scotland establishing mature ecosystems using CDK. With an active open-source community, an AWS dev team for long term support, and multiple platforms for knowledge sharing, your builders can quickly learn, build, and innovate. Due to successful pilots, many organizations adopt CDK, become more agile, and innovate faster. This is exactly what happened at Amazon, where CDK is the first choice for building new services.

Organizations often scale and reduce complexity through platform engineering. These teams build higher level constructs by applying best practices, and provide CI/CD pipelines to accelerate deployments. Your deployment is safer using unit testing on your infrastructure as code and through robust security controls to provide guidance to builders at every stage: from author to operate.

Finally, establishing a community enables your organization to build its own mature ecosystem. Through both internal and open-source communities your builders can connect, discover, and grow.

New for AWS Amplify – Query MySQL and PostgreSQL database for AWS CDK

2023-12-12 Channy Yun

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-for-aws-amplify-query-mysql-and-postgresql-database-for-aws-cdk/

Today we are announcing the general availability to connect and query your existing MySQL and PostgreSQL databases with support for AWS Cloud Development Kit (AWS CDK), a new feature to create a real-time, secure GraphQL API for your relational database within or outside Amazon Web Services (AWS). You can now generate the entire API for all relational database operations with just your database endpoint and credentials. When your database schema changes, you can run a command to apply the latest table schema changes.

In 2021, we announced AWS Amplify GraphQL Transformer version 2, enabling developers to develop more feature-rich, flexible, and extensible GraphQL-based app backends even with minimal cloud expertise. This new GraphQL Transformer was redesigned from the ground up to generate extensible pipeline resolvers to route a GraphQL API request, apply business logic, such as authorization, and communicate with the underlying data source, such as Amazon DynamoDB.

However, customers wanted to use relational database sources for their GraphQL APIs such as their Amazon RDS or Amazon Aurora databases in addition to Amazon DynamoDB. You can now use @model types of Amplify GraphQL APIs for both relational database and DynamoDB data sources. Relational database information is generated to a separate schema.sql.graphql file. You can continue to use the regular schema.graphql files to create and manage DynamoDB-backed types.

When you simply provide any MySQL or PostgreSQL database information, whether behind a virtual private cloud (VPC) or publicly accessible on the internet, AWS Amplify automatically generates a modifiable GraphQL API that securely connects to your database tables and exposes create, read, update, or delete (CRUD) queries and mutations. You can also rename your data models to be more idiomatic for the frontend. For example, a database table is called “todos” (plural, lowercase) but is exposed as “ToDo” (singular, PascalCase) to the client.

With one line of code, you can add any of the existing Amplify GraphQL authorization rules to your API, making it seamless to build use cases such as owner-based authorization or public read-only patterns. Because the generated API is built on AWS AppSync‘ GraphQL capabilities, secure real-time subscriptions are available out of the box. You can subscribe to any CRUD events from any data model with a few lines of code.

Getting started with your MySQL database in AWS CDK
The AWS CDK lets you build reliable, scalable, cost-effective applications in the cloud with the considerable expressive power of a programming language. To get started, install the AWS CDK on your local machine.

$ npm install -g aws-cdk

Run the following command to verify the installation is correct and print the version number of the AWS CDK.

$ cdk –version

Next, create a new directory for your app:

$ mkdir amplify-api-cdk
$ cd amplify-api-cdk

Initialize a CDK app by using the cdk init command.

$ cdk init app --language typescript

Install Amplify’s GraphQL API construct in the new CDK project:

$ npm install @aws-amplify/graphql-api-construct

Open the main stack file in your CDK project (usually located in lib/<your-project-name>-stack.ts). Import the necessary constructs at the top of the file:

import {
    AmplifyGraphqlApi,
    AmplifyGraphqlDefinition
} from '@aws-amplify/graphql-api-construct';

Generate a GraphQL schema for a new relational database API by executing the following SQL statement on your MySQL database. Make sure to output the results to a .csv file, including column headers, and replace <database-name> with the name of your database, schema, or both.

SELECT
  INFORMATION_SCHEMA.COLUMNS.TABLE_NAME,
  INFORMATION_SCHEMA.COLUMNS.COLUMN_NAME,
  INFORMATION_SCHEMA.COLUMNS.COLUMN_DEFAULT,
  INFORMATION_SCHEMA.COLUMNS.ORDINAL_POSITION,
  INFORMATION_SCHEMA.COLUMNS.DATA_TYPE,
  INFORMATION_SCHEMA.COLUMNS.COLUMN_TYPE,
  INFORMATION_SCHEMA.COLUMNS.IS_NULLABLE,
  INFORMATION_SCHEMA.COLUMNS.CHARACTER_MAXIMUM_LENGTH,
  INFORMATION_SCHEMA.STATISTICS.INDEX_NAME,
  INFORMATION_SCHEMA.STATISTICS.NON_UNIQUE,
  INFORMATION_SCHEMA.STATISTICS.SEQ_IN_INDEX,
  INFORMATION_SCHEMA.STATISTICS.NULLABLE
      FROM INFORMATION_SCHEMA.COLUMNS
      LEFT JOIN INFORMATION_SCHEMA.STATISTICS ON INFORMATION_SCHEMA.COLUMNS.TABLE_NAME=INFORMATION_SCHEMA.STATISTICS.TABLE_NAME AND INFORMATION_SCHEMA.COLUMNS.COLUMN_NAME=INFORMATION_SCHEMA.STATISTICS.COLUMN_NAME
      WHERE INFORMATION_SCHEMA.COLUMNS.TABLE_SCHEMA = '<database-name>';

Run the following command, replacing <path-schema.csv> with the path to the .csv file created in the previous step.

$ npx @aws-amplify/cli api generate-schema \
    --sql-schema <path-to-schema.csv> \
    --engine-type mysql –out lib/schema.sql.graphql

You can open schema.sql.graphql file to see the imported data model from your MySQL database schema.

input AMPLIFY {
     engine: String = "mysql"
     globalAuthRule: AuthRule = {allow: public}
}

type Meals @model {
     id: Int! @primaryKey
     name: String!
}

type Restaurants @model {
     restaurant_id: Int! @primaryKey
     address: String!
     city: String!
     name: String!
     phone_number: String!
     postal_code: String!
     ...
}

If you haven’t already done so, go to the Parameter Store in the AWS Systems Manager console and create a parameter for the connection details of your database, such as hostname/url, database name, port, username, and password. These will be required in the next step for Amplify to successfully connect to your database and perform GraphQL queries or mutations against it.

In the main stack class, add the following code to define a new GraphQL API. Replace the dbConnectionConfg options with the parameter paths created in the previous step.

new AmplifyGraphqlApi(this, "MyAmplifyGraphQLApi", {
  apiName: "MySQLApi",
  definition: AmplifyGraphqlDefinition.fromFilesAndStrategy(
    [path.join(__dirname, "schema.sql.graphql")],
    {
      name: "MyAmplifyGraphQLSchema",
      dbType: "MYSQL",
      dbConnectionConfig: {
        hostnameSsmPath: "/amplify-cdk-app/hostname",
        portSsmPath: "/amplify-cdk-app/port",
        databaseNameSsmPath: "/amplify-cdk-app/database",
        usernameSsmPath: "/amplify-cdk-app/username",
        passwordSsmPath: "/amplify-cdk-app/password",
      },
    }
  ),
  authorizationModes: { apiKeyConfig: { expires: cdk.Duration.days(7) } },
  translationBehavior: { sandboxModeEnabled: true },
});

This configuration assums that your database is accessible from the internet. Also, the default authorization mode is set to Api Key for AWS AppSync and the sandbox mode is enabled to allow public access on all models. This is useful for testing your API before adding more fine-grained authorization rules.

Finally, deploy your GraphQL API to AWS Cloud.

$ cdk deploy

You can now go to the AWS AppSync console and find your created GraphQL API.

Choose your project and the Queries menu. You can see newly created GraphQL APIs compatible with your tables of MySQL database, such as getMeals to get one item or listRestaurants to list all items.

For example, when you select items with fields of address, city, name, phone_number, and so on, you can see a new GraphQL query. Choose the Run button and you can see the query results from your MySQL database.

When you query your MySQL database, you can see the same results.

How to customize your GraphQL schema for your database
To add a custom query or mutation in your SQL, open the generated schema.sql.graphql file and use the @sql(statement: "") pass in parameters using the :<variable> notation.

type Query {
     listRestaurantsInState(state: String): Restaurants @sql("SELECT * FROM Restaurants WHERE state = :state;”)
}

For longer, more complex SQL queries, you can reference SQL statements in the customSqlStatements config option. The reference value must match the name of a property mapped to a SQL statement. In the following example, a searchPosts property on customSqlStatements is being referenced:

type Query {
      searchPosts(searchTerm: String): [Post]
      @sql(reference: "searchPosts")
}

Here is how the SQL statement is mapped in the API definition.

new AmplifyGraphqlApi(this, "MyAmplifyGraphQLApi", { 
    apiName: "MySQLApi",
    definition: AmplifyGraphqlDefinition.fromFilesAndStrategy( [path.join(__dirname, "schema.sql.graphql")],
    {
        name: "MyAmplifyGraphQLSchema",
        dbType: "MYSQL",
        dbConnectionConfig: {
        //	...ssmPaths,
     }, customSqlStatements: {
        searchPosts: // property name matches the reference value in schema.sql.graphql 
        "SELECT * FROM posts WHERE content LIKE CONCAT('%', :searchTerm, '%');",
     },
    }
  ),
//...
});

The SQL statement will be executed as if it were defined inline in the schema. The same rules apply in terms of using parameters, ensuring valid SQL syntax, and matching return types. Using a reference file keeps your schema clean and allows the reuse of SQL statements across fields. It is best practice for longer, more complicated SQL queries.

Or you can change a field and model name using the @refersTo directive. If you don’t provide the @refersTo directive, AWS Amplify assumes that the model name and field name exactly match the database table and column names.

type Todo @model @refersTo(name: "todos") {
     content: String
     done: Boolean
}

When you want to create relationships between two database tables, use the @hasOne and @hasMany directives to establish a 1:1 or 1:M relationship. Use the @belongsTo directive to create a bidirectional relationship back to the relationship parent. For example, you can make a 1:M relationship between a restaurant and its meals menus.

type Meals @model {
     id: Int! @primaryKey
     name: String!
     menus: [Restaurants] @hasMany(references: ["restaurant_id"])
}

type Restaurants @model {
     restaurant_id: Int! @primaryKey
     address: String!
     city: String!
     name: String!
     phone_number: String!
     postal_code: String!
     meals: Meals @belongsTo(references: ["restaurant_id"])
     ...
}

Whenever you make any change to your GraphQL schema or database schema in your DB instances, you should deploy your changes to the cloud:

Whenever you make any change to your GraphQL schema or database schema in your DB instances, you should re-run the SQL script and export to .csv step mentioned earlier in this guide to re-generate your schema.sql.graphql file and then deploy your changes to the cloud:

$ cdk deploy

To learn more, see Connect API to existing MySQL or PostgreSQL database in the AWS Amplify documentation.

Now available
The relational database support for AWS Amplify now works with any MySQL and PostgreSQL databases hosted anywhere within Amazon VPC or even outside of AWS Cloud.

Give it a try and send feedback to AWS re:Post for AWS Amplify, the GitHub repository of Amplify GraphQL API, or through your usual AWS Support contacts.

— Channy

P.S. Specially thanks to René Huangtian Brandel, a principal product manager at AWS for his contribution to write sample codes.

Blue/Green deployments using AWS CDK Pipelines and AWS CodeDeploy

2023-10-10 Luiz Decaro

Post Syndicated from Luiz Decaro original https://aws.amazon.com/blogs/devops/blue-green-deployments-using-aws-cdk-pipelines-and-aws-codedeploy/

Customers often ask for help with implementing Blue/Green deployments to Amazon Elastic Container Service (Amazon ECS) using AWS CodeDeploy. Their use cases usually involve cross-Region and cross-account deployment scenarios. These requirements are challenging enough on their own, but in addition to those, there are specific design decisions that need to be considered when using CodeDeploy. These include how to configure CodeDeploy, when and how to create CodeDeploy resources (such as Application and Deployment Group), and how to write code that can be used to deploy to any combination of account and Region.

Today, I will discuss those design decisions in detail and how to use CDK Pipelines to implement a self-mutating pipeline that deploys services to Amazon ECS in cross-account and cross-Region scenarios. At the end of this blog post, I also introduce a demo application, available in Java, that follows best practices for developing and deploying cloud infrastructure using AWS Cloud Development Kit (AWS CDK).

The Pipeline

CDK Pipelines is an opinionated construct library used for building pipelines with different deployment engines. It abstracts implementation details that developers or infrastructure engineers need to solve when implementing a cross-Region or cross-account pipeline. For example, in cross-Region scenarios, AWS CloudFormation needs artifacts to be replicated to the target Region. For that reason, AWS Key Management Service (AWS KMS) keys, an Amazon Simple Storage Service (Amazon S3) bucket, and policies need to be created for the secondary Region. This enables artifacts to be moved from one Region to another. In cross-account scenarios, CodeDeploy requires a cross-account role with access to the KMS key used to encrypt configuration files. This is the sort of detail that our customers want to avoid dealing with manually.

AWS CodeDeploy is a deployment service that automates application deployment across different scenarios. It deploys to Amazon EC2 instances, On-Premises instances, serverless Lambda functions, or Amazon ECS services. It integrates with AWS Identity and Access Management (AWS IAM), to implement access control to deploy or re-deploy old versions of an application. In the Blue/Green deployment type, it is possible to automate the rollback of a deployment using Amazon CloudWatch Alarms.

CDK Pipelines was designed to automate AWS CloudFormation deployments. Using AWS CDK, these CloudFormation deployments may include deploying application software to instances or containers. However, some customers prefer using CodeDeploy to deploy application software. In this blog post, CDK Pipelines will deploy using CodeDeploy instead of CloudFormation.

A pipeline build with CDK Pipelines that deploys to Amazon ECS using AWS CodeDeploy. It contains at least 5 stages: Source, Build, UpdatePipeline, Assets and at least one Deployment stage.

Design Considerations

In this post, I’m considering the use of CDK Pipelines to implement different use cases for deploying a service to any combination of accounts (single-account & cross-account) and regions (single-Region & cross-Region) using CodeDeploy. More specifically, there are four problems that need to be solved:

CodeDeploy Configuration

The most popular options for implementing a Blue/Green deployment type using CodeDeploy are using CloudFormation Hooks or using a CodeDeploy construct. I decided to operate CodeDeploy using its configuration files. This is a flexible design that doesn’t rely on using custom resources, which is another technique customers have used to solve this problem. On each run, a pipeline pushes a container to a repository on Amazon Elastic Container Registry (ECR) and creates a tag. CodeDeploy needs that information to deploy the container.

I recommend creating a pipeline action to scan the AWS CDK cloud assembly and retrieve the repository and tag information. The same action can create the CodeDeploy configuration files. Three configuration files are required to configure CodeDeploy: appspec.yaml, taskdef.json and imageDetail.json. This pipeline action should be executed before the CodeDeploy deployment action. I recommend creating template files for appspec.yaml and taskdef.json. The following script can be used to implement the pipeline action:

##
#!/bin/sh
#
# Action Configure AWS CodeDeploy
# It customizes the files template-appspec.yaml and template-taskdef.json to the environment
#
# Account = The target Account Id
# AppName = Name of the application
# StageName = Name of the stage
# Region = Name of the region (us-east-1, us-east-2)
# PipelineId = Id of the pipeline
# ServiceName = Name of the service. It will be used to define the role and the task definition name
#
# Primary output directory is codedeploy/. All the 3 files created (appspec.json, imageDetail.json and 
# taskDef.json) will be located inside the codedeploy/ directory
#
##
Account=$1
Region=$2
AppName=$3
StageName=$4
PipelineId=$5
ServiceName=$6
repo_name=$(cat assembly*$PipelineId-$StageName/*.assets.json | jq -r '.dockerImages[] | .destinations[] | .repositoryName' | head -1) 
tag_name=$(cat assembly*$PipelineId-$StageName/*.assets.json | jq -r '.dockerImages | to_entries[0].key')  
echo ${repo_name} 
echo ${tag_name} 
printf '{"ImageURI":"%s"}' "$Account.dkr.ecr.$Region.amazonaws.com/${repo_name}:${tag_name}" > codedeploy/imageDetail.json                     
sed 's#APPLICATION#'$AppName'#g' codedeploy/template-appspec.yaml > codedeploy/appspec.yaml 
sed 's#APPLICATION#'$AppName'#g' codedeploy/template-taskdef.json | sed 's#TASK_EXEC_ROLE#arn:aws:iam::'$Account':role/'$ServiceName'#g' | sed 's#fargate-task-definition#'$ServiceName'#g' > codedeploy/taskdef.json 
cat codedeploy/appspec.yaml
cat codedeploy/taskdef.json
cat codedeploy/imageDetail.json

Using a Toolchain

A good strategy is to encapsulate the pipeline inside a Toolchain to abstract how to deploy to different accounts and regions. This helps decoupling clients from the details such as how the pipeline is created, how CodeDeploy is configured, and how cross-account and cross-Region deployments are implemented. To create the pipeline, deploy a Toolchain stack. Out-of-the-box, it allows different environments to be added as needed. Depending on the requirements, the pipeline may be customized to reflect the different stages or waves that different components might require. For more information, please refer to our best practices on how to automate safe, hands-off deployments and its reference implementation.

In detail, the Toolchain stack follows the builder pattern used throughout the CDK for Java. This is a convenience that allows complex objects to be created using a single statement:

 Toolchain.Builder.create(app, Constants.APP_NAME+"Toolchain")
        .stackProperties(StackProps.builder()
                .env(Environment.builder()
                        .account(Demo.TOOLCHAIN_ACCOUNT)
                        .region(Demo.TOOLCHAIN_REGION)
                        .build())
                .build())
        .setGitRepo(Demo.CODECOMMIT_REPO)
        .setGitBranch(Demo.CODECOMMIT_BRANCH)
        .addStage(
                "UAT",
                EcsDeploymentConfig.CANARY_10_PERCENT_5_MINUTES,
                Environment.builder()
                        .account(Demo.SERVICE_ACCOUNT)
                        .region(Demo.SERVICE_REGION)
                        .build())                                                                                                             
        .build();

In the statement above, the continuous deployment pipeline is created in the TOOLCHAIN_ACCOUNT and TOOLCHAIN_REGION. It implements a stage that builds the source code and creates the Java archive (JAR) using Apache Maven. The pipeline then creates a Docker image containing the JAR file.

The UAT stage will deploy the service to the SERVICE_ACCOUNT and SERVICE_REGION using the deployment configuration CANARY_10_PERCENT_5_MINUTES. This means 10 percent of the traffic is shifted in the first increment and the remaining 90 percent is deployed 5 minutes later.

To create additional deployment stages, you need a stage name, a CodeDeploy deployment configuration and an environment where it should deploy the service. As mentioned, the pipeline is, by default, a self-mutating pipeline. For example, to add a Prod stage, update the code that creates the Toolchain object and submit this change to the code repository. The pipeline will run and update itself adding a Prod stage after the UAT stage. Next, I show in detail the statement used to add a new Prod stage. The new stage deploys to the same account and Region as in the UAT environment:

... 
        .addStage(
                "Prod",
                EcsDeploymentConfig.CANARY_10_PERCENT_5_MINUTES,
                Environment.builder()
                        .account(Demo.SERVICE_ACCOUNT)
                        .region(Demo.SERVICE_REGION)
                        .build())                                                                                                                                      
        .build();

In the statement above, the Prod stage will deploy new versions of the service using a CodeDeploy deployment configuration CANARY_10_PERCENT_5_MINUTES. It means that 10 percent of traffic is shifted in the first increment of 5 minutes. Then, it shifts the rest of the traffic to the new version of the application. Please refer to Organizing Your AWS Environment Using Multiple Accounts whitepaper for best-practices on how to isolate and manage your business applications.

Some customers might find this approach interesting and decide to provide this as an abstraction to their application development teams. In this case, I advise creating a construct that builds such a pipeline. Using a construct would allow for further customization. Examples are stages that promote quality assurance or deploy the service in a disaster recovery scenario.

The implementation creates a stack for the toolchain and another stack for each deployment stage. As an example, consider a toolchain created with a single deployment stage named UAT. After running successfully, the DemoToolchain and DemoService-UAT stacks should be created as in the next image:

Two stacks are needed to create a Pipeline that deploys to a single environment. One stack deploys the Toolchain with the Pipeline and another stack deploys the Service compute infrastructure and CodeDeploy Application and DeploymentGroup. In this example, for an application named Demo that deploys to an environment named UAT, the stacks deployed are: DemoToolchain and DemoService-UAT

CodeDeploy Application and Deployment Group

CodeDeploy configuration requires an application and a deployment group. Depending on the use case, you need to create these in the same or in a different account from the toolchain (pipeline). The pipeline includes the CodeDeploy deployment action that performs the blue/green deployment. My recommendation is to create the CodeDeploy application and deployment group as part of the Service stack. This approach allows to align the lifecycle of CodeDeploy application and deployment group with the related Service stack instance.

CodePipeline allows to create a CodeDeploy deployment action that references a non-existing CodeDeploy application and deployment group. This allows us to implement the following approach:

Toolchain stack deploys the pipeline with CodeDeploy deployment action referencing a non-existing CodeDeploy application and deployment group
When the pipeline executes, it first deploys the Service stack that creates the related CodeDeploy application and deployment group
The next pipeline action executes the CodeDeploy deployment action. When the pipeline executes the CodeDeploy deployment action, the related CodeDeploy application and deployment will already exist.

Below is the pipeline code that references the (initially non-existing) CodeDeploy application and deployment group.

private IEcsDeploymentGroup referenceCodeDeployDeploymentGroup(
        final Environment env, 
        final String serviceName, 
        final IEcsDeploymentConfig ecsDeploymentConfig, 
        final String stageName) {

    IEcsApplication codeDeployApp = EcsApplication.fromEcsApplicationArn(
            this,
            Constants.APP_NAME + "EcsCodeDeployApp-"+stageName,
            Arn.format(ArnComponents.builder()
                    .arnFormat(ArnFormat.COLON_RESOURCE_NAME)
                    .partition("aws")
                    .region(env.getRegion())
                    .service("codedeploy")
                    .account(env.getAccount())
                    .resource("application")
                    .resourceName(serviceName)
                    .build()));

    IEcsDeploymentGroup deploymentGroup = EcsDeploymentGroup.fromEcsDeploymentGroupAttributes(
            this,
            Constants.APP_NAME + "-EcsCodeDeployDG-"+stageName,
            EcsDeploymentGroupAttributes.builder()
                    .deploymentGroupName(serviceName)
                    .application(codeDeployApp)
                    .deploymentConfig(ecsDeploymentConfig)
                    .build());

    return deploymentGroup;
}

To make this work, you should use the same application name and deployment group name values when creating the CodeDeploy deployment action in the pipeline and when creating the CodeDeploy application and deployment group in the Service stack (where the Amazon ECS infrastructure is deployed). This approach is necessary to avoid a circular dependency error when trying to create the CodeDeploy application and deployment group inside the Service stack and reference these objects to configure the CodeDeploy deployment action inside the pipeline. Below is the code that uses Service stack construct ID to name the CodeDeploy application and deployment group. I set the Service stack construct ID to the same name I used when creating the CodeDeploy deployment action in the pipeline.

   // configure AWS CodeDeploy Application and DeploymentGroup
   EcsApplication app = EcsApplication.Builder.create(this, "BlueGreenApplication")
           .applicationName(id)
           .build();

   EcsDeploymentGroup.Builder.create(this, "BlueGreenDeploymentGroup")
           .deploymentGroupName(id)
           .application(app)
           .service(albService.getService())
           .role(createCodeDeployExecutionRole(id))
           .blueGreenDeploymentConfig(EcsBlueGreenDeploymentConfig.builder()
                   .blueTargetGroup(albService.getTargetGroup())
                   .greenTargetGroup(tgGreen)
                   .listener(albService.getListener())
                   .testListener(listenerGreen)
                   .terminationWaitTime(Duration.minutes(15))
                   .build())
           .deploymentConfig(deploymentConfig)
           .build();

CDK Pipelines roles and permissions

CDK Pipelines creates roles and permissions the pipeline uses to execute deployments in different scenarios of regions and accounts. When using CodeDeploy in cross-account scenarios, CDK Pipelines deploys a cross-account support stack that creates a pipeline action role for the CodeDeploy action. This cross-account support stack is defined in a JSON file that needs to be published to the AWS CDK assets bucket in the target account. If the pipeline has the self-mutation feature on (default), the UpdatePipeline stage will do a cdk deploy to deploy changes to the pipeline. In cross-account scenarios, this deployment also involves deploying/updating the cross-account support stack. For this, the SelfMutate action in UpdatePipeline stage needs to assume CDK file-publishing and a deploy roles in the remote account.

The IAM role associated with the AWS CodeBuild project that runs the UpdatePipeline stage does not have these permissions by default. CDK Pipelines cannot grant these permissions automatically, because the information about the permissions that the cross-account stack needs is only available after the AWS CDK app finishes synthesizing. At that point, the permissions that the pipeline has are already locked-in. Hence, for cross-account scenarios, the toolchain should extend the permissions of the pipeline’s UpdatePipeline stage to include the file-publishing and deploy roles.

In cross-account environments it is possible to manually add these permissions to the UpdatePipeline stage. To accomplish that, the Toolchain stack may be used to hide this sort of implementation detail. In the end, a method like the one below can be used to add these missing permissions. For each different mapping of stage and environment in the pipeline it validates if the target account is different than the account where the pipeline is deployed. When the criteria is met, it should grant permission to the UpdatePipeline stage to assume CDK bootstrap roles (tagged using key aws-cdk:bootstrap-role) in the target account (with the tag value as file-publishing or deploy). The example below shows how to add permissions to the UpdatePipeline stage:

private void grantUpdatePipelineCrossAccoutPermissions(Map<String, Environment> stageNameEnvironment) {

    if (!stageNameEnvironment.isEmpty()) {

        this.pipeline.buildPipeline();
        for (String stage : stageNameEnvironment.keySet()) {

            HashMap<String, String[]> condition = new HashMap<>();
            condition.put(
                    "iam:ResourceTag/aws-cdk:bootstrap-role",
                    new String[] {"file-publishing", "deploy"});
            pipeline.getSelfMutationProject()
                    .getRole()
                    .addToPrincipalPolicy(PolicyStatement.Builder.create()
                            .actions(Arrays.asList("sts:AssumeRole"))
                            .effect(Effect.ALLOW)
                            .resources(Arrays.asList("arn:*:iam::"
                                    + stageNameEnvironment.get(stage).getAccount() + ":role/*"))
                            .conditions(new HashMap<String, Object>() {{
                                    put("ForAnyValue:StringEquals", condition);
                            }})
                            .build());
        }
    }
}

The Deployment Stage

Let’s consider a pipeline that has a single deployment stage, UAT. The UAT stage deploys a DemoService. For that, it requires four actions: DemoService-UAT (Prepare and Deploy), ConfigureBlueGreenDeploy and Deploy.

The
DemoService-UAT.Deploy action will create the ECS resources and the CodeDeploy application and deployment group. The
ConfigureBlueGreenDeploy action will read the AWS CDK
cloud assembly. It uses the configuration files to identify the Amazon Elastic Container Registry (Amazon ECR) repository and the container image tag pushed. The pipeline will send this information to the
Deploy action. The
Deploy action starts the deployment using CodeDeploy.

Solution Overview

As a convenience, I created an application, written in Java, that solves all these challenges and can be used as an example. The application deployment follows the same 5 steps for all deployment scenarios of account and Region, and this includes the scenarios represented in the following design:

A pipeline created by a Toolchain should be able to deploy to any combination of accounts and regions. This includes four scenarios: single-account and single-Region, single-account and cross-Region, cross-account and single-Region and cross-account and cross-Region

Conclusion

In this post, I identified, explained and solved challenges associated with the creation of a pipeline that deploys a service to Amazon ECS using CodeDeploy in different combinations of accounts and regions. I also introduced a demo application that implements these recommendations. The sample code can be extended to implement more elaborate scenarios. These scenarios might include automated testing, automated deployment rollbacks, or disaster recovery. I wish you success in your transformative journey.

Enhancing Resource Isolation in AWS CDK with the App Staging Synthesizer

2023-10-05 Jehu Gray

Post Syndicated from Jehu Gray original https://aws.amazon.com/blogs/devops/enhancing-resource-isolation-in-aws-cdk-with-the-app-staging-synthesizer/

AWS Cloud Development Kit (CDK) has become a powerful tool for defining and provisioning AWS cloud resources. While CDK simplifies the process of infrastructure as code, managing resources across different projects and environments can still present challenges. In this blog post, we’ll explore a new experimental library, the App Staging Synthesizer, that enhances resource isolation and provides finer control over staging resources in CDK applications.

Background: The CDK Bootstrapping Model

Let’s consider a scenario where a company has two projects in the same account, Project A and Project B. Both projects are developed using the AWS CDK and deploy various AWS resources. However, the company wants to ensure that resources used in Project A are not discoverable or accessible to Project B. Prior to the introduction of the App Staging Synthesizer library in CDK, the default bootstrapping process created shared staging resources, such as a single Amazon S3 bucket and Amazon ECR repository, which are used by all CDK applications deployed in the CDK environment. In AWS CDK, a combination of region and an account is considered to be an environment. The traditional CDK bootstrapping method offers simplicity and consistency by providing a standardized set of shared staging resources for all CDK applications in an environment, which can be cost-effective for multiple applications. This shared model makes it challenging to control access and visibility between the projects in the same account, particularly in scenarios where resource isolation is crucial between different projects. In such scenarios, AWS recommends a best practice of separating projects that need critical isolation into different AWS accounts. However, it is recognized that there might be organizational or practical reasons preventing the immediate adoption of this recommendation. In such cases, mechanisms like the App Staging Synthesizer can provide a valuable workaround.

Introducing the App Staging Synthesizer:

Today, a growing trend among customers is the consolidation of their cloud accounts driven by the desire to optimize costs, bolster security and enhance compliance control. However, while consolidation offers several advantages, it can sometimes limit the flexibility to align ownership and decision making with individual accounts. This can lead to dependencies and conflicts in how workloads across accounts are secured and managed. The App Staging Synthesizer which is an experimental library designed to provide a more flexible approach to resource management and staging in CDK applications was designed to address these challenges. The AppStagingSynthesizer enhances resource isolation and cleanup control by creating separate staging resources for each application, reducing the risk of conflicts between resources and providing more granular management. It also enables better asset lifecycle management and customization of roles and resource handling, offering CDK developers a flexible and organized approach to resource deployment. Let’s delve into some of the advantages and key features of this library.

Advantages and Outcomes:

Isolation and Access Control: The resources created for Project A are now completely isolated from Project B. Project B doesn’t have visibility or access to the staging resources of Project A, and vice versa. This ensures a higher level of data and resource security.
Easier Resource Cleanup: When cleaning up or deleting resources, the Staging Stack specific to each project can be removed independently. This allows for a more streamlined and controlled cleanup process, mitigating the risk of inadvertently affecting other projects.
Lifecycle Management: With separate ECR repositories for each CDK application, the company can apply lifecycle rules independently for retention and cost management. For example, they can configure each ECR repository to retain only the most recent 5 images, effectively cutting down on storage costs.
Reduced Bootstrapping Complexity: As the only shared resources required are global Roles, the company now only needs to bootstrap every account in one Region instead of bootstrapping every Region. This simplifies the bootstrapping process, making it easier to manage with CloudFormation StackSets.

Key Features of the App Staging Synthesizer:

IStagingResources Interface: The App Staging Synthesizer introduces the IStagingResources interface, offering a framework to manage app-level bootstrap stacks. These stacks handle file assets and Docker assets for CDK applications.
DefaultStagingStack: Included in the library, the DefaultStagingStack is a pre-built implementation of the IStagingResources It comes with default configurations for staging resources, making it easier to get started.
AppStagingSynthesizer: This is a new CDK synthesizer that orchestrates the creation of staging resources for each CDK application. It seamlessly integrates with the application deployment process.
Deployment Roles: In addition to creating staging resources, the CDK App Staging Synthesizer also manages deployment roles. These roles are crucial for secure and controlled resource deployment, ensuring that only authorized processes can modify or access the resources.

Implementation:

Let’s explore practical examples of using the App Staging Synthesizer within a CDK application.

Prerequisite:

For this walkthrough, you should have the following prerequisites:

An AWS account
Install AWS CDK version 2.73.0 or later
A basic understanding of CDK. Please go through cdkworkshop.com to get hands-on learning about CDK and related concepts.
NOTE: To utilize the AppStagingSynthesizer, you should have an existing CDK application or should be working on a CDK application.

Using Default Staging Resources:

When configuring your CDK application to use deployment identities with the old bootstrap stack, it’s important to note that the existing staging resources, including the global S3 bucket and ECR repository, will still be created as part of the bootstrapping process. However, they will remain unused by this specific application, thanks to the App Staging Synthesizer.
While we won’t delve into the removal of these unused resources in this blogpost, it’s worth mentioning that for a more streamlined resource setup, you have the option to customize the bootstrap template to remove these resources if desired. This can help reduce clutter and ensure that only the necessary resources are retained within your CDK environment.

To get started, update your CDK App with the following code snippet:

const app = new App({
defaultStackSynthesizer: AppStagingSynthesizer.defaultResources({
appId: 'my-app-id',
// The following line is optional. By default, it is assumed you have bootstrapped in the same region(s) as the stack(s) you are deploying.
deploymentIdentities: DeploymentIdentities.defaultBootstrapRoles({ bootstrapRegion: 'us-east-1' }),
}),
});

This code snippet creates a DefaultStagingStack for a CDK App, allowing you to manage staging resources more effectively.

Customizing Roles:

You can customize roles for the synthesizer, which can be useful for several reasons such as:

Reuse of existing roles: In many AWS environments, organizations have existing IAM roles with specific permissions and policies that are aligned with their security and compliance requirements. Rather than creating new roles from scratch, you might want to leverage these existing roles to maintain consistency and adhere to established security practices.
Compatibility: In scenarios where you have pre-existing IAM roles that are being used across various AWS services or applications, customizing roles within the CDK App Staging Synthesizer allows you to seamlessly integrate CDK deployments into your existing IAM role management strategy.

Overall, customizing roles provides flexibility and control over resources used during CDK application deployments, enabling you to align CDK-based infrastructure with the organization’s policies. An example is:

const app = new App({
defaultStackSynthesizer: AppStagingSynthesizer.defaultResources({
appId: 'my-app-id',
deploymentIdentities: DeploymentIdentities.specifyRoles({
cloudFormationExecutionRole: BootstrapRole.fromRoleArn('arn:aws:iam::123456789012:role/Execute'),
deploymentRole: BootstrapRole.fromRoleArn('arn:aws:iam::123456789012:role/Deploy'),
}),
}),
});

This code snippet illustrates how you can specify custom roles for different stages of the deployment process.

Deploy Time S3 Assets:

Deploy-time S3 assets can be classified into two categories, each serving a distinct purpose:

Assets Used Only During Deployment: These assets are instrumental in handing off substantial data to other services for private copying during deployment. They play a vital role during initial deployment, and afterwards are retained solely for potential future rollbacks
Assets Accessed Throughout Application Lifespan: In contrast, some assets are accessed continuously throughout the runtime of your application. These could include script files utilized in CodeBuild projects, startup scripts for EC2 instances, or, in the case of CDK applications, ECR images that persist throughout the application’s life.

Marking Lambda Assets as Deploy-Time:

By default, Lambda assets are marked as deploy-time assets in the CDK App Staging Synthesizer. This means they fall into the first category mentioned above, serving as essential components during deployment. For instance, consider the following code snippet:

declare const stack: Stack;
new lambda.Function(stack, 'lambda', {
code: lambda.AssetCode.fromAsset(path.join(__dirname, 'assets')), // Lambda code bundle marked as deploy-time
handler: 'index.handler',
runtime: lambda.Runtime.PYTHON_3_9,
});

In this example, the Lambda code bundle is automatically identified as a deploy-time asset. This distinction ensures that it’s cleaned up after the configurable rollback window.

Creating Custom Deploy-Time Assets:

CDK offers the flexibility needed to create custom deploy-time assets. This can be achieved by utilizing the Asset construct from the AWS CDK library:

import { Asset } from 'aws-cdk-lib/aws-s3-assets';
declare const stack: Stack;
const asset = new Asset(stack, 'deploy-time-asset', {
deployTime: true, // Marking the asset as deploy-time
path: path.join(__dirname, './deploy-time-asset'),
});

By setting deployTime to true, the asset is explicitly marked as deploy-time. This allows you to maintain control over the lifecycle of these assets, ensuring they are retained for as long as needed. However, it is important to note that deploy-time assets eventually become eligible for cleanup.

Configuring Asset Lifecycles:
By default, the CDK retains deploy-time assets for a period of 30 days. However, there is flexibility to adjust this duration according to custom requirements. This can be achieved by specifying deployTimeFileAssetLifetime. The value set here determines how long you can roll back to a previous application version without the need for rebuilding and republishing assets:

const app = new App({
defaultStackSynthesizer: AppStagingSynthesizer.defaultResources({
appId: 'my-app-id',
deployTimeFileAssetLifetime: Duration.days(100), // Adjusting the asset retention period to 100 days
}),
});

By fine-tuning the lifecycle of deploy-time S3 assets, you gain more control over CDK deployments and ensure that CDK applications are equipped to handle rollbacks and updates with ease.

Optimizing ECR Repository Management with Lifecycle Rules:

The AWS CDK App Staging Synthesizer provides you with the capability to control the lifecycle of container images by leveraging lifecycle rules within ECR repositories. Let’s explore how this feature can help streamline your CDK workflows.

ECR repositories can accumulate numerous versions of Docker images over time. While retaining some historical versions is essential for rollback scenarios and reference, an unregulated growth of image versions can lead to increased storage costs and management complexity.

The AWS CDK App Staging Synthesizer offers a default configuration that stores a maximum of 3 revisions for a given Docker image asset. This ensures that you maintain access to previous image versions, facilitating seamless rollback operations. When more than 3 revisions of an asset exist in the ECR repository, the oldest one is purged.

Although by default, it’s set to 3, you can also adjust this value using the imageAssetVersionCount property:

const app = new App({
defaultStackSynthesizer: AppStagingSynthesizer.defaultResources({
appId: 'my-app-id',
imageAssetVersionCount: 10, // Customizing the image version count to retain up to 10 revisions
}),
});

By increasing or decreasing the imageAssetVersionCount, you can strike a balance between storage efficiency and the need to access historical image versions. This ensures that ECR repositories are optimized to the CDK application’s requirements.

Streamlining Cleanup: Auto Delete Staging Assets on Stack Deletion

Efficiently managing resources throughout the lifecycle of your CDK applications is essential, and this includes handling the cleanup of staging assets when stacks are deleted. The AWS CDK App Staging Synthesizer simplifies this process by providing an auto-delete feature for staging resources. In this section, we’ll explore how this feature works and how you can customize it according to your needs.

The Default Cleanup Behavior:
By default, the AWS CDK App Staging Synthesizer is designed to facilitate the cleanup of staging resources automatically when a stack is deleted. This means that associated resources, such as S3 buckets and ECR repositories, are configured with a RemovalPolicy.DESTROY and have autoDeleteObjects (for S3 buckets) or autoDeleteImages (for ECR repositories) turned on. Under the hood, custom resources are created to ensure a seamless cleanup process.

Customizing Cleanup Behavior:
While automatic cleanup is convenient for many scenarios, there may be situations where you want to retain staging resources even after stack deletion. This can be useful when you intend to reuse these resources or when you have specific cleanup processes outside of the default behavior. To retain staging assets and disable the auto-delete feature, you can specify autoDeleteStagingAssets: as false when configuring the AWS CDK App Staging Synthesizer:

const app = new App({
defaultStackSynthesizer: AppStagingSynthesizer.defaultResources({
appId: 'my-app-id',
autoDeleteStagingAssets: false, // Disabling auto-delete of staging assets
}),
});

By setting autoDeleteStagingAssets to false, you have full control over the cleanup of staging resources. This allows you to retain and manage these resources independently, giving you the flexibility to align CDK workflows with the organization’s specific practices.

Using an Existing Staging Stack:

While the AWS CDK App Staging Synthesizer offers powerful tools for managing staging resources, there may be scenarios where you already have a meticulously crafted staging stack in place. In such cases, you can seamlessly integrate the existing stack with the AppStagingSynthesizer using the customResources() method. Let’s explore how you can make the most of your pre-existing staging infrastructure.

The process is straightforward—supply your existing staging stack as a resource to the AppStagingSynthesizer using the customResources() method. It’s crucial to ensure that the custom stack adheres to the requirements of the IStagingResources interface for smooth integration.

Here’s an example:

// Create a new CDK App
const resourceApp = new App();

//Instantiate your custom staging stack (make sure it implements IstagingResources)
const resources = new CustomStagingStack(resourceApp, 'CustomStagingStack', {});

//Configure your CDK App to use the App Staging Synthesizer with your custom staging stack
const app = new App({
defaultStackSynthesizer: AppStagingSynthesizer.customResources({
resources,
}),
});

In this example, CustomStagingStack represents the pre-existing staging infrastructure. By providing it as a resource to the App Staging Synthesizer, you seamlessly integrate it into the CDK application’s deployment workflow.

Crafting Custom Staging Stacks for Environment Control:

For those seeking precise control over resource management in different environments, the AWS CDK App Staging Synthesizer offers a robust solution – custom staging stacks. This feature allows you to tailor resource configurations, permissions, and behaviors to meet the unique demands of each environment within the CDK application.

Subclassing DefaultStagingStack for a Quick Start:

If your customization requirements align with the available properties, you can start by subclassing DefaultStagingStack. This streamlined approach lets you inherit existing functionalities while tweaking specific behaviors as needed. Here’s how you can dive right in:

//Define custom staging stack
interface CustomStagingStackOptions extends DefaultStagingStackOptions {}

//Subclass DefaultStagingStack to create the custom stgaing stack
class CustomStagingStack extends DefaultStagingStack {
// Implement customizations here
}

Building Staging Resources from Scratch:

For more granular control, consider building the staging resources entirely from scratch. This approach allows you to define every aspect of the staging stack, from the ground up, by implementing the “IStagingResources” interface. Here’s an example:

// Define custom staging stack properties(if needed)
interface CustomStagingStackProps extends StackProps {}

//Create your custom staging stack that implements IStagingResources
class CustomStagingStack extends Stack implements IStagingResources {
constructor(scope: Construct, id: string, props: CustomStagingStackProps) {
super(scope, id, props);
}

// Implement methods to define your custom staging resources
public addFile(asset: FileAssetSource): FileStagingLocation {
return {
bucketName: 'myBucket',
assumeRoleArn: 'myArn',
dependencyStack: this,
};
}
public addDockerImage(asset: DockerImageAssetSource): ImageStagingLocation {
return {
repoName: 'myRepo',
assumeRoleArn: 'myArn',
dependencyStack: this,
};
}
}

Creating Custom Staging Resources:

Implementing custom staging resources also involves crafting a CustomFactory class to facilitate the creation of these resources in every environment where your CDK App is deployed. This approach offers a high level of customization while ensuring consistency across deployments. Here’s how it works:

// Define a custom factory for your staging resources
class CustomFactory implements IStagingResourcesFactory {
public obtainStagingResources(stack: Stack, context: ObtainStagingResourcesContext) {
const myApp = App.of(stack);

// Create a custom staging stack instance for the current environment
return new CustomStagingStack(myApp!, `CustomStagingStack-${context.environmentString}`, {});
}
}

//Incorporate your custom staging resources into the Application using the customer factory
const app = new App({
defaultStackSynthesizer: AppStagingSynthesizer.customFactory({
factory: new CustomFactory(),
oncePerEnv: true, // by default
}),
});

With this setup, you can create custom staging stacks for each environment, ensuring resource management tailored to your specific needs. Whether you choose to subclass DefaultStagingStack for a quick start or build resources from scratch, custom staging stacks empower you to achieve fine-grained control and consistency across CDK deployments.

Conclusion:

The App Staging Synthesizer introduces a powerful approach to managing staging resources in AWS CDK applications. With enhanced resource isolation and lifecycle control, it addresses the limitations of the default bootstrapping model. By integrating the App Staging Synthesizer into CDK applications, you can achieve better resource management, cleaner cleanup processes, and more control over cloud infrastructure.
Explore this experimental library and unleash the potential of fine-tuned resource management in CDK projects.

For more information and code examples, refer to the official documentation provided by AWS.

About the Authors:

Using AWS CloudFormation and AWS Cloud Development Kit to provision multicloud resources

2023-09-08 Aaron Sempf

Post Syndicated from Aaron Sempf original https://aws.amazon.com/blogs/devops/using-aws-cloudformation-and-aws-cloud-development-kit-to-provision-multicloud-resources/

Customers often need to architect solutions to support components across multiple cloud service providers, a need which may arise if they have acquired a company running on another cloud, or for functional purposes where specific services provide a differentiated capability. In this post, we will show you how to use the AWS Cloud Development Kit (AWS CDK) to create a single pane of glass for managing your multicloud resources.

AWS CDK is an open source framework that builds on the underlying functionality provided by AWS CloudFormation. It allows developers to define cloud resources using common programming languages and an abstraction model based on reusable components called constructs. There is a misconception that CloudFormation and CDK can only be used to provision resources on AWS, but this is not the case. The CloudFormation registry, with support for third party resource types, along with custom resource providers, allow for any resource that can be configured via an API to be created and managed, regardless of where it is located.

Multicloud solution design paradigm

Multicloud solutions are often designed with services grouped and separated by cloud, creating a segregation of resource and functions within the design. This approach leads to a duplication of layers of the solution, most commonly a duplication of resources and the deployment processes for each environment. This duplication increases cost, and leads to a complexity of management increasing the potential break points within the solution or practice.

Along with simplifying resource deployments, and the ever-increasing complexity of customer needs, so too has the need increased for the capability of IaC solutions to deploy resources across hybrid or multicloud environments. Through meeting this need, a proliferation of supported tools, frameworks, languages, and practices has created “choice overload”. At worst, this scares the non-cloud-savvy away from adopting an IaC solution benefiting their cloud journey, and at best confuses the very reason for adopting an IaC practice.

A single pane of glass

Systems Thinking is a holistic approach that focuses on the way a system’s constituent parts interrelate and how systems work as a whole especially over time and within the context of larger systems. Systems thinking is commonly accepted as the backbone of a successful systems engineering approach. Designing solutions taking a full systems view, based on the component’s function and interrelation within the system across environments, more closely aligns with the ability to handle the deployment of each cloud-specific resource, from a single control plane.

While AWS provides a list of services that can be used to help design, manage and operate hybrid and multicloud solutions, with AWS as the primary cloud you can go beyond just using services to support multicloud. CloudFormation registry resource types model and provision resources using custom logic, as a component of stacks in CloudFormation. Public extensions are not only provided by AWS, but third-party extensions are made available for general use by publishers other than AWS, meaning customers can create their own extensions and publish them for anyone to use.

The AWS CDK, which has a 1:1 mapping of all AWS CloudFormation resources, as well as a library of abstracted constructs, supports the ability to import custom AWS CloudFormation extensions, enabling customers and partners to create custom AWS CDK constructs for their extensions. The chosen programming language can be used to inherit and abstract the custom resource into reusable AWS CDK constructs, allowing developers to create solutions that contain native AWS extensions along with secondary hybrid or alternate cloud resources.

Providing the ability to integrate mixed resources in the same stack more closely aligns with the functional design and often diagrammatic depiction of the solution. In essence, we are creating a single IaC pane of glass over the entire solution, deployed through a single control plane. This lowers the complexity and the cost of maintaining separate modules and deployment pipelines across multiple cloud providers.

A common use case for a multicloud: disaster recovery

One of the most common use cases of the requirement for using components across different cloud providers is the need to maintain data sovereignty while designing disaster recovery (DR) into a solution.

Data sovereignty is the idea that data is subject to the laws of where it is physically located, and in some countries extends to regulations that if data is collected from citizens of a geographical area, then the data must reside in servers located in jurisdictions of that geographical area or in countries with a similar scope and rigor in their protection laws.

This requires organizations to remain in compliance with their host country, and in cases such as state government agencies, a stricter scope of within state boundaries, data sovereignty regulations. Unfortunately, not all countries, and especially not all states, have multiple AWS regions to select from when designing where their primary and recovery data backups will reside. Therefore, the DR solution needs to take advantage of multiple cloud providers in the same geography, and as such a solution must be designed to backup or replicate data across providers.

The multicloud solution

A multicloud solution to the proposed use case would be the backup of data from an AWS resource such as an Amazon S3 bucket to another cloud provider within the same geography, such as an Azure Blob Storage container, using AWS event driven behaviour to trigger the copying of data from the primary AWS resource to the secondary Azure backup resource.

Following the IaC single pane of glass approach, the Azure Blob Storage container is created as a resource type in the CloudFormation Registry, and imported into the AWS CDK to be used as a construct in the solution. However, before the extension resource type can be used effectively in the CDK as a reusable construct and added to your private library, you will first need to go through the import into CDK process for creating Constructs.

There are three different levels of constructs, beginning with low-level constructs, which are called CFN Resources (or L1, short for “layer 1”). These constructs directly represent all resources available in AWS CloudFormation. They are named CfnXyz, where Xyz is name of the resource.

Layer 1 Construct

In this example, an L1 construct named CfnAzureBlobStorage represents an Azure::BlobStorage AWS CloudFormation extension. Here you also explicitly configure the ref property, in order for higher level constructs to access the Output value which will be the Azure blob container url being provisioned.

import { CfnResource } from "aws-cdk-lib";
import { Secret, ISecret } from "aws-cdk-lib/aws-secretsmanager";
import { Construct } from "constructs";

export interface CfnAzureBlobStorageProps {
  subscriptionId: string;
  clientId: string;
  tenantId: string;
  clientSecretName: string;
}

// L1 Construct
export class CfnAzureBlobStorage extends Construct {
  // Allows accessing the ref property
  public readonly ref: string;

  constructor(scope: Construct, id: string, props: CfnAzureBlobStorageProps) {
    super(scope, id);

    const secret = this.getSecret("AzureClientSecret", props.clientSecretName);
    
    const azureBlobStorage = new CfnResource(
      this,
      "ExtensionAzureBlobStorage",
      {
        type: "Azure::BlobStorage",
        properties: {
          AzureSubscriptionId: props.subscriptionId,
          AzureClientId: props.clientId,
          AzureTenantId: props.tenantId,
          AzureClientSecret: secret.secretValue.unsafeUnwrap()
        },
      }
    );

    this.ref = azureBlobStorage.ref;
  }

  private getSecret(id: string, secretName: string) : ISecret {  
    return Secret.fromSecretNameV2(this, secretName.concat("Value"), secretName);
  }
}

As with every CDK Construct, the constructor arguments are scope, id and props. scope and id are propagated to the cdk.Construct base class. The props argument is of type CfnAzureBlobStorageProps which includes four properties all of type string. This is how the Azure credentials are propagated down from upstream constructs.

Layer 2 Construct

The next level of constructs, L2, also represent AWS resources, but with a higher-level, intent-based API. They provide similar functionality, but incorporate the defaults, boilerplate, and glue logic you’d be writing yourself with a CFN Resource construct. They also provide convenience methods that make it simpler to work with the resource.

In this example, an L2 construct is created to abstract the CfnAzureBlobStorage L1 construct and provides additional properties and methods.

import { Construct } from "constructs";
import { CfnAzureBlobStorage } from "./cfn-azure-blob-storage";

// L2 Construct
export class AzureBlobStorage extends Construct {
  public readonly blobContainerUrl: string;

  constructor(
    scope: Construct,
    id: string,
    subscriptionId: string,
    clientId: string,
    tenantId: string,
    clientSecretName: string
  ) {
    super(scope, id);

    const azureBlobStorage = new CfnAzureBlobStorage(
      this,
      "CfnAzureBlobStorage",
      {
        subscriptionId: subscriptionId,
        clientId: clientId,
        tenantId: tenantId,
        clientSecretName: clientSecretName,
      }
    );

    this.blobContainerUrl = azureBlobStorage.ref;
  }
}

The custom L2 construct class is declared as AzureBlobStorage, this time without the Cfn prefix to represent an L2 construct. This time the constructor arguments include the Azure credentials and client secret, and the ref from the L1 construct us output to the public variable AzureBlobContainerUrl.

As an L2 construct, the AzureBlobStorage construct could be used in CDK Apps along with AWS Resource Constructs in the same Stack, to be provisioned through AWS CloudFormation creating the IaC single pane of glass for a multicloud solution.

Layer 3 Construct

The true value of the CDK construct programming model is in the ability to extend L2 constructs, which represent a single resource, into a composition of multiple constructs that provide a solution for a common task. These are Layer 3, L3, Constructs also known as patterns.

In this example, the L3 construct represents the solution architecture to backup objects uploaded to an Amazon S3 bucket into an Azure Blob Storage container in real-time, using AWS Lambda to process event notifications from Amazon S3.

import { RemovalPolicy, Duration, CfnOutput } from "aws-cdk-lib";
import { Bucket, BlockPublicAccess, EventType } from "aws-cdk-lib/aws-s3";
import { DockerImageFunction, DockerImageCode } from "aws-cdk-lib/aws-lambda";
import { PolicyStatement, Effect } from "aws-cdk-lib/aws-iam";
import { LambdaDestination } from "aws-cdk-lib/aws-s3-notifications";
import { IStringParameter, StringParameter } from "aws-cdk-lib/aws-ssm";
import { Secret, ISecret } from "aws-cdk-lib/aws-secretsmanager";
import { Construct } from "constructs";
import { AzureBlobStorage } from "./azure-blob-storage";

// L3 Construct
export class S3ToAzureBackupService extends Construct {
  constructor(
    scope: Construct,
    id: string,
    azureSubscriptionIdParamName: string,
    azureClientIdParamName: string,
    azureTenantIdParamName: string,
    azureClientSecretName: string
  ) {
    super(scope, id);

    // Retrieve existing SSM Parameters
    const azureSubscriptionIdParameter = this.getSSMParameter("AzureSubscriptionIdParam", azureSubscriptionIdParamName);
    const azureClientIdParameter = this.getSSMParameter("AzureClientIdParam", azureClientIdParamName);
    const azureTenantIdParameter = this.getSSMParameter("AzureTenantIdParam", azureTenantIdParamName);    
    
    // Retrieve existing Azure Client Secret
    const azureClientSecret = this.getSecret("AzureClientSecret", azureClientSecretName);

    // Create an S3 bucket
    const sourceBucket = new Bucket(this, "SourceBucketForAzureBlob", {
      removalPolicy: RemovalPolicy.RETAIN,
      blockPublicAccess: BlockPublicAccess.BLOCK_ALL,
    });

    // Create a corresponding Azure Blob Storage account and a Blob Container
    const azurebBlobStorage = new AzureBlobStorage(
      this,
      "MyCustomAzureBlobStorage",
      azureSubscriptionIdParameter.stringValue,
      azureClientIdParameter.stringValue,
      azureTenantIdParameter.stringValue,
      azureClientSecretName
    );

    // Create a lambda function that will receive notifications from S3 bucket
    // and copy the new uploaded object to Azure Blob Storage
    const copyObjectToAzureLambda = new DockerImageFunction(
      this,
      "CopyObjectsToAzureLambda",
      {
        timeout: Duration.seconds(60),
        code: DockerImageCode.fromImageAsset("copy_s3_fn_code", {
          buildArgs: {
            "--platform": "linux/amd64"
          }
        }),
      },
    );

    // Add an IAM policy statement to allow the Lambda function to access the
    // S3 bucket
    sourceBucket.grantRead(copyObjectToAzureLambda);

    // Add an IAM policy statement to allow the Lambda function to get the contents
    // of an S3 object
    copyObjectToAzureLambda.addToRolePolicy(
      new PolicyStatement({
        effect: Effect.ALLOW,
        actions: ["s3:GetObject"],
        resources: [`arn:aws:s3:::${sourceBucket.bucketName}/*`],
      })
    );

    // Set up an S3 bucket notification to trigger the Lambda function
    // when an object is uploaded
    sourceBucket.addEventNotification(
      EventType.OBJECT_CREATED,
      new LambdaDestination(copyObjectToAzureLambda)
    );

    // Grant the Lambda function read access to existing SSM Parameters
    azureSubscriptionIdParameter.grantRead(copyObjectToAzureLambda);
    azureClientIdParameter.grantRead(copyObjectToAzureLambda);
    azureTenantIdParameter.grantRead(copyObjectToAzureLambda);

    // Put the Azure Blob Container Url into SSM Parameter Store
    this.createStringSSMParameter(
      "AzureBlobContainerUrl",
      "Azure blob container URL",
      "/s3toazurebackupservice/azureblobcontainerurl",
      azurebBlobStorage.blobContainerUrl,
      copyObjectToAzureLambda
    );      

    // Grant the Lambda function read access to the secret
    azureClientSecret.grantRead(copyObjectToAzureLambda);

    // Output S3 bucket arn
    new CfnOutput(this, "sourceBucketArn", {
      value: sourceBucket.bucketArn,
      exportName: "sourceBucketArn",
    });

    // Output the Blob Conatiner Url
    new CfnOutput(this, "azureBlobContainerUrl", {
      value: azurebBlobStorage.blobContainerUrl,
      exportName: "azureBlobContainerUrl",
    });
  }

}

The custom L3 construct can be used in larger IaC solutions by calling the class called S3ToAzureBackupService and providing the Azure credentials and client secret as properties to the constructor.

import * as cdk from "aws-cdk-lib";
import { Construct } from "constructs";
import { S3ToAzureBackupService } from "./s3-to-azure-backup-service";

export class MultiCloudBackupCdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const s3ToAzureBackupService = new S3ToAzureBackupService(
      this,
      "MyMultiCloudBackupService",
      "/s3toazurebackupservice/azuresubscriptionid",
      "/s3toazurebackupservice/azureclientid",
      "/s3toazurebackupservice/azuretenantid",
      "s3toazurebackupservice/azureclientsecret"
    );
  }
}

Solution Diagram

Diagram 1: IaC Single Control Plane, demonstrates the concept of the Azure Blob Storage extension being imported from the AWS CloudFormation Registry into AWS CDK as an L1 CfnResource, wrapped into an L2 Construct and used in an L3 pattern alongside AWS resources to perform the specific task of backing up from and Amazon s3 Bucket into an Azure Blob Storage Container.

Diagram 1: IaC Single Control Plan

The CDK application is then synthesized into one or more AWS CloudFormation Templates, which result in the CloudFormation service deploying AWS resource configurations to AWS and Azure resource configurations to Azure.

This solution demonstrates not only how to consolidate the management of secondary cloud resources into a unified infrastructure stack in AWS, but also the improved productivity by eliminating the complexity and cost of operating multiple deployment mechanisms into multiple public cloud environments.

The following video demonstrates an example in real-time of the end-state solution:

Next Steps

While this was just a straightforward example, with the same approach you can use your imagination to come up with even more and complex scenarios where AWS CDK can be used as a single pane of glass for IaC to manage multicloud and hybrid solutions.

To get started with the solution discussed in this post, this workshop will provide you with the instructions you need to understand the steps required to create the S3ToAzureBackupService.

Once you have learned how to create AWS CloudFormation extensions and develop them into AWS CDK Constructs, you will learn how, with just a few lines of code, you can develop reusable multicloud unified IaC solutions that deploy through a single AWS control plane.

Conclusion

By adopting AWS CloudFormation extensions and AWS CDK, deployed through a single AWS control plane, the cost and complexity of maintaining deployment pipelines across multiple cloud providers is reduced to a single holistic solution-focused pipeline. The techniques demonstrated in this post and the related workshop provide a capability to simplify the design of complex systems, improve the management of integration, and more closely align the IaC and deployment management practices with the design.

About the authors: