Tag Archives: AWS Cloud Development Kit

CDK Corner – May 2021

Post Syndicated from Christian Weber original https://aws.amazon.com/blogs/devops/cdk-corner-may-2021/

Social – community engagement

According to Matt Coulter’s tweet, nearly 4000 people signed up for CDK Day to celebrate all things CDK on April 30. As a single-day, two-track event, there was a significant amount of content to learn from while having fun, and interacting with the CDK community.

Eric Johnson as the emcee, keynoted the first session of the morning, presenting “Better together: AWS CDK and AWS SAM.” This keynote was the announcement for the public preview of the AWS Serverless Application Model CLI (AWS SAM CLI). The AWS Serverless Application Model CLI includes support for local development and testing of AWS CDK projects.

To learn more, the blog post announcing the AWS SAM CLI public preview has more detail about the capabilities of the AWS SAM CLI.

If you missed CDK Day, fear not! CDK Day Track 1 and Track2 are available to watch online.

Great job and round of applause to the sign-language translators, the speakers, the organizers, and the hosts for making the second CDK Day a success! We can’t wait for CDK Day number 3!

Updates to the CDK

AWS CDK v2 developer preview

It’s here! The much-anticipated release of CDK v2’s developer preview is now available!

When using CDK previously, developers in JavaScript and TypeScript have faced challenges with the way that npm handles transitive dependencies; the dependencies that your dependencies rely on. For example, the aws-ec2 package.json file lists dependencies for other CDK construct libraries. If one of these transitive dependencies were updated, all of them would be need to be updated. Or you would run into dependency tree resolution errors, as seen in this StackOverflow thread.

With v2, all construct modules are now provided in a single package: aws-cdk-lib. All of the dependencies are now pinned to a single version of aws-cdk-lib, making it easier to manage. This also gives you the flexibility of having all CDK construct library modules available without having to run npm install each time you want to use a new construct library.

Another change to AWS CDK v2 is the removal of experimental modules. To help promote API stability and comply with semantic versioning, CDK v2 ships only with modules marked as stable.

Experimental modules aren’t going away completely, though. In v1, experimental modules and constructs will be provided together with no change. In v2, experimental modules are distributed and versioned separately from the aws-cdk-lib package, in their own dedicated package and namespace. Once a v2 construct is deemed stable, it is then merged into the aws-cdk-lib package.

The CDK team is still determining the best method of distributing experimental modules and constructs, so stay tuned for more information. Read more about the AWS CDK v2 developer preview in the What’s new blog post.

AWS CDK for Go developer preview

On April 7, the AWS CDK team announced support for golang. From the Go tracking issue on GitHub, nearly 900 members of the CDK community have requested for CDK to support golang, and we’re happy to see it become available! We are looking forward to helping out all the golang gophers out there build amazing CDK applications!

To learn more about Go and AWS CDK, read the AWS CDK for Go module API documentation on pkg.go.dev. You can also read the Go bindings for JSII RFC document on GitHub. Want to contribute to the success of Go and CDK? The project tracking board for Go’s General Availability has tasks and items which could use your help.

Construct modules promoted to General Availability

Many new construct modules were promoted to General Availability recently. General Availability indicates a module’s stability, giving confidence to run these modules in production workloads. In April, a total of 15 modules were promoted stable:

Notable new L2 constructs

In the @aws-cdk/route-53 module, name server (NS) records were previously defined with the route53.RecordType enum. In PR#13895, user stijnbrouwers introduces the NS record as its own L2 construct: route53.NSRecord. bringing it into company with other record type L2s, such as route53.ARecord. This makes managing NS records consistent with the other record types represented as L2 constructs.

Improving the @aws-cdk/aws-events-targets module, CDK community user hedrall submitted PR#13823. This change brings support for Amazon API Gateway as a target for an Amazon EventBridge event.

@aws-cdk/aws-codepipeline-actions now includes an L2 construct for AWS CodeStar Connections supporting BitBucket and GitHub. This construct lets you create a CDK application that uses AWS CodeStar with a source connection from either provider, thanks to PR#13781 from the CDK Team.

Level ups to existing CDK constructs

Amazon Elastic Inference makes available low-cost GPU-acceleration for deep-learning workloads. PR#13950 now lets you use the service via @aws-cdk/aws-ecs in Amazon Elastic Container Service tasks, from CDK community user upparekh.

In PR#13473, from pgarbe, the @aws-cdk/aws-lambda-nodejs module will now bundle AWS Lambda functions with Docker images sourced from the Amazon Elastic Container Registry (Amazon ECR) Public Registry, instead of DockerHub. Prior to this change, CDK used your DockerHub credentials to pull a Docker image for the Lambda function. If your account was in DockerHub’s free-tier account level, your account is throttled whenever it exceeds the API limit within a short time frame set by DockerHub. This can cause your AWS CDK deployment to be delayed until you are under DockerHub’s API limit. By moving to the Amazon ECR Public Registry, this removes the risk of being affected by DockerHub’s API rate limiting . You can read more in this blog post giving customers advice about DockerHub rate limits from last year.

With @aws-cdk/aws-codebuild, you can use concurrent build support to speed up your build process. Sometimes you’ll want to limit the number of builds that run concurrently, whether for cost reduction or reducing the complexity of your build process. PR#14185, authored by gmokki, adds the ability to define a concurrent build limit for an AWS CodeBuild project Stage.

It is common for customers to have applications or resources spanning multiple AWS Regions. If you’re using @aws-cdk/aws-secretsmanager, you can now replicate secrets to multiple Regions, with PR#14266 from the CDK team. Make sure you’re not setting your secret as “test123” for your production databases in multiple Regions!

For users of @aws-cdk/aws-eks, PR#12659 from anguslees lets you pass arguments from bootstrap.sh to avoid the DescribeCluster API call. This will speed up the time it takes nodes to join an EKS cluster.

PR#14250 from the CDK team gives developers using @aws-cdk/aws-ec2 the ability to set fixed IPs when defining NAT gateways. This change will now pre-create Elastic IP address allocations and assign them to the NAT gateway. This can be useful when managing links from an Amazon Virtual Private Cloud (VPC) to an on-premises data center that relies on fixed/static IP addresses.

@aws-cdk/aws-iam now lets you add AWS Identity and Access Management (AWS IAM) users to new or existing groups. For example, you might want to have a user in a specific group for the life of a deployed CDK application. And on stack deletion, revoke that membership. Thanks to PR#13698 from jogold, this is now possible.

Learning – Finds from across the internet

If you work with CDK parameters, you might be curious how parameters derive their names and values. Borislav Hadzhiev released a blog post about setting and using CDK parameters.

Ibrahim Cesar’s wrote an awesome blog post detailing the experience of discovering and working with CDK. It’s an enjoyable read of inspiration and animated gifs.

Twitter user edwin4_ released a tool for CDK automation called RocketCDK. From the project’s GitHub repository, this tool will initialize your CDK app, install your packages, and auto-import them into your stack. Neat! Anything that helps save time is a plus-one.

Community acknowledgments

And finally, congratulations and rounds of applause for these folks who had their first Pull Request merged to the CDK repository!

*These users’ Pull Requests were merged in April.

Thank you for joining us on this update of the CDK corner. See you next time!

CDK Corner – April 2021

Post Syndicated from Christian Weber original https://aws.amazon.com/blogs/devops/cdk-corner-april-2021/

Social – Community Engagement

We’re getting closer and closer to CDK Day, with the event receiving 75 CFP submissions. The cdkday schedule is now available to plan out your conference day.

Updates to the CDK

Constructs promoted to General Availability

Promoting a module to stable/General Availability is always a cause for celebration. Great job to all the folks involved who helped move aws-acmpca from Experimental to Stable. PR#13778 gives a peak into the work involved. If you’re interested in helping promote a module to G.A., or would like to learn more about the process, read the AWS Construct Library Module Lifecycle document. A big thanks to the CDK Community and team for their work!

Dead Letter Queues

Dead Letter Queues (“DLQs”) are a service implementation pattern that can queue messages when a service cannot process them. For example, if an email message can’t be delivered to a client, an email server could implement a DLQ holding onto that undeliverable message until the client can process the message. DLQs are supported by many AWS services, the community and CDK team have been working to support DLQs with CDK in various modules: aws-codebuild in PR#11228, aws-stepfunctions in PR#13450, and aws-lambda-targets in PR#11617.

Amazon API Gateway

Amazon API Gateway is a fully managed service to deploy APIs at scale. Here are the modules that have received updates to their support for API Gateway:

  • stepfunctions-tasks now supports API Gateway with PR#13033.

  • You can now specify regions when integrating Amazon API Gateway with other AWS services in PR#13251.

  • Support for websockets api in PR#13031 is now available in aws-apigatewayv2 as a Level 2 construct. To differentiate configuration between HTTP and websockets APIs, several of the HTTP API properties were renamed. More information about these changes can be found in the conversation section of PR#13031.

  • You can now set default authorizers in PR#13172. This lets you use an API Gateway HTTP, REST, or Websocket APIs with an authorizer and authorization scopes that cover all routes for a given API resource.

Notable new L2 constructs

AWS Global Accelerator is a networking service that lets users of your infrastructure hosted on AWS use the AWS global network infrastructure for traffic routing, improving speed and performance. Amazon Route 53 supports Global Accelerator and, thanks to PR#13407, you can now take advantage of this functionality in the aws-route-53-targets module as an L2 construct.

Amazon CloudWatch is an important part of monitoring AWS workloads. With PR#13281, the aws-cloudwatch-actions module now includes an Ec2Action construct, letting you programmatically set up observability of EC2-based workloads with CDK.

The aws-cognito module now supports Apple ID User Pools in PR#13160 allowing Developers to define workloads that use Apple IDs for identity management.

aws-iam received a new L2 construct with PR#13393, bringing SAML implementation support to CDK. SAML has become a preferred framework when implementing Single Sign On, and has been supported with IAM for sometime. Now, set it up with even more efficiency with the SamlProvider construct.

Amazon Neptune is a managed graph database service available as a construct in the aws-neptune module. PR#12763 adds L2 constructs to support Database Clusters and Database Instances.

Level ups to existing CDK constructs

Service discovery in AWS is provided by AWS CloudMap. With PR#13192, users of aws-ecs can now register an ECS Service with CloudMap.

aws-lambda has received two notable additions related to Docker: PR#13318, and PR#12258 add functionality to package Lambda function code with the output of a Docker build, or from a Docker build asset, respectively.

The aws-ecr module now supports Tag Mutability. Tags can denote a specific release for a piece of software. Setting the enum in the construct to IMMUTABLE will prevent tags from being overwritten by a later image, if that image uses a tag already present in the container repository.

Last year, AWS announced support for deployment circuit breakers in Amazon Elastic Container Service, enabling customers to perform auto-rollbacks on unhealthy service deployments without manual intervention. PR#12719 includes this functionality as part of the aws-ecs-patterns module, via the DeploymentCircuitBreaker interface. This interface is now available and can be used in constructs such as ApplicationLoadBalancedFargateService.

The aws-ec2 module received some nice quality of life upgrades to it: Support for multi-part user-data in PR#11843, client vpn endpoints in PR#12234, and non-numeric security protocols for security groups in PR#13593 all help improve the experience of using EC2 with CDK.

Learning – Finds from across the internet

On the AWS DevOps Blog, Eric Beard and Rico Huijbers penned a post detailing Best Practices for Developing Cloud Applications with AWS CDK.

Users of AWS Elastic Beanstalk wanting to deploy with AWS CDK can read about deploying Elastic Beanstalk applications with the AWS CDK and the aws-elasticbeanstalk module.

Deploying Infrastructure that is HIPAA and HiTrust compliant with AWS CDK can help customers move faster. This best practices guide for Hipaa and HiTrust environments goes into detail on deploying compliant architecture with the AWS CDK.

Community Acknowledgements

And finally, congratulations and rounds of applause for these folks who had their first Pull Request merged to the CDK Repository!*

*These users’ Pull Requests were merged between 2021-03-01 and 2021-03-31.

Thanks for reading this update of the CDK Corner. See you next time!

CDK Corner – January 2021

Post Syndicated from Christian Weber original https://aws.amazon.com/blogs/devops/cdk-corner-february-2021/

Social: Events in the Community

CDK Day is coming up on April 30th! This is your chance to meet and engage with the CDK Community! Last year’s event included an incredible amount of content, whether it was learning the origin story of CDK, learning how CDK is used in a Large Enterprise, there were many great sessions, as well as Eric Johnson cosplaying as the official CDK Mascot.

Do you have a story to share about using CDK, about something funny/crazy/interesting/cool/another adjective? The CFPs are now open — the community wants to hear your stories; so go ahead and submit here!

Updates: Changes made across CDK

In January, the CDK Community and the AWS CDK team were together hard at work, bringing in new changes, features, or, as NetaNir likes to call them, many new “goodies” to the CDK!

AWS Construct Library and Core

The CDK Team announced General Availability of the EKS Module in CDK with PR#12640. Moving a CDK Module from Experimental to Stable requires substantial effort from both the CDK Community and Team — the appreciation for everyone that contributed to this effort cannot be understated. Take a look at the project milestone to explore some of the work that contributed to releasing the EKS constrcut to GA. Great job everyone!

External assets are now supported from PR#12259. With this change, you can now setup cdk-assets.json with Files, Archives, or even Docker Images built by external utilities. This is great if your CDK Application relies on assets from other sources, such as an internal pipeline, or if you want to pull the latest Docker Image built from some external utility.

CDK will now alert you if your stack hits the maximum number of CloudFormation Resources. If you’re deploying complex CDK Stacks, you’ll know that sometimes you will hit this cap which seems to only happen when you’ve walked away from your computer to make a coffee while your stack is deploying, only to come back with a latte and a command line full of exceptions. This wonderful quality-of-life change was merged in PR#12193.

AWS CodeBuild

AWS CodeBuild in CDK can now be configured with Standard 5.0 Runtime Environments, which now supports many new runtime environments, including support for Python 3.9 which means, for example, CodeBuild now natively understands the union operator in Python dictionaries you’ve been using to combine dictionaries in your project.

AWS EC2

There is now support for m6gd and r6gd Graviton EC2 Instances from CDK with PR#12302. Graviton Instances are a great way to utilize ARM Archicture at a lower cost.

Support for new io2 and and gp3 EBS Volumes were announced at re:Invent, followed up with a community contribution from leandrodamascena in PR#12074

AWS ElasticSearch

A big cost savings feature to support ElasticSearch UltraWarm nodes in CDK, now gives CDK users the opportunity to store data in S3 instead of an SSD with ElasticSearch, which can substantially reduce storage costs.

AWS S3

Securing S3 Buckets is a standard practice, and CDK has tightened its security on S3 Buckets by limiting the PutObject permission of Bucket.grantWrite() to just s3:PutObject instead of s3:PutObject*. This subtle change means that only the first permission is added to the IAM Principal, instead of any other IAM permission prefixed with PutObject (Such as s3:PutObjectAcl). You still have the flexibility to make this permission add-on if needed, though.

AWS StepFunctions

A member of the CDK Community, ayush987goyal, submitted PR#12436 for StepFunctions-Tasks. This feature now lets users specify the family and revision of a taskDefinitionFamily inside EcsRunTask, thanks to their effort. This modifies previous behavior of the construct where a user could only deploy the latest revision of a Task by supplying the ARN of the Task.

CloudFormation and new L1 Resources

As CDK synthesizes CloudFormation Templates, it’s important that CDK stays up to date with the CloudFormation Resource Specification these updates to our collection of L1 Constructs. Now that they’re here, the community and team can begin implementing beautiful L2 Constructs for these L1s. Interested in contributing an L2 from these L1s? Take a look at our CONTRIBUTING doc to get up and running.

In January the team introduced several updates of the CloudFormation Resource Spec to CDK, bringing support for a whole slew of new Resources, Attribute Updates and Property Changes. These updates, among others, include new resource types for CloudFormation Modules, SageMaker Pipelines, AWS Config Saved Queries, AWS DataSync, AWS Service Catalog App Registry, AWS QuickSight, Virtual Clusters for EMR Containers for Amazon Elastic MapReduce, support for DNSSEC in Route53, and support for ECR Public Repositories.

My favorite of all these is ECR Public Repositories. Public Repositories support was just recently announced, in December at AWS re:Invent. Now you can deploy and manage a public repository with CDK as an L1 Construct. So, if you have an exciting Container Image that you’ve been wanting to share with the world with your own Public Repository, set it all up with CDK!

To be in the know on updates to the CDK, and updates to CDK’s CloudFormation Resource Spec, update your repository notification settings to watch for new CDK Releases , and browse the cfnspec CHANGELOG.

Learning: Level up your CDK Knowledge

AWS has released a new training module for the CDK. This free 7 module course teaches users the fundamental concepts of the CDK, from explaining its core benefits, to defining the common language and terms, to tips for troubleshooting CDK Projects. This is a great course for developers, or related stakeholders who may be considering whether or not to adopt CDK in their team or organization.

Community Acknowledgements: Thanks for your hard work

We love highlighting Pull Requests from our community of CDK users. This month’s spotlight goes to Jacob-Doetsch, who submitted a fix when deploying Bastion Hosts backed by ARM Architecture. As ARM based architecture increases in usage across AWS, identifying and resolving these types of bugs helps CDK maintain the ability to help Developers continue moving quickly. Great job Jacob!

And finally, to round out the CDK Corner, a round of applause to the following users who merged their first Pull Request to CDK in January! The CDK Community appreciates your hard work and effort!

Scaling up a Serverless Web Crawler and Search Engine

Post Syndicated from Jack Stevenson original https://aws.amazon.com/blogs/architecture/scaling-up-a-serverless-web-crawler-and-search-engine/

Introduction

Building a search engine can be a daunting undertaking. You must continually scrape the web and index its content so it can be retrieved quickly in response to a user’s query. The goal is to implement this in a way that avoids infrastructure complexity while remaining elastic. However, the architecture that achieves this is not necessarily obvious. In this blog post, we will describe a serverless search engine that can scale to crawl and index large web pages.

A simple search engine is composed of two main components:

  • A web crawler (or web scraper) to extract and store content from the web
  • An index to answer search queries

Web Crawler

You may have already read “Serverless Architecture for a Web Scraping Solution.” In this post, Dzidas reviews two different serverless architectures for a web scraper on AWS. Using AWS Lambda provides a simple and cost-effective option for crawling a website. However, it comes with a caveat: the Lambda timeout capped crawling time at 15 minutes. You can tackle this limitation and build a serverless web crawler that can scale to crawl larger portions of the web.

A typical web crawler algorithm uses a queue of URLs to visit. It performs the following:

  • It takes a URL off the queue
  • It visits the page at that URL
  • It scrapes any URLs it can find on the page
  • It pushes the ones that it hasn’t visited yet onto the queue
  • It repeats the preceding steps until the URL queue is empty

Even if we parallelize visiting URLs, we may still exceed the 15-minute limit for larger websites.

Breaking Down the Web Crawler Algorithm

AWS Step Functions is a serverless function orchestrator. It enables you to sequence one or more AWS Lambda functions to create a longer running workflow. It’s possible to break down this web crawler algorithm into steps that can be run in individual Lambda functions. The individual steps can then be composed into a state machine, orchestrated by AWS Step Functions.

Here is a possible state machine you can use to implement this web crawler algorithm:

Figure 1: Basic State Machine

Figure 1: Basic State Machine

1. ReadQueuedUrls – reads any non-visited URLs from our queue
2. QueueContainsUrls? – checks whether there are non-visited URLs remaining
3. CrawlPageAndQueueUrls – takes one URL off the queue, visits it, and writes any newly discovered URLs to the queue
4. CompleteCrawl – when there are no URLs in the queue, we’re done!

Each part of the algorithm can now be implemented as a separate Lambda function. Instead of the entire process being bound by the 15-minute timeout, this limit will now only apply to each individual step.

Where you might have previously used an in-memory queue, you now need a URL queue that will persist between steps. One option is to pass the queue around as an input and output of each step. However, you may be bound by the maximum I/O sizes for Step Functions. Instead, you can represent the queue as an Amazon DynamoDB table, which each Lambda function may read from or write to. The queue is only required for the duration of the crawl. So you can create the DynamoDB table at the start of the execution, and delete it once the crawler has finished.

Scaling up

Crawling one page at a time is going to be a bit slow. You can use the Step Functions “Map state” to run the CrawlPageAndQueueUrls to scrape multiple URLs at once. You should be careful not to bombard a website with thousands of parallel requests. Instead, you can take a fixed-size batch of URLs from the queue in the ReadQueuedUrls step.

An important limit to consider when working with Step Functions is the maximum execution history size. You can protect against hitting this limit by following the recommended approach of splitting work across multiple workflow executions. You can do this by checking the total number of URLs visited on each iteration. If this exceeds a threshold, you can spawn a new Step Functions execution to continue crawling.

Step Functions has native support for error handling and retries. You can take advantage of this to make the web crawler more robust to failures.

With these scaling improvements, here’s our final state machine:

Figure 2: Final State Machine

Figure 2: Final State Machine

This includes the same steps as before (1-4), but also two additional steps (5 and 6) responsible for breaking the workflow into multiple state machine executions.

Search Index

Deploying a scalable, efficient, and full-text search engine that provides relevant results can be complex and involve operational overheads. Amazon Kendra is a fully managed service, so there are no servers to provision. This makes it an ideal choice for our use case. Amazon Kendra supports HTML documents. This means you can store the raw HTML from the crawled web pages in Amazon Simple Storage Service (S3). Amazon Kendra will provide a machine learning powered search capability on top, which gives users fast and relevant results for their search queries.

Amazon Kendra does have limits on the number of documents stored and daily queries. However, additional capacity can be added to meet demand through query or document storage bundles.

The CrawlPageAndQueueUrls step writes the content of the web page it visits to S3. It also writes some metadata to help Amazon Kendra rank or present results. After crawling is complete, it can then trigger a data source sync job to ensure that the index stays up to date.

One aspect to be mindful of while employing Amazon Kendra in your solution is its cost model. It is priced per index/hour, which is more favorable for large-scale enterprise usage, than for smaller personal projects. We recommend you take note of the free tier of Amazon Kendra’s Developer Edition before getting started.

Overall Architecture

You can add in one more DynamoDB table to monitor your web crawl history. Here is the architecture for our solution:

Figure 3: Overall Architecture

Figure 3: Overall Architecture

A sample Node.js implementation of this architecture can be found on GitHub.

In this sample, a Lambda layer provides a Chromium binary (via chrome-aws-lambda). It uses Puppeteer to extract content and URLs from visited web pages. Infrastructure is defined using the AWS Cloud Development Kit (CDK), which automates the provisioning of cloud applications through AWS CloudFormation.

The Amazon Kendra component of the example is optional. You can deploy just the serverless web crawler if preferred.

Conclusion

If you use fully managed AWS services, then building a serverless web crawler and search engine isn’t as daunting as it might first seem. We’ve explored ways to run crawler jobs in parallel and scale a web crawler using AWS Step Functions. We’ve utilized Amazon Kendra to return meaningful results for queries of our unstructured crawled content. We achieve all this without the operational overheads of building a search index from scratch. Review the sample code for a deeper dive into how to implement this architecture.

Mitigate data leakage through the use of AppStream 2.0 and end-to-end auditing

Post Syndicated from Chaim Landau original https://aws.amazon.com/blogs/security/mitigate-data-leakage-through-the-use-of-appstream-2-0-and-end-to-end-auditing/

Customers want to use AWS services to operate on their most sensitive data, but they want to make sure that only the right people have access to that data. Even when the right people are accessing data, customers want to account for what actions those users took while accessing the data.

In this post, we show you how you can use Amazon AppStream 2.0 to grant isolated access to sensitive data and decrease your attack surface. In addition, we show you how to achieve end-to-end auditing, which is designed to provide full traceability of all activities around your data.

To demonstrate this idea, we built a sample solution that provides a data scientist with access to an Amazon SageMaker Studio notebook using AppStream 2.0. The solution deploys a new Amazon Virtual Private Cloud (Amazon VPC) with isolated subnets, where the SageMaker notebook and AppStream 2.0 instances are set up.

Why AppStream 2.0?

AppStream 2.0 is a fully-managed, non-persistent application and desktop streaming service that provides access to desktop applications from anywhere by using an HTML5-compatible desktop browser.

Each time you launch an AppStream 2.0 session, a freshly-built, pre-provisioned instance is provided, using a prebuilt image. As soon as you close your session and the disconnect timeout period is reached, the instance is terminated. This allows you to carefully control the user experience and helps to ensure a consistent, secure environment each time. AppStream 2.0 also lets you enforce restrictions on user sessions, such as disabling the clipboard, file transfers, or printing.

Furthermore, AppStream 2.0 uses AWS Identity and Access Management (IAM) roles to grant fine-grained access to other AWS services such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon SageMaker, and other AWS services. This gives you both control over the access as well as an accounting, via Amazon CloudTrail, of what actions were taken and when.

These features make AppStream 2.0 uniquely suitable for environments that require high security and isolation.

Why SageMaker?

Developers and data scientists use SageMaker to build, train, and deploy machine learning models quickly. SageMaker does most of the work of each step of the machine learning process to help users develop high-quality models. SageMaker access from within AppStream 2.0 provides your data scientists and analysts with a suite of common and familiar data-science packages to use against isolated data.

Solution architecture overview

This solution allows a data scientist to work with a data set while connected to an isolated environment that doesn’t have an outbound path to the internet.

First, you build an Amazon VPC with isolated subnets and with no internet gateways attached. This ensures that any instances stood up in the environment don’t have access to the internet. To provide the resources inside the isolated subnets with a path to commercial AWS services such as Amazon S3, SageMaker, AWS System Manager you build VPC endpoints and attach them to the VPC, as shown in Figure 1.

Figure 1: Network Diagram

Figure 1: Network Diagram

You then build an AppStream 2.0 stack and fleet, and attach a security group and IAM role to the fleet. The purpose of the IAM role is to provide the AppStream 2.0 instances with access to downstream AWS services such as Amazon S3 and SageMaker. The IAM role design follows the least privilege model, to ensure that only the access required for each task is granted.

During the building of the stack, you will enable AppStream 2.0 Home Folders. This feature builds an S3 bucket where users can store files from inside their AppStream 2.0 session. The bucket is designed with a dedicated prefix for each user, where only they have access. We use this prefix to store the user’s pre-signed SagaMaker URLs, ensuring that no one user can access another users SageMaker Notebook.

You then deploy a SageMaker notebook for the data scientist to use to access and analyze the isolated data.

To confirm that the user ID on the AppStream 2.0 session hasn’t been spoofed, you create an AWS Lambda function that compares the user ID of the data scientist against the AppStream 2.0 session ID. If the user ID and session ID match, this indicates that the user ID hasn’t been impersonated.

Once the session has been validated, the Lambda function generates a pre-signed SageMaker URL that gives the data scientist access to the notebook.

Finally, you enable AppStream 2.0 usage reports to ensure that you have end-to-end auditing of your environment.

To help you easily deploy this solution into your environment, we’ve built an AWS Cloud Development Kit (AWS CDK) application and stacks, using Python. To deploy this solution, you can go to the Solution deployment section in this blog post.

Note: this solution was built with all resources being in a single AWS Region. The support of multi Region is possible but isn’t part of this blog post.

Solution requirements

Before you build a solution, you must know your security requirements. The solution in this post assumes a set of standard security requirements that you typically find in an enterprise environment:

  • User authentication is provided by a Security Assertion Markup Language (SAML) identity provider (IdP).
  • IAM roles are used to access AWS services such as Amazon S3 and SageMaker.
  • AWS IAM access keys and secret keys are prohibited.
  • IAM policies follow the least privilege model so that only the required access is granted.
  • Windows clipboard, file transfer, and printing to local devices is prohibited.
  • Auditing and traceability of all activities is required.

Note: before you will be able to integrate SAML with AppStream 2.0, you will need to follow the AppStream 2.0 Integration with SAML 2.0 guide. There are quite a few steps and it will take some time to set up. SAML authentication is optional, however. If you just want to prototype the solution and see how it works, you can do that without enabling SAML integration.

Solution components

This solution uses the following technologies:

  • Amazon VPC – provides an isolated network where the solution will be deployed.
  • VPC endpoints – provide access from the isolated network to commercial AWS services such as Amazon S3 and SageMaker.
  • AWS Systems Manager – stores parameters such as S3 bucket names.
  • AppStream 2.0 – provides hardened instances to run the solution on.
  • AppStream 2.0 home folders – store users’ session information.
  • Amazon S3 – stores application scripts and pre-signed SageMaker URLs.
  • SageMaker notebook – provides data scientists with tools to access the data.
  • AWS Lambda – runs scripts to validate the data scientist’s session, and generates pre-signed URLs for the SageMaker notebook.
  • AWS CDK – deploys the solution.
  • PowerShell – processes scripts on AppStream 2.0 Microsoft Windows instances.

Solution high-level design and process flow

The following figure is a high-level depiction of the solution and its process flow.

Figure 2: Solution process flow

Figure 2: Solution process flow

The process flow—illustrated in Figure 2—is:

  1. A data scientist clicks on an AppStream 2.0 federated or a streaming URL.
    1. If it’s a federated URL, the data scientist authenticates using their corporate credentials, as well as MFA if required.
    1. If it’s a streaming URL, no further authentication is required.
  2. The data scientist is presented with a PowerShell application that’s been made available to them.
  3. After starting the application, it starts the PowerShell script on an AppStream 2.0 instance.
  4. The script then:
    1. Downloads a second PowerShell script from an S3 bucket.
    2. Collects local AppStream 2.0 environment variables:
      1. AppStream_UserName
      2. AppStream_Session_ID
      3. AppStream_Resource_Name
    3. Stores the variables in the session.json file and copies the file to the home folder of the session on Amazon S3.
  5. The PUT event of the JSON file into the Amazon S3 bucket triggers an AWS Lambda function that performs the following:
    1. Reads the session.json file from the user’s home folder on Amazon S3.
    2. Performs a describe action against the AppStream 2.0 API to ensure that the session ID and the user ID match. This helps to prevent the user from manipulating the local environment variable to pretend to be someone else (spoofing), and potentially gain access to unauthorized data.
    3. If the session ID and user ID match, a pre-signed SageMaker URL is generated and stored in session_url.txt, and copied to the user’s home folder on Amazon S3.
    4. If the session ID and user ID do not match, the Lambda function ends without generating a pre-signed URL.
  6. When the PowerShell script detects the session_url.txt file, it opens the URL, giving the user access to their SageMaker notebook.

Code structure

To help you deploy this solution in your environment, we’ve built a set of code that you can use. The code is mostly written in Python and for the AWS CDK framework, and with an AWS CDK application and some PowerShell scripts.

Note: We have chosen the default settings on many of the AWS resources our code deploys. Before deploying the code, you should conduct a thorough code review to ensure the resources you are deploying meet your organization’s requirements.

AWS CDK application – ./app.py

To make this application modular and portable, we’ve structured it in separate AWS CDK nested stacks:

  • vpc-stack – deploys a VPC with two isolated subnets, along with three VPC endpoints.
  • s3-stack – deploys an S3 bucket, copies the AppStream 2.0 PowerShell scripts, and stores the bucket name in an SSM parameter.
  • appstream-service-roles-stack – deploys AppStream 2.0 service roles.
  • appstream-stack – deploys the AppStream 2.0 stack and fleet, along with the required IAM roles and security groups.
  • appstream-start-fleet-stack – builds a custom resource that starts the AppStream 2.0 fleet.
  • notebook-stack – deploys a SageMaker notebook, along with IAM roles, security groups, and an AWS Key Management Service (AWS KMS) encryption key.
  • saml-stack – deploys a SAML role as a placeholder for SAML authentication.

PowerShell scripts

The solution uses the following PowerShell scripts inside the AppStream 2.0 instances:

  • sagemaker-notebook-launcher.ps1 – This script is part of the AppStream 2.0 image and downloads the sagemaker-notebook.ps1 script.
  • sagemaker-notebook.ps1 – starts the process of validating the session and generating the SageMaker pre-signed URL.

Note: Having the second script reside on Amazon S3 provides flexibility. You can modify this script without having to create a new AppStream 2.0 image.

Deployment Prerequisites

To deploy this solution, your deployment environment must meet the following prerequisites:

Note: We used AWS Cloud9 with Amazon Linux 2 to test this solution, as it comes preinstalled with most of the prerequisites for deploying this solution.

Deploy the solution

Now that you know the design and components, you’re ready to deploy the solution.

Note: In our demo solution, we deploy two stream.standard.small AppStream 2.0 instances, using Windows Server 2019. This gives you a reasonable example to work from. In your own environment you might need more instances, a different instance type, or a different version of Windows. Likewise, we deploy a single SageMaker notebook instance of type ml.t3.medium. To change the AppStream 2.0 and SageMaker instance types, you will need to modify the stacks/data_sandbox_appstream.py and stacks/data_sandbox_notebook.py respectively.

Step 1: AppStream 2.0 image

An AppStream 2.0 image contains applications that you can stream to your users. It’s what allows you to curate the user experience by preconfiguring the settings of the applications you stream to your users.

To build an AppStream 2.0 image:

  1. Build an image following the Create a Custom AppStream 2.0 Image by Using the AppStream 2.0 Console tutorial.

    Note: In Step 1: Install Applications on the Image Builder in this tutorial, you will be asked to choose an Instance family. For this example, we chose General Purpose. If you choose a different Instance family, you will need to make sure the appstream_instance_type specified under Step 2: Code modification is of the same family.

    In Step 6: Finish Creating Your Image in this tutorial, you will be asked to provide a unique image name. Note down the image name as you will need it in Step 2 of this blog post.

  2. Copy notebook-launcher.ps1 to a location on the image. We recommend that you copy it to C:\AppStream.
  3. In Step 2—Create an AppStream 2.0 Application Catalog—of the tutorial, use C:\Windows\System32\Windowspowershell\v1.0\powershell.exe as the application, and the path to notebook-launcher.ps1 as the launch parameter.

Note: While testing your application during the image building process, the PowerShell script will fail because the underlying infrastructure is not present. You can ignore that failure during the image building process.

Step 2: Code modification

Next, you must modify some of the code to fit your environment.

Make the following changes in the cdk.json file:

  • vpc_cidr – Supply your preferred CIDR range to be used for the VPC.

    Note: VPC CIDR ranges are your private IP space and thus can consist of any valid RFC 1918 range. However, if the VPC you are planning on using for AppStream 2.0 needs to connect to other parts of your private network (on premise or other VPCs), you need to choose a range that does not conflict or overlap with the rest of your infrastructure.

  • appstream_Image_name – Enter the image name you chose when you built the Appstream 2.0 image in Step 1.a.
  • appstream_environment_name – The environment name is strictly cosmetic and drives the naming of your AppStream 2.0 stack and fleet.
  • appstream_instance_type – Enter the AppStream 2.0 instance type. The instance type must be part of the same instance family you used in Step 1 of the To build an AppStream 2.0 image section. For a list of AppStream 2.0 instances, visit https://aws.amazon.com/appstream2/pricing/.
  • appstream_fleet_type – Enter the fleet type. Allowed values are ALWAYS_ON or ON_DEMAND.
  • Idp_name – If you have integrated SAML with this solution, you will need to enter the IdP name you chose when creating the SAML provider in the IAM Console.

Step 3: Deploy the AWS CDK application

The CDK application deploys the CDK stacks.

The stacks include:

  • VPC with isolated subnets
  • VPC Endpoints for S3, SageMaker, and Systems Manager
  • S3 bucket
  • AppStream 2.0 stack and fleet
  • Two AppStream 2.0 stream.standard.small instances
  • A single SageMaker ml.t2.medium notebook

Run the following commands to deploy the AWS CDK application:

  1. Install the AWS CDK Toolkit.
    npm install -g aws-cdk
    

  2. Create and activate a virtual environment.
    python -m venv .datasandbox-env
    
    source .datasandbox-env/bin/activate
    

  3. Change directory to the root folder of the code repository.
  4. Install the required packages.
    pip install -r requirements.txt
    

  5. If you haven’t used AWS CDK in your account yet, run:
    cdk bootstrap
    

  6. Deploy the AWS CDK stack.
    cdk deploy DataSandbox
    

Step 4: Test the solution

After the stack has successfully deployed, allow approximately 25 minutes for the AppStream 2.0 fleet to reach a running state. Testing will fail if the fleet isn’t running.

Without SAML

If you haven’t added SAML authentication, use the following steps to test the solution.

  1. In the AWS Management Console, go to AppStream 2.0 and then to Stacks.
  2. Select the stack, and then select Action.
  3. Select Create streaming URL.
  4. Enter any user name and select Get URL.
  5. Enter the URL in another tab of your browser and test your application.

With SAML

If you are using SAML authentication, you will have a federated login URL that you need to visit.

If everything is working, your SageMaker notebook will be launched as shown in Figure 3.

Figure 3: SageMaker Notebook

Figure 3: SageMaker Notebook

Note: if you receive a web browser timeout, verify that the SageMaker notebook instance “Data-Sandbox-Notebook” is currently in InService status.

Auditing

Auditing for this solution is provided through AWS CloudTrail and AppStream 2.0 Usage Reports. Though CloudTrail is enabled by default, to collect and store the CloudTrail logs, you must create a trail for your AWS account.

The following logs will be available for you to use, to provide auditing.

Connecting the dots

To get an accurate idea of your users’ activity, you have to correlate some logs from different services. First, you collect the login information from CloudTrail. This gives you the user ID of the user who logged in. You then collect the Amazon S3 put from CloudTrail, which gives you the IP address of the AppStream 2.0 instance. And finally, you collect the AppStream 2.0 usage report which gives you the IP address of the AppStream 2.0 instance, plus the user ID. This allows you to connect the user ID to the activity on Amazon S3. For auditing & controlling exploration activities with SageMaker, please visit this GitHub repository.

Though the logs are automatically being collected, what we have shown you here is a manual way of sifting through those logs. For a more robust solution on querying and analyzing CloudTrail logs, visit Querying AWS CloudTrail Logs.

Costs of this Solution

The cost for running this solution will depend on a number of factors like the instance size, the amount of data you store, and how many hours you use the solution. AppStream 2.0 is charged per instance hour and there is one instance in this example solution. You can see details on the AppStream 2.0 pricing page. VPC endpoints are charged by the hour and by how much data passes through them. There are three VPC endpoints in this solution (S3, System Manager, and SageMaker). VPC endpoint pricing is described on the Privatelink pricing page. SageMaker Notebooks are charged based on the number of instance hours and the instance type. There is one SageMaker instance in this solution, which may be eligible for free tier pricing. See the SageMaker pricing page for more details. Amazon S3 storage pricing depends on how much data you store, what kind of storage you use, and how much data transfers in and out of S3. The use in this solution may be eligible for free tier pricing. You can see details on the S3 pricing page.

Before deploying this solution, make sure to calculate your cost using the AWS Pricing Calculator, and the AppStream 2.0 pricing calculator.

Conclusion

Congratulations! You have deployed a solution that provides your users with access to sensitive and isolated data in a secure manner using AppStream 2.0. You have also implemented a mechanism that is designed to prevent user impersonation, and enabled end-to-end auditing of all user activities.

To learn about how Amazon is using AppStream 2.0, visit the blog post How Amazon uses AppStream 2.0 to provide data scientists and analysts with access to sensitive data.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Chaim Landau

As a Senior Cloud Architect at AWS, Chaim works with large enterprise customers, helping them create innovative solutions to address their cloud challenges. Chaim is passionate about his work, enjoys the creativity that goes into building solutions in the cloud, and derives pleasure from passing on his knowledge. In his spare time, he enjoys outdoor activities, spending time in nature, and immersing himself in his books.

Author

JD Braun

As a Data and Machine Learning Engineer, JD helps organizations design and implement modern data architectures to deliver value to their internal and external customers. In his free time, he enjoys exploring Minneapolis with his fiancée and black lab.

Developing enterprise application patterns with the AWS CDK

Post Syndicated from Krishnakumar Rengarajan original https://aws.amazon.com/blogs/devops/developing-application-patterns-cdk/

Enterprises often need to standardize their infrastructure as code (IaC) for governance, compliance, and quality control reasons. You also need to manage and centrally publish updates to your IaC libraries. In this post, we demonstrate how to use the AWS Cloud Development Kit (AWS CDK) to define patterns for IaC and publish them for consumption in controlled releases using AWS CodeArtifact.

AWS CDK is an open-source software development framework to model and provision cloud application resources in programming languages such as TypeScript, JavaScript, Python, Java, and C#/.Net. The basic building blocks of AWS CDK are called constructs, which map to one or more AWS resources, and can be composed of other constructs. Constructs allow high-level abstractions to be defined as patterns. You can synthesize constructs into AWS CloudFormation templates and deploy them into an AWS account.

AWS CodeArtifact is a fully managed service for managing the lifecycle of software artifacts. You can use CodeArtifact to securely store, publish, and share software artifacts. Software artifacts are stored in repositories, which are aggregated into a domain. A CodeArtifact domain allows organizational policies to be applied across multiple repositories. You can use CodeArtifact with common build tools and package managers such as NuGet, Maven, Gradle, npm, yarn, pip, and twine.

Solution overview

In this solution, we complete the following steps:

  1. Create two AWS CDK pattern constructs in Typescript: one for traditional three-tier web applications and a second for serverless web applications.
  2. Publish the pattern constructs to CodeArtifact as npm packages. npm is the package manager for Node.js.
  3. Consume the pattern construct npm packages from CodeArtifact and use them to provision the AWS infrastructure.

We provide more information about the pattern constructs in the following sections. The source code mentioned in this blog is available in GitHub.

Note: The code provided in this blog post is for demonstration purposes only. You must ensure that it meets your security and production readiness requirements.

Traditional three-tier web application construct

The first pattern construct is for a traditional three-tier web application running on Amazon Elastic Compute Cloud (Amazon EC2), with AWS resources consisting of Application Load Balancer, an Autoscaling group and EC2 launch configuration, an Amazon Relational Database Service (Amazon RDS) or Amazon Aurora database, and AWS Secrets Manager. The following diagram illustrates this architecture.

 

Traditional stack architecture

Serverless web application construct

The second pattern construct is for a serverless application with AWS resources in AWS Lambda, Amazon API Gateway, and Amazon DynamoDB.

Serverless application architecture

Publishing and consuming pattern constructs

Both constructs are written in Typescript and published to CodeArtifact as npm packages. A semantic versioning scheme is used to version the construct packages. After a package gets published to CodeArtifact, teams can consume them for deploying AWS resources. The following diagram illustrates this architecture.

Pattern constructs

Prerequisites

Before getting started, complete the following steps:

  1. Clone the code from the GitHub repository for the traditional and serverless web application constructs:
    git clone https://github.com/aws-samples/aws-cdk-developing-application-patterns-blog.git
    cd aws-cdk-developing-application-patterns-blog
  2. Configure AWS Identity and Access Management (IAM) permissions by attaching IAM policies to the user, group, or role implementing this solution. The following policy files are in the iam folder in the root of the cloned repo:
    • BlogPublishArtifacts.json – The IAM policy to configure CodeArtifact and publish packages to it.
    • BlogConsumeTraditional.json – The IAM policy to consume the traditional three-tier web application construct from CodeArtifact and deploy it to an AWS account.
    • PublishArtifacts.json – The IAM policy to consume the serverless construct from CodeArtifact and deploy it to an AWS account.

Configuring CodeArtifact

In this step, we configure CodeArtifact for publishing the pattern constructs as npm packages. The following AWS resources are created:

  • A CodeArtifact domain named blog-domain
  • Two CodeArtifact repositories:
    • blog-npm-store – For configuring the upstream NPM repository.
    • blog-repository – For publishing custom packages.

Deploy the CodeArtifact resources with the following code:

cd prerequisites/
rm -rf package-lock.json node_modules
npm install
cdk deploy --require-approval never
cd ..

Log in to the blog-repository. This step is needed for publishing and consuming the npm packages. See the following code:

aws codeartifact login \
     --tool npm \
     --domain blog-domain \
     --domain-owner $(aws sts get-caller-identity --output text --query 'Account') \
     --repository blog-repository

Publishing the pattern constructs

  1. Change the directory to the serverless construct:
    cd serverless
  2. Install the required npm packages:
    rm package-lock.json && rm -rf node_modules
    npm install
    
  3. Build the npm project:
    npm run build
  4. Publish the construct npm package to the CodeArtifact repository:
    npm publish

    Follow the previously mentioned steps for building and publishing a traditional (classic Load Balancer plus Amazon EC2) web app by running these commands in the traditional directory.

    If the publishing is successful, you see messages like the following screenshots. The following screenshot shows the traditional infrastructure.

    Successful publishing of Traditional construct package to CodeArtifact

    The following screenshot shows the message for the serverless infrastructure.

    Successful publishing of Serverless construct package to CodeArtifact

    We just published version 1.0.1 of both the traditional and serverless web app constructs. To release a new version, we can simply update the version attribute in the package.json file in the traditional or serverless folder and repeat the last two steps.

    The following code snippet is for the traditional construct:

    {
        "name": "traditional-infrastructure",
        "main": "lib/index.js",
        "files": [
            "lib/*.js",
            "src"
        ],
        "types": "lib/index.d.ts",
        "version": "1.0.1",
    ...
    }

    The following code snippet is for the serverless construct:

    {
        "name": "serverless-infrastructure",
        "main": "lib/index.js",
        "files": [
            "lib/*.js",
            "src"
        ],
        "types": "lib/index.d.ts",
        "version": "1.0.1",
    ...
    }

Consuming the pattern constructs from CodeArtifact

In this step, we demonstrate how the pattern constructs published in the previous steps can be consumed and used to provision AWS infrastructure.

  1. From the root of the GitHub package, change the directory to the examples directory containing code for consuming traditional or serverless constructs.To consume the traditional construct, use the following code:
    cd examples/traditional

    To consume the serverless construct, use the following code:

    cd examples/serverless
  2. Open the package.json file in either directory and note that the packages and versions we consume are listed in the dependencies section, along with their version.
    The following code shows the traditional web app construct dependencies:

    "dependencies": {
        "@aws-cdk/core": "1.30.0",
        "traditional-infrastructure": "1.0.1",
        "aws-cdk": "1.47.0"
    }

    The following code shows the serverless web app construct dependencies:

    "dependencies": {
        "@aws-cdk/core": "1.30.0",
        "serverless-infrastructure": "1.0.1",
        "aws-cdk": "1.47.0"
    }
  3. Install the pattern artifact npm package along with the dependencies:
    rm package-lock.json && rm -rf node_modules
    npm install
    
  4. As an optional step, if you need to override the default Lambda function code, build the npm project. The following commands build the Lambda function source code:
    cd ../override-serverless
    npm run build
    cd -
  5. Bootstrap the project with the following code:
    cdk bootstrap

    This step is applicable for serverless applications only. It creates the Amazon Simple Storage Service (Amazon S3) staging bucket where the Lambda function code and artifacts are stored.

  6. Deploy the construct:
    cdk deploy --require-approval never

    If the deployment is successful, you see messages similar to the following screenshots. The following screenshot shows the traditional stack output, with the URL of the Load Balancer endpoint.

    Traditional CloudFormation stack outputs

    The following screenshot shows the serverless stack output, with the URL of the API Gateway endpoint.

    Serverless CloudFormation stack outputs

    You can test the endpoint for both constructs using a web browser or the following curl command:

    curl <endpoint output>

    The traditional web app endpoint returns a response similar to the following:

    [{"app": "traditional", "id": 1605186496, "purpose": "blog"}]

    The serverless stack returns two outputs. Use the output named ServerlessStack-v1.Api. See the following code:

    [{"purpose":"blog","app":"serverless","itemId":"1605190688947"}]

  7. Optionally, upgrade to a new version of pattern construct.
    Let’s assume that a new version of the serverless construct, version 1.0.2, has been published, and we want to upgrade our AWS infrastructure to this version. To do this, edit the package.json file and change the traditional-infrastructure or serverless-infrastructure package version in the dependencies section to 1.0.2. See the following code example:

    "dependencies": {
        "@aws-cdk/core": "1.30.0",
        "serverless-infrastructure": "1.0.2",
        "aws-cdk": "1.47.0"
    }

    To update the serverless-infrastructure package to 1.0.2, run the following command:

    npm update

    Then redeploy the CloudFormation stack:

    cdk deploy --require-approval never

Cleaning up

To avoid incurring future charges, clean up the resources you created.

  1. Delete all AWS resources that were created using the pattern constructs. We can use the AWS CDK toolkit to clean up all the resources:
    cdk destroy --force

    For more information about the AWS CDK toolkit, see Toolkit reference. Alternatively, delete the stack on the AWS CloudFormation console.

  2. Delete the CodeArtifact resources by deleting the CloudFormation stack that was deployed via AWS CDK:
    cd prerequisites
    cdk destroy –force
    

Conclusion

In this post, we demonstrated how to publish AWS CDK pattern constructs to CodeArtifact as npm packages. We also showed how teams can consume the published pattern constructs and use them to provision their AWS infrastructure.

This mechanism allows your infrastructure for AWS services to be provisioned from the configuration that has been vetted for quality control and security and governance checks. It also provides control over when new versions of the pattern constructs are released, and when the teams consuming the constructs can upgrade to the newly released versions.

About the Authors

Usman Umar

 

Usman Umar is a Sr. Applications Architect at AWS Professional Services. He is passionate about developing innovative ways to solve hard technical problems for the customers. In his free time, he likes going on biking trails, doing car modifications, and spending time with his family.

 

 

Krishnakumar Rengarajan

 

Krishnakumar Rengarajan is a DevOps Consultant with AWS Professional Services. He enjoys working with customers and focuses on building and delivering automated solutions that enables customers on their AWS cloud journeys.

CDK Corner – January 2020

Post Syndicated from Richard H Boyd original https://aws.amazon.com/blogs/devops/cdk-corner-january-2020/

December was an exciting month for CDK! Jason Fulghum delivered an AWS re:Invent presentation on how CDK has changed over the past year and what customers can expect in the next year. The highlight of this talk was the alpha release of CDK v2. This is the first major version bump since the AWS CDK went GA in July of 2019.

CDK v2 addresses two common pieces of feedback we received regarding dependency management with individually packaged modules. First, due to CDK being developed in the open, some modules are more or less mature than others but they are all equally available to install and use. Asking customers to verify the stability of every module they use directly and indirectly isn’t the kind of delightful experience we want customers to have. Second, explicitly installing a package for every service that’s needed can be quite cumbersome. CDK v2 will bundle all AWS CloudFormation L1 constructs and all stable L2 into a single package called aws-cdk-lib (code named “mono-cdk”). We will use a different model for annotating APIs that are not yet final without introducing breaking changes in minor versions. Additionally, all CDK constructs (AWS CDK, CDK8s, and CDKtf) will now inherit directly from a common Construct class. This change lays the foundation for sharing constructs across the CDK ecosystem.

Last month AWS Lambda announced container image support and this marked the first new AWS service feature which also launched on the same day with CDK support. This means that we were able to release an updated Lambda Function construct to support a new feature on the same day the feature was announced. I expect that we’ll see more features and services launching like this in the future.

Brand new L2 constructs were added for Amazon Interactive Video Service and CloudFront’s Lambda@Edge. This marks the start of the journey for these constructs that will eventually become stable and delightful enough to use for your production workloads. Speaking of the journey to stability, December saw three existing modules graduated to Stable. These modules are cloudfront, cloudfront-origins, and codeguruprofiler. Constructs marked as stable may include backward compatible changes only if the major version of CDK is incremented, and even then, most breaking changes will be removal of deprecated APIs from the previous version.

Finally, we get to my favorite part of this update. I’d like to take some time in each post to highlight a community member contribution and talk about how it makes the AWS CDK Community better. This month’s Contribution is PR #12090 by perennial CDK contributor Jonathan Goldwasser. This change adds a feature that will automatically remove the contents of an Amazon S3 Bucket when the bucket resource is removed from its CloudFormation stack. Before this feature was added, customers would need to manually empty a bucket of its contents for the bucket to be successfully deleted via CloudFormation. The coolest part of this contribution is that it both solves a real problem that customers experience and that the work was started by one community member and finished by another. People often think of open source software as the sum of individual contributions, but this specific pull request shows that collaboration takes many forms and contributions don’t always appear in the commit history.

Field Notes: Comparing Algorithm Performance Using MLOps and the AWS Cloud Development Kit

Post Syndicated from Moataz Gaber original https://aws.amazon.com/blogs/architecture/field-notes-comparing-algorithm-performance-using-mlops-and-the-aws-cloud-development-kit/

Comparing machine learning algorithm performance is fundamental for machine learning practitioners, and data scientists. The goal is to evaluate the appropriate algorithm to implement for a known business problem.

Machine learning performance is often correlated to the usefulness of the model deployed. Improving the performance of the model typically results in an increased accuracy of the prediction. Model accuracy is a key performance indicator (KPI) for businesses when evaluating production readiness and identifying the appropriate algorithm to select earlier in model development. Organizations benefit from reduced project expenses, accelerated project timelines and improved customer experience. Nevertheless, some organizations have not introduced a model comparison process into their workflow which negatively impacts cost and productivity.

In this blog post, I describe how you can compare machine learning algorithms using Machine Learning Operations (MLOps). You will learn how to create an MLOps pipeline for comparing machine learning algorithms performance using AWS Step Functions, AWS Cloud Development Kit (CDK) and Amazon SageMaker.

First, I explain the use case that will be addressed through this post. Then, I explain the design considerations for the solution. Finally, I provide access to a GitHub repository which includes all the necessary steps for you to replicate the solution I have described, in your own AWS account.

Understanding the Use Case

Machine learning has many potential uses and quite often the same use case is being addressed by different machine learning algorithms. Let’s take Amazon Sagemaker built-in algorithms. As an example, if you are having a “Regression” use case, it can be addressed using (Linear Learner, XGBoost and KNN) algorithms. Another example for a “Classification” use case you can use algorithm such as (XGBoost, KNN, Factorization Machines and Linear Learner). Similarly for “Anomaly Detection” there are (Random Cut Forests and IP Insights).

In this post, it is a “Regression” use case to identify the age of the abalone which can be calculated based on the number of rings on its shell (age equals to number of rings plus 1.5). Usually the number of rings are counted through microscopes examinations.

I use the abalone dataset in libsvm format which contains 9 fields [‘Rings’, ‘Sex’, ‘Length’,’ Diameter’, ‘Height’,’ Whole Weight’,’ Shucked Weight’,’ Viscera Weight’ and ‘Shell Weight’] respectively.

The features starting from Sex to Shell Weight are physical measurements that can be measured using the correct tools. Therefore, using the machine learning algorithms (Linear Learner and XGBoost) to address this use case, the complexity of having to examine the abalone under microscopes to understand its age can be improved.

Benefits of the AWS Cloud Development Kit (AWS CDK)

The AWS Cloud Development Kit (AWS CDK) is an open source software development framework to define your cloud application resources.

The AWS CDK uses the jsii which is an interface developed by AWS that allows code in any language to naturally interact with JavaScript classes. It is the technology that enables the AWS Cloud Development Kit to deliver polyglot libraries from a single codebase.

This means that you can use the CDK and define your cloud application resources in typescript language for example. Then by compiling your source module using jsii, you can package it as modules in one of the supported target languages (e.g: Javascript, python, Java and .Net). So if your developers or customers prefer any of those languages, you can easily package and export the code to their preferred choice.

Also, the cdk tf provides constructs for defining Terraform HCL state files and the cdk8s enables you to use constructs for defining kubernetes configuration in TypeScript, Python, and Java. So by using the CDK you have a faster development process and easier cloud onboarding. It makes your cloud resources more flexible for sharing.

Prerequisites

Overview of solution

This architecture serves as an example of how you can build a MLOps pipeline that orchestrates the comparison of results between the predictions of two algorithms.

The solution uses a completely serverless environment so you don’t have to worry about managing the infrastructure. It also deletes resources not needed after collecting the predictions results, so as not to incur any additional costs.

Figure 1: Solution Architecture

Walkthrough

In the preceding diagram, the serverless MLOps pipeline is deployed using AWS Step Functions workflow. The architecture contains the following steps:

  1. The dataset is uploaded to the Amazon S3 cloud storage under the /Inputs directory (prefix).
  2. The uploaded file triggers AWS Lambda using an Amazon S3 notification event.
  3. The Lambda function then will initiate the MLOps pipeline built using a Step Functions state machine.
  4. The starting lambda will start by collecting the region corresponding training images URIs for both Linear Learner and XGBoost algorithms. These are used in training both algorithms over the dataset. It will also get the Amazon SageMaker Spark Container Image which is used for running the SageMaker processing Job.
  5. The dataset is in libsvm format which is accepted by the XGBoost algorithm as per the Input/Output Interface for the XGBoost Algorithm. However, this is not supported by the Linear Learner Algorithm as per Input/Output interface for the linear learner algorithm. So we need to run a processing job using Amazon SageMaker Data Processing with Apache Spark. The processing job will transform the data from libsvm to csv and will divide the dataset into train, validation and test datasets. The output of the processing job will be stored under /Xgboost and /Linear directories (prefixes).

Figure 2: Train, validation and test samples extracted from dataset

6. Then the workflow of Step Functions will perform the following steps in parallel:

    • Train both algorithms.
    • Create models out of trained algorithms.
    • Create endpoints configurations and deploy predictions endpoints for both models.
    • Invoke lambda function to describe the status of the deployed endpoints and wait until the endpoints become in “InService”.
    • Invoke lambda function to perform 3 live predictions using boto3 and the “test” samples taken from the dataset to calculate the average accuracy of each model.
    • Invoke lambda function to delete deployed endpoints not to incur any additional charges.

7. Finally, a Lambda function will be invoked to determine which model has better accuracy in predicting the values.

The following shows a diagram of the workflow of the Step Functions:

Figure 3: AWS Step Functions workflow graph

The code to provision this solution along with step by step instructions can be found at this GitHub repo.

Results and Next Steps

After waiting for the complete execution of step functions workflow, the results are depicted in the following diagram:

Figure 4: Comparison results

This doesn’t necessarily mean that the XGBoost algorithm will always be the better performing algorithm. It just means that the performance was the result of these factors:

  • the hyperparameters configured for each algorithm
  • the number of epochs performed
  • the amount of dataset samples used for training

To make sure that you are getting better results from the models, you can run hyperparameters tuning jobs which will run many training jobs on your dataset using the algorithms and ranges of hyperparameters that you specify. This helps you allocate which set of hyperparameters which are giving better results.

Finally, you can use this comparison to determine which algorithm is best suited for your production environment. Then you can configure your step functions workflow to update the configuration of the production endpoint with the better performing algorithm.

Figure 5: Update production endpoint workflow

Conclusion

This post showed you how to create a repeatable, automated pipeline to deliver the better performing algorithm to your production predictions endpoint. This helps increase the productivity and reduce the time of manual comparison.  You also learned to provision the solution using AWS CDK and to perform regular cleaning of deployed resources to drive down business costs. If this post helps you or inspires you to solve a problem, share your thoughts and questions in the comments. You can use and extend the code on the GitHub repo.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers

Discovering sensitive data in AWS CodeCommit with AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/discovering-sensitive-data-in-aws-codecommit-with-aws-lambda-2/

This post is courtesy of Markus Ziller, Solutions Architect.

Today, git is a de facto standard for version control in modern software engineering. The workflows enabled by git’s branching capabilities are a major reason for this. However, with git’s distributed nature, it can be difficult to reliably remove changes that have been committed from all copies of the repository. This is problematic when secrets such as API keys have been accidentally committed into version control. The longer it takes to identify and remove secrets from git, the more likely that the secret has been checked out by another user.

This post shows a solution that automatically identifies credentials pushed to AWS CodeCommit in near-real-time. I also show three remediation measures that you can use to reduce the impact of secrets pushed into CodeCommit:

  • Notify users about the leaked credentials.
  • Lock the repository for non-admins.
  • Hard reset the CodeCommit repository to a healthy state.

I use the AWS Cloud Development Kit (CDK). This is an open source software development framework to model and provision cloud application resources. Using the CDK can reduce the complexity and amount of code needed to automate the deployment of resources.

Overview of solution

The services in this solution are AWS Lambda, AWS CodeCommit, Amazon EventBridge, and Amazon SNS. These services are part of the AWS serverless platform. They help reduce undifferentiated work around managing servers, infrastructure, and the parts of the application that add less value to your customers. With serverless, the solution scales automatically, has built-in high availability, and you only pay for the resources you use.

Solution architecture

This diagram outlines the workflow implemented in this blog:

  1. After a developer pushes changes to CodeCommit, it emits an event to an event bus.
  2. A rule defined on the event bus routes this event to a Lambda function.
  3. The Lambda function uses the AWS SDK for JavaScript to get the changes introduced by commits pushed to the repository.
  4. It analyzes the changes for secrets. If secrets are found, it publishes another event to the event bus.
  5. Rules associated with this event type then trigger invocations of three Lambda functions A, B, and C with information about the problematic changes.
  6. Each of the Lambda functions runs a remediation measure:
    • Function A sends out a notification to an SNS topic that informs users about the situation (A1).
    • Function B locks the repository by setting a tag with the AWS SDK (B2). It sends out a notification about this action (B2).
    • Function C runs git commands that remove the problematic commit from the CodeCommit repository (C2). It also sends out a notification (C1).

Walkthrough

The following walkthrough explains the required components, their interactions and how the provisioning can be automated via CDK.

For this walkthrough, you need:

Checkout and deploy the sample stack:

  1. After completing the prerequisites, clone the associated GitHub repository by running the following command in a local directory:
    git clone [email protected]:aws-samples/discover-sensitive-data-in-aws-codecommit-with-aws-lambda.git
  2. Open the repository in a local editor and review the contents of cdk/lib/resources.ts, src/handlers/commits.ts, and src/handlers/remediations.ts.
  3. Follow the instructions in the README.md to deploy the stack.

The CDK will deploy resources for the following services in your account.

Using CodeCommit to manage your git repositories

The CDK creates a new empty repository called TestRepository and adds a tag RepoState with an initial value of ok. You later use this tag in the LockRepo remediation strategy to restrict access.

It also creates two IAM groups with one user in each. Members of the CodeCommitSuperUsers group are always able to access the repository, while members of the CodeCommitUsers group can only access the repository when the value of the tag RepoState is not locked.

I also import the CodeCommitSystemUser into the CDK. Since the user requires git credentials in a downloaded CSV file, it cannot be created by the CDK. Instead it must be created as described in the README file.

The following CDK code sets up all the described resources:

const TAG_NAME = "RepoState";

const superUsers = new Group(this, "CodeCommitSuperUsers", { groupName: "CodeCommitSuperUsers" });
superUsers.addUser(new User(this, "CodeCommitSuperUserA", {
    password: new Secret(this, "CodeCommitSuperUserPassword").secretValue,
    userName: "CodeCommitSuperUserA"
}));

const users = new Group(this, "CodeCommitUsers", { groupName: "CodeCommitUsers" });
users.addUser(new User(this, "User", {
    password: new Secret(this, "CodeCommitUserPassword").secretValue,
    userName: "CodeCommitUserA"
}));

const systemUser = User.fromUserName(this, "CodeCommitSystemUser", props.codeCommitSystemUserName);

const repo = new Repository(this, "Repository", {
    repositoryName: "TestRepository",
    description: "The repository to test this project out",
});
Tags.of(repo).add(TAG_NAME, "ok");

users.addToPolicy(new PolicyStatement({
    effect: Effect.ALLOW,
    actions: ["*"],
    resources: [repo.repositoryArn],
    conditions: {
        StringNotEquals: {
            [`aws:ResourceTag/${TAG_NAME}`]: "locked"
        }
    }
}));

superUsers.addToPolicy(new PolicyStatement({
    effect: Effect.ALLOW,
    actions: ["*"],
    resources: [repo.repositoryArn]
}));

Using EventBridge to pass events between components

I use EventBridge, a serverless event bus, to connect the Lambda functions together. Many AWS services like CodeCommit are natively integrated into EventBridge and publish events about changes in their environment.

repo.onCommit is a higher-level CDK construct. It creates the required resources to invoke a Lambda function for every commit to a given repository. The created events rule looks like this:

EventBridge rule definition

Note that this event rule only matches commit events in TestRepository. To send commits of all repositories in that account to the inspecting Lambda function, remove the resources filter in the event pattern.

CodeCommit Repository State Change is a default event that is published by CodeCommit if changes are made to a repository. In addition, I define CodeCommit Security Event, a custom event, which Lambda publishes to the same event bus if secrets are discovered in the inspected code.

The sample below shows how you can set up Lambda functions as targets for both type of events.

const DETAIL_TYPE = "CodeCommit Security Event";
const eventBus = new EventBus(this, "CodeCommitEventBus", {
    eventBusName: "CodeCommitSecurityEvents"
});

repo.onCommit("AnyCommitEvent", {
    ruleName: "CallLambdaOnAnyCodeCommitEvent",
    target: new targets.LambdaFunction(commitInspectLambda)
});


new Rule(this, "CodeCommitSecurityEvent", {
    eventBus,
    enabled: true,
    ruleName: "CodeCommitSecurityEventRule",
    eventPattern: {
        detailType: [DETAIL_TYPE]
    },
    targets: [
        new targets.LambdaFunction(lockRepositoryLambda),
        new targets.LambdaFunction(raiseAlertLambda),
        new targets.LambdaFunction(forcefulRevertLambda)
    ]
});

Using Lambda functions to run remediation measures

AWS Lambda functions allow you to run code in response to events. The example defines four Lambda functions.

By comparing the delta to its predecessor, the commitInspectLambda function analyzes if secrets are introduced by a commit. With the CDK, you can create a Lambda function with:

const myLambdaInCDK = new Function(this, "UniqueIdentifierRequiredByCDK", {
    runtime: Runtime.NODEJS_12_X,
    handler: "<handlerfile>.<function name>",
    code: Code.fromAsset(path.join(__dirname, "..", "..", "src", "handlers")),
    // See git repository for complete code
});

The code for this Lambda function uses the AWS SDK for JavaScript to fetch the details of the commit, the differences introduced, and the new content.

The code checks each modified file line by line with a regular expression that matches typical secret formats. In src/handlers/regex.json, I provide a few regular expressions that match common secrets. You can extend this with your own patterns.

If a secret is discovered, a CodeCommit Security Event is published to the event bus. EventBridge then invokes all Lambda functions that are registered as targets with this event. This demo triggers three remediation measures.

The raiseAlertLambda function uses the AWS SDK for JavaScript to send out a notification to all subscribers (that is, CodeCommit administrators) on an SNS topic. It takes no further action.

SNS.publish({
    TopicArn: <TOPIC_ARN>,
    Subject: `[ACTION REQUIRED] Secrets discovered in <repo>`
    Message: `<Your message>
}

Notification about secrets discovered in a commit in TestRepository

The lockRepositoryLambda function uses the AWS SDK for JavaScript to change the RepoState tag from ok to locked. This restricts access to members of the CodeCommitSuperUsers IAM group.

CodeCommit.tagResource({
    resourceArn: event.detail.repositoryArn,
    tags: {
        RepoState: "locked"
    }
})

In addition, the Lambda function uses SNS to send out a notification. The forcefulRevertLambda function runs the following git commands:

git clone <repository>
git checkout <branch>
git reset –hard <previousCommitId>
git push origin <branch> --force

These commands reset the repository to the last accepted commit, by forcefully removing the respective commit from the git history of your CodeCommit repo. I advise you to handle this with care and only activate it on a real project if you fully understand the consequences of rewriting git history.

The Node.js v12 runtime for Lambda does not have a git runtime installed by default. You can add one by using the git-lambda2 Lambda layer. This allows you to run git commands from within the Lambda function.

Logs for the remediation measure Hard Reset

Finally, this Lambda function also sends out a notification. The complete code is available in the GitHub repo.

Using SNS to notify users

To notify users about secrets discovered and actions taken, you create an SNS topic and subscribe to it via email.

const topic = new Topic(this, "CodeCommitSecurityEventNotification", {
    displayName: "CodeCommitSecurityEventNotification",
});

topic.addSubscription(new subs.EmailSubscription(/* your email address */));

Testing the solution

You can test the deployed solution by running these two sets of commands. First, add a file with no credentials:

echo "Clean file - no credentials here" > clean_file.txt
git add clean_file.txt
git commit clean_file.txt -m "Adds clean_file.txt"
git push

Then add a file containing credentials:

SECRET_LIKE_STRING=$(cat /dev/urandom | env LC_CTYPE=C tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)
echo "secret=$SECRET_LIKE_STRING" > problematic_file.txt
git add problematic_file.txt
git commit problematic_file.txt -m "Adds secret-like string to problematic_file.txt"
git push

This first command creates, commits and pushes an unproblematic file clean_file.txt that will pass the checks of commitInspectLambda. The second command creates, commits, and pushes problematic_file.txt, which matches the regular expressions and triggers the remediation measures.

If you check your email, you soon receive notifications about actions taken by the Lambda functions.

Cleaning up

To avoid incurring charges, delete the resources by running cdk destroy and confirming the deletion.

Conclusion

This post demonstrates how you can implement a solution to discover secrets in commits to AWS CodeCommit repositories. It also defines different strategies to remediate this.

The CDK code to set up all components is minimal and can be extended for remediation measures. The template is portable between Regions and uses serverless technologies to minimize cost and complexity.

For more serverless learning resources, visit Serverless Land.

Rapid and flexible Infrastructure as Code using the AWS CDK with AWS Solutions Constructs

Post Syndicated from Biff Gaut original https://aws.amazon.com/blogs/devops/rapid-flexible-infrastructure-with-solutions-constructs-cdk/

Introduction

As workloads move to the cloud and all infrastructure becomes virtual, infrastructure as code (IaC) becomes essential to leverage the agility of this new world. JSON and YAML are the powerful, declarative modeling languages of AWS CloudFormation, allowing you to define complex architectures using IaC. Just as higher level languages like BASIC and C abstracted away the details of assembly language and made developers more productive, the AWS Cloud Development Kit (AWS CDK) provides a programming model above the native template languages, a model that makes developers more productive when creating IaC. When you instantiate CDK objects in your Typescript (or Python, Java, etc.) application, those objects “compile” into a YAML template that the CDK deploys as an AWS CloudFormation stack.

AWS Solutions Constructs take this simplification a step further by providing a library of common service patterns built on top of the CDK. These multi-service patterns allow you to deploy multiple resources with a single object, resources that follow best practices by default – both independently and throughout their interaction.

Comparison of an Application stack with Assembly Language, 4th generation language and Object libraries such as Hibernate with an IaC stack of CloudFormation, AWS CDK and AWS Solutions Constructs

Application Development Stack vs. IaC Development Stack

Solution overview

To demonstrate how using Solutions Constructs can accelerate the development of IaC, in this post you will create an architecture that ingests and stores sensor readings using Amazon Kinesis Data Streams, AWS Lambda, and Amazon DynamoDB.

An architecture diagram showing sensor readings being sent to a Kinesis data stream. A Lambda function will receive the Kinesis records and store them in a DynamoDB table.

Prerequisite – Setting up the CDK environment

Tip – If you want to try this example but are concerned about the impact of changing the tools or versions on your workstation, try running it on AWS Cloud9. An AWS Cloud9 environment is launched with an AWS Identity and Access Management (AWS IAM) role and doesn’t require configuring with an access key. It uses the current region as the default for all CDK infrastructure.

To prepare your workstation for CDK development, confirm the following:

  • Node.js 10.3.0 or later is installed on your workstation (regardless of the language used to write CDK apps).
  • You have configured credentials for your environment. If you’re running locally you can do this by configuring the AWS Command Line Interface (AWS CLI).
  • TypeScript 2.7 or later is installed globally (npm -g install typescript)

Before creating your CDK project, install the CDK toolkit using the following command:

npm install -g aws-cdk

Create the CDK project

  1. First create a project folder called stream-ingestion with these two commands:

mkdir stream-ingestion
cd stream-ingestion

  1. Now create your CDK application using this command:

npx [email protected] init app --language=typescript

Tip – This example will be written in TypeScript – you can also specify other languages for your projects.

At this time, you must use the same version of the CDK and Solutions Constructs. We’re using version 1.68.0 of both based upon what’s available at publication time, but you can update this with a later version for your projects in the future.

Let’s explore the files in the application this command created:

  • bin/stream-ingestion.ts – This is the module that launches the application. The key line of code is:

new StreamIngestionStack(app, 'StreamIngestionStack');

This creates the actual stack, and it’s in StreamIngestionStack that you will write the CDK code that defines the resources in your architecture.

  • lib/stream-ingestion-stack.ts – This is the important class. In the constructor of StreamIngestionStack you will add the constructs that will create your architecture.

During the deployment process, the CDK uploads your Lambda function to an Amazon S3 bucket so it can be incorporated into your stack.

  1. To create that S3 bucket and any other infrastructure the CDK requires, run this command:

cdk bootstrap

The CDK uses the same supporting infrastructure for all projects within a region, so you only need to run the bootstrap command once in any region in which you create CDK stacks.

  1. To install the required Solutions Constructs packages for our architecture, run the these two commands from the command line:

npm install @aws-solutions-constructs/[email protected]
npm install @aws-solutions-constructs/[email protected]

Write the code

First you will write the Lambda function that processes the Kinesis data stream messages.

  1. Create a folder named lambda under stream-ingestion
  2. Within the lambda folder save a file called lambdaFunction.js with the following contents:
var AWS = require("aws-sdk");

// Create the DynamoDB service object
var ddb = new AWS.DynamoDB({ apiVersion: "2012-08-10" });

AWS.config.update({ region: process.env.AWS_REGION });

// We will configure our construct to 
// look for the .handler function
exports.handler = async function (event) {
  try {
    // Kinesis will deliver records 
    // in batches, so we need to iterate through
    // each record in the batch
    for (let record of event.Records) {
      const reading = parsePayload(record.kinesis.data);
      await writeRecord(record.kinesis.partitionKey, reading);
    };
  } catch (err) {
    console.log(`Write failed, err:\n${JSON.stringify(err, null, 2)}`);
    throw err;
  }
  return;
};

// Write the provided sensor reading data to the DynamoDB table
async function writeRecord(partitionKey, reading) {

  var params = {
    // Notice that Constructs automatically sets up 
    // an environment variable with the table name.
    TableName: process.env.DDB_TABLE_NAME,
    Item: {
      partitionKey: { S: partitionKey },  // sensor Id
      timestamp: { S: reading.timestamp },
      value: { N: reading.value}
    },
  };

  // Call DynamoDB to add the item to the table
  await ddb.putItem(params).promise();
}

// Decode the payload and extract the sensor data from it
function parsePayload(payload) {

  const decodedPayload = Buffer.from(payload, "base64").toString(
    "ascii"
  );

  // Our CLI command will send the records to Kinesis
  // with the values delimited by '|'
  const payloadValues = decodedPayload.split("|", 2)
  return {
    value: payloadValues[0],
    timestamp: payloadValues[1]
  }
}

We won’t spend a lot of time explaining this function – it’s pretty straightforward and heavily commented. It receives an event with one or more sensor readings, and for each reading it extracts the pertinent data and saves it to the DynamoDB table.

You will use two Solutions Constructs to create your infrastructure:

The aws-kinesisstreams-lambda construct deploys an Amazon Kinesis data stream and a Lambda function.

  • aws-kinesisstreams-lambda creates the Kinesis data stream and Lambda function that subscribes to that stream. To support this, it also creates other resources, such as IAM roles and encryption keys.

The aws-lambda-dynamodb construct deploys a Lambda function and a DynamoDB table.

  • aws-lambda-dynamodb creates an Amazon DynamoDB table and a Lambda function with permission to access the table.
  1. To deploy the first of these two constructs, replace the code in lib/stream-ingestion-stack.ts with the following code:
import * as cdk from "@aws-cdk/core";
import * as lambda from "@aws-cdk/aws-lambda";
import { KinesisStreamsToLambda } from "@aws-solutions-constructs/aws-kinesisstreams-lambda";

import * as ddb from "@aws-cdk/aws-dynamodb";
import { LambdaToDynamoDB } from "@aws-solutions-constructs/aws-lambda-dynamodb";

export class StreamIngestionStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const kinesisLambda = new KinesisStreamsToLambda(
      this,
      "KinesisLambdaConstruct",
      {
        lambdaFunctionProps: {
          // Where the CDK can find the lambda function code
          runtime: lambda.Runtime.NODEJS_10_X,
          handler: "lambdaFunction.handler",
          code: lambda.Code.fromAsset("lambda"),
        },
      }
    );

    // Next Solutions Construct goes here
  }
}

Let’s explore this code:

  • It instantiates a new KinesisStreamsToLambda object. This Solutions Construct will launch a new Kinesis data stream and a new Lambda function, setting up the Lambda function to receive all the messages in the Kinesis data stream. It will also deploy all the additional resources and policies required for the architecture to follow best practices.
  • The third argument to the constructor is the properties object, where you specify overrides of default values or any other information the construct needs. In this case you provide properties for the encapsulated Lambda function that informs the CDK where to find the code for the Lambda function that you stored as lambda/lambdaFunction.js earlier.
  1. Now you’ll add the second construct that connects the Lambda function to a new DynamoDB table. In the same lib/stream-ingestion-stack.ts file, replace the line // Next Solutions Construct goes here with the following code:
    // Define the primary key for the new DynamoDB table
    const primaryKeyAttribute: ddb.Attribute = {
      name: "partitionKey",
      type: ddb.AttributeType.STRING,
    };

    // Define the sort key for the new DynamoDB table
    const sortKeyAttribute: ddb.Attribute = {
      name: "timestamp",
      type: ddb.AttributeType.STRING,
    };

    const lambdaDynamoDB = new LambdaToDynamoDB(
      this,
      "LambdaDynamodbConstruct",
      {
        // Tell construct to use the Lambda function in
        // the first construct rather than deploy a new one
        existingLambdaObj: kinesisLambda.lambdaFunction,
        tablePermissions: "Write",
        dynamoTableProps: {
          partitionKey: primaryKeyAttribute,
          sortKey: sortKeyAttribute,
          billingMode: ddb.BillingMode.PROVISIONED,
          removalPolicy: cdk.RemovalPolicy.DESTROY
        },
      }
    );

    // Add autoscaling
    const readScaling = lambdaDynamoDB.dynamoTable.autoScaleReadCapacity({
      minCapacity: 1,
      maxCapacity: 50,
    });

    readScaling.scaleOnUtilization({
      targetUtilizationPercent: 50,
    });

Let’s explore this code:

  • The first two const objects define the names and types for the partition key and sort key of the DynamoDB table.
  • The LambdaToDynamoDB construct instantiated creates a new DynamoDB table and grants access to your Lambda function. The key to this call is the properties object you pass in the third argument.
    • The first property sent to LambdaToDynamoDB is existingLambdaObj – by setting this value to the Lambda function created by KinesisStreamsToLambda, you’re telling the construct to not create a new Lambda function, but to grant the Lambda function in the other Solutions Construct access to the DynamoDB table. This illustrates how you can chain many Solutions Constructs together to create complex architectures.
    • The second property sent to LambdaToDynamoDB tells the construct to limit the Lambda function’s access to the table to write only.
    • The third property sent to LambdaToDynamoDB is actually a full properties object defining the DynamoDB table. It provides the two attribute definitions you created earlier as well as the billing mode. It also sets the RemovalPolicy to DESTROY. This policy setting ensures that the table is deleted when you delete this stack – in most cases you should accept the default setting to protect your data.
  • The last two lines of code show how you can use statements to modify a construct outside the constructor. In this case we set up auto scaling on the new DynamoDB table, which we can access with the dynamoTable property on the construct we just instantiated.

That’s all it takes to create the all resources to deploy your architecture.

  1. Save all the files, then compile the Typescript into a CDK program using this command:

npm run build

  1. Finally, launch the stack using this command:

cdk deploy

(Enter “y” in response to Do you wish to deploy all these changes (y/n)?)

You will see some warnings where you override CDK default values. Because you are doing this intentionally you may disregard these, but it’s always a good idea to review these warnings when they occur.

Tip – Many mysterious CDK project errors stem from mismatched versions. If you get stuck on an inexplicable error, check package.json and confirm that all CDK and Solutions Constructs libraries have the same version number (with no leading caret ^). If necessary, correct the version numbers, delete the package-lock.json file and node_modules tree and run npm install. Think of this as the “turn it off and on again” first response to CDK errors.

You have now deployed the entire architecture for the demo – open the CloudFormation stack in the AWS Management Console and take a few minutes to explore all 12 resources that the program deployed (and the 380 line template generated to created them).

Feed the Stream

Now use the CLI to send some data through the stack.

Go to the Kinesis Data Streams console and copy the name of the data stream. Replace the stream name in the following command and run it from the command line.

aws kinesis put-records \
--stream-name StreamIngestionStack-KinesisLambdaConstructKinesisStreamXXXXXXXX-XXXXXXXXXXXX \
--records \
PartitionKey=1301,'Data=15.4|2020-08-22T01:16:36+00:00' \
PartitionKey=1503,'Data=39.1|2020-08-22T01:08:15+00:00'

Tip – If you are using the AWS CLI v2, the previous command will result in an “Invalid base64…” error because v2 expects the inputs to be Base64 encoded by default. Adding the argument --cli-binary-format raw-in-base64-out will fix the issue.

To confirm that the messages made it through the service, open the DynamoDB console – you should see the two records in the table.

Now that you’ve got it working, pause to think about what you just did. You deployed a system that can ingest and store sensor readings and scale to handle heavy loads. You did that by instantiating two objects – well under 60 lines of code. Experiment with changing some property values and deploying the changes by running npm run build and cdk deploy again.

Cleanup

To clean up the resources in the stack, run this command:

cdk destroy

Conclusion

Just as languages like BASIC and C allowed developers to write programs at a higher level of abstraction than assembly language, the AWS CDK and AWS Solutions Constructs allow us to create CloudFormation stacks in Typescript, Java, or Python instead JSON or YAML. Just as there will always be a place for assembly language, there will always be situations where we want to write CloudFormation templates manually – but for most situations, we can now use the AWS CDK and AWS Solutions Constructs to create complex and complete architectures in a fraction of the time with very little code.

AWS Solutions Constructs can currently be used in CDK applications written in Typescript, Javascript, Java and Python and will be available in C# applications soon.

About the Author

Biff Gaut has been shipping software since 1983, from small startups to large IT shops. Along the way he has contributed to 2 books, spoken at several conferences and written many blog posts. He is now a Principal Solutions Architect at AWS working on the AWS Solutions Constructs team, helping customers deploy better architectures more quickly.

Building, bundling, and deploying applications with the AWS CDK

Post Syndicated from Cory Hall original https://aws.amazon.com/blogs/devops/building-apps-with-aws-cdk/

The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to model and provision your cloud application resources using familiar programming languages.

The post CDK Pipelines: Continuous delivery for AWS CDK applications showed how you can use CDK Pipelines to deploy a TypeScript-based AWS Lambda function. In that post, you learned how to add additional build commands to the pipeline to compile the TypeScript code to JavaScript, which is needed to create the Lambda deployment package.

In this post, we dive deeper into how you can perform these build commands as part of your AWS CDK build process by using the native AWS CDK bundling functionality.

If you’re working with Python, TypeScript, or JavaScript-based Lambda functions, you may already be familiar with the PythonFunction and NodejsFunction constructs, which use the bundling functionality. This post describes how to write your own bundling logic for instances where a higher-level construct either doesn’t already exist or doesn’t meet your needs. To illustrate this, I walk through two different examples: a Lambda function written in Golang and a static site created with Nuxt.js.

Concepts

A typical CI/CD pipeline contains steps to build and compile your source code, bundle it into a deployable artifact, push it to artifact stores, and deploy to an environment. In this post, we focus on the building, compiling, and bundling stages of the pipeline.

The AWS CDK has the concept of bundling source code into a deployable artifact. As of this writing, this works for two main types of assets: Docker images published to Amazon Elastic Container Registry (Amazon ECR) and files published to Amazon Simple Storage Service (Amazon S3). For files published to Amazon S3, this can be as simple as pointing to a local file or directory, which the AWS CDK uploads to Amazon S3 for you.

When you build an AWS CDK application (by running cdk synth), a cloud assembly is produced. The cloud assembly consists of a set of files and directories that define your deployable AWS CDK application. In the context of the AWS CDK, it might include the following:

  • AWS CloudFormation templates and instructions on where to deploy them
  • Dockerfiles, corresponding application source code, and information about where to build and push the images to
  • File assets and information about which S3 buckets to upload the files to

Use case

For this use case, our application consists of front-end and backend components. The example code is available in the GitHub repo. In the repository, I have split the example into two separate AWS CDK applications. The repo also contains the Golang Lambda example app and the Nuxt.js static site.

Golang Lambda function

To create a Golang-based Lambda function, you must first create a Lambda function deployment package. For Go, this consists of a .zip file containing a Go executable. Because we don’t commit the Go executable to our source repository, our CI/CD pipeline must perform the necessary steps to create it.

In the context of the AWS CDK, when we create a Lambda function, we have to tell the AWS CDK where to find the deployment package. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  runtime: lambda.Runtime.GO_1_X,
  handler: 'main',
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-go-executable')),
});

In the preceding code, the lambda.Code.fromAsset() method tells the AWS CDK where to find the Golang executable. When we run cdk synth, it stages this Go executable in the cloud assembly, which it zips and publishes to Amazon S3 as part of the PublishAssets stage.

If we’re running the AWS CDK as part of a CI/CD pipeline, this executable doesn’t exist yet, so how do we create it? One method is CDK bundling. The lambda.Code.fromAsset() method takes a second optional argument, AssetOptions, which contains the bundling parameter. With this bundling parameter, we can tell the AWS CDK to perform steps prior to staging the files in the cloud assembly.

Breaking down the BundlingOptions parameter further, we can perform the build inside a Docker container or locally.

Building inside a Docker container

For this to work, we need to make sure that we have Docker running on our build machine. In AWS CodeBuild, this means setting privileged: true. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-source-code'), {
    bundling: {
      image: lambda.Runtime.GO_1_X.bundlingDockerImage,
      command: [
        'bash', '-c', [
          'go test -v',
          'GOOS=linux go build -o /asset-output/main',
      ].join(' && '),
    },
  })
  ...
});

We specify two parameters:

  • image (required) – The Docker image to perform the build commands in
  • command (optional) – The command to run within the container

The AWS CDK mounts the folder specified as the first argument to fromAsset at /asset-input inside the container, and mounts the asset output directory (where the cloud assembly is staged) at /asset-output inside the container.

After we perform the build commands, we need to make sure we copy the Golang executable to the /asset-output location (or specify it as the build output location like in the preceding example).

This is the equivalent of running something like the following code:

docker run \
  --rm \
  -v folder-containing-source-code:/asset-input \
  -v cdk.out/asset.1234a4b5/:/asset-output \
  lambci/lambda:build-go1.x \
  bash -c 'GOOS=linux go build -o /asset-output/main'

Building locally

To build locally (not in a Docker container), we have to provide the local parameter. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-source-code'), {
    bundling: {
      image: lambda.Runtime.GO_1_X.bundlingDockerImage,
      command: [],
      local: {
        tryBundle(outputDir: string) {
          try {
            spawnSync('go version')
          } catch {
            return false
          }

          spawnSync(`GOOS=linux go build -o ${path.join(outputDir, 'main')}`);
          return true
        },
      },
    },
  })
  ...
});

The local parameter must implement the ILocalBundling interface. The tryBundle method is passed the asset output directory, and expects you to return a boolean (true or false). If you return true, the AWS CDK doesn’t try to perform Docker bundling. If you return false, it falls back to Docker bundling. Just like with Docker bundling, you must make sure that you place the Go executable in the outputDir.

Typically, you should perform some validation steps to ensure that you have the required dependencies installed locally to perform the build. This could be checking to see if you have go installed, or checking a specific version of go. This can be useful if you don’t have control over what type of build environment this might run in (for example, if you’re building a construct to be consumed by others).

If we run cdk synth on this, we see a new message telling us that the AWS CDK is bundling the asset. If we include additional commands like go test, we also see the output of those commands. This is especially useful if you wanted to fail a build if tests failed. See the following code:

$ cdk synth
Bundling asset GolangLambdaStack/MyGoFunction/Code/Stage...
✓  . (9ms)
✓  clients (5ms)

DONE 8 tests in 11.476s
✓  clients (5ms) (coverage: 84.6% of statements)
✓  . (6ms) (coverage: 78.4% of statements)

DONE 8 tests in 2.464s

Cloud Assembly

If we look at the cloud assembly that was generated (located at cdk.out), we see something like the following code:

$ cdk synth
Bundling asset GolangLambdaStack/MyGoFunction/Code/Stage...
✓  . (9ms)
✓  clients (5ms)

DONE 8 tests in 11.476s
✓  clients (5ms) (coverage: 84.6% of statements)
✓  . (6ms) (coverage: 78.4% of statements)

DONE 8 tests in 2.464s

It contains our GolangLambdaStack CloudFormation template that defines our Lambda function, as well as our Golang executable, bundled at asset.01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952/main.

Let’s look at how the AWS CDK uses this information. The GolangLambdaStack.assets.json file contains all the information necessary for the AWS CDK to know where and how to publish our assets (in this use case, our Golang Lambda executable). See the following code:

{
  "version": "5.0.0",
  "files": {
    "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952": {
      "source": {
        "path": "asset.01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952",
        "packaging": "zip"
      },
      "destinations": {
        "current_account-current_region": {
          "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}",
          "objectKey": "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952.zip",
          "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}"
        }
      }
    }
  }
}

The file contains information about where to find the source files (source.path) and what type of packaging (source.packaging). It also tells the AWS CDK where to publish this .zip file (bucketName and objectKey) and what AWS Identity and Access Management (IAM) role to use (assumeRoleArn). In this use case, we only deploy to a single account and Region, but if you have multiple accounts or Regions, you see multiple destinations in this file.

The GolangLambdaStack.template.json file that defines our Lambda resource looks something like the following code:

{
  "Resources": {
    "MyGoFunction0AB33E85": {
      "Type": "AWS::Lambda::Function",
      "Properties": {
        "Code": {
          "S3Bucket": {
            "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}"
          },
          "S3Key": "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952.zip"
        },
        "Handler": "main",
        ...
      }
    },
    ...
  }
}

The S3Bucket and S3Key match the bucketName and objectKey from the assets.json file. By default, the S3Key is generated by calculating a hash of the folder location that you pass to lambda.Code.fromAsset(), (for this post, folder-containing-source-code). This means that any time we update our source code, this calculated hash changes and a new Lambda function deployment is triggered.

Nuxt.js static site

In this section, I walk through building a static site using the Nuxt.js framework. You can apply the same logic to any static site framework that requires you to run a build step prior to deploying.

To deploy this static site, we use the BucketDeployment construct. This is a construct that allows you to populate an S3 bucket with the contents of .zip files from other S3 buckets or from a local disk.

Typically, we simply tell the BucketDeployment construct where to find the files that it needs to deploy to the S3 bucket. See the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-directory')),
  ],
  destinationBucket: myBucket
});

To deploy a static site built with a framework like Nuxt.js, we need to first run a build step to compile the site into something that can be deployed. For Nuxt.js, we run the following two commands:

  • yarn install – Installs all our dependencies
  • yarn generate – Builds the application and generates every route as an HTML file (used for static hosting)

This creates a dist directory, which you can deploy to Amazon S3.

Just like with the Golang Lambda example, we can perform these steps as part of the AWS CDK through either local or Docker bundling.

Building inside a Docker container

To build inside a Docker container, use the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-nuxtjs-project'), {
      bundling: {
        image: cdk.BundlingDockerImage.fromRegistry('node:lts'),
        command: [
          'bash', '-c', [
            'yarn install',
            'yarn generate',
            'cp -r /asset-input/dist/* /asset-output/',
          ].join(' && '),
        ],
      },
    }),
  ],
  ...
});

For this post, we build inside the publicly available node:lts image hosted on DockerHub. Inside the container, we run our build commands yarn install && yarn generate, and copy the generated dist directory to our output directory (the cloud assembly).

The parameters are the same as described in the Golang example we walked through earlier.

Building locally

To build locally, use the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-nuxtjs-project'), {
      bundling: {
        local: {
          tryBundle(outputDir: string) {
            try {
              spawnSync('yarn --version');
            } catch {
              return false
            }

            spawnSync('yarn install && yarn generate');

       fs.copySync(path.join(__dirname, ‘path-to-nuxtjs-project’, ‘dist’), outputDir);
            return true
          },
        },
        image: cdk.BundlingDockerImage.fromRegistry('node:lts'),
        command: [],
      },
    }),
  ],
  ...
});

Building locally works the same as the Golang example we walked through earlier, with one exception. We have one additional command to run that copies the generated dist folder to our output directory (cloud assembly).

Conclusion

This post showed how you can easily compile your backend and front-end applications using the AWS CDK. You can find the example code for this post in this GitHub repo. If you have any questions or comments, please comment on the GitHub repo. If you have any additional examples you want to add, we encourage you to create a Pull Request with your example!

Our code also contains examples of deploying the applications using CDK Pipelines, so if you’re interested in deploying the example yourself, check out the example repo.

 

About the author

Cory Hall

Cory is a Solutions Architect at Amazon Web Services with a passion for DevOps and is based in Charlotte, NC. Cory works with enterprise AWS customers to help them design, deploy, and scale applications to achieve their business goals.