Tag Archives: AWS CodeArtifact

AWS Weekly Roundup — Savings Plans, Amazon DynamoDB, AWS CodeArtifact, and more — March 25, 2024

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-savings-plans-amazon-dynamodb-aws-codeartifact-and-more-march-25-2024/

AWS Summit season is starting! I’m happy I will meet our customers, partners, and the press next week at the AWS Summit Paris and the week after at the AWS Summit Amsterdam. I’ll show you how mobile application developers can use generative artificial intelligence (AI) to boost their productivity. Be sure to stop by and say hi if you’re around.

Now that my talks for the Summit are ready, I took the time to look back at the AWS launches from last week and write this summary for you.

Last week’s launches
Here are some launches that got my attention:

AWS License Manager allows you to track IBM Db2 licenses on Amazon Relational Database Service (Amazon RDS) – I wrote about Amazon RDS when we launched IBM Db2 back in December 2023 and I told you that you must bring your own Db2 license. Starting today, you can track your Amazon RDS for Db2 usage with AWS License Manager. License Manager provides you with better control and visibility of your licenses to help you limit licensing overages and reduce the risk of non-compliance and misreporting.

AWS CodeBuild now supports custom images for AWS Lambda – You can now use compute container images stored in an Amazon Elastic Container Registry (Amazon ECR) repository for projects configured to run on Lambda compute. Previously, you had to use one of the managed container images provided by AWS CodeBuild. AWS managed container images include support for AWS Command Line Interface (AWS CLI), Serverless Application Model, and various programming language runtimes.

AWS CodeArtifact package group configuration – Administrators of package repositories can now manage the configuration of multiple packages in one single place. A package group allows you to define how packages are updated by internal developers or from upstream repositories. You can now allow or block internal developers to publish packages or allow or block upstream updates for a group of packages. Read my blog post for all the details.

Return your Savings Plans – We have announced the ability to return Savings Plans within 7 days of purchase. Savings Plans is a flexible pricing model that can help you reduce your bill by up to 72 percent compared to On-Demand prices, in exchange for a one- or three-year hourly spend commitment. If you realize that the Savings Plan you recently purchased isn’t optimal for your needs, you can return it and if needed, repurchase another Savings Plan that better matches your needs.

Amazon EC2 Mac Dedicated Hosts now provide visibility into supported macOS versions – You can now view the latest macOS versions supported on your EC2 Mac Dedicated Host, which enables you to proactively validate if your Dedicated Host can support instances with your preferred macOS versions.

Amazon Corretto 22 is now generally available – Corretto 22, an OpenJDK feature release, introduces a range of new capabilities and enhancements for developers. New features like stream gatherers and unnamed variables help you write code that’s clearer and easier to maintain. Additionally, optimizations in garbage collection algorithms boost performance. Existing libraries for concurrency, class files, and foreign functions have also been updated, giving you a more powerful toolkit to build robust and efficient Java applications.

Amazon DynamoDB now supports resource-based policies and AWS PrivateLink – With AWS PrivateLink, you can simplify private network connectivity between Amazon Virtual Private Cloud (Amazon VPC), Amazon DynamoDB, and your on-premises data centers using interface VPC endpoints and private IP addresses. On the other side, resource-based policies to help you simplify access control for your DynamoDB resources. With resource-based policies, you can specify the AWS Identity and Access Management (IAM) principals that have access to a resource and what actions they can perform on it. You can attach a resource-based policy to a DynamoDB table or a stream. Resource-based policies also simplify cross-account access control for sharing resources with IAM principals of different AWS accounts.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional news items, open source projects, and Twitch shows that you might find interesting:

British Broadcasting Corporation (BBC) migrated 25PB of archives to Amazon S3 Glacier – The BBC Archives Technology and Services team needed a modern solution to centralize, digitize, and migrate its 100-year-old flagship archives. It began using Amazon Simple Storage Service (Amazon S3) Glacier Instant Retrieval, which is an archive storage class that delivers the lowest-cost storage for long-lived data that is rarely accessed and requires retrieval in milliseconds. I did the math, you need 2,788,555 DVD discs to store 25PB of data. Imagine a pile of DVDs reaching 41.8 kilometers (or 25.9 miles) tall! Read the full story.

AWS Build On Generative AIBuild On Generative AI – Season 3 of your favorite weekly Twitch show about all things generative AI is in full swing! Streaming every Monday, 9:00 AM US PT, my colleagues Tiffany and Darko discuss different aspects of generative AI and invite guest speakers to demo their work.

AWS open source news and updates – My colleague Ricardo writes this weekly open source newsletter in which he highlights new open source projects, tools, and demos from the AWS Community.

Upcoming AWS events
Check your calendars and sign up for these AWS events:

AWS SummitsAWS Summits – As I wrote in the introduction, it’s AWS Summit season again! The first one happens next week in Paris (April 3), followed by Amsterdam (April 9), Sydney (April 10–11), London (April 24), Berlin (May 15–16), and Seoul (May 16–17). AWS Summits are a series of free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS.

AWS re:InforceAWS re:Inforce – Join us for AWS re:Inforce (June 10–12) in Philadelphia, Pennsylvania. AWS re:Inforce is a learning conference focused on AWS security solutions, cloud security, compliance, and identity. Connect with the AWS teams that build the security tools and meet AWS customers to learn about their security journeys.

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— seb

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Improve the security of your software supply chain with Amazon CodeArtifact package group configuration

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/improve-the-security-of-your-software-supply-chain-with-amazon-codeartifact-package-group-configuration/

Starting today, administrators of package repositories can manage the configuration of multiple packages in one single place with the new AWS CodeArtifact package group configuration capability. A package group allows you to define how packages are updated by internal developers or from upstream repositories. You can now allow or block internal developers to publish packages or allow or block upstream updates for a group of packages.

CodeArtifact is a fully managed package repository service that makes it easy for organizations to securely store and share software packages used for application development. You can use CodeArtifact with popular build tools and package managers such as NuGet, Maven, Gradle, npm, yarn, pip, twine, and the Swift Package Manager.

CodeArtifact supports on-demand importing of packages from public repositories such as npmjs.com, maven.org, and pypi.org. This allows your organization’s developers to fetch all their packages from one single source of truth: your CodeArtifact repository.

Simple applications routinely include dozens of packages. Large enterprise applications might have hundreds of dependencies. These packages help developers speed up the development and testing process by providing code that solves common programming challenges such as network access, cryptographic functions, or data format manipulation. These packages might be produced by other teams in your organization or maintained by third parties, such as open source projects.

To minimize the risks of supply chain attacks, some organizations manually vet the packages that are available in internal repositories and the developers who are authorized to update these packages. There are three ways to update a package in a repository. Selected developers in your organization might push package updates. This is typically the case for your organization’s internal packages. Packages might also be imported from upstream repositories. An upstream repository might be another CodeArtifact repository, such as a company-wide source of approved packages or external public repositories offering popular open source packages.

Here is a diagram showing different possibilities to expose a package to your developers.

CodeArtifact Multi Repository

When managing a repository, it is crucial to define how packages can be downloaded and updated. Allowing package installation or updates from external upstream repositories exposes your organization to typosquatting or dependency confusion attacks, for example. Imagine a bad actor publishing a malicious version of a well-known package under a slightly different name. For example, instead of coffee-script, the malicious package is cofee-script, with only one “f.” When your repository is configured to allow retrieval from upstream external repositories, all it takes is a distracted developer working late at night to type npm install cofee-script instead of npm install coffee-script to inject malicious code into your systems.

CodeArtifact defines three permissions for the three possible ways of updating a package. Administrators can allow or block installation and updates coming from internal publish commands, from an internal upstream repository, or from an external upstream repository.

Until today, repository administrators had to manage these important security settings package by package. With today’s update, repository administrators can define these three security parameters for a group of packages at once. The packages are identified by their type, their namespace, and their name. This new capability operates at the domain level, not the repository level. It allows administrators to enforce a rule for a package group across all repositories in their domain. They don’t have to maintain package origin controls configuration in every repository.

Let’s see in detail how it works
Imagine that I manage an internal package repository with CodeArtifact and that I want to distribute only the versions of the AWS SDK for Python, also known as boto3, that have been vetted by my organization.

I navigate to the CodeArtifact page in the AWS Management Console, and I create a python-aws repository that will serve vetted packages to internal developers.

CodeArtifact - Create a repo

This creates a staging repository in addition to the repository I created. The external packages from pypi will first be staged in the pypi-store internal repository, where I will verify them before serving them to the python-aws repository. Here is where my developers will connect to download them.

CodeArtifact - Create a repo - package flowBy default, when a developer authenticates against CodeArtifact and types pip install boto3, CodeArtifact downloads the packages from the public pypi repository, stages them on pypi-store, and copies them on python-aws.

CodeArtifact - pip installCodeArtifact - list of packages after a pip install

Now, imagine I want to block CodeArtifact from fetching package updates from the upstream external pypi repository. I want python-aws to only serve packages that I approved from my pypi-store internal repository.

With the new capability that we released today, I can now apply this configuration for a group of packages. I navigate to my domain and select the Package Groups tab. Then, I select the Create Package Group button.

I enter the Package group definition. This expression defines what packages are included in this group. Packages are identified using a combination of three components: package format, an optional namespace, and name.

Here are a few examples of patterns that you can use for each of the allowed combinations:

  • All package formats: /*
  • A specific package format: /npm/*
  • Package format and namespace prefix: /maven/com.amazon~
  • Package format and namespace: /npm/aws-amplify/*
  • Package format, namespace, and name prefix: /npm/aws-amplify/ui~
  • Package format, namespace, and name: /maven/org.apache.logging.log4j/log4j-core$

I invite you to read the documentation to learn all the possibilities.

In my example, there is no concept of namespace for Python packages, and I want the group to include all packages with names starting with boto3 coming from pypi. Therefore, I write /pypi//boto3~.

CodeArtifact - package group definition

Then, I define the security parameters for my package group. In this example, I don’t want my organization’s developers to publish updates. I also don’t want CodeArtifact to fetch new versions from the external upstream repositories. I want to authorize only package updates from my internal staging directory.

I uncheck all Inherit from parent group boxes. I select Block for Publish and External upstream. I leave Allow on Internal upstream. Then, I select Create Package Group.

CodeArtifact - package group security configuration

Once defined, developers are unable to install different package versions than the ones authorized in the python-aws repository. When I, as a developer, try to install another version of the boto3 package, I receive an error message. This is expected because the newer version of the boto3 package is not available in the upstream staging repo, and there is block rule that prevents fetching packages or package updates from external upstream repositories.

Code ARtifact - installation is denied when using a package version not already present in the repository

Similarly, let’s imagine your administrator wants to protect your organization from dependency substitution attacks. All your internal Python package names start with your company name (mycompany). The administrator wants to block developers for accidentally downloading from pypi.org packages that start with mycompany.

Administrator creates a rule with the pattern /pypi//mycompany~ with publish=allow, external upstream=block, and internal upstream=block. With this configuration, internal developers or your CI/CD pipeline can publish those packages, but CodeArtifact will not import any packages from pypi.org that start with mycompany, such as mycompany.foo or mycompany.bar. This prevents dependency substitution attacks for these packages.

Package groups are available in all AWS Regions where CodeArtifact is available, at no additional cost. It helps you to better control how packages and package updates land in your internal repositories. It helps to prevent various supply chain attacks, such as typosquatting or dependency confusion. It’s one additional configuration that you can add today into your infrastructure-as-code (IaC) tools to create and manage your CodeArtifact repositories.

Go and configure your first package group today.

— seb

New – Add Your Swift Packages to AWS CodeArtifact

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/new-add-your-swift-packages-to-aws-codeartifact/

Starting today, Swift developers who write code for Apple platforms (iOS, iPadOS, macOS, tvOS, watchOS, or visionOS) or for Swift applications running on the server side can use AWS CodeArtifact to securely store and retrieve their package dependencies. CodeArtifact integrates with standard developer tools such as Xcode, xcodebuild, and the Swift Package Manager (the swift package command).

Simple applications routinely include dozens of packages. Large enterprise applications might have hundreds of dependencies. These packages help developers speed up the development and testing process by providing code that solves common programming challenges such as network access, cryptographic functions, or data format manipulation. Developers also embed SDKs–such as the AWS SDKs–to access remote services. These packages might be produced by other teams in your organization or maintained by third-parties, such as open-source projects. Managing packages and their dependencies is an integral part of the software development process. Modern programming languages include tools to download and resolve dependencies: Maven in Java, NuGet in C#, npm or yarn in JavaScript, and pip in Python just to mention a few. Developers for Apple platforms use CocoaPods or the Swift Package Manager (SwiftPM).

Downloading and integrating packages is a routine operation for application developers. However, it presents at least two significant challenges for organizations.

The first challenge is legal. Organizations must ensure that licenses for third-party packages are compatible with the expected use of licenses for your specific project and that the package doesn’t violate someone else’s intellectual property (IP). The second challenge is security. Organizations must ensure that the included code is safe to use and doesn’t include back doors or intentional vulnerabilities designed to introduce security flaws in your app. Injecting vulnerabilities in popular open-source projects is known as a supply chain attack and has become increasingly popular in recent years.

To address these challenges, organizations typically install private package servers on premises or in the cloud. Developers can only use packages vetted by their organization’s security and legal teams and made available through private repositories.

AWS CodeArtifact is a managed service that allows you to safely distribute packages to your internal teams of developers. There is no need to install, manage, or scale the underlying infrastructure. We take care of that for you, giving you more time to work on your apps instead of the software development infrastructure.

I’m excited to announce that CodeArtifact now supports native Swift packages, in addition to npm, PyPI, Maven, NuGet, and generic package formats. Swift packages are a popular way to package and distribute reusable Swift code elements. To learn how to create your own Swift package, you can follow this tutorial. The community has also created more than 6,000 Swift packages that you can use in your Swift applications.

You can now publish and download your Swift package dependencies from your CodeArtifact repository in the AWS Cloud. CodeArtifact SwiftPM works with existing developer tools such as Xcode, VSCode, and the Swift Package Manager command line tool. After your packages are stored in CodeArtifact, you can reference them in your project’s Package.swift file or in your Xcode project, in a similar way you use Git endpoints to access public Swift packages.

After the configuration is complete, your network-jailed build system will download the packages from the CodeArtifact repository, ensuring that only approved and controlled packages are used during your application’s build process.

How To Get Started
As usual on this blog, I’ll show you how it works. Imagine I’m working on an iOS application that uses Amazon DynamoDB as a database. My application embeds the AWS SDK for Swift as a dependency. To comply with my organization policies, the application must use a specific version of the AWS SDK for Swift, compiled in-house and approved by my organization’s legal and security teams. In this demo, I show you how I prepare my environment, upload the package to the repository, and use this specific package build as a dependency for my project.

For this demo, I focus on the steps specific to Swift packages. You can read the tutorial written by my colleague Steven to get started with CodeArtifact.

I use an AWS account that has a package repository (MySwiftRepo) and domain (stormacq-test) already configured.

CodeArtifact repository

To let SwiftPM acess my CodeArtifact repository, I start by collecting an authentication token from CodeArtifact.

export CODEARTIFACT_AUTH_TOKEN=`aws codeartifact get-authorization-token \
                                     --domain stormacq-test              \
                                     --domain-owner 012345678912         \
                                     --query authorizationToken          \
                                     --output text`

Note that the authentication token expires after 12 hours. I must repeat this command after 12 hours to obtain a fresh token.

Then, I request the repository endpoint. I pass the domain name and domain owner (the AWS account ID). Notice the --format swift option.

export CODEARTIFACT_REPO=`aws codeartifact get-repository-endpoint  \
                               --domain stormacq-test               \
                               --domain-owner 012345678912          \
                               --format swift                       \
                               --repository MySwiftRepo             \
                               --query repositoryEndpoint           \
                               --output text`

Now that I have the repository endpoint and an authentication token, I use the AWS Command Line Interface (AWS CLI) to configure SwiftPM on my machine.

SwiftPM can store the repository configurations at user level (in the file ~/.swiftpm/configurations) or at project level (in the file <your project>/.swiftpm/configurations). By default, the CodeArtifact login command creates a project-level configuration to allow you to use different CodeArtifact repositories for different projects.

I use the AWS CLI to configure SwiftPM on my build machine.

aws codeartifact login          \
    --tool swift                \
    --domain stormacq-test      \
    --repository MySwiftRepo    \
    --namespace aws             \
    --domain-owner 012345678912

The command invokes swift package-registry login with the correct options, which in turn, creates the required SwiftPM configuration files with the given repository name (MySwiftRepo) and scope name (aws).

Now that my build machine is ready, I prepare my organization’s approved version of the AWS SDK for Swift package and then I upload it to the repository.

git clone https://github.com/awslabs/aws-sdk-swift.git
pushd aws-sdk-swift
swift package archive-source
mv aws-sdk-swift.zip ../aws-sdk-swift-0.24.0.zip
popd

Finally, I upload this package version to the repository.

When using Swift 5.9 or more recent, I can upload my package to my private repository using the SwiftPM command:

swift package-registry publish           \
                       aws.aws-sdk-swift \
                       0.24.0            \
                       --verbose

The versions of Swift before 5.9 don’t provide a swift package-registry publish command. So, I use the curl command instead.

curl  -X PUT 
      --user "aws:$CODEARTIFACT_AUTH_TOKEN"               \
      -H "Accept: application/vnd.swift.registry.v1+json" \
      -F source-archive="@aws-sdk-swift-0.24.0.zip"       \
      "${CODEARTIFACT_REPO}aws/aws-sdk-swift/0.24.0"

Notice the format of the package name after the URI of the repository: <scope>/<package name>/<package version>. The package version must follow the semantic versioning scheme.

I can use the CLI or the console to verify that the package is available in the repository.

CodeArtifact List Packages

aws codeartifact list-package-versions      \
                  --domain stormacq-test    \
                  --repository MySwiftRepo  \
                  --format swift            \
                  --namespace aws           \
                  --package aws-sdk-swift
{
    "versions": [
        {
            "version": "0.24.0",
            "revision": "6XB5O65J8J3jkTDZd8RMLyqz7XbxIg9IXpTudP7THbU=",
            "status": "Published",
            "origin": {
                "domainEntryPoint": {
                    "repositoryName": "MySwiftRepo"
                },
                "originType": "INTERNAL"
            }
        }
    ],
    "defaultDisplayVersion": "0.24.0",
    "format": "swift",
    "package": "aws-sdk-swift",
    "namespace": "aws"
}

Now that the package is available, I can use it in my projects as usual.

Xcode uses SwiftPM tools and configuration files I just created. To add a package to my Xcode project, I select the project name on the left pane, and then I select the Package Dependencies tab. I can see the packages that are already part of my project. To add a private package, I choose the + sign under Packages.

Xcode add a package as dependency to a project

On the top right search field, I enter aws.aws-sdk-swift (this is <scope name>.<package name>). After a second or two, the package name appears on the list. On the top right side, you can verify the source repository (next to the Registry label). Before selecting the Add Package button, select the version of the package, just like you do for publicly available packages.

Add a private package from Codeartifact on Xcode

Alternatively, for my server-side or command-line applications, I add the dependency in the Package.swift file. I also use the format (<scope>.<package name>) as the first parameter of .package(id:from:)function.

    dependencies: [
        .package(id: "aws.aws-sdk-swift", from: "0.24.0")
    ],

When I type swift package update, SwiftPM downloads the package from the CodeArtifact repository.

Things to Know
There are some things to keep in mind before uploading your first Swift packages.

  • Be sure to update to the latest version of the CLI before trying any command shown in the preceding instructions.
  • You have to use Swift version 5.8 or newer to use CodeArtifact with the swift package command. On macOS, the Swift toolchain comes with Xcode. Swift 5.8 is available on macOS 13 (Ventura) and Xcode 14. On Linux and Windows, you can download the Swift toolchain from swift.org.
  • You have to use Xcode 15 for your iOS, iPadOS, tvOS, or watchOS applications. I tested this with Xcode 15 beta8.
  • The swift package-registry publish command is available with Swift 5.9 or newer. When you use Swift 5.8, you can use curlto upload your package, as I showed in the demo (or use any HTTP client of your choice).
  • Swift packages have the concept of scope. A scope provides a namespace for related packages within a package repository. Scopes are mapped to CodeArtifact namespaces.
  • The authentication token expires after 12 hours. We suggest writing a script to automate its renewal or using a scheduled AWS Lambda function and securely storing the token in AWS Secrets Manager (for example).

Troubleshooting
If Xcode can not find your private package, double-check the registry configuration in ~/.swiftpm/configurations/registries.json. In particular, check if the scope name is present. Also verify that the authentication token is present in the keychain. The name of the entry is the URL of your repository. You can verify the entries in the keychain with the /Application/Utilities/Keychain Access.app application or using the security command line tool.

security find-internet-password                                                  \
          -s "stormacq-test-012345678912.d.codeartifact.us-west-2.amazonaws.com" \
          -g

Here is the SwiftPM configuration on my machine.

cat ~/.swiftpm/configuration/registries.json

{
  "authentication" : {
    "stormacq-test-012345678912.d.codeartifact.us-west-2.amazonaws.com" : {
      "loginAPIPath" : "/swift/MySwiftRepo/login",
      "type" : "token"
    }
  },
  "registries" : {
    "aws" : { // <-- this is the scope name!
      "url" : "https://stormacq-test-012345678912.d.codeartifact.us-west-2.amazonaws.com/swift/MySwiftRepo/"
    }
  },
  "version" : 1
}

Keychain item for codeartifact authentication token

Pricing and Availability
CodeArtifact costs for Swift packages are the same as for the other package formats already supported. CodeArtifact billing depends on three metrics: the storage (measured in GB per month), the number of requests, and the data transfer out to the internet or to other AWS Regions. Data transfer to AWS services in the same Region is not charged, meaning you can run your CICD jobs on Amazon EC2 Mac instances, for example, without incurring a charge for the CodeArtifact data transfer. As usual, the pricing page has the details.

CodeArtifact for Swift packages is available in all 13 Regions where CodeArtifact is available.

Now go build your Swift applications and upload your private packages to CodeArtifact!

— seb

PS : Do you know you can write Lambda functions in the Swift programming language? Check the quick start guide or follow this 35-minute tutorial.

Create a Multi-Region Python Package Publishing Pipeline with AWS CDK and CodePipeline

Post Syndicated from Brian Smitches original https://aws.amazon.com/blogs/devops/create-a-multi-region-python-package-publishing-pipeline-with-aws-cdk-and-codepipeline/

Customers can author and store internal software packages in AWS by leveraging the AWS CodeSuite (AWS CodePipeline, AWS CodeBuild, AWS CodeCommit, and AWS CodeArtifact).  As of the publish date of this blog post, there is no native way to replicate your CodeArtifact Packages across regions. This blog addresses how a custom solution built with the AWS Cloud Development Kit and AWS CodePipeline can create a Multi-Region Python Package Publishing Pipeline.

Whether it’s for resiliency or performance improvement, many customers want to deploy their applications across multiple regions. When applications are dependent on custom software packages, the software packages should be replicated to multiple regions as well. This post will walk through how to deploy a custom package publishing pipeline in your own AWS Account. This pipeline connects a Python package source code repository to build and publish pip packages to CodeArtifact Repositories spanning three regions (the primary and two replica regions). While this sample CDK Application is built specifically for pip packages, the underlying architecture can be reused for different software package formats, such as npm, Maven, NuGet, etc.

Solution overview

The following figure demonstrates the solution workflow:

  1. A CodePipeline pipeline orchestrates the building and publishing of the software package
    1. This pipeline is triggered by commits on the main branch of the CodeCommit repository
    2. A CodeBuild job builds the pip packages using twine to be distributed
    3. The publish stage (third column) uses three parallel CodeBuild jobs to publish the distribution package to the two CodeArtifact repositories in separate regions
  1. The first CodeArtifact Repository stores the package contents in the primary region.
  2. The second and third CodeArtifact Repository act as replicas and store the package contents in other regions.
Figure 1. A figure showing the architecture diagram

Figure 1.  Architecture diagram

All of these resources are defined in a single AWS CDK Application. The resources are defined in CDK Stacks that are deployed as AWS CloudFormation Stacks. AWS CDK can deploy the different stacks across separate regions.

Prerequisites

Before getting started, you will need the following:

  1. An AWS account
  2. An instance of the AWS Cloud9 IDE or an alternative local compute environment, such as your personal computer
  3. The following installed on your compute environment:
    1. AWS CDK
    2. AWS Command Line Interface (AWS CLI)
    3. npm
  1. The AWS Accounts must be bootstrapped for CDK in the necessary regions. The default configuration uses us-east-1, us-east-2 and us-west-2  as these three regions support CodeArtifact.

A new AWS Cloud9 IDE is recommended for this tutorial to isolate these actions in this post from your normal compute environment. See the Cloud9 Documentation for Creating an Environment.

Deploy the Python Package Publishing Pipeline into your AWS Account with the CDK

The source code can be found in this GitHub Repository.

  1. Fork the GitHub Repo into your account. This way you can experiment with changes as necessary to fit your workload.
  2. In your local compute environment, clone the GitHub Repository and cd into the project directory:
git clone [email protected]:<YOUR_GITHUB_USERNAME>/multi-region-
python-package-publishing-pipeline.git && cd multi-region-
python-package-publishing-pipeline
  1. Install the necessary node packages:
npm i
  1. (Optional) Override the default configurations for the CodeArtifact domainName, repositoryName, primaryRegion, and replicaRegions.
    1. navigate to ./bin/multiregion_package_publishing.ts and update the relevant fields.
    2. From the project’s root directory (multi-region-python-package-publishing-pipeline), deploy the AWS CDK application. This step may take 5-10 minutes.
cdk deploy --all
  1. When prompted “Do you wish to deploy these changes (y/n)?”, Enter y.

Viewing the deployed CloudFormation stacks

After the deployment of the AWS CDK application completes, you can view the deployed AWS CDK Stacks via CloudFormation. From the AWS Console, search “CloudFormation’ in the search bar and navigate to the service dashboard. In the primary region (us-east-1(N. Virginia)) you should see two stacks: CodeArtifactPrimaryStack-<region> and PackagePublishingPipelineStack.

Screenshot showing the CloudFormation Stacks in the primary region

Figure 2. Screenshot showing the CloudFormation Stacks in the primary region

Switch regions to one of the secondary regions us-west-2 (Oregon) or us-east-2 (Ohio) to see the remaining stacks named CodeArtifactReplicaStack-<region>. These correspond to the three AWS CDK Stacks from the architecture diagram.

Screenshot showing the CloudFormation stacks in a separate region

Figure 3. Screenshot showing the CloudFormation stacks in a separate region

Viewing the CodePipeline Package Publishing Pipeline

From the Console, select the primary region (us-east-1) and navigate to CodePipeline by utilizing the search bar. Select the Pipeline titled packagePipeline and inspect the state of the pipeline. This pipeline triggers after every commit from the CodeCommit repository named PackageSourceCode. If the pipeline is still in process, then wait a few minutes, as this pipeline can take approximately 7–8 minutes to complete all three stages (Source, Build, and Publish). Once it’s complete, the pipeline should reflect the following screenshot:

A screenshot showing the CodePipeline flow

Figure 4. A screenshot showing the CodePipeline flow

Viewing the Published Package in the CodeArtifact Repository

To view the published artifacts, go to the primary or secondary region and navigate to the CodeArtifact dashboard by utilizing the search bar in the Console. You’ll see a repository named package-artifact-repo. Select the repository and you’ll see the sample pip package named mypippackage inside the repository. This package is defined by the source code in the CodeCommit repository named PackageSourceCode in the primary region (us-east-1).

Screenshot of the package repository

Figure 5. Screenshot of the package repository

Create a new package version in CodeCommit and monitor the pipeline release

Navigate to your CodeCommit’s PackageSourceCode (us-east-1 CodeCommit > Repositories > PackageSourceCode. Open the setup.py file and select the Edit button. Make a simple modification, change the version = '1.0.0' to version = '1.1.0' and commit the changes to the Main branch.

A screenshot of the source package's code repository in CodeCommit

Figure 6. A screenshot of the source package’s code repository in CodeCommit

Now navigate back to CodePipeline and watch as the pipeline performs the release automatically. When the pipeline finishes, this new package version will live in each of the three CodeArtifact Repositories.

Install the custom pip package to your local Python Environment

For your development team to connect to this CodeArtifact Repository to download repositories, you must configure the pip tool to look in this repository. From your Cloud9 IDE (or local development environment), let’s test the installation of this package for Python3:

  1. Copy the connection instructions for the pip tool. Navigate to the CodeArtifact repository of your choice and select View connection instructions
    1. Select Copy to copy the snippet to your clipboard
Screenshot showing directions to connect to a code artifact repository

Figure 7. Screenshot showing directions to connect to a code artifact repository

  1. Paste the command from your clipboard
  2. Run pip install mypippackage==1.0.0
Screenshot showing CodeArtifact login

Figure 8. Screenshot showing CodeArtifact login

  1. Test the package works as expected by importing the modules
  2. Start the Python REPL by running python3 in the terminal
Screenshot of the package being imported

Figure 9. Screenshot of the package being imported

Clean up

Destroy all of the AWS CDK Stacks by running cdk destroy --all from the root AWS CDK application directory.

Conclusion

In this post, we walked through how to deploy a CodePipeline pipeline to automate the publishing of Python packages to multiple CodeArtifact repositories in separate regions. Leveraging the AWS CDK simplifies the maintenance and configuration of this multi-region solution by using Infrastructure as Code and predefined Constructs. If you would like to customize this solution to better fit your needs, please read more about the AWS CDK and AWS Developer Tools. Some links we suggest include the CodeArtifact User Guide (with sections covering npm, Python, Maven, and NuGet), the CDK API Reference, CDK Pipelines, and the CodePipeline User Guide.

About the authors:

Andrew Chen

Andrew Chen is a Solutions Architect with an interest in Data Analytics, Machine Learning, and DevOps. Andrew has previous experience in management consulting in which he worked as a technical architect for various cloud migration projects. In his free time, Andrew enjoys fishing, hiking, kayaking, and keeping up with financial markets.

Brian Smitches

Brian Smitches is a Solutions Architect with an interest in Infrastructure as Code and the AWS Cloud Development Kit. Brian currently supports Federal SMB Partners and has previous experience with Full Stack Application Development. In his personal time, Brian enjoys skiing, water sports, and traveling with friends and family.

Tighten your package security with CodeArtifact Package Origin Control toolkit

Post Syndicated from Davide Semenzin original https://aws.amazon.com/blogs/devops/tighten-your-package-security-with-codeartifact-package-origin-control-toolkit/

Introduction

AWS CodeArtifact is a fully managed artifact repository service that makes it easy for organizations to securely store and share software packages used for application development. On Jul14 2022 we introduced a new feature called Package Origin Controls which allows customers to protect themselves against “dependency substitution“ or “dependency confusion” attacks.

This class of supply chain attacks can be carried out when an attacker with knowledge of an organization’s internally published package names (for example: Sample-Package=1.0.0) is able to publish such name(s) in a public repository. Package managers contain dependency resolution logic that pulls the latest version of a package. The attacker abuses this logic by publishing a high version number of a package with the same name as the organization’s package (for example: Sample-Package=99.0.0). The package manager then would resolve any requests for that package by pulling the attacker’s package version with malicious code instead of the internally published dependency.

In order for this type of attack to be successful, the organization must source their package versions from both internal and remote repositories at the same time. For example, your pip installation could be configured with multiple package indexes, both internal and external; or, as a CodeArtifact user you may have both the repository containing your private packages as well as an external connection to PyPI in the upstream graph of your current repository. In either case, the package manager is able to obtain package versions from more than one source. This causes the package manager to resolve the higher version number from the remote repository, instead of the trusted internal version.

A few strategies can be used to mitigate this kind of mixing: a simple one is to instruct the package manager to only source from an internal repository. While effective, this is often not practical, because it either significantly degrades developer experience or requires a lot of effort in order to set up, maintain, and vet external dependencies. Another mitigation consists in using explicit version pinning, which is also effective, though it might re-introduce the dependency substitution risk upon dependency upgrade without manual vetting. Some package managers also support namespaces or other types of dependency scoping, which are also helpful in preventing this class of attacks, but when available may not always actionable for existing packages due to the large amount of work required to do the renaming.

CodeArtifact is adding another tool to strengthen your software supply chain by introducing per-package per-repository controls which allow you to more precisely configure and control how package versions are sourced. For each package in your repository, you are now able to decide whether to allow or block sourcing versions from both upstream sources and direct publishing. These flags enable you to prevent mixed versions scenarios for all the types of packages supported by CodeArtifact without the need for additional package manager configuration.

While packages retained or published after the launch of this feature come with tighter-by-default origin configurations, in keeping with the principle of least astonishment we decided not to apply any of these policies retroactively. Therefore, your existing packages in CodeArtifact will not have their origin configuration changed and your setup will continue to work continue to work as it did before the feature was released.

Should you want to leverage this feature to tighten the security posture of your existing packages, we are releasing a toolkit to make it easier to bulk-set policy values in your repositories. This blog post describes how to use it.

Solution overview

The purpose of the Package Origin Control toolkit is to provide repository administrators with an easy way to set Origin Control policies in bulk on packages that have not received the default protection because they pre-date feature release. This can be achieved by blocking upstream versions for internal packages. In this blog post we will focus on this use-case, though the toolkit does support blocking publishing package versions to avoid a potentially vulnerable mixed state for external packages as well.

The toolkit is comprised of two scripts: a first one called generate_package_configurations.py for creating a manifest file listing the packages in a domain alongside their proposed origin configuration to apply, and a second one named apply_package_configurations.py, that reads the manifest file and applies the configuration within.

generate_package_configurations.py can operate on a whole repository, or on a subset of packages (specified either via filters, or though a list) and supports two origin control resolution modes:

  • A manual one where you supply the origin configuration you would like to set for all packages in scope. This is a good option if, for instance, you already maintain a list of internal packages, or if they are published in a consistent internal namespace which allows for them to be easily selected.
  • An automated one, which tries to identify what packages should have their upstreams blocked by analyzing the upstream repository graph and external connections, looking for evidence that package versions are only available from the repository at hand- in which case it determines it can disable sourcing of upstream versions can be done without risk of breaking builds. This is a good option if you want a quick way to tighten your security posture without having to manually analyze your whole repository.

With the manifest created, apply_package_configurations.py takes it as an input and effects the changes specified in it by calling the new PutPackageOriginConfiguration API (link). Precisely because it is meant to set these values in bulk, this script supports backup and revert operations by default, as well as dry-run and step-by-step confirmation options. If you identify an issue after applying origin control changes, you will be able to safely revert to the original, working configuration before trying again.

In this blog post we will cover how to use these tools:

  • To block package versions from upstream sources for all recommended packages in a repository
  • To block upstreams for a list of packages you already have
  • To revert to the original state in case of an incorrect configuration push

Prerequisites

The following prerequisites are required before you begin:

  1. Set up the Package Origin Control toolkit as described in the README on GitHub. You will need a working installation of Python 3.6 or later as well as the ability to install dependencies like the Python AWS SDK. The AWS CLI is not required.
  2. Write permissions on the CodeArtifact repository where you want to add package origin controls (see this link for additional info.)

Procedures

To block package versions from upstream sources for all recommended packages in a repository

Introduction

This procedure should be considered if you have a CodeArtifact repository with a variety of package formats and upstreams, and you want to have the toolkit automatically resolve what packages are safe to block upstreams for. It will block acquisition of new versions from upstreams only if two conditions are met:

  • the target repository doesn’t have access to an external connection
  • no versions of the package are available via any of its upstream repositories (either because the target repository itself doesn’t have any upstreams or because none of the upstreams have the package).

Therefore, we assume there isn’t an immediate External Connection attached to the target repository for the package format(s) you are trying to run this script against (because in that case the script would fall back on leaving things as-is for all packages).

Steps

  1. Make sure you have completed the required prerequisites described above
  2. Identify the target repository in your domain you want to automatically apply origin controls for, e.g. myrepo
  3. Identify the query parameters that define the list of packages you want to target. The script supports the same filters as the ListPackagesAPI.
    1. To match all packages in a repo, run :  python generate_package_configurations.py --region us-west-2 --domain mydomain --repository myrepo
      Please note that you always need to specify the AWS region and CodeArtifact domain alongside the repository.
    2. To match only some packages in a repo, for example only Python packages whose name begins with “internal_software_”:  python generate_package_configurations.py --region us-west-2 --domain mydomain --repository myrepo --format pypi --prefix internal_software_*
  4. If necessary, you can review the produced manifest file, which you will be able to find in the same folder under the name origin-configuration_mydomain_myrepo.csv (unless you have specified a different filename and path via the --output-file option)
  5. If the manifest file looks correct, you can apply the changes by calling the second stage: python apply_package_configurations.py --region us-west-2 --domain mydomain --repository myrepo --input origin-configuration_mydomain_myrepo.csv

To block package versions from upstream sources for all packages in a repository matching a list you maintain

Introduction

This procedure should be considered if you have a set of packages within your repository you know you want to apply origin control restrictions to. Rather than relying on a query, you can use such a list as an input to create a manifest.

Steps

Create a file containing a list of package names (and package names only). Multiple namespaces and formats are not supported and you will need to re-run this procedure for each. The expected file format is one package name per line. In this example we will want to block upstreams for three packages of the npm format (format is always mandatory when specifying a list of packages). As an example, a small input file is going to look something like this:

requests
numpy
django

(more information about this option can be found in the README)

Generate the manifest by supplying this file to the first stage, alongside the desired origin control configuration. In this case, we want to block upstreams for all packages in the supplied list (for more information about the origin control configuration string, consult the documentation): python generate_package_configurations.py --region us-west-2 --domain mydomain --repository myrepo --format npm --from-list inputfile.csv --set-restrictions publish=ALLOW,upstream=BLOCK

If necessary, you can review the produced manifest file, which you will be able to find in the same folder under the name origin-configuration_mydomain_myrepo.csv (unless you have specified a different filename and path via the –output-file option)

Run the apply_package_configurations.py script to update the package origin controls in your repository: apply_package_configurations.py --region us-west-2 --domain mydomain --repository myrepo --input origin-configuration_mydomain_myrepo.csv

To revert to the original state in case of an incorrect configuration push

Introduction

Erroneously bulk-changing your origin configuration can lead to broken builds and confusing failure modes for developers. To mitigate this risk, the toolkit backs up the existing configuration before making any changes and lets you easily revert them if need be.

Steps

Identify the manifest containing the configuration you want to revert. The toolkit automatically creates a backup file t for every input manifest you provide. This is the file produced by the first stage, which by default takes a name like origin-configuration_[domain]_[repository], for example origin-configuration_mydomain_myrepo.csv
Run the second stage script in restore mode:  python apply_package_configurations.py --region us-west-2 --domain mydomain --repository myrepo --input origin-configuration_mydomain_myrepo.csv --restore

Conclusion

In this blog post we have explained how to use the Origin Control toolkit to improve package security, focusing on restricting upstream package versions. We have demonstrated both an automated repository-wide application of the toolkit, which tries to minimize the amount of repository administrator work by applying a restriction heuristic, as well as a manual mode where a repository administrator can effect fine-grained origin control changes. Finally, we showed how these changes can be reverted using the built-in backup feature.

Author:

Davide Semenzin

Davide is a Software Development Engineer on the CodeArtifact team at Amazon Web Services (AWS). Previously he has worked at the Internet Archive building the infrastructure to digitize a million books per year. His interests are distributed systems, platform engineering and mission-critical high availability software. In his free time he likes to read books, fly airplanes, fly on airplanes, think about rockets and play with lasers.

Simplify and optimize Python package management for AWS Glue PySpark jobs with AWS CodeArtifact

Post Syndicated from Ashok Padmanabhan original https://aws.amazon.com/blogs/big-data/simplify-and-optimize-python-package-management-for-aws-glue-pyspark-jobs-with-aws-codeartifact/

Data engineers use various Python packages to meet their data processing requirements while building data pipelines with AWS Glue PySpark Jobs. Languages like Python and Scala are commonly used in data pipeline development. Developers can take advantage of their open-source packages or even customize their own to make it easier and faster to perform use cases, such as data manipulation and analysis. However, managing standardized packages can be cumbersome with multiple teams using different versions of packages, installing non-approved packages, and causing duplicate development effort due to the lack of visibility of what is available at the enterprise level. This can be especially challenging in large enterprises with multiple data engineering teams.

ETL Developers have requirements to use additional packages for their AWS Glue ETL jobs. With security being job zero for customers, many will restrict egress traffic from their VPC to the public internet, and they need a way to manage the packages used by applications including their data processing pipelines.

Our proposed solution will enable you with network egress restrictions to manage packages centrally with AWS CodeArtifact and use their favorite libraries in their AWS Glue ETL PySpark code. In this post, we’ll describe how CodeArtifact can be used for managing packages and modules for AWS Glue ETL jobs, and we’ll demo a solution using Glue PySpark jobs that run within VPC Subnets that have no internet access.

Solution overview

The solution uses CodeArtifact as a tool to make it easier for organizations of any size to securely store, publish, and share software packages used in their ETL with AWS Glue. VPC Endpoints will be enabled for CodeArtifact and Glue to enable private link connections. AWS Step Functions makes it easy to coordinate the orchestration of components used in the data processing pipeline. Native integrations with both CodeArtifact and AWS Glue enable the workflow to both authenticate the request to CodeArtifact and start the AWS Glue ETL job.

The following architecture shows an implementation of a solution using AWS Glue, CodeArtifact, and Step Functions to use additional Python modules without egress internet access. The solution is deployed using AWS Cloud Development Kit (AWS CDK), an open-source software development framework to define your cloud application resources using familiar programming languages.

Solution Architecture for the blog post

Fig 1: Architecture Diagram for the Solution

To illustrate how to set up this architecture, we’ll walk you through the following steps:

  1. Deploying an AWS CDK stack to provision the following AWS Resources
    1. CodeArtifact
    2. An AWS Glue job
    3. Step Functions workflow
    4. Amazon Simple Storage Service (Amazon S3) bucket
    5. A VPC with a private Subnet and VPC Endpoints to Amazon S3 and CodeArtifact
  2. Validate the Deployment.
  3. Run a Sample Workflow – This workflow will run an AWS Glue PySpark job that uses a custom Python library, and an upgraded version of boto3.
  4. Cleaning up your resources.

Prerequisites

Make sure that you complete the following steps as prerequisites:

The solution

Launching your AWS CDK Stack

Step 1: Using your device’s command line, check out our Git repository to a local directory on your device:

git clone https://github.com/aws-samples/python-lib-management-without-internet-for-aws-glue-in-private-subnets.git

Step 2: Change directories to the new directory Amazon S3 script location:

cd python-lib-management-without-internet-for-aws-glue-in-private-subnets/scripts/s3

Step 3: Download the following CSV, which contains New York City Taxi and Limousine Commission (TLC) Trip weekly trips. This will serve as the input source for the AWS Glue Job:

aws s3 cp s3://nyc-tlc/misc/FOIL_weekly_trips_apps.csv .

Step 4: Change the directories to the path where the app.py file is located (in reference to the previous step, execute the following step):

cd ../..

Step 5: Create a virtual environment:

macOS/Linux:
python3 -m venv .env

Windows:
python -m venv .env

Step 6: Activate the virtual environment after the init process completes and the virtual environment is created:

macOS/Linux:
source .env/bin/activate

Windows:
.env\Scripts\activate.bat

Step 7: Install the required dependencies:

pip3 install -r requirements.txt

Step 8: Make sure that your AWS profile is setup along with the region that you want to deploy as mentioned in the prerequisite. Synthesize the templates. AWS CDK apps use code to define the infrastructure, and when run they produce or “synthesize” a CloudFormation template for each stack defined in the application:

cdk synthesize

Step 9: BootStrap the cdk app using the following command:

cdk bootstrap aws://<AWS_ACCOUNTID>/<AWS_REGION>

Replace the place holder AWS_ACCOUNTID and AWS_REGION with your AWS account ID and the region to be deployed.

This step provisions the initial resources, including an Amazon S3 bucket for storing files and IAM roles that grant permissions needed to perform deployments.

Step 10: Deploy the solution. By default, some actions that could potentially make security changes require approval. In this deployment, you’re creating an IAM role. The following command overrides the approval prompts, but if you would like to manually accept the prompts, then omit the --require-approval never flag:

cdk deploy "*" --require-approval never

While the AWS CDK deploys the CloudFormation stacks, you can follow the deployment progress in your terminal:

AWS CDK Deployment progress in terminal

Fig 2: AWS CDK Deployment progress in terminal

Once the deployment is successful, you’ll see the successful status as follows:

AWS CDK Deployment completion success

Fig 3: AWS CDK Deployment completion success

Step 11: Log in to the AWS Console, go to CloudFormation, and see the output of the ApplicationStack stack:

AWS CloudFormation stack output

Fig 4: AWS CloudFormation stack output

Note the values of the DomainName and RepositoryName variables. We’ll use them in the next step to upload our artifacts

Step 12: We will upload a custom library into the repo that we created. This will be used by our Glue ETL job.

  • Install twine using pip:
python3 -m pip install twine

The custom python package glueutils-0.2.0.tar.gz can be found under this folder of the cloned repo:

cd scripts/custom_glue_library
  • Configure twine with the login command (additional details here ). Refer to step 11 for the DomainName and RepositoryName from the CloudFormation output:
aws codeartifact login --tool twine --domain <DomainName> --domain-owner <AWS_ACCOUNTID> --repository <RepositoryName>
  • Publish Python package assets:
twine upload --repository codeartifact glueutils-0.2.0.tar.gz
Python package publishing using twine

Fig 5: Python package publishing using twine

Validate the Deployment

The AWS CDK stack will deploy the following AWS resources:

  1. Amazon Virtual Private Cloud (Amazon VPC)
    1. One Private Subnet
  2. AWS CodeArtifact
    1. CodeArtifact Repository
    2. CodeArtifact Domain
    3. CodeArtifact Upstream Repository
  3. AWS Glue
    1. AWS Glue Job
    2. AWS Glue Database
    3. AWS Glue Connection
  4. AWS Step Function
  5. Amazon S3 Bucket for AWS CDK and also for storing scripts and CSV file
  6. IAM Roles and Policies
  7. Amazon Elastic Compute Cloud (Amazon EC2) Security Group

Step 1: Browse to the AWS account and region via the AWS Console to which the resources are deployed.

Step 2: Browse the Subnet page (https://<region> .console.aws.amazon.com/vpc/home?region=<region> #subnets:) (*Replace region with actual AWS Region to which your resources are deployed)

Step 3: Select the Subnet with name as ApplicationStack/enterprise-repo-vpc/Enterprise-Repo-Private-Subnet1

Step 4: Select the Route Table and validate that there are no Internet Gateway or NAT Gateway for routes to Internet, and that it’s similar to the following image:

Route table validation

Fig 6: Route table validation

Step 5: Navigate to the CodeArtifact console and review the repositories created. The enterprise-repo is your local repository, and pypi-store is the upstream repository connected to the PyPI, providing artifacts from pypi.org.

AWS CodeArifact repositories created

Fig 7: AWS CodeArifact repositories created

Step 6: Navigate to enterprise-repo and search for glueutils. This is the custom python package that we published.

AWS CodeArifact custom python package published

Fig 8: AWS CodeArifact custom python package published

Step 7: Navigate to Step Functions Console and review the enterprise-repo-step-function as follows:

AWS Step Functions workflow

Fig 9: AWS Step Functions workflow

The diagram shows how the Step Functions workflow will orchestrate the pattern.

  1. The first step CodeArtifactGetAuthorizationToken calls the getAuthorizationToken API to generate a temporary authorization token for accessing repositories in the domain (this token is valid for 15 mins.).
  2. The next step GenerateCodeArtifactURL takes the authorization token from the response and generates the CodeArtifact URL.
  3. Then, this will move into the GlueStartJobRun state, which makes a synchronous API call to run the AWS Glue job.

Step 8: Navigate to the AWS Glue Console and select the Jobs tab, then select enterprise-repo-glue-job.

The AWS Glue job is created with the following script and AWS Glue Connection enterprise-repo-glue-connection. The AWS Glue connection is a Data Catalog object that enables the job to connect to sources and APIs from within the VPC. The network type connection runs the job from within the private subnet to make requests to Amazon S3 and CodeArtifact over the VPC endpoint connection. This enables the job to run without any traffic through the internet.

Note the connections section in the AWS Glue PySpark Job, which makes the Glue job run on the private subnet in the VPC provisioned.

AWS Glue network connections

Fig 10: AWS Glue network connections

The job takes an Amazon S3 bucket, Glue Database, Python Job Installer Option, and Additional Python Modules as job parameters. The parameters --additional-python-modules and --python-modules-installer-option are passed to install the selected Python module from a PyPI repository hosted in AWS CodeArtifact.

The script itself first reads the Amazon S3 input path of the taxi data in the CSV format. A light transformation to sum the total trips by year, week, and app is performed. Then the output is written to an Amazon S3 path as parquet . A partitioned table in the AWS Glue Data Catalog will either be created or updated if it already exists .

You can find the Glue PySpark script here.

Run a sample workflow

The following steps will demonstrate how to run a sample workflow:

Step 1: Navigate to the Step Functions Console and select the enterprise-repo-step-function.

Step 2: Select Start execution and input the following: We’re including the glueutils and latest boto3 libraries as part of the job run. It is always recommended to pin your python dependencies to avoid any breaking change due to a future version of dependency . In the below example, the latest available version of boto3, and the 0.2.0 version of glueutils will be installed. To pin it to a specific release you may add  boto3==1.24.2   (Current latest release at the time of publishing this post).

{"pythonmodules": "boto3,glueutils==0.2.0"}

Step 3: Select Start execution and wait until Execution Status is Succeeded. This may take a few minutes.

Step 4: Navigate to the CodeArtifact Console to review the enterprise-repo repository. You’ll see the cached PyPi packages and all of their dependencies pulled down from PyPi.

Step 5: In the Glue Console under the Runs section of the enterprise-glue-job, you’ll see the parameters passed:

Fig 11 : AWS Glue job execution history

Fig 11 : AWS Glue job execution history

Note the --index-url which was passed as a parameter to the glue ETL job. The token is valid only for 15 minutes.

Step 6: Navigate to the Amazon CloudWatch Console and go to the /aws/glue-jobs log group to verify that the packages were installed from the local repo.

You will see that the 2 package names passed as parameters are installed with the corresponding versions.

Fig 12 : Amazon CloudWatch logs details for the Glue job

Fig 12 : Amazon CloudWatch logs details for the Glue job

Step 7: Navigate to the Amazon Athena console and select Query Editor.

Step 8: Run the following query to validate the output of the AWS Glue job:

SELECT year, app, SUM(total_trips) as sum_of_total_trips 
FROM 
"codeartifactblog_glue_db"."taxidataparquet" 
GROUP BY year, app;

Clean up

Make sure that you clean up all of the other AWS resources that you created in the AWS CDK Stack deployment. You can delete these resources via the AWS CDK Destroy command as follows or the CloudFormation console.

To destroy the resources using AWS CDK, follow these steps:

  1. Follow Steps 1-6 from the ‘Launching your CDK Stack’ section.
  2. Destroy the app by executing the following command:
    cdk destroy

Conclusion

In this post, we demonstrated how CodeArtifact can be used for managing Python packages and modules for AWS Glue jobs that run within VPC Subnets that have no internet access. We also demonstrated how the versions of existing packages can be updated (i.e., boto3) and a custom Python library (glueutils) that is developed locally is also managed through CodeArtifact.

This post enables you to use your favorite Python packages with AWS Glue ETL PySpark jobs by modifying the input to the AWS StepFunctions workflow (Step 2 in the Run a Sample workflow section).


About the Authors

Bret Pontillo is a Data & ML Engineer with AWS Professional Services. He works closely with enterprise customers building data lakes and analytical applications on the AWS platform. In his free time, Bret enjoys traveling, watching sports, and trying new restaurants.

Gaurav Gundal is a DevOps consultant with AWS Professional Services, helping customers build solutions on the customer platform. When not building, designing, or developing solutions, Gaurav spends time with his family, plays guitar, and enjoys traveling to different places.

Ashok Padmanabhan is a Sr. IOT Data Architect with AWS Professional Services, helping customers build data and analytics platform and solutions. When not helping customers build and design data lakes, Ashok enjoys spending time at the beach near his home in Florida.

Building an InnerSource ecosystem using AWS DevOps tools

Post Syndicated from Debashish Chakrabarty original https://aws.amazon.com/blogs/devops/building-an-innersource-ecosystem-using-aws-devops-tools/

InnerSource is the term for the emerging practice of organizations adopting the open source methodology, albeit to develop proprietary software. This blog discusses the building of a model InnerSource ecosystem that leverages multiple AWS services, such as CodeBuild, CodeCommit, CodePipeline, CodeArtifact, and CodeGuru, along with other AWS services and open source tools.

What is InnerSource and why is it gaining traction?

Most software companies leverage open source software (OSS) in their products, as it is a great mechanism for standardizing software and bringing in cost effectiveness via the re-use of high quality, time-tested code. Some organizations may allow its use as-is, while others may utilize a vetting mechanism to ensure that the OSS adheres to the organization standards of security, quality, etc. This confidence in OSS stems from how these community projects are managed and sustained, as well as the culture of openness, collaboration, and creativity that they nurture.

Many organizations building closed source software are now trying to imitate these development principles and practices. This approach, which has been perhaps more discussed than adopted, is popularly called “InnerSource”. InnerSource serves as a great tool for collaborative software development within the organization perimeter, while keeping its concerns for IP & Legality in check. It provides collaboration and innovation avenues beyond the confines of organizational silos through knowledge and talent sharing. Organizations reap the benefits of better code quality and faster time-to-market, yet at only a fraction of the cost.

What constitutes an InnerSource ecosystem?

Infrastructure and processes that harbor collaboration stand at the heart of InnerSource ecology. These systems (refer Figure 1) would typically include tools supporting features such as code hosting, peer reviews, Pull Request (PR) approval flow, issue tracking, documentation, communication & collaboration, continuous integration, and automated testing, among others. Another major component of this system is an entry portal that enables the employees to discover the InnerSource projects and join the community, beginning as ordinary users of the reusable code and later graduating to contributors and committers.

A typical InnerSource ecosystem

Figure 1: A typical InnerSource ecosystem

More to InnerSource than meets the eye

This blog focuses on detailing a technical solution for establishing the required tools for an InnerSource system primarily to enable a development workflow and infrastructure. But the secret sauce of an InnerSource initiative in an enterprise necessitates many other ingredients.

InnerSource Roles & Use Cases

Figure 2: InnerSource Roles & Use Cases

InnerSource thrives on community collaboration and a low entry barrier to enable adoptability. In turn, that demands a cultural makeover. While strategically deciding on the projects that can be inner sourced as well as the appropriate licensing model, enterprises should bootstrap the initiative with a seed product that draws the community, with maintainers and the first set of contributors. Many of these users would eventually be promoted, through a meritocracy-based system, to become the trusted committers.

Over a set period, the organization should plan to move from an infra specific model to a project specific model. In a Project-specific InnerSource model, the responsibility for a particular software asset is owned by a dedicated team funded by other business units. Whereas in the Infrastructure-based InnerSource model, the organization provides the necessary infrastructure to create the ecosystem with code & document repositories, communication tools, etc. This enables anybody in the organization to create a new InnerSource project, although each project initiator maintains their own projects. They could begin by establishing a community of practice, and designating a core team that would provide continuing support to the InnerSource projects’ internal customers. Having a team of dedicated resources would clearly indicate the organization’s long-term commitment to sustaining the initiative. The organization should promote this culture through regular boot camps, trainings, and a recognition program.

Lastly, the significance of having a modular architecture in the InnerSource projects cannot be understated. This architecture helps developers understand the code better, as well as aids code reuse and parallel development, where multiple contributors could work on different code modules while avoiding conflicts during code merges.

A model InnerSource solution using AWS services

This blog discusses a solution that weaves various services together to create the necessary infrastructure for an InnerSource system. While it is not a full-blown solution, and it may lack some other components that an organization may desire in its own system, it can provide you with a good head start.

The ultimate goal of the model solution is to enable a developer workflow as depicted in Figure 3.

Typical developer workflow at InnerSource

Figure 3: Typical developer workflow at InnerSource

At the core of the InnerSource-verse is the distributed version control (AWS CodeCommit in our case). To maintain system transparency, openness, and participation, we must have a discovery mechanism where users could search for the projects and receive encouragement to contribute to the one they prefer (Step 1 in Figure 4).

Architecture diagram for the model InnerSource system

Figure 4: Architecture diagram for the model InnerSource system

For this purpose, the model solution utilizes an open source reference implantation of InnerSource Portal. The portal indexes data from AWS CodeCommit by using a crawler, and it lists available projects with associated metadata, such as the skills required, number of active branches, and average number of commits. For CodeCommit, you can use the crawler implementation that we created in the open source code repo at https://github.com/aws-samples/codecommit-crawler-innersource.

The major portal feature is providing an option to contribute to a project by using a “Contribute” link. This can present a pop-up form to “apply as a contributor” (Step 2 in Figure 4), which when submitted sends an email (or creates a ticket) to the project maintainer/committer who can create an IAM (Step 3 in Figure 4) user with access to the particular repository. Note that the pop-up form functionality is built into the open source version of the portal. However, it would be trivial to add one with an associated action (send an email, cut a ticket, etc.).

InnerSource portal indexes CodeCommit repos and provides a bird’s eye view

Figure 5: InnerSource portal indexes CodeCommit repos and provides a bird’s eye view

The contributor, upon receiving access, logs in to CodeCommit, clones the mainline branch of the InnerSource project (Step 4 in Figure 4) into a fix or feature branch, and starts altering/adding the code. Once completed, the contributor commits the code to the branch and raises a PR (Step 5 in Figure 4). A Pull Request is a mechanism to offer code to an existing repository, which is then peer-reviewed and tested before acceptance for inclusion.

The PR triggers a CodeGuru review (Step 6 in Figure 4) that adds the recommendations as comments on the PR. Furthermore, it triggers a CodeBuild process (Steps 7 to 10 in Figure 4) and logs the build result in the PR. At this point, the code can be peer reviewed by Trusted Committers or Owners of the project repository. The number of approvals would depend on the approval template rule configured in CodeCommit. The Committer(s) can approve the PR (Step 12 in Figure 4) and merge the code to the mainline branch – that is once they verify that the code serves its purpose, has passed the required tests, and doesn’t break the build. They could also rely on the approval vote from a sanity test conducted by a CodeBuild process. Optionally, a build process could deploy the latest mainline code (Step 14 in Figure 4) on the PR merge commit.

To maintain transparency in all communications related to progress, bugs, and feature requests to downstream users and contributors, a communication tool may be needed. This solution does not show integration with any Issue/Bug tracking tool out of the box. However, multiple of these tools are available at the AWS marketplace, with some offering forum and Wiki add-ons in order to elicit discussions. Standard project documentation can be kept within the repository by using the constructs of the README.md file to provide project mission details and the CONTRIBUTING.md file to guide the potential code contributors.

An overview of the AWS services used in the model solution

The model solution employs the following AWS services:

  • Amazon CodeCommit: a fully managed source control service to host secure and highly scalable private Git repositories.
  • Amazon CodeBuild: a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy.
  • Amazon CodeDeploy: a service that automates code deployments to any instance, including EC2 instances and instances running on-premises.
  • Amazon CodeGuru: a developer tool providing intelligent recommendations to improve code quality and identify an application’s most expensive lines of code.
  • Amazon CodePipeline: a fully managed continuous delivery service that helps automate release pipelines for fast and reliable application and infrastructure updates.
  • Amazon CodeArtifact: a fully managed artifact repository service that makes it easy to securely store, publish, and share software packages utilized in their software development process.
  • Amazon S3: an object storage service that offers industry-leading scalability, data availability, security, and performance.
  • Amazon EC2: a web service providing secure, resizable compute capacity in the cloud. It is designed to ease web-scale computing for developers.
  • Amazon EventBridge: a serverless event bus that eases the building of event-driven applications at scale by using events generated from applications and AWS services.
  • Amazon Lambda: a serverless compute service that lets you run code without provisioning or managing servers.

The journey of a thousand miles begins with a single step

InnerSource might not be the right fit for every organization, but is a great step for those wanting to encourage a culture of quality and innovation, as well as purge silos through enhanced collaboration. It requires backing from leadership to sponsor the engineering initiatives, as well as champion the establishment of an open and transparent culture granting autonomy to the developers across the org to contribute to projects outside of their teams. The organizations best-suited for InnerSource have already participated in open source initiatives, have engineering teams that are adept with CI/CD tools, and are willing to adopt OSS practices. They should start small with a pilot and build upon their successes.

Conclusion

Ever more enterprises are adopting the open source culture to develop proprietary software by establishing an InnerSource. This instills innovation, transparency, and collaboration that result in cost effective and quality software development. This blog discussed a model solution to build the developer workflow inside an InnerSource ecosystem, from project discovery to PR approval and deployment. Additional features, like an integrated Issue Tracker, Real time chat, and Wiki/Forum, can further enrich this solution.

If you need helping hands, AWS Professional Services can help adapt and implement this model InnerSource solution in your enterprise. Moreover, our Advisory services can help establish the governance model to accelerate OSS culture adoption through Experience Based Acceleration (EBA) parties.

References

About the authors

Debashish Chakrabarty

Debashish Chakrabarty

Debashish is a Senior Engagement Manager at AWS Professional Services, India managing complex projects on DevOps, Security and Modernization and help ProServe customers accelerate their adoption of AWS Services. He loves to keep connected to his technical roots. Outside of work, Debashish is a Hindi Podcaster and Blogger. He also loves binge-watching on Amazon Prime, and spending time with family.

Akash Verma

Akash Verma

Akash works as a Cloud Consultant for AWS Professional Services, India. He enjoys learning new technologies and helping customers solve complex technical problems and drive business outcomes by providing solutions using AWS products and services. Outside of work, Akash loves to travel, interact with new people and try different cuisines. He also enjoy gardening, watching Stand-up comedy, and listening to poetry.

CICD on Serverless Applications using AWS CodeArtifact

Post Syndicated from Anand Krishna original https://aws.amazon.com/blogs/devops/cicd-on-serverless-applications-using-aws-codeartifact/

Developing and deploying applications rapidly to users requires a working pipeline that accepts the user code (usually via a Git repository). AWS CodeArtifact was announced in 2020. It’s a secure and scalable artifact management product that easily integrates with other AWS products and services. CodeArtifact allows you to publish, store, and view packages, list package dependencies, and share your application’s packages.

In this post, I will show how we can build a simple DevOps pipeline for a sample JAVA application (JAR file) to be built with Maven.

Solution Overview

We utilize the following AWS services/Tools/Frameworks to set up our continuous integration, continuous deployment (CI/CD) pipeline:

The following diagram illustrates the pipeline architecture and flow:

 

aws-codeartifact-pipeline

 

Our pipeline is built on CodePipeline with CodeCommit as the source (CodePipeline Source Stage). This triggers the pipeline via a CloudWatch Events rule. Then the code is fetched from the CodeCommit repository branch (main) and sent to the next pipeline phase. This CodeBuild phase is specifically for compiling, packaging, and publishing the code to CodeArtifact by utilizing a package manager—in this case Maven.

After Maven publishes the code to CodeArtifact, the pipeline asks for a manual approval to be directly approved in the pipeline. It can also optionally trigger an email alert via Amazon Simple Notification Service (Amazon SNS). After approval, the pipeline moves to another CodeBuild phase. This downloads the latest packaged JAR file from a CodeArtifact repository and deploys to the AWS Lambda function.

Clone the Repository

Clone the GitHub repository as follows:

git clone https://github.com/aws-samples/aws-cdk-codeartifact-pipeline-sample.git

Code Deep Dive

After the Git repository is cloned, the directory structure is shown as in the following screenshot :

aws-codeartifact-pipeline-code

Let’s study the files and code to understand how the pipeline is built.

The directory java-events is a sample Java Maven project. Find numerous sample applications on GitHub. For this post, we use the sample application java-events.

To add your own application code, place the pom.xml and settings.xml files in the root directory for the AWS CDK project.

Let’s study the code in the file lib/cdk-pipeline-codeartifact-new-stack.ts of the stack CdkPipelineCodeartifactStack. This is the heart of the AWS CDK code that builds the whole pipeline. The stack does the following:

  • Creates a CodeCommit repository called ca-pipeline-repository.
  • References a CloudFormation template (lib/ca-template.yaml) in the AWS CDK code via the module @aws-cdk/cloudformation-include.
  • Creates a CodeArtifact domain called cdkpipelines-codeartifact.
  • Creates a CodeArtifact repository called cdkpipelines-codeartifact-repository.
  • Creates a CodeBuild project called JarBuild_CodeArtifact. This CodeBuild phase does all of the code compiling, packaging, and publishing to CodeArtifact into a repository called cdkpipelines-codeartifact-repository.
  • Creates a CodeBuild project called JarDeploy_Lambda_Function. This phase fetches the latest artifact from CodeArtifact created in the previous step (cdkpipelines-codeartifact-repository) and deploys to the Lambda function.
  • Finally, creates a pipeline with four phases:
    • Source as CodeCommit (ca-pipeline-repository).
    • CodeBuild project JarBuild_CodeArtifact.
    • A Manual approval Stage.
    • CodeBuild project JarDeploy_Lambda_Function.

 

CodeArtifact shows the domain-specific and repository-specific connection settings to mention/add in the application’s pom.xml and settings.xml files as below:

aws-codeartifact-repository-connections

Deploy the Pipeline

The AWS CDK code requires the following packages in order to build the CI/CD pipeline:

  • @aws-cdk/core
  • @aws-cdk/aws-codepipeline
  • @aws-cdk/aws-codepipeline-actions
  • @aws-cdk/aws-codecommit
  • @aws-cdk/aws-codebuild
  • @aws-cdk/aws-iam
  • @aws-cdk/cloudformation-include

 

Install the required AWS CDK packages as below:

npm i @aws-cdk/core @aws-cdk/aws-codepipeline @aws-cdk/aws-codepipeline-actions @aws-cdk/aws-codecommit @aws-cdk/aws-codebuild @aws-cdk/pipelines @aws-cdk/aws-iam @ @aws-cdk/cloudformation-include

Compile the AWS CDK code:

npm run build

Deploy the AWS CDK code:

cdk synth
cdk deploy

After the AWS CDK code is deployed, view the final output on the stack’s detail page on the AWS CloudFormation :

aws-codeartifact-pipeline-cloudformation-stack

 

How the pipeline works with artifact versions (using SNAPSHOTS)

In this demo, I publish SNAPSHOT to the repository. As per the documentation here and here, a SNAPSHOT refers to the most recent code along a branch. It’s a development version preceding the final release version. Identify a snapshot version of a Maven package by the suffix SNAPSHOT appended to the package version.

The application settings are defined in the pom.xml file. For this post, we define the following:

  • The version to be used, called 1.0-SNAPSHOT.
  • The specific packaging, called jar.
  • The specific project display name, called JavaEvents.
  • The specific group ID, called JavaEvents.

The screenshot below shows the pom.xml settings we utilised in the application:

aws-codeartifact-pipeline-pom-xml

 

You can’t republish a package asset that already exists with different content, as per the documentation here.

When a Maven snapshot is published, its previous version is preserved in a new version called a build. Each time a Maven snapshot is published, a new build version is created.

When a Maven snapshot is published, its status is set to Published, and the status of the build containing the previous version is set to Unlisted. If you request a snapshot, the version with status Published is returned. This is always the most recent Maven snapshot version.

For example, the image below shows the state when the pipeline is run for the FIRST RUN. The latest version has the status Published and previous builds are marked Unlisted.

aws-codeartifact-repository-package-versions

 

For all subsequent pipeline runs, multiple Unlisted versions will occur every time the pipeline is run, as all previous versions of a snapshot are maintained in its build versions.

aws-codeartifact-repository-package-versions

 

Fetching the Latest Code

Retrieve the snapshot from the repository in order to deploy the code to an AWS Lambda Function. I have used AWS CLI to list and fetch the latest asset of package version 1.0-SNAPSHOT.

 

Listing the latest snapshot

export ListLatestArtifact = `aws codeartifact list-package-version-assets —domain cdkpipelines-codeartifact --domain-owner $Account_Id --repository cdkpipelines-codeartifact-repository --namespace JavaEvents --format maven --package JavaEvents --package-version "1.0-SNAPSHOT"| jq ".assets[].name"|grep jar|sed ’s/“//g’`

NOTE : Please note the dynamic CDK variable $Account_Id which represents AWS Account ID.

 

Fetching the latest code using Package Version

aws codeartifact get-package-version-asset --domain cdkpipelines-codeartifact --repository cdkpipelines-codeartifact-repository --format maven --package JavaEvents --package-version 1.0-SNAPSHOT --namespace JavaEvents --asset $ListLatestArtifact demooutput

Notice that I’m referring the last code by using variable $ListLatestArtifact. This always fetches the latest code, and demooutput is the outfile of the AWS CLI command where the content (code) is saved.

 

Testing the Pipeline

Now clone the CodeCommit repository that we created with the following code:

git clone https://git-codecommit.<region>.amazonaws.com/v1/repos/codeartifact-pipeline-repository

 

Enter the following code to push the code to the CodeCommit repository:

cp -rp cdk-pipeline-codeartifact-new /* ca-pipeline-repository
cd ca-pipeline-repository
git checkout -b main
git add .
git commit -m “testing the pipeline”
git push origin main

Once the code is pushed to Git repository, the pipeline is automatically triggered by Amazon CloudWatch events.

The following screenshots shows the second phase (AWS CodeBuild Phase – JarBuild_CodeArtifact) of the pipeline, wherein the asset is successfully compiled and published to the CodeArtifact repository by Maven:

aws-codeartifact-pipeline-codebuild-jarbuild

aws-codeartifact-pipeline-codebuild-screenshot

aws-codeartifact-pipeline-codebuild-screenshot2

 

The following screenshots show the last phase (AWS CodeBuild Phase – Deploy-to-Lambda) of the pipeline, wherein the latest asset is successfully pulled and deployed to AWS Lambda Function.

Asset JavaEvents-1.0-20210618.131629-5.jar is the latest snapshot code for the package version 1.0-SNAPSHOT. This is the same asset version code that will be deployed to AWS Lambda Function, as seen in the screenshots below:

aws-codeartifact-pipeline-codebuild-jardeploy

aws-codeartifact-pipeline-codebuild-screenshot-jarbuild

The following screenshot of the pipeline shows a successful run. The code was fetched and deployed to the existing Lambda function (codeartifact-test-function).

aws-codeartifact-pipeline-codepipeline

Cleanup

To clean up, You can either delete the entire stack through the AWS CloudFormation console or use AWS CDK command like below –

cdk destroy

For more information on the AWS CDK commands, please check the here or sample here.

Summary

In this post, I demonstrated how to build a CI/CD pipeline for your serverless application with AWS CodePipeline by utilizing AWS CDK with AWS CodeArtifact. Please check the documentation here for an in-depth explanation regarding other package managers and the getting started guide.

Keeping up with your dependencies: building a feedback loop for shared libraries

Post Syndicated from Joerg Woehrle original https://aws.amazon.com/blogs/devops/keeping-up-with-your-dependencies-building-a-feedback-loop-for-shared-libraries/

In a microservices world, it’s common to share as little as possible between services. This enables teams to work independently of each other, helps to reduce wait times and decreases coupling between services.

However, it’s also a common scenario that libraries for cross-cutting-concerns (such as security or logging) are developed one time and offered to other teams for consumption. Although it’s vital to offer an opt-out of those libraries (namely, use your own code to address the cross-cutting-concern, such as when there is no version for a given language), shared libraries also provide the benefit of better governance and time savings.

To avoid these pitfalls when sharing artifacts, two points are important:

  • For consumers of shared libraries, it’s important to stay up to date with new releases in order to benefit from security, performance, and feature improvements.
  • For producers of shared libraries, it’s important to get quick feedback in case of an involuntarily added breaking change.

Based on those two factors, we’re looking for the following solution:

  • A frictionless and automated way to update consumer’s code to the latest release version of a given library
  • Immediate feedback to the library producer in case of a breaking change (the new version of the library breaks the build of a downstream system)

In this blog post I develop a solution that takes care of both those problems. I use Amazon EventBridge to be notified on new releases of a library in AWS CodeArtifact. I use an AWS Lambda function along with an AWS Fargate task to automatically create a pull request (PR) with the new release version on AWS CodeCommit. Finally, I use AWS CodeBuild to kick off a build of the PR and notify the library producer via EventBridge and Amazon Simple Notification Service (Amazon SNS) in case of a failure.

Overview of solution

Let’s start with a short introduction on the services I use for this solution:

  1. CodeArtifact – A fully managed artifact repository service that makes it easy for organizations of any size to securely store, publish, and share software packages used in their software development process. CodeArtifact works with commonly used package managers and build tools like Maven, Gradle, npm, yarn, twine, and pip.
  2. CodeBuild – A fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy.
  3. CodeCommit – A fully-managed source control service that hosts secure Git-based repositories.
  4. EventBridge – A serverless event bus that makes it easy to connect applications together using data from your own applications, integrated software as a service (SaaS) applications, and AWS services. EventBridge makes it easy to build event-driven applications because it takes care of event ingestion and delivery, security, authorization, and error handling.
  5. Fargate – A serverless compute engine for containers that works with both Amazon Elastic Container Service (ECS) and Amazon Elastic Kubernetes Service (EKS). Fargate removes the need to provision and manage servers, lets you specify and pay for resources per application, and improves security through application isolation by design.
  6. Lambda – Lets you run code without provisioning or managing servers. You pay only for the compute time you consume.
  7. Amazon SNS – A fully managed messaging service for both application-to-application (A2A) and application-to-person (A2P) communication.

The resulting flow through the system looks like the following diagram.

Architecture Diagram

 

In my example, I look at two independent teams working in two different AWS accounts. Team A is the provider of the shared library, and Team B is the consumer.

Let’s do a high-level walkthrough of the involved steps and components:

  1. A new library version is released by Team A and pushed to CodeArtifact.
  2. CodeArtifact creates an event when the new version is published.
  3. I send this event to the default event bus in Team B’s AWS account.
  4. An EventBridge rule in Team B’s account triggers a Lambda function for further processing.
  5. The function filters SNAPSHOT releases (in Maven a SNAPSHOT represents an artifact still under development that doesn’t have a final release yet) and runs an Amazon ECS Fargate task for non-SNAPSHOT versions.
  6. The Fargate task checks out the source that uses the shared library, updates the library’s version in the pom.xml, and creates a pull request to integrate the change into the mainline of the code repository.
  7. The pull request creation results in an event being published.
  8. An EventBridge rule triggers the CodeBuild project of the downstream artifact.
  9. The result of the build is published as an event.
  10. If the build fails, this failure is propagated back to the event bus of Team A.
  11. The failure is forwarded to an SNS topic that notifies the subscribers of the failure.

Amazon EventBridge

A central component of the solution is Amazon EventBridge. I use EventBridge to receive and react on events emitted by the various AWS services in the solution (e.g., whenever a new version of an artifact gets uploaded to CodeArtifact, when a PR is created within CodeCommit or when a build fails in CodeBuild). Let’s have a high-level look on some of the central concepts of EventBridge:

  • Event Bus – An event bus is a pipeline that receives events. There is a default event bus in each account which receives events from AWS services. One can send events to an event bus via the PutEvents API.
  • Event – An event indicates a change in e.g., an AWS environment, a SaaS partner service or application or one of your applications.
  • Rule – A rule matches incoming events on an event bus and sends them to targets for processing. To react on a particular event, one creates a rule which matches this event. To learn more about the rule concept check out Rules on the EventBridge documentation.
  • Target – When an event matches the event pattern defined in a rule it is send to a target. There are currently more than 20 target types available in EventBridge. In this blog post I use the targets provided for: an event bus in a different account, a Lambda function, a CodeBuild project and an SNS topic. For a detailed list on available targets see Amazon EventBridge targets.

Solution Details:

In this section I walk through the most important parts of the solution. The complete code can be found on GitHub. For a detailed view on the resources created in each account please refer to the GitHub repository.

I use the AWS Cloud Development Kit (CDK) to create my infrastructure. For some of the resource types I create, no higher-level constructs are available yet (at the time of writing, I used AWS CDK version 1.108.1). This is why I sometimes use low-level AWS CloudFormation constructs or even use the provided escape hatches to use AWS CloudFormation constructs directly.

The code for the shared library producer and consumer is written in Java and uses Apache Maven for dependency management. However, the same concepts apply to e.g., Node.js and npm.

Notify another account of new releases

To send events from EventBridge to another account, the receiving account needs to specify an EventBusPolicy. The AWS CDK code on the consumer account looks like the following code:

new events.CfnEventBusPolicy(this, 'EventBusPolicy', {
    statementId: 'AllowCrossAccount',
    action: 'events:PutEvents',
    principal: consumerAccount
});

With that the producer account has the permission to publish events into the event bus of the consumer account.

I’m interested in CodeArtifact events that are published on the release of a new artifact. I first create a Rule which matches those events. Next, I add a target to the rule which targets the event bus of account B. As of this writing there is no CDK construct available to directly add another account as a target. That is why I use the underlying CloudFormation CfnRule to do that. This is called an escape hatch in CDK. For more information about escape hatches, see Escape hatches.

const onLibraryReleaseRule = new events.Rule(this, 'LibraryReleaseRule', {
  eventPattern: {
    source: [ 'aws.codeartifact' ],
    detailType: [ 'CodeArtifact Package Version State Change' ],
    detail: {
      domainOwner: [ this.account ],
      domainName: [ codeArtifactDomain.domainName ],
      repositoryName: [ codeArtifactRepo.repositoryName ],
      packageVersionState: [ 'Published' ],
      packageFormat: [ 'maven' ]
    }
  }
});
/* there is currently no CDK construct provided to add an event bus in another account as a target. 
That's why we use the underlying CfnRule directly */
const cfnRule = onLibraryReleaseRule.node.defaultChild as events.CfnRule;
cfnRule.targets = [ {arn: `arn:aws:events:${this.region}:${consumerAccount}:event-bus/default`, id: 'ConsumerAccount'} ];

For more information about event formats, see CodeArtifact event format and example.

Act on new releases in the consumer account

I established the connection between the events produced by Account A and Account B: The events now are available in Account B’s event bus. To use them, I add a rule which matches this event in Account B:

const onLibraryReleaseRule = new events.Rule(this, 'LibraryReleaseRule', {
  eventPattern: {
    source: [ 'aws.codeartifact' ],
    detailType: [ 'CodeArtifact Package Version State Change' ],
    detail: {
      domainOwner: [ producerAccount ],
      packageVersionState: [ 'Published' ],
      packageFormat: [ 'maven' ]
    }
  }
});

Add a Lambda function target

Now that I created a rule to trigger anytime a new package version is published, I will now add an EventBridge target which  triggers my runTaskLambda Lambda Function. The below CDK code shows how I add our Lambda function as a target to the onLibraryRelease rule. Notice how I extract information from the event’s payload and pass it into the Lambda function’s invocation event.

onLibraryReleaseRule.addTarget(
    new targets.LambdaFunction( runTaskLambda,{
      event: events.RuleTargetInput.fromObject({
        groupId: events.EventField.fromPath('$.detail.packageNamespace'),
        artifactId: events.EventField.fromPath('$.detail.packageName'),
        version: events.EventField.fromPath('$.detail.packageVersion'),
        repoUrl: codeCommitRepo.repositoryCloneUrlHttp,
        region: this.region
      })
    }));

Filter SNAPSHOT versions

Because I’m not interested in Maven SNAPSHOT versions (such as 1.0.1-SNAPSHOT), I have to find a way to filter those and only act upon non-SNAPSHOT versions. Even though content-based filtering on event patterns is supported by Amazon EventBridge, filtering on suffixes is not supported as of this writing. This is why the Lambda function filters SNAPSHOT versions and only acts upon real, non-SNAPSHOT, releases. For those, I start a custom Amazon ECS Fargate task by using the AWS JavaScript SDK. My function passes some environment overrides to the Fargate task in order to have the required information about the artifact available at runtime.

In the following function code, I pass all required information to create a pull request into the environment of the Fargate task:

const AWS = require('aws-sdk');

const ECS = new AWS.ECS();
exports.handler = async (event) => {
    console.log(`Received event: ${JSON.stringify(event)}`)
    const artifactVersion = event.version;
    const artifactId = event.artifactId;
    if ( artifactVersion.indexOf('SNAPSHOT') > -1 ) {
        console.log(`Skipping SNAPSHOT version ${artifactVersion}`)
    } else {
        console.log(`Triggering task to create pull request for version ${artifactVersion} of artifact ${artifactId}`);
        const params = {
            launchType: 'FARGATE',
            taskDefinition: process.env.TASK_DEFINITION_ARN,
            cluster: process.env.CLUSTER_ARN,
            networkConfiguration: {
                awsvpcConfiguration: {
                    subnets: process.env.TASK_SUBNETS.split(',')
                }
            },
            overrides: {
                containerOverrides: [ {
                    name: process.env.CONTAINER_NAME,
                    environment: [
                        {name: 'REPO_URL', value: process.env.REPO_URL},
                        {name: 'REPO_NAME', value: process.env.REPO_NAME},
                        {name: 'REPO_REGION', value: process.env.REPO_REGION},
                        {name: 'ARTIFACT_VERSION', value: artifactVersion},
                        {name: 'ARTIFACT_ID', value: artifactId}
                    ]
                } ]
            }
        };
        await ECS.runTask(params).promise();
    }
};

Create the pull request

With the environment set, I can use a simple bash script inside the container to create a new Git branch, update the pom.xml with the new dependency version, push the branch to CodeCommit, and use the AWS Command Line Interface (AWS CLI) to create the pull request. The Docker entrypoint looks like the following code:

#!/usr/bin/env bash
set -e

# clone the repository and create a new branch for the change
git clone --depth 1 $REPO_URL repo && cd repo
branch="library_update_$(date +"%Y-%m-%d_%H-%M-%S")"
git checkout -b "$branch"

# replace whatever version is currently used by the new version of the library
sed -i "s/<shared\.library\.version>.*<\/shared\.library\.version>/<shared\.library\.version>${ARTIFACT_VERSION}<\/shared\.library\.version>/g" pom.xml

# stage, commit and push the change
git add pom.xml
git -c "user.name=ECS Pull Request Creator" -c "[email protected]" commit -m "Update version of ${ARTIFACT_ID} to ${ARTIFACT_VERSION}"
git push --set-upstream origin "$branch"

# create pull request
aws codecommit create-pull-request --title "Update version of ${ARTIFACT_ID} to ${ARTIFACT_VERSION}" --targets repositoryName="$REPO_NAME",sourceReference="$branch",destinationReference=main --region "$REPO_REGION"

After a successful run, I can check the CodeCommit UI for the created pull request. The following screenshot shows the changes introduced by one of my pull requests during testing:

Screenshot of the Pull Request in AWS CodeCommit

Now that I have the pull request in place, I want to verify that the dependency update does not break my consumer code. I do this by triggering a CodeBuild project with the help of EventBridge.

Build the pull request

The ingredients I use are the same as with the CodeArtifact event. I create a rule that matches the event emitted by CodeCommit (limiting it to branches that match the prefix used by our Fargate task). Afterwards I add a target to the rule to start the CodeBuild project:

const onPullRequestCreatedRule = new events.Rule(this, 'PullRequestCreatedRule', {
  eventPattern: {
    source: [ 'aws.codecommit' ],
    detailType: [ 'CodeCommit Pull Request State Change' ],
    resources: [ codeCommitRepo.repositoryArn ],
    detail: {
      event: [ 'pullRequestCreated' ],
      sourceReference: [ {
        prefix: 'refs/heads/library_update_'
      } ],
      destinationReference: [ 'refs/heads/main' ]
    }
  }
});
onPullRequestCreatedRule.addTarget( new targets.CodeBuildProject(codeBuild, {
  event: events.RuleTargetInput.fromObject( {
    projectName: codeBuild.projectName,
    sourceVersion: events.EventField.fromPath('$.detail.sourceReference')
  })
}));

This triggers the build whenever a new pull request is created with a branch prefix of refs/head/library_update_.
You can easily add the build results as a comment back to CodeCommit. For more information, see Validating AWS CodeCommit Pull Requests with AWS CodeBuild and AWS Lambda.

My last step is to notify an SNS topic in in case of a failing build. The SNS topic is a resource in Account A. To target a resource in a different account I need to forward the event to this account’s event bus. From there I then target the SNS topic.

First, I forward the failed build event from Account B into the default event bus of Account A:

const onFailedBuildRule = new events.Rule(this, 'BrokenBuildRule', {
  eventPattern: {
    detailType: [ 'CodeBuild Build State Change' ],
    source: [ 'aws.codebuild' ],
    detail: {
      'build-status': [ 'FAILED' ]
    }
  }
});
const producerAccountTarget = new targets.EventBus(events.EventBus.fromEventBusArn(this, 'cross-account-event-bus', `arn:aws:events:${this.region}:${producerAccount}:event-bus/default`))
onFailedBuildRule.addTarget(producerAccountTarget);

Then I target the SNS topic in Account A to be notified of failures:

const onFailedBuildRule = new events.Rule(this, 'BrokenBuildRule', {
  eventPattern: {
    detailType: [ 'CodeBuild Build State Change' ],
    source: [ 'aws.codebuild' ],
    account: [ consumerAccount ],
    detail: {
      'build-status': [ 'FAILED' ]
    }
  }
});
onFailedBuildRule.addTarget(new targets.SnsTopic(notificationTopic));

See it in action

I use the cdk-assume-role-credential-plugin to deploy to both accounts, producer and consumer, with a single CDK command issued to the producer account. To do this I create roles for cross account access from the producer account in the consumer account as described here. I also make sure that the accounts are bootstrapped for CDK as described here. After that I run the following steps:

  1. Deploy the Stacks:
    cd cdk && cdk deploy --context region=<YOUR_REGION> --context producerAccount=<PRODUCER_ACCOUNT_NO> --context consumerAccount==<CONSUMER_ACCOUNT_NO>  --all && cd -
  2. After a successful deployment CDK prints a set of export commands. I set my environment from those Outputs:
    ❯ export CODEARTIFACT_ACCOUNT=<MY_PRODUCER_ACCOUNT>
    ❯ export CODEARTIFACT_DOMAIN=<MY_CODEARTIFACT_DOMAIN>
    ❯ export CODEARTIFACT_REGION=<MY_REGION>
    ❯ export CODECOMMIT_URL=<MY_CODECOMMIT_URL>
  3. Setup Maven to authenticate to CodeArtifact
    export CODEARTIFACT_TOKEN=$(aws codeartifact get-authorization-token --domain $CODEARTIFACT_DOMAIN --domain-owner $CODEARTIFACT_ACCOUNT --query authorizationToken --output text)
  4. Release the first version of the shared library to CodeArtifact:
    cd library_producer/library && mvn --settings ./settings.xml deploy && cd -
  5. From a console which is authenticated/authorized for CodeCommit in the Consumer Account
    1. Setup git to work with CodeCommit
    2. Push the code of the library consumer to CodeCommit:
      cd library_consumer/library && git init && git add . && git commit -m "Add consumer to codecommit" && git remote add codecommit $CODECOMMIT_URL && git push --set-upstream codecommit main && cd -
  6. Release a new version of the shared library:
    cd library_producer/library && sed -i '' 's/<version>1.0.0/<version>1.0.1/' pom.xml && mvn --settings settings.xml deploy && cd -
  7. After 1-3 minutes a Pull Request is created in the CodeCommit repo in the Consumer Account and a build is run to verify this PR:
    Screenshot of AWS CodeBuild running the build
  8. In case of a build failure, you can create a subscription to the SNS topic in Account A to act upon the broken build.

Clean up

In case you followed along with this blog post and want to prevent incurring costs you have to delete the created resources. Run cdk destroy --context region=<YOUR_REGION> --context producerAccount=<PRODUCER_ACCOUNT_NO> --context consumerAccount==<CONSUMER_ACCOUNT_NO> --all to delete the CloudFormation stacks.

Conclusion

In this post, I automated the manual task of updating a shared library dependency version. I used a workflow that not only updates the dependency version, but also notifies the library producer in case the new artifact introduces a regression (for example, an API incompatibility with an older version). By using Amazon EventBridge I’ve created a loosely coupled solution which can be used as a basis for a feedback loop between library creators and consumers.

What next?

To improve the solution, I suggest to look into possibilities of error handling for the Fargate task. What happens if the git operation fails? How do we signal such a failure? You might want to replace the AWS Fargate portion with a Lambda-only solution and use AWS Step Functions for better error handling.

As a next step, I could think of a solution that automates updates for libraries stored in Maven Central. Wouldn’t it be nice to never miss the release of a new Spring Boot version? A Fargate task run on a schedule and the following code should get you going:

curl -sS 'https://search.maven.org/solrsearch/select?q=g:org.springframework.boot%20a:spring-boot-starter&start=0&rows=1&wt=json' | jq -r '.response.docs[ 0 ].latestVersion'

Happy Building!

Author bio

Picture of the author: Joerg Woehrle Joerg is a Solutions Architect at AWS and works with manufacturing customers in Germany. As a former Developer, DevOps Engineer and SRE he enjoys building and automating things.

 

Developing enterprise application patterns with the AWS CDK

Post Syndicated from Krishnakumar Rengarajan original https://aws.amazon.com/blogs/devops/developing-application-patterns-cdk/

Enterprises often need to standardize their infrastructure as code (IaC) for governance, compliance, and quality control reasons. You also need to manage and centrally publish updates to your IaC libraries. In this post, we demonstrate how to use the AWS Cloud Development Kit (AWS CDK) to define patterns for IaC and publish them for consumption in controlled releases using AWS CodeArtifact.

AWS CDK is an open-source software development framework to model and provision cloud application resources in programming languages such as TypeScript, JavaScript, Python, Java, and C#/.Net. The basic building blocks of AWS CDK are called constructs, which map to one or more AWS resources, and can be composed of other constructs. Constructs allow high-level abstractions to be defined as patterns. You can synthesize constructs into AWS CloudFormation templates and deploy them into an AWS account.

AWS CodeArtifact is a fully managed service for managing the lifecycle of software artifacts. You can use CodeArtifact to securely store, publish, and share software artifacts. Software artifacts are stored in repositories, which are aggregated into a domain. A CodeArtifact domain allows organizational policies to be applied across multiple repositories. You can use CodeArtifact with common build tools and package managers such as NuGet, Maven, Gradle, npm, yarn, pip, and twine.

Solution overview

In this solution, we complete the following steps:

  1. Create two AWS CDK pattern constructs in Typescript: one for traditional three-tier web applications and a second for serverless web applications.
  2. Publish the pattern constructs to CodeArtifact as npm packages. npm is the package manager for Node.js.
  3. Consume the pattern construct npm packages from CodeArtifact and use them to provision the AWS infrastructure.

We provide more information about the pattern constructs in the following sections. The source code mentioned in this blog is available in GitHub.

Note: The code provided in this blog post is for demonstration purposes only. You must ensure that it meets your security and production readiness requirements.

Traditional three-tier web application construct

The first pattern construct is for a traditional three-tier web application running on Amazon Elastic Compute Cloud (Amazon EC2), with AWS resources consisting of Application Load Balancer, an Autoscaling group and EC2 launch configuration, an Amazon Relational Database Service (Amazon RDS) or Amazon Aurora database, and AWS Secrets Manager. The following diagram illustrates this architecture.

 

Traditional stack architecture

Serverless web application construct

The second pattern construct is for a serverless application with AWS resources in AWS Lambda, Amazon API Gateway, and Amazon DynamoDB.

Serverless application architecture

Publishing and consuming pattern constructs

Both constructs are written in Typescript and published to CodeArtifact as npm packages. A semantic versioning scheme is used to version the construct packages. After a package gets published to CodeArtifact, teams can consume them for deploying AWS resources. The following diagram illustrates this architecture.

Pattern constructs

Prerequisites

Before getting started, complete the following steps:

  1. Clone the code from the GitHub repository for the traditional and serverless web application constructs:
    git clone https://github.com/aws-samples/aws-cdk-developing-application-patterns-blog.git
    cd aws-cdk-developing-application-patterns-blog
  2. Configure AWS Identity and Access Management (IAM) permissions by attaching IAM policies to the user, group, or role implementing this solution. The following policy files are in the iam folder in the root of the cloned repo:
    • BlogPublishArtifacts.json – The IAM policy to configure CodeArtifact and publish packages to it.
    • BlogConsumeTraditional.json – The IAM policy to consume the traditional three-tier web application construct from CodeArtifact and deploy it to an AWS account.
    • PublishArtifacts.json – The IAM policy to consume the serverless construct from CodeArtifact and deploy it to an AWS account.

Configuring CodeArtifact

In this step, we configure CodeArtifact for publishing the pattern constructs as npm packages. The following AWS resources are created:

  • A CodeArtifact domain named blog-domain
  • Two CodeArtifact repositories:
    • blog-npm-store – For configuring the upstream NPM repository.
    • blog-repository – For publishing custom packages.

Deploy the CodeArtifact resources with the following code:

cd prerequisites/
rm -rf package-lock.json node_modules
npm install
cdk deploy --require-approval never
cd ..

Log in to the blog-repository. This step is needed for publishing and consuming the npm packages. See the following code:

aws codeartifact login \
     --tool npm \
     --domain blog-domain \
     --domain-owner $(aws sts get-caller-identity --output text --query 'Account') \
     --repository blog-repository

Publishing the pattern constructs

  1. Change the directory to the serverless construct:
    cd serverless
  2. Install the required npm packages:
    rm package-lock.json && rm -rf node_modules
    npm install
    
  3. Build the npm project:
    npm run build
  4. Publish the construct npm package to the CodeArtifact repository:
    npm publish

    Follow the previously mentioned steps for building and publishing a traditional (classic Load Balancer plus Amazon EC2) web app by running these commands in the traditional directory.

    If the publishing is successful, you see messages like the following screenshots. The following screenshot shows the traditional infrastructure.

    Successful publishing of Traditional construct package to CodeArtifact

    The following screenshot shows the message for the serverless infrastructure.

    Successful publishing of Serverless construct package to CodeArtifact

    We just published version 1.0.1 of both the traditional and serverless web app constructs. To release a new version, we can simply update the version attribute in the package.json file in the traditional or serverless folder and repeat the last two steps.

    The following code snippet is for the traditional construct:

    {
        "name": "traditional-infrastructure",
        "main": "lib/index.js",
        "files": [
            "lib/*.js",
            "src"
        ],
        "types": "lib/index.d.ts",
        "version": "1.0.1",
    ...
    }

    The following code snippet is for the serverless construct:

    {
        "name": "serverless-infrastructure",
        "main": "lib/index.js",
        "files": [
            "lib/*.js",
            "src"
        ],
        "types": "lib/index.d.ts",
        "version": "1.0.1",
    ...
    }

Consuming the pattern constructs from CodeArtifact

In this step, we demonstrate how the pattern constructs published in the previous steps can be consumed and used to provision AWS infrastructure.

  1. From the root of the GitHub package, change the directory to the examples directory containing code for consuming traditional or serverless constructs.To consume the traditional construct, use the following code:
    cd examples/traditional

    To consume the serverless construct, use the following code:

    cd examples/serverless
  2. Open the package.json file in either directory and note that the packages and versions we consume are listed in the dependencies section, along with their version.
    The following code shows the traditional web app construct dependencies:

    "dependencies": {
        "@aws-cdk/core": "1.30.0",
        "traditional-infrastructure": "1.0.1",
        "aws-cdk": "1.47.0"
    }

    The following code shows the serverless web app construct dependencies:

    "dependencies": {
        "@aws-cdk/core": "1.30.0",
        "serverless-infrastructure": "1.0.1",
        "aws-cdk": "1.47.0"
    }
  3. Install the pattern artifact npm package along with the dependencies:
    rm package-lock.json && rm -rf node_modules
    npm install
    
  4. As an optional step, if you need to override the default Lambda function code, build the npm project. The following commands build the Lambda function source code:
    cd ../override-serverless
    npm run build
    cd -
  5. Bootstrap the project with the following code:
    cdk bootstrap

    This step is applicable for serverless applications only. It creates the Amazon Simple Storage Service (Amazon S3) staging bucket where the Lambda function code and artifacts are stored.

  6. Deploy the construct:
    cdk deploy --require-approval never

    If the deployment is successful, you see messages similar to the following screenshots. The following screenshot shows the traditional stack output, with the URL of the Load Balancer endpoint.

    Traditional CloudFormation stack outputs

    The following screenshot shows the serverless stack output, with the URL of the API Gateway endpoint.

    Serverless CloudFormation stack outputs

    You can test the endpoint for both constructs using a web browser or the following curl command:

    curl <endpoint output>

    The traditional web app endpoint returns a response similar to the following:

    [{"app": "traditional", "id": 1605186496, "purpose": "blog"}]

    The serverless stack returns two outputs. Use the output named ServerlessStack-v1.Api. See the following code:

    [{"purpose":"blog","app":"serverless","itemId":"1605190688947"}]

  7. Optionally, upgrade to a new version of pattern construct.
    Let’s assume that a new version of the serverless construct, version 1.0.2, has been published, and we want to upgrade our AWS infrastructure to this version. To do this, edit the package.json file and change the traditional-infrastructure or serverless-infrastructure package version in the dependencies section to 1.0.2. See the following code example:

    "dependencies": {
        "@aws-cdk/core": "1.30.0",
        "serverless-infrastructure": "1.0.2",
        "aws-cdk": "1.47.0"
    }

    To update the serverless-infrastructure package to 1.0.2, run the following command:

    npm update

    Then redeploy the CloudFormation stack:

    cdk deploy --require-approval never

Cleaning up

To avoid incurring future charges, clean up the resources you created.

  1. Delete all AWS resources that were created using the pattern constructs. We can use the AWS CDK toolkit to clean up all the resources:
    cdk destroy --force

    For more information about the AWS CDK toolkit, see Toolkit reference. Alternatively, delete the stack on the AWS CloudFormation console.

  2. Delete the CodeArtifact resources by deleting the CloudFormation stack that was deployed via AWS CDK:
    cd prerequisites
    cdk destroy –force
    

Conclusion

In this post, we demonstrated how to publish AWS CDK pattern constructs to CodeArtifact as npm packages. We also showed how teams can consume the published pattern constructs and use them to provision their AWS infrastructure.

This mechanism allows your infrastructure for AWS services to be provisioned from the configuration that has been vetted for quality control and security and governance checks. It also provides control over when new versions of the pattern constructs are released, and when the teams consuming the constructs can upgrade to the newly released versions.

About the Authors

Usman Umar

 

Usman Umar is a Sr. Applications Architect at AWS Professional Services. He is passionate about developing innovative ways to solve hard technical problems for the customers. In his free time, he likes going on biking trails, doing car modifications, and spending time with his family.

 

 

Krishnakumar Rengarajan

 

Krishnakumar Rengarajan is a DevOps Consultant with AWS Professional Services. He enjoys working with customers and focuses on building and delivering automated solutions that enables customers on their AWS cloud journeys.

Continuously building and delivering Maven artifacts to AWS CodeArtifact

Post Syndicated from Vinay Selvaraj original https://aws.amazon.com/blogs/devops/continuously-building-and-delivering-maven-artifacts-to-aws-codeartifact/

Artifact repositories are often used to share software packages for use in builds and deployments. Java developers using Apache Maven use artifact repositories to share and reuse Maven packages. For example, one team might own a web service framework that is used by multiple other teams to build their own services. The framework team can publish the framework as a Maven package to an artifact repository, where new versions can be picked up by the service teams as they become available. This post explains how you can set up a continuous integration pipeline with AWS CodePipeline and AWS CodeBuild to deploy Maven artifacts to AWS CodeArtifact. CodeArtifact is a fully managed pay-as-you-go artifact repository service with support for software package managers and build tools like Maven, Gradle, npm, yarn, twine, and pip.

Solution overview

The pipeline we build is triggered each time a code change is pushed to the AWS CodeCommit repository. The code is compiled using the Java compiler, unit tested, and deployed to CodeArtifact. After the artifact is published, it can be consumed by developers working in applications that have a dependency on the artifact or by builds running in other pipelines. The following diagram illustrates this architecture.

Architecture diagram of the solution

 

All the components in this pipeline are fully managed and you don’t pay for idle capacity or have to manage any servers.

 

Prerequisites

This post assumes you have the following tools installed and configured:

 

Creating your resources

To create the CodeArtifact domain, CodeArtifact repository, CodeCommit, CodePipeline, CodeBuild, and associated resources, we use AWS CloudFormation. Save the provided CloudFormation template below as codeartifact-cicd-pipeline.yaml and create a stack:


---
Description: Code Artifact CI/CD Pipeline

Parameters:
  GitRepoBranchName:
    Type: String
    Default: main

Resources:

  ArtifactBucket:
    Type: AWS::S3::Bucket
  
  CodeArtifactDomain:
    Type: AWS::CodeArtifact::Domain
    Properties:
      DomainName: !Sub "${AWS::StackName}-domain"
  
  CodeArtifactRepository:
    Type: AWS::CodeArtifact::Repository
    Properties:
      DomainName: !GetAtt CodeArtifactDomain.Name
      RepositoryName: !Sub "${AWS::StackName}-repo"

  CodeRepository:
    Type: AWS::CodeCommit::Repository
    Properties:
      RepositoryDescription: Maven artifact code repository
      RepositoryName: !Sub "${AWS::StackName}-maven-artifact-repo"
  
  CodeBuildProject:
    Type: AWS::CodeBuild::Project
    Properties:
      Name: !Sub "${AWS::StackName}-CodeBuild"
      Artifacts:
        Type: CODEPIPELINE
      Environment:
        EnvironmentVariables:
          - Name: CODEARTIFACT_DOMAIN
            Type: PLAINTEXT
            Value: !GetAtt CodeArtifactDomain.Name
          - Name: CODEARTIFACT_REPO
            Type: PLAINTEXT
            Value: !GetAtt CodeArtifactRepository.Name
        Type: LINUX_CONTAINER
        ComputeType: BUILD_GENERAL1_SMALL
        Image: aws/codebuild/amazonlinux2-x86_64-standard:3.0
      ServiceRole: !GetAtt CodeBuildServiceRole.Arn
      Source:
        Type: CODEPIPELINE
        BuildSpec: buildspec.yaml
  
  Pipeline:
    Type: AWS::CodePipeline::Pipeline
    Properties:
      ArtifactStore:
        Type: S3
        Location: !Ref ArtifactBucket
      RoleArn: !GetAtt CodePipelineServiceRole.Arn
      Stages:
      - Name: Source
        Actions:
        - Name: SourceAction
          ActionTypeId:
            Category: Source
            Owner: AWS
            Version: '1'
            Provider: CodeCommit
          OutputArtifacts:
          - Name: SourceBundle
          Configuration:
            BranchName: !Ref GitRepoBranchName
            RepositoryName: !GetAtt CodeRepository.Name
          RunOrder: '1'

      - Name: Deliver
        Actions:
        - Name: CodeBuild
          InputArtifacts:
          - Name: SourceBundle
          ActionTypeId:
            Category: Build
            Owner: AWS
            Version: '1'
            Provider: CodeBuild
          Configuration:
            ProjectName: !Ref CodeBuildProject
          RunOrder: '1'

  CodeBuildServiceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Sid: ''
          Effect: Allow
          Principal:
            Service:
            - codebuild.amazonaws.com
          Action: sts:AssumeRole
      Policies:
      - PolicyName: CodePipelinePolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Sid: CloudWatchLogsPolicy
            Effect: Allow
            Action:
            - logs:CreateLogGroup
            - logs:CreateLogStream
            - logs:PutLogEvents
            Resource:
            - "*"
          - Sid: CodeCommitPolicy
            Effect: Allow
            Action:
            - codecommit:GitPull
            Resource:
            - !GetAtt CodeRepository.Arn
          - Sid: S3GetObjectPolicy
            Effect: Allow
            Action:
            - s3:GetObject
            - s3:GetObjectVersion
            Resource:
            - !Sub "arn:aws:s3:::${ArtifactBucket}/*"
          - Sid: S3PutObjectPolicy
            Effect: Allow
            Action:
            - s3:PutObject
            Resource:
            - !Sub "arn:aws:s3:::${ArtifactBucket}/*"
          - Sid: BearerTokenPolicy
            Effect: Allow
            Action:
            - sts:GetServiceBearerToken
            Resource: "*"
            Condition:
              StringEquals:
                sts:AWSServiceName: codeartifact.amazonaws.com
          - Sid: CodeArtifactPolicy
            Effect: Allow
            Action:
            - codeartifact:GetAuthorizationToken
            Resource:
            - !Sub "arn:aws:codeartifact:${AWS::Region}:${AWS::AccountId}:domain/${CodeArtifactDomain.Name}"
          - Sid: CodeArtifactPackage
            Effect: Allow
            Action:
            - codeartifact:PublishPackageVersion
            - codeartifact:PutPackageMetadata
            - codeartifact:ReadFromRepository
            Resource:
            - !Sub "arn:aws:codeartifact:${AWS::Region}:${AWS::AccountId}:package/${CodeArtifactDomain.Name}/${CodeArtifactRepository.Name}/*"
          - Sid: CodeArtifactRepository
            Effect: Allow
            Action:
            - codeartifact:ReadFromRepository
            - codeartifact:GetRepositoryEndpoint
            Resource:
            - !Sub "arn:aws:codeartifact:${AWS::Region}:${AWS::AccountId}:repository/${CodeArtifactDomain.Name}/${CodeArtifactRepository.Name}"          

  CodePipelineServiceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Sid: ''
          Effect: Allow
          Principal:
            Service:
            - codepipeline.amazonaws.com
          Action: sts:AssumeRole
      Policies:
      - PolicyName: CodePipelinePolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Action:
            - s3:GetObject
            - s3:GetObjectVersion
            - s3:GetBucketVersioning
            Resource: !Sub "arn:aws:s3:::${ArtifactBucket}/*"
            Effect: Allow
          - Action:
            - s3:PutObject
            Resource:
            - !Sub "arn:aws:s3:::${ArtifactBucket}/*"
            Effect: Allow
          - Action:
            - codecommit:GetBranch
            - codecommit:GetCommit
            - codecommit:UploadArchive
            - codecommit:GetUploadArchiveStatus
            - codecommit:CancelUploadArchive
            Resource:
              - !GetAtt CodeRepository.Arn
            Effect: Allow
          - Action:
            - codebuild:StartBuild
            - codebuild:BatchGetBuilds
            Resource: 
              - !GetAtt CodeBuildProject.Arn
            Effect: Allow
          - Action:
            - iam:PassRole
            Resource: "*"
            Effect: Allow
Outputs:
  CodePipelineArtifactBucket:
    Value: !Ref ArtifactBucket
  CodeRepositoryHttpCloneUrl:
    Value: !GetAtt CodeRepository.CloneUrlHttp
  CodeRepositorySshCloneUrl:
    Value: !GetAtt CodeRepository.CloneUrlSsh

aws cloudformation deploy                         \
  --stack-name codeartifact-pipeline               \
  --template-file codeartifact-cicd-pipeline.yaml  \
  --capabilities CAPABILITY_IAM

 

If you have a Maven project you want to use, you can use that. Otherwise, create a new one:


mvn archetype:generate        \
  -DgroupId=com.mycompany.app \
  -DartifactId=my-app         \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DarchetypeVersion=1.4 -DinteractiveMode=false

 

Initialize a Git repository for the Maven project and add the CodeCommit repository that was created in the CloudFormation stack as a remote repository:


cd my-app
git init
CODECOMMIT_URL=$(aws cloudformation describe-stacks --stack-name codeartifact-pipeline --query "Stacks[0].Outputs[?OutputKey=='CodeRepositoryHttpCloneUrl'].OutputValue" --output text)
git remote add origin $CODECOMMIT_URL

 

Updating the POM file

The Maven project’s POM file needs to be updated with the distribution management section. This lets Maven know where to publish artifacts. Add the distributionManagement section inside the project element of the POM. Be sure to update the URL with the correct URL for the CodeArtifact repository you created earlier. You can find the CodeArtifact repository URL with the get-repository-endpoint CLI command:


aws codeartifact get-repository-endpoint --domain codeartifact-pipeline-domain  --repository codeartifact-pipeline-repo --format maven

 

Add the following to the Maven project’s pom.xml:


<distributionManagement>
  <repository>
    <id>codeartifact</id>
    <name>codeartifact</name>
    <url>Replace with the URL from the get-repository-endpoint command</url>
  </repository>
</distributionManagement>

Creating a settings.xml file

Maven needs credentials to use to authenticate with CodeArtifact when it performs the deployment. CodeArtifact uses temporary authorization tokens. To pass the token to Maven, a settings.xml file is created in the top level of the Maven project. During the deployment stage, Maven is instructed to use the settings.xml in the top level of the project instead of the settings.xml that normally resides in $HOME/.m2. Create a settings.xml in the top level of the Maven project with the following contents:


<settings>
  <servers>
    <server>
      <id>codeartifact</id>
      <username>aws</username>
      <password>${env.CODEARTIFACT_TOKEN}</password>
    </server>
  </servers>
</settings>

Creating the buildspec.yaml file

CodeBuild uses a build specification file with commands and related settings that are used during the build, test, and delivery of the artifact. In the build specification file, we specify the CodeBuild runtime to use pre-build actions (update AWS CLI), and build actions (Maven build, test, and deploy). When Maven is invoked, it is provided the path to the settings.xml created in the previous step, instead of the default in $HOME/.m2/settings.xml. Create the buildspec.yaml as shown in the following code:


version: 0.2

phases:
  install:
    runtime-versions:
      java: corretto11

  pre_build:
    commands:
      - pip3 install awscli --upgrade --user

  build:
    commands:
      - export CODEARTIFACT_TOKEN=`aws codeartifact get-authorization-token --domain ${CODEARTIFACT_DOMAIN} --query authorizationToken --output text`
      - mvn -s settings.xml clean package deploy

 

Running the pipeline

The final step is to add the files in the Maven project to the Git repository and push the changes to CodeCommit. This triggers the pipeline to run. See the following code:


git checkout -b main
git add settings.xml buildspec.yaml pom.xml src
git commit -a -m "Initial commit"
git push --set-upstream origin main

 

Checking the pipeline

At this point, the pipeline starts to run. To check its progress, sign in to the AWS Management Console and choose the Region where you created the pipeline. On the CodePipeline console, open the pipeline that the CloudFormation stack created. The pipeline’s name is prefixed with the stack name. If you open the CodePipeline console before the pipeline is complete, you can watch each stage run (see the following screenshot).

CodePipeline Screenshot

If you see that the pipeline failed, you can choose the details in the action that failed for more information.

Checking for new artifacts published in CodeArtifact

When the pipeline is complete, you should be able to see the artifact in the CodeArtifact repository you created earlier. The artifact we published for this post is a Maven snapshot. CodeArtifact handles snapshots differently than release versions. For more information, see Use Maven snapshots. To find the artifact in CodeArtifact, complete the following steps:

  1. On the CodeArtifact console, choose Repositories.
  2. Choose the repository we created earlier named myrepo.
  3. Search for the package named my-app.
  4. Choose the my-app package from the search results.
    CodeArtifact Assets
  5. Choose the Dependencies tab to bring up a list of Maven dependencies that the Maven project depends on.CodeArtifact Dependencies

 

Cleaning up

To clean up the resources you created in this post, you need to remove them in the following order:


# Empty the CodePipeline S3 artifact bucket
CODEPIPELINE_BUCKET=$(aws cloudformation describe-stacks --stack-name codeartifact-pipeline --query "Stacks[0].Outputs[?OutputKey=='CodePipelineArtifactBucket'].OutputValue" --output text)
aws s3 rm s3://$CODEPIPELINE_BUCKET --recursive

# Delete the CloudFormation stack
aws cloudformation delete-stack --stack-name codeartifact-pipeline

Conclusion

This post covered how to build a continuous integration pipeline to deliver Maven artifacts to AWS CodeArtifact. You can modify this solution for your specific needs. For more information about CodeArtifact or the other services used, see the following:

 

Using NuGet with AWS CodeArtifact

Post Syndicated from John Standish original https://aws.amazon.com/blogs/devops/using-nuget-with-aws-codeartifact/

Managing NuGet packages for .NET development can be a challenge. Tasks such as initial configuration, ongoing maintenance, and scaling inefficiencies are the biggest pain points for developers and organizations. With its addition of NuGet package support, AWS CodeArtifact now provides easy-to-configure and scalable package management for .NET developers. You can use NuGet packages stored in CodeArtifact in Visual Studio, allowing you to use the tools you already know.

In this post, we show how you can provision NuGet repositories in 5 minutes. Then we demonstrate how to consume packages from your new NuGet repositories, all while using .NET native tooling.

All relevant code for this post is available in the aws-codeartifact-samples GitHub repo.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Architecture overview

Two core resource types make up CodeArtifact: domains and repositories. Domains provide an easy way manage multiple repositories within an organization. Repositories store packages and their assets. You can connect repositories to other CodeArtifact repositories, or popular public package repositories such as nuget.org, using upstream and external connections. For more information about these concepts, see AWS CodeArtifact Concepts.

The following diagram illustrates this architecture.

AWS CodeArtifact core concepts

Figure: AWS CodeArtifact core concepts

Creating CodeArtifact resources with AWS CloudFormation

The AWS CloudFormation template provided in this post provisions three CodeArtifact resources: a domain, a team repository, and a shared repository. The team repository is configured to use the shared repository as an upstream repository, and the shared repository has an external connection to nuget.org.

The following diagram illustrates this architecture.

Example AWS CodeArtifact architecture

Figure: Example AWS CodeArtifact architecture

The following CloudFormation template used in this walkthrough:

AWSTemplateFormatVersion: '2010-09-09'
Description: AWS CodeArtifact resources for dotnet

Resources:
  # Create Domain
  ExampleDomain:
    Type: AWS::CodeArtifact::Domain
    Properties:
      DomainName: example-domain
      PermissionsPolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              AWS: 
              - !Sub arn:aws:iam::${AWS::AccountId}:root
            Resource: "*"
            Action:
              - codeartifact:CreateRepository
              - codeartifact:DescribeDomain
              - codeartifact:GetAuthorizationToken
              - codeartifact:GetDomainPermissionsPolicy
              - codeartifact:ListRepositoriesInDomain

  # Create External Repository
  MyExternalRepository:
    Type: AWS::CodeArtifact::Repository
    Condition: ProvisionNugetTeamAndUpstream
    Properties:
      DomainName: !GetAtt ExampleDomain.Name
      RepositoryName: my-external-repository       
      ExternalConnections:
        - public:nuget-org
      PermissionsPolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              AWS: 
              - !Sub arn:aws:iam::${AWS::AccountId}:root
            Resource: "*"
            Action:
              - codeartifact:DescribePackageVersion
              - codeartifact:DescribeRepository
              - codeartifact:GetPackageVersionReadme
              - codeartifact:GetRepositoryEndpoint
              - codeartifact:ListPackageVersionAssets
              - codeartifact:ListPackageVersionDependencies
              - codeartifact:ListPackageVersions
              - codeartifact:ListPackages
              - codeartifact:PublishPackageVersion
              - codeartifact:PutPackageMetadata
              - codeartifact:ReadFromRepository

  # Create Repository
  MyTeamRepository:
    Type: AWS::CodeArtifact::Repository
    Properties:
      DomainName: !GetAtt ExampleDomain.Name
      RepositoryName: my-team-repository
      Upstreams:
        - !GetAtt MyExternalRepository.Name
      PermissionsPolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              AWS: 
              - !Sub arn:aws:iam::${AWS::AccountId}:root
            Resource: "*"
            Action:
              - codeartifact:DescribePackageVersion
              - codeartifact:DescribeRepository
              - codeartifact:GetPackageVersionReadme
              - codeartifact:GetRepositoryEndpoint
              - codeartifact:ListPackageVersionAssets
              - codeartifact:ListPackageVersionDependencies
              - codeartifact:ListPackageVersions
              - codeartifact:ListPackages
              - codeartifact:PublishPackageVersion
              - codeartifact:PutPackageMetadata
              - codeartifact:ReadFromRepository

Getting the CloudFormation template

To use the CloudFormation stack, we recommend you clone the following GitHub repo so you also have access to the example projects. See the following code:

git clone https://github.com/aws-samples/aws-codeartifact-samples.git
cd aws-codeartifact-samples/getting-started/dotnet/cloudformation/

Alternatively, you can copy the previous template into a file on your local filesystem named deploy.yml.

Provisioning the CloudFormation stack

Now that you have a local copy of the template, you need to provision the resources using a CloudFormation stack. You can deploy the stack using the AWS CLI or on the AWS CloudFormation console.

To use the AWS CLI, enter the following code:

aws cloudformation deploy \
--template-file deploy.yml \
--region <YOUR_PREFERRED_REGION> \
--stack-name CodeArtifact-GettingStarted-DotNet

To use the AWS CloudFormation console, complete the following steps:

  1. On the AWS CloudFormation console, choose Create stack.
  2. Choose With new resources (standard).
  3. Select Upload a template file.
  4. Choose Choose file.
  5. Name the stack CodeArtifact-GettingStarted-DotNet.
  6. Continue to choose Next until prompted to create the stack.

Configuring your local development experience

We use the CodeArtifact credential provider to connect the Visual Studio IDE to a CodeArtifact repository. You need to download and install the AWS Toolkit for Visual Studio to configure the credential provider. The toolkit is an extension for Microsoft Visual Studio on Microsoft Windows that makes it easy to develop, debug, and deploy .NET applications to AWS. The credential provider automates fetching and refreshing the authentication token required to pull packages from CodeArtifact. For more information about the authentication process, see AWS CodeArtifact authentication and tokens.

To connect to a repository, you complete the following steps:

  1. Configure an account profile in the AWS Toolkit.
  2. Copy the source endpoint from the AWS Explorer.
  3. Set the NuGet package source as the source endpoint.
  4. Add packages for your project via your CodeArtifact repository.

Configuring an account profile in the AWS Toolkit

Before you can use the Toolkit for Visual Studio, you must provide a set of valid AWS credentials. In this step, we set up a profile that has access to interact with CodeArtifact. For instructions, see Providing AWS Credentials.

Visual Studio Toolkit for AWS Account Profile Setup

Figure: Visual Studio Toolkit for AWS Account Profile Setup

Copying the NuGet source endpoint

After you set up your profile, you can see your provisioned repositories.

  1. In the AWS Explorer pane, navigate to the repository you want to connect to.
  2. Choose your repository (right-click).
  3. Choose Copy NuGet Source Endpoint.
AWS CodeArtifact repositories shown in the AWS Explorer

Figure: AWS CodeArtifact repositories shown in the AWS Explorer

 

You use the source endpoint later to configure your NuGet package sources.

Setting the package source using the source endpoint

Now that you have your source endpoint, you can set up the NuGet package source.

  1. In Visual Studio, under Tools, choose Options.
  2. Choose NuGet Package Manager.
  3. Under Options, choose the + icon to add a package source.
  4. For Name , enter codeartifact.
  5. For Source, enter the source endpoint you copied from the previous step.
Configuring Nuget package sources for AWS CodeArtifact

Figure: Configuring NuGet package sources for AWS CodeArtifact

 

Adding packages via your CodeArtifact repository

After the package source is configured against your team repository, you can pull packages via the upstream connection to the shared repository.

  1. Choose Manage NuGet Packages for your project.
    • You can now see packages from nuget.org.
  2. Choose any package to add it to your project.
Exploring packages while connected to a AWS CodeArtifact repository

Exploring packages while connected to a AWS CodeArtifact repository

Viewing packages stored in your CodeArtifact team repository

Packages are stored in a repository you pull from, or referenced via the upstream connection. Because we’re pulling packages from nuget.org through an external connection, you can see cached copies of those packages in your repository. To view the packages, navigate to your repository on the CodeArtifact console.

Packages stored in a AWS CodeArtifact repository

Packages stored in a AWS CodeArtifact repository

Cleaning Up

When you’re finished with this walkthrough, you may want to remove any provisioned resources. To remove the resources that the CloudFormation template created, navigate to the stack on the AWS CloudFormation console and choose Delete Stack. It may take a few minutes to delete all provisioned resources.

After the resources are deleted, there are no more cleanup steps.

Conclusion

We have shown you how to set up CodeArtifact in minutes and easily integrate it with NuGet. You can build and push your package faster, from hours or days to minutes. You can also integrate CodeArtifact directly in your Visual Studio environment with four simple steps. With CodeArtifact repositories, you inherit the durability and security posture from the underlying storage of CodeArtifact for your packages.

As of November 2020, CodeArtifact is available in the following AWS Regions:

  • US: US East (Ohio), US East (N. Virginia), US West (Oregon)
  • AP: Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo)
  • EU: Europe (Frankfurt), Europe (Ireland), Europe (Stockholm)

For an up-to-date list of Regions where CodeArtifact is available, see AWS CodeArtifact FAQ.

About the Authors

John Standish

John Standish is a Solutions Architect at AWS and spent over 13 years as a Microsoft .Net developer. Outside of work, he enjoys playing video games, cooking, and watching hockey.

Nuatu Tseggai

Nuatu Tseggai is a Cloud Infrastructure Architect at Amazon Web Services. He enjoys working with customers to design and build event-driven distributed systems that span multiple services.

Neha Gupta

Neha Gupta is a Solutions Architect at AWS and have 16 years of experience as a Database architect/ DBA. Apart from work, she’s outdoorsy and loves to dance.

Elijah Batkoski

Elijah is a Technical Writer for Amazon Web Services. Elijah has produced technical documentation and blogs for a variety of tools and services, primarily focused around DevOps.

Publishing private npm packages with AWS CodeArtifact

Post Syndicated from Ryan Sonshine original https://aws.amazon.com/blogs/devops/publishing-private-npm-packages-aws-codeartifact/

This post demonstrates how to create, publish, and download private npm packages using AWS CodeArtifact, allowing you to share code across your organization without exposing your packages to the public.

The ability to control CodeArtifact repository access using AWS Identity and Access Management (IAM) removes the need to manage additional credentials for a private npm repository when developers already have IAM roles configured.

You can use private npm packages for a variety of use cases, such as:

  • Reducing code duplication
  • Configuration such as code linting and styling
  • CLI tools for internal processes

This post shows how to easily create a sample project in which we publish an npm package and install the package from CodeArtifact. For more information about pipeline integration, see AWS CodeArtifact and your package management flow – Best Practices for Integration.

Solution overview

The following diagram illustrates this solution.

Diagram showing npm package publish and install with CodeArtifact

In this post, you create a private scoped npm package containing a sample function that can be used across your organization. You create a second project to download the npm package. You also learn how to structure your npm package to make logging in to CodeArtifact automatic when you want to build or publish the package.

The code covered in this post is available on GitHub:

Prerequisites

Before you begin, you need to complete the following:

  1. Create an AWS account.
  2. Install the AWS Command Line Interface (AWS CLI). CodeArtifact is supported in these CLI versions:
    1. 18.83 or later: install the AWS CLI version 1
    2. 0.54 or later: install the AWS CLI version 2
  3. Create a CodeArtifact repository.
  4. Add required IAM permissions for CodeArtifact.

Creating your npm package

You can create your npm package in three easy steps: set up the project, create your npm script for authenticating with CodeArtifact, and publish the package.

Setting up your project

Create a directory for your new npm package. We name this directory my-package because it serves as the name of the package. We use an npm scope for this package, where @myorg represents the scope all of our organization’s packages are published under. This helps us distinguish our internal private package from external packages. See the following code:

npm init --scope=@myorg -y

{
  "name": "@myorg/my-package",
  "version": "1.0.0",
  "description": "A sample private scoped npm package",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  }
}

The package.json file specifies that the main file of the package is called index.js. Next, we create that file and add our package function to it:

module.exports.helloWorld = function() {
  console.log('Hello world!');
}

Creating an npm script

To create your npm script, complete the following steps:

  1. On the CodeArtifact console, choose the repository you created as part of the prerequisites.

If you haven’t created a repository, create one before proceeding.

CodeArtifact repository details console

  1. Select your CodeArtifact repository and choose Details to view the additional details for your repository.

We use two items from this page:

  • Repository name (my-repo)
  • Domain (my-domain)
  1. Create a script named co:login in our package.json. The package.json contains the following code:
{
  "name": "@myorg/my-package",
  "version": "1.0.0",
  "description": "A sample private scoped npm package",
  "main": "index.js",
  "scripts": {
    "co:login": "aws codeartifact login --tool npm --repository my-repo --domain my-domain",
    "test": "echo \"Error: no test specified\" && exit 1"
  }
}

Running this script updates your npm configuration to use your CodeArtifact repository and sets your authentication token, which expires after 12 hours.

  1. To test our new script, enter the following command:

npm run co:login

The following code is the output:

> aws codeartifact login --tool npm --repository my-repo --domain my-domain
Successfully configured npm to use AWS CodeArtifact repository https://my-domain-<ACCOUNT ID>.d.codeartifact.us-east-1.amazonaws.com/npm/my-repo/
Login expires in 12 hours at 2020-09-04 02:16:17-04:00
  1. Add a prepare script to our package.json to run our login command:
{
  "name": "@myorg/my-package",
  "version": "1.0.0",
  "description": "A sample private scoped npm package",
  "main": "index.js",
  "scripts": {
    "prepare": "npm run co:login",
    "co:login": "aws codeartifact login --tool npm --repository my-repo --domain my-domain",
    "test": "echo \"Error: no test specified\" && exit 1"
  }
}

This configures our project to automatically authenticate and generate an access token anytime npm install or npm publish run on the project.

If you see an error containing Invalid choice, valid choices are:, you need to update the AWS CLI according to the versions listed in the perquisites of this post.

Publishing your package

To publish our new package for the first time, run npm publish.

The following screenshot shows the output.

Terminal showing npm publish output

If we navigate to our CodeArtifact repository on the CodeArtifact console, we now see our new private npm package ready to be downloaded.

CodeArtifact console showing published npm package

Installing your private npm package

To install your private npm package, you first set up the project and add the CodeArtifact configs. After you install your package, it’s ready to use.

Setting up your project

Create a directory for a new application and name it my-app. This is a sample project to download our private npm package published in the previous step. You can apply this pattern to all repositories you intend on installing your organization’s npm packages in.

npm init -y

{
  "name": "my-app",
  "version": "1.0.0",
  "description": "A sample application consuming a private scoped npm package",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  }
}

Adding CodeArtifact configs

Copy the npm scripts prepare and co:login created earlier to your new project:

{
  "name": "my-app",
  "version": "1.0.0",
  "description": "A sample application consuming a private scoped npm package",
  "main": "index.js",
  "scripts": {
    "prepare": "npm run co:login",
    "co:login": "aws codeartifact login --tool npm --repository my-repo --domain my-domain",
    "test": "echo \"Error: no test specified\" && exit 1"
  }
}

Installing your new private npm package

Enter the following command:

npm install @myorg/my-package

Your package.json should now list @myorg/my-package in your dependencies:

{
  "name": "my-app",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "prepare": "npm run co:login",
    "co:login": "aws codeartifact login --tool npm --repository my-repo --domain my-domain",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "dependencies": {
    "@myorg/my-package": "^1.0.0"
  }
}

Using your new npm package

In our my-app application, create a file named index.js to run code from our package containing the following:

const { helloWorld } = require('@myorg/my-package');

helloWorld();

Run node index.js in your terminal to see the console print the message from our @myorg/my-package helloWorld function.

Cleaning Up

If you created a CodeArtifact repository for the purposes of this post, use one of the following methods to delete the repository:

Remove the changes made to your user profile’s npm configuration by running npm config delete registry, this will remove the CodeArtifact repository from being set as your default npm registry.

Conclusion

In this post, you successfully published a private scoped npm package stored in CodeArtifact, which you can reuse across multiple teams and projects within your organization. You can use npm scripts to streamline the authentication process and apply this pattern to save time.

About the Author

Ryan Sonshine

Ryan Sonshine is a Cloud Application Architect at Amazon Web Services. He works with customers to drive digital transformations while helping them architect, automate, and re-engineer solutions to fully leverage the AWS Cloud.

 

 

Integrating Jenkins with AWS CodeArtifact to publish and consume Python artifacts

Post Syndicated from Matt Ulinski original https://aws.amazon.com/blogs/devops/using-jenkins-with-codeartifact/

Python packages are used to share and reuse code across projects. Centralized artifact storage allows sharing versioned artifacts across an organization. This post explains how you can set up two Jenkins projects. The first project builds the Python package and publishes it to AWS CodeArtifact using twine (Python utility for publishing packages), and the second project consumes the package using pip and deploys an application to AWS Fargate.

Solution overview

The following diagram illustrates this architecture.

Architecture Diagram

 

The solution consists of two GitHub repositories and two Jenkins projects. The first repository contains the source code of a Python package. Jenkins builds this package and publishes it to a CodeArtifact repository.

The second repository contains the source code of a Python Flask application that has a dependency on the package produced by the first repository. Jenkins builds a Docker image containing the application and its dependencies, pushes the image to an Amazon Elastic Container Registry (Amazon ECR) registry, and deploys it to AWS Fargate using AWS CloudFormation.

Prerequisites

For this walkthrough, you should have the following prerequisites:

To create a new Jenkins server that includes the required dependencies, complete the following steps:

  1. Launch a CloudFormation stack with the following link:
    Launch CloudFormation stack
  2. Choose Next.
  3. Enter the name for your stack.
  4. Select the Amazon Elastic Compute Cloud (Amazon EC2) instance type for your Jenkins server.
  5. Select the subnet and corresponding VPC.
  6. Choose Next.
  7. Scroll down to the bottom of the page and choose Next.
  8. Review the stack configuration and choose Create stack.

AWS CloudFormation creates the following resources:

  • JenkinsInstance – Amazon EC2 instance that Jenkins and its dependencies is installed on
  • JenkinsWaitCondition – CloudFormation wait condition that waits for Jenkins to be fully installed before finishing the deployment
  • JenkinsSecurityGroup – Security group attached to the EC2 instance that allows inbound traffic on port 8080

The stack takes a few minutes to deploy. When it’s fully deployed, you can find the URL and initial password for Jenkins on the Outputs tab of the stack.

CloudFormation outputs tab

Use the initial password to unlock the Jenkins installation, then follow the setup wizard to install the suggested plugins and create a new Jenkins user. After the user is created, the initial password no longer works.

On the Jenkins homepage, complete the following steps:

  1. Choose Manage Jenkins.
  2. Choose Manage Plugins.
  3. On the Available tab, search for “Docker Pipeline” and select it.
    Jenkins plugins available tab
  4. Choose Download now and install after restart.
  5. Select Restart Jenkins when installation is complete and no jobs are running.

Jenkins plugins installation complete

Jenkins is ready to use after it restarts. Log in with the user you created with the setup wizard.

Setting up a CodeArtifact repository

To get started, create a CodeArtifact repository to store the Python packages.

  1. On the CodeArtifact console, choose Create repository.
  2. For Repository name, enter a name (for this post, I use my-repository).
  3. For Public upstream repositories, choose pypi-store.
  4. Choose Next.
    AWS CodeArtifact repository wizard
  5. Choose This AWS account.
  6. If you already have a CodeArtifact domain, choose it from the drop-down menu. If you don’t already have a CodeArtifact domain, choose a name for your domain and the console creates it for you. For this post, I named my domain my-domain.
  7. Choose Next.
  8. Review the repository details and choose Create repository.
    CodeArtifact repository overview

You now have a CodeArtifact repository created, which you use to store and retrieve Python packages used by the application.

Configuring Jenkins: Creating an IAM user

  1. On the IAM console, choose User.
  2. Choose Add user.
  3. Enter a name for the user (for this post, I used the name Jenkins).
  4. Select Programmatic access as the access type.
  5. Choose Next: Permissions.
  6. Select Attach existing policies directly.
  7. Choose the following policies:
    1. AmazonEC2ContainerRegistryPowerUser – Allows Jenkins to push Docker images to ECR.
    2. AmazonECS_FullAccess – Allows Jenkins to deploy your application to AWS Fargate.
    3. AWSCloudFormationFullAccess – Allows Jenkins to update the CloudFormation stack.
    4. AWSCodeArtifactAdminAccessAllows Jenkins access to the CodeArtifact repository.
  8. Choose Next: Tags.
  9. Choose Next: Review.
  10. Review the configuration and choose Create user.
  11. Record the Access key ID and Secret access key; you need them to configure Jenkins.

Configuring Jenkins: Adding credentials

After you create your IAM user, you need to set up the credentials in Jenkins.

  1. Open Jenkins.
  2. From the left pane, choose Manage Jenkins
  3. Choose Manage Credentials.
  4. Hover over the (global) domain and expand the drop-down menu.
  5. Choose Add credentials.
    Jenkins credentials
  6. Enter the following credentials:
    1. Kind – User name with password.
    2. Scope – Global (Jenkins, nodes, items, all child items).
    3. Username – Enter the Access key ID for the Jenkins IAM user.
    4. Password – Enter the Secret access key for the Jenkins IAM user.
    5. ID – Name for the credentials (for this post, I used AWS).
  7. Choose OK.

You use the credentials to make API calls to AWS as part of the builds.

Publishing a Python package

To publish your Python package, complete the following steps:

  1. Create a new GitHub repo to store the source of the sample package.
  2. Clone the sample GitHub repo onto your local machine.
  3. Navigate to the package_src directory.
  4. Place its contents in your GitHub repo.
    Package repository contents

When your GitHub repo is populated with the sample package, you can create the first Jenkins project.

  1. On the Jenkins homepage, choose New Item.
  2. Enter a name for the project; for example, producer.
  3. Choose Freestyle project.
  4. Choose OK.
    Jenkins new project wizard
  5. In the Source Code Management section, choose Git.
  6. Enter the HTTP clone URL of your GitHub repo into the Repository URL
  7. To make sure that the workspace is clean before each build, under Additional Behaviors, choose Add and select Clean before checkout.
    Jenkins source code managnment
  8. To have builds start automatically when a change occurs in the repository, under Build Triggers, select Poll SCM and enter * * * * * in the Schedule
    Jenkins build triggers
  9. In the Build Environment section, select Use secret text(s) or file(s).
  10. Choose Add and choose Username and password (separated).
  11. Enter the following information:
    1. UsernameAWS_ACCESS_KEY_ID
    2. PasswordAWS_SECRET_ACCESS_KEY
    3. Credentials – Select Specific Credentials and from the drop-down menu and choose the previously created credentials.
      Jenkins credential binding
  12. In the Build section, choose Add build step.
  13. Choose Execute shell.
  14. Enter the following command and replace my-domain, my-repository, and my-region with the name of your CodeArtifact domain, repository, and Region:
    python3 setup.py sdist bdist_wheel
    aws codeartifact login --tool twine --domain my-domain --repository my-repository --region my-region
    python3 -m twine upload dist/* --repository codeartifact

    These commands do the following:

    • Build the Python package
    • Run the aws codeartifact login AWS Command Line Interface (AWS CLI) command, which retrieves the access token for CodeArtifact and configures the twine client
    • Use twine to publish the Python package to CodeArtifact
  15. Choose Save.
  16. Start a new build by choosing Build Now in the left pane.After a build starts, it shows in the Build History on the left pane. To view the build’s details, choose the build’s ID number.
    Jenkins project builds
  17. To view the results of the run commands, from the build details page, choose Console Output.
  18. To see that the package has been successfully published, check the CodeArtifact repository on the console.
    CodeArtifact console showing package

When a change is pushed to the repo, Jenkins will start a new build and attempt to publish the package. CodeArtifact will prevent publishing duplicates of the same package version, failing the Jenkins build.

If you want to publish a new version of the package, you will need to increment the version number.

The sample package uses semantic versioning (major.minor.maintenance), to change the version number modify the version='1.0.0' value in the setup.py file. You can do this manually before pushing any changes to the repo, or automatically as part of the build process by using the python-semantic-release package, or a similar solution.

Consuming a package and deploying an application

After you have a package published, you can use it in an application.

  1. Create a new GitHub repo for this application.
  2. Populate it with the contents of the application_src directory from the sample repo.
    Sample application repository

The version of the sample package used by the application is defined in the requirements.txt file. If you have published a new version of the package and want the application to use it modify the fantastic-ascii==1.0.0 value in this file.

After the repository created, you need to deploy the CloudFormation template application.yml. The template creates the following resources:

  • ECRRepository – Amazon ECR repository to store your Docker image.
  • ClusterAmazon Elastic Container Service (Amazon ECS) cluster that contains the service of your application.
  • TaskDefinition – ECS task definition that defines how your Docker image is deployed.
  • ExecutionRole – IAM role that Amazon ECS uses to pull the Docker image.
  • TaskRole – IAM role provided to the ECS task.
  • ContainerSecurityGroup – Security group that allows outbound traffic to ports 8080 and 80.
  • Service – Amazon ECS service that launches and manages your Docker containers.
  • TargetGroup – Target group used by the Load Balancer to send traffic to Docker containers.
  • Listener – Load Balancer Listener that listens for incoming traffic on port 80.
  • LoadBalancer – Load Balancer that sends traffic to the ECS task.
  1. Choose the following link to create the application’s CloudFormation stack:
    Launch CloudFormation stack
  2. Choose Next.
  3. Enter the following parameters:
    1. Stack name – Name for the CloudFormation stack. For this post, I use the name Consumer.
    2. Container Name – Name for your application (for this post, I use application).
    3. Image Tag – Leave this field blank. Jenkins populates it when you deploy the application.
    4. VPC – Choose a VPC in your account that contains two public subnets.
    5. SubnetA – Choose a public subnet from the previously chosen VPC.
    6. SubnetB – Choose a public subnet from the previously chosen VPC.
  4. Choose Next.
  5. Scroll down to the bottom of the page and choose Next.
  6. Review the configuration of the stack.
  7. Acknowledge the IAM resources warning to allow CloudFormation to create the TaskRole IAM role.
  8. Choose Create Stack.

After the stack is created, the Outputs tab contains information you can use to configure the Jenkins project.

Application stack outputs tab

To access the sample application, choose the ApplicationUrl link. Because the application has not yet been deployed, you receive an error message.

You can now create the second Jenkins project, which uses a configured through a Jenkinsfile stored in the source repository. The Jenkinsfile defines the steps that the build takes to build and deploy a Docker image containing your application.

The Jenkinsfile included in the sample instructs Jenkins to perform these steps:

  1. Get the authorization token for CodeArtifact:
    withCredentials([usernamePassword(
        credentialsId: CREDENTIALS_ID,
        passwordVariable: 'AWS_SECRET_ACCESS_KEY',
        usernameVariable: 'AWS_ACCESS_KEY_ID'
    )]) {
        authToken = sh(
                returnStdout: true,
                script: 'aws codeartifact get-authorization-token \
                --domain $AWS_CA_DOMAIN \
                --query authorizationToken \
                --output text \
                --duration-seconds 900'
        ).trim()
    }

  2. Start a Docker build and pass the authorization token as an argument to the build:
    sh ("""
        set +x
        docker build -t $CONTAINER_NAME:$BUILD_NUMBER \
        --build-arg CODEARTIFACT_TOKEN='$authToken' \
        --build-arg DOMAIN=$AWS_CA_DOMAIN-$AWS_ACCOUNT_ID \
        --build-arg REGION=$AWS_REGION \
        --build-arg REPO=$AWS_CA_REPO .
    """)

  3. Inside of Docker, the passed argument is used to configure pip to use CodeArtifact:
    RUN pip config set global.index-url "https://aws:$CODEARTIFACT_TOKEN@$DOMAIN.d.codeartifact.$REGION.amazonaws.com/pypi/$REPO/simple/"
    RUN pip install -r requirements.txt

  4. Test the image by starting a container and performing a simple GET request.
  5. Log in to the Amazon ECR repository and push the Docker image.
  6. Update the CloudFormation template and start a deployment of the application.

Look at the Jenkinsfile and Dockerfile in your repository to review the exact commands being used, then take the following steps to setup the second Jenkins projects:

  1. Change the variables defined in the environment section at the top of the Jenkinsfile:
    environment {
        AWS_ACCOUNT_ID = 'Your AWS Account ID'
        AWS_REGION = 'Region you used for this project'
        AWS_CA_DOMAIN = 'Name of your CodeArtifact domain'
        AWS_CA_REPO = 'Name of your CodeArtifact repository'
        AWS_STACK_NAME = 'Name of the CloudFormation stack'
        CONTAINER_NAME = 'Container name provided to CloudFormation'
        CREDENTIALS_ID = 'Jenkins credentials ID
    }
  2. Commit the changes to the GitHub repo.
  3. To create a new Jenkins project, on the Jenkins homepage, choose New Item.
  4. Enter a name for the project, for example, Consumer.
  5. Choose Pipeline.
  6. Choose OK.
    Jenkins pipeline wizard
  7. To have a new build start automatically when a change is detected in the repository, under Build Triggers, select Poll SCM and enter * * * * * in the Schedule field.
    Jenkins source polling configuration
  8. In the Pipeline section, choose Pipeline script from SCM from the Definition drop-down menu.
  9. Choose Git for the SCM
  10. Enter the HTTP clone URL of your GitHub repo into the Repository URL
  11. To make sure that your workspace is clean before each build, under Additional Behaviors, choose Add and select Clean before checkout.
    Jenkins source configuration
  12. Choose Save.

The Jenkins project is now ready. To start a new job, choose Build Now from the navigation pane. You see a visualization of the pipeline as it moves through the various stages, gathering the dependencies and deploying your application.

Jenkins application pipeline visualization

When the Deploy to ECS stage of the pipeline is complete, you can choose ApplicationUrl on the Outputs tab of the CloudFormation stack. You see a simple webpage that uses the Python package to display the current time.

Deployed application displaying in browser

Cleaning up

To avoid incurring future charges, delete the resources created in this post.

To empty the Amazon ECR repository:

  1. Open the application’s CloudFormation stack.
  2. On the Resources tab, choose the link next to the ECRRepository
  3. Select the check-box next to each of the images in the repository.
  4. Choose Delete.
  5. Confirm the deletion.

To delete the CloudFormation stacks:

  1. On the AWS CloudFormation console, select the application stack you deployed earlier.
  2. Choose Delete.
  3. Confirm the deletion.

If you created a Jenkins as part of this post, select the Jenkins stack and delete it.

To delete the CodeArtifact repository:

  1. On the CodeArtifact console, navigate to the repository you created.
  2. Choose Delete.
  3. Confirm the deletion.

If you’re not using the CodeArtifact domain for other repositories, you should follow the previous steps to delete the pypi-store repository, because it contains the public packages that were used by the application, then delete the CodeArtifact domain:

  1. On the CodeArtifact console, navigate to the domain you created.
  2. Choose Delete.
  3. Confirm the deletion.

Conclusion

In this post I showed how you can use Jenkins to publish and consume a Python package with Jenkins and CodeArtifact. I walked you through creating two Jenkins projects, a Jenkins freestyle project that built a package and published it to CodeArtifact, and a Jenkins pipeline project that built a Docker image that used the package in an application that was deployed to AWS Fargate.

About the author

Matt Ulinski is a Cloud Support Engineer with Amazon Web Services.