All posts by Bryant Bost

Use AWS Lambda authorizers with a third-party identity provider to secure Amazon API Gateway REST APIs

Post Syndicated from Bryant Bost original https://aws.amazon.com/blogs/security/use-aws-lambda-authorizers-with-a-third-party-identity-provider-to-secure-amazon-api-gateway-rest-apis/

Note: This post focuses on Amazon API Gateway REST APIs used with OAuth 2.0 and custom AWS Lambda authorizers. API Gateway also offers HTTP APIs, which provide native OAuth 2.0 features. For more information about which is right for your organization, see Choosing Between HTTP APIs and REST APIs.

Amazon API Gateway is a fully managed AWS service that simplifies the process of creating and managing REST APIs at any scale. If you are new to API Gateway, check out Amazon API Gateway Getting Started to get familiar with core concepts and terminology. In this post, I will demonstrate how an organization using a third-party identity provider can use AWS Lambda authorizers to implement a standard token-based authorization scheme for REST APIs that are deployed using API Gateway.

In the context of this post, a third-party identity provider refers to an entity that exists outside of AWS and that creates, manages, and maintains identity information for your organization. This identity provider issues cryptographically signed tokens to users containing information about the user identity and their permissions. In order to use these non-AWS tokens to control access to resources within API Gateway, you will need to define custom authorization code using a Lambda function to “map” token characteristics to API Gateway resources and permissions.

Defining custom authorization code is not the only way to implement authorization in API Gateway and ensure resources can only be accessed by the correct users. In addition to Lambda authorizers, API Gateway offers several “native” options that use existing AWS services to control resource access and do not require any custom code. To learn more about the established practices and authorization mechanisms, see Controlling and Managing Access to a REST API in API Gateway.

Lambda authorizers are a good choice for organizations that use third-party identity providers directly (without federation) to control access to resources in API Gateway, or organizations requiring authorization logic beyond the capabilities offered by “native” authorization mechanisms.

Benefits of using third-party tokens with API Gateway

Using a Lambda authorizer with third-party tokens in API Gateway can provide the following benefits:

  • Integration of third-party identity provider with API Gateway: If your organization has already adopted a third-party identity provider, building a Lambda authorizer allows users to access API Gateway resources by using their third-party credentials without having to configure additional services, such as Amazon Cognito. This can be particularly useful if your organization is using the third-party identity provider for single sign-on (SSO).
  • Minimal impact to client applications: If your organization has an application that is already configured to sign in to a third-party identity provider and issue requests using tokens, then minimal changes will be required to use this solution with API Gateway and a Lambda authorizer. By using credentials from your existing identity provider, you can integrate API Gateway resources into your application in the same manner that non-AWS resources are integrated.
  • Flexibility of authorization logic: Lambda authorizers allow for the additional customization of authorization logic, beyond validation and inspection of tokens.

Solution overview

The following diagram shows the authentication/authorization flow for using third-party tokens in API Gateway:

Figure 1: Example Solution Architecture

Figure 1: Example Solution Architecture

  1. After a successful login, the third-party identity provider issues an access token to a client.
  2. The client issues an HTTP request to API Gateway and includes the access token in the HTTP Authorization header.
  3. The API Gateway resource forwards the token to the Lambda authorizer.
  4. The Lambda authorizer authenticates the token with the third-party identity provider.
  5. The Lambda authorizer executes the authorization logic and creates an identity management policy.
  6. API Gateway evaluates the identity management policy against the API Gateway resource that the user requested and either allows or denies the request. If allowed, API Gateway forwards the user request to the API Gateway resource.

Prerequisites

To build the architecture described in the solution overview, you will need the following:

  • An identity provider: Lambda authorizers can work with any type of identity provider and token format. The post uses a generic OAuth 2.0 identity provider and JSON Web Tokens (JWT).
  • An API Gateway REST API: You will eventually configure this REST API to rely on the Lambda authorizer for access control.
  • A means of retrieving tokens from your identity provider and calling API Gateway resources: This can be a web application, a mobile application, or any application that relies on tokens for accessing API resources.

For the REST API in this example, I use API Gateway with a mock integration. To create this API yourself, you can follow the walkthrough in Create a REST API with a Mock Integration in Amazon API Gateway.

You can use any type of client to retrieve tokens from your identity provider and issue requests to API Gateway, or you can consult the documentation for your identity provider to see if you can retrieve tokens directly and issue requests using a third-party tool such as Postman.

Before you proceed to building the Lambda authorizer, you should be able to retrieve tokens from your identity provider and issue HTTP requests to your API Gateway resource with the token included in the HTTP Authorization header. This post assumes that the identity provider issues OAuth JWT tokens, and the example below shows a raw HTTP request addressed to the mock API Gateway resource with an OAuth JWT access token in the HTTP Authorization header. This request should be sent by the client application that you are using to retrieve your tokens and issue HTTP requests to the mock API Gateway resource.


# Example HTTP Request using a Bearer token\
GET /dev/my-resource/?myParam=myValue HTTP/1.1\
Host: rz8w6b1ik2.execute-api.us-east-1.amazonaws.com\
Authorization: Bearer eyJraWQiOiJ0ekgtb1Z5eEpPSF82UDk3...}

Building a Lambda authorizer

When you configure a Lambda authorizer to serve as the authorization source for an API Gateway resource, the Lambda authorizer is invoked by API Gateway before the resource is called. Check out the Lambda Authorizer Authorization Workflow for more details on how API Gateway invokes and exchanges information with Lambda authorizers. The core functionality of the Lambda authorizer is to generate a well-formed identity management policy that dictates the allowed actions of the user, such as which APIs the user can access. The Lambda authorizer will use information in the third-party token to create the identity management policy based on “permissions mapping” documents that you define — I will discuss these permissions mapping documents in greater detail below.

After the Lambda authorizer generates an identity management policy, the policy is returned to API Gateway and API Gateway uses it to evaluate whether the user is allowed to invoke the requested API. You can optionally configure a setting in API Gateway to automatically cache the identity management policy so that subsequent API invocations with the same token do not invoke the Lambda authorizer, but instead use the identity management policy that was generated on the last invocation.

In this post, you will build your Lambda authorizer to receive an OAuth access token and validate its authenticity with the token issuer, then implement custom authorization logic to use the OAuth scopes present in the token to create an identity management policy that dictates which APIs the user is allowed to access. You will also configure API Gateway to cache the identity management policy that is returned by the Lambda authorizer. These patterns provide the following benefits:

  • Leverage third-party identity management services: Validating the token with the third party allows for consolidated management of services such as token verification, token expiration, and token revocation.
  • Cache to improve performance: Caching the token and identity management policy in API Gateway removes the need to call the Lambda authorizer for each invocation. Caching a policy can improve performance; however, this increased performance comes with addition security considerations. These considerations are discussed below.
  • Limit access with OAuth scopes: Using the scopes present in the access token, along with custom authorization logic, to generate an identity management policy and limit resource access is a familiar OAuth practice and serves as a good example of customizable authentication logic. Refer to Defining Scopes for more information on OAuth scopes and how they are typically used to control resource access.

The Lambda authorizer is invoked with the following object as the event parameter when API Gateway is configured to use a Lambda authorizer with the token event payload; refer to Input to an Amazon API Gateway Lambda Authorizer for more information on the types of payloads that are compatible with Lambda authorizers. Since you are using a token-based authorization scheme, you will use the token event payload. This payload contains the methodArn, which is the Amazon Resource Name (ARN) of the API Gateway resource that the request was addressed to. The payload also contains the authorizationToken, which is the third-party token that the user included with the request.


# Lambda Token Event Payload  
{   
 type: 'TOKEN',  
 methodArn: 'arn:aws:execute-api:us-east-1:2198525...',  
 authorizationToken: 'Bearer eyJraWQiOiJ0ekgt...'  
}

Upon receiving this event, your Lambda authorizer will issue an HTTP POST request to your identity provider to validate the token, and use the scopes present in the third-party token with a permissions mapping document to generate and return an identity management policy that contains the allowed actions of the user within API Gateway. Lambda authorizers can be written in any Lambda-supported language. You can explore some starter code templates on GitHub. The example function in this post uses Node.js 10.x.

The Lambda authorizer code in this post uses a static permissions mapping document. This document is represented by apiPermissions. For a complex or highly dynamic permissions document, this document can be decoupled from the Lambda authorizer and exported to Amazon Simple Storage Service (Amazon S3) or Amazon DynamoDB for simplified management. The static document contains the ARN of the deployed API, the API Gateway stage, the API resource, the HTTP method, and the allowed token scope. The Lambda authorizer then generates an identity management policy by evaluating the scopes present in the third-party token against those present in the document.

The fragment below shows an example permissions mapping. This mapping restricts access by requiring that users issuing HTTP GET requests to the ARN arn:aws:execute-api:us-east-1:219852565112:rz8w6b1ik2 and the my-resource resource in the DEV API Gateway stage are only allowed if they provide a valid token that contains the email scope.


# Example permissions document  
{  
 "arn": "arn:aws:execute-api:us-east-1:219852565112:rz8w6b1ik2",  
 "resource": "my-resource",  
 "stage": "DEV",  
 "httpVerb": "GET",  
 "scope": "email"  
}

The logic to create the identity management policy can be found in the generateIAMPolicy() method of the Lambda function. This method serves as a good general example of the extent of customization possible in Lambda authorizers. While the method in the example relies solely on token scopes, you can also use additional information such as request context, user information, source IP address, user agents, and so on, to generate the returned identity management policy.

Upon invocation, the Lambda authorizer below performs the following procedure:

  1. Receive the token event payload, and isolate the token string (trim “Bearer ” from the token string, if present).
  2. Verify the token with the third-party identity provider.

    Note: This Lambda function does not include this functionality. The method, verifyAccessToken(), will need to be customized based on the identity provider that you are using. This code assumes that the verifyAccessToken() method returns a Promise that resolves to the decoded token in JSON format.

  3. Retrieve the scopes from the decoded token. This code assumes these scopes can be accessed as an array at claims.scp in the decoded token.
  4. Iterate over the scopes present in the token and create identity and access management (IAM) policy statements based on entries in the permissions mapping document that contain the scope in question.
  5. Create a complete, well-formed IAM policy using the generated IAM policy statements. Refer to IAM JSON Policy Elements Reference for more information on programmatically building IAM policies.
  6. Return complete IAM policy to API Gateway.
    
    /*
     * Sample Lambda Authorizer to validate tokens originating from
     * 3rd Party Identity Provider and generate an IAM Policy
     */
    
    const apiPermissions = [
      {
        "arn": "arn:aws:execute-api:us-east-1:219852565112:rz8w6b1ik2", // NOTE: Replace with your API Gateway API ARN
        "resource": "my-resource", // NOTE: Replace with your API Gateway Resource
        "stage": "dev", // NOTE: Replace with your API Gateway Stage
        "httpVerb": "GET",
        "scope": "email"
      }
    ];
    
    var generatePolicyStatement = function (apiName, apiStage, apiVerb, apiResource, action) {
      'use strict';
      // Generate an IAM policy statement
      var statement = {};
      statement.Action = 'execute-api:Invoke';
      statement.Effect = action;
      var methodArn = apiName + "/" + apiStage + "/" + apiVerb + "/" + apiResource + "/";
      statement.Resource = methodArn;
      return statement;
    };
    
    var generatePolicy = function (principalId, policyStatements) {
      'use strict';
      // Generate a fully formed IAM policy
      var authResponse = {};
      authResponse.principalId = principalId;
      var policyDocument = {};
      policyDocument.Version = '2012-10-17';
      policyDocument.Statement = policyStatements;
      authResponse.policyDocument = policyDocument;
      return authResponse;
    };
    
    var verifyAccessToken = function (accessToken) {
      'use strict';
      /*
      * Verify the access token with your Identity Provider here (check if your 
      * Identity Provider provides an SDK).
      *
      * This example assumes this method returns a Promise that resolves to 
      * the decoded token, you may need to modify your code according to how
      * your token is verified and what your Identity Provider returns.
      */
    };
    
    var generateIAMPolicy = function (scopeClaims) {
      'use strict';
      // Declare empty policy statements array
      var policyStatements = [];
      // Iterate over API Permissions
      for ( var i = 0; i  -1 ) {
          // User token has appropriate scope, add API permission to policy statements
          policyStatements.push(generatePolicyStatement(apiPermissions[i].arn, apiPermissions[i].stage, apiPermissions[i].httpVerb,
                                                        apiPermissions[i].resource, "Allow"));
        }
      }
      // Check if no policy statements are generated, if so, create default deny all policy statement
      if (policyStatements.length === 0) {
        var policyStatement = generatePolicyStatement("*", "*", "*", "*", "Deny");
        policyStatements.push(policyStatement);
      }
      return generatePolicy('user', policyStatements);
    };
    
    exports.handler = async function(event, context) {
      // Declare Policy
      var iamPolicy = null;
      // Capture raw token and trim 'Bearer ' string, if present
      var token = event.authorizationToken.replace("Bearer ", "");
      // Validate token
      await verifyAccessToken(token).then(data => {
        // Retrieve token scopes
        var scopeClaims = data.claims.scp;
        // Generate IAM Policy
        iamPolicy = generateIAMPolicy(scopeClaims);
      })
      .catch(err => {
        console.log(err);
        // Generate default deny all policy statement if there is an error
        var policyStatements = [];
        var policyStatement = generatePolicyStatement("*", "*", "*", "*", "Deny");
        policyStatements.push(policyStatement);
        iamPolicy = generatePolicy('user', policyStatements);
      });
      return iamPolicy;
    };  
    

The following is an example of the identity management policy that is returned from your function.


# Example IAM Policy
{
  "principalId": "user",
  "policyDocument": {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Action": "execute-api:Invoke",
        "Effect": "Allow",
        "Resource": "arn:aws:execute-api:us-east-1:219852565112:rz8w6b1ik2/get/DEV/my-resource/"
      }
    ]
  }
}

It is important to note that the Lambda authorizer above is not considering the method or resource that the user is requesting. This is because you want to generate a complete identity management policy that contains all the API permissions for the user, instead of a policy that only contains allow/deny for the requested resource. By generating a complete policy, this policy can be cached by API Gateway and used if the user invokes a different API while the policy is still in the cache. Caching the policy can reduce API latency from the user perspective, as well as the total amount of Lambda invocations; however, it can also increase vulnerability to Replay Attacks and acceptance of expired/revoked tokens.

Shorter cache lifetimes introduce more latency to API calls (that is, the Lambda authorizer must be called more frequently), while longer cache lifetimes introduce the possibility of a token expiring or being revoked by the identity provider, but still being used to return a valid identity management policy. For example, the following scenario is possible when caching tokens in API Gateway:

  • Identity provider stamps access token with an expiration date of 12:30.
  • User calls API Gateway with access token at 12:29.
  • Lambda authorizer generates identity management policy and API Gateway caches the token/policy pair for 5 minutes.
  • User calls API Gateway with same access token at 12:32.
  • API Gateway evaluates access against policy that exists in the cache, despite original token being expired.

Since tokens are not re-validated by the Lambda authorizer or API Gateway once they are placed in the API Gateway cache, long cache lifetimes may also increase susceptibility to Replay Attacks. Longer cache lifetimes and large identity management policies can increase the performance of your application, but must be evaluated against the trade-off of increased exposure to certain security vulnerabilities.

Deploying the Lambda authorizer

To deploy your Lambda authorizer, you first need to create and deploy a Lambda deployment package containing your function code and dependencies (if applicable). Lambda authorizer functions behave the same as other Lambda functions in terms of deployment and packaging. For more information on packaging and deploying a Lambda function, see AWS Lambda Deployment Packages in Node.js. For this example, you should name your Lambda function myLambdaAuth and use a Node.js 10.x runtime environment.

After the function is created, add the Lambda authorizer to API Gateway.

  1. Navigate to API Gateway and in the navigation pane, under APIs, select the API you configured earlier
  2. Under your API name, choose Authorizers, then choose Create New Authorizer.
  3. Under Create Authorizer, do the following:
    1. For Name, enter a name for your Lambda authorizer. In this example, the authorizer is named Lambda-Authorizer-Demo.
    2. For Type, select Lambda
    3. For Lambda Function, select the AWS Region you created your function in, then enter the name of the Lambda function you just created.
    4. Leave Lambda Invoke Role empty.
    5. For Lambda Event Payload choose Token.
    6. For Token Source, enter Authorization.
    7. For Token Validation, enter:
      
      ^(Bearer )[a-zA-Z0-9\-_]+?\.[a-zA-Z0-9\-_]+?\.([a-zA-Z0-9\-_]+)$
      			

      This represents a regular expression for validating that tokens match JWT format (more below).

    8. For Authorization Caching, select Enabled and enter a time to live (TTL) of 1 second.
  4. Select Save.

 

Figure 2: Create a new Lambda authorizer

Figure 2: Create a new Lambda authorizer

This configuration passes the token event payload mentioned above to your Lambda authorizer, and is necessary since you are using tokens (Token Event Payload) for authentication, rather than request parameters (Request Event Payload). For more information, see Use API Gateway Lambda Authorizers.

In this solution, the token source is the Authorization header of the HTTP request. If you know the expected format of your token, you can include a regular expression in the Token Validation field, which automatically rejects any request that does not match the regular expression. Token validations are not mandatory. This example assumes the token is a JWT.


# Regex matching JWT Bearer Tokens  
^(Bearer )[a-zA-Z0-9\-_]+?\.[a-zA-Z0-9\-_]+?\.([a-zA-Z0-9\-_]+)$

Here, you can also configure how long the token/policy pair will be cached in API Gateway. This example enables caching with a TTL of 1 second.

In this solution, you leave the Lambda Invoke Role field empty. This field is used to provide an IAM role that allows API Gateway to execute the Lambda authorizer. If left blank, API Gateway configures a default resource-based policy that allows it to invoke the Lambda authorizer.

The final step is to point your API Gateway resource to your Lambda authorizer. Select the configured API Resource and HTTP method.

  1. Navigate to API Gateway and in the navigation pane, under APIs, select the API you configured earlier.
  2. Select the GET method.

    Figure 3: GET Method Execution

    Figure 3: GET Method Execution

  3. Select Method Request.
  4. Under Settings, edit Authorization and select the authorizer you just configured (in this example, Lambda-Authorizer-Demo).

    Figure 4: Select your API authorizer

    Figure 4: Select your API authorizer

Deploy the API to an API Gateway stage that matches the stage configured in the Lambda authorizer permissions document (apiPermissions variable).

  1. Navigate to API Gateway and in the navigation pane, under APIs, select the API you configured earlier.
  2. Select the / resource of your API.
  3. Select Actions, and under API Actions, select Deploy API.
  4. For Deployment stage, select [New Stage] and for the Stage name, enter dev. Leave Stage description and Deployment description blank.
  5. Select Deploy.

    Figure 5: Deploy your API stage

    Figure 5: Deploy your API stage

Testing the results

With the Lambda authorizer configured as your authorization source, you are now able to access the resource only if you provide a valid token that contains the email scope.

The following example shows how to issue an HTTP request with curl to your API Gateway resource using a valid token that contains the email scope passed in the HTTP Authorization header. Here, you are able to authenticate and receive an appropriate response from API Gateway.


# HTTP Request (including valid token with "email" scope)  
$ curl -X GET \  
> 'https://rz8w6b1ik2.execute-api.us-east-1.amazonaws.com/dev/my-resource/?myParam=myValue' \  
> -H 'Authorization: Bearer eyJraWQiOiJ0ekgtb1Z5eE...'  
  
{  
 "statusCode" : 200,  
 "message" : "Hello from API Gateway!"  
}

The following JSON object represents the decoded JWT payload used in the previous example. The JSON object captures the token scopes in scp, and you can see that the token contained the email scope.

Figure 6: JSON object that contains the email scope

Figure 6: JSON object that contains the email scope

If you provide a token that is expired, is invalid, or that does not contain the email scope, then you are not able to access the resource. The following example shows a request to your API Gateway resource with a valid token that does not contain the email scope. In this example, the Lambda authorizer rejects the request.


# HTTP Request (including token without "email" scope)  
$ curl -X GET \  
> 'https://rz8w6b1ik2.execute-api.us-east-1.amazonaws.com/dev/my-resource/?myParam=myValue' \  
> -H 'Authorization: Bearer eyJraWQiOiJ0ekgtb1Z5eE...'  
  
{  
 "Message" : "User is not authorized to access this resource with an explicit deny"  
}

The following JSON object represents the decoded JWT payload used in the above example; it does not include the email scope.

Figure 7: JSON object that does not contain the email scope

Figure 7: JSON object that does not contain the email scope

If you provide no token, or you provide a token not matching the provided regular expression, then you are immediately rejected by API Gateway without invoking the Lambda authorizer. API Gateway only forwards tokens to the Lambda authorizer that have the HTTP Authorization header and pass the token validation regular expression, if a regular expression was provided. If the request does not pass token validation or does not have an HTTP Authorization header, API Gateway rejects it with a default HTTP 401 response. The following example shows how to issue a request to your API Gateway resource using an invalid token that does match the regular expression you configured on your authorizer. In this example, API Gateway rejects your request automatically without invoking the authorizer.


# HTTP Request (including a token that is not a JWT)  
$ curl -X GET \  
> 'https://rz8w6b1ik2.execute-api.us-east-1.amazonaws.com/dev/my-resource/?myParam=myValue' \  
> -H 'Authorization: Bearer ThisIsNotAJWT'  
  
{  
 "Message" : "Unauthorized"  
}

These examples demonstrate how your Lambda authorizer allows and denies requests based on the token format and the token content.

Conclusion

In this post, you saw how Lambda authorizers can be used with API Gateway to implement a token-based authentication scheme using third-party tokens.

Lambda authorizers can provide a number of benefits:

  • Leverage third-party identity management services directly, without identity federation.
  • Implement custom authorization logic.
  • Cache identity management policies to improve performance of authorization logic (while keeping in mind security implications).
  • Minimally impact existing client applications.

For organizations seeking an alternative to Amazon Cognito User Pools and Amazon Cognito identity pools, Lambda authorizers can provide complete, secure, and flexible authentication and authorization services to resources deployed with Amazon API Gateway. For more information about Lambda authorizers, see API Gateway Lambda Authorizers.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Bryant Bost

Bryant Bost is an Application Consultant for AWS Professional Services based out of Washington, DC. As a consultant, he supports customers with architecting, developing, and operating new applications, as well as migrating existing applications to AWS. In addition to web application development, Bryant specializes in serverless and container architectures, and has authored several posts on these topics.

Customizing triggers for AWS CodePipeline with AWS Lambda and Amazon CloudWatch Events

Post Syndicated from Bryant Bost original https://aws.amazon.com/blogs/devops/adding-custom-logic-to-aws-codepipeline-with-aws-lambda-and-amazon-cloudwatch-events/

AWS CodePipeline is a fully managed continuous delivery service that helps automate the build, test, and deploy processes of your application. Application owners use CodePipeline to manage releases by configuring “pipeline,” workflow constructs that describe the steps, from source code to deployed application, through which an application progresses as it is released. If you are new to CodePipeline, check out Getting Started with CodePipeline to get familiar with the core concepts and terminology.

Overview

In a default setup, a pipeline is kicked-off whenever a change in the configured pipeline source is detected. CodePipeline currently supports sourcing from AWS CodeCommit, GitHub, Amazon ECR, and Amazon S3. When using CodeCommit, Amazon ECR, or Amazon S3 as the source for a pipeline, CodePipeline uses an Amazon CloudWatch Event to detect changes in the source and immediately kick off a pipeline. When using GitHub as the source for a pipeline, CodePipeline uses a webhook to detect changes in a remote branch and kick off the pipeline. Note that CodePipeline also supports beginning pipeline executions based on periodic checks, although this is not a recommended pattern.

CodePipeline supports adding a number of custom actions and manual approvals to ensure that pipeline functionality is flexible and code releases are deliberate; however, without further customization, pipelines will still be kicked-off for every change in the pipeline source. To customize the logic that controls pipeline executions in the event of a source change, you can introduce a custom CloudWatch Event, which can result in the following benefits:

  • Multiple pipelines with a single source: Trigger select pipelines when multiple pipelines are listening to a single source. This can be useful if your organization is using monorepos, or is using a single repository to host configuration files for multiple instances of identical stacks.
  • Avoid reacting to unimportant files: Avoid triggering a pipeline when changing files that do not affect the application functionality (e.g. documentation files, readme files, and .gitignore files).
  • Conditionally kickoff pipelines based on environmental conditions: Use custom code to evaluate whether a pipeline should be triggered. This allows for further customization beyond polling a source repository or relying on a push event. For example, you could create custom logic to automatically reschedule deployments on holidays to the next available workday.

This post explores and demonstrates how to customize the actions that invoke a pipeline by modifying the default CloudWatch Events configuration that is used for CodeCommit, ECR, or S3 sources. To illustrate this customization, we will walk through two examples: prevent updates to documentation files from triggering a pipeline, and manage execution of multiple pipelines monitoring a single source repository.

The key concepts behind customizing pipeline invocations extend to GitHub sources and webhooks as well; however, creating a custom webhook is outside the scope of this post.

Sample Architecture

This post is only interested in controlling the execution of the pipeline (as opposed to the deploy, test, or approval stages), so it uses simple source and pipeline configurations. The sample architecture considers a simple CodePipeline with only two stages: source and build.

Example CodePipeline Architecture

Example CodePipeline Architecture with Custom CloudWatch Event Configuration

The sample CodeCommit repository consists only of buildspec.yml, readme.md, and script.py files.

Normally, after you create a pipeline, it automatically triggers a pipeline execution to release the latest version of your source code. From then on, every time you make a change to your source location, a new pipeline execution is triggered. In addition, you can manually re-run the last revision through a pipeline using the “Release Change” button in the console. This architecture uses a custom CloudWatch Event and AWS Lambda function to avoid commits that change only the readme.md file from initiating an execution of the pipeline.

Creating a custom CloudWatch Event

When we create a CodePipeline that monitors a CodeCommit (or other) source, a default CloudWatch Events rule is created to trigger our pipeline for every change to the CodeCommit repository. This CloudWatch Events rule monitors the CodeCommit repository for changes, and triggers the pipeline for events matching the referenceCreated or referenceUpdated CodeCommit Event (refer to CodeCommit Event Types for more information).

Default CloudWatch Events Rule to Trigger CodePipeline

Default CloudWatch Events Rule to Trigger CodePipeline

To introduce custom logic and control the events that kickoff the pipeline, this example configures the default CloudWatch Events rule to detect changes in the source and trigger a Lambda function rather than invoke the pipeline directly. The example uses a CodeCommit source, but the same principle applies to Amazon S3 and Amazon ECR sources as well, as these both use CloudWatch Events rules to notify CodePipeline of changes.

Custom CloudWatch Events Rule to Trigger CodePipeline

Custom CloudWatch Events Rule to Trigger CodePipeline

When a change is introduced to the CodeCommit repository, the configured Lambda function receives an event from CloudWatch signaling that there has been a source change.

{
   "version":"0",
   "id":"2f9a75be-88f6-6827-729d-34495072e5a1",
   "detail-type":"CodeCommit Repository State Change",
   "source":"aws.codecommit",
   "account":"accountNumber",
   "time":"2019-11-12T04:56:47Z",
   "region":"us-east-1",
   "resources":[
      "arn:aws:codecommit:us-east-1:accountNumber:codepipeline-customization-sandbox-repo"
   ],
   "detail":{
      "callerUserArn":"arn:aws:sts::accountNumber:assumed-role/admin/roleName ",
      "commitId":"92e953e268345c77dd93cec860f7f91f3fd13b44",
      "event":"referenceUpdated",
      "oldCommitId":"5a058542a8dfa0dacf39f3c1e53b88b0f991695e",
      "referenceFullName":"refs/heads/master",
      "referenceName":"master",
      "referenceType":"branch",
      "repositoryId":"658045f1-c468-40c3-93de-5de2c000d84a",
      "repositoryName":"codepipeline-customization-sandbox-repo"
   }
}

The Lambda function is responsible for determining whether a source change necessitates kicking-off the pipeline, which in the example is necessary if the change contains modifications to files other than readme.md. To implement this, the Lambda function uses the commitId and oldCommitId fields provided in the body of the CloudWatch event message to determine which files have changed. If the function determines that a change has occurred to a “non-ignored” file, then the function programmatically executes the pipeline. Note that for S3 sources, it may be necessary to process an entire file zip archive, or to retrieve past versions of an artifact.

import boto3

files_to_ignore = [ "readme.md" ]

codecommit_client = boto3.client('codecommit')
codepipeline_client = boto3.client('codepipeline')

def lambda_handler(event, context):
    # Extract commits
    old_commit_id = event["detail"]["oldCommitId"]
    new_commit_id = event["detail"]["commitId"]
    # Get commit differences
    codecommit_response = codecommit_client.get_differences(
        repositoryName="codepipeline-customization-sandbox-repo",
        beforeCommitSpecifier=str(old_commit_id),
        afterCommitSpecifier=str(new_commit_id)
    )
    # Search commit differences for files to ignore
    for difference in codecommit_response["differences"]:
        file_name = difference["afterBlob"]["path"].lower()
        # If non-ignored file is present, kickoff pipeline
        if file_name not in files_to_ignore:
            codepipeline_response = codepipeline_client.start_pipeline_execution(
                name="codepipeline-customization-sandbox-pipeline"
                )
            # Break to avoid executing the pipeline twice
            break

Multiple pipelines sourcing from a single repository

Architectures that use a single-source repository monitored by multiple pipelines can add custom logic to control the types of events that trigger a specific pipeline to execute. Without customization, any change to the source repository would trigger all pipelines.

Consider the following example:

  • A CodeCommit repository contains a number of config files (for example, config_1.json and config_2.json).
  • Multiple pipelines (for example, codepipeline-customization-sandbox-pipeline-1 and codepipeline-customization-sandbox-pipeline-2) source from this CodeCommit repository.
  • Whenever a config file is updated, a custom CloudWatch Event triggers a Lambda function that is used to determine which config files changed, and therefore which pipelines should be executed.
Example CodePipeline Architecture

Example CodePipeline Architecture for Monorepos with Custom CloudWatch Event Configuration

This example follows the same pattern of creating a custom CloudWatch Event and Lambda function shown in the preceding example. However, in this scenario, the Lambda function is responsible for determining which files changed and which pipelines should be kicked off as a result. To execute this logic, the Lambda function uses the config_file_mapping variable to map files to corresponding pipelines. Pipelines are only executed if their designated config file has changed.

Note that the config_file_mapping can be exported to Amazon S3 or Amazon DynamoDB for more complex use cases.

import boto3

# Map config files to pipelines
config_file_mapping = {
        "config_1.json" : "codepipeline-customization-sandbox-pipeline-1",
        "config_2.json" : "codepipeline-customization-sandbox-pipeline-2"
        }
        
codecommit_client = boto3.client('codecommit')
codepipeline_client = boto3.client('codepipeline')

def lambda_handler(event, context):
    # Extract commits
    old_commit_id = event["detail"]["oldCommitId"]
    new_commit_id = event["detail"]["commitId"]
    # Get commit differences
    codecommit_response = codecommit_client.get_differences(
        repositoryName="codepipeline-customization-sandbox-repo",
        beforeCommitSpecifier=str(old_commit_id),
        afterCommitSpecifier=str(new_commit_id)
    )
    # Search commit differences for files that trigger executions
    for difference in codecommit_response["differences"]:
        file_name = difference["afterBlob"]["path"].lower()
        # If file corresponds to pipeline, execute pipeline
        if file_name in config_file_mapping:
            codepipeline_response = codepipeline_client.start_pipeline_execution(
                name=config_file_mapping["file_name"]
                )

Results

For the first example, updates affecting only the readme.md file are completely ignored by the pipeline, while updates affecting other files begin a normal pipeline execution. For the second example, the two pipelines monitor the same source repository; however, codepipeline-customization-sandbox-pipeline-1 is executed only when config_1.json is updated and codepipeline-customization-sandbox-pipeline-2 is executed only when config_2.json is updated.

These CloudWatch Event and Lambda function combinations serve as a good general examples of the introduction of custom logic to pipeline kickoffs, and can be expanded to account for variously complex processing logic.

Cleanup

To avoid additional infrastructure costs from the examples described in this post, be sure to delete all CodeCommit repositories, CodePipeline pipelines, Lambda functions, and CodeBuild projects. When you delete a CodePipeline, the CloudWatch Events rule that was created automatically is deleted, even if the rule has been customized.

Conclusion

For scenarios which need you to define additional custom logic to control the execution of one or multiple pipelines, configuring a CloudWatch Event to trigger a Lambda function allows you to customize the conditions and types of events that can kick-off your pipeline.