Tag Archives: java

Amazon CodeGuru Reviewer Updates: New Java Detectors and CI/CD Integration with GitHub Actions

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/amazon_codeguru_reviewer_updates_new_java_detectors_and_cicd_integration_with_github_actions/

Amazon CodeGuru allows you to automate code reviews and improve code quality, and thanks to the new pricing model announced in April you can get started with a lower and fixed monthly rate based on the size of your repository (up to 90% less expensive). CodeGuru Reviewer helps you detect potential defects and bugs that are hard to find in your Java and Python applications, using the AWS Management Console, AWS SDKs, and AWS CLI.

Today, I’m happy to announce that CodeGuru Reviewer natively integrates with the tools that you use every day to package and deploy your code. This new CI/CD experience allows you to trigger code quality and security analysis as a step in your build process using GitHub Actions.

Although the CodeGuru Reviewer console still serves as an analysis hub for all your onboarded repositories, the new CI/CD experience allows you to integrate CodeGuru Reviewer more deeply with your favorite source code management and CI/CD tools.

And that’s not all! Today we’re also releasing 20 new security detectors for Java to help you identify even more issues related to security and AWS best practices.

A New CI/CD Experience for CodeGuru Reviewer
As a developer or development team, you push new code every day and want to identify security vulnerabilities early in the development cycle, ideally at every push. During a pull-request (PR) review, all the CodeGuru recommendations will appear as a comment, as if you had another pair of eyes on the PR. These comments include useful links to help you resolve the problem.

When you push new code or schedule a code review, recommendations will appear in the Security > Code scanning alerts tab on GitHub.

Let’s see how to integrate CodeGuru Reviewer with GitHub Actions.

First of all, create a .yml file in your repository under .github/workflows/ (or update an existing action). This file will contain all your actions’ step. Let’s go through the individual steps.

The first step is configuring your AWS credentials. You want to do this securely, without storing any credentials in your repository’s code, using the Configure AWS Credentials action. This action allows you to configure an IAM role that GitHub will use to interact with AWS services. This role will require a few permissions related to CodeGuru Reviewer and Amazon S3. You can attach the AmazonCodeGuruReviewerFullAccess managed policy to the action role, in addition to s3:GetObject, s3:PutObject and s3:ListBucket.

This first step will look as follows:

- name: Configure AWS Credentials
  uses: aws-actions/configure-aws-credentials@v1
  with:
    aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
    aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    aws-region: eu-west-1

These access key and secret key correspond to your IAM role and will be used to interact with CodeGuru Reviewer and Amazon S3.

Next, you add the CodeGuru Reviewer action and a final step to upload the results:

- name: Amazon CodeGuru Reviewer Scanner
  uses: aws-actions/codeguru-reviewer
  if: ${{ always() }} 
  with:
    build_path: target # build artifact(s) directory
    s3_bucket: 'codeguru-reviewer-myactions-bucket'  # S3 Bucket starting with "codeguru-reviewer-*"
- name: Upload review result
  if: ${{ always() }}
  uses: github/codeql-action/upload-sarif@v1
  with:
    sarif_file: codeguru-results.sarif.json

The CodeGuru Reviewer action requires two input parameters:

  • build_path: Where your build artifacts are in the repository.
  • s3_bucket: The name of an S3 bucket that you’ve created previously, used to upload the build artifacts and analysis results. It’s a customer-owned bucket so you have full control over access and permissions, in case you need to share its content with other systems.

Now, let’s put all the pieces together.

Your .yml file should look like this:

name: CodeGuru Reviewer GitHub Actions Integration
on: [pull_request, push, schedule]
jobs:
  CodeGuru-Reviewer-Actions:
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-2
	  - name: Amazon CodeGuru Reviewer Scanner
        uses: aws-actions/codeguru-reviewer
        if: ${{ always() }} 
        with:
          build_path: target # build artifact(s) directory
          s3_bucket: 'codeguru-reviewer-myactions-bucket'  # S3 Bucket starting with "codeguru-reviewer-*"
      - name: Upload review result
        if: ${{ always() }}
        uses: github/codeql-action/upload-sarif@v1
        with:
          sarif_file: codeguru-results.sarif.json

It’s important to remember that the S3 bucket name needs to start with codeguru_reviewer- and that these actions can be configured to run with the pull_request, push, or schedule triggers (check out the GitHub Actions documentation for the full list of events that trigger workflows). Also keep in mind that there are minor differences in how you configure GitHub-hosted runners and self-hosted runners, mainly in the credentials configuration step. For example, if you run your GitHub Actions in a self-hosted runner that already has access to AWS credentials, such as an EC2 instance, then you don’t need to provide any credentials to this action (check out the full documentation for self-hosted runners).

Now when you push a change or open a PR CodeGuru Reviewer will comment on your code changes with a few recommendations.

Or you can schedule a daily or weekly repository scan and check out the recommendations in the Security > Code scanning alerts tab.

New Security Detectors for Java
In December last year, we launched the Java Security Detectors for CodeGuru Reviewer to help you find and remediate potential security issues in your Java applications. These detectors are built with machine learning and automated reasoning techniques, trained on over 100,000 Amazon and open-source code repositories, and based on the decades of expertise of the AWS Application Security (AppSec) team.

For example, some of these detectors will look at potential leaks of sensitive information or credentials through excessively verbose logging, exception handling, and storing passwords in plaintext in memory. The security detectors also help you identify several web application vulnerabilities such as command injection, weak cryptography, weak hashing, LDAP injection, path traversal, secure cookie flag, SQL injection, XPATH injection, and XSS (cross-site scripting).

The new security detectors for Java can identify security issues with the Java Servlet APIs and web frameworks such as Spring. Some of the new detectors will also help you with security best practices for AWS APIs when using services such as Amazon S3, IAM, and AWS Lambda, as well as libraries and utilities such as Apache ActiveMQ, LDAP servers, SAML parsers, and password encoders.

Available Today at No Additional Cost
The new CI/CD integration and security detectors for Java are available today at no additional cost, excluding the storage on S3 which can be estimated based on size of your build artifacts and the frequency of code reviews. Check out the CodeGuru Reviewer Action in the GitHub Marketplace and the Amazon CodeGuru pricing page to find pricing examples based on the new pricing model we launched last month.

We’re looking forward to hearing your feedback, launching more detectors to help you identify potential issues, and integrating with even more CI/CD tools in the future.

You can learn more about the CI/CD experience and configuration in the technical documentation.

Alex

Announcing migration of the Java 8 runtime in AWS Lambda to Amazon Corretto

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/announcing-migration-of-the-java-8-runtime-in-aws-lambda-to-amazon-corretto/

This post is written by Jonathan Tuliani, Principal Product Manager, AWS Lambda.

What is happening?

Beginning July 19, 2021, the Java 8 managed runtime in AWS Lambda will migrate from the current Open Java Development Kit (OpenJDK) implementation to the latest Amazon Corretto implementation.

To reflect this change, the AWS Management Console will change how Java 8 runtimes are displayed. The display name for runtimes using the ‘java8’ identifier will change from ‘Java 8’ to ‘Java 8 on Amazon Linux 1’. The display name for runtimes using the ‘java8.al2’ identifier will change from ‘Java 8 (Corretto)’ to ‘Java 8 on Amazon Linux 2’. The ‘java8’ and ‘java8.al2’ identifiers themselves, as used by tools such as the AWS CLI, CloudFormation, and AWS SAM, will not change.

Why are you making this change?

This change enables customers to benefit from the latest innovations and extended support of the Amazon Corretto JDK distribution. Amazon Corretto is a no-cost, multiplatform, production-ready distribution of the OpenJDK. Corretto is certified as compatible with the Java SE standard and used internally at Amazon for many production services.

Amazon is committed to Corretto, and provides regular updates that include security fixes and performance enhancements. With this change, these benefits are available to all Lambda customers. For more information on improvements provided by Amazon Corretto 8, see Amazon Corretto 8 change logs.

How does this affect existing Java 8 functions?

Amazon Corretto 8 is designed as a drop-in replacement for OpenJDK 8. Most functions benefit seamlessly from the enhancements in this update without any action from you.

In rare cases, switching to Amazon Corretto 8 introduces compatibility issues. See below for known issues and guidance on how to verify compatibility in advance of this change.

When will this happen?

This migration to Amazon Corretto takes place in several stages:

  • June 15, 2021: Availability of Lambda layers for testing the compatibility of functions with the Amazon Corretto runtime. Start of AWS Management Console changes to java8 and java8.al2 display names.
  • July 19, 2021: Any new functions using the java8 runtime will use Amazon Corretto. If you update an existing function, it will transition to Amazon Corretto automatically. The public.ecr.aws/lambda/java:8 container base image is updated to use Amazon Corretto.
  • August 16, 2021: For functions that have not been updated since June 28, AWS will begin an automatic transition to the new Corretto runtime.
  • September 10, 2021: Migration completed.

These changes are only applied to functions not using the arn:aws:lambda:::awslayer:Java8Corretto or arn:aws:lambda:::awslayer:Java8OpenJDK layers described below.

Which of my Lambda functions are affected?

Lambda supports two versions of the Java 8 managed runtime: the java8 runtime, which runs on Amazon Linux 1, and the java8.al2 runtime, which runs on Amazon Linux 2. This change only affects functions using the java8 runtime. Functions the java8.al2 runtime are already using the Amazon Corretto implementation of Java 8 and are not affected.

The following command shows how to use the AWS CLI to list all functions in a specific Region using the java8 runtime. To find all such functions in your account, repeat this command for each Region:

aws lambda list-functions --function-version ALL --region us-east-1 --output text --query "Functions[?Runtime=='java8'].FunctionArn"

What do I need to do?

If you are using the java8 runtime, your functions will be updated automatically. For production workloads, we recommend that you test functions in advance for compatibility with Amazon Corretto 8.

For Lambda functions using container images, the existing public.ecr.aws/lambda/java:8 container base image will be updated to use the Amazon Corretto Java implementation. You must manually update your functions to use the updated container base image.

How can I test for compatibility with Amazon Corretto 8?

If you are using the java8 managed runtime, you can test functions with the new version of the runtime by adding the layer reference arn:aws:lambda:::awslayer:Java8Corretto to the function configuration. This layer instructs the Lambda service to use the Amazon Corretto implementation of Java 8. It does not contain any data or code.

If you are using container images, update the JVM in your image to Amazon Corretto for testing. Here is an example Dockerfile:

FROM public.ecr.aws/lambda/java:8

# Update the JVM to the latest Corretto version
## Import the Corretto public key
rpm --import https://yum.corretto.aws/corretto.key

## Add the Corretto yum repository to the system list
curl -L -o /etc/yum.repos.d/corretto.repo https://yum.corretto.aws/corretto.repo

## Install the latest version of Corretto 8
yum install -y java-1.8.0-amazon-corretto-devel

# Copy function code and runtime dependencies from Gradle layout
COPY build/classes/java/main ${LAMBDA_TASK_ROOT}
COPY build/dependency/* ${LAMBDA_TASK_ROOT}/lib/

# Set the CMD to your handler
CMD [ "com.example.LambdaHandler::handleRequest" ]

Can I continue to use the OpenJDK version of Java 8?

You can continue to use the OpenJDK version of Java 8 by adding the layer reference arn:aws:lambda:::awslayer:Java8OpenJDK to the function configuration. This layer tells the Lambda service to use the OpenJDK implementation of Java 8. It does not contain any data or code.

This option gives you more time to address any code incompatibilities with Amazon Corretto 8. We do not recommend that you use this option to continue to use Lambda’s OpenJDK Java implementation in the long term. Following this migration, it will no longer receive bug fix and security updates. After addressing any compatibility issues, remove this layer reference so that the function uses the Lambda-Amazon Corretto managed implementation of Java 8.

What are the known differences between OpenJDK 8 and Amazon Corretto 8 in Lambda?

Amazon Corretto caches TCP sessions for longer than OpenJDK 8. Functions that create new connections (for example, new AWS SDK clients) on each invoke without closing them may experience an increase in memory usage. In the worst case, this could cause the function to consume all the available memory, which results in an invoke error and a subsequent cold start.

We recommend that you do not create AWS SDK clients in your function handler on every function invocation. Instead, create SDK clients outside the function handler as static objects that can be used by multiple invocations. For more information, see static initialization in the Lambda Operator Guide.

If you must use a new client on every invocation, make sure it is shut down at the end of every invocation. This avoids TCP session caches using unnecessary resources.

What if I need additional help?

Contact AWS Support, the AWS Lambda discussion forums, or your AWS account team if you have any questions or concerns.

For more serverless learning resources, visit Serverless Land.

Always Name Your Thread Pools

Post Syndicated from Bozho original https://techblog.bozho.net/always-name-your-thread-pools/

Our software tends to use a lot of thread pools – mostly through java.util.concurrent.ExecutorService implementations (Created via Executors.new.... We create these for various async use-cases, and they can be seen all over the place. All of these executors have a thread factory. It’s hidden in the default factory method, but you can supply a thread factory. If not supplied, a default thread factory is used whenever a thread is needed.

When using spring, those can be created using <task:executor />. In that case, each executor service’s thread factory is provided by spring and it uses the name of the executor bean (specified with id="executorName"). But for those not created by spring, a default name is used which isn’t helpful and doesn’t let you differentiate threads by name.

And you need to differentiate threads by name – in case of performance issues you have various options to investigate: thread dumps and using the top command. In both cases it’s useful to know what function does a thread service, as the stacktrace in the dump might not always be revealing.

And my favorite tool for quick investigation is top. More precisely, top -H -p <pid>. This shows the usual top table, but the -H flag means that threads for the chosen process should be printed. You basically get the most CPU-heavy and currently active threads, by name. In those cases it’s extremely useful to have custom names.

But how do you set a name? By specifying a named thread factory when creating each executor. Here’s a stackoverflow answer with multiple ways to achieve thread naming.

The method that I’m using is based on the 2nd answer:

public class AsyncUtils {
    public static ThreadFactory createNamedThreadFactory(String name) {
        return new ThreadFactoryBuilder().setNameFormat(name + "-%d").build();
    }
}

Centrally managing all executors through spring might be a better idea, but not everyone is using spring and sometimes an executor is needed for a small piece of functionality that could even go outside spring beans. So it’s a good idea to have that method up your sleeve.

The post Always Name Your Thread Pools appeared first on Bozho's tech blog.

Improving AWS Java applications with Amazon CodeGuru Reviewer

Post Syndicated from Rajdeep Mukherjee original https://aws.amazon.com/blogs/devops/improving-aws-java-applications-with-amazon-codeguru-reviewer/

Amazon CodeGuru Reviewer is a machine learning (ML)-based AWS service for providing automated code reviews comments on your Java and Python applications. Powered by program analysis and ML, CodeGuru Reviewer detects hard-to-find bugs and inefficiencies in your code and leverages best practices learned from across millions of lines of open-source and Amazon code. You can start analyzing your code through pull requests and full repository analysis (for more information, see Automating code reviews and application profiling with Amazon CodeGuru).

The recommendations generated by CodeGuru Reviewer for Java fall into the following categories:

  • AWS best practices
  • Concurrency
  • Security
  • Resource leaks
  • Other specialized categories such as sensitive information leaks, input validation, and code clones
  • General best practices on data structures, control flow, exception handling, and more

We expect the recommendations to benefit beginners as well as expert Java programmers.

In this post, we showcase CodeGuru Reviewer recommendations related to using the AWS SDK for Java. For in-depth discussion of other specialized topics, see our posts on concurrency, security, and resource leaks. For Python applications, see Raising Python code quality using Amazon CodeGuru.

The AWS SDK for Java simplifies the use of AWS services by providing a set of features that are consistent and familiar for Java developers. The SDK has more than 250 AWS service clients, which are available on GitHub. Service clients include services like Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, Amazon Kinesis, Amazon Elastic Compute Cloud (Amazon EC2), AWS IoT, and Amazon SageMaker. These services constitute more than 6,000 operations, which you can use to access AWS services. With such rich and diverse services and APIs, developers may not always be aware of the nuances of AWS API usage. These nuances may not be important at the beginning, but become critical as the scale increases and the application evolves or becomes diverse. This is why CodeGuru Reviewer has a category of recommendations: AWS best practices. This category of recommendations enables you to become aware of certain features of AWS APIs so your code can be more correct and performant.

The first part of this post focuses on the key features of the AWS SDK for Java as well as API patterns in AWS services. The second part of this post demonstrates using CodeGuru Reviewer to improve code quality for Java applications that use the AWS SDK for Java.

AWS SDK for Java

The AWS SDK for Java supports higher-level abstractions for simplified development and provides support for cross-cutting concerns such as credential management, retries, data marshaling, and serialization. In this section, we describe a few key features that are supported in the AWS SDK for Java. Additionally, we discuss some key API patterns such as batching, and pagination, in AWS services.

The AWS SDK for Java has the following features:

  • Waiters Waiters are utility methods that make it easy to wait for a resource to transition into a desired state. Waiters makes it easier to abstract out the polling logic into a simple API call. The waiters interface provides a custom delay strategy to control the sleep time between retries, as well as a custom condition on whether polling of a resource should be retried. The AWS SDK for Java also offer an async variant of waiters.
  • Exceptions The AWS SDK for Java uses runtime (or unchecked) exceptions instead of checked exceptions in order to give you fine-grained control over the errors you want to handle and to prevent scalability issues inherent with checked exceptions in large applications. Broadly, the AWS SDK for Java has two types of exceptions:
    • AmazonClientException – Indicates that a problem occurred inside the Java client code, either while trying to send a request to AWS or while trying to parse a response from AWS. For example, the AWS SDK for Java throws an AmazonClientException if no network connection is available when you try to call an operation on one of the clients.
    • AmazonServiceException – Represents an error response from an AWS service. For example, if you try to end an EC2 instance that doesn’t exist, Amazon EC2 returns an error response, and all the details of that response are included in the AmazonServiceException that’s thrown. For some cases, a subclass of AmazonServiceException is thrown to allow you fine-grained control over handling error cases through catch blocks.

The API has the following patterns:

  • Batching – A batch operation provides you with the ability to perform a single CRUD operation (create, read, update, delete) on multiple resources. Some typical use cases include the following:
  • Pagination – Many AWS operations return paginated results when the response object is too large to return in a single response. To enable you to perform pagination, the request and response objects for many service clients in the SDK provide a continuation token (typically named NextToken) to indicate additional results.

AWS best practices

Now that we have summarized the SDK-specific features and API patterns, let’s look at the CodeGuru Reviewer recommendations on AWS API use.

The CodeGuru Reviewer recommendations for the AWS SDK for Java range from detecting outdated or deprecated APIs to warning about API misuse, missing pagination, authentication and exception scenarios, and using efficient API alternatives. In this section, we discuss a few examples patterned after real code.

Handling pagination

Over 1,000 APIs from more than 150 AWS services have pagination operations. The pagination best practice rule in CodeGuru covers all the pagination operations. In particular, the pagination rule checks if the Java application correctly fetches all the results of the pagination operation.

The response of a pagination operation in AWS SDK for Java 1.0 contains a token that has to be used to retrieve the next page of results. In the following code snippet, you make a call to listTables(), a DynamoDB ListTables operation, which can only return up to 100 table names per page. This code might not produce complete results because the operation returns paginated results instead of all results.

public void getDynamoDbTable() {
        AmazonDynamoDBClient client = new AmazonDynamoDBClient();
        List<String> tables = dynamoDbClient.listTables().getTableNames();
        System.out.println(tables)
}

CodeGuru Reviewer detects the missing pagination in the code snippet and makes the following recommendation to add another call to check for additional results.

Screenshot of recommendations for introducing pagination checks

You can accept the recommendation and add the logic to get the next page of table names by checking if a token (LastEvaluatedTableName in ListTablesResponse) is included in each response page. If such a token is present, it’s used in a subsequent request to fetch the next page of results. See the following code:

public void getDynamoDbTable() {
        AmazonDynamoDBClient client = new AmazonDynamoDBClient();
        ListTablesRequest listTablesRequest = ListTablesRequest.builder().build();
        boolean done = false;
        while (!done) {
            ListTablesResponse listTablesResponse = client.listTables(listTablesRequest);
	    System.out.println(listTablesResponse.tableNames());
            if (listTablesResponse.lastEvaluatedTableName() == null) {
                done = true;
            }
            listTablesRequest = listTablesRequest.toBuilder()
                    .exclusiveStartTableName(listTablesResponse.lastEvaluatedTableName())
                    .build();
        }
}

Handling failures in batch operation calls

Batch operations are common with many AWS services that process bulk requests. Batch operations can succeed without throwing exceptions even if some items in the request fail. Therefore, a recommended practice is to explicitly check for any failures in the result of the batch APIs. Over 40 APIs from more than 20 AWS services have batch operations. The best practice rule in CodeGuru Reviewer covers all the batch operations. In the following code snippet, you make a call to sendMessageBatch, a batch operation from Amazon SQS, but it doesn’t handle any errors returned by that batch operation:

public void flush(final String sqsEndPoint,
                     final List<SendMessageBatchRequestEntry> batch) {
    AwsSqsClientBuilder awsSqsClientBuilder;
    AmazonSQS sqsClient = awsSqsClientBuilder.build();
    if (batch.isEmpty()) {
        return;
    }
    sqsClient.sendMessageBatch(sqsEndPoint, batch);
}

CodeGuru Reviewer detects this issue and makes the following recommendation to check the return value for failures.

Screenshot of recommendations for batch operations

You can accept this recommendation and add logging for the complete list of messages that failed to send, in addition to throwing an SQSUpdateException. See the following code:

public void flush(final String sqsEndPoint,
                     final List<SendMessageBatchRequestEntry> batch) {
    AwsSqsClientBuilder awsSqsClientBuilder;
    AmazonSQS sqsClient = awsSqsClientBuilder.build();
    if (batch.isEmpty()) {
        return;
    }
    SendMessageBatchResult result = sqsClient.sendMessageBatch(sqsEndPoint, batch);
    final List<BatchResultErrorEntry> failed = result.getFailed();
    if (!failed.isEmpty()) {
           final String failedMessage = failed.stream()
                         .map(batchResultErrorEntry -> 
                            String.format("…", batchResultErrorEntry.getId(), 
                            batchResultErrorEntry.getMessage()))
                         .collect(Collectors.joining(","));
           throw new SQSUpdateException("Error occurred while sending 
                                        messages to SQS::" + failedMessage);
    }
}

Exception handling best practices

Amazon S3 is one of the most popular AWS services with our customers. A frequent operation with this service is to upload a stream-based object through an Amazon S3 client. Stream-based uploads might encounter occasional network connectivity or timeout issues, and the best practice to address such a scenario is to properly handle the corresponding ResetException error. ResetException extends SdkClientException, which subsequently extends AmazonClientException. Consider the following code snippet, which lacks such exception handling:

private void uploadInputStreamToS3(String bucketName, 
                                   InputStream input, 
                                   String key, ObjectMetadata metadata) 
                         throws SdkClientException {
    final AmazonS3Client amazonS3Client;
    PutObjectRequest putObjectRequest =
          new PutObjectRequest(bucketName, key, input, metadata);
    amazonS3Client.putObject(putObjectRequest);
}

In this case, CodeGuru Reviewer correctly detects the missing handling of the ResetException error and suggests possible solutions.

Screenshot of recommendations for handling exceptions

This recommendation is rich in that it provides alternatives to suit different use cases. The most common handling uses File or FileInputStream objects, but in other cases explicit handling of mark and reset operations are necessary to reliably avoid a ResetException.

You can fix the code by explicitly setting a predefined read limit using the setReadLimit method of RequestClientOptions. Its default value is 128 KB. Setting the read limit value to one byte greater than the size of stream reliably avoids a ResetException.

For example, if the maximum expected size of a stream is 100,000 bytes, set the read limit to 100,001 (100,000 + 1) bytes. The mark and reset always work for 100,000 bytes or less. However, this might cause some streams to buffer that number of bytes into memory.

The fix reliably avoids ResetException when uploading an object of type InputStream to Amazon S3:

private void uploadInputStreamToS3(String bucketName, InputStream input, 
                                   String key, ObjectMetadata metadata) 
                             throws SdkClientException {
        final AmazonS3Client amazonS3Client;
        final Integer READ_LIMIT = 10000;
        PutObjectRequest putObjectRequest =
   			new PutObjectRequest(bucketName, key, input, metadata);  
        putObjectRequest.getRequestClientOptions().setReadLimit(READ_LIMIT);
        amazonS3Client.putObject(putObjectRequest);
}

Replacing custom polling with waiters

A common activity when you’re working with services that are eventually consistent (such as DynamoDB) or have a lead time for creating resources (such as Amazon EC2) is to wait for a resource to transition into a desired state. The AWS SDK provides the Waiters API, a convenient and efficient feature for waiting that abstracts out the polling logic into a simple API call. If you’re not aware of this feature, you might come up with a custom, and potentially inefficient polling logic to determine whether a particular resource had transitioned into a desired state.

The following code appears to be waiting for the status of EC2 instances to change to shutting-down or terminated inside a while (true) loop:

private boolean terminateInstance(final String instanceId, final AmazonEC2 ec2Client)
    throws InterruptedException {
    long start = System.currentTimeMillis();
    while (true) {
        try {
            DescribeInstanceStatusResult describeInstanceStatusResult = 
                            ec2Client.describeInstanceStatus(new DescribeInstanceStatusRequest()
                            .withInstanceIds(instanceId).withIncludeAllInstances(true));
            List<InstanceStatus> instanceStatusList = 
                       describeInstanceStatusResult.getInstanceStatuses();
            long finish = System.currentTimeMillis();
            long timeElapsed = finish - start;
            if (timeElapsed > INSTANCE_TERMINATION_TIMEOUT) {
                break;
            }
            if (instanceStatusList.size() < 1) {
                Thread.sleep(WAIT_FOR_TRANSITION_INTERVAL);
                continue;
            }
            currentState = instanceStatusList.get(0).getInstanceState().getName();
            if ("shutting-down".equals(currentState) || "terminated".equals(currentState)) {
                return true;
             } else {
                 Thread.sleep(WAIT_FOR_TRANSITION_INTERVAL);
             }
        } catch (AmazonServiceException ex) {
            throw ex;
        }
        …
 }

CodeGuru Reviewer detects the polling scenario and recommends you use the waiters feature to help improve efficiency of such programs.

Screenshot of recommendations for introducing waiters feature

Based on the recommendation, the following code uses the waiters function that is available in the AWS SDK for Java. The polling logic is replaced with the waiters() function, which is then run with the call to waiters.run(…), which accepts custom provided parameters, including the request and optional custom polling strategy. The run function polls synchronously until it’s determined that the resource transitioned into the desired state or not. The SDK throws a WaiterTimedOutException if the resource doesn’t transition into the desired state even after a certain number of retries. The fixed code is more efficient, simple, and abstracts the polling logic to determine whether a particular resource had transitioned into a desired state into a simple API call:

public void terminateInstance(final String instanceId, final AmazonEC2 ec2Client)
    throws InterruptedException {
    Waiter<DescribeInstancesRequest> waiter = ec2Client.waiters().instanceTerminated();
    ec2Client.terminateInstances(new TerminateInstancesRequest().withInstanceIds(instanceId));
    try {
        waiter.run(new WaiterParameters()
              .withRequest(new DescribeInstancesRequest()
              .withInstanceIds(instanceId))
              .withPollingStrategy(new PollingStrategy(new MaxAttemptsRetryStrategy(60), 
                    new FixedDelayStrategy(5))));
    } catch (WaiterTimedOutException e) {
        List<InstanceStatus> instanceStatusList = ec2Client.describeInstanceStatus(
               new DescribeInstanceStatusRequest()
                        .withInstanceIds(instanceId)
                        .withIncludeAllInstances(true))
                        .getInstanceStatuses();
        String state;
        if (instanceStatusList != null && instanceStatusList.size() > 0) {
            state = instanceStatusList.get(0).getInstanceState().getName();
        }
    }
}

Service-specific best practice recommendations

In addition to the SDK operation-specific recommendations in the AWS SDK for Java we discussed, there are various AWS service-specific best practice recommendations pertaining to service APIs for services such as Amazon S3, Amazon EC2, DynamoDB, and more, where CodeGuru Reviewer can help to improve Java applications that use AWS service clients. For example, CodeGuru can detect the following:

  • Resource leaks in Java applications that use high-level libraries, such as the Amazon S3 TransferManager
  • Deprecated methods in various AWS services
  • Missing null checks on the response of the GetItem API call in DynamoDB
  • Missing error handling in the output of the PutRecords API call in Kinesis
  • Anti-patterns such as binding the SNS subscribe or createTopic operation with Publish operation

Conclusion

This post introduced how to use CodeGuru Reviewer to improve the use of the AWS SDK in Java applications. CodeGuru is now available for you to try. For pricing information, see Amazon CodeGuru pricing.

Understanding memory usage in your Java application with Amazon CodeGuru Profiler

Post Syndicated from Fernando Ciciliati original https://aws.amazon.com/blogs/devops/understanding-memory-usage-in-your-java-application-with-amazon-codeguru-profiler/

“Where has all that free memory gone?” This is the question we ask ourselves every time our application emits that dreaded OutOfMemoyError just before it crashes. Amazon CodeGuru Profiler can help you find the answer.

Thanks to its brand-new memory profiling capabilities, troubleshooting and resolving memory issues in Java applications (or almost anything that runs on the JVM) is much easier. AWS launched the CodeGuru Profiler Heap Summary feature at re:Invent 2020. This is the first step in helping us, developers, understand what our software is doing with all that memory it uses.

The Heap Summary view shows a list of Java classes and data types present in the Java Virtual Machine heap, alongside the amount of memory they’re retaining and the number of instances they represent. The following screenshot shows an example of this view.

Amazon CodeGuru Profiler heap summary view example

Figure: Amazon CodeGuru Profiler Heap Summary feature

Because CodeGuru Profiler is a low-overhead, production profiling service designed to be always on, it can capture and represent how memory utilization varies over time, providing helpful visual hints about the object types and the data types that exhibit a growing trend in memory consumption.

In the preceding screenshot, we can see that several lines on the graph are trending upwards:

  • The red top line, horizontal and flat, shows how much memory has been reserved as heap space in the JVM. In this case, we see a heap size of 512 MB, which can usually be configured in the JVM with command line parameters like -Xmx.
  • The second line from the top, blue, represents the total memory in use in the heap, independent of their type.
  • The third, fourth, and fifth lines show how much memory space each specific type has been using historically in the heap. We can easily spot that java.util.LinkedHashMap$Entry and java.lang.UUID display growing trends, whereas byte[] has a flat line and seems stable in memory usage.

Types that exhibit constantly growing trend of memory utilization with time deserve a closer look. Profiler helps you focus your attention on these cases. Associating the information presented by the Profiler with your own knowledge of your application and code base, you can evaluate whether the amount of memory being used for a specific data type can be considered normal, or if it might be a memory leak – the unintentional holding of memory by an application due to the failure in freeing-up unused objects. In our example above, java.util.LinkedHashMap$Entry and java.lang.UUIDare good candidates for investigation.

To make this functionality available to customers, CodeGuru Profiler uses the power of Java Flight Recorder (JFR), which is now openly available with Java 8 (since OpenJDK release 262) and above. The Amazon CodeGuru Profiler agent for Java, which already does an awesome job capturing data about CPU utilization, has been extended to periodically collect memory retention metrics from JFR and submit them for processing and visualization via Amazon CodeGuru Profiler. Thanks to its high stability and low overhead, the Profiler agent can be safely deployed to services in production, because it is exactly there, under real workloads, that really interesting memory issues are most likely to show up.

Summary

For more information about CodeGuru Profiler and other AI-powered services in the Amazon CodeGuru family, see Amazon CodeGuru. If you haven’t tried the CodeGuru Profiler yet, start your 90-day free trial right now and understand why continuous profiling is becoming a must-have in every production environment. For Amazon CodeGuru customers who are already enjoying the benefits of always-on profiling, this new feature is available at no extra cost. Just update your Profiler agent to version 1.1.0 or newer, and enable Heap Summary in your agent configuration.

 

Happy profiling!

Resource leak detection in Amazon CodeGuru Reviewer

Post Syndicated from Pranav Garg original https://aws.amazon.com/blogs/devops/resource-leak-detection-in-amazon-codeguru/

This post discusses the resource leak detector for Java in Amazon CodeGuru Reviewer. CodeGuru Reviewer automatically analyzes pull requests (created in supported repositories such as AWS CodeCommit, GitHub, GitHub Enterprise, and Bitbucket) and generates recommendations for improving code quality. For more information, see Automating code reviews and application profiling with Amazon CodeGuru. This blog does not describe the resource leak detector for Python programs that is now available in preview.

What are resource leaks?

Resources are objects with a limited availability within a computing system. These typically include objects managed by the operating system, such as file handles, database connections, and network sockets. Because the number of such resources in a system is limited, they must be released by an application as soon as they are used. Otherwise, you will run out of resources and you won’t be able to allocate new ones. The paradigm of acquiring a resource and releasing it is also followed by other categories of objects such as metric wrappers and timers.

Resource leaks are bugs that arise when a program doesn’t release the resources it has acquired. Resource leaks can lead to resource exhaustion. In the worst case, they can cause the system to slow down or even crash.

Starting with Java 7, most classes holding resources implement the java.lang.AutoCloseable interface and provide a close() method to release them. However, a close() call in source code doesn’t guarantee that the resource is released along all program execution paths. For example, in the following sample code, resource r is acquired by calling its constructor and is closed along the path corresponding to the if branch, shown using green arrows. To ensure that the acquired resource doesn’t leak, you must also close r along the path corresponding to the else branch (the path shown using red arrows).

A resource must be closed along all execution paths to prevent resource leaks

Often, resource leaks manifest themselves along code paths that aren’t frequently run, or under a heavy system load, or after the system has been running for a long time. As a result, such leaks are latent and can remain dormant in source code for long periods of time before manifesting themselves in production environments. This is the primary reason why resource leak bugs are difficult to detect or replicate during testing, and why automatically detecting these bugs during pull requests and code scans is important.

Detecting resource leaks in CodeGuru Reviewer

For this post, we consider the following Java code snippet. In this code, method getConnection() attempts to create a connection in the connection pool associated with a data source. Typically, a connection pool limits the maximum number of connections that can remain open at any given time. As a result, you must close connections after their use so as to not exhaust this limit.

 1     private Connection getConnection(final BasicDataSource dataSource, ...)
               throws ValidateConnectionException, SQLException {
 2         boolean connectionAcquired = false;
 3         // Retrying three times to get the connection.
 4         for (int attempt = 0; attempt < CONNECTION_RETRIES; ++attempt) {
 5             Connection connection = dataSource.getConnection();
 6             // validateConnection may throw ValidateConnectionException
 7             if (! validateConnection(connection, ...)) {
 8                 // connection is invalid
 9                 DbUtils.closeQuietly(connection);
10             } else {
11                 // connection is established
12                 connectionAcquired = true;
13                 return connection;
14             }
15         }
16         return null;
17     }

At first glance, it seems that the method getConnection() doesn’t leak connection resources. If a valid connection is established in the connection pool (else branch on line 10 is taken), the method getConnection() returns it to the client for use (line 13). If the connection established is invalid (if branch on line 7 is taken), it’s closed in line 9 before another attempt is made to establish a connection.

However, method validateConnection() at line 7 can throw a ValidateConnectionException. If this exception is thrown after a connection is established at line 5, the connection is neither closed in this method nor is it returned upstream to the client to be closed later. Furthermore, if this exceptional code path runs frequently, for instance, if the validation logic throws on a specific recurring service request, each new request causes a connection to leak in the connection pool. Eventually, the client can’t acquire new connections to the data source, impacting the availability of the service.

A typical recommendation to prevent resource leak bugs is to declare the resource objects in a try-with-resources statement block. However, we can’t use try-with-resources to fix the preceding method because this method is required to return an open connection for use in the upstream client. The CodeGuru Reviewer recommendation for the preceding code snippet is as follows:

“Consider closing the following resource: connection. The resource is referenced at line 7. The resource is closed at line 9. The resource is returned at line 13. There are other execution paths that don’t close the resource or return it, for example, when validateConnection throws an exception. To prevent this resource leak, close connection along these other paths before you exit this method.”

As mentioned in the Reviewer recommendation, to prevent this resource leak, you must close the established connection when method validateConnection() throws an exception. This can be achieved by inserting the validation logic (lines 7–14) in a try block. In the finally block associated with this try, the connection must be closed by calling DbUtils.closeQuietly(connection) if connectionAcquired == false. The method getConnection() after this fix has been applied is as follows:

private Connection getConnection(final BasicDataSource dataSource, ...) 
        throws ValidateConnectionException, SQLException {
    boolean connectionAcquired = false;
    // Retrying three times to get the connection.
    for (int attempt = 0; attempt < CONNECTION_RETRIES; ++attempt) {
        Connection connection = dataSource.getConnection();
        try {
            // validateConnection may throw ValidateConnectionException
            if (! validateConnection(connection, ...)) {
                // connection is invalid
                DbUtils.closeQuietly(connection);
            } else {
                // connection is established
                connectionAcquired = true;
                return connection;
            }
        } finally {
            if (!connectionAcquired) {
                DBUtils.closeQuietly(connection);
            }
        }
    }
    return null;
}

As shown in this example, resource leaks in production services can be very disruptive. Furthermore, leaks that manifest along exceptional or less frequently run code paths can be hard to detect or replicate during testing and can remain dormant in the code for long periods of time before manifesting themselves in production environments. With the resource leak detector, you can detect such leaks on objects belonging to a large number of popular Java types such as file streams, database connections, network sockets, timers and metrics, etc.

Combining static code analysis with machine learning for accurate resource leak detection

In this section, we dive deep into the inner workings of the resource leak detector. The resource leak detector in CodeGuru Reviewer uses static analysis algorithms and techniques. Static analysis algorithms perform code analysis without running the code. These algorithms are generally prone to high false positives (the tool might report correct code as having a bug). If the number of these false positives is high, it can lead to alarm fatigue and low adoption of the tool. As a result, the resource leak detector in CodeGuru Reviewer prioritizes precision over recall— the findings we surface are resource leaks with a high accuracy, though CodeGuru Reviewer could potentially miss some resource leak findings.

The main reason for false positives in static code analysis is incomplete information available to the analysis. CodeGuru Reviewer requires only the Java source files and doesn’t require all dependencies or the build artifacts. Not requiring the external dependencies or the build artifacts reduces the friction to perform automated code reviews. As a result, static analysis only has access to the code in the source repository and doesn’t have access to its external dependencies. The resource leak detector in CodeGuru Reviewer combines static code analysis with a machine learning (ML) model. This ML model is used to reason about external dependencies to provide accurate recommendations.

To understand the use of the ML model, consider again the code above for method getConnection() that had a resource leak. In the code snippet, a connection to the data source is established by calling BasicDataSource.getConnection() method, declared in the Apache Commons library. As mentioned earlier, we don’t require the source code of external dependencies like the Apache library for code analysis during pull requests. Without access to the code of external dependencies, a pure static analysis-driven technique doesn’t know whether the Connection object obtained at line 5 will leak, if not closed. Similarly, it doesn’t know that DbUtils.closeQuietly() is a library function that closes the connection argument passed to it at line 9. Our detector combines static code analysis with ML that learns patterns over such external function calls from a large number of available code repositories. As a result, our resource leak detector knows that the connection doesn’t leak along the following code path:

  • A connection is established on line 5
  • Method validateConnection() returns false at line 7
  • DbUtils.closeQuietly() is called on line 9

This suppresses the possible false warning. At the same time, the detector knows that there is a resource leak when the connection is established at line 5, and validateConnection() throws an exception at line 7 that isn’t caught.

When we run CodeGuru Reviewer on this code snippet, it surfaces only the second leak scenario and makes an appropriate recommendation to fix this bug.

The ML model used in the resource leak detector has been trained on a large number of internal Amazon and GitHub code repositories.

Responses to the resource leak findings

Although closing an open resource in code isn’t difficult, doing so properly along all program paths is important to prevent resource leaks. This can easily be overlooked, especially along exceptional or less frequently run paths. As a result, the resource leak detector in CodeGuru Reviewer has observed a relatively high frequency, and has alerted developers within Amazon to thousands of resource leaks before they hit production.

The resource leak detections have witnessed a high developer acceptance rate, and developer feedback towards the resource leak detector has been very positive. Some of the feedback from developers includes “Very cool, automated finding,” “Good bot :),” and “Oh man, this is cool.” Developers have also concurred that the findings are important and need to be fixed.

Conclusion

Resource leak bugs are difficult to detect or replicate during testing. They can impact the availability of production services. As a result, it’s important to automatically detect these bugs early on in the software development workflow, such as during pull requests or code scans. The resource leak detector in CodeGuru Reviewer combines static code analysis algorithms with ML to surface only the high confidence leaks. It has a high developer acceptance rate and has alerted developers within Amazon to thousands of leaks before those leaks hit production.

Continuously building and delivering Maven artifacts to AWS CodeArtifact

Post Syndicated from Vinay Selvaraj original https://aws.amazon.com/blogs/devops/continuously-building-and-delivering-maven-artifacts-to-aws-codeartifact/

Artifact repositories are often used to share software packages for use in builds and deployments. Java developers using Apache Maven use artifact repositories to share and reuse Maven packages. For example, one team might own a web service framework that is used by multiple other teams to build their own services. The framework team can publish the framework as a Maven package to an artifact repository, where new versions can be picked up by the service teams as they become available. This post explains how you can set up a continuous integration pipeline with AWS CodePipeline and AWS CodeBuild to deploy Maven artifacts to AWS CodeArtifact. CodeArtifact is a fully managed pay-as-you-go artifact repository service with support for software package managers and build tools like Maven, Gradle, npm, yarn, twine, and pip.

Solution overview

The pipeline we build is triggered each time a code change is pushed to the AWS CodeCommit repository. The code is compiled using the Java compiler, unit tested, and deployed to CodeArtifact. After the artifact is published, it can be consumed by developers working in applications that have a dependency on the artifact or by builds running in other pipelines. The following diagram illustrates this architecture.

Architecture diagram of the solution

 

All the components in this pipeline are fully managed and you don’t pay for idle capacity or have to manage any servers.

 

Prerequisites

This post assumes you have the following tools installed and configured:

 

Creating your resources

To create the CodeArtifact domain, CodeArtifact repository, CodeCommit, CodePipeline, CodeBuild, and associated resources, we use AWS CloudFormation. Save the provided CloudFormation template below as codeartifact-cicd-pipeline.yaml and create a stack:


---
Description: Code Artifact CI/CD Pipeline

Parameters:
  GitRepoBranchName:
    Type: String
    Default: main

Resources:

  ArtifactBucket:
    Type: AWS::S3::Bucket
  
  CodeArtifactDomain:
    Type: AWS::CodeArtifact::Domain
    Properties:
      DomainName: !Sub "${AWS::StackName}-domain"
  
  CodeArtifactRepository:
    Type: AWS::CodeArtifact::Repository
    Properties:
      DomainName: !GetAtt CodeArtifactDomain.Name
      RepositoryName: !Sub "${AWS::StackName}-repo"

  CodeRepository:
    Type: AWS::CodeCommit::Repository
    Properties:
      RepositoryDescription: Maven artifact code repository
      RepositoryName: !Sub "${AWS::StackName}-maven-artifact-repo"
  
  CodeBuildProject:
    Type: AWS::CodeBuild::Project
    Properties:
      Name: !Sub "${AWS::StackName}-CodeBuild"
      Artifacts:
        Type: CODEPIPELINE
      Environment:
        EnvironmentVariables:
          - Name: CODEARTIFACT_DOMAIN
            Type: PLAINTEXT
            Value: !GetAtt CodeArtifactDomain.Name
          - Name: CODEARTIFACT_REPO
            Type: PLAINTEXT
            Value: !GetAtt CodeArtifactRepository.Name
        Type: LINUX_CONTAINER
        ComputeType: BUILD_GENERAL1_SMALL
        Image: aws/codebuild/amazonlinux2-x86_64-standard:3.0
      ServiceRole: !GetAtt CodeBuildServiceRole.Arn
      Source:
        Type: CODEPIPELINE
        BuildSpec: buildspec.yaml
  
  Pipeline:
    Type: AWS::CodePipeline::Pipeline
    Properties:
      ArtifactStore:
        Type: S3
        Location: !Ref ArtifactBucket
      RoleArn: !GetAtt CodePipelineServiceRole.Arn
      Stages:
      - Name: Source
        Actions:
        - Name: SourceAction
          ActionTypeId:
            Category: Source
            Owner: AWS
            Version: '1'
            Provider: CodeCommit
          OutputArtifacts:
          - Name: SourceBundle
          Configuration:
            BranchName: !Ref GitRepoBranchName
            RepositoryName: !GetAtt CodeRepository.Name
          RunOrder: '1'

      - Name: Deliver
        Actions:
        - Name: CodeBuild
          InputArtifacts:
          - Name: SourceBundle
          ActionTypeId:
            Category: Build
            Owner: AWS
            Version: '1'
            Provider: CodeBuild
          Configuration:
            ProjectName: !Ref CodeBuildProject
          RunOrder: '1'

  CodeBuildServiceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Sid: ''
          Effect: Allow
          Principal:
            Service:
            - codebuild.amazonaws.com
          Action: sts:AssumeRole
      Policies:
      - PolicyName: CodePipelinePolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Sid: CloudWatchLogsPolicy
            Effect: Allow
            Action:
            - logs:CreateLogGroup
            - logs:CreateLogStream
            - logs:PutLogEvents
            Resource:
            - "*"
          - Sid: CodeCommitPolicy
            Effect: Allow
            Action:
            - codecommit:GitPull
            Resource:
            - !GetAtt CodeRepository.Arn
          - Sid: S3GetObjectPolicy
            Effect: Allow
            Action:
            - s3:GetObject
            - s3:GetObjectVersion
            Resource:
            - !Sub "arn:aws:s3:::${ArtifactBucket}/*"
          - Sid: S3PutObjectPolicy
            Effect: Allow
            Action:
            - s3:PutObject
            Resource:
            - !Sub "arn:aws:s3:::${ArtifactBucket}/*"
          - Sid: BearerTokenPolicy
            Effect: Allow
            Action:
            - sts:GetServiceBearerToken
            Resource: "*"
            Condition:
              StringEquals:
                sts:AWSServiceName: codeartifact.amazonaws.com
          - Sid: CodeArtifactPolicy
            Effect: Allow
            Action:
            - codeartifact:GetAuthorizationToken
            Resource:
            - !Sub "arn:aws:codeartifact:${AWS::Region}:${AWS::AccountId}:domain/${CodeArtifactDomain.Name}"
          - Sid: CodeArtifactPackage
            Effect: Allow
            Action:
            - codeartifact:PublishPackageVersion
            - codeartifact:PutPackageMetadata
            - codeartifact:ReadFromRepository
            Resource:
            - !Sub "arn:aws:codeartifact:${AWS::Region}:${AWS::AccountId}:package/${CodeArtifactDomain.Name}/${CodeArtifactRepository.Name}/*"
          - Sid: CodeArtifactRepository
            Effect: Allow
            Action:
            - codeartifact:ReadFromRepository
            - codeartifact:GetRepositoryEndpoint
            Resource:
            - !Sub "arn:aws:codeartifact:${AWS::Region}:${AWS::AccountId}:repository/${CodeArtifactDomain.Name}/${CodeArtifactRepository.Name}"          

  CodePipelineServiceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Sid: ''
          Effect: Allow
          Principal:
            Service:
            - codepipeline.amazonaws.com
          Action: sts:AssumeRole
      Policies:
      - PolicyName: CodePipelinePolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Action:
            - s3:GetObject
            - s3:GetObjectVersion
            - s3:GetBucketVersioning
            Resource: !Sub "arn:aws:s3:::${ArtifactBucket}/*"
            Effect: Allow
          - Action:
            - s3:PutObject
            Resource:
            - !Sub "arn:aws:s3:::${ArtifactBucket}/*"
            Effect: Allow
          - Action:
            - codecommit:GetBranch
            - codecommit:GetCommit
            - codecommit:UploadArchive
            - codecommit:GetUploadArchiveStatus
            - codecommit:CancelUploadArchive
            Resource:
              - !GetAtt CodeRepository.Arn
            Effect: Allow
          - Action:
            - codebuild:StartBuild
            - codebuild:BatchGetBuilds
            Resource: 
              - !GetAtt CodeBuildProject.Arn
            Effect: Allow
          - Action:
            - iam:PassRole
            Resource: "*"
            Effect: Allow
Outputs:
  CodePipelineArtifactBucket:
    Value: !Ref ArtifactBucket
  CodeRepositoryHttpCloneUrl:
    Value: !GetAtt CodeRepository.CloneUrlHttp
  CodeRepositorySshCloneUrl:
    Value: !GetAtt CodeRepository.CloneUrlSsh

aws cloudformation deploy                         \
  --stack-name codeartifact-pipeline               \
  --template-file codeartifact-cicd-pipeline.yaml  \
  --capabilities CAPABILITY_IAM

 

If you have a Maven project you want to use, you can use that. Otherwise, create a new one:


mvn archetype:generate        \
  -DgroupId=com.mycompany.app \
  -DartifactId=my-app         \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DarchetypeVersion=1.4 -DinteractiveMode=false

 

Initialize a Git repository for the Maven project and add the CodeCommit repository that was created in the CloudFormation stack as a remote repository:


cd my-app
git init
CODECOMMIT_URL=$(aws cloudformation describe-stacks --stack-name codeartifact-pipeline --query "Stacks[0].Outputs[?OutputKey=='CodeRepositoryHttpCloneUrl'].OutputValue" --output text)
git remote add origin $CODECOMMIT_URL

 

Updating the POM file

The Maven project’s POM file needs to be updated with the distribution management section. This lets Maven know where to publish artifacts. Add the distributionManagement section inside the project element of the POM. Be sure to update the URL with the correct URL for the CodeArtifact repository you created earlier. You can find the CodeArtifact repository URL with the get-repository-endpoint CLI command:


aws codeartifact get-repository-endpoint --domain codeartifact-pipeline-domain  --repository codeartifact-pipeline-repo --format maven

 

Add the following to the Maven project’s pom.xml:


<distributionManagement>
  <repository>
    <id>codeartifact</id>
    <name>codeartifact</name>
    <url>Replace with the URL from the get-repository-endpoint command</url>
  </repository>
</distributionManagement>

Creating a settings.xml file

Maven needs credentials to use to authenticate with CodeArtifact when it performs the deployment. CodeArtifact uses temporary authorization tokens. To pass the token to Maven, a settings.xml file is created in the top level of the Maven project. During the deployment stage, Maven is instructed to use the settings.xml in the top level of the project instead of the settings.xml that normally resides in $HOME/.m2. Create a settings.xml in the top level of the Maven project with the following contents:


<settings>
  <servers>
    <server>
      <id>codeartifact</id>
      <username>aws</username>
      <password>${env.CODEARTIFACT_TOKEN}</password>
    </server>
  </servers>
</settings>

Creating the buildspec.yaml file

CodeBuild uses a build specification file with commands and related settings that are used during the build, test, and delivery of the artifact. In the build specification file, we specify the CodeBuild runtime to use pre-build actions (update AWS CLI), and build actions (Maven build, test, and deploy). When Maven is invoked, it is provided the path to the settings.xml created in the previous step, instead of the default in $HOME/.m2/settings.xml. Create the buildspec.yaml as shown in the following code:


version: 0.2

phases:
  install:
    runtime-versions:
      java: corretto11

  pre_build:
    commands:
      - pip3 install awscli --upgrade --user

  build:
    commands:
      - export CODEARTIFACT_TOKEN=`aws codeartifact get-authorization-token --domain ${CODEARTIFACT_DOMAIN} --query authorizationToken --output text`
      - mvn -s settings.xml clean package deploy

 

Running the pipeline

The final step is to add the files in the Maven project to the Git repository and push the changes to CodeCommit. This triggers the pipeline to run. See the following code:


git checkout -b main
git add settings.xml buildspec.yaml pom.xml src
git commit -a -m "Initial commit"
git push --set-upstream origin main

 

Checking the pipeline

At this point, the pipeline starts to run. To check its progress, sign in to the AWS Management Console and choose the Region where you created the pipeline. On the CodePipeline console, open the pipeline that the CloudFormation stack created. The pipeline’s name is prefixed with the stack name. If you open the CodePipeline console before the pipeline is complete, you can watch each stage run (see the following screenshot).

CodePipeline Screenshot

If you see that the pipeline failed, you can choose the details in the action that failed for more information.

Checking for new artifacts published in CodeArtifact

When the pipeline is complete, you should be able to see the artifact in the CodeArtifact repository you created earlier. The artifact we published for this post is a Maven snapshot. CodeArtifact handles snapshots differently than release versions. For more information, see Use Maven snapshots. To find the artifact in CodeArtifact, complete the following steps:

  1. On the CodeArtifact console, choose Repositories.
  2. Choose the repository we created earlier named myrepo.
  3. Search for the package named my-app.
  4. Choose the my-app package from the search results.
    CodeArtifact Assets
  5. Choose the Dependencies tab to bring up a list of Maven dependencies that the Maven project depends on.CodeArtifact Dependencies

 

Cleaning up

To clean up the resources you created in this post, you need to remove them in the following order:


# Empty the CodePipeline S3 artifact bucket
CODEPIPELINE_BUCKET=$(aws cloudformation describe-stacks --stack-name codeartifact-pipeline --query "Stacks[0].Outputs[?OutputKey=='CodePipelineArtifactBucket'].OutputValue" --output text)
aws s3 rm s3://$CODEPIPELINE_BUCKET --recursive

# Delete the CloudFormation stack
aws cloudformation delete-stack --stack-name codeartifact-pipeline

Conclusion

This post covered how to build a continuous integration pipeline to deliver Maven artifacts to AWS CodeArtifact. You can modify this solution for your specific needs. For more information about CodeArtifact or the other services used, see the following:

 

Syntactic Sugar Is Not Always Good

Post Syndicated from Bozho original https://techblog.bozho.net/syntactic-sugar-is-not-always-good/

This write-up is partly inspired by a recent post by Vlad Mihalcea on LinkedIn about the recently introduced text blocks in Java. More about them can be read here.

Now, that’s a nice feature. I’ve used it in Scala several years ago, and other languages also have it, so it seems like a no-brainer to introduce it in Java.

But, syntactic sugar (please don’t argue whether that’s precisely syntactic sugar or not) can be problematic and lead to “syntactic diabetes”. It has two possible issues.

The less important one is consistency – if you can do one thing in multiple, equally valid ways, that introduces inconsistency in the code and pointless arguments of “the right way to do things”. In this context – for 2-line strings do you use a text block or not? Should you do multi-line formatting for simple strings or not? Should you configure checkstyle rules to reject one or the other option and in what circumstances?

The second, and bigger problem, is code readability. I know it sounds counter-intuitive, but bear with me. The example that Vlad gave illustrates that – do you want to have a 20-line SQL query in your Java code? I’d say no – you’d better extract that to a separate file, and using some form of templating, populate the right values. This is certainly readable when you browse the code:

String query = QueryUtils.loadQuery("age-query.sql", timestamp, diff, other);
// or
String query = QueryUtils.loadQuery("age-query.sql", 
       Arrays.asList("param1Name", "param2Name"), 
      Arrays.asList(param1Value, param2Value);

Queries can be placed in /src/main/resources and loaded as templates (and cached) by the QueryUtils. And because of the previous lack of text blocks, you would not prefer ugly string concatenated queries inside your code.

But now, with this feature, you are tempted to do that, because, well, it looks good. Same goes for Elasticsearch queries, for JSON templates and whatnot. With this “sugar” you have more incentive to just put them in the code, where they, arguably, make the code less readable. If you really have to debug the query, as opposed to assuming its semantics by its name and relying on a proper implementation, you can easily go to age-query.sql and work with it. Just like when you extract a private method with some implementation details so that it makes the calling method more readable and easy to follow.

Both problems have manifested themselves in my Scala experience, which I’ve summed up in my talk “Scala – the good, the bad and the very ugly” (only slides available). Scala allows you to express things nicely and in multiple ways, which leads to inconsistencies and horrible code readability in some cases.

Counter intuitively, sometimes the syntactic improvements may be worse for code readability. Because they introduce complexity and because they make it easier to do the wrong thing.

That’s not a universal complaint, and certainly syntactic sugar is needed – you don’t have to write List<String> list = new ArrayList<String>() if you can use the diamond operator. But each such feature should be well thought not just for how easy it makes to write the code, but also how easy it is to read it, and more importantly - what type of code it incentivizes.

The post Syntactic Sugar Is Not Always Good appeared first on Bozho's tech blog.

Complete CI/CD with AWS CodeCommit, AWS CodeBuild, AWS CodeDeploy, and AWS CodePipeline

Post Syndicated from Nitin Verma original https://aws.amazon.com/blogs/devops/complete-ci-cd-with-aws-codecommit-aws-codebuild-aws-codedeploy-and-aws-codepipeline/

Many organizations have been shifting to DevOps practices, which is the combination of cultural philosophies, practices, and tools that increases your organization’s ability to deliver applications and services at high velocity; for example, evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes.

DevOps-Feedback-Flow

An integral part of DevOps is adopting the culture of continuous integration and continuous delivery/deployment (CI/CD), where a commit or change to code passes through various automated stage gates, all the way from building and testing to deploying applications, from development to production environments.

This post uses the AWS suite of CI/CD services to compile, build, and install a version-controlled Java application onto a set of Amazon Elastic Compute Cloud (Amazon EC2) Linux instances via a fully automated and secure pipeline. The goal is to promote a code commit or change to pass through various automated stage gates all the way from development to production environments, across AWS accounts.

AWS services

This solution uses the following AWS services:

  • AWS CodeCommit – A fully-managed source control service that hosts secure Git-based repositories. CodeCommit makes it easy for teams to collaborate on code in a secure and highly scalable ecosystem. This solution uses CodeCommit to create a repository to store the application and deployment codes.
  • AWS CodeBuild – A fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy, on a dynamically created build server. This solution uses CodeBuild to build and test the code, which we deploy later.
  • AWS CodeDeploy – A fully managed deployment service that automates software deployments to a variety of compute services such as Amazon EC2, AWS Fargate, AWS Lambda, and your on-premises servers. This solution uses CodeDeploy to deploy the code or application onto a set of EC2 instances running CodeDeploy agents.
  • AWS CodePipeline – A fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates. This solution uses CodePipeline to create an end-to-end pipeline that fetches the application code from CodeCommit, builds and tests using CodeBuild, and finally deploys using CodeDeploy.
  • AWS CloudWatch Events – An AWS CloudWatch Events rule is created to trigger the CodePipeline on a Git commit to the CodeCommit repository.
  • Amazon Simple Storage Service (Amazon S3) – An object storage service that offers industry-leading scalability, data availability, security, and performance. This solution uses an S3 bucket to store the build and deployment artifacts created during the pipeline run.
  • AWS Key Management Service (AWS KMS) – AWS KMS makes it easy for you to create and manage cryptographic keys and control their use across a wide range of AWS services and in your applications. This solution uses AWS KMS to make sure that the build and deployment artifacts stored on the S3 bucket are encrypted at rest.

Overview of solution

This solution uses two separate AWS accounts: a dev account (111111111111) and a prod account (222222222222) in Region us-east-1.

We use the dev account to deploy and set up the CI/CD pipeline, along with the source code repo. It also builds and tests the code locally and performs a test deploy.

The prod account is any other account where the application is required to be deployed from the pipeline in the dev account.

In summary, the solution has the following workflow:

  • A change or commit to the code in the CodeCommit application repository triggers CodePipeline with the help of a CloudWatch event.
  • The pipeline downloads the code from the CodeCommit repository, initiates the Build and Test action using CodeBuild, and securely saves the built artifact on the S3 bucket.
  • If the preceding step is successful, the pipeline triggers the Deploy in Dev action using CodeDeploy and deploys the app in dev account.
  • If successful, the pipeline triggers the Deploy in Prod action using CodeDeploy and deploys the app in the prod account.

The following diagram illustrates the workflow:

cicd-overall-flow

 

Failsafe deployments

This example of CodeDeploy uses the IN_PLACE type of deployment. However, to minimize the downtime, CodeDeploy inherently supports multiple deployment strategies. This example makes use of following features: rolling deployments and automatic rollback.

CodeDeploy provides the following three predefined deployment configurations, to minimize the impact during application upgrades:

  • CodeDeployDefault.OneAtATime – Deploys the application revision to only one instance at a time
  • CodeDeployDefault.HalfAtATime – Deploys to up to half of the instances at a time (with fractions rounded down)
  • CodeDeployDefault.AllAtOnce – Attempts to deploy an application revision to as many instances as possible at once

For OneAtATime and HalfAtATime, CodeDeploy monitors and evaluates instance health during the deployment and only proceeds to the next instance or next half if the previous deployment is healthy. For more information, see Working with deployment configurations in CodeDeploy.

You can also configure a deployment group or deployment to automatically roll back when a deployment fails or when a monitoring threshold you specify is met. In this case, the last known good version of an application revision is automatically redeployed after a failure with the new application version.

How CodePipeline in the dev account deploys apps in the prod account

In this post, the deployment pipeline using CodePipeline is set up in the dev account, but it has permissions to deploy the application in the prod account. We create a special cross-account role in the prod account, which has the following:

  • Permission to use fetch artifacts (app) rom Amazon S3 and deploy it locally in the account using CodeDeploy
  • Trust with the dev account where the pipeline runs

CodePipeline in the dev account assumes this cross-account role in the prod account to deploy the app.

Do I need multiple accounts?
If you answer “yes” to any of the following questions you should consider creating more AWS accounts:

  • Does your business require administrative isolation between workloads? Administrative isolation by account is the most straightforward way to grant independent administrative groups different levels of administrative control over AWS resources based on workload, development lifecycle, business unit (BU), or data sensitivity.
  • Does your business require limited visibility and discoverability of workloads? Accounts provide a natural boundary for visibility and discoverability. Workloads cannot be accessed or viewed unless an administrator of the account enables access to users managed in another account.
  • Does your business require isolation to minimize blast radius? Separate accounts help define boundaries and provide natural blast-radius isolation to limit the impact of a critical event such as a security breach, an unavailable AWS Region or Availability Zone, account suspensions, and so on.
  • Does your business require a particular workload to operate within AWS service limits without impacting the limits of another workload? You can use AWS account service limits to impose restrictions on a business unit, development team, or project. For example, if you create an AWS account for a project group, you can limit the number of Amazon Elastic Compute Cloud (Amazon EC2) or high performance computing (HPC) instances that can be launched by the account.
  • Does your business require strong isolation of recovery or auditing data? If regulatory requirements require you to control access and visibility to auditing data, you can isolate the data in an account separate from the one where you run your workloads (for example, by writing AWS CloudTrail logs to a different account).

Prerequisites

For this walkthrough, you should complete the following prerequisites:

  1. Have access to at least two AWS accounts. For this post, the dev and prod accounts are in us-east-1. You can search and replace the Region and account IDs in all the steps and sample AWS Identity and Access Management (IAM) policies in this post.
  2. Ensure you have EC2 Linux instances with the CodeDeploy agent installed in all the accounts or VPCs where the sample Java application is to be installed (dev and prod accounts).
    • To manually create EC2 instances with CodeDeploy agent, refer Create an Amazon EC2 instance for CodeDeploy (AWS CLI or Amazon EC2 console). Keep in mind the following:
      • CodeDeploy uses EC2 instance tags to identify instances to use to deploy the application, so it’s important to set tags appropriately. For this post, we use the tag name Application with the value MyWebApp to identify instances where the sample app is installed.
      • Make sure to use an EC2 instance profile (AWS Service Role for EC2 instance) with permissions to read the S3 bucket containing artifacts built by CodeBuild. Refer to the IAM role cicd_ec2_instance_profile in the table Roles-1 below for the set of permissions required. You must update this role later with the actual KMS key and S3 bucket name created as part of the deployment process.
    • To create EC2 Linux instances via AWS Cloudformation, download and launch the AWS CloudFormation template from the GitHub repo: cicd-ec2-instance-with-codedeploy.json
      • This deploys an EC2 instance with AWS CodeDeploy agent.
      • Inputs required:
        • AMI : Enter name of the Linux AMI in your region. (This template has been tested with latest Amazon Linux 2 AMI)
        • Ec2SshKeyPairName: Name of an existing SSH KeyPair
        • Ec2IamInstanceProfile: Name of an existing EC2 instance profile. Note: Use the permissions in the template cicd_ec2_instance_profile_policy.json to create the policy for this EC2 Instance Profile role. You must update this role later with the actual KMS key and S3 bucket name created as part of the deployment process.
        • Update the EC2 instance Tags per your need.
  3. Ensure required IAM permissions. Have an IAM user with an IAM Group or Role that has the following access levels or permissions:

    AWS Service / Components  Access Level Accounts Comments
    AWS CodeCommit Full (admin) Dev Use AWS managed policy AWSCodeCommitFullAccess.
    AWS CodePipeline Full (admin) Dev Use AWS managed policy AWSCodePipelineFullAccess.
    AWS CodeBuild Full (admin) Dev Use AWS managed policy AWSCodeBuildAdminAccess.
    AWS CodeDeploy Full (admin) All

    Use AWS managed policy

    AWSCodeDeployFullAccess.

    Create S3 bucket and bucket policies Full (admin) Dev IAM policies can be restricted to specific bucket.
    Create KMS key and policies Full (admin) Dev IAM policies can be restricted to specific KMS key.
    AWS CloudFormation Full (admin) Dev

    Use AWS managed policy

    AWSCloudFormationFullAccess.

    Create and pass IAM roles Full (admin) All Ability to create IAM roles and policies can be restricted to specific IAM roles or actions. Also, an admin team with IAM privileges could create all the required roles. Refer to the IAM table Roles-1 below.
    AWS Management Console and AWS CLI As per IAM User permissions All To access suite of Code services.

     

  4. Create Git credentials for CodeCommit in the pipeline account (dev account). AWS allows you to either use Git credentials or associate SSH public keys with your IAM user. For this post, use Git credentials associated with your IAM user (created in the previous step). For instructions on creating a Git user, see Create Git credentials for HTTPS connections to CodeCommit. Download and save the Git credentials to use later for deploying the application.
  5. Create all AWS IAM roles as per the following tables (Roles-1). Make sure to update the following references in all the given IAM roles and policies:
    • Replace the sample dev account (111111111111) and prod account (222222222222) with actual account IDs
    • Replace the S3 bucket mywebapp-codepipeline-bucket-us-east-1-111111111111 with your preferred bucket name.
    • Replace the KMS key ID key/82215457-e360-47fc-87dc-a04681c91ce1 with your KMS key ID.

Table: Roles-1

Service IAM Role Type Account IAM Role Name (used for this post) IAM Role Policy (required for this post) IAM Role Permissions
AWS CodePipeline Service role Dev (111111111111)

cicd_codepipeline_service_role

Select Another AWS Account and use this account as the account ID to create the role.

Later update the trust as follows:
“Principal”: {“Service”: “codepipeline.amazonaws.com”},

Use the permissions in the template cicd_codepipeline_service_policy.json to create the policy for this role. This CodePipeline service role has appropriate permissions to the following services in a local account:

  • Manage CodeCommit repos
  • Initiate build via CodeBuild
  • Create deployments via CodeDeploy
  • Assume cross-account CodeDeploy role in prod account to deploy the application
AWS CodePipeline IAM role Dev (111111111111)

cicd_codepipeline_trigger_cwe_role

Select Another AWS Account and use this account as the account ID to create the role.

Later update the trust as follows:
“Principal”: {“Service”: “events.amazonaws.com”},

Use the permissions in the template cicd_codepipeline_trigger_cwe_policy.json to create the policy for this role. CodePipeline uses this role to set a CloudWatch event to trigger the pipeline when there is a change or commit made to the code repository.
AWS CodePipeline IAM role Prod (222222222222)

cicd_codepipeline_cross_ac_role

Choose Another AWS Account and use the dev account as the trusted account ID to create the role.

Use the permissions in the template cicd_codepipeline_cross_ac_policy.json to create the policy for this role. This role is created in the prod account and has permissions to use CodeDeploy and fetch from Amazon S3. The role is assumed by CodePipeline from the dev account to deploy the app in the prod account. Make sure to set up trust with the dev account for this IAM role on the Trust relationships tab.
AWS CodeBuild Service role Dev (111111111111)

cicd_codebuild_service_role

Choose CodeBuild as the use case to create the role.

Use the permissions in the template cicd_codebuild_service_policy.json to create the policy for this role. This CodeBuild service role has appropriate permissions to:

  • The S3 bucket to store artefacts
  • Stream logs to CloudWatch Logs
  • Pull code from CodeCommit
  • Get the SSM parameter for CodeBuild
  • Miscellaneous Amazon EC2 permissions
AWS CodeDeploy Service role Dev (111111111111) and Prod (222222222222)

cicd_codedeploy_service_role

Choose CodeDeploy as the use case to create the role.

Use the built-in AWS managed policy AWSCodeDeployRole for this role. This CodeDeploy service role has appropriate permissions to:

  • Miscellaneous Amazon EC2 Auto Scaling
  • Miscellaneous Amazon EC2
  • Publish Amazon SNS topic
  • AWS CloudWatch metrics
  • Elastic Load Balancing
EC2 Instance Service role for EC2 instance profile Dev (111111111111) and Prod (222222222222)

cicd_ec2_instance_profile

Choose EC2 as the use case to create the role.

Use the permissions in the template cicd_ec2_instance_profile_policy.json to create the policy for this role.

This is set as the EC2 instance profile for the EC2 instances where the app is deployed. It has appropriate permissions to fetch artefacts from Amazon S3 and decrypt contents using the KMS key.

 

You must update this role later with the actual KMS key and S3 bucket name created as part of the deployment process.

 

 

Setting up the prod account

To set up the prod account, complete the following steps:

  1. Download and launch the AWS CloudFormation template from the GitHub repo: cicd-codedeploy-prod.json
    • This deploys the CodeDeploy app and deployment group.
    • Make sure that you already have a set of EC2 Linux instances with the CodeDeploy agent installed in all the accounts where the sample Java application is to be installed (dev and prod accounts). If not, refer back to the Prerequisites section.
  2. Update the existing EC2 IAM instance profile (cicd_ec2_instance_profile):
    • Replace the S3 bucket name mywebapp-codepipeline-bucket-us-east-1-111111111111 with your S3 bucket name (the one used for the CodePipelineArtifactS3Bucket variable when you launched the CloudFormation template in the dev account).
    • Replace the KMS key ARN arn:aws:kms:us-east-1:111111111111:key/82215457-e360-47fc-87dc-a04681c91ce1 with your KMS key ARN (the one created as part of the CloudFormation template launch in the dev account).

Setting up the dev account

To set up your dev account, complete the following steps:

  1. Download and launch the CloudFormation template from the GitHub repo: cicd-aws-code-suite-dev.json
    The stack deploys the following services in the dev account:

    • CodeCommit repository
    • CodePipeline
    • CodeBuild environment
    • CodeDeploy app and deployment group
    • CloudWatch event rule
    • KMS key (used to encrypt the S3 bucket)
    • S3 bucket and bucket policy
  2. Use following values as inputs to the CloudFormation template. You should have created all the existing resources and roles beforehand as part of the prerequisites.

    Key Example Value Comments
    CodeCommitWebAppRepo MyWebAppRepo Name of the new CodeCommit repository for your web app.
    CodeCommitMainBranchName master Main branch name on your CodeCommit repository. Default is master (which is pushed to the prod environment).
    CodeBuildProjectName MyCBWebAppProject Name of the new CodeBuild environment.
    CodeBuildServiceRole arn:aws:iam::111111111111:role/cicd_codebuild_service_role ARN of an existing IAM service role to be associated with CodeBuild to build web app code.
    CodeDeployApp MyCDWebApp Name of the new CodeDeploy app to be created for your web app. We assume that the CodeDeploy app name is the same in all accounts where deployment needs to occur (in this case, the prod account).
    CodeDeployGroupDev MyCICD-Deployment-Group-Dev Name of the new CodeDeploy deployment group to be created in the dev account.
    CodeDeployGroupProd MyCICD-Deployment-Group-Prod Name of the existing CodeDeploy deployment group in prod account. Created as part of the prod account setup.

    CodeDeployGroupTagKey

     

    Application Name of the tag key that CodeDeploy uses to identify the existing EC2 fleet for the deployment group to use.

    CodeDeployGroupTagValue

     

    MyWebApp Value of the tag that CodeDeploy uses to identify the existing EC2 fleet for the deployment group to use.
    CodeDeployConfigName CodeDeployDefault.OneAtATime

    Desired Code Deploy config name. Valid options are:

    CodeDeployDefault.OneAtATime

    CodeDeployDefault.HalfAtATime

    CodeDeployDefault.AllAtOnce

    For more information, see Deployment configurations on an EC2/on-premises compute platform.

    CodeDeployServiceRole arn:aws:iam::111111111111:role/cicd_codedeploy_service_role

    ARN of an existing IAM service role to be associated with CodeDeploy to deploy web app.

     

    CodePipelineName MyWebAppPipeline Name of the new CodePipeline to be created for your web app.
    CodePipelineArtifactS3Bucket mywebapp-codepipeline-bucket-us-east-1-111111111111 Name of the new S3 bucket to be created where artifacts for the pipeline are stored for this web app.
    CodePipelineServiceRole arn:aws:iam::111111111111:role/cicd_codepipeline_service_role ARN of an existing IAM service role to be associated with CodePipeline to deploy web app.
    CodePipelineCWEventTriggerRole arn:aws:iam::111111111111:role/cicd_codepipeline_trigger_cwe_role ARN of an existing IAM role used to trigger the pipeline you named earlier upon a code push to the CodeCommit repository.
    CodeDeployRoleXAProd arn:aws:iam::222222222222:role/cicd_codepipeline_cross_ac_role ARN of an existing IAM role in the cross-account for CodePipeline to assume to deploy the app.

    It should take 5–10 minutes for the CloudFormation stack to complete. When the stack is complete, you can see that CodePipeline has built the pipeline (MyWebAppPipeline) with the CodeCommit repository and CodeBuild environment, along with actions for CodeDeploy in local (dev) and cross-account (prod). CodePipeline should be in a failed state because your CodeCommit repository is empty initially.

  3. Update the existing Amazon EC2 IAM instance profile (cicd_ec2_instance_profile):
    • Replace the S3 bucket name mywebapp-codepipeline-bucket-us-east-1-111111111111 with your S3 bucket name (the one used for the CodePipelineArtifactS3Bucket parameter when launching the CloudFormation template in the dev account).
    • Replace the KMS key ARN arn:aws:kms:us-east-1:111111111111:key/82215457-e360-47fc-87dc-a04681c91ce1 with your KMS key ARN (the one created as part of the CloudFormation template launch in the dev account).

Deploying the application

You’re now ready to deploy the application via your desktop or PC.

  1. Assuming you have the required HTTPS Git credentials for CodeCommit as part of the prerequisites, clone the CodeCommit repo that was created earlier as part of the dev account setup. Obtain the name of the CodeCommit repo to clone, from the CodeCommit console. Enter the Git user name and password when prompted. For example:
    $ git clone https://git-codecommit.us-east-1.amazonaws.com/v1/repos/MyWebAppRepo my-web-app-repo
    Cloning into 'my-web-app-repo'...
    Username for 'https://git-codecommit.us-east-1.amazonaws.com/v1/repos/MyWebAppRepo': xxxx
    Password for 'https://[email protected]/v1/repos/MyWebAppRepo': xxxx

  2. Download the MyWebAppRepo.zip file containing a sample Java application, CodeBuild configuration to build the app, and CodeDeploy config file to deploy the app.
  3. Copy and unzip the file into the my-web-app-repo Git repository folder created earlier.
  4. Assuming this is the sample app to be deployed, commit these changes to the Git repo. For example:
    $ cd my-web-app-repo 
    $ git add -A 
    $ git commit -m "initial commit" 
    $ git push

For more information, see Tutorial: Create a simple pipeline (CodeCommit repository).

After you commit the code, the CodePipeline will be triggered and all the stages and your application should be built, tested, and deployed all the way to the production environment!

The following screenshot shows the entire pipeline and its latest run:

 

Troubleshooting

To troubleshoot any service-related issues, see the following:

Cleaning up

To avoid incurring future charges or to remove any unwanted resources, delete the following:

  • EC2 instance used to deploy the application
  • CloudFormation template to remove all AWS resources created through this post
  •  IAM users or roles

Conclusion

Using this solution, you can easily set up and manage an entire CI/CD pipeline in AWS accounts using the native AWS suite of CI/CD services, where a commit or change to code passes through various automated stage gates all the way from building and testing to deploying applications, from development to production environments.

FAQs

In this section, we answer some frequently asked questions:

  1. Can I expand this deployment to more than two accounts?
    • Yes. You can deploy a pipeline in a tooling account and use dev, non-prod, and prod accounts to deploy code on EC2 instances via CodeDeploy. Changes are required to the templates and policies accordingly.
  2. Can I ensure the application isn’t automatically deployed in the prod account via CodePipeline and needs manual approval?
  3. Can I use a CodeDeploy group with an Auto Scaling group?
    • Yes. Minor changes required to the CodeDeploy group creation process. Refer to the following Solution Variations section for more information.
  4. Can I use this pattern for EC2 Windows instances?

Solution variations

In this section, we provide a few variations to our solution:

Author bio

author-pic

 Nitin Verma

Nitin is currently a Sr. Cloud Architect in the AWS Managed Services(AMS). He has many years of experience with DevOps-related tools and technologies. Speak to your AWS Managed Services representative to deploy this solution in AMS!

 

Bullet Updates – Windowing, Apache Pulsar PubSub, Configuration-based Data Ingestion, and More

Post Syndicated from rosaliebeevm original https://yahooeng.tumblr.com/post/183315480351

yahoodevelopers:

By Akshay Sarma, Principal Engineer, Verizon Media & Brian Xiao, Software Engineer, Verizon Media

This is the first of an ongoing series of blog posts sharing releases and announcements for Bullet, an open-sourced lightweight, scalable, pluggable, multi-tenant query system.

Bullet allows you to query any data flowing through a streaming system without having to store it first through its UI or API. The queries are injected into the running system and have minimal overhead. Running hundreds of queries generally fit into the overhead of just reading the streaming data. Bullet requires running an instance of its backend on your data. This backend runs on common stream processing frameworks (Storm and Spark Streaming currently supported).

The data on which Bullet sits determines what it is used for. For example, our team runs an instance of Bullet on user engagement data (~1M events/sec) to let developers find their own events to validate their code that produces this data. We also use this instance to interactively explore data, throw up quick dashboards to monitor live releases, count unique users, debug issues, and more.

Since open sourcing Bullet in 2017, we’ve been hard at work adding many new features! We’ll highlight some of these here and continue sharing update posts for future releases.

Windowing

Bullet used to operate in a request-response fashion – you would submit a query and wait for the query to meet its termination conditions (usually duration) before receiving results. For short-lived queries, say, a few seconds, this was fine. But as we started fielding more interactive and iterative queries, waiting even a minute for results became too cumbersome.

Enter windowing! Bullet now supports time and record-based windowing. With time windowing, you can break up your query into chunks of time over its duration and retrieve results for each chunk.  For example, you can calculate the average of a field, and stream back results every second:

In the above example, the aggregation is operating on all the data since the beginning of the query, but you can also do aggregations on just the windows themselves. This is often called a Tumbling window:

image

With record windowing, you can get the intermediate aggregation for each record that matches your query (a Sliding window). Or you can do a Tumbling window on records rather than time. For example, you could get results back every three records:

image

Overlapping windows in other ways (Hopping windows) or windows that reset based on different criteria (Session windows, Cascading windows) are currently being worked on. Stay tuned!

image
image

Apache Pulsar support as a native PubSub

Bullet uses a PubSub (publish-subscribe) message queue to send queries and results between the Web Service and Backend. As with everything else in Bullet, the PubSub is pluggable. You can use your favorite pubsub by implementing a few interfaces if you don’t want to use the ones we provide. Until now, we’ve maintained and supported a REST-based PubSub and an Apache Kafka PubSub. Now we are excited to announce supporting Apache Pulsar as well! Bullet Pulsar will be useful to those users who want to use Pulsar as their underlying messaging service.

If you aren’t familiar with Pulsar, setting up a local standalone is very simple, and by default, any Pulsar topics written to will automatically be created. Setting up an instance of Bullet with Pulsar instead of REST or Kafka is just as easy. You can refer to our documentation for more details.

image

Plug your data into Bullet without code

While Bullet worked on any data source located in any persistence layer, you still had to implement an interface to connect your data source to the Backend and convert it into a record container format that Bullet understands. For instance, your data might be located in Kafka and be in the Avro format. If you were using Bullet on Storm, you would perhaps write a Storm Spout to read from Kafka, deserialize, and convert the Avro data into the Bullet record format. This was the only interface in Bullet that required our customers to write their own code. Not anymore! Bullet DSL is a text/configuration-based format for users to plug in their data to the Bullet Backend without having to write a single line of code.

Bullet DSL abstracts away the two major components for plugging data into the Bullet Backend. A Connector piece to read from arbitrary data-sources and a Converter piece to convert that read data into the Bullet record container. We currently support and maintain a few of these – Kafka and Pulsar for Connectors and Avro, Maps and arbitrary Java POJOs for Converters. The Converters understand typed data and can even do a bit of minor ETL (Extract, Transform and Load) if you need to change your data around before feeding it into Bullet. As always, the DSL components are pluggable and you can write your own (and contribute it back!) if you need one that we don’t support.

We appreciate your feedback and contributions! Explore Bullet on GitHub, use and help contribute to the project, and chat with us on Google Groups. To get started, try our Quickstarts on Spark or Storm to set up an instance of Bullet on some fake data and play around with it.