Tag Archives: java

Improving AWS Java applications with Amazon CodeGuru Reviewer

Post Syndicated from Rajdeep Mukherjee original https://aws.amazon.com/blogs/devops/improving-aws-java-applications-with-amazon-codeguru-reviewer/

Amazon CodeGuru Reviewer is a machine learning (ML)-based AWS service that provides automated code review comments on your Java and Python applications. Powered by program analysis and ML, CodeGuru Reviewer detects hard-to-find bugs and inefficiencies in your code and leverages best practices learned from millions of lines of open-source and Amazon code. You can start analyzing your code through pull requests and full repository analysis (for more information, see Automating code reviews and application profiling with Amazon CodeGuru).

The recommendations generated by CodeGuru Reviewer for Java fall into the following categories:

  • AWS best practices
  • Concurrency
  • Security
  • Resource leaks
  • Other specialized categories such as sensitive information leaks, input validation, and code clones
  • General best practices on data structures, control flow, exception handling, and more

We expect the recommendations to benefit beginners as well as expert Java programmers.

In this post, we showcase CodeGuru Reviewer recommendations related to using the AWS SDK for Java. For in-depth discussion of other specialized topics, see our posts on concurrency, security, and resource leaks. For Python applications, see Raising Python code quality using Amazon CodeGuru.

The AWS SDK for Java simplifies the use of AWS services by providing a set of features that are consistent and familiar for Java developers. The SDK has more than 250 AWS service clients, which are available on GitHub. Service clients cover services such as Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, Amazon Kinesis, Amazon Elastic Compute Cloud (Amazon EC2), AWS IoT, and Amazon SageMaker, and together they expose more than 6,000 operations. With such rich and diverse services and APIs, developers may not always be aware of the nuances of AWS API usage. These nuances may not matter at first, but they become critical as scale increases and the application evolves. This is why CodeGuru Reviewer has an AWS best practices category of recommendations, which makes you aware of these API features so your code can be more correct and performant.

The first part of this post focuses on the key features of the AWS SDK for Java as well as API patterns in AWS services. The second part of this post demonstrates using CodeGuru Reviewer to improve code quality for Java applications that use the AWS SDK for Java.

AWS SDK for Java

The AWS SDK for Java supports higher-level abstractions for simplified development and provides support for cross-cutting concerns such as credential management, retries, data marshaling, and serialization. In this section, we describe a few key features that are supported in the AWS SDK for Java. Additionally, we discuss key API patterns in AWS services, such as batching and pagination.

The AWS SDK for Java has the following features:

  • Waiters – Waiters are utility methods that make it easy to wait for a resource to transition into a desired state. They abstract the polling logic into a simple API call. The waiters interface provides a custom delay strategy to control the sleep time between retries, as well as a custom condition on whether polling of a resource should be retried. The AWS SDK for Java also offers an async variant of waiters.
  • Exceptions – The AWS SDK for Java uses runtime (or unchecked) exceptions instead of checked exceptions, in order to give you fine-grained control over the errors you want to handle and to prevent the scalability issues inherent with checked exceptions in large applications. Broadly, the AWS SDK for Java has two types of exceptions (see the sketch after this list):
    • AmazonClientException – Indicates that a problem occurred inside the Java client code, either while trying to send a request to AWS or while trying to parse a response from AWS. For example, the AWS SDK for Java throws an AmazonClientException if no network connection is available when you try to call an operation on one of the clients.
    • AmazonServiceException – Represents an error response from an AWS service. For example, if you try to terminate an EC2 instance that doesn’t exist, Amazon EC2 returns an error response, and all the details of that response are included in the AmazonServiceException that’s thrown. In some cases, a subclass of AmazonServiceException is thrown to allow you fine-grained control over handling error cases through catch blocks.
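
The following minimal sketch illustrates how the two exception types are typically distinguished when calling a v1 service client. The bucket and key names are hypothetical, and the way the client is obtained is an assumption for the example:

import com.amazonaws.AmazonClientException;
import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class SdkExceptionExample {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        try {
            s3.getObjectMetadata("my-example-bucket", "missing-key");
        } catch (AmazonServiceException e) {
            // The request reached AWS, but the service returned an error response.
            System.err.println("Service error: " + e.getErrorCode() + " (HTTP " + e.getStatusCode() + ")");
        } catch (AmazonClientException e) {
            // The request never got a valid response, for example because no network connection was available.
            System.err.println("Client error: " + e.getMessage());
        }
    }
}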

The API has the following patterns:

  • Batching – A batch operation provides you with the ability to perform a single CRUD operation (create, read, update, delete) on multiple resources. Typical use cases include sending a batch of messages to Amazon SQS or writing multiple items to DynamoDB in a single request.
  • Pagination – Many AWS operations return paginated results when the response object is too large to return in a single response. To enable you to perform pagination, the request and response objects for many service clients in the SDK provide a continuation token (typically named NextToken) to indicate additional results.

AWS best practices

Now that we have summarized the SDK-specific features and API patterns, let’s look at the CodeGuru Reviewer recommendations on AWS API use.

The CodeGuru Reviewer recommendations for the AWS SDK for Java range from detecting outdated or deprecated APIs to warning about API misuse, missing pagination, authentication and exception scenarios, and using efficient API alternatives. In this section, we discuss a few examples patterned after real code.

Handling pagination

Over 1,000 APIs from more than 150 AWS services have pagination operations. The pagination best practice rule in CodeGuru covers all the pagination operations. In particular, the pagination rule checks if the Java application correctly fetches all the results of the pagination operation.

The response of a pagination operation in AWS SDK for Java 1.0 contains a token that has to be used to retrieve the next page of results. In the following code snippet, you make a call to listTables(), a DynamoDB ListTables operation, which can only return up to 100 table names per page. This code might not produce complete results because the operation returns paginated results instead of all results.

public void getDynamoDbTable() {
        AmazonDynamoDBClient client = new AmazonDynamoDBClient();
        List<String> tables = client.listTables().getTableNames();
        System.out.println(tables);
}

CodeGuru Reviewer detects the missing pagination in the code snippet and makes the following recommendation to add another call to check for additional results.

Screenshot of recommendations for introducing pagination checks

You can accept the recommendation and add the logic to get the next page of table names by checking if a token (LastEvaluatedTableName in the ListTablesResult) is included in each response page. If such a token is present, it’s used in a subsequent request to fetch the next page of results. See the following code:

public void getDynamoDbTable() {
        AmazonDynamoDBClient client = new AmazonDynamoDBClient();
        ListTablesRequest listTablesRequest = new ListTablesRequest();
        boolean done = false;
        while (!done) {
            ListTablesResult listTablesResult = client.listTables(listTablesRequest);
            System.out.println(listTablesResult.getTableNames());
            if (listTablesResult.getLastEvaluatedTableName() == null) {
                done = true;
            }
            listTablesRequest.setExclusiveStartTableName(listTablesResult.getLastEvaluatedTableName());
        }
}

Handling failures in batch operation calls

Batch operations are common with many AWS services that process bulk requests. Batch operations can succeed without throwing exceptions even if some items in the request fail. Therefore, a recommended practice is to explicitly check for any failures in the result of the batch APIs. Over 40 APIs from more than 20 AWS services have batch operations. The best practice rule in CodeGuru Reviewer covers all the batch operations. In the following code snippet, you make a call to sendMessageBatch, a batch operation from Amazon SQS, but it doesn’t handle any errors returned by that batch operation:

public void flush(final String sqsEndPoint,
                  final List<SendMessageBatchRequestEntry> batch) {
    final AmazonSQS sqsClient = AmazonSQSClientBuilder.defaultClient();
    if (batch.isEmpty()) {
        return;
    }
    sqsClient.sendMessageBatch(sqsEndPoint, batch);
}

CodeGuru Reviewer detects this issue and makes the following recommendation to check the return value for failures.

Screenshot of recommendations for batch operations

You can accept this recommendation and add logging for the complete list of messages that failed to send, in addition to throwing an SQSUpdateException. See the following code:

public void flush(final String sqsEndPoint,
                  final List<SendMessageBatchRequestEntry> batch) {
    final AmazonSQS sqsClient = AmazonSQSClientBuilder.defaultClient();
    if (batch.isEmpty()) {
        return;
    }
    final SendMessageBatchResult result = sqsClient.sendMessageBatch(sqsEndPoint, batch);
    final List<BatchResultErrorEntry> failed = result.getFailed();
    if (!failed.isEmpty()) {
        final String failedMessage = failed.stream()
                .map(batchResultErrorEntry ->
                        String.format("…", batchResultErrorEntry.getId(),
                                batchResultErrorEntry.getMessage()))
                .collect(Collectors.joining(","));
        throw new SQSUpdateException("Error occurred while sending messages to SQS::" + failedMessage);
    }
}

Exception handling best practices

Amazon S3 is one of the most popular AWS services with our customers. A frequent operation with this service is to upload a stream-based object through an Amazon S3 client. Stream-based uploads might encounter occasional network connectivity or timeout issues, and the best practice to address such a scenario is to properly handle the corresponding ResetException error. ResetException extends SdkClientException, which subsequently extends AmazonClientException. Consider the following code snippet, which lacks such exception handling:

private void uploadInputStreamToS3(String bucketName,
                                   InputStream input,
                                   String key, ObjectMetadata metadata)
                         throws SdkClientException {
    final AmazonS3 amazonS3Client = AmazonS3ClientBuilder.defaultClient();
    PutObjectRequest putObjectRequest =
          new PutObjectRequest(bucketName, key, input, metadata);
    amazonS3Client.putObject(putObjectRequest);
}

In this case, CodeGuru Reviewer correctly detects the missing handling of the ResetException error and suggests possible solutions.

Screenshot of recommendations for handling exceptions

This recommendation is rich in that it provides alternatives to suit different use cases. The most common handling uses File or FileInputStream objects, but in other cases explicit handling of mark and reset operations is necessary to reliably avoid a ResetException.

You can fix the code by explicitly setting a predefined read limit using the setReadLimit method of RequestClientOptions. Its default value is 128 KB. Setting the read limit value to one byte greater than the size of the stream reliably avoids a ResetException.

For example, if the maximum expected size of a stream is 100,000 bytes, set the read limit to 100,001 (100,000 + 1) bytes. The mark and reset always work for 100,000 bytes or less. However, this might cause some streams to buffer that number of bytes into memory.

The fix reliably avoids ResetException when uploading an object of type InputStream to Amazon S3:

private void uploadInputStreamToS3(String bucketName, InputStream input,
                                   String key, ObjectMetadata metadata)
                             throws SdkClientException {
    final AmazonS3 amazonS3Client = AmazonS3ClientBuilder.defaultClient();
    final int READ_LIMIT = 10000;
    PutObjectRequest putObjectRequest =
            new PutObjectRequest(bucketName, key, input, metadata);
    putObjectRequest.getRequestClientOptions().setReadLimit(READ_LIMIT);
    amazonS3Client.putObject(putObjectRequest);
}

Replacing custom polling with waiters

A common activity when you’re working with services that are eventually consistent (such as DynamoDB) or have a lead time for creating resources (such as Amazon EC2) is to wait for a resource to transition into a desired state. The AWS SDK provides the Waiters API, a convenient and efficient feature that abstracts the polling logic into a simple API call. If you’re not aware of this feature, you might write custom, and potentially inefficient, polling logic to determine whether a particular resource has transitioned into a desired state.

The following code appears to be waiting for the status of EC2 instances to change to shutting-down or terminated inside a while (true) loop:

private boolean terminateInstance(final String instanceId, final AmazonEC2 ec2Client)
    throws InterruptedException {
    long start = System.currentTimeMillis();
    while (true) {
        try {
            DescribeInstanceStatusResult describeInstanceStatusResult = 
                            ec2Client.describeInstanceStatus(new DescribeInstanceStatusRequest()
                            .withInstanceIds(instanceId).withIncludeAllInstances(true));
            List<InstanceStatus> instanceStatusList = 
                       describeInstanceStatusResult.getInstanceStatuses();
            long finish = System.currentTimeMillis();
            long timeElapsed = finish - start;
            if (timeElapsed > INSTANCE_TERMINATION_TIMEOUT) {
                break;
            }
            if (instanceStatusList.size() < 1) {
                Thread.sleep(WAIT_FOR_TRANSITION_INTERVAL);
                continue;
            }
            String currentState = instanceStatusList.get(0).getInstanceState().getName();
            if ("shutting-down".equals(currentState) || "terminated".equals(currentState)) {
                return true;
            } else {
                Thread.sleep(WAIT_FOR_TRANSITION_INTERVAL);
            }
        } catch (AmazonServiceException ex) {
            throw ex;
        }
        …
 }

CodeGuru Reviewer detects the polling scenario and recommends that you use the waiters feature to improve the efficiency of such programs.

Screenshot of recommendations for introducing waiters feature

Based on the recommendation, the following code uses the waiters feature that is available in the AWS SDK for Java. The custom polling logic is replaced with a call to waiter.run(…), which accepts the request and an optional custom polling strategy. The run function polls synchronously until the resource transitions into the desired state or the retries are exhausted, in which case the SDK throws a WaiterTimedOutException. The fixed code is simpler and more efficient because the logic for determining whether a resource has transitioned into a desired state is reduced to a single API call:

public void terminateInstance(final String instanceId, final AmazonEC2 ec2Client)
    throws InterruptedException {
    Waiter<DescribeInstancesRequest> waiter = ec2Client.waiters().instanceTerminated();
    ec2Client.terminateInstances(new TerminateInstancesRequest().withInstanceIds(instanceId));
    try {
        waiter.run(new WaiterParameters()
              .withRequest(new DescribeInstancesRequest()
              .withInstanceIds(instanceId))
              .withPollingStrategy(new PollingStrategy(new MaxAttemptsRetryStrategy(60), 
                    new FixedDelayStrategy(5))));
    } catch (WaiterTimedOutException e) {
        List<InstanceStatus> instanceStatusList = ec2Client.describeInstanceStatus(
               new DescribeInstanceStatusRequest()
                        .withInstanceIds(instanceId)
                        .withIncludeAllInstances(true))
                        .getInstanceStatuses();
        String state;
        if (instanceStatusList != null && instanceStatusList.size() > 0) {
            state = instanceStatusList.get(0).getInstanceState().getName();
        }
    }
}

Service-specific best practice recommendations

In addition to the AWS SDK for Java recommendations we discussed, CodeGuru Reviewer provides various AWS service-specific best practice recommendations for service APIs such as Amazon S3, Amazon EC2, DynamoDB, and more, helping you improve Java applications that use AWS service clients. For example, CodeGuru can detect the following:

  • Resource leaks in Java applications that use high-level libraries, such as the Amazon S3 TransferManager
  • Deprecated methods in various AWS services
  • Missing null checks on the response of the GetItem API call in DynamoDB (see the sketch after this list)
  • Missing error handling in the output of the PutRecords API call in Kinesis
  • Anti-patterns such as binding the Amazon SNS Subscribe or CreateTopic operation with the Publish operation
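
For the DynamoDB case, the following minimal sketch shows the null-check pattern such a recommendation points at. The table name, key, and attribute are hypothetical, and the snippet assumes the AWS SDK for Java 1.x:

import java.util.Map;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.GetItemRequest;

public class GetItemNullCheckExample {
    public static void main(String[] args) {
        AmazonDynamoDB dynamoDb = AmazonDynamoDBClientBuilder.defaultClient();
        GetItemRequest request = new GetItemRequest()
                .withTableName("Orders") // hypothetical table
                .addKeyEntry("orderId", new AttributeValue("1234"));
        Map<String, AttributeValue> item = dynamoDb.getItem(request).getItem();
        if (item == null) {
            // GetItem returns a null item when the key doesn't exist; guard before dereferencing.
            System.out.println("Order not found");
        } else {
            System.out.println("Status: " + item.get("status").getS());
        }
    }
}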

Conclusion

This post introduced how to use CodeGuru Reviewer to improve the use of the AWS SDK in Java applications. CodeGuru is now available for you to try. For pricing information, see Amazon CodeGuru pricing.

Understanding memory usage in your Java application with Amazon CodeGuru Profiler

Post Syndicated from Fernando Ciciliati original https://aws.amazon.com/blogs/devops/understanding-memory-usage-in-your-java-application-with-amazon-codeguru-profiler/

“Where has all that free memory gone?” This is the question we ask ourselves every time our application emits that dreaded OutOfMemoryError just before it crashes. Amazon CodeGuru Profiler can help you find the answer.

Thanks to its brand-new memory profiling capabilities, troubleshooting and resolving memory issues in Java applications (or almost anything that runs on the JVM) is much easier. AWS launched the CodeGuru Profiler Heap Summary feature at re:Invent 2020. This is the first step in helping us, developers, understand what our software is doing with all that memory it uses.

The Heap Summary view shows a list of Java classes and data types present in the Java Virtual Machine heap, alongside the amount of memory they’re retaining and the number of instances they represent. The following screenshot shows an example of this view.

Amazon CodeGuru Profiler heap summary view example

Figure: Amazon CodeGuru Profiler Heap Summary feature

Because CodeGuru Profiler is a low-overhead, production profiling service designed to be always on, it can capture and represent how memory utilization varies over time, providing helpful visual hints about the object types and the data types that exhibit a growing trend in memory consumption.

In the preceding screenshot, we can see that several lines on the graph are trending upwards:

  • The red top line, horizontal and flat, shows how much memory has been reserved as heap space in the JVM. In this case, we see a heap size of 512 MB, which can usually be configured in the JVM with command line parameters like -Xmx.
  • The second line from the top, blue, represents the total memory in use in the heap, regardless of object type.
  • The third, fourth, and fifth lines show how much memory space each specific type has been using historically in the heap. We can easily spot that java.util.LinkedHashMap$Entry and java.lang.UUID display growing trends, whereas byte[] has a flat line and seems stable in memory usage.

Types that exhibit a constantly growing trend of memory utilization over time deserve a closer look. CodeGuru Profiler helps you focus your attention on these cases. By combining the information presented by the profiler with your own knowledge of your application and code base, you can evaluate whether the amount of memory being used for a specific data type is normal, or whether it might be a memory leak – the unintentional retention of memory by an application due to a failure to free unused objects. In our example above, java.util.LinkedHashMap$Entry and java.lang.UUID are good candidates for investigation.
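
To make the investigation more concrete, here is a contrived sketch (not taken from the original application) of the kind of code that produces this signature: an unbounded map used as a cache keeps accumulating entries, so both java.util.LinkedHashMap$Entry and java.lang.UUID instances trend upward in the Heap Summary.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class SessionCache {
    // Unbounded cache: entries are added on every request but never evicted,
    // so LinkedHashMap entries and UUID keys accumulate until the heap is exhausted.
    private static final Map<UUID, byte[]> CACHE = new LinkedHashMap<>();

    public static void onRequest() {
        CACHE.put(UUID.randomUUID(), new byte[1024]);
    }
}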

To make this functionality available to customers, CodeGuru Profiler uses the power of Java Flight Recorder (JFR), which is now openly available with Java 8 (since OpenJDK release 262) and above. The Amazon CodeGuru Profiler agent for Java, which already does an awesome job capturing data about CPU utilization, has been extended to periodically collect memory retention metrics from JFR and submit them for processing and visualization via Amazon CodeGuru Profiler. Thanks to its high stability and low overhead, the Profiler agent can be safely deployed to services in production, because it is exactly there, under real workloads, that really interesting memory issues are most likely to show up.

Summary

For more information about CodeGuru Profiler and other AI-powered services in the Amazon CodeGuru family, see Amazon CodeGuru. If you haven’t tried the CodeGuru Profiler yet, start your 90-day free trial right now and understand why continuous profiling is becoming a must-have in every production environment. For Amazon CodeGuru customers who are already enjoying the benefits of always-on profiling, this new feature is available at no extra cost. Just update your Profiler agent to version 1.1.0 or newer, and enable Heap Summary in your agent configuration.


Happy profiling!

Resource leak detection in Amazon CodeGuru Reviewer

Post Syndicated from Pranav Garg original https://aws.amazon.com/blogs/devops/resource-leak-detection-in-amazon-codeguru/

This post discusses the resource leak detector for Java in Amazon CodeGuru Reviewer. CodeGuru Reviewer automatically analyzes pull requests (created in supported repositories such as AWS CodeCommit, GitHub, GitHub Enterprise, and Bitbucket) and generates recommendations for improving code quality. For more information, see Automating code reviews and application profiling with Amazon CodeGuru. This post does not describe the resource leak detector for Python programs, which is now available in preview.

What are resource leaks?

Resources are objects with a limited availability within a computing system. These typically include objects managed by the operating system, such as file handles, database connections, and network sockets. Because the number of such resources in a system is limited, an application must release them as soon as it has finished using them. Otherwise, you run out of resources and can’t allocate new ones. The paradigm of acquiring a resource and releasing it is also followed by other categories of objects, such as metric wrappers and timers.

Resource leaks are bugs that arise when a program doesn’t release the resources it has acquired. Resource leaks can lead to resource exhaustion. In the worst case, they can cause the system to slow down or even crash.

Starting with Java 7, most classes holding resources implement the java.lang.AutoCloseable interface and provide a close() method to release them. However, a close() call in source code doesn’t guarantee that the resource is released along all program execution paths. For example, in the following sample code, resource r is acquired by calling its constructor and is closed along the path corresponding to the if branch, shown using green arrows. To ensure that the acquired resource doesn’t leak, you must also close r along the path corresponding to the else branch (the path shown using red arrows).

A resource must be closed along all execution paths to prevent resource leaks
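
The sample code shown in the figure isn’t reproduced in this archive; a minimal sketch of the pattern it describes, using a hypothetical file resource, looks like the following:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class ResourcePathsExample {
    void process(boolean condition) throws IOException {
        InputStream r = new FileInputStream("data.txt"); // resource acquired
        if (condition) {
            r.close(); // released along the if branch
        } else {
            // ... use r ...
            r.close(); // r must also be released along this branch, or it leaks
        }
    }
}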

Often, resource leaks manifest themselves along code paths that aren’t frequently run, or under a heavy system load, or after the system has been running for a long time. As a result, such leaks are latent and can remain dormant in source code for long periods of time before manifesting themselves in production environments. This is the primary reason why resource leak bugs are difficult to detect or replicate during testing, and why automatically detecting these bugs during pull requests and code scans is important.

Detecting resource leaks in CodeGuru Reviewer

For this post, we consider the following Java code snippet. In this code, method getConnection() attempts to create a connection in the connection pool associated with a data source. Typically, a connection pool limits the maximum number of connections that can remain open at any given time. As a result, you must close connections after their use so as to not exhaust this limit.

 1     private Connection getConnection(final BasicDataSource dataSource, ...)
               throws ValidateConnectionException, SQLException {
 2         boolean connectionAcquired = false;
 3         // Retrying three times to get the connection.
 4         for (int attempt = 0; attempt < CONNECTION_RETRIES; ++attempt) {
 5             Connection connection = dataSource.getConnection();
 6             // validateConnection may throw ValidateConnectionException
 7             if (! validateConnection(connection, ...)) {
 8                 // connection is invalid
 9                 DbUtils.closeQuietly(connection);
10             } else {
11                 // connection is established
12                 connectionAcquired = true;
13                 return connection;
14             }
15         }
16         return null;
17     }

At first glance, it seems that the method getConnection() doesn’t leak connection resources. If a valid connection is established in the connection pool (else branch on line 10 is taken), the method getConnection() returns it to the client for use (line 13). If the connection established is invalid (if branch on line 7 is taken), it’s closed in line 9 before another attempt is made to establish a connection.

However, method validateConnection() at line 7 can throw a ValidateConnectionException. If this exception is thrown after a connection is established at line 5, the connection is neither closed in this method nor is it returned upstream to the client to be closed later. Furthermore, if this exceptional code path runs frequently, for instance, if the validation logic throws on a specific recurring service request, each new request causes a connection to leak in the connection pool. Eventually, the client can’t acquire new connections to the data source, impacting the availability of the service.

A typical recommendation to prevent resource leak bugs is to declare the resource objects in a try-with-resources statement block. However, we can’t use try-with-resources to fix the preceding method because this method is required to return an open connection for use in the upstream client. The CodeGuru Reviewer recommendation for the preceding code snippet is as follows:

“Consider closing the following resource: connection. The resource is referenced at line 7. The resource is closed at line 9. The resource is returned at line 13. There are other execution paths that don’t close the resource or return it, for example, when validateConnection throws an exception. To prevent this resource leak, close connection along these other paths before you exit this method.”

As mentioned in the Reviewer recommendation, to prevent this resource leak, you must close the established connection when method validateConnection() throws an exception. This can be achieved by inserting the validation logic (lines 7–14) in a try block. In the finally block associated with this try, the connection must be closed by calling DbUtils.closeQuietly(connection) if connectionAcquired == false. The method getConnection() after this fix has been applied is as follows:

private Connection getConnection(final BasicDataSource dataSource, ...) 
        throws ValidateConnectionException, SQLException {
    boolean connectionAcquired = false;
    // Retrying three times to get the connection.
    for (int attempt = 0; attempt < CONNECTION_RETRIES; ++attempt) {
        Connection connection = dataSource.getConnection();
        try {
            // validateConnection may throw ValidateConnectionException
            if (! validateConnection(connection, ...)) {
                // connection is invalid
                DbUtils.closeQuietly(connection);
            } else {
                // connection is established
                connectionAcquired = true;
                return connection;
            }
        } finally {
            if (!connectionAcquired) {
                DbUtils.closeQuietly(connection);
            }
        }
    }
    return null;
}

As shown in this example, resource leaks in production services can be very disruptive. Furthermore, leaks that manifest along exceptional or less frequently run code paths can be hard to detect or replicate during testing and can remain dormant in the code for long periods of time before manifesting themselves in production environments. With the resource leak detector, you can detect such leaks on objects belonging to a large number of popular Java types such as file streams, database connections, network sockets, timers and metrics, etc.

Combining static code analysis with machine learning for accurate resource leak detection

In this section, we dive deep into the inner workings of the resource leak detector. The resource leak detector in CodeGuru Reviewer uses static analysis algorithms and techniques. Static analysis algorithms perform code analysis without running the code. These algorithms are generally prone to high false positives (the tool might report correct code as having a bug). If the number of these false positives is high, it can lead to alarm fatigue and low adoption of the tool. As a result, the resource leak detector in CodeGuru Reviewer prioritizes precision over recall: the findings we surface are resource leaks with high accuracy, though CodeGuru Reviewer could potentially miss some resource leaks.

The main reason for false positives in static code analysis is incomplete information available to the analysis. CodeGuru Reviewer requires only the Java source files and doesn’t require all dependencies or the build artifacts. Not requiring the external dependencies or the build artifacts reduces the friction to perform automated code reviews. As a result, static analysis only has access to the code in the source repository and doesn’t have access to its external dependencies. The resource leak detector in CodeGuru Reviewer combines static code analysis with a machine learning (ML) model. This ML model is used to reason about external dependencies to provide accurate recommendations.

To understand the use of the ML model, consider again the code above for method getConnection() that had a resource leak. In the code snippet, a connection to the data source is established by calling BasicDataSource.getConnection() method, declared in the Apache Commons library. As mentioned earlier, we don’t require the source code of external dependencies like the Apache library for code analysis during pull requests. Without access to the code of external dependencies, a pure static analysis-driven technique doesn’t know whether the Connection object obtained at line 5 will leak, if not closed. Similarly, it doesn’t know that DbUtils.closeQuietly() is a library function that closes the connection argument passed to it at line 9. Our detector combines static code analysis with ML that learns patterns over such external function calls from a large number of available code repositories. As a result, our resource leak detector knows that the connection doesn’t leak along the following code path:

  • A connection is established on line 5
  • Method validateConnection() returns false at line 7
  • DbUtils.closeQuietly() is called on line 9

This suppresses the possible false warning. At the same time, the detector knows that there is a resource leak when the connection is established at line 5, and validateConnection() throws an exception at line 7 that isn’t caught.

When we run CodeGuru Reviewer on this code snippet, it surfaces only the second leak scenario and makes an appropriate recommendation to fix this bug.

The ML model used in the resource leak detector has been trained on a large number of internal Amazon and GitHub code repositories.

Responses to the resource leak findings

Although closing an open resource in code isn’t difficult, doing so properly along all program paths is important to prevent resource leaks. This can easily be overlooked, especially along exceptional or less frequently run paths. As a result, the resource leak detector in CodeGuru Reviewer has fired at a relatively high frequency, and has alerted developers within Amazon to thousands of resource leaks before they hit production.

The resource leak detections have witnessed a high developer acceptance rate, and developer feedback towards the resource leak detector has been very positive. Some of the feedback from developers includes “Very cool, automated finding,” “Good bot :),” and “Oh man, this is cool.” Developers have also concurred that the findings are important and need to be fixed.

Conclusion

Resource leak bugs are difficult to detect or replicate during testing. They can impact the availability of production services. As a result, it’s important to automatically detect these bugs early on in the software development workflow, such as during pull requests or code scans. The resource leak detector in CodeGuru Reviewer combines static code analysis algorithms with ML to surface only the high confidence leaks. It has a high developer acceptance rate and has alerted developers within Amazon to thousands of leaks before those leaks hit production.

Continuously building and delivering Maven artifacts to AWS CodeArtifact

Post Syndicated from Vinay Selvaraj original https://aws.amazon.com/blogs/devops/continuously-building-and-delivering-maven-artifacts-to-aws-codeartifact/

Artifact repositories are often used to share software packages for use in builds and deployments. Java developers using Apache Maven use artifact repositories to share and reuse Maven packages. For example, one team might own a web service framework that is used by multiple other teams to build their own services. The framework team can publish the framework as a Maven package to an artifact repository, where new versions can be picked up by the service teams as they become available. This post explains how you can set up a continuous integration pipeline with AWS CodePipeline and AWS CodeBuild to deploy Maven artifacts to AWS CodeArtifact. CodeArtifact is a fully managed pay-as-you-go artifact repository service with support for software package managers and build tools like Maven, Gradle, npm, yarn, twine, and pip.

Solution overview

The pipeline we build is triggered each time a code change is pushed to the AWS CodeCommit repository. The code is compiled using the Java compiler, unit tested, and deployed to CodeArtifact. After the artifact is published, it can be consumed by developers working in applications that have a dependency on the artifact or by builds running in other pipelines. The following diagram illustrates this architecture.

Architecture diagram of the solution


All the components in this pipeline are fully managed and you don’t pay for idle capacity or have to manage any servers.


Prerequisites

This post assumes you have the AWS CLI, Git, and Apache Maven installed and configured, because the steps that follow use all three.


Creating your resources

To create the CodeArtifact domain, CodeArtifact repository, CodeCommit, CodePipeline, CodeBuild, and associated resources, we use AWS CloudFormation. Save the following CloudFormation template as codeartifact-cicd-pipeline.yaml, then create a stack with the aws cloudformation deploy command shown after the template:


---
Description: Code Artifact CI/CD Pipeline

Parameters:
  GitRepoBranchName:
    Type: String
    Default: main

Resources:

  ArtifactBucket:
    Type: AWS::S3::Bucket
  
  CodeArtifactDomain:
    Type: AWS::CodeArtifact::Domain
    Properties:
      DomainName: !Sub "${AWS::StackName}-domain"
  
  CodeArtifactRepository:
    Type: AWS::CodeArtifact::Repository
    Properties:
      DomainName: !GetAtt CodeArtifactDomain.Name
      RepositoryName: !Sub "${AWS::StackName}-repo"

  CodeRepository:
    Type: AWS::CodeCommit::Repository
    Properties:
      RepositoryDescription: Maven artifact code repository
      RepositoryName: !Sub "${AWS::StackName}-maven-artifact-repo"
  
  CodeBuildProject:
    Type: AWS::CodeBuild::Project
    Properties:
      Name: !Sub "${AWS::StackName}-CodeBuild"
      Artifacts:
        Type: CODEPIPELINE
      Environment:
        EnvironmentVariables:
          - Name: CODEARTIFACT_DOMAIN
            Type: PLAINTEXT
            Value: !GetAtt CodeArtifactDomain.Name
          - Name: CODEARTIFACT_REPO
            Type: PLAINTEXT
            Value: !GetAtt CodeArtifactRepository.Name
        Type: LINUX_CONTAINER
        ComputeType: BUILD_GENERAL1_SMALL
        Image: aws/codebuild/amazonlinux2-x86_64-standard:3.0
      ServiceRole: !GetAtt CodeBuildServiceRole.Arn
      Source:
        Type: CODEPIPELINE
        BuildSpec: buildspec.yaml
  
  Pipeline:
    Type: AWS::CodePipeline::Pipeline
    Properties:
      ArtifactStore:
        Type: S3
        Location: !Ref ArtifactBucket
      RoleArn: !GetAtt CodePipelineServiceRole.Arn
      Stages:
      - Name: Source
        Actions:
        - Name: SourceAction
          ActionTypeId:
            Category: Source
            Owner: AWS
            Version: '1'
            Provider: CodeCommit
          OutputArtifacts:
          - Name: SourceBundle
          Configuration:
            BranchName: !Ref GitRepoBranchName
            RepositoryName: !GetAtt CodeRepository.Name
          RunOrder: '1'

      - Name: Deliver
        Actions:
        - Name: CodeBuild
          InputArtifacts:
          - Name: SourceBundle
          ActionTypeId:
            Category: Build
            Owner: AWS
            Version: '1'
            Provider: CodeBuild
          Configuration:
            ProjectName: !Ref CodeBuildProject
          RunOrder: '1'

  CodeBuildServiceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Sid: ''
          Effect: Allow
          Principal:
            Service:
            - codebuild.amazonaws.com
          Action: sts:AssumeRole
      Policies:
      - PolicyName: CodePipelinePolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Sid: CloudWatchLogsPolicy
            Effect: Allow
            Action:
            - logs:CreateLogGroup
            - logs:CreateLogStream
            - logs:PutLogEvents
            Resource:
            - "*"
          - Sid: CodeCommitPolicy
            Effect: Allow
            Action:
            - codecommit:GitPull
            Resource:
            - !GetAtt CodeRepository.Arn
          - Sid: S3GetObjectPolicy
            Effect: Allow
            Action:
            - s3:GetObject
            - s3:GetObjectVersion
            Resource:
            - !Sub "arn:aws:s3:::${ArtifactBucket}/*"
          - Sid: S3PutObjectPolicy
            Effect: Allow
            Action:
            - s3:PutObject
            Resource:
            - !Sub "arn:aws:s3:::${ArtifactBucket}/*"
          - Sid: BearerTokenPolicy
            Effect: Allow
            Action:
            - sts:GetServiceBearerToken
            Resource: "*"
            Condition:
              StringEquals:
                sts:AWSServiceName: codeartifact.amazonaws.com
          - Sid: CodeArtifactPolicy
            Effect: Allow
            Action:
            - codeartifact:GetAuthorizationToken
            Resource:
            - !Sub "arn:aws:codeartifact:${AWS::Region}:${AWS::AccountId}:domain/${CodeArtifactDomain.Name}"
          - Sid: CodeArtifactPackage
            Effect: Allow
            Action:
            - codeartifact:PublishPackageVersion
            - codeartifact:PutPackageMetadata
            - codeartifact:ReadFromRepository
            Resource:
            - !Sub "arn:aws:codeartifact:${AWS::Region}:${AWS::AccountId}:package/${CodeArtifactDomain.Name}/${CodeArtifactRepository.Name}/*"
          - Sid: CodeArtifactRepository
            Effect: Allow
            Action:
            - codeartifact:ReadFromRepository
            - codeartifact:GetRepositoryEndpoint
            Resource:
            - !Sub "arn:aws:codeartifact:${AWS::Region}:${AWS::AccountId}:repository/${CodeArtifactDomain.Name}/${CodeArtifactRepository.Name}"          

  CodePipelineServiceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Sid: ''
          Effect: Allow
          Principal:
            Service:
            - codepipeline.amazonaws.com
          Action: sts:AssumeRole
      Policies:
      - PolicyName: CodePipelinePolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Action:
            - s3:GetObject
            - s3:GetObjectVersion
            - s3:GetBucketVersioning
            Resource: !Sub "arn:aws:s3:::${ArtifactBucket}/*"
            Effect: Allow
          - Action:
            - s3:PutObject
            Resource:
            - !Sub "arn:aws:s3:::${ArtifactBucket}/*"
            Effect: Allow
          - Action:
            - codecommit:GetBranch
            - codecommit:GetCommit
            - codecommit:UploadArchive
            - codecommit:GetUploadArchiveStatus
            - codecommit:CancelUploadArchive
            Resource:
              - !GetAtt CodeRepository.Arn
            Effect: Allow
          - Action:
            - codebuild:StartBuild
            - codebuild:BatchGetBuilds
            Resource: 
              - !GetAtt CodeBuildProject.Arn
            Effect: Allow
          - Action:
            - iam:PassRole
            Resource: "*"
            Effect: Allow
Outputs:
  CodePipelineArtifactBucket:
    Value: !Ref ArtifactBucket
  CodeRepositoryHttpCloneUrl:
    Value: !GetAtt CodeRepository.CloneUrlHttp
  CodeRepositorySshCloneUrl:
    Value: !GetAtt CodeRepository.CloneUrlSsh

aws cloudformation deploy                         \
  --stack-name codeartifact-pipeline               \
  --template-file codeartifact-cicd-pipeline.yaml  \
  --capabilities CAPABILITY_IAM


If you have a Maven project you want to use, you can use that. Otherwise, create a new one:


mvn archetype:generate        \
  -DgroupId=com.mycompany.app \
  -DartifactId=my-app         \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DarchetypeVersion=1.4 -DinteractiveMode=false


Initialize a Git repository for the Maven project and add the CodeCommit repository that was created in the CloudFormation stack as a remote repository:


cd my-app
git init
CODECOMMIT_URL=$(aws cloudformation describe-stacks --stack-name codeartifact-pipeline --query "Stacks[0].Outputs[?OutputKey=='CodeRepositoryHttpCloneUrl'].OutputValue" --output text)
git remote add origin $CODECOMMIT_URL


Updating the POM file

The Maven project’s POM file needs to be updated with the distribution management section. This lets Maven know where to publish artifacts. Add the distributionManagement section inside the project element of the POM. Be sure to update the URL with the correct URL for the CodeArtifact repository you created earlier. You can find the CodeArtifact repository URL with the get-repository-endpoint CLI command:


aws codeartifact get-repository-endpoint --domain codeartifact-pipeline-domain  --repository codeartifact-pipeline-repo --format maven


Add the following to the Maven project’s pom.xml:


<distributionManagement>
  <repository>
    <id>codeartifact</id>
    <name>codeartifact</name>
    <url>Replace with the URL from the get-repository-endpoint command</url>
  </repository>
</distributionManagement>

Creating a settings.xml file

Maven needs credentials to use to authenticate with CodeArtifact when it performs the deployment. CodeArtifact uses temporary authorization tokens. To pass the token to Maven, a settings.xml file is created in the top level of the Maven project. During the deployment stage, Maven is instructed to use the settings.xml in the top level of the project instead of the settings.xml that normally resides in $HOME/.m2. Create a settings.xml in the top level of the Maven project with the following contents:


<settings>
  <servers>
    <server>
      <id>codeartifact</id>
      <username>aws</username>
      <password>${env.CODEARTIFACT_TOKEN}</password>
    </server>
  </servers>
</settings>

Creating the buildspec.yaml file

CodeBuild uses a build specification file with commands and related settings that are used during the build, test, and delivery of the artifact. In the build specification file, we specify the CodeBuild runtime, the pre-build actions (upgrade the AWS CLI), and the build actions (Maven build, test, and deploy). When Maven is invoked, it is given the path to the settings.xml created in the previous step, instead of the default in $HOME/.m2/settings.xml. Create the buildspec.yaml as shown in the following code:


version: 0.2

phases:
  install:
    runtime-versions:
      java: corretto11

  pre_build:
    commands:
      - pip3 install awscli --upgrade --user

  build:
    commands:
      - export CODEARTIFACT_TOKEN=`aws codeartifact get-authorization-token --domain ${CODEARTIFACT_DOMAIN} --query authorizationToken --output text`
      - mvn -s settings.xml clean package deploy


Running the pipeline

The final step is to add the files in the Maven project to the Git repository and push the changes to CodeCommit. This triggers the pipeline to run. See the following code:


git checkout -b main
git add settings.xml buildspec.yaml pom.xml src
git commit -a -m "Initial commit"
git push --set-upstream origin main


Checking the pipeline

At this point, the pipeline starts to run. To check its progress, sign in to the AWS Management Console and choose the Region where you created the pipeline. On the CodePipeline console, open the pipeline that the CloudFormation stack created. The pipeline’s name is prefixed with the stack name. If you open the CodePipeline console before the pipeline is complete, you can watch each stage run (see the following screenshot).

CodePipeline Screenshot

If you see that the pipeline failed, you can choose the details in the action that failed for more information.

Checking for new artifacts published in CodeArtifact

When the pipeline is complete, you should be able to see the artifact in the CodeArtifact repository you created earlier. The artifact we published for this post is a Maven snapshot. CodeArtifact handles snapshots differently than release versions. For more information, see Use Maven snapshots. To find the artifact in CodeArtifact, complete the following steps:

  1. On the CodeArtifact console, choose Repositories.
  2. Choose the repository created by the stack, named codeartifact-pipeline-repo.
  3. Search for the package named my-app.
  4. Choose the my-app package from the search results.
    CodeArtifact Assets
  5. Choose the Dependencies tab to bring up a list of Maven dependencies that the Maven project depends on.
    CodeArtifact Dependencies


Cleaning up

To clean up the resources you created in this post, you need to remove them in the following order:


# Empty the CodePipeline S3 artifact bucket
CODEPIPELINE_BUCKET=$(aws cloudformation describe-stacks --stack-name codeartifact-pipeline --query "Stacks[0].Outputs[?OutputKey=='CodePipelineArtifactBucket'].OutputValue" --output text)
aws s3 rm s3://$CODEPIPELINE_BUCKET --recursive

# Delete the CloudFormation stack
aws cloudformation delete-stack --stack-name codeartifact-pipeline

Conclusion

This post covered how to build a continuous integration pipeline to deliver Maven artifacts to AWS CodeArtifact. You can modify this solution for your specific needs. For more information about CodeArtifact or the other services used, see the AWS documentation for each service.


Syntactic Sugar Is Not Always Good

Post Syndicated from Bozho original https://techblog.bozho.net/syntactic-sugar-is-not-always-good/

This write-up is partly inspired by a recent post by Vlad Mihalcea on LinkedIn about the recently introduced text blocks in Java. More about them can be read here.

Now, that’s a nice feature. I’ve used it in Scala several years ago, and other languages also have it, so it seems like a no-brainer to introduce it in Java.

But, syntactic sugar (please don’t argue whether that’s precisely syntactic sugar or not) can be problematic and lead to “syntactic diabetes”. It has two possible issues.

The less important one is consistency – if you can do one thing in multiple, equally valid ways, that introduces inconsistency in the code and pointless arguments about “the right way to do things”. In this context – for 2-line strings, do you use a text block or not? Should you do multi-line formatting for simple strings or not? Should you configure checkstyle rules to reject one or the other option, and in what circumstances?

The second, and bigger, problem is code readability. I know it sounds counter-intuitive, but bear with me. The example that Vlad gave illustrates it – do you want to have a 20-line SQL query in your Java code? I’d say no – you’d better extract it to a separate file and, using some form of templating, populate the right values. This is certainly readable when you browse the code:

String query = QueryUtils.loadQuery("age-query.sql", timestamp, diff, other);
// or
String query = QueryUtils.loadQuery("age-query.sql", 
       Arrays.asList("param1Name", "param2Name"), 
      Arrays.asList(param1Value, param2Value);

Queries can be placed in /src/main/resources and loaded as templates (and cached) by the QueryUtils helper. And because text blocks weren’t available before, you weren’t tempted to keep ugly concatenated query strings inside your code.

But now, with this feature, you are tempted to do that, because, well, it looks good. Same goes for Elasticsearch queries, for JSON templates and whatnot. With this “sugar” you have more incentive to just put them in the code, where they, arguably, make the code less readable. If you really have to debug the query, as opposed to assuming its semantics by its name and relying on a proper implementation, you can easily go to age-query.sql and work with it. Just like when you extract a private method with some implementation details so that it makes the calling method more readable and easy to follow.
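
For illustration, a minimal sketch of what such a QueryUtils helper might look like is shown below. The class is hypothetical (the post only names it), it assumes Java 9+ for InputStream.readAllBytes, and in real code you would bind values through a PreparedStatement rather than formatting them into the SQL string:

import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class QueryUtils {

    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    private QueryUtils() {
    }

    // Loads a query template from src/main/resources (cached) and fills %s-style placeholders.
    public static String loadQuery(String resourceName, Object... params) {
        String template = CACHE.computeIfAbsent(resourceName, QueryUtils::readResource);
        return String.format(template, params);
    }

    private static String readResource(String resourceName) {
        try (InputStream in = QueryUtils.class.getClassLoader().getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IllegalArgumentException("Query template not found: " + resourceName);
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}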

Both problems have manifested themselves in my Scala experience, which I’ve summed up in my talk “Scala – the good, the bad and the very ugly” (only slides available). Scala allows you to express things nicely and in multiple ways, which leads to inconsistencies and horrible code readability in some cases.

Counterintuitively, sometimes the syntactic improvements may be worse for code readability, because they introduce complexity and because they make it easier to do the wrong thing.

That’s not a universal complaint, and certainly syntactic sugar is needed – you don’t have to write List<String> list = new ArrayList<String>() if you can use the diamond operator. But each such feature should be considered not just for how easy it makes writing the code, but also for how easy it is to read it, and more importantly, what type of code it incentivizes.


Complete CI/CD with AWS CodeCommit, AWS CodeBuild, AWS CodeDeploy, and AWS CodePipeline

Post Syndicated from Nitin Verma original https://aws.amazon.com/blogs/devops/complete-ci-cd-with-aws-codecommit-aws-codebuild-aws-codedeploy-and-aws-codepipeline/

Many organizations have been shifting to DevOps practices, which is the combination of cultural philosophies, practices, and tools that increases your organization’s ability to deliver applications and services at high velocity; for example, evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes.

DevOps-Feedback-Flow

An integral part of DevOps is adopting the culture of continuous integration and continuous delivery/deployment (CI/CD), where a commit or change to code passes through various automated stage gates, all the way from building and testing to deploying applications, from development to production environments.

This post uses the AWS suite of CI/CD services to compile, build, and install a version-controlled Java application onto a set of Amazon Elastic Compute Cloud (Amazon EC2) Linux instances via a fully automated and secure pipeline. The goal is to promote a code commit or change to pass through various automated stage gates all the way from development to production environments, across AWS accounts.

AWS services

This solution uses the following AWS services:

  • AWS CodeCommit – A fully-managed source control service that hosts secure Git-based repositories. CodeCommit makes it easy for teams to collaborate on code in a secure and highly scalable ecosystem. This solution uses CodeCommit to create a repository to store the application and deployment codes.
  • AWS CodeBuild – A fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy, on a dynamically created build server. This solution uses CodeBuild to build and test the code, which we deploy later.
  • AWS CodeDeploy – A fully managed deployment service that automates software deployments to a variety of compute services such as Amazon EC2, AWS Fargate, AWS Lambda, and your on-premises servers. This solution uses CodeDeploy to deploy the code or application onto a set of EC2 instances running CodeDeploy agents.
  • AWS CodePipeline – A fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates. This solution uses CodePipeline to create an end-to-end pipeline that fetches the application code from CodeCommit, builds and tests using CodeBuild, and finally deploys using CodeDeploy.
  • Amazon CloudWatch Events – An Amazon CloudWatch Events rule is created to trigger the CodePipeline on a Git commit to the CodeCommit repository.
  • Amazon Simple Storage Service (Amazon S3) – An object storage service that offers industry-leading scalability, data availability, security, and performance. This solution uses an S3 bucket to store the build and deployment artifacts created during the pipeline run.
  • AWS Key Management Service (AWS KMS) – AWS KMS makes it easy for you to create and manage cryptographic keys and control their use across a wide range of AWS services and in your applications. This solution uses AWS KMS to make sure that the build and deployment artifacts stored on the S3 bucket are encrypted at rest.

Overview of solution

This solution uses two separate AWS accounts: a dev account (111111111111) and a prod account (222222222222) in Region us-east-1.

We use the dev account to deploy and set up the CI/CD pipeline, along with the source code repo. It also builds and tests the code locally and performs a test deploy.

The prod account is any other account where the application is required to be deployed from the pipeline in the dev account.

In summary, the solution has the following workflow:

  • A change or commit to the code in the CodeCommit application repository triggers CodePipeline with the help of a CloudWatch event.
  • The pipeline downloads the code from the CodeCommit repository, initiates the Build and Test action using CodeBuild, and securely saves the built artifact on the S3 bucket.
  • If the preceding step is successful, the pipeline triggers the Deploy in Dev action using CodeDeploy and deploys the app in dev account.
  • If successful, the pipeline triggers the Deploy in Prod action using CodeDeploy and deploys the app in the prod account.

The following diagram illustrates the workflow:

cicd-overall-flow

 

Failsafe deployments

This example of CodeDeploy uses the IN_PLACE deployment type. However, to minimize downtime, CodeDeploy inherently supports multiple deployment strategies. This example makes use of the following features: rolling deployments and automatic rollback.

CodeDeploy provides the following three predefined deployment configurations to minimize the impact during application upgrades:

  • CodeDeployDefault.OneAtATime – Deploys the application revision to only one instance at a time
  • CodeDeployDefault.HalfAtATime – Deploys to up to half of the instances at a time (with fractions rounded down)
  • CodeDeployDefault.AllAtOnce – Attempts to deploy an application revision to as many instances as possible at once

For OneAtATime and HalfAtATime, CodeDeploy monitors and evaluates instance health during the deployment and only proceeds to the next instance or next half if the previous deployment is healthy. For more information, see Working with deployment configurations in CodeDeploy.

You can also configure a deployment group or deployment to automatically roll back when a deployment fails or when a monitoring threshold you specify is met. In this case, the last known good version of an application revision is automatically redeployed after a failure with the new application version.
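
As a reference, the following AWS CLI sketch shows how a deployment group could be created with the OneAtATime configuration and automatic rollback enabled. It is only an illustration: the CloudFormation templates used later in this post already create the deployment groups, and the application, group, tag, and role names shown here are the sample values used throughout this post.

# Create a deployment group that deploys one instance at a time and
# rolls back automatically if the deployment fails (sample names from this post)
$ aws deploy create-deployment-group \
    --application-name MyCDWebApp \
    --deployment-group-name MyCICD-Deployment-Group-Dev \
    --deployment-config-name CodeDeployDefault.OneAtATime \
    --ec2-tag-filters Key=Application,Value=MyWebApp,Type=KEY_AND_VALUE \
    --service-role-arn arn:aws:iam::111111111111:role/cicd_codedeploy_service_role \
    --auto-rollback-configuration enabled=true,events=DEPLOYMENT_FAILURE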

How CodePipeline in the dev account deploys apps in the prod account

In this post, the deployment pipeline using CodePipeline is set up in the dev account, but it has permissions to deploy the application in the prod account. We create a special cross-account role in the prod account, which has the following:

  • Permission to fetch artifacts (the app) from Amazon S3 and deploy them locally in the account using CodeDeploy
  • Trust with the dev account where the pipeline runs

CodePipeline in the dev account assumes this cross-account role in the prod account to deploy the app.
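
For illustration, the following AWS CLI sketch creates such a cross-account role with a trust policy on the dev account. The role name and account IDs are the sample values used in this post; in practice you attach the permissions from cicd_codepipeline_cross_ac_policy.json, and you may prefer to scope the principal down to the pipeline's service role instead of the account root.

# Create the cross-account role in the prod account and trust the dev account (111111111111)
$ aws iam create-role \
    --role-name cicd_codepipeline_cross_ac_role \
    --assume-role-policy-document '{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": { "AWS": "arn:aws:iam::111111111111:root" },
            "Action": "sts:AssumeRole"
        }]
    }'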

Do I need multiple accounts?
If you answer “yes” to any of the following questions, you should consider creating more AWS accounts:

  • Does your business require administrative isolation between workloads? Administrative isolation by account is the most straightforward way to grant independent administrative groups different levels of administrative control over AWS resources based on workload, development lifecycle, business unit (BU), or data sensitivity.
  • Does your business require limited visibility and discoverability of workloads? Accounts provide a natural boundary for visibility and discoverability. Workloads cannot be accessed or viewed unless an administrator of the account enables access to users managed in another account.
  • Does your business require isolation to minimize blast radius? Separate accounts help define boundaries and provide natural blast-radius isolation to limit the impact of a critical event such as a security breach, an unavailable AWS Region or Availability Zone, account suspensions, and so on.
  • Does your business require a particular workload to operate within AWS service limits without impacting the limits of another workload? You can use AWS account service limits to impose restrictions on a business unit, development team, or project. For example, if you create an AWS account for a project group, you can limit the number of Amazon Elastic Compute Cloud (Amazon EC2) or high performance computing (HPC) instances that can be launched by the account.
  • Does your business require strong isolation of recovery or auditing data? If regulatory requirements require you to control access and visibility to auditing data, you can isolate the data in an account separate from the one where you run your workloads (for example, by writing AWS CloudTrail logs to a different account).

Prerequisites

For this walkthrough, you should complete the following prerequisites:

  1. Have access to at least two AWS accounts. For this post, the dev and prod accounts are in us-east-1. You can search and replace the Region and account IDs in all the steps and sample AWS Identity and Access Management (IAM) policies in this post.
  2. Ensure you have EC2 Linux instances with the CodeDeploy agent installed in all the accounts or VPCs where the sample Java application is to be installed (dev and prod accounts).
    • To manually create EC2 instances with the CodeDeploy agent, refer to Create an Amazon EC2 instance for CodeDeploy (AWS CLI or Amazon EC2 console). Keep in mind the following:
      • CodeDeploy uses EC2 instance tags to identify instances to use to deploy the application, so it’s important to set tags appropriately. For this post, we use the tag name Application with the value MyWebApp to identify instances where the sample app is installed.
      • Make sure to use an EC2 instance profile (AWS Service Role for EC2 instance) with permissions to read the S3 bucket containing artifacts built by CodeBuild. Refer to the IAM role cicd_ec2_instance_profile in the table Roles-1 below for the set of permissions required. You must update this role later with the actual KMS key and S3 bucket name created as part of the deployment process.
    • To create EC2 Linux instances via AWS CloudFormation, download and launch the AWS CloudFormation template from the GitHub repo: cicd-ec2-instance-with-codedeploy.json
      • This deploys an EC2 instance with AWS CodeDeploy agent.
      • Inputs required:
        • AMI: Enter the name of a Linux AMI in your Region. (This template has been tested with the latest Amazon Linux 2 AMI.)
        • Ec2SshKeyPairName: Name of an existing SSH KeyPair
        • Ec2IamInstanceProfile: Name of an existing EC2 instance profile. Note: Use the permissions in the template cicd_ec2_instance_profile_policy.json to create the policy for this EC2 Instance Profile role. You must update this role later with the actual KMS key and S3 bucket name created as part of the deployment process.
        • Update the EC2 instance Tags per your need.
  3. Ensure required IAM permissions. Have an IAM user with an IAM Group or Role that has the following access levels or permissions:

    AWS Service / Components | Access Level | Accounts | Comments
    AWS CodeCommit | Full (admin) | Dev | Use AWS managed policy AWSCodeCommitFullAccess.
    AWS CodePipeline | Full (admin) | Dev | Use AWS managed policy AWSCodePipelineFullAccess.
    AWS CodeBuild | Full (admin) | Dev | Use AWS managed policy AWSCodeBuildAdminAccess.
    AWS CodeDeploy | Full (admin) | All | Use AWS managed policy AWSCodeDeployFullAccess.
    Create S3 bucket and bucket policies | Full (admin) | Dev | IAM policies can be restricted to specific bucket.
    Create KMS key and policies | Full (admin) | Dev | IAM policies can be restricted to specific KMS key.
    AWS CloudFormation | Full (admin) | Dev | Use AWS managed policy AWSCloudFormationFullAccess.
    Create and pass IAM roles | Full (admin) | All | Ability to create IAM roles and policies can be restricted to specific IAM roles or actions. Also, an admin team with IAM privileges could create all the required roles. Refer to the IAM table Roles-1 below.
    AWS Management Console and AWS CLI | As per IAM user permissions | All | To access the suite of Code services.

  4. Create Git credentials for CodeCommit in the pipeline account (dev account). AWS allows you to either use Git credentials or associate SSH public keys with your IAM user. For this post, use Git credentials associated with your IAM user (created in the previous step). For instructions on creating a Git user, see Create Git credentials for HTTPS connections to CodeCommit. Download and save the Git credentials to use later for deploying the application.
  5. Create all AWS IAM roles as per the following tables (Roles-1). Make sure to update the following references in all the given IAM roles and policies:
    • Replace the sample dev account (111111111111) and prod account (222222222222) with actual account IDs
    • Replace the S3 bucket mywebapp-codepipeline-bucket-us-east-1-111111111111 with your preferred bucket name.
    • Replace the KMS key ID key/82215457-e360-47fc-87dc-a04681c91ce1 with your KMS key ID.

Table: Roles-1

  • AWS CodePipeline – Service role – Dev (111111111111) – cicd_codepipeline_service_role
    • To create the role, select Another AWS Account and use this account as the account ID. Later, update the trust as follows: “Principal”: {“Service”: “codepipeline.amazonaws.com”}
    • Use the permissions in the template cicd_codepipeline_service_policy.json to create the policy for this role.
    • This CodePipeline service role has appropriate permissions to the following services in a local account: manage CodeCommit repos, initiate builds via CodeBuild, create deployments via CodeDeploy, and assume the cross-account CodeDeploy role in the prod account to deploy the application.
  • AWS CodePipeline – IAM role – Dev (111111111111) – cicd_codepipeline_trigger_cwe_role
    • To create the role, select Another AWS Account and use this account as the account ID. Later, update the trust as follows: “Principal”: {“Service”: “events.amazonaws.com”}
    • Use the permissions in the template cicd_codepipeline_trigger_cwe_policy.json to create the policy for this role.
    • CodePipeline uses this role to set a CloudWatch event to trigger the pipeline when there is a change or commit made to the code repository.
  • AWS CodePipeline – IAM role – Prod (222222222222) – cicd_codepipeline_cross_ac_role
    • To create the role, choose Another AWS Account and use the dev account as the trusted account ID.
    • Use the permissions in the template cicd_codepipeline_cross_ac_policy.json to create the policy for this role.
    • This role is created in the prod account and has permissions to use CodeDeploy and fetch artifacts from Amazon S3. The role is assumed by CodePipeline from the dev account to deploy the app in the prod account. Make sure to set up trust with the dev account for this IAM role on the Trust relationships tab.
  • AWS CodeBuild – Service role – Dev (111111111111) – cicd_codebuild_service_role
    • To create the role, choose CodeBuild as the use case.
    • Use the permissions in the template cicd_codebuild_service_policy.json to create the policy for this role.
    • This CodeBuild service role has appropriate permissions for the S3 bucket that stores artifacts, streaming logs to CloudWatch Logs, pulling code from CodeCommit, getting the SSM parameter for CodeBuild, and miscellaneous Amazon EC2 permissions.
  • AWS CodeDeploy – Service role – Dev (111111111111) and Prod (222222222222) – cicd_codedeploy_service_role
    • To create the role, choose CodeDeploy as the use case.
    • Use the built-in AWS managed policy AWSCodeDeployRole for this role.
    • This CodeDeploy service role has appropriate permissions for miscellaneous Amazon EC2 Auto Scaling, miscellaneous Amazon EC2, publishing to Amazon SNS topics, Amazon CloudWatch metrics, and Elastic Load Balancing.
  • EC2 Instance – Service role for EC2 instance profile – Dev (111111111111) and Prod (222222222222) – cicd_ec2_instance_profile
    • To create the role, choose EC2 as the use case.
    • Use the permissions in the template cicd_ec2_instance_profile_policy.json to create the policy for this role.
    • This is set as the EC2 instance profile for the EC2 instances where the app is deployed. It has appropriate permissions to fetch artifacts from Amazon S3 and decrypt contents using the KMS key. You must update this role later with the actual KMS key and S3 bucket name created as part of the deployment process.

Setting up the prod account

To set up the prod account, complete the following steps:

  1. Download and launch the AWS CloudFormation template from the GitHub repo: cicd-codedeploy-prod.json
    • This deploys the CodeDeploy app and deployment group.
    • Make sure that you already have a set of EC2 Linux instances with the CodeDeploy agent installed in all the accounts where the sample Java application is to be installed (dev and prod accounts). If not, refer back to the Prerequisites section; a minimal agent install sketch also follows this list.
  2. Update the existing EC2 IAM instance profile (cicd_ec2_instance_profile):
    • Replace the S3 bucket name mywebapp-codepipeline-bucket-us-east-1-111111111111 with your S3 bucket name (the one used for the CodePipelineArtifactS3Bucket variable when you launched the CloudFormation template in the dev account).
    • Replace the KMS key ARN arn:aws:kms:us-east-1:111111111111:key/82215457-e360-47fc-87dc-a04681c91ce1 with your KMS key ARN (the one created as part of the CloudFormation template launch in the dev account).
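
If you still need to install the CodeDeploy agent on an instance manually, the following is a minimal sketch for Amazon Linux 2 in us-east-1 (the Region determines the S3 bucket in the download URL); the CloudFormation template referenced in the prerequisites already performs an equivalent setup.

# Install and verify the CodeDeploy agent on Amazon Linux 2 (us-east-1)
$ sudo yum install -y ruby wget
$ wget https://aws-codedeploy-us-east-1.s3.us-east-1.amazonaws.com/latest/install
$ chmod +x ./install
$ sudo ./install auto
$ sudo service codedeploy-agent status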

Setting up the dev account

To set up your dev account, complete the following steps:

  1. Download and launch the CloudFormation template from the GitHub repo: cicd-aws-code-suite-dev.json
    The stack deploys the following services in the dev account:

    • CodeCommit repository
    • CodePipeline
    • CodeBuild environment
    • CodeDeploy app and deployment group
    • CloudWatch event rule
    • KMS key (used to encrypt the S3 bucket)
    • S3 bucket and bucket policy
  2. Use the following values as inputs to the CloudFormation template. You should have created all the existing resources and roles beforehand as part of the prerequisites.

    Key Example Value Comments
    CodeCommitWebAppRepo MyWebAppRepo Name of the new CodeCommit repository for your web app.
    CodeCommitMainBranchName master Main branch name on your CodeCommit repository. Default is master (which is pushed to the prod environment).
    CodeBuildProjectName MyCBWebAppProject Name of the new CodeBuild environment.
    CodeBuildServiceRole arn:aws:iam::111111111111:role/cicd_codebuild_service_role ARN of an existing IAM service role to be associated with CodeBuild to build web app code.
    CodeDeployApp MyCDWebApp Name of the new CodeDeploy app to be created for your web app. We assume that the CodeDeploy app name is the same in all accounts where deployment needs to occur (in this case, the prod account).
    CodeDeployGroupDev MyCICD-Deployment-Group-Dev Name of the new CodeDeploy deployment group to be created in the dev account.
    CodeDeployGroupProd MyCICD-Deployment-Group-Prod Name of the existing CodeDeploy deployment group in prod account. Created as part of the prod account setup.

    CodeDeployGroupTagKey Application Name of the tag key that CodeDeploy uses to identify the existing EC2 fleet for the deployment group to use.
    CodeDeployGroupTagValue MyWebApp Value of the tag that CodeDeploy uses to identify the existing EC2 fleet for the deployment group to use.
    CodeDeployConfigName CodeDeployDefault.OneAtATime Desired CodeDeploy config name. Valid options are CodeDeployDefault.OneAtATime, CodeDeployDefault.HalfAtATime, and CodeDeployDefault.AllAtOnce. For more information, see Deployment configurations on an EC2/on-premises compute platform.
    CodeDeployServiceRole arn:aws:iam::111111111111:role/cicd_codedeploy_service_role ARN of an existing IAM service role to be associated with CodeDeploy to deploy web app.

    CodePipelineName MyWebAppPipeline Name of the new CodePipeline to be created for your web app.
    CodePipelineArtifactS3Bucket mywebapp-codepipeline-bucket-us-east-1-111111111111 Name of the new S3 bucket to be created where artifacts for the pipeline are stored for this web app.
    CodePipelineServiceRole arn:aws:iam::111111111111:role/cicd_codepipeline_service_role ARN of an existing IAM service role to be associated with CodePipeline to deploy web app.
    CodePipelineCWEventTriggerRole arn:aws:iam::111111111111:role/cicd_codepipeline_trigger_cwe_role ARN of an existing IAM role used to trigger the pipeline you named earlier upon a code push to the CodeCommit repository.
    CodeDeployRoleXAProd arn:aws:iam::222222222222:role/cicd_codepipeline_cross_ac_role ARN of an existing IAM role in the cross-account for CodePipeline to assume to deploy the app.

    It should take 5–10 minutes for the CloudFormation stack to complete. When the stack is complete, you can see that CodePipeline has built the pipeline (MyWebAppPipeline) with the CodeCommit repository and CodeBuild environment, along with actions for CodeDeploy in local (dev) and cross-account (prod). CodePipeline should be in a failed state because your CodeCommit repository is empty initially.

  3. Update the existing Amazon EC2 IAM instance profile (cicd_ec2_instance_profile):
    • Replace the S3 bucket name mywebapp-codepipeline-bucket-us-east-1-111111111111 with your S3 bucket name (the one used for the CodePipelineArtifactS3Bucket parameter when launching the CloudFormation template in the dev account).
    • Replace the KMS key ARN arn:aws:kms:us-east-1:111111111111:key/82215457-e360-47fc-87dc-a04681c91ce1 with your KMS key ARN (the one created as part of the CloudFormation template launch in the dev account).
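
The exact policy comes from cicd_ec2_instance_profile_policy.json in the GitHub repo; as a rough sketch of this update, the statements you point at your own bucket and key look something like the following (the policy name and action lists here are illustrative).

# Attach an inline policy to the instance profile role that allows reading
# pipeline artifacts from the S3 bucket and decrypting them with the KMS key
$ aws iam put-role-policy \
    --role-name cicd_ec2_instance_profile \
    --policy-name cicd_ec2_instance_profile_policy \
    --policy-document '{
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                "Resource": "arn:aws:s3:::mywebapp-codepipeline-bucket-us-east-1-111111111111/*"
            },
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:DescribeKey"],
                "Resource": "arn:aws:kms:us-east-1:111111111111:key/82215457-e360-47fc-87dc-a04681c91ce1"
            }
        ]
    }'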

Deploying the application

You’re now ready to deploy the application via your desktop or PC.

  1. Assuming you have the required HTTPS Git credentials for CodeCommit as part of the prerequisites, clone the CodeCommit repo that was created earlier as part of the dev account setup. Obtain the name of the CodeCommit repo to clone, from the CodeCommit console. Enter the Git user name and password when prompted. For example:
    $ git clone https://git-codecommit.us-east-1.amazonaws.com/v1/repos/MyWebAppRepo my-web-app-repo
    Cloning into 'my-web-app-repo'...
    Username for 'https://git-codecommit.us-east-1.amazonaws.com/v1/repos/MyWebAppRepo': xxxx
    Password for 'https://xxxx@git-codecommit.us-east-1.amazonaws.com/v1/repos/MyWebAppRepo': xxxx

  2. Download the MyWebAppRepo.zip file containing a sample Java application, CodeBuild configuration to build the app, and CodeDeploy config file to deploy the app.
  3. Copy and unzip the file into the my-web-app-repo Git repository folder created earlier.
  4. Assuming this is the sample app to be deployed, commit these changes to the Git repo. For example:
    $ cd my-web-app-repo 
    $ git add -A 
    $ git commit -m "initial commit" 
    $ git push

For more information, see Tutorial: Create a simple pipeline (CodeCommit repository).

After you commit the code, CodePipeline is triggered, all the stages run, and your application is built, tested, and deployed all the way to the production environment!
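
If you prefer the command line over the console, a quick way to check progress is to query the pipeline state; the pipeline name below is the sample value used in this post.

# Show the latest status of each stage in the pipeline
$ aws codepipeline get-pipeline-state --name MyWebAppPipeline \
    --query 'stageStates[].{stage:stageName,status:latestExecution.status}'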

The following screenshot shows the entire pipeline and its latest run:

 

Troubleshooting

To troubleshoot any service-related issues, see the troubleshooting documentation for CodeCommit, CodeBuild, CodeDeploy, and CodePipeline.

Cleaning up

To avoid incurring future charges or to remove any unwanted resources, delete the following:

  • The EC2 instances used to deploy the application
  • The CloudFormation stacks launched in this post, which removes all AWS resources created through this post
  • The IAM users or roles created for this walkthrough

Conclusion

Using this solution, you can easily set up and manage an entire CI/CD pipeline in AWS accounts using the native AWS suite of CI/CD services, where a commit or change to code passes through various automated stage gates all the way from building and testing to deploying applications, from development to production environments.

FAQs

In this section, we answer some frequently asked questions:

  1. Can I expand this deployment to more than two accounts?
    • Yes. You can deploy a pipeline in a tooling account and use dev, non-prod, and prod accounts to deploy code on EC2 instances via CodeDeploy. Changes are required to the templates and policies accordingly.
  2. Can I ensure the application isn’t automatically deployed in the prod account via CodePipeline and needs manual approval?
    • Yes. You can add a manual approval action to the pipeline before the Deploy in Prod action, so the pipeline pauses until an authorized user approves the change (see the stage sketch after this list).
  3. Can I use a CodeDeploy group with an Auto Scaling group?
    • Yes. Minor changes are required to the CodeDeploy deployment group creation process. Refer to the following Solution variations section for more information.
  4. Can I use this pattern for EC2 Windows instances?
    • Yes. CodeDeploy also supports Windows Server instances; install the CodeDeploy agent for Windows and adjust the build and deployment scripts for your application accordingly.
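
The following is a rough sketch of what such a manual approval stage could look like in the pipeline definition (for example, added to the CloudFormation template or applied via aws codepipeline update-pipeline) before the cross-account deploy stage; the stage and action names are hypothetical.

{
    "name": "ApproveDeployToProd",
    "actions": [
        {
            "name": "ManualApproval",
            "actionTypeId": {
                "category": "Approval",
                "owner": "AWS",
                "provider": "Manual",
                "version": "1"
            },
            "runOrder": 1
        }
    ]
}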

Solution variations

In this section, we provide a few variations to our solution:

Author bio

author-pic

 Nitin Verma

Nitin is currently a Sr. Cloud Architect in AWS Managed Services (AMS). He has many years of experience with DevOps-related tools and technologies. Speak to your AWS Managed Services representative to deploy this solution in AMS!

 

AWS App2Container – A New Containerizing Tool for Java and ASP.NET Applications

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-app2container-a-new-containerizing-tool-for-java-and-asp-net-applications/

Our customers are increasingly developing their new applications with containers and serverless technologies, and are using modern continuous integration and delivery (CI/CD) tools to automate the software delivery life cycle. They also maintain a large number of existing applications that are built and managed manually or using legacy systems. Maintaining these two sets of applications with disparate tooling adds to operational overhead and slows down the pace of delivering new business capabilities. As much as possible, they want to be able to standardize their management tooling and CI/CD processes across both their existing and new applications, and see the option of packaging their existing applications into containers as the first step towards accomplishing that goal.

However, containerizing existing applications requires a long list of manual tasks such as identifying application dependencies, writing dockerfiles, and setting up build and deployment processes for each application. These manual tasks are time consuming, error prone, and can slow down the modernization efforts.

Today, we are launching AWS App2Container, a new command-line tool that helps containerize existing applications that are running on-premises, in Amazon Elastic Compute Cloud (EC2), or in other clouds, without needing any code changes. App2Container discovers applications running on a server, identifies their dependencies, and generates relevant artifacts for seamless deployment to Amazon ECS and Amazon EKS. It also provides integration with AWS CodeBuild and AWS CodeDeploy to enable a repeatable way to build and deploy containerized applications.

AWS App2Container generates the following artifacts for each application component: application files and folders, Dockerfiles, container images in Amazon Elastic Container Registry (ECR), ECS task definitions, Kubernetes deployment YAML, CloudFormation templates to deploy the application to Amazon ECS or EKS, and templates to set up a build/release pipeline in AWS CodePipeline, which also leverages AWS CodeBuild and CodeDeploy.

Starting today, you can use App2Container to containerize ASP.NET (.NET 3.5+) web applications running in IIS 7.5+ on Windows, and Java applications running on Linux—standalone JBoss, Apache Tomcat, and generic Java applications such as Spring Boot, IBM WebSphere, Oracle WebLogic, etc.

By modernizing existing applications using containers, you can make them portable, increase development agility, standardize your CI/CD processes, and reduce operational costs. Now let’s see how it works!

AWS App2Container – Getting Started
AWS App2Container requires that the following prerequisites be installed on the server(s) hosting your application: AWS Command Line Interface (CLI) version 1.14 or later, Docker tools, and (in the case of ASP.NET) PowerShell 5.0+ for applications running on Windows. Additionally, you need to provide appropriate IAM permissions to App2Container to interact with AWS services.

For example, let’s look at how you containerize your existing Java applications. The App2Container CLI for Linux is packaged as a tar.gz archive, which provides an interactive shell script, install.sh, to install the App2Container CLI. Running the script guides you through the install steps and also updates your path to include the App2Container CLI commands.

First, run a one-time initialization on the installed server for the App2Container CLI with the init command.

$ sudo app2container init
Workspace directory path for artifacts[default:  /home/ubuntu/app2container/ws]:
AWS Profile (configured using 'aws configure --profile')[default: default]:  
Optional S3 bucket for application artifacts (Optional)[default: none]: 
Report usage metrics to AWS? (Y/N)[default: y]:
Require images to be signed using Docker Content Trust (DCT)? (Y/N)[default: n]:
Configuration saved

This sets up a workspace to store application containerization artifacts (make sure a minimum of 20 GB of disk space is available). Optionally, you can have the artifacts uploaded to the Amazon Simple Storage Service (S3) bucket you specified, using your configured AWS profile.

Next, you can view Java processes that are running on the application server by using the inventory command. Each Java application process has a unique identifier (for example, java-tomcat-9e8e4799) which is the application ID. You can use this ID to refer to the application with other App2Container CLI commands.

$ sudo app2container inventory
{
    "java-jboss-5bbe0bec": {
        "processId": 27366,
        "cmdline": "java ... /home/ubuntu/wildfly-10.1.0.Final/modules org.jboss.as.standalone -Djboss.home.dir=/home/ubuntu/wildfly-10.1.0.Final -Djboss.server.base.dir=/home/ubuntu/wildfly-10.1.0.Final/standalone ",
        "applicationType": "java-jboss"
    },
    "java-tomcat-9e8e4799": {
        "processId": 2537,
        "cmdline": "/usr/bin/java ... -Dcatalina.home=/home/ubuntu/tomee/apache-tomee-plume-7.1.1 -Djava.io.tmpdir=/home/ubuntu/tomee/apache-tomee-plume-7.1.1/temp org.apache.catalina.startup.Bootstrap start ",
        "applicationType": "java-tomcat"
    }
}

You can also run App2Container for ASP.NET applications in an administrator PowerShell session on Windows Servers with IIS version 7.0 or later. Note that Docker tools and container support are available on Windows Server 2016 and later versions. You can choose to run all app2container operations on the application server with Docker tools installed, or use a worker machine with Docker tools, such as one based on the Amazon ECS-optimized Windows Server AMIs.

PS> app2container inventory
{
    "iis-smarts-51d2dbf8": {
        "siteName": "nopCommerce39",
        "bindings": "http/*:90:",
        "applicationType": "iis"
    }
}

The inventory command displays all IIS websites on the application server that can be containerized. Each IIS website process has a unique identifier (for example, iis-smarts-51d2dbf8) which is the application ID. You can use this ID to refer to the application with other App2Container CLI commands.

You can choose a specific application by referring to its application ID and generate an analysis report for the application by using the analyze command.

$ sudo app2container analyze --application-id java-tomcat-9e8e4799
Created artifacts folder /home/ubuntu/app2container/ws/java-tomcat-9e8e4799
Generated analysis data in /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/analysis.json
Analysis successful for application java-tomcat-9e8e4799
Please examine the same, make appropriate edits and initiate containerization using "app2container containerize --application-id java-tomcat-9e8e4799"

You can use the analysis.json template generated by the application analysis to gather information on the analyzed application that helps identify all system dependencies from the analysisInfo section, and update containerization parameters to customize the container images generated for the application using the containerParameters section.

$ cat java-tomcat-9e8e4799/analysis.json
{
    "a2CTemplateVersion": "1.0",
    "createdTime": "2020-06-24 07:40:5424",
    "containerParameters": {
        "_comment1": "*** EDITABLE: The below section can be edited according to the application requirements. Please see the analysisInfo section below for details discovered regarding the application. ***",
        "imageRepository": "java-tomcat-9e8e4799",
        "imageTag": "latest",
        "containerBaseImage": "ubuntu:18.04",
        "coopProcesses": [ 6446, 6549, 6646 ]
    },
    "analysisInfo": {
        "_comment2": "*** NON-EDITABLE: Analysis Results ***",
        "processId": 2537,
        "appId": "java-tomcat-9e8e4799",
        "userId": "1000",
        "groupId": "1000",
        "cmdline": [...],
        "os": {...},
        "ports": [...]
    }
}

Also, you can run the $ app2container extract --application-id java-tomcat-9e8e4799 command to generate an application archive for the analyzed application. This depends on the analysis.json file generated earlier in the workspace folder for the application, and adheres to any containerization parameter updates specified there. By using the extract command, you can continue the workflow on a worker machine after running the first set of commands on the application server.

Now you can use the containerize command to generate Docker images for the selected application.

$ sudo app2container containerize --application-id java-tomcat-9e8e4799
AWS pre-requisite check succeeded
Docker pre-requisite check succeeded
Extracted container artifacts for application
Entry file generated
Dockerfile generated under /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/Artifacts
Generated dockerfile.update under /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/Artifacts
Generated deployment file at /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/deployment.json
Containerization successful. Generated docker image java-tomcat-9e8e4799
You're all set to test and deploy your container image.

Next Steps:
1. View the container image with \"docker images\" and test the application.
2. When you're ready to deploy to AWS, please edit the deployment file as needed at /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/deployment.json.
3. Generate deployment artifacts using app2container generate app-deployment --application-id java-tomcat-9e8e4799

After this command completes, you can view the generated container images by running docker images on the machine where the containerize command was run. You can use the docker run command to launch the container and test application functionality, as shown in the sketch below.
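
For example, a quick local test could look like the following; the image name comes from the containerize output above, and the port is an assumption based on the exposedPorts entry in the generated deployment.json (your application’s port may differ).

# List the generated image, run it locally, and check that the app responds
$ docker images java-tomcat-9e8e4799
$ docker run -d -p 8090:8090 java-tomcat-9e8e4799:latest
$ curl -I http://localhost:8090/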

Note that in addition to generating container images, the containerize command also generates a deployment.json template file that you can use with the next generate app-deployment command. You can edit the parameters in the deployment.json template file to change the image repository name to be registered in Amazon ECR, the ECS task definition parameters, or the Kubernetes App name.

$ cat java-tomcat-9e8e4799/deployment.json
{
       "a2CTemplateVersion": "1.0",
       "applicationId": "java-tomcat-9e8e4799",
       "imageName": "java-tomcat-9e8e4799",
       "exposedPorts": [
              {
                     "localPort": 8090,
                     "protocol": "tcp6"
              }
       ],
       "environment": [],
       "ecrParameters": {
              "ecrRepoTag": "latest"
       },
       "ecsParameters": {
              "createEcsArtifacts": true,
              "ecsFamily": "java-tomcat-9e8e4799",
              "cpu": 2,
              "memory": 4096,
              "dockerSecurityOption": "",
              "enableCloudwatchLogging": false,
              "publicApp": true,
              "stackName": "a2c-java-tomcat-9e8e4799-ECS",
              "reuseResources": {
                     "vpcId": "",
                     "cfnStackName": "",
                     "sshKeyPairName": ""
              },
              "gMSAParameters": {
                     "domainSecretsArn": "",
                     "domainDNSName": "",
                     "domainNetBIOSName": "",
                     "createGMSA": false,
                     "gMSAName": ""
              }
       },
       "eksParameters": {
              "createEksArtifacts": false,
              "applicationName": "",
              "stackName": "a2c-java-tomcat-9e8e4799-EKS",
              "reuseResources": {
                     "vpcId": "",
                     "cfnStackName": "",
                     "sshKeyPairName": ""
              }
       }
 }

At this point, the application workspace where the artifacts are generated serves as an iteration sandbox. You can choose to edit the Dockerfile generated here to make changes to your application and use the docker build command to build new container images as needed. You can generate the artifacts needed to deploy the application containers in Amazon EKS by using the generate app-deployment command.

$ sudo app2container generate app-deployment --application-id java-tomcat-9e8e4799
AWS pre-requisite check succeeded
Docker pre-requisite check succeeded
Created ECR Repository
Uploaded Cloud Formation resources to S3 Bucket: none
Generated Cloud Formation Master template at: /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/EksDeployment/amazon-eks-master.template.yaml
EKS Cloudformation templates and additional deployment artifacts generated successfully for application java-tomcat-9e8e4799

You're all set to use AWS Cloudformation to manage your application stack.
Next Steps:
1. Edit the cloudformation template as necessary.
2. Create an application stack using the AWS CLI or the AWS Console. AWS CLI command:

       aws cloudformation deploy --template-file /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/EksDeployment/amazon-eks-master.template.yaml --capabilities CAPABILITY_NAMED_IAM --stack-name java-tomcat-9e8e4799

3. Setup a pipeline for your application stack:

       app2container generate pipeline --application-id java-tomcat-9e8e4799

This command works based on the deployment.json template file produced as part of running the containerize command. App2Container now generates the ECS/EKS CloudFormation templates as well, and gives you an option to deploy those stacks.

The command registers the container image in the user-specified ECR repository and generates CloudFormation templates for Amazon ECS and EKS deployments. You can register the ECS task definition with Amazon ECS, or use kubectl to launch the containerized application on an existing Amazon EKS or self-managed Kubernetes cluster using the App2Container-generated amazon-eks-master.template.deployment.yaml.

Alternatively, you can directly deploy containerized applications into Amazon EKS by using the --deploy option.

$ sudo app2container generate app-deployment --application-id java-tomcat-9e8e4799 --deploy
AWS pre-requisite check succeeded
Docker pre-requisite check succeeded
Created ECR Repository
Uploaded Cloud Formation resources to S3 Bucket: none
Generated Cloud Formation Master template at: /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/EksDeployment/amazon-eks-master.template.yaml
Initiated Cloudformation stack creation. This may take a few minutes. Please visit the AWS Cloudformation Console to track progress.
Deploying application to EKS

Handling ASP.NET Applications with Windows Authentication
Containerizing ASP.NET applications is almost the same process as for Java applications, but Windows containers cannot be directly domain joined. They can, however, still use Active Directory (AD) domain identities to support various authentication scenarios.

App2Container detects if a site is using Windows authentication, accordingly makes the IIS site’s application pool run as the network service identity, and generates new CloudFormation templates for Windows-authenticated IIS applications. Those templates take care of creating the gMSA and AD security group, domain-joining the ECS nodes, and making the containers use the gMSA.

Also, it provides two PowerShell scripts as output to the $ app2container containerize command, along with an instruction file on how to use them.

The following is an example output:

PS C:\Windows\system32> app2container containerize --application-id iis-SmartStoreNET-a726ba0b
Running AWS pre-requisite check...
Running Docker pre-requisite check...
Container build complete. Please use "docker images" to view the generated container images.
Detected that the Site is using Windows Authentication.
Generating powershell scripts into C:\Users\Admin\AppData\Local\app2container\iis-SmartStoreNET-a726ba0b\Artifacts required to setup Container host with Windows Authentication
Please look at C:\Users\Admin\AppData\Local\app2container\iis-SmartStoreNET-a726ba0b\Artifacts\WindowsAuthSetupInstructions.md for setup instructions on Windows Authentication.
A deployment file has been generated under C:\Users\Admin\AppData\Local\app2container\iis-SmartStoreNET-a726ba0b
Please edit the same as needed and generate deployment artifacts using "app2container generate-deployment"

The first PowerShell script, DomainJoinAddToSecGroup.ps1, joins the container host and adds it to an Active Directory security group. The second script, CreateCredSpecFile.ps1, creates a Group Managed Service Account (gMSA), grants it access to the Active Directory security group, generates the credential spec for this gMSA, and stores it locally on the container host. You can run these PowerShell scripts on the ECS host. The following is an example usage of the scripts:

PS C:\Windows\system32> .\DomainJoinAddToSecGroup.ps1 -ADDomainName Dominion.com -ADDNSIp 10.0.0.1 -ADSecurityGroup myIISContainerHosts -CreateADSecurityGroup:$true
PS C:\Windows\system32> .\CreateCredSpecFile.ps1 -GMSAName MyGMSAForIIS -CreateGMSA:$true -ADSecurityGroup myIISContainerHosts

Before executing the app2container generate app-deployment command, edit the deployment.json file to change the value of dockerSecurityOption to the name of the CredentialSpec file that the CreateCredSpecFile script generated. For example,
"dockerSecurityOption": "credentialspec:file://dominion_mygmsaforiis.json"

Effectively, any access to a network resource made by the IIS server inside the container for the site now uses the above gMSA to authenticate. The final step is to authorize this gMSA account on the network resources that the IIS server will access. A common example is authorizing this gMSA inside the SQL Server.

Finally, if the application must connect to a database to be fully functional and you run the container in Amazon ECS, ensure that the application container created from the Docker image generated by the tool has connectivity to the same database. You can refer to this documentation for options on migrating: MS SQL Server from Windows to Linux on AWS, Database Migration Service, and backup and restore your MS SQL Server to Amazon RDS.

Now Available
AWS App2Container is offered at no additional cost; you pay only for the actual usage of AWS services like Amazon EC2, ECS, EKS, and S3. For details, please refer to the App2Container FAQs and documentation. Give this a try, and please send us feedback either through your usual AWS Support contacts, on the AWS Forum for ECS, the AWS Forum for EKS, or on the container roadmap on GitHub.

Channy;

Find Your Most Expensive Lines of Code – Amazon CodeGuru Is Now Generally Available

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/find-your-most-expensive-lines-of-code-amazon-codeguru-is-now-generally-available/

Bringing new applications into production, maintaining their code base as they grow and evolve, and at the same time responding to operational issues is a challenging task. For this reason, you can find many ideas on how to structure your teams, which methodologies to apply, and how to safely automate your software delivery pipeline.

At re:Invent last year, we introduced in preview Amazon CodeGuru, a developer tool powered by machine learning that helps you improve your applications and troubleshoot issues with automated code reviews and performance recommendations based on runtime data. During the last few months, many improvements have been launched, including a more cost-effective pricing model, support for Bitbucket repositories, and the ability to start the profiling agent using a command line switch, so that you no longer need to modify the code of your application, or add dependencies, to run the agent.

You can use CodeGuru in two ways:

  • CodeGuru Reviewer uses program analysis and machine learning to detect potential defects that are difficult for developers to find, and recommends fixes in your Java code. The code can be stored in GitHub (now also in GitHub Enterprise), AWS CodeCommit, or Bitbucket repositories. When you submit a pull request on a repository that is associated with CodeGuru Reviewer, it provides recommendations for how to improve your code. Each pull request corresponds to a code review, and each code review can include multiple recommendations that appear as comments on the pull request.
  • CodeGuru Profiler provides interactive visualizations and recommendations that help you fine-tune your application performance and troubleshoot operational issues using runtime data from your live applications. It currently supports applications written in Java virtual machine (JVM) languages such as Java, Scala, Kotlin, Groovy, Jython, JRuby, and Clojure. CodeGuru Profiler can help you find the most expensive lines of code, in terms of CPU usage or introduced latency, and suggest ways you can improve efficiency and remove bottlenecks. You can use CodeGuru Profiler in production, and when you test your application with a meaningful workload, for example in a pre-production environment.

Today, Amazon CodeGuru is generally available with the addition of many new features.

In CodeGuru Reviewer, we included the following:

  • Support for GitHub Enterprise – You can now scan your pull requests and get recommendations against your source code on GitHub Enterprise on-premises repositories, together with a description of what’s causing the issue and how to remediate it.
  • New types of recommendations to solve defects and improve your code – For example, checking input validation, to avoid issues that can compromise security and performance, and looking for multiple copies of code that do the same thing.

In CodeGuru Profiler, you can find these new capabilities:

  • Anomaly detection – We automatically detect anomalies in the application profile for those methods that represent the highest proportion of CPU time or latency.
  • Lambda function support – You can now profile AWS Lambda functions just like applications hosted on Amazon Elastic Compute Cloud (EC2) and containerized applications running on Amazon ECS and Amazon Elastic Kubernetes Service, including those using AWS Fargate.
  • Cost of issues in the recommendation report – Recommendations contain actionable resolution steps which explain what the problem is, the CPU impact, and how to fix the issue. To help you better prioritize your activities, you now have an estimation of the savings introduced by applying the recommendation.
  • Color-my-code – In the visualizations, to help you easily find your own code, we are coloring your methods differently from frameworks and other libraries you may use.
  • CloudWatch metrics and alerts – To keep track and monitor efficiency issues that have been discovered.

Let’s see some of these new features at work!

Using CodeGuru Reviewer with a Lambda Function
I create a new repo in my GitHub account, and leave it empty for now. Locally, where I am developing a Lambda function using the Java 11 runtime, I initialize my Git repo and add only the README.md file to the master branch. In this way, I can add all the code as a pull request later and have it go through a code review by CodeGuru.

git init
git add README.md
git commit -m "First commit"

Now, I add the GitHub repo as origin, and push my changes to the new repo:

git remote add origin https://github.com/<my-user-id>/amazon-codeguru-sample-lambda-function.git
git push -u origin master

I associate the repository in the CodeGuru console:

When the repository is associated, I create a new dev branch, add all my local files to it, and push it remotely:

git checkout -b dev
git add .
git commit -m "Code added to the dev branch"
git push --set-upstream origin dev

In the GitHub console, I open a new pull request by comparing changes across the two branches, master and dev. I verify that the pull request is able to merge, then I create it.

Since the repository is associated with CodeGuru, a code review is listed as Pending in the Code reviews section of the CodeGuru console.

After a few minutes, the code review status is Completed, and CodeGuru Reviewer issues a recommendation on the same GitHub page where the pull request was created.

Oops! I am creating the Amazon DynamoDB service object inside the function invocation method. In this way, it cannot be reused across invocations. This is not efficient.

To improve the performance of my Lambda function, I follow the CodeGuru recommendation, and move the declaration of the DynamoDB service object to a static final attribute of the Java application object, so that it is instantiated only once, during function initialization. Then, I follow the link in the recommendation to learn more best practices for working with Lambda functions.

Using CodeGuru Profiler with a Lambda Function
In the CodeGuru console, I create a MyServerlessApp-Development profiling group and select the Lambda compute platform.

Next, I give the AWS Identity and Access Management (IAM) role used by my Lambda function permissions to submit data to this profiling group.
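
As a sketch, the agent permissions boil down to two actions on the profiling group; the role name, policy name, account ID, and profiling group ARN below are placeholders.

# Allow the Lambda function's execution role to submit profiles to the profiling group
$ aws iam put-role-policy \
    --role-name MyLambdaExecutionRole \
    --policy-name AllowCodeGuruProfiler \
    --policy-document '{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["codeguru-profiler:ConfigureAgent", "codeguru-profiler:PostAgentProfile"],
            "Resource": "arn:aws:codeguru-profiler:us-east-1:123456789012:profilingGroup/MyServerlessApp-Development"
        }]
    }'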

Now, the console is giving me all the info I need to profile my Lambda function. To configure the profiling agent, I use a couple of environment variables:

  • AWS_CODEGURU_PROFILER_GROUP_ARN to specify the ARN of the profiling group to use.
  • AWS_CODEGURU_PROFILER_ENABLED to enable (TRUE) or disable (FALSE) profiling.

I follow the instructions (for Maven and Gradle) to add a dependency, and include the profiling agent in the build. Then, I update the code of the Lambda function to wrap the handler function inside the LambdaProfiler provided by the agent.
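
If you manage the function with the AWS CLI, setting those two variables could look like the following sketch; the function name, account ID, and profiling group ARN are placeholders.

# Point the profiling agent at the profiling group and enable it
$ aws lambda update-function-configuration \
    --function-name my-sample-function \
    --environment "Variables={AWS_CODEGURU_PROFILER_GROUP_ARN=arn:aws:codeguru-profiler:us-east-1:123456789012:profilingGroup/MyServerlessApp-Development,AWS_CODEGURU_PROFILER_ENABLED=TRUE}"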

To generate some load, I start a few scripts invoking my function using the Amazon API Gateway as trigger. After a few minutes, the profiling group starts to show visualizations describing the runtime behavior of my Lambda function.

For example, I can see how much CPU time is spent in the different methods of my function. At the bottom, there are the entry point methods. As I scroll up, I find methods that are called deeper in the stack trace. I right-click and hide the LambdaRuntimeClient methods to focus on my code. Note that my methods are colored differently than those in the packages I am using, such as the AWS SDK for Java.

I am mostly interested in what happens in the handler method invoked by the Lambda platform. I select the handler method, and now it becomes the new “base” of the visualization.

As I move my pointer on each of my methods, I get more information, including an estimation of the yearly cost of running that specific part of the code in production, based on the load experienced by the profiling agent during the selected time window. In my case, the handler function cost is estimated to be $6. If I select the two main functions above, I have an estimation of $3 each. The cost estimation works for code running on Lambda functions, EC2 instances, and containerized applications.

Similarly, I can visualize Latency, to understand how much time is spent inside the methods in my code. I keep the Lambda function handler method selected to drill down into what is under my control, and see where time is being spent the most.

The CodeGuru Profiler is also providing a recommendation based on the data collected. I am spending too much time (more than 4%) in managing encryption. I can use a more efficient crypto provider, such as the open source Amazon Corretto Crypto Provider, described in this blog post. This should lower the time spent to what is expected, about 1% of my profile.

Finally, I edit the profiling group to enable notifications. In this way, if CodeGuru detects an anomaly in the profile of my application, I am notified in one or more Amazon Simple Notification Service (SNS) topics.

Available Now
Amazon CodeGuru is available today in 10 regions, and we are working to add more regions in the coming months. For regional availability, please see the AWS Region Table.

CodeGuru helps you improve your application code and reduce compute and infrastructure costs with an automated code reviewer and application profiler that provide intelligent recommendations. Using visualizations based on runtime data, you can quickly find the most expensive lines of code of your applications. With CodeGuru, you pay only for what you use. Pricing is based on the lines of code analyzed by CodeGuru Reviewer, and on sampling hours for CodeGuru Profiler.

To learn more, please see the documentation.

Danilo

Building a CI/CD pipeline for multi-region deployment with AWS CodePipeline

Post Syndicated from Akash Kumar original https://aws.amazon.com/blogs/devops/building-a-ci-cd-pipeline-for-multi-region-deployment-with-aws-codepipeline/

This post discusses the benefits of and how to build an AWS CI/CD pipeline in AWS CodePipeline for multi-region deployment. The CI/CD pipeline triggers on application code changes pushed to your AWS CodeCommit repository. This automatically feeds into AWS CodeBuild for static and security analysis of the CloudFormation template. Another CodeBuild instance builds the application to generate an AMI image as output. AWS Lambda then copies the AMI image to other Regions. Finally, AWS CloudFormation cross-region actions are triggered and provision the instance into target Regions based on AMI image.

The solution is based on using a single pipeline with cross-region actions, which helps in provisioning resources in the current Region and other Regions. This solution also helps manage the complete CI/CD pipeline at one place in one Region and helps as a single point for monitoring and deployment changes. This incurs less cost because a single pipeline can deploy the application into multiple Regions.

As a security best practice, the solution also incorporates static and security analysis using cfn-lint and cfn-nag. You use these tools to scan CloudFormation templates for security vulnerabilities.

The following diagram illustrates the solution architecture.

Multi region AWS CodePipeline architecture

Prerequisites

Before getting started, you must complete the following prerequisites:

  • Create a repository in CodeCommit and provide access to your user
  • Copy the sample source code from GitHub under your repository
  • Create an Amazon S3 bucket in the current Region and each target Region for your artifact store

Creating a pipeline with AWS CloudFormation

You use a CloudFormation template for your CI/CD pipeline, which can perform the following actions:

  1. Use CodeCommit repository as source code repository
  2. Static code analysis on the CloudFormation template to check against the resource specification and block provisioning if this check fails
  3. Security code analysis on the CloudFormation template to check against secure infrastructure rules and block provisioning if this check fails
  4. Compilation and unit test of application code to generate an AMI image
  5. Copy the AMI image into target Regions for deployment
  6. Deploy into multiple Regions using the CloudFormation template; for example, us-east-1, us-east-2, and ap-south-1

You use a sample web application to run through your pipeline, which requires Java and Apache Maven for compilation and testing. Additionally, it uses Tomcat 8 for deployment.

The following table summarizes the resources that the CloudFormation template creates.

Resource Name Type Objective
CloudFormationServiceRole AWS::IAM::Role Service role for AWS CloudFormation
CodeBuildServiceRole AWS::IAM::Role Service role for CodeBuild
CodePipelineServiceRole AWS::IAM::Role Service role for CodePipeline
LambdaServiceRole AWS::IAM::Role Service role for Lambda function
SecurityCodeAnalysisServiceRole AWS::IAM::Role Service role for security analysis of provisioning CloudFormation template
StaticCodeAnalysisServiceRole AWS::IAM::Role Service role for static analysis of provisioning CloudFormation template
StaticCodeAnalysisProject AWS::CodeBuild::Project CodeBuild for static analysis of provisioning CloudFormation template
SecurityCodeAnalysisProject AWS::CodeBuild::Project CodeBuild for security analysis of provisioning CloudFormation template
CodeBuildProject AWS::CodeBuild::Project CodeBuild for compilation, testing, and AMI creation
CopyImage AWS::Lambda::Function Python Lambda function for copying AMI images into other Regions
AppPipeline AWS::CodePipeline::Pipeline CodePipeline for CI/CD

To start creating your pipeline, complete the following steps:

  • Launch the CloudFormation stack with the following link:
Launch button for CloudFormation

  • Choose Next.
  • For Specify details, provide the following values:
Parameter Description
Stack name Name of your stack
OtherRegion1 Input the target Region 1 (other than current Region) for deployment
OtherRegion2 Input the target Region 2 (other than current Region) for deployment
RepositoryBranch Branch name of repository
RepositoryName Repository name of the project
S3BucketName Input the S3 bucket name for artifact store
S3BucketNameForOtherRegion1 Create a bucket in target Region 1 and specify the name for artifact store
S3BucketNameForOtherRegion2 Create a bucket in target Region 2 and specify the name for artifact store

Choose Next.

  • On the Review page, select I acknowledge that this template might cause AWS CloudFormation to create IAM resources.
  • Choose Create.
  • Wait for the CloudFormation stack status to change to CREATE_COMPLETE (this takes approximately 5–7 minutes).

When the stack is complete, your pipeline should be ready and running in the current Region.

  • To validate the pipeline, check the images and EC2 instances running in the target Regions, and also refer to the AWS CodePipeline execution summary shown below.
AWS CodePipeline Execution Summary

We will walk you through the following steps for creating a multi-region deployment pipeline:

1. Using CodeCommit as your source code repository

The deployment workflow starts by placing the application code on the CodeCommit repository. When you add or update the source code in CodeCommit, the action generates a CloudWatch event, which triggers the pipeline to run.

2. Static code analysis of CloudFormation template to provision AWS resources

Historically, AWS CloudFormation linting was limited to the ValidateTemplate action in the service API. This action tells you if your template is well-formed JSON or YAML, but doesn’t help validate the actual resources you’ve defined.

You can use a linter such as the cfn-lint tool for static code analysis to improve your AWS CloudFormation development cycle. The tool validates the provisioning CloudFormation template properties and their values (mappings, joins, splits, conditions, and nesting those functions inside each other) against the resource specification. This can cover the most common of the underlying service constraints and help encode some best practices.

The following rules cover underlying service constraints:

  • E2530 – Checks that Lambda functions have correctly configured memory sizes
  • E3025 – Checks that your RDS instances use correct instance types for the database engine
  • W2001 – Checks that each parameter is used at least once

You can also add this step as a pre-commit hook for your Git repository if you are using CodeCommit or GitHub.

You provision a CodeBuild project for static code analysis as the first step in CodePipeline after source. This helps in early detection of any linter issues.
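
You can also run the same check locally before pushing a change; a minimal sketch follows (the template path is an assumption).

# Install and run cfn-lint against the provisioning template
$ pip install cfn-lint
$ cfn-lint templates/ec2-instance.yaml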

3. Security code analysis of CloudFormation template to provision AWS resources

You can use Stelligent’s cfn_nag tool to perform additional validation of your template resources for security. The cfn-nag tool looks for patterns in CloudFormation templates that may indicate insecure infrastructure provisioning and validates against AWS best practices. For example:

  • IAM rules that are too permissive (wildcards)
  • Security group rules that are too permissive (wildcards)
  • Access logs that aren’t enabled
  • Encryption that isn’t enabled
  • Password literals

You provision a CodeBuild project for security code analysis as the second step in CodePipeline. This helps detect any insecure infrastructure provisioning issues.
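
As with cfn-lint, you can run cfn_nag locally as well; a minimal sketch follows (the template path is an assumption).

# Install and run cfn_nag against the provisioning template
$ gem install cfn-nag
$ cfn_nag_scan --input-path templates/ec2-instance.yaml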

4. Compiling and testing application code and generating an AMI image

Because you use a Java-based application for this walkthrough, you use Amazon Corretto as your JVM. Corretto is a no-cost, multi-platform, production-ready distribution of the Open Java Development Kit (OpenJDK). Corretto comes with long-term support that includes performance enhancements and security fixes.

You also use Apache Maven as a build automation tool to build the sample application, and the HashiCorp Packer tool to generate an AMI image for the application.

You provision a CodeBuild project for compilation, unit testing, AMI generation, and storing the AMI ImageId in the Parameter Store, which the CloudFormation template uses as the next step of the pipeline.

5. Copying the AMI image into target Regions

You use a Lambda function to copy the AMI image into target Regions so the CloudFormation template can use it to provision instances into that Region as the next step of the pipeline. It also writes the target Region AMI ImageId into the target Region’s Parameter Store.
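
To make this step more concrete, here is a minimal sketch of what such a Lambda function could do using the AWS SDK for Java (v1). This is not the walkthrough’s actual function code; the class name and AMI name are illustrative assumptions, and the parameter name follows the AMI_VERSION parameter mentioned in the cleanup steps.

package example;

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.CopyImageRequest;
import com.amazonaws.services.simplesystemsmanagement.AWSSimpleSystemsManagement;
import com.amazonaws.services.simplesystemsmanagement.AWSSimpleSystemsManagementClientBuilder;
import com.amazonaws.services.simplesystemsmanagement.model.PutParameterRequest;

public class CopyAmiToRegion {

    /**
     * Copies the source AMI into the target Region and records the new
     * ImageId in that Region's Parameter Store so the cross-region
     * CloudFormation action can reference it.
     */
    public String copyAmi(String sourceImageId, String sourceRegion, String targetRegion) {
        // EC2 and SSM clients scoped to the target Region
        AmazonEC2 ec2 = AmazonEC2ClientBuilder.standard()
                .withRegion(targetRegion).build();
        AWSSimpleSystemsManagement ssm = AWSSimpleSystemsManagementClientBuilder.standard()
                .withRegion(targetRegion).build();

        // Copy the AMI from the source Region into the target Region
        String targetImageId = ec2.copyImage(new CopyImageRequest()
                .withSourceImageId(sourceImageId)
                .withSourceRegion(sourceRegion)
                .withName("app-ami-" + sourceImageId))
                .getImageId();
        // (Waiting for the copied AMI to become available is omitted for brevity.)

        // Write the target Region's ImageId to Parameter Store
        ssm.putParameter(new PutParameterRequest()
                .withName("AMI_VERSION")
                .withValue(targetImageId)
                .withType("String")
                .withOverwrite(true));

        return targetImageId;
    }
}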

6. Deploying into multiple Regions with the CloudFormation template

You use the CloudFormation template as a cross-region action to provision AWS resources into a target Region. CloudFormation uses the ImageId stored in Parameter Store as a reference and provisions the instances into the target Region.

Cleaning up

To avoid additional charges, you should delete the following AWS resources after you validate the pipeline:

  • The cross-region CloudFormation stack in the target and current Regions
  • The main CloudFormation stack in the current Region
  • The AMI you created in the target and current Regions
  • The Parameter Store AMI_VERSION in the target and current Regions

Conclusion

You have now created a multi-region deployment pipeline in CodePipeline without having to worry about the mechanics of creating and copying AMI images across Regions. CodePipeline abstracts the creating and copying of the images in the background in each Region. You can now upload new source code changes to the CodeCommit repository in the primary Region, and changes deploy automatically to other Regions. Cross-region actions are very powerful and are not limited to deploy actions. You can also use them with build and test actions.

Introducing a new generation of AWS Elastic Beanstalk platforms

Post Syndicated from David LaBissoniere original https://aws.amazon.com/blogs/compute/introducing-a-new-generation-of-aws-elastic-beanstalk-platforms/

In my last post I discussed AWS Elastic Beanstalk’s new public roadmap on GitHub. Today I want to talk about our new generation of Elastic Beanstalk platforms built on top of Amazon Linux 2 (AL2).

Late last year we launched a public beta of a new Elastic Beanstalk platform for Amazon Corretto — Amazon’s no-cost, production-ready distribution of the Open Java Development Kit (OpenJDK). This is also our first platform based on AL2. This year we have launched two more beta AL2 platforms: Docker and Python. More beta platforms are arriving soon, followed by generally available platform releases.

A sample application using the new Python 3.7 beta platform

I want to dive a little deeper on what we are doing with these platforms. Elastic Beanstalk was publicly launched in 2011, and announced in a blog post by Jeff Barr. Back then there were few enough AWS services that they were all listed as tabs along the top of the AWS Management Console. At launch, we supported only Apache Tomcat applications. Over time, we added support for many other runtimes and began using the term “platform” to describe our offerings. Today we support a wide variety of platforms for popular web application frameworks such as Ruby on Rails, PHP, and Node.js, as well as generic Docker-based platforms. In the years since we launched each platform, the underlying communities have continued to evolve. Elastic Beanstalk is an opinionated service, especially when it comes to our platforms. As the service evolves, the opinions baked into our platforms must evolve as well.

With our AL2 platforms, we are refreshing each platform based on feedback we’ve gotten from customers. For example, with Java we heard concerns from many customers about long-term support and licensing of OpenJDK. That’s why in AL2 we are using Amazon’s own Corretto distribution, which includes committed long-term support. It also has performance and scalability improvements learned from Amazon’s years of experience running Java across thousands of production services — such as the Elastic Beanstalk service itself. For more details, see this section of our Java platform documentation.

Our Python AL2 platform has also been modernized. Previously we only supported serving applications through Apache and mod_wsgi. Now we are using NGINX as a reverse proxy in front of Gunicorn, with the flexibility to use another Web Server Gateway Interface (WSGI) server if you prefer. We also took this opportunity to add support for Pipenv and Pipfile, more modern and powerful Python dependency management tools. Learn more in our Python platform documentation.

The Docker AL2 platform is rewritten internally, but provides largely the same customer experience. It does offer improved I/O performance by using the OverlayFS storage driver. This is a change from the previous Docker platform, which used the older and slower Device Mapper storage driver and required an extra Amazon EBS volume.

We are hard at work on another set of beta platforms including PHP, Ruby, and Node.js, which are expected to launch soon. Each of these has been modernized and improved. For a full list of differences between our existing platforms and their Amazon Linux 2 equivalents, check out our documentation. In the next section I want to take a closer look at one new feature that applies to all of the new platforms: platform hooks.

Platform hooks

With our AL2 platforms, we are offering a simplified model for on-instance customization. We’ve long supported configuration files called ebextensions that allow customization of environment options, resources, and on-instance behavior. These have enabled customers to extend their environments in ways we never dreamed of. But we’ve also heard customer feedback about the difficulty of writing complex shell scripts embedded within YAML or JSON. And as they are, ebextensions don’t provide any straightforward mechanism to execute custom code after an application deployment is completed. Customers have pointed out many use cases where they want to do this – for example to enable third party monitoring tools.

With our new generation of Linux platforms, we are introducing platform hooks. Platform hooks are a set of directories inside the application bundle that you can populate with scripts. These scripts are executed at defined points in the on-instance application deployment lifecycle. These hooks are reminiscent of custom platform hooks, but are simplified and easier to manage and version because they are part of the application bundle.

For example, a Corretto application bundle might look like:

├── .platform
│   ├── hooks
│   │   ├── prebuild
│   │   │   ├── 01_set_secrets.sh
│   │   │   └── 10_install_dependencies.sh
│   │   ├── predeploy
│   │   │   └── 01_configure_corretto.sh
│   │   └── postdeploy
│   │       └── 99_log_deployment_complete.py
│   └── nginx
│       └── conf.d
│           └── custom.conf
├── Procfile
└── application.jar

The files in each of the .platform/hooks/ subdirectories are executed in lexicographical order at predefined points in the deployment process.

  1. prebuild hooks are executed after the application is downloaded and extracted, but before we try to configure anything
  2. predeploy hooks are run after the application is configured and staged, but before it is deployed.
  3. postdeploy hooks are run at the very end — after the application is deployed and running.

Finally, take note of the .platform/nginx/ directory as well. This can be used to provide custom configuration additions or overrides for the on-instance NGINX proxy server. You can either override the provided configuration file completely, or just add a new configuration file that is imported by NGINX. Because all of the AL2 platforms use NGINX and the same base configuration, these customizations are now more portable across platforms. For a full explanation of platform hooks and related functionality, see our Extending Linux Platforms documentation page.

We’re excited to launch this new generation of Elastic Beanstalk platforms, and to hear feedback from you about how we can make them even better. If you have feedback about one of the AL2 beta platforms, please add a comment to the relevant issue on the public roadmap on GitHub. For example, here is the issue for the Corretto platform. Keep an eye on the roadmap and our release notes for announcements of the remaining platforms over the coming weeks.

 

How to run AWS CloudHSM workloads on AWS Lambda

Post Syndicated from Mohamed AboElKheir original https://aws.amazon.com/blogs/security/how-to-run-aws-cloudhsm-workloads-on-aws-lambda/

AWS CloudHSM is a cloud-based hardware security module (HSM) that enables you to generate and use your own encryption keys on the AWS Cloud. With CloudHSM, you can manage your own encryption keys using FIPS 140-2 Level 3 validated HSMs. CloudHSM also automatically manages synchronization, high availability and failover within a cluster.

When the service first launched, many customers ran CloudHSM workloads on Amazon Elastic Compute Cloud (Amazon EC2), which required the CloudHSM client to be installed on the EC2 instance in order to communicate with the CloudHSM cluster. Today, we see customers who are interested in leveraging CloudHSM for serverless workloads using AWS Lambda, but when using Lambda there is no “instance” to install the CloudHSM client on. This blog post shows a workaround you can use to satisfy the CloudHSM client installation requirement on Lambda functions so that you can run CloudHSM workloads within them.

The workaround is performed by first packaging the CloudHSM client and its requirements in a Lambda layer, and then running the CloudHSM client in a child process from within the Lambda function code to allow communication with the HSMs in your CloudHSM cluster. By leveraging this approach, you gain the benefits of serverless computing (such as increased scalability and decreased admin overhead), as well as the ability to integrate with other AWS services like Amazon CloudWatch Events, Amazon Simple Storage Service (Amazon S3) and AWS Config.

Why would I want to run CloudHSM workloads on Lambda?

Below are some specific use cases enabled by this solution:

  1. When a file is added to an Amazon S3 bucket, you can trigger a Lambda function to encrypt or decrypt the file using keys stored in CloudHSM.
  2. When a file is added to an Amazon S3 bucket, you can trigger a Lambda function to create a digital signature for the file using a private key stored in CloudHSM. This digital signature can then be used to ensure file integrity.
  3. You can create a custom AWS Config rule that checks to ensure files in a directory or a bucket have not been tampered with by verifying their digital signatures using keys stored in CloudHSM.

Solution overview

This solution shows you how to package the CloudHSM client binary and its dependencies (configuration files and libraries) as well as the CloudHSM Java JCE library to a Lambda layer which is attached to the Lambda function. This enables the function to run the CloudHSM client daemon in the background as a child process, allowing it to connect to the CloudHSM cluster and to perform cryptographic tasks such as encryption and decryption operations.
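
To illustrate the child-process approach, here is a minimal Java sketch of how a handler could start the bundled client daemon and wait for it to become ready before using the JCE library. The file paths and the readiness log marker are assumptions based on the standard client layout (the layer is mounted under /opt); the sample code in the repository handles startup, configuration updates, and shutdown more completely.

import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CloudHsmClientLauncher {

    private static final String LOG_FILE = "/tmp/cloudhsm_client.log";

    public static Process startClient() throws Exception {
        // Launch the daemon packaged in the Lambda layer and log its output to /tmp
        Process client = new ProcessBuilder(
                "/opt/cloudhsm/bin/cloudhsm_client",
                "/opt/cloudhsm/etc/cloudhsm_client.cfg")
                .redirectErrorStream(true)
                .redirectOutput(new File(LOG_FILE))
                .start();

        // Poll the daemon's log until it reports readiness (marker may vary by client version)
        while (!new String(Files.readAllBytes(Paths.get(LOG_FILE)), StandardCharsets.UTF_8)
                .contains("libevmulti_init: Ready")) {
            Thread.sleep(500);
        }

        // The caller should stop the child process (client.destroy()) at the end of the
        // invocation, because it can't persist across invocations.
        return client;
    }
}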

Using a Lambda layer decouples the code of the Lambda function from the CloudHSM client and the CloudHSM Java JCE library. This way, when a new version of the CloudHSM client and the CloudHSM Java JCE library is released, it can be included in a new Lambda layer version and attached to the Lambda function without needing to rebuild the Lambda function package.

The example solution below includes a complete Java sample for the Lambda function. It uses the CloudHSM Java JCE library to generate a symmetric key on the HSM, and it uses this key to encrypt and decrypt after starting the CloudHSM client. Maven (a build automation tool) will be used to build the Lambda function package.

The solution uses AWS Secrets Manager to store and retrieve the crypto user (CU) credentials that are needed to perform cryptographic operations. If the HSM IPs of the CloudHSM cluster are changed (for example, if the HSMs are deleted and re-created), the Lambda function will automatically update the configuration during runtime.

Note:

  1. The solution only works with version 2.0.4 or later of the CloudHSM client and CloudHSM Java JCE library.
  2. In this workaround, the client is started at the beginning of each Lambda invocation, and is stopped at the end of the invocation. Due to the way Lambda works, the client can’t persist through multiple invocations.
  3. Secrets Manager uses AWS Key Management Service to secure its data. If your workload requires that all data be secured using HSMs under your sole control, without reliance on IAM credentials, this solution may not be appropriate. You should work with your security or compliance officer to ensure you’re using a method of securing HSM login credentials that meets your application and security needs.

Prerequisites

Figure 1: Architectural diagram

Here are the resources you’ll need in order to follow along with the example in Figure 1:

  1. An Amazon Virtual Private Cloud (Amazon VPC) with the following components:
    1. Private subnets in multiple Availability Zones to be used for the HSMs’ elastic network interfaces (ENIs).
    2. A public subnet that contains a network address translation (NAT) gateway.
    3. A private subnet with a route table that routes internet traffic (0.0.0.0/0) to the NAT gateway. You’ll use this subnet to run the Lambda function. The NAT gateway allows you to connect to the CloudHSM, CloudWatch Logs and Secrets Manager endpoints.

    Note: For high availability, you can add multiple instances of the public and private subnets mentioned in Prerequisites 1.b and 1.c. For more information about how to create an Amazon VPC with public and private subnets as well as a NAT gateway, refer to the Amazon VPC user guide.

  2. An active CloudHSM cluster with at least one active HSM. The HSMs should be created in the private subnets mentioned in Prerequisite 1.a. You can follow the Getting Started with AWS CloudHSM guide to create and initialize the CloudHSM cluster.
  3. An Amazon Linux 2 EC2 instance with the CloudHSM client installed and configured to connect to the CloudHSM cluster. The client instance should be launched in the public subnet mentioned in Prerequisite 1.b. You can again refer to Getting Started With AWS CloudHSM to configure and connect the client instance.

    Note: You only need the client instance to build the Lambda function package. You can terminate the instance after the package has been created.

  4. CU credentials. You can create a CU by following the steps in the user guide.
  5. A server/machine with AWS Command Line Interface (AWS CLI) installed and configured. You’ll need this to follow along, as the example uses the AWS CLI to create and configure the necessary AWS resources. To follow this example, the IAM user/role should have at minimum the permissions in the policy below attached to it. Make sure you replace the <REGION> and <ACCOUNT-ID> tags below with the actual Region and account ID you are using.
    
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "secretsmanager:CreateSecret",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "secretsmanager:Name": "CloudHSM_CU"
                    }
                }
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": [
                    "ec2:AuthorizeSecurityGroupEgress",
                    "lambda:CreateFunction",
                    "lambda:InvokeFunction",
                    "lambda:GetLayerVersion",
                    "lambda:PublishLayerVersion",
                    "iam:GetRole",
                    "iam:CreateRole",
                    "iam:AttachRolePolicy",
                    "iam:PutRolePolicy",
                    "iam:PassRole",
                    "secretsmanager:DescribeSecret",
                    "secretsmanager:GetResourcePolicy",
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:PutResourcePolicy",
                    "logs:FilterLogEvents"
                ],
                "Resource": [
                    "arn:aws:ec2:<REGION>:<ACCOUNT-ID>:security-group/outbound-443",
                    "arn:aws:lambda:<REGION>:<ACCOUNT-ID>:function:cloudhsm_lambda_example",
                    "arn:aws:lambda:<REGION>:<ACCOUNT-ID>:layer:cloudhsm-client-layer",
                    "arn:aws:lambda:<REGION>:<ACCOUNT-ID>:layer:cloudhsm-client-layer:*",
                    "arn:aws:iam::<ACCOUNT-ID>:role/cloudhsm_lambda_example_role",
                    "arn:aws:secretsmanager:<REGION>:<ACCOUNT-ID>:secret:CloudHSM_CU*",
                    "arn:aws:logs:<REGION>:<ACCOUNT-ID>:log-group:/aws/lambda/cloudhsm_lambda_example:log-stream:"
                ]
            },
            {
                "Sid": "VisualEditor3",
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeVpcs",
                    "ec2:CreateSecurityGroup",
                    "ec2:DescribeSubnets",
                    "cloudhsm:DescribeClusters",
                    "ec2:DescribeSecurityGroups",
                    "ec2:AuthorizeSecurityGroupEgress"
                ],
                "Resource": "*"
            }
        ]
    }
    	

Step 1: Build the Lambda function package

In this step, you’ll build the Lambda function package using Maven. For more information about using Maven to build an AWS Lambda Java package, refer to the AWS Lambda developer guide.

  1. On your CloudHSM client instance, install the CloudHSM Java JCE library by following the steps in the user guide.
  2. Install OpenJDK 8 and Maven:
    
    $ sudo yum install -y java maven
    	

  3. Download the sample code, unzip it and move to the created directory. The directory will have the name aws-cloudhsm-on-aws-lambda-sample-master and will include:
    • A file with the name pom.xml that contains the Maven project configuration.
    • A file with the name SymmetricKeys.java which is also available on the AWS CloudHSM Java JCE samples repo. This file contains the function that you’ll use to generate the advanced encryption standard (AES) key.
    • A file with the name AESGCMEncryptDecryptLambda.java, which will run when the Lambda function is invoked:
      
      $ wget https://github.com/aws-samples/aws-cloudhsm-on-aws-lambda-sample/archive/master.zip
      $ unzip master.zip
      $ cd aws-cloudhsm-on-aws-lambda-sample-master/
      	

  4. Create a Java Archive (JAR) package by running the below commands. This will create the JAR file under the target/ directory with the name cloudhsm_lambda_project-1.0-SNAPSHOT.jar.

    
    $ export CLOUDHSM_VER=$(ls /opt/cloudhsm/java/ | grep "cloudhsm-[0-9\.]\+.jar" | grep -o "[0-9\.]\+[0-9]")
    $ export LOG4JCORE_VER=$(ls /opt/cloudhsm/java/ | grep "log4j-core-[0-9\.]\+.jar" | grep -o "[0-9\.]\+[0-9]")
    $ export LOG4JAPI_VER=$(ls /opt/cloudhsm/java/ | grep "log4j-api-[0-9\.]\+.jar" | grep -o "[0-9\.]\+[0-9]")
    $ mvn validate && mvn clean package 
    	

Step 2: Create the Lambda layer

In this step, you’ll create the Lambda layer that contains the CloudHSM client and its dependencies and the CloudHSM Java library JARs.

  1. On your CloudHSM client instance, create a directory called “layer” and change directories to it:
    
    $ mkdir ~/layer && cd ~/layer
    	

  2. Create the following directories, which you’ll use in the next steps to hold the CloudHSM binary and its prerequisites such as configuration files and libraries, and the CloudHSM Java JCE JARs:
    
    $ mkdir -p lib cloudhsm/bin cloudhsm/etc java/lib
    	

  3. Copy the cloudhsm_client binary and the needed configuration files to the directories you created in the previous step.
    
    $ cp /opt/cloudhsm/bin/cloudhsm_client cloudhsm/bin
    $ cp -r /opt/cloudhsm/etc/{cloudhsm_client.cfg,customerCA.crt,client.crt,client.key,certs} cloudhsm/etc
    	

  4. Add the necessary libraries by running the commands below. These libraries are needed by the Lambda function to be able to run the cloudhsm_client binary.
    
    $ cp /opt/cloudhsm/lib/libcaviumjca.so lib/
    $ ldd /opt/cloudhsm/bin/cloudhsm_client | awk '{print $3}' | grep "^/" | xargs -I{} cp {} lib/
    	

  5. Add the CloudHSM Java JCE Jars by running the commands below. These JARs include the classes needed by the Lambda function code to run.
    
    $ cp /opt/cloudhsm/java/{cloudhsm-[0-9]*.jar,log4j-*-*.jar} java/lib/
    	

  6. Create the Lambda layer ZIP archive by running the command below. This will create the archive with the name layer.zip in the home directory.
    
    $ zip -r ~/layer.zip * 
    	

  7. Move the ZIP archive (layer.zip) to the server/machine with AWS CLI installed and configured, and run the below command to create the Lambda layer with the name cloudhsm-client-layer.
    
    $ aws lambda publish-layer-version --layer-name cloudhsm-client-layer --zip-file fileb://layer.zip --compatible-runtimes java8
    	

Step 3: Create a secret to store the CU credentials

In this step, you will use Secrets Manager to create a secret to store your CU credentials. You must perform this step on your server/machine that has AWS CLI installed and configured.

Run the following command to create a secret with the name CloudHSM_CU that contains your CU user name and password (Prerequisite 4). Make sure to replace the user name and password below with your actual CU user name and password.


$ export HSM_USER=<user>
$ export HSM_PASSWORD=<password>
$ aws secretsmanager create-secret --name CloudHSM_CU --secret-string "{ \"HSM_USER\": \"$HSM_USER\", \"HSM_PASSWORD\": \"$HSM_PASSWORD\"}"

Step 4: Create an IAM role for the Lambda function

In this step, you’ll create an IAM role that has the permissions necessary for it to be assumed by the Lambda function.

  1. On the server/machine with AWS CLI installed and configured, create a new file with the name trust.json.
    
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "lambda.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
    	

  2. Create a role named cloudhsm_lambda_example_role using the following AWS CLI command:

    
    $ aws iam create-role --role-name cloudhsm_lambda_example_role --assume-role-policy-document file://trust.json
    	

  3. Run the commands below to create a new file named policy.json. The policy in this file allows the IAM role to perform the following actions:
    • Writing to CloudWatch Logs. This permission allows the IAM role to write to the CloudWatch Logs of the Lambda function. You can then use the logs for troubleshooting. For more information about accessing CloudWatch Logs for Lambda, refer to this guide.
    • Retrieving the CU secret value from Secrets Manager. The CU credentials stored in the CU secret are needed by the Lambda function to be able to log in to the CloudHSM cluster.
    • Describing CloudHSM clusters. This permission allows the Lambda function to check the current HSM IPs and update its configuration if the IPs have changed.
    
    $ export SECRET_ARN=$(aws secretsmanager describe-secret --secret-id "CloudHSM_CU" --query "ARN" --output text)
    
    $ cat <<EOF> policy.json
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CWLogs",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents"
                ],
                "Resource": "*"
            },
            {
                "Sid": "SecretsManager",
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": "$SECRET_ARN"
            },
            {
                "Sid": "CloudHSM",
                "Effect": "Allow",
                "Action": "cloudhsm:DescribeClusters",
                "Resource": "*"
            }
        ]
    }
    EOF
    	

  4. Attach the policy to the IAM role created in step 2 of this section by running the following command:
    
    $ aws iam put-role-policy --role-name cloudhsm_lambda_example_role --policy-name cloudhsm_lambda_example_policy --policy-document file://policy.json
    	

  5. Attach the AWS managed policy AWSLambdaVPCAccessExecutionRole to the created role by running the command below. This policy allows the IAM role to access the VPC, which is necessary in order to run the Lambda function in a VPC and a subnet.
    
    $ aws iam attach-role-policy --role-name cloudhsm_lambda_example_role --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole
    	

  6. To make sure the CU secret is only accessible to the Lambda function role, run the below commands to attach a resource-based policy to the secret:
    
    $ export ROLE_ARN=$(aws iam get-role --role-name cloudhsm_lambda_example_role --query Role.Arn --output text)
    $ export ASSUMED_ROLE_ARN=$(echo $ROLE_ARN | sed -e "s/:iam:/:sts:/" -e "s/:role/:assumed-role/" -e "s/$/\/cloudhsm_lambda_example/")
    $ export ROOT_ARN=$(echo $ROLE_ARN | sed "s/:role.*/:root/")
    $ cat <<EOF> sm_policy.json
    { "Version": "2012-10-17",
    	"Statement": [
    		{
    			"Effect": "Deny",
    			"Action": "secretsmanager:GetSecretValue",
    			"NotPrincipal": {"AWS": [
    				"$ASSUMED_ROLE_ARN",
    				"$ROLE_ARN",
    				"$ROOT_ARN"
    			]},
    				"Resource": "*"
    		}
    	]
    }
    EOF
    
    $ aws secretsmanager put-resource-policy --resource-policy file://sm_policy.json --secret-id CloudHSM_CU
    	

Step 5: Create the Lambda function

In this step, you will create a Lambda function with the necessary settings.

  1. On the server/machine with AWS CLI installed and configured, run the command below to create a security group with the name outbound-443. This security group will be attached to the Lambda function to allow it to connect to the CloudWatch Logs, Secrets Manager and CloudHSM endpoints. Make sure to replace the CLUSTER_ID below with the actual CloudHSM cluster ID of your environment.
    
    $ export CLUSTER_ID=<cluster-xxxxxxxxxx>
    $ export CLUSTER_VPC=$(aws cloudhsmv2 describe-clusters --filters clusterIds=$CLUSTER_ID --query Clusters[0].VpcId --output text)
    $ export OUTBOUND_SG=$(aws ec2 create-security-group --group-name outbound-443 --description "Allow outbound access to port 443" --vpc-id $CLUSTER_VPC --output text)
    $ aws ec2 authorize-security-group-egress --group-id $OUTBOUND_SG --protocol tcp --port 443 --cidr 0.0.0.0/0
    	

  2. Move the JAR package generated in step 4 of the Step 1 section to the current directory on the server/machine that has AWS CLI installed and configured (The file was generated on the CloudHSM client instance under ~/aws-cloudhsm-on-aws-lambda-sample-master/target/cloudhsm_lambda_project-1.0-SNAPSHOT.jar).
  3. Replace the cluster ID and subnet ID below with the CloudHSM cluster ID of your environment, and the ID of the private Lambda subnet in your environment (Prerequisite 1.c), then run the commands below. These commands set environment variables that you’ll need for the next command.
    
    $ export CLUSTER_ID=<cluster-xxxxxxxxxx>
    $ export SUBNET_ID=<subnet-xxxxxxxx>
    $ export CLUSTER_VPC=$(aws cloudhsmv2 describe-clusters --filters clusterIds=$CLUSTER_ID --query Clusters[0].VpcId --output text)
    $ export OUTBOUND_SG=$(aws ec2 describe-security-groups --filters Name=group-name,Values=outbound-443  --query SecurityGroups[0].GroupId --output text)
    $ export CLUSTER_SG=$(aws cloudhsmv2 describe-clusters --filters clusterIds=$CLUSTER_ID --query Clusters[0].SecurityGroup --output text)
    $ export ROLE_ARN=$(aws iam get-role --role-name cloudhsm_lambda_example_role --query Role.Arn --output text)
    $ export LAYER_ARN=$(aws lambda get-layer-version --layer-name cloudhsm-client-layer --version-number 1 --query LayerVersionArn --output text)
    	

  4. Create a Lambda function with the name cloudhsm_lambda_example by running the below command:
    
    $ aws lambda create-function --function-name "cloudhsm_lambda_example" \
    --runtime java8 \
    --role $ROLE_ARN \
    --handler "com.amazonaws.cloudhsm.examples.AESGCMEncryptDecryptLambda::myhandler" \
    --timeout 600 \
    --memory-size 512 \
    --vpc-config SubnetIds=$SUBNET_ID,SecurityGroupIds=$CLUSTER_SG,$OUTBOUND_SG \
    --environment "Variables={CLUSTER_ID=$CLUSTER_ID, SECRET_ID=CloudHSM_CU,liquidsecurity_daemon_id=1}" \
    --layers $LAYER_ARN \
    --zip-file fileb://cloudhsm_lambda_project-1.0-SNAPSHOT.jar
    	

The command will create a Lambda function with the following configuration:

  • Runtime: Java8
  • Execution Role: The role you created in the Step 4 section.
  • Handler: The name of the class and the function in the package created in the Step 1 section.
  • Timeout: 10 minutes.
  • Memory size: 512 MB.
  • Subnet: The private Lambda subnet in your environment (Prerequisite 1.c).
  • Security Groups: The CloudHSM cluster security group AND the security group created in step 1 of the Step 5 section for outbound access to port 443 (outbound-443).
  • Code/Package: The JAR package you created in step 4 of the Step 1 section.
  • Layer: The layer created in the Step 2 section.
  • Environmental Variables:
    • CLUSTER_ID = the CloudHSM cluster ID in your environment
    • SECRET_ID = the ID of the secret you created in the Step 3 section
    • liquidsecurity_daemon_id = 1 (this is needed by the cloudhsm_client binary)

Step 6: Run the Lambda function

In this step, you will invoke the Lambda function and check the logs to view the output.

  1. You can invoke the Lambda function using the following command. This will execute the code in the package you created in Step 1.
    
    $ aws lambda invoke --function-name cloudhsm_lambda_example out.txt
    	

  2. You can check the function’s CloudWatch Log group with a command like this one:
    
    $ aws logs filter-log-events --log-group-name "/aws/lambda/cloudhsm_lambda_example" --start-time "`date -d "now -5min" +%s`000" --query events[*].message --output text | sed "s/\t/\n/g" 
    	

    If the Lambda function was successful, the output of the function should look something like the example below:

    
    START RequestId: 39c627f2-3908-4424-97ef-038c28a72f9a Version: $LATEST
    
    * Running GetSecretValue to get the CU credentials ...
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    
    * Running DescribeClusters to get the HSM IP ...
    DescribeClusters returned the HSM IP = 1.2.3.4
    * Getting the HSM IP inf the configuration file ...
    The configuration file has the HSM IP = 1.2.3.4
    * Starting the cloudhsm client ...
    * Waiting for the cloudhsm client to start ...
    * cloudhsm client started ...
    * Adding the Cavium provider ...
    ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
    
    * Using credentials to Login to the CloudHSM Cluster ...
    Login successful!
    * Generating AES Key ...
    * Generating Random data to encrypt ...
    Plain Text data = 3B0566E9A3FADA8FED7D6C88FE92ECBE8526922E84489AB48F1F3F3116235E69
    * Encrypting data ...
    Cipher Text data = CA6D80AD34BBADEF34275743F309E6730ABC66BA19C2EADC731899B0FB86564EDDB9F7FC103E1C9C2A6A1E64BF2D2C48
    * Decrypting ciphertext ...
    Decrypted Text data = 3B0566E9A3FADA8FED7D6C88FE92ECBE8526922E84489AB48F1F3F3116235E69
     * Successful decryption
    * Logging out the CloudHSM Cluster
    * Closing client ...
    END RequestId: 39c627f2-3908-4424-97ef-038c28a72f9a
    
    REPORT RequestId: 39c627f2-3908-4424-97ef-038c28a72f9a
    Duration: 11990.69 ms
    Billed Duration: 12000 ms
    Memory Size: 512 MB
    Max Memory Used: 103 MB
    	

Note: The StatusLogger “No log4j2 configuration file found” error above is normal and can be ignored. It is related to a missing log4j configuration file, which is normally used to configure logging but isn’t needed in this case because the log messages are written to CloudWatch Logs by default.

Conclusion

This solution demonstrates how to run CloudHSM workloads on Lambda, which allows you to not only leverage the flexibility of serverless computing, but also helps you meet security and compliance requirements by performing cryptographic tasks such as encryption and decryption operations. This approach also allows you to integrate with other AWS services like Amazon CloudWatch Events, Amazon Simple Storage Service (Amazon S3), or AWS Config for a seamless experience across your environment.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the AWS CloudHSM forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author photo

Mohamed AboElKheir

Mohamed AboElKheir is an Application Security Engineer who works with different teams to ensure AWS services, applications, and websites are designed and implemented to the highest security standards. He is a subject matter expert for CloudHSM and is always enthusiastic about assisting CloudHSM customers with advanced issues and use cases. Mohamed is passionate about InfoSec, specifically cryptography, penetration testing (he’s OSCP certified), application security, and cloud security (he’s AWS Security Specialty certified).

A Disk-Backed ArrayList

Post Syndicated from Bozho original https://techblog.bozho.net/a-disk-backed-arraylist/

It sometimes happens that your list can become too big to fit in memory and you have to do something in order to avoid running out of memory.

The proper way to do that is streaming – instead of fitting everything in memory, you should stream data from the source and discard the entries that are already processed.

However, there are cases when code that’s outside of your control requires a List and you can’t use streaming. These cases are rather rare but in case you hit them, you have to find a workaround. One is to re-implement the code to work with streaming, but depending on the way the library is written, it may not be possible. So the other option is to use a disk-backed list – one that works as a list, but underneath stores and loads elements from disk.

Searching for existing solutions turns up several repos that are 3+ years old, like this one and this one and this one.

And then there’s MapDB, which is great and supported. It’s mostly about maps, but it does support a List as well, as shown here.

And finally, you have the option to implement something simpler yourself, in case you need just iteration and almost nothing else. I’ve done it here – DiskBackedArrayList.java. It doesn’t support many operations (not all unsupported methods are overridden to throw an exception yet, but they should be). Most importantly, it doesn’t support random adds and random gets, or toArray(). It’s purely “fill the collection” and then “iterate the collection”. It relies on ObjectOutputStream, which is not terribly efficient but is simple to use. Note that I’ve allowed a short in-memory prependList in case small amounts of data need to be prepended to the list.

The list gets filled in memory until a specified threshold is reached and then gets flushed to disk, clearing the memory, which starts getting filled again. This too could be more efficient – with background flushing in another thread that doesn’t interfere with adding elements to the list – but optimizations complicate things, and in this case the total running time was not an issue. Most importantly, the iterator() method is overridden to return a custom iterator that first streams the prepended list, then reads everything from disk, and finally iterates over the latest batch which is still in memory. And finally, the clear() method should be called in the end in order to close the underlying stream. An output stream could be opened and closed on each flush, but ObjectOutputStream can’t be used in append mode due to some implementation specifics about writing headers first.
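
To make the pattern concrete, here is a stripped-down sketch of the fill-then-flush idea. It is not the actual DiskBackedArrayList code, and it leaves out the prepend list and the custom iterator.

import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class DiskBackedListSketch<E> {

    private final List<E> buffer = new ArrayList<>();
    private final int threshold;
    private final ObjectOutputStream out;

    public DiskBackedListSketch(Path file, int threshold) throws IOException {
        this.threshold = threshold;
        // One stream kept open for the lifetime of the list, because
        // ObjectOutputStream can't simply be reopened in append mode
        this.out = new ObjectOutputStream(Files.newOutputStream(file));
    }

    public void add(E element) throws IOException {
        buffer.add(element); // elements must be Serializable
        if (buffer.size() >= threshold) {
            for (E e : buffer) {
                out.writeObject(e); // spill the current in-memory batch to disk
            }
            out.flush();
            buffer.clear();         // free memory; a new batch starts filling
        }
    }

    public void close() throws IOException {
        out.close();                // analogous to clear() in the original post
    }

    // Iteration (not shown) would replay the flushed objects with an
    // ObjectInputStream and then iterate over the still-in-memory buffer.
}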

So basically we hide the streaming approach underneath a List interface – it’s still streaming elements and discarding them when not needed. Ideally this should be done at the source of the data (e.g. a database, message queue, etc.) rather than using the disk as overflow space, but there are cases where using the disk is fine. This implementation is a starting point, as it’s not tested in production, but illustrates that you can adapt existing classes to use different data access patterns if needed.

The post A Disk-Backed ArrayList appeared first on Bozho's tech blog.

Java 11 runtime now available in AWS Lambda

Post Syndicated from Rob Sutter original https://aws.amazon.com/blogs/compute/java-11-runtime-now-available-in-aws-lambda/

We are excited to announce that you can now develop your AWS Lambda functions using the Java 11 runtime. Start using this runtime today by specifying a runtime parameter value of java11 when creating or updating your Lambda functions.

The Java 11 runtime does not introduce any changes in Lambda’s programming model, such as handler definition or logging statements. Customers can continue authoring their Lambda functions in Java as they have in the past while benefitting from the new features of Java 11.

New features in Java 11 runtime

Java 11 is a long-term support release and brings with it several new features, including a Java-native HTTP client with HTTP/2 support and the var keyword. The Java 11 runtime also benefits from Amazon Corretto running on Amazon Linux 2.

HTTP client (standard)

Java 11 introduces a native HTTP client, HttpClient. Previous versions of Java provided the HttpURLConnection class for accessing HTTP resources but, for more complex use cases, developers typically had to select and import a third-party library. HttpClient supports both synchronous and asynchronous HTTP requests.

Example: Synchronous HTTP request

Synchronous requests block execution while the HTTP client waits for a response. This is a common programming model for Lambda functions that are invoked synchronously themselves, for example, via Amazon API Gateway.

package helloworld;

import java.net.http.HttpClient;
import java.net.http.HttpHeaders;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;
import java.net.URI;
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

/**
 * Handler for requests to Lambda function.
 */
public class App implements RequestHandler<Object, Object> {

    public Object handleRequest(final Object input, final Context context) {
        Map<String, String> headers = new HashMap<>();
        headers.put("Content-Type", "application/json");
        headers.put("X-Custom-Header", "application/json");
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .GET()
            .version(HttpClient.Version.HTTP_2)
            .uri(URI.create("https://checkip.amazonaws.com"))
            .timeout(Duration.ofSeconds(15))
            .build();

        try {
            HttpResponse<String> response =
            client.send(request, BodyHandlers.ofString());

            String output = String.format("{ \"message\": \"hello world\", \"location\": \"%s\" }", response.body());
            return new GatewayResponse(output, headers, response.statusCode());    
        } catch (Exception e) {
            return new GatewayResponse("{}", headers, 500);
        }
    }
}
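
HttpClient can also issue the same request asynchronously with sendAsync, which returns a CompletableFuture immediately instead of blocking. The following sketch is illustrative and is not part of the original sample:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;
import java.util.concurrent.CompletableFuture;

public class AsyncExample {

    public static String fetch() {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .GET()
            .uri(URI.create("https://checkip.amazonaws.com"))
            .build();

        // sendAsync returns immediately with a CompletableFuture
        CompletableFuture<String> body = client
            .sendAsync(request, BodyHandlers.ofString())
            .thenApply(HttpResponse::body);

        // In a Lambda handler, make sure the work completes before returning
        return body.join();
    }
}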

 

The var keyword

The var keyword allows you to declare local variables and infer their type at compile time. This helps reduce verbosity, especially with composite types, as you no longer have to explicitly define type information on both sides of the equal sign. For example, to create a map of key/value string pairs, you can now do:

var map = new HashMap<String, String>();

Corretto benefits

The Java 11 runtime benefits from Amazon Corretto. Corretto is a no-cost, multiplatform, production-ready distribution of the Open Java Development Kit (OpenJDK). Corretto comes with long-term support that will include performance enhancements and security fixes. Amazon runs Corretto internally on thousands of production services.

Special considerations

Developers migrating to the new runtimes should consider the following known issues.

Java 8 to Java 11 migration

After migrating from Java 8 to Java 11, using internal packages such as sun.misc.* or sun.* now produces compiler errors instead of warnings.
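
For example, code that used the internal sun.misc.BASE64Encoder class no longer compiles, because the class was removed from the JDK; the supported java.util.Base64 API (available since Java 8) is the replacement. This snippet is illustrative only:

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Migration {

    public static String encode(String input) {
        // Replaces the removed internal sun.misc.BASE64Encoder usage
        return Base64.getEncoder()
                .encodeToString(input.getBytes(StandardCharsets.UTF_8));
    }
}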

Amazon Linux 2

Java 11, like Python 3.8 and Node.js 10 and 12, is based on an Amazon Linux 2 execution environment. Amazon Linux 2 provides a secure, stable, and high-performance execution environment to develop and run cloud and enterprise applications.

Next steps

Get started building with Java 11 today by specifying a runtime parameter value of java11 when creating or updating your Lambda functions.

Hope you enjoy building with the new features in Java 11!

Running Java applications on Amazon EC2 A1 instances with Amazon Corretto

Post Syndicated from Neelay Thaker original https://aws.amazon.com/blogs/compute/running-java-applications-on-amazon-ec2-a1-instances-with-amazon-corretto/

This post is contributed by Jeff Underhill | EC2 Principal Business Development Manager and Arthur Petitpierre | Arm Specialist Solutions Architect

 

Amazon EC2 A1 instances deliver up to 45% cost savings for scale-out applications and are powered by AWS Graviton Processors that feature 64-bit Arm Neoverse cores and custom silicon designed by AWS. Amazon Corretto is a no-cost, multiplatform, production-ready distribution of the Open Java Development Kit (OpenJDK).

Production-ready Arm 64-bit Linux builds of Amazon Corretto for JDK8 and JDK 11 were released Sep 17, 2019. This provided an additional Java runtime option when deploying your scale-out Java applications on Amazon EC2 A1 instances. We’re fortunate to have James Gosling, the designer of Java, as a member of the Amazon team, and he recently took to Twitter to announce the General Availability (GA) of Amazon Corretto for the Arm architecture:

For those of you that like playing with Linux on ARM, the Corretto build for ARM64 is now GA.  Fully production ready. Both JDK8 and JDK11

If you’re interested to experiment with Amazon Corretto on Amazon EC2 A1 instances then read on for step-by-step instructions that will have you up and running in no time.

Launching an A1 EC2 instance

The first step is to create a running Amazon EC2 A1 instance. In this example, we demonstrate how to boot your instance using Amazon Linux 2. Starting from the AWS Console, you need to log in to your AWS account or create a new account if you don’t already have one. Once you’ve logged in to the AWS console, navigate to Amazon Elastic Compute Cloud (Amazon EC2) as follows:

Once you logged into the AWS console, navigate to the Amazon Elastic Compute Cloud (Amazon EC2) and click on Launch a virtual machine

Next, select the operating system and compute architecture of the EC2 instance you want to launch. In this case, use Amazon Linux 2; because we want an AWS Graviton-based A1 instance, select the 64-bit (Arm) architecture:

On the next page, select an A1 instance type. Choose an a1.xlarge, which offers 4 vCPUs and 8 GB of memory (refer to the Amazon EC2 A1 page for more information). Then, select the “Review and Launch” button:

select an A1 instance type - an a1.xlarge that offers 4 x vCPU’s and 8GB of memory

Next, you can review a summary of your instance details. This summary is shown in the following pictures. Note: the only network port exposed is SSH via TCP port 22. This allows you to remotely connect to the instance via an SSH terminal:

review a summary of your instance details

Before proceeding, be aware you are about to start spending money (and don’t forget to terminate the instance at the end to avoid ongoing charges). As the warning in the screenshot above states, the A1 instance selected is not eligible for the free tier, so you are charged based on the pricing of the instance selected (refer to the Amazon EC2 on-demand pricing page for details; the a1.xlarge instance selected is $0.102 per hour as of this writing).

Once you’re ready to proceed, select “Launch” to continue. At this point you need to create or supply an existing key-pair for use when connecting to the new instance via SSH. Details to establish this connection can be found in the EC2 documentation.

In this example, I connect from a Windows laptop using PuTTY. The details of converting EC2 keys into the right format can be found here. You can connect the same way. In the following screenshot, I use an existing key pair that I generated. You can create a new key pair or use an existing one that best suits your workload, and then do the following:

Select an existing key pair or create one
While your instance launches, you can click on “View Instances” to see the status of instances within your AWS account:

click on “View Instances” to see the status of instances

Once you click on “View Instances,” you can see that your newly launched instance is now in the Running state:


Now, you can connect to your new instance. Right click on the instance from within the console, then select “Connect” from the pop-up menu to get details and instructions on connecting to the instance. This is shown in the following screenshot:

select “Connect” from the pop-up menu to get details and instructions on connecting to the instance
The following screenshot provides you with instructions and specific details needed to connect to your running A1 instance:

Connect to your instance using an SSH Client
You can now connect to the running a1.xlarge instance by following the connection instructions for your preferred SSH client.

Then, the Amazon Linux 2 command prompt pops up as follows:

Note: I run the ‘uname -a’ command to show that you are running on an ‘aarch64’ architecture which is the Linux architecture name for 64-bit Arm.


Once you complete this step, your A1 instance is up and running. From here, you can leverage Corretto8.

 

Installing corretto8

You can now install Amazon Corretto 8 on Amazon Linux 2 following the instructions from the documentation.  Use option 1 to install the application from the Amazon Linux 2 repository:

$ sudo amazon-linux-extras enable corretto8

$ sudo yum clean metadata

$ sudo yum install -y java-1.8.0-amazon-corretto

This code initiates the installation. Once complete, you can use the java -version command to see that you have the newest version of Amazon Corretto. The command and its output are as follows (your version may be more recent):

$ java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment Corretto-8.232.09.1 (build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM Corretto-8.232.09.1 (build 25.232-b09, mixed mode)

This command confirms that you have Amazon Corretto 8 version 8.232.09.1 installed and ready to go. If you see a version string that doesn’t mention Corretto, this means you have another version of Java already installed. In this case, run the following command to change the default java provider:

$ sudo alternatives --config java

Installing tomcat8.5 and a simple JSP application

Once the latest Amazon Corretto is installed, confirm that the Java installation works. You can do this by installing and running a simple Java application.

To run this test, you need to install Apache Tomcat, which is a Java-based application web server. Then, open up a public port to make it accessible and connect to it from a browser to confirm it’s running as expected.

Then, install tomcat8.5 from amazon-linux-extras using the following code:

$ sudo amazon-linux-extras enable tomcat8.5
$ sudo yum clean metadata
$ sudo yum install -y tomcat 

Now configure Tomcat to use /dev/urandom as an entropy source. This is important because otherwise Tomcat might hang on a freshly booted instance if there isn’t enough entropy available. Note: there’s a kernel patch in flight to provide an alternate entropy mechanism:

$ sudo bash -c 'echo JAVA_OPTS=\"-Djava.security.egd=file:/dev/urandom\" >> /etc/tomcat/tomcat.conf' 

Next, add a simple JavaServer Pages (JSP) application that will display details about your system.

First, create the default web application directory:

$ sudo install -d -o root -g tomcat /var/lib/tomcat/webapps/ROOT 

Then, add the small JSP application:

$ sudo bash -c 'cat <<EOF > /var/lib/tomcat/webapps/ROOT/index.jsp
<html>
<head>
<title>Corretto8 - Tomcat8.5 - Hello world</title>
</head>
<body>
  <table>
    <tr>
      <td>Operating System</td>
      <td><%= System.getProperty("os.name") %></td>
    </tr>
    <tr>
      <td>CPU Architecture</td>
      <td><%= System.getProperty("os.arch") %></td>
    </tr>
    <tr>
      <td>Java Vendor</td>
      <td><%= System.getProperty("java.vendor") %></td>
    </tr>
    <tr>
      <td>Java URL</td>
      <td><%= System.getProperty("java.vendor.url") %></td>
    </tr>
    <tr>
      <td>Java Version</td>
      <td><%= System.getProperty("java.version") %></td>
    </tr>
    <tr>
      <td>JVM Version</td>
      <td><%= System.getProperty("java.vm.version") %></td>
    </tr>
    <tr>
      <td>Tomcat Version</td>
      <td><%= application.getServerInfo() %></td>
    </tr>
</table>

</body>
</html>
EOF
'

Finally, start the Tomcat service:

$ sudo systemctl start tomcat 

Now that the Tomcat service is running, you need to configure your EC2 instance to open TCP port 8080 (the default port that Tomcat listens on). This configuration allows you to access the instance from a browser and confirm Tomcat is running and serving content.

To do this, return to the AWS console and select your EC2 a1.xlarge instance. Then, in the information panel below, select the associated security group so we can modify the inbound rules to allow TCP access on port 8080 as follows:

select the associated security group so we can modify the inbound rules to allow TCP accesses on port 8080

With these modifications you should now be able to connect to the Apache Tomcat default page by directing a browser to http://<your instance IPv4 Public IP>:8080 as follows:

connect to the Apache Tomcat default page by directing a browser to http://<your instance IPv4 Public IP>:8080
 Don’t forget to terminate your EC2 instance(s) when you’re done to avoid ongoing charges!

 

To summarize, we spun up an Amazon EC2 A1 instance, installed and enabled Amazon Corretto and the Apache Tomcat server, configured the security group for the EC2 instance to accept connections to TCP port 8080, and then created and connected to a simple default JSP web page. Being able to display the JSP page confirms that you’re serving content and can see the underlying Java Virtual Machine and platform architecture specifications. These steps demonstrate setting up the Amazon Corretto + Apache Tomcat environment, and running a demo JSP web application on AWS Graviton-based Amazon EC2 A1 instances using readily available open source software.

You can learn more at the Amazon Corretto website, and the downloads are all available here for Amazon Corretto 8 and Amazon Corretto 11; if you’re using containers, here’s the Docker Official image. If you have any questions about your own workloads running on Amazon EC2 A1 instances, contact us at [email protected].

 

How to run AWS CloudHSM workloads on Docker containers

Post Syndicated from Mohamed AboElKheir original https://aws.amazon.com/blogs/security/how-to-run-aws-cloudhsm-workloads-on-docker-containers/

AWS CloudHSM is a cloud-based hardware security module (HSM) that enables you to generate and use your own encryption keys on the AWS Cloud. With CloudHSM, you can manage your own encryption keys using FIPS 140-2 Level 3 validated HSMs. Your HSMs are part of a CloudHSM cluster. CloudHSM automatically manages synchronization, high availability, and failover within a cluster.

CloudHSM is part of the AWS Cryptography suite of services, which also includes AWS Key Management Service (KMS) and AWS Certificate Manager Private Certificate Authority (ACM PCA). KMS and ACM PCA are fully managed services that are easy to use and integrate. You’ll generally use AWS CloudHSM only if your workload needs a single-tenant HSM under your own control, or if you need cryptographic algorithms that aren’t available in the fully-managed alternatives.

CloudHSM offers several options for you to connect your application to your HSMs, including PKCS#11, Java Cryptography Extensions (JCE), or Microsoft CryptoNG (CNG). Regardless of which library you choose, you’ll use the CloudHSM client to connect to all HSMs in your cluster. The CloudHSM client runs as a daemon, locally on the same Amazon Elastic Compute Cloud (EC2) instance or server as your applications.

The deployment process is straightforward if you’re running your application directly on your compute resource. However, if you want to deploy applications using the HSMs in containers, you’ll need to make some adjustments to the installation and execution of your application and the CloudHSM components it depends on. Docker containers don’t typically include access to an init process like systemd or upstart. This means that you can’t start the CloudHSM client service from within the container using the general instructions provided by CloudHSM. You also can’t run the CloudHSM client service remotely and connect to it from the containers, as the client daemon listens to your application using a local Unix Domain Socket. You cannot connect to this socket remotely from outside the EC2 instance network namespace.

This blog post discusses the workaround that you’ll need in order to configure your container and start the client daemon so that you can utilize CloudHSM-based applications with containers. Specifically, in this post, I’ll show you how to run the CloudHSM client daemon from within a Docker container without needing to start the service. This enables you to use Docker to develop, deploy and run applications using the CloudHSM software libraries, and it also gives you the ability to manage and orchestrate workloads using tools and services like Amazon Elastic Container Service (Amazon ECS), Kubernetes, Amazon Elastic Container Service for Kubernetes (Amazon EKS), and Jenkins.

Solution overview

My solution shows you how to create a proof-of-concept sample Docker container that is configured to run the CloudHSM client daemon. When the daemon is up and running, it runs the AESGCMEncryptDecryptRunner Java class, available on the AWS CloudHSM Java JCE samples repo. This class uses CloudHSM to generate an AES key, then it uses the key to encrypt and decrypt randomly generated data.
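
At its core, that class is standard JCE code run against the CloudHSM provider. The sketch below shows the general shape of the flow; the provider class name (com.cavium.provider.CaviumProvider) and the "Cavium" provider ID are assumptions based on the version 1 CloudHSM JCE library, and the sample in the repository remains the authoritative reference.

import java.security.SecureRandom;
import java.security.Security;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class AesGcmSketch {

    public static void main(String[] args) throws Exception {
        // Register the CloudHSM JCE provider (assumed class/ID for the v1 client)
        Security.addProvider(new com.cavium.provider.CaviumProvider());

        // Generate an AES key on the HSM
        KeyGenerator keyGen = KeyGenerator.getInstance("AES", "Cavium");
        keyGen.init(256);
        SecretKey aesKey = keyGen.generateKey();

        // Encrypt randomly generated plaintext with AES-GCM
        byte[] plainText = new byte[32];
        new SecureRandom().nextBytes(plainText);
        Cipher encrypt = Cipher.getInstance("AES/GCM/NoPadding", "Cavium");
        encrypt.init(Cipher.ENCRYPT_MODE, aesKey);
        byte[] cipherText = encrypt.doFinal(plainText);
        byte[] iv = encrypt.getIV();

        // Decrypt and verify we get the original bytes back
        Cipher decrypt = Cipher.getInstance("AES/GCM/NoPadding", "Cavium");
        decrypt.init(Cipher.DECRYPT_MODE, aesKey, new GCMParameterSpec(128, iv));
        byte[] decrypted = decrypt.doFinal(cipherText);
        System.out.println("Decrypted matches plaintext: "
                + java.util.Arrays.equals(plainText, decrypted));
    }
}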

Note: In my example, you must manually enter the crypto user (CU) credentials as environment variables when running the container. For any production workload, you’ll need to carefully consider how to provide, secure, and automate the handling and distribution of your HSM credentials. You should work with your security or compliance officer to ensure that you’re using an appropriate method of securing HSM login credentials for your application and security needs.

Figure 1: Architectural diagram

Prerequisites

To implement my solution, I recommend that you have basic knowledge of the following:

  • CloudHSM
  • Docker
  • Java

Here’s what you’ll need to follow along with my example:

  1. An active CloudHSM cluster with at least one active HSM. You can follow the Getting Started Guide to create and initialize a CloudHSM cluster. (Note that for any production cluster, you should have at least two active HSMs spread across Availability Zones.)
  2. An Amazon Linux 2 EC2 instance in the same Amazon Virtual Private Cloud in which you created your CloudHSM cluster. The EC2 instance must have the CloudHSM cluster security group attached—this security group is automatically created during the cluster initialization and is used to control access to the HSMs. You can learn about attaching security groups to allow EC2 instances to connect to your HSMs in our online documentation.
  3. A CloudHSM crypto user (CU) account created on your HSM. You can create a CU by following these user guide steps.

Solution details

  1. On your Amazon Linux EC2 instance, install Docker:
    
            # sudo yum -y install docker
            

  2. Start the Docker service:
    
            # sudo service docker start
            

  3. Create a new directory and step into it. In my example, I use a directory named “cloudhsm_container.” You’ll use the new directory to configure the Docker image.
    
            # mkdir cloudhsm_container
            # cd cloudhsm_container           
            

  4. Copy the CloudHSM cluster’s CA certificate (customerCA.crt) to the directory you just created. You can find the CA certificate on any working CloudHSM client instance under the path /opt/cloudhsm/etc/customerCA.crt. This certificate is created during initialization of the CloudHSM Cluster and is needed to connect to the CloudHSM cluster.
  5. In your new directory, create a new file with the name run_sample.sh that includes the contents below. The script starts the CloudHSM client daemon, waits until the daemon process is running and ready, and then runs the Java class that is used to generate an AES key to encrypt and decrypt your data.
    
            #! /bin/bash
    
            # start cloudhsm client
            echo -n "* Starting CloudHSM client ... "
            /opt/cloudhsm/bin/cloudhsm_client /opt/cloudhsm/etc/cloudhsm_client.cfg &> /tmp/cloudhsm_client_start.log &
            
            # wait for startup
            while true
            do
                if grep 'libevmulti_init: Ready !' /tmp/cloudhsm_client_start.log &> /dev/null
                then
                    echo "[OK]"
                    break
                fi
                sleep 0.5
            done
            echo -e "\n* CloudHSM client started successfully ... \n"
            
            # start application
            echo -e "\n* Running application ... \n"
            
            java -ea -Djava.library.path=/opt/cloudhsm/lib/ -jar target/assembly/aesgcm-runner.jar --method environment
            
            echo -e "\n* Application completed successfully ... \n"                      
            

  6. In the new directory, create another new file and name it Dockerfile (with no extension). This file will specify that the Docker image is built with the following components:
    • The AWS CloudHSM client package.
    • The AWS CloudHSM Java JCE package.
    • OpenJDK 1.8. This is needed to compile and run the Java classes and JAR files.
    • Maven, a build automation tool that is needed to assist with building the Java classes and JAR files.
    • The AWS CloudHSM Java JCE samples that will be downloaded and built.
  7. Cut and paste the contents below into Dockerfile.

    Note: Make sure to replace the HSM_IP line with the IP of an HSM in your CloudHSM cluster. You can get your HSM IPs from the CloudHSM console, or by running the describe-clusters AWS CLI command.

    
            # Use the amazon linux image
            FROM amazonlinux:2
            
            # Install CloudHSM client
            RUN yum install -y https://s3.amazonaws.com/cloudhsmv2-software/CloudHsmClient/EL7/cloudhsm-client-latest.el7.x86_64.rpm
            
            # Install CloudHSM Java library
            RUN yum install -y https://s3.amazonaws.com/cloudhsmv2-software/CloudHsmClient/EL7/cloudhsm-client-jce-latest.el7.x86_64.rpm
            
            # Install Java, Maven, wget, unzip and ncurses-compat-libs
            RUN yum install -y java maven wget unzip ncurses-compat-libs
            
            # Create a work dir
            WORKDIR /app
            
            # Download sample code
            RUN wget https://github.com/aws-samples/aws-cloudhsm-jce-examples/archive/master.zip
            
            # unzip sample code
            RUN unzip master.zip
            
            # Change to the created directory
            WORKDIR aws-cloudhsm-jce-examples-master
            
            # Build JAR files
            RUN mvn validate && mvn clean package
            
            # Set HSM IP as an environmental variable
            ENV HSM_IP <insert the IP address of an active CloudHSM instance here>
            
            # Configure cloudhsm-client
            COPY customerCA.crt /opt/cloudhsm/etc/
            RUN /opt/cloudhsm/bin/configure -a $HSM_IP
            
            # Copy the run_sample.sh script
            COPY run_sample.sh .
            
            # Run the script
            CMD ["bash","run_sample.sh"]                        
            

  8. Now you’re ready to build the Docker image. The following command builds an image named jce_sample_client from the Dockerfile you created in steps 6 and 7.
    
            # sudo docker build -t jce_sample_client .
            

  9. To run a Docker container from the Docker image you just created, use the following command. Make sure to replace the user and password with your actual CU username and password. (If you need help setting up your CU credentials, see prerequisite 3. For more information on how to provide CU credentials to the AWS CloudHSM Java JCE Library, refer to the steps in the CloudHSM user guide.)
    
            # sudo docker run --env HSM_PARTITION=PARTITION_1 \
            --env HSM_USER=<user> \
            --env HSM_PASSWORD=<password> \
            jce_sample_client
            

    If successful, the output should look like this:

    
            * Starting cloudhsm-client ... [OK]
            
            * cloudhsm-client started successfully ...
            
            * Running application ...
            
            ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors 
            to the console.
            70132FAC146BFA41697E164500000000
            Successful decryption
                SDK Version: 2.03
            
            * Application completed successfully ...          
            

Conclusion

My solution provides an example of how to run CloudHSM workloads on Docker containers. You can use it as a reference to implement your cryptographic application in a way that benefits from the high availability and load balancing built in to AWS CloudHSM without compromising on the flexibility that Docker provides for developing, deploying, and running applications. If you have comments about this post, submit them in the Comments section below.



Mohamed AboElKheir

Mohamed AboElKheir joined AWS in September 2017 as a Security CSE (Cloud Support Engineer) based in Cape Town. He is a subject matter expert for CloudHSM and is always enthusiastic about assisting CloudHSM customers with advanced issues and use cases. Mohamed is passionate about InfoSec, specifically cryptography, penetration testing (he’s OSCP certified), application security, and cloud security (he’s AWS Security Specialty certified).

Bullet Updates – Windowing, Apache Pulsar PubSub, Configuration-based Data Ingestion, and More

Post Syndicated from rosaliebeevm original https://yahooeng.tumblr.com/post/183315480351


By Akshay Sarma, Principal Engineer, Verizon Media & Brian Xiao, Software Engineer, Verizon Media

This is the first of an ongoing series of blog posts sharing releases and announcements for Bullet, an open-source, lightweight, scalable, pluggable, multi-tenant query system.

Bullet allows you to query, through its UI or API, any data flowing through a streaming system without having to store it first. The queries are injected into the running system and have minimal overhead. Running hundreds of queries generally fits into the overhead of just reading the streaming data. Bullet requires running an instance of its backend on your data. This backend runs on common stream processing frameworks (Storm and Spark Streaming are currently supported).

The data on which Bullet sits determines what it is used for. For example, our team runs an instance of Bullet on user engagement data (~1M events/sec) to let developers find their own events to validate their code that produces this data. We also use this instance to interactively explore data, throw up quick dashboards to monitor live releases, count unique users, debug issues, and more.

Since open sourcing Bullet in 2017, we’ve been hard at work adding many new features! We’ll highlight some of these here and continue sharing update posts for future releases.

Windowing

Bullet used to operate in a request-response fashion – you would submit a query and wait for the query to meet its termination conditions (usually duration) before receiving results. For short-lived queries, say, a few seconds, this was fine. But as we started fielding more interactive and iterative queries, waiting even a minute for results became too cumbersome.

Enter windowing! Bullet now supports time- and record-based windowing. With time windowing, you can break up your query into chunks of time over its duration and retrieve results for each chunk. For example, you can calculate the average of a field and stream back results every second.

In the above example, the aggregation is operating on all the data since the beginning of the query, but you can also do aggregations on just the windows themselves. This is often called a Tumbling window.


With record windowing, you can get the intermediate aggregation for each record that matches your query (a Sliding window). Or you can do a Tumbling window on records rather than time. For example, you could get results back every three records.


Overlapping windows in other ways (Hopping windows) or windows that reset based on different criteria (Session windows, Cascading windows) are currently being worked on. Stay tuned!


Apache Pulsar support as a native PubSub

Bullet uses a PubSub (publish-subscribe) message queue to send queries and results between the Web Service and Backend. As with everything else in Bullet, the PubSub is pluggable. You can use your favorite PubSub by implementing a few interfaces if you don’t want to use the ones we provide. Until now, we’ve maintained and supported a REST-based PubSub and an Apache Kafka PubSub. Now we are excited to announce support for Apache Pulsar as well! Bullet Pulsar will be useful to those users who want to use Pulsar as their underlying messaging service.

If you aren’t familiar with Pulsar, setting up a local standalone is very simple, and, by default, any Pulsar topic you write to is created automatically. Setting up an instance of Bullet with Pulsar instead of REST or Kafka is just as easy. You can refer to our documentation for more details.


Plug your data into Bullet without code

While Bullet worked on any data source located in any persistence layer, you still had to implement an interface to connect your data source to the Backend and convert it into a record container format that Bullet understands. For instance, your data might be located in Kafka and be in the Avro format. If you were using Bullet on Storm, you would perhaps write a Storm Spout to read from Kafka, deserialize, and convert the Avro data into the Bullet record format. This was the only interface in Bullet that required our customers to write their own code. Not anymore! Bullet DSL is a text/configuration-based format for users to plug in their data to the Bullet Backend without having to write a single line of code.

Bullet DSL abstracts away the two major components for plugging data into the Bullet Backend: a Connector piece to read from arbitrary data sources, and a Converter piece to convert that data into the Bullet record container. We currently support and maintain a few of these – Kafka and Pulsar for Connectors and Avro, Maps, and arbitrary Java POJOs for Converters. The Converters understand typed data and can even do a bit of minor ETL (Extract, Transform, and Load) if you need to change your data around before feeding it into Bullet. As always, the DSL components are pluggable and you can write your own (and contribute it back!) if you need one that we don’t support.

We appreciate your feedback and contributions! Explore Bullet on GitHub, use and help contribute to the project, and chat with us on Google Groups. To get started, try our Quickstarts on Spark or Storm to set up an instance of Bullet on some fake data and play around with it.

New – AWS Toolkits for PyCharm, IntelliJ (Preview), and Visual Studio Code (Preview)

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-aws-toolkits-for-pycharm-intellij-preview-and-visual-studio-code-preview/

Software developers have their own preferred tools. Some use powerful editors, others Integrated Development Environments (IDEs) that are tailored for specific languages and platforms. In 2014 I created my first AWS Lambda function using the editor in the Lambda console. Now, you can choose from a rich set of tools to build and deploy serverless applications. For example, the editor in the Lambda console was greatly enhanced last year when AWS Cloud9 was released. For .NET applications, you can use the AWS Toolkit for Visual Studio and AWS Tools for Visual Studio Team Services.

AWS Toolkits for PyCharm, IntelliJ, and Visual Studio Code

Today, we are announcing the general availability of the AWS Toolkit for PyCharm. We are also announcing the developer preview of the AWS Toolkits for IntelliJ and Visual Studio Code, which are under active development in GitHub. These open source toolkits will enable you to easily develop serverless applications, including a full create, step-through debug, and deploy experience in the IDE and language of your choice, be it Python, Java, Node.js, or .NET.

For example, using the AWS Toolkit for PyCharm you can:

These toolkits are distributed under the open source Apache License, Version 2.0.

Installation

Some features use the AWS Serverless Application Model (SAM) CLI. You can find installation instructions for your system here.

The AWS Toolkit for PyCharm is available via the IDEA Plugin Repository. To install it, in the Settings/Preferences dialog, click Plugins, search for “AWS Toolkit”, use the checkbox to enable it, and click the Install button. You will need to restart your IDE for the changes to take effect.

The AWS Toolkits for IntelliJ and Visual Studio Code are currently in developer preview and under active development. You are welcome to build and install these from the GitHub repositories:

Building a Serverless application with PyCharm

After installing AWS SAM CLI and AWS Toolkit, I create a new project in PyCharm and choose SAM on the left to create a serverless application using the AWS Serverless Application Model. I call my project hello-world in the Location field. Expanding More Settings, I choose which SAM template to use as the starting point for my project. For this walkthrough, I select the “AWS SAM Hello World”.

In PyCharm you can use credentials and profiles from your AWS Command Line Interface (CLI) configuration. You can change AWS region quickly if you have multiple environments.

The AWS Explorer shows Lambda functions and AWS CloudFormation stacks in the selected AWS region. Starting from a CloudFormation stack, you can see which Lambda functions are part of it.

The function handler is in the app.py file. After I open the file, I click on the Lambda icon on the left of the function declaration to have the option to run the function locally or start a local step-by-step debugging session.

First, I run the function locally. I can configure the payload of the event that is provided as input for the local invocation, starting from the event templates provided for most services, such as Amazon API Gateway, Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), and so on. You can use a file for the payload, or select the share checkbox to make it available to other team members. The function is executed locally, but here you can choose the credentials and the region to be used if the function is calling other AWS services, such as Amazon Simple Storage Service (S3) or Amazon DynamoDB.

A local container is used to emulate the Lambda execution environment. This function implements a basic web API, and I can check that the result is in the format expected by API Gateway.
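
For reference, a Lambda proxy integration response for API Gateway is a JSON object shaped roughly like the following (the values are illustrative, not the exact output of the sample):

{
  "statusCode": 200,
  "headers": { "Content-Type": "application/json" },
  "body": "{\"message\": \"hello world\"}"
}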

After that, I want to get more information on what my code is doing. I set a breakpoint and start a local debugging session. I use the same input event as before. Again, you can choose the credentials and region for the AWS services used by the function.

I step over the HTTP request in the code to inspect the response in the Variables tab. Here you have access to all local variables, including the event and the context provided as input to the function.

After that, I resume the program to reach the end of the debugging session.

Now I am confident enough to deploy the serverless application by right-clicking the project (or the SAM template file). I can create a new CloudFormation stack, or update an existing one. For now, I create a new stack called hello-world-prod. For example, you can have a stack for production, and one for testing. I select an S3 bucket in the region to store the package used for the deployment. If your template has parameters, here you can set up the values used by this deployment.

After a few minutes, the stack creation is complete and I can run the function in the cloud with a right-click in the AWS Explorer. There is also an option to jump to the source code of the function.

As expected, the result of the remote invocation is the same as the local execution. My serverless application is in production!

Using these toolkits, developers can test locally to find problems before deployment, change the code of their application or the resources they need in the SAM template, and update an existing stack, quickly iterating until they reach their goal. For example, they can add an S3 bucket to store images or documents, or a DynamoDB table to store their users, or change the permissions used by their functions.

I am really excited by how much faster and easier it is to build your ideas on AWS. Now you can use your preferred environment to accelerate even further. I look forward to seeing what you will do with these new tools!

New – Amazon Kinesis Data Analytics for Java

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-amazon-kinesis-data-analytics-for-java/

Customers are using Amazon Kinesis to collect, process, and analyze real-time streaming data. In this way, they can react quickly to new information from their business, their infrastructure, or their customers. For example, Epic Games ingests more than 1.5 million game events per second for its popular online game, Fortnite.

With Amazon Kinesis Data Analytics you can process data in real-time using standard SQL. While SQL provides an easy way to quickly query large volumes of streaming data without learning new frameworks or languages, many customers also want to build more sophisticated data processing applications using general-purpose programming languages.

Using Java with Amazon Kinesis Data Analytics

Today, we are introducing support for Java in Amazon Kinesis Data Analytics. Now, developers can use their own Java code to create powerful real-time applications that process streaming data like continuously transforming and loading data into their data lakes, generating metrics to feed real-time gaming leaderboards, applying machine learning models to data streams from connected devices, and more.

To use this new functionality, developers build applications using open source libraries that include built-in operators for common data processing functions, allowing applications to organize, transform, aggregate, and analyze data at any scale. Both of these libraries are open source, and you can run them anywhere:

  • Apache Flink, an open source framework and engine for processing data streams.
  • AWS SDK for Java, providing Java APIs for many AWS services.

Developers can use these Java libraries within their Integrated Development Environment (IDE) of choice. Using these libraries, the following AWS services can be integrated with as little as one line of code:

  • Streaming Data Sources: Amazon Kinesis Data Streams
  • Streaming Destinations: Amazon S3, Amazon DynamoDB, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose

In addition to the pre-built AWS integrations, the Java libraries include additional connectors for tools like Cassandra, Elasticsearch, RabbitMQ, and Redis, as well as the ability to build custom integrations.

Building a Kinesis Data Streams Java Application

I prepared a simple Java application that implements the “mandatory” word count example for data processing. I send some paragraphs of text as input and, every five seconds, I get as output the number of times each word is used.

First, I create two Kinesis Data Streams:

  • TextInputStream, where I am going to send my input records
  • WordCountOutputStream, where I am going to read the output of the Java application
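
If you prefer to create these two streams programmatically instead of in the console, here is a minimal sketch using the AWS SDK for Java (v1). The region and the single-shard count are my own illustrative choices; pick a shard count that matches your expected throughput.

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;

public class CreateStreams {
    public static void main(String[] args) {
        // Build a Kinesis client for the same region used by the Flink application.
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.standard()
                .withRegion("us-east-1")
                .build();

        // One shard each is enough for this small word-count test.
        kinesis.createStream("TextInputStream", 1);
        kinesis.createStream("WordCountOutputStream", 1);
    }
}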

 

Here is the code of the word-count Java application. To read from and write to Kinesis Data Streams, I am using the Kinesis Connector from the Apache Flink project.

public class StreamingJob {

    private static final String region = "us-east-1";
    private static final String inputStreamName = "TextInputStream";
    private static final String outputStreamName = "WordCountOutputStream";

    private static DataStream<String> createSourceFromStaticConfig(
            StreamExecutionEnvironment env) {
        Properties inputProperties = new Properties();
        inputProperties.setProperty(ConsumerConfigConstants.AWS_REGION, region);
        inputProperties.setProperty(ConsumerConfigConstants.STREAM_INITIAL_POSITION,
            "LATEST");

        return env.addSource(new FlinkKinesisConsumer<>(inputStreamName,
            new SimpleStringSchema(), inputProperties));
    }

    private static FlinkKinesisProducer<String> createSinkFromStaticConfig() {
        Properties outputProperties = new Properties();
        outputProperties.setProperty(ConsumerConfigConstants.AWS_REGION, region);

        FlinkKinesisProducer<String> sink = new FlinkKinesisProducer<>(new
            SimpleStringSchema(), outputProperties);
        sink.setDefaultStream(outputStreamName);
        sink.setDefaultPartition("0");
        return sink;
    }

    public static void main(String[] args) throws Exception {

        final StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> input = createSourceFromStaticConfig(env);

        input.flatMap(new Tokenizer())
             .keyBy(0)
             .timeWindow(Time.seconds(5))
             .sum(1)
             .map(new MapFunction<Tuple2<String, Integer>, String>() {
                 @Override
                 public String map(Tuple2<String, Integer> value) throws Exception {
                     return value.f0 + "," + value.f1.toString();
                }
             })
             .addSink(createSinkFromStaticConfig());

        env.execute("Word Count");
    }

    public static final class Tokenizer
            implements FlatMapFunction<String, Tuple2<String, Integer>> {

        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            String[] tokens = value.toLowerCase().split("\\W+");
            for (String token : tokens) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<>(token, 1));
                }
            }
        }
    }
}

The most important part of the application is the manipulation of the input object, where I apply a few DataStream Transformations:

  1. I start with a DataStream containing the Strings from the input stream.
  2. I use a Tokenizer in a FlatMap to split the sentence into “words”, each word followed by the number “1”.
  3. I apply the KeyBy operator to logically partition the stream with respect to the “word”.
  4. I use a 5-second tumbling window.
  5. I aggregate within the window, summing up for each word the number “1” to count them.
  6. I use a simple Map for each record to join the word and the number into a comma-separated values (CSV) String that I send to the output stream.

One of the most powerful operators shown here is the KeyBy operator. It enables you to re-organize a particular stream by a specified key in real time. This type of re-keying enables further downstream operations like aggregations, counts, and much more, and lets you set up streaming map-reduce on different keys within the same application.

I build the Java application using Maven and load the output JAR to an Amazon Simple Storage Service (S3) bucket in the region where I want to deploy the application. In the Kinesis Data Analytics console, I create a new application and select “Flink” as runtime:

I then configure the application to use the code on my S3 bucket. The console updates the IAM role for the application to have permissions to read the code.

You can optionally add key/value properties to the configuration of the application. You can read those properties from within the application, to provide customization at deployment time.
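
As a rough sketch of how reading those properties can look in code (the property group and key names here are my own inventions, and this assumes the aws-kinesisanalytics-runtime helper library is on the application's classpath):

import com.amazonaws.services.kinesisanalytics.runtime.KinesisAnalyticsRuntime;

import java.io.IOException;
import java.util.Map;
import java.util.Properties;

public class RuntimeConfig {

    // Reads an output stream name from a property group configured at deployment
    // time, falling back to a default when the group or key is missing.
    static String outputStreamName() throws IOException {
        Map<String, Properties> groups = KinesisAnalyticsRuntime.getApplicationProperties();
        Properties props = groups.get("WordCountConfig"); // hypothetical group name
        if (props == null) {
            return "WordCountOutputStream";
        }
        return props.getProperty("outputStreamName", "WordCountOutputStream");
    }
}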

For monitoring, I leave the default metrics. I enable logging to Amazon CloudWatch, for errors only.

Don’t forget to add permissions to the IAM role created by the console to allow the Kinesis Analytics application to read and write from the streams used for input and output, TextInputStream and WordCountOutputStream in my case.
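
As an illustration, a minimal identity-based policy for that role could look like the following. The account ID is a placeholder, and you should scope the actions down further if your application only reads from or only writes to a given stream.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:DescribeStream",
        "kinesis:ListShards",
        "kinesis:GetShardIterator",
        "kinesis:GetRecords",
        "kinesis:PutRecord",
        "kinesis:PutRecords"
      ],
      "Resource": [
        "arn:aws:kinesis:us-east-1:123456789012:stream/TextInputStream",
        "arn:aws:kinesis:us-east-1:123456789012:stream/WordCountOutputStream"
      ]
    }
  ]
}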

I can now start the application with the “Run” button, and when it is running, I use a script that I prepared to put some text (I am using a description of the Amazon Kinesis platform) in the input stream:

$ python put_records.py TextInputStream
Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data...

The behavior of my application is summarized in the console in the Application Graph, a visual representation of the data flow consisting of operators and intermediate results (complex applications, using multiple streams, have a much more interesting graph).

To read the output stream, I am using a Lambda function written in Python. I am using the one provided with the Kinesis Record Aggregation & Deaggregation Modules for AWS Lambda, which provides automatic “de-aggregation” of records aggregated by the Amazon Kinesis Producer Library (KPL).

As expected, in the CloudWatch Logs console I get the list of words and the number of times they were used, updated every 5 seconds by the Lambda function.

Pricing and Availability

With Amazon Kinesis Data Analytics for Java, you pay only for what you use. Pricing is similar to Amazon Kinesis Data Analytics for SQL, but there are a few differences.

For Java applications, you are charged a single additional Amazon Kinesis Processing Unit (KPU) per application, used for application orchestration. Java applications are also charged for running application storage and durable application backups. Running application storage is used for Amazon Kinesis Data Analytics’ stateful processing capabilities and is charged per GB-month. Durable application backups are optional and provide a point-in-time recovery point for applications, charged per GB-month.

For example, pricing is $0.11 per KPU hour in US East (N. Virginia), and you are charged for running application storage ($0.10 per GB-month) and durable application backups ($0.023 per GB-month).
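
As a back-of-the-envelope illustration with numbers of my own (actual KPU usage depends on how the service scales your application): an application running at 2 KPUs, plus the 1 orchestration KPU, for a 730-hour month in US East (N. Virginia) with 50 GB of running application storage and no durable backups would cost roughly:

    KPU charge:      (2 + 1) KPUs x 730 hours x $0.11 per KPU-hour = $240.90
    Running storage: 50 GB x $0.10 per GB-month                    = $5.00
    Approximate monthly total                                      = $245.90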

Available Now

Amazon Kinesis Data Analytics for Java is available now in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU West (Ireland).

I only scratched the surface of the capabilities for stream processing enabled by the support of Java in Amazon Kinesis Data Analytics. I think this is a powerful tool that can enable new use cases. Let me know what you are going to build with it!

Re-affirming Long-Term Support for Java in Amazon Linux

Post Syndicated from Deepak Singh original https://aws.amazon.com/blogs/compute/re-affirming-long-term-support-for-java-in-amazon-linux/

In light of Oracle’s recent announcement indicating an end to free long-term support for OpenJDK after January 2019, we re-affirm that the OpenJDK 8 and OpenJDK 11 Java runtimes in Amazon Linux 2 will continue to receive free long-term support from Amazon until at least June 30, 2023. We are collaborating and contributing in the OpenJDK community to provide our customers with a free long-term supported Java runtime.

In addition, Amazon Linux AMI 2018.03, the last major release of Amazon Linux AMI, will receive support for the OpenJDK 8 runtime at least until June 30, 2020, to facilitate migration to Amazon Linux 2. Java runtimes provided by AWS Services such as AWS Lambda, AWS Elastic Map Reduce (EMR), and AWS Elastic Beanstalk will also use the AWS supported OpenJDK builds.

Amazon Linux users will not need to make any changes to get support for OpenJDK 8. OpenJDK 11 will be made available through the Amazon Linux 2 repositories at a future date. The Amazon Linux OpenJDK support posture will also apply to the on-premises virtual machine images and Docker base image of Amazon Linux 2.

Amazon Linux 2 provides a secure, stable, and high-performance execution environment. Amazon Linux AMI and Amazon Linux 2 include a Java runtime based on OpenJDK 8 and are available in all public AWS regions at no additional cost beyond the pricing for Amazon EC2 instance usage.

Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/756489/rss

Security updates have been issued by CentOS (procps, xmlrpc, and xmlrpc3), Debian (batik, prosody, redmine, wireshark, and zookeeper), Fedora (jasper, kernel, poppler, and xmlrpc), Mageia (git and wireshark), Red Hat (rh-java-common-xmlrpc), Slackware (git), SUSE (bzr, dpdk-thunderxdpdk, and ocaml), and Ubuntu (exempi).