Tag Archives: java

AWS App2Container – A New Containerizing Tool for Java and ASP.NET Applications

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-app2container-a-new-containerizing-tool-for-java-and-asp-net-applications/

Our customers are increasingly developing their new applications with containers and serverless technologies, and are using modern continuous integration and delivery (CI/CD) tools to automate the software delivery life cycle. They also maintain a large number of existing applications that are built and managed manually or using legacy systems. Maintaining these two sets of applications with disparate tooling adds to operational overhead and slows down the pace of delivering new business capabilities. As much as possible, they want to be able to standardize their management tooling and CI/CD processes across both their existing and new applications, and see the option of packaging their existing applications into containers as the first step towards accomplishing that goal.

However, containerizing existing applications requires a long list of manual tasks such as identifying application dependencies, writing dockerfiles, and setting up build and deployment processes for each application. These manual tasks are time consuming, error prone, and can slow down the modernization efforts.

Today, we are launching AWS App2Container, a new command-line tool that helps containerize existing applications that are running on-premises, in Amazon Elastic Compute Cloud (EC2), or in other clouds, without needing any code changes. App2Container discovers applications running on a server, identifies their dependencies, and generates relevant artifacts for seamless deployment to Amazon ECS and Amazon EKS. It also provides integration with AWS CodeBuild and AWS CodeDeploy to enable a repeatable way to build and deploy containerized applications.

AWS App2Container generates the following artifacts for each application component: application artifacts such as application files/folders and Dockerfiles, container images in Amazon Elastic Container Registry (ECR), ECS task definitions, Kubernetes deployment YAML, CloudFormation templates to deploy the application to Amazon ECS or EKS, and templates to set up a build/release pipeline in AWS CodePipeline, which also leverages AWS CodeBuild and CodeDeploy.

Starting today, you can use App2Container to containerize ASP.NET (.NET 3.5+) web applications running in IIS 7.5+ on Windows, and Java applications running on Linux—standalone JBoss, Apache Tomcat, and generic Java applications such as Spring Boot, IBM WebSphere, Oracle WebLogic, etc.

By modernizing existing applications using containers, you can make them portable, increase development agility, standardize your CI/CD processes, and reduce operational costs. Now let’s see how it works!

AWS App2Container – Getting Started
AWS App2Container requires that the following prerequisites be installed on the server(s) hosting your application: AWS Command Line Interface (CLI) version 1.14 or later, Docker tools, and (for ASP.NET applications running on Windows) PowerShell 5.0 or later. Additionally, you need to provide appropriate IAM permissions to App2Container to interact with AWS services.

For example, let’s look at how you containerize an existing Java application. The App2Container CLI for Linux is packaged as a tar.gz archive. The archive includes an interactive shell script, install.sh, that installs the App2Container CLI. Running the script guides you through the installation steps and also updates your PATH to include the App2Container CLI commands.
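
For reference, installation takes only a few commands. The following is a minimal sketch; the archive name is a placeholder, so substitute the file you downloaded from the App2Container documentation:

# Extract the downloaded App2Container CLI archive (archive name is a placeholder)
$ tar xvf AWSApp2Container-installer-linux.tar.gz
# Run the interactive installer from the extracted directory; it also updates your PATH
$ sudo ./install.sh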

First, run a one-time initialization of the App2Container CLI on the application server with the init command.

$ sudo app2container init
Workspace directory path for artifacts[default:  /home/ubuntu/app2container/ws]:
AWS Profile (configured using 'aws configure --profile')[default: default]:  
Optional S3 bucket for application artifacts (Optional)[default: none]: 
Report usage metrics to AWS? (Y/N)[default: y]:
Require images to be signed using Docker Content Trust (DCT)? (Y/N)[default: n]:
Configuration saved

This sets up a workspace to store application containerization artifacts (make sure a minimum of 20 GB of disk space is available). You can also store these artifacts in the Amazon Simple Storage Service (S3) bucket you specified, using the AWS profile configured to access AWS services.

Next, you can view Java processes that are running on the application server by using the inventory command. Each Java application process has a unique identifier (for example, java-tomcat-9e8e4799) which is the application ID. You can use this ID to refer to the application with other App2Container CLI commands.

$ sudo app2container inventory
{
    "java-jboss-5bbe0bec": {
        "processId": 27366,
        "cmdline": "java ... /home/ubuntu/wildfly-10.1.0.Final/modules org.jboss.as.standalone -Djboss.home.dir=/home/ubuntu/wildfly-10.1.0.Final -Djboss.server.base.dir=/home/ubuntu/wildfly-10.1.0.Final/standalone ",
        "applicationType": "java-jboss"
    },
    "java-tomcat-9e8e4799": {
        "processId": 2537,
        "cmdline": "/usr/bin/java ... -Dcatalina.home=/home/ubuntu/tomee/apache-tomee-plume-7.1.1 -Djava.io.tmpdir=/home/ubuntu/tomee/apache-tomee-plume-7.1.1/temp org.apache.catalina.startup.Bootstrap start ",
        "applicationType": "java-tomcat"
    }
}

You can also inventory ASP.NET applications in an administrator PowerShell session on Windows Server with IIS version 7.0 or later. Note that Docker tools and container support are available on Windows Server 2016 and later versions. You can choose to run all app2container operations on the application server with Docker tools installed, or use a worker machine with Docker tools by using Amazon ECS-optimized Windows Server AMIs.

PS> app2container inventory
{
    "iis-smarts-51d2dbf8": {
        "siteName": "nopCommerce39",
        "bindings": "http/*:90:",
        "applicationType": "iis"
    }
}

The inventory command displays all IIS websites on the application server that can be containerized. Each IIS website process has a unique identifier (for example, iis-smarts-51d2dbf8) which is the application ID. You can use this ID to refer to the application with other App2Container CLI commands.

You can choose a specific application by referring to its application ID and generate an analysis report for the application by using the analyze command.

$ sudo app2container analyze --application-id java-tomcat-9e8e4799
Created artifacts folder /home/ubuntu/app2container/ws/java-tomcat-9e8e4799
Generated analysis data in /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/analysis.json
Analysis successful for application java-tomcat-9e8e4799
Please examine the same, make appropriate edits and initiate containerization using "app2container containerize --application-id java-tomcat-9e8e4799"

You can use the analysis.json template generated by the application analysis in two ways: the analysisInfo section gathers information about the analyzed application, helping you identify all of its system dependencies, and the containerParameters section lets you update containerization parameters to customize the container images generated for the application.

$ cat java-tomcat-9e8e4799/analysis.json
{
    "a2CTemplateVersion": "1.0",
	"createdTime": "2020-06-24 07:40:5424",
    "containerParameters": {
        "_comment1": "*** EDITABLE: The below section can be edited according to the application requirements. Please see the analyisInfo section below for deetails discoverd regarding the application. ***",
        "imageRepository": "java-tomcat-9e8e4799",
        "imageTag": "latest",
        "containerBaseImage": "ubuntu:18.04",
        "coopProcesses": [ 6446, 6549, 6646]
    },
    "analysisInfo": {
        "_comment2": "*** NON-EDITABLE: Analysis Results ***",
        "processId": 2537
        "appId": "java-tomcat-9e8e4799",
		"userId": "1000",
        "groupId": "1000",
        "cmdline": [...],
        "os": {...},
        "ports": [...]
    }
}

Also, you can run the $ app2container extract --application-id java-tomcat-9e8e4799 command to generate an application archive for the analyzed application. This step relies on the analysis.json file generated earlier in the application’s workspace folder and honors any containerization parameter updates you made there. By using the extract command, you can continue the workflow on a worker machine after running the first set of commands on the application server.
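
A minimal sketch of that handoff looks like the following; the worker host name and the generated archive path are placeholders for illustration:

$ sudo app2container extract --application-id java-tomcat-9e8e4799
# Copy the generated application archive to a worker machine that has Docker tools installed
# (the host name and archive path below are placeholders)
$ scp /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/java-tomcat-9e8e4799.tar.gz ubuntu@worker-machine:~/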

Now you can use the containerize command to generate Docker images for the selected application.

$ sudo app2container containerize --application-id java-tomcat-9e8e4799
AWS pre-requisite check succeeded
Docker pre-requisite check succeeded
Extracted container artifacts for application
Entry file generated
Dockerfile generated under /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/Artifacts
Generated dockerfile.update under /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/Artifacts
Generated deployment file at /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/deployment.json
Containerization successful. Generated docker image java-tomcat-9e8e4799
You're all set to test and deploy your container image.

Next Steps:
1. View the container image with "docker images" and test the application.
2. When you're ready to deploy to AWS, please edit the deployment file as needed at /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/deployment.json.
3. Generate deployment artifacts using app2container generate app-deployment --application-id java-tomcat-9e8e4799

After the command completes, you can view the generated container images using the docker images command on the machine where the containerize command was run. You can use the docker run command to launch the container and test application functionality.
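
For example, assuming the application listens on port 8090 inside the container (the port shown later in deployment.json), a quick local test might look like this; the port mapping and curl path are assumptions:

$ docker images
# Run the generated image locally, mapping the assumed application port
$ docker run -d -p 8090:8090 java-tomcat-9e8e4799:latest
# Verify that the application responds (path is a placeholder)
$ curl http://localhost:8090/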

Note that in addition to generating container images, the containerize command also generates a deployment.json template file that you can use with the subsequent generate app-deployment command. You can edit the parameters in the deployment.json template file to change the image repository name to be registered in Amazon ECR, the ECS task definition parameters, or the Kubernetes App name.

$ cat java-tomcat-9e8e4799/deployment.json
{
       "a2CTemplateVersion": "1.0",
       "applicationId": "java-tomcat-9e8e4799",
       "imageName": "java-tomcat-9e8e4799",
       "exposedPorts": [
              {
                     "localPort": 8090,
                     "protocol": "tcp6"
              }
       ],
       "environment": [],
       "ecrParameters": {
              "ecrRepoTag": "latest"
       },
       "ecsParameters": {
              "createEcsArtifacts": true,
              "ecsFamily": "java-tomcat-9e8e4799",
              "cpu": 2,
              "memory": 4096,
              "dockerSecurityOption": "",
              "enableCloudwatchLogging": false,
              "publicApp": true,
              "stackName": "a2c-java-tomcat-9e8e4799-ECS",
              "reuseResources": {
                     "vpcId": "",
                     "cfnStackName": "",
                     "sshKeyPairName": ""
              },
              "gMSAParameters": {
                     "domainSecretsArn": "",
                     "domainDNSName": "",
                     "domainNetBIOSName": "",
                     "createGMSA": false,
                     "gMSAName": ""
              }
       },
       "eksParameters": {
              "createEksArtifacts": false,
              "applicationName": "",
              "stackName": "a2c-java-tomcat-9e8e4799-EKS",
              "reuseResources": {
                     "vpcId": "",
                     "cfnStackName": "",
                     "sshKeyPairName": ""
              }
       }
 }

At this point, the application workspace where the artifacts are generated serves as an iteration sandbox. You can edit the Dockerfile generated here to make changes to your application and use the docker build command to build new container images as needed. You can generate the artifacts needed to deploy the application containers in Amazon EKS by using the generate app-deployment command.

$ sudo app2container generate app-deployment --application-id java-tomcat-9e8e4799
AWS pre-requisite check succeeded
Docker pre-requisite check succeeded
Created ECR Repository
Uploaded Cloud Formation resources to S3 Bucket: none
Generated Cloud Formation Master template at: /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/EksDeployment/amazon-eks-master.template.yaml
EKS Cloudformation templates and additional deployment artifacts generated successfully for application java-tomcat-9e8e4799

You're all set to use AWS Cloudformation to manage your application stack.
Next Steps:
1. Edit the cloudformation template as necessary.
2. Create an application stack using the AWS CLI or the AWS Console. AWS CLI command:

       aws cloudformation deploy --template-file /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/EksDeployment/amazon-eks-master.template.yaml --capabilities CAPABILITY_NAMED_IAM --stack-name java-tomcat-9e8e4799

3. Setup a pipeline for your application stack:

       app2container generate pipeline --application-id java-tomcat-9e8e4799

This command works from the deployment.json template file produced by the containerize command. App2Container now generates ECS/EKS CloudFormation templates as well, and offers an option to deploy those stacks.

The command registers the container image in the user-specified ECR repository and generates CloudFormation templates for Amazon ECS and EKS deployments. You can register the ECS task definition with Amazon ECS, or use kubectl to launch the containerized application on an existing Amazon EKS or self-managed Kubernetes cluster using the App2Container-generated amazon-eks-master.template.deployment.yaml.
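
As a hedged sketch of those two paths (the task definition and Kubernetes manifest file names below are placeholders; the actual files are generated into the application workspace):

# Register the generated task definition with Amazon ECS (file name is a placeholder)
$ aws ecs register-task-definition --cli-input-json file://ecs-task-definition.json
# Or deploy to an existing EKS or self-managed Kubernetes cluster with kubectl (manifest name is a placeholder)
$ kubectl apply -f eks-deployment.yaml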

Alternatively, you can deploy the containerized application directly into Amazon EKS by using the --deploy option.

$ sudo app2container generate app-deployment --application-id java-tomcat-9e8e4799 --deploy
AWS pre-requisite check succeeded
Docker pre-requisite check succeeded
Created ECR Repository
Uploaded Cloud Formation resources to S3 Bucket: none
Generated Cloud Formation Master template at: /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/EksDeployment/amazon-eks-master.template.yaml
Initiated Cloudformation stack creation. This may take a few minutes. Please visit the AWS Cloudformation Console to track progress.
Deploying application to EKS

Handling ASP.NET Applications with Windows Authentication
Containerizing ASP.NET applications follows almost the same process as Java applications, but Windows containers cannot be directly domain joined. They can, however, still use Active Directory (AD) domain identities to support various authentication scenarios.

App2Container detects whether a site is using Windows authentication, makes the IIS site’s application pool run as the network service identity accordingly, and generates new CloudFormation templates for Windows-authenticated IIS applications. Those templates take care of creating the gMSA and AD security group, domain joining the ECS nodes, and making the containers use the gMSA.

Also, it provides two PowerShell scripts as output of the $ app2container containerize command, along with an instruction file on how to use them.

The following is an example output:

PS C:\Windows\system32> app2container containerize --application-id iis-SmartStoreNET-a726ba0b
Running AWS pre-requisite check...
Running Docker pre-requisite check...
Container build complete. Please use "docker images" to view the generated container images.
Detected that the Site is using Windows Authentication.
Generating powershell scripts into C:\Users\Admin\AppData\Local\app2container\iis-SmartStoreNET-a726ba0b\Artifacts required to setup Container host with Windows Authentication
Please look at C:\Users\Admin\AppData\Local\app2container\iis-SmartStoreNET-a726ba0b\Artifacts\WindowsAuthSetupInstructions.md for setup instructions on Windows Authentication.
A deployment file has been generated under C:\Users\Admin\AppData\Local\app2container\iis-SmartStoreNET-a726ba0b
Please edit the same as needed and generate deployment artifacts using "app2container generate-deployment"

The first PowerShell script, DomainJoinAddToSecGroup.ps1, joins the container host to the domain and adds it to an Active Directory security group. The second script, CreateCredSpecFile.ps1, creates a Group Managed Service Account (gMSA), grants it access to the Active Directory security group, generates the credential spec for this gMSA, and stores it locally on the container host. You can execute these PowerShell scripts on the ECS host. The following is an example usage of the scripts:

PS C:\Windows\system32> .\DomainJoinAddToSecGroup.ps1 -ADDomainName Dominion.com -ADDNSIp 10.0.0.1 -ADSecurityGroup myIISContainerHosts -CreateADSecurityGroup:$true
PS C:\Windows\system32> .\CreateCredSpecFile.ps1 -GMSAName MyGMSAForIIS -CreateGMSA:$true -ADSecurityGroup myIISContainerHosts

Before executing the app2container generate app-deployment command, edit the deployment.json file to change the value of dockerSecurityOption to the name of the CredentialSpec file that the CreateCredSpecFile script generated. For example,
"dockerSecurityOption": "credentialspec:file://dominion_mygmsaforiis.json"

Effectively, any access to network resources made by the IIS server inside the container for this site will now use the gMSA above to authenticate. The final step is to authorize this gMSA account on the network resources that the IIS server will access. A common example is authorizing this gMSA inside SQL Server.

Finally, if the application must connect to a database to be fully functional and you run the container in Amazon ECS, ensure that the application container created from the Docker image generated by the tool has connectivity to the same database. You can refer to this documentation for options on migrating: MS SQL Server from Windows to Linux on AWS, Database Migration Service, and backup and restore your MS SQL Server to Amazon RDS.

Now Available
AWS App2Container is offered at no additional charge. You pay only for the AWS services you actually use, such as Amazon EC2, ECS, EKS, and S3. For details, please refer to the App2Container FAQs and documentation. Give this a try, and please send us feedback either through your usual AWS Support contacts, on the AWS Forum for ECS, the AWS Forum for EKS, or on the container roadmap on GitHub.

Channy;

Find Your Most Expensive Lines of Code – Amazon CodeGuru Is Now Generally Available

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/find-your-most-expensive-lines-of-code-amazon-codeguru-is-now-generally-available/

Bringing new applications into production, maintaining their code base as they grow and evolve, and at the same time responding to operational issues, is a challenging task. For this reason, you can find many ideas on how to structure your teams, which methodologies to apply, and how to safely automate your software delivery pipeline.

At re:Invent last year, we introduced in preview Amazon CodeGuru, a developer tool powered by machine learning that helps you improve your applications and troubleshoot issues with automated code reviews and performance recommendations based on runtime data. During the last few months, many improvements have been launched, including a more cost-effective pricing model, support for Bitbucket repositories, and the ability to start the profiling agent using a command line switch, so that you no longer need to modify the code of your application, or add dependencies, to run the agent.

You can use CodeGuru in two ways:

  • CodeGuru Reviewer uses program analysis and machine learning to detect potential defects that are difficult for developers to find, and recommends fixes in your Java code. The code can be stored in GitHub (now also in GitHub Enterprise), AWS CodeCommit, or Bitbucket repositories. When you submit a pull request on a repository that is associated with CodeGuru Reviewer, it provides recommendations for how to improve your code. Each pull request corresponds to a code review, and each code review can include multiple recommendations that appear as comments on the pull request.
  • CodeGuru Profiler provides interactive visualizations and recommendations that help you fine-tune your application performance and troubleshoot operational issues using runtime data from your live applications. It currently supports applications written in Java virtual machine (JVM) languages such as Java, Scala, Kotlin, Groovy, Jython, JRuby, and Clojure. CodeGuru Profiler can help you find the most expensive lines of code, in terms of CPU usage or introduced latency, and suggest ways you can improve efficiency and remove bottlenecks. You can use CodeGuru Profiler in production, and when you test your application with a meaningful workload, for example in a pre-production environment.

Today, Amazon CodeGuru is generally available with the addition of many new features.

In CodeGuru Reviewer, we included the following:

  • Support for GitHub Enterprise – You can now scan your pull requests and get recommendations against your source code on GitHub Enterprise on-premises repositories, together with a description of what’s causing the issue and how to remediate it.
  • New types of recommendations to solve defects and improve your code – For example, checking input validation, to avoid issues that can compromise security and performance, and looking for multiple copies of code that do the same thing.

In CodeGuru Profiler, you can find these new capabilities:

  • Anomaly detection – We automatically detect anomalies in the application profile for those methods that represent the highest proportion of CPU time or latency.
  • Lambda function support – You can now profile AWS Lambda functions just like applications hosted on Amazon Elastic Compute Cloud (EC2) and containerized applications running on Amazon ECS and Amazon Elastic Kubernetes Service, including those using AWS Fargate.
  • Cost of issues in the recommendation report – Recommendations contain actionable resolution steps which explain what the problem is, the CPU impact, and how to fix the issue. To help you better prioritize your activities, you now have an estimation of the savings introduced by applying the recommendation.
  • Color-my-code – In the visualizations, to help you easily find your own code, we are coloring your methods differently from frameworks and other libraries you may use.
  • CloudWatch metrics and alerts – To keep track and monitor efficiency issues that have been discovered.

Let’s see some of these new features at work!

Using CodeGuru Reviewer with a Lambda Function
I create a new repo in my GitHub account, and leave it empty for now. Locally, where I am developing a Lambda function using the Java 11 runtime, I initialize my Git repo and add only the README.md file to the master branch. In this way, I can add all the code as a pull request later and have it go through a code review by CodeGuru.

git init
git add README.md
git commit -m "First commit"

Now, I add the GitHub repo as origin, and push my changes to the new repo:

git remote add origin https://github.com/<my-user-id>/amazon-codeguru-sample-lambda-function.git
git push -u origin master

I associate the repository in the CodeGuru console:

When the repository is associated, I create a new dev branch, add all my local files to it, and push it remotely:

git checkout -b dev
git add .
git commit -m "Code added to the dev branch"
git push --set-upstream origin dev

In the GitHub console, I open a new pull request by comparing changes across the two branches, master and dev. I verify that the pull request is able to merge, then I create it.

Since the repository is associated with CodeGuru, a code review is listed as Pending in the Code reviews section of the CodeGuru console.

After a few minutes, the code review status is Completed, and CodeGuru Reviewer issues a recommendation on the same GitHub page where the pull request was created.

Oops! I am creating the Amazon DynamoDB service object inside the function invocation method. In this way, it cannot be reused across invocations. This is not efficient.

To improve the performance of my Lambda function, I follow the CodeGuru recommendation, and move the declaration of the DynamoDB service object to a static final attribute of the Java application object, so that it is instantiated only once, during function initialization. Then, I follow the link in the recommendation to learn more best practices for working with Lambda functions.

Using CodeGuru Profiler with a Lambda Function
In the CodeGuru console, I create a MyServerlessApp-Development profiling group and select the Lambda compute platform.

Next, I give the AWS Identity and Access Management (IAM) role used by my Lambda function permissions to submit data to this profiling group.

Now, the console is giving me all the info I need to profile my Lambda function. To configure the profiling agent, I use a couple of environment variables:

  • AWS_CODEGURU_PROFILER_GROUP_ARN to specify the ARN of the profiling group to use.
  • AWS_CODEGURU_PROFILER_ENABLED to enable (TRUE) or disable (FALSE) profiling.

I follow the instructions (for Maven and Gradle) to add a dependency, and include the profiling agent in the build. Then, I update the code of the Lambda function to wrap the handler function inside the LambdaProfiler provided by the agent.
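
The two profiling environment variables can also be set from the command line instead of the console. Here is a minimal sketch using the AWS CLI; the function name, Region, and account ID are placeholders, and the profiling group name matches the one created above:

# Set the profiling agent environment variables on the function (function name, Region, and account ID are placeholders)
aws lambda update-function-configuration \
  --function-name my-serverless-function \
  --environment "Variables={AWS_CODEGURU_PROFILER_GROUP_ARN=arn:aws:codeguru-profiler:us-east-1:123456789012:profilingGroup/MyServerlessApp-Development,AWS_CODEGURU_PROFILER_ENABLED=TRUE}"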

To generate some load, I start a few scripts invoking my function using the Amazon API Gateway as trigger. After a few minutes, the profiling group starts to show visualizations describing the runtime behavior of my Lambda function.

For example, I can see how much CPU time is spent in the different methods of my function. At the bottom, there are the entry point methods. As I scroll up, I find methods that are called deeper in the stack trace. I right-click and hide the LambdaRuntimeClient methods to focus on my code. Note that my methods are colored differently than those in the packages I am using, such as the AWS SDK for Java.

I am mostly interested in what happens in the handler method invoked by the Lambda platform. I select the handler method, and now it becomes the new “base” of the visualization.

As I move my pointer on each of my methods, I get more information, including an estimation of the yearly cost of running that specific part of the code in production, based on the load experienced by the profiling agent during the selected time window. In my case, the handler function cost is estimated to be $6. If I select the two main functions above, I have an estimation of $3 each. The cost estimation works for code running on Lambda functions, EC2 instances, and containerized applications.

Similarly, I can visualize Latency, to understand how much time is spent inside the methods in my code. I keep the Lambda function handler method selected to drill down into what is under my control, and see where time is being spent the most.

The CodeGuru Profiler is also providing a recommendation based on the data collected. I am spending too much time (more than 4%) in managing encryption. I can use a more efficient crypto provider, such as the open source Amazon Corretto Crypto Provider, described in this blog post. This should lower the time spent to what is expected, about 1% of my profile.

Finally, I edit the profiling group to enable notifications. In this way, if CodeGuru detects an anomaly in the profile of my application, I am notified in one or more Amazon Simple Notification Service (SNS) topics.

Available Now
Amazon CodeGuru is available today in 10 regions, and we are working to add more regions in the coming months. For regional availability, please see the AWS Region Table.

CodeGuru helps you improve your application code and reduce compute and infrastructure costs with an automated code reviewer and application profiler that provide intelligent recommendations. Using visualizations based on runtime data, you can quickly find the most expensive lines of code of your applications. With CodeGuru, you pay only for what you use. Pricing is based on the lines of code analyzed by CodeGuru Reviewer, and on sampling hours for CodeGuru Profiler.

To learn more, please see the documentation.

Danilo

Building a CI/CD pipeline for multi-region deployment with AWS CodePipeline

Post Syndicated from Akash Kumar original https://aws.amazon.com/blogs/devops/building-a-ci-cd-pipeline-for-multi-region-deployment-with-aws-codepipeline/

This post discusses the benefits of a multi-region deployment pipeline and how to build one in AWS CodePipeline. The CI/CD pipeline triggers on application code changes pushed to your AWS CodeCommit repository. This automatically feeds into AWS CodeBuild for static and security analysis of the CloudFormation template. Another CodeBuild instance builds the application to generate an AMI image as output. AWS Lambda then copies the AMI image to other Regions. Finally, AWS CloudFormation cross-region actions are triggered to provision instances into the target Regions based on the AMI image.

The solution is based on a single pipeline with cross-region actions, which provisions resources in the current Region and in other Regions. It also lets you manage the complete CI/CD pipeline in one place, in one Region, and gives you a single point for monitoring and deploying changes. This approach incurs less cost because a single pipeline can deploy the application into multiple Regions.

As a security best practice, the solution also incorporates static and security analysis using cfn-lint and cfn-nag. You use these tools to scan CloudFormation templates for security vulnerabilities.

The following diagram illustrates the solution architecture.

Multi region AWS CodePipeline architecture

Prerequisites

Before getting started, you must complete the following prerequisites:

  • Create a repository in CodeCommit and provide access to your user
  • Copy the sample source code from GitHub under your repository
  • Create an Amazon S3 bucket in the current Region and each target Region for your artifact store (see the sketch after this list)
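
As a minimal sketch for that last prerequisite, the artifact buckets can be created with the AWS CLI; the bucket names are placeholders, and the Regions match the example Regions used later in this post:

# Artifact store bucket in the current Region (bucket names are placeholders)
aws s3 mb s3://my-pipeline-artifacts-us-east-1 --region us-east-1
# Artifact store buckets in the target Regions
aws s3 mb s3://my-pipeline-artifacts-us-east-2 --region us-east-2
aws s3 mb s3://my-pipeline-artifacts-ap-south-1 --region ap-south-1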

Creating a pipeline with AWS CloudFormation

You use a CloudFormation template for your CI/CD pipeline, which can perform the following actions:

  1. Use CodeCommit repository as source code repository
  2. Static code analysis on the CloudFormation template to check against the resource specification and block provisioning if this check fails
  3. Security code analysis on the CloudFormation template to check against secure infrastructure rules and block provisioning if this check fails
  4. Compilation and unit test of application code to generate an AMI image
  5. Copy the AMI image into target Regions for deployment
  6. Deploy into multiple Regions using the CloudFormation template; for example, us-east-1, us-east-2, and ap-south-1

You use a sample web application to run through your pipeline, which requires Java and Apache Maven for compilation and testing. Additionally, it uses Tomcat 8 for deployment.

The following table summarizes the resources that the CloudFormation template creates.

Resource Name | Type | Objective
CloudFormationServiceRole | AWS::IAM::Role | Service role for AWS CloudFormation
CodeBuildServiceRole | AWS::IAM::Role | Service role for CodeBuild
CodePipelineServiceRole | AWS::IAM::Role | Service role for CodePipeline
LambdaServiceRole | AWS::IAM::Role | Service role for Lambda function
SecurityCodeAnalysisServiceRole | AWS::IAM::Role | Service role for security analysis of provisioning CloudFormation template
StaticCodeAnalysisServiceRole | AWS::IAM::Role | Service role for static analysis of provisioning CloudFormation template
StaticCodeAnalysisProject | AWS::CodeBuild::Project | CodeBuild for static analysis of provisioning CloudFormation template
SecurityCodeAnalysisProject | AWS::CodeBuild::Project | CodeBuild for security analysis of provisioning CloudFormation template
CodeBuildProject | AWS::CodeBuild::Project | CodeBuild for compilation, testing, and AMI creation
CopyImage | AWS::Lambda::Function | Python Lambda function for copying AMI images into other Regions
AppPipeline | AWS::CodePipeline::Pipeline | CodePipeline for CI/CD

To start creating your pipeline, complete the following steps:

  • Launch the CloudFormation stack with the following link:
Launch button for CloudFormation

  • Choose Next.
  • For Specify details, provide the following values:
Parameter | Description
Stack name | Name of your stack
OtherRegion1 | Input the target Region 1 (other than current Region) for deployment
OtherRegion2 | Input the target Region 2 (other than current Region) for deployment
RepositoryBranch | Branch name of repository
RepositoryName | Repository name of the project
S3BucketName | Input the S3 bucket name for artifact store
S3BucketNameForOtherRegion1 | Create a bucket in target Region 1 and specify the name for artifact store
S3BucketNameForOtherRegion2 | Create a bucket in target Region 2 and specify the name for artifact store

  • Choose Next.

  • On the Review page, select I acknowledge that this template might cause AWS CloudFormation to create IAM resources.
  • Choose Create.
  • Wait for the CloudFormation stack status to change to CREATE_COMPLETE (this takes approximately 5–7 minutes).

When the stack is complete, your pipeline should be ready and running in the current Region.

  • To validate the pipeline, check the images and EC2 instances running in the target Regions, and also refer to the AWS CodePipeline execution summary, as shown below.
AWS CodePipeline Execution Summary

We will walk you through the following steps for creating a multi-region deployment pipeline:

1. Using CodeCommit as your source code repository

The deployment workflow starts by placing the application code on the CodeCommit repository. When you add or update the source code in CodeCommit, the action generates a CloudWatch event, which triggers the pipeline to run.

2. Static code analysis of CloudFormation template to provision AWS resources

Historically, AWS CloudFormation linting was limited to the ValidateTemplate action in the service API. This action tells you if your template is well-formed JSON or YAML, but doesn’t help validate the actual resources you’ve defined.

You can use a linter such as the cfn-lint tool for static code analysis to improve your AWS CloudFormation development cycle. The tool validates the provisioning CloudFormation template properties and their values (mappings, joins, splits, conditions, and nesting those functions inside each other) against the resource specification. This can cover the most common underlying service constraints and help encode some best practices.

The following rules cover underlying service constraints:

  • E2530 – Checks that Lambda functions have correctly configured memory sizes
  • E3025 – Checks that your RDS instances use correct instance types for the database engine
  • W2001 – Checks that each parameter is used at least once

You can also add this step as a pre-commit hook for your Git repository if you are using CodeCommit or GitHub.

You provision a CodeBuild project for static code analysis as the first step in CodePipeline after source. This helps in early detection of any linter issues.
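
To get a feel for what that CodeBuild step runs, here is a minimal local invocation of cfn-lint; the template file name is a placeholder:

# Install the linter and validate the provisioning template against the resource specification
pip install cfn-lint
cfn-lint ec2-instance-template.yaml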

3. Security code analysis of CloudFormation template to provision AWS resources

You can use Stelligent’s cfn_nag tool to perform additional validation of your template resources for security. The cfn_nag tool looks for patterns in CloudFormation templates that may indicate insecure infrastructure provisioning and validates them against AWS best practices. For example:

  • IAM rules that are too permissive (wildcards)
  • Security group rules that are too permissive (wildcards)
  • Access logs that aren’t enabled
  • Encryption that isn’t enabled
  • Password literals

You provision a CodeBuild project for security code analysis as the second step in CodePipeline. This helps detect any insecure infrastructure provisioning issues.
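
The security analysis step boils down to running cfn_nag against the same template. A minimal local sketch, with the template file name again a placeholder:

# Install cfn_nag (a Ruby gem) and scan the template for insecure patterns
gem install cfn-nag
cfn_nag_scan --input-path ec2-instance-template.yaml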

4. Compiling and testing application code and generating an AMI image

Because you use a Java-based application for this walkthrough, you use Amazon Corretto as your JVM. Corretto is a no-cost, multi-platform, production-ready distribution of the Open Java Development Kit (OpenJDK). Corretto comes with long-term support that includes performance enhancements and security fixes.

You also use Apache Maven as a build automation tool to build the sample application, and the HashiCorp Packer tool to generate an AMI image for the application.

You provision a CodeBuild project for compilation, unit testing, AMI generation, and storing the AMI ImageId in Parameter Store, which the CloudFormation template uses in the next step of the pipeline.
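
The last part of that build step, publishing the AMI ID for the deploy stage, can be expressed as a single CLI call. This is a hedged sketch; the AMI_VERSION parameter name matches the one referenced in the cleanup section, and the $AMI_ID variable is a placeholder for the Packer output:

# Store the freshly built AMI ID so the CloudFormation deployment can look it up
aws ssm put-parameter --name AMI_VERSION --type String --overwrite --value "$AMI_ID"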

5. Copying the AMI image into target Regions

You use a Lambda function to copy the AMI image into the target Regions so the CloudFormation template can use it to provision instances into those Regions in the next step of the pipeline. The function also writes the target Region AMI ImageId into each target Region’s Parameter Store.
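
The Lambda function is written in Python (see the resource table above), but its work maps directly onto calls you could also make from the CLI. A hedged sketch, with the AMI ID, image name, and Regions as placeholders:

# Copy the AMI from the current Region into a target Region (IDs, name, and Regions are placeholders)
aws ec2 copy-image --source-region us-east-1 --source-image-id ami-0123456789abcdef0 \
  --region us-east-2 --name multi-region-sample-app
# Then write the returned AMI ID into the target Region's Parameter Store, as in the previous sketch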

6. Deploying into multiple Regions with the CloudFormation template

You use the CloudFormation template as a cross-region action to provision AWS resources into a target Region. CloudFormation uses Parameter Store’s ImageId as reference and provisions the instances into the target Region.

Cleaning up

To avoid additional charges, you should delete the following AWS resources after you validate the pipeline:

  • The cross-region CloudFormation stack in the target and current Regions
  • The main CloudFormation stack in the current Region
  • The AMI you created in the target and current Regions
  • The Parameter Store AMI_VERSION in the target and current Regions

Conclusion

You have now created a multi-region deployment pipeline in CodePipeline without having to worry about the mechanics of creating and copying AMI images across Regions. CodePipeline abstracts the creating and copying of the images in the background in each Region. You can now upload new source code changes to the CodeCommit repository in the primary Region, and changes deploy automatically to other Regions. Cross-region actions are very powerful and are not limited to deploy actions. You can also use them with build and test actions.

Introducing a new generation of AWS Elastic Beanstalk platforms

Post Syndicated from David LaBissoniere original https://aws.amazon.com/blogs/compute/introducing-a-new-generation-of-aws-elastic-beanstalk-platforms/

In my last post I discussed AWS Elastic Beanstalk’s new public roadmap on GitHub. Today I want to talk about our new generation of Elastic Beanstalk platforms built on top of Amazon Linux 2 (AL2).

Late last year we launched a public beta of a new Elastic Beanstalk platform for Amazon Corretto — Amazon’s no-cost, production-ready distribution of the Open Java Development Kit (OpenJDK). This is also our first platform based on AL2. This year we have launched two more beta AL2 platforms: Docker and Python. More beta platforms are arriving soon, followed by generally available platform releases.

A sample application using the new Python 3.7 beta platform

I want to dive a little deeper on what we are doing with these platforms. Elastic Beanstalk was publicly launched in 2011, and announced in a blog post by Jeff Barr. Back then there were few enough AWS services that they were all listed as tabs along the top of the AWS Management Console. At launch, we supported only Apache Tomcat applications. Over time, we added support for many other runtimes and began using the term “platform” to describe our offerings. Today we support a wide variety of platforms for popular web application frameworks, such as Ruby on Rails, PHP, and Node.js, as well as generic Docker-based platforms. In the years since we launched each platform, the underlying communities have continued to evolve. Elastic Beanstalk is an opinionated service, especially when it comes to our platforms. As the service evolves, the opinions baked into our platforms must evolve as well.

With our AL2 platforms, we are refreshing each platform based on feedback we’ve gotten from customers. For example, with Java we heard concerns from many customers about long-term support and licensing of OpenJDK. That’s why in AL2 we are using Amazon’s own Corretto distribution, which includes committed long-term support. It also has performance and scalability improvements learned from Amazon’s years of experience running Java across thousands of production services — such as the Elastic Beanstalk service itself. For more details, see this section of our Java platform documentation.

Our Python AL2 platform has also been modernized. Previously we only supported serving applications through Apache and mod_wsgi. Now we are using NGINX as a reverse proxy in front of Gunicorn, with the flexibility to use another Web Server Gateway Interface (WSGI) server if you prefer. We also took this opportunity to add support for Pipenv and Pipfile, more modern and powerful Python dependency management tools. Learn more in our Python platform documentation.

The Docker AL2 platform is rewritten internally, but provides largely the same customer experience. It does offer improved I/O performance by using the OverlayFS storage driver. This is a change from the previous Docker platform, which used the older and slower Device Mapper storage driver and required an extra Amazon EBS volume.

We are hard at work on another set of beta platforms including PHP, Ruby, and Node.js, which are expected to launch soon. Each of these has been modernized and improved. For a full list of differences between our existing platforms and their Amazon Linux 2 equivalents, check out our documentation. In the next section I want to take a closer look at one new feature that applies to all of the new platforms: platform hooks.

Platform hooks

With our AL2 platforms, we are offering a simplified model for on-instance customization. We’ve long supported configuration files called ebextensions that allow customization of environment options, resources, and on-instance behavior. These have enabled customers to extend their environments in ways we never dreamed of. But we’ve also heard customer feedback about the difficulty of writing complex shell scripts embedded within YAML or JSON. And as they are, ebextensions don’t provide any straightforward mechanism to execute custom code after an application deployment is completed. Customers have pointed out many use cases where they want to do this – for example to enable third party monitoring tools.

With our new generation of Linux platforms, we are introducing platform hooks. Platform hooks are a set of directories inside the application bundle that you can populate with scripts. These scripts are executed at defined points in the on-instance application deployment lifecycle. These hooks are reminiscent of custom platform hooks, but are simplified and easier to manage and version because they are part of the application bundle.

For example, a Corretto application bundle might look like:

├── .platform
│   ├── hooks
│   │   ├── prebuild
│   │   │   ├── 01_set_secrets.sh
│   │   │   └── 10_install_dependencies.sh
│   │   ├── predeploy
│   │   │   └── 01_configure_corretto.sh
│   │   └── postdeploy
│   │       └── 99_log_deployment_complete.py
│   └── nginx
│       └── conf.d
│           └── custom.conf
├── Procfile
└── application.jar

The files in each of the .platform/hooks/ subdirectories are executed in lexicographical order at predefined points in the deployment process.

  1. prebuild hooks are executed after the application is downloaded and extracted, but before we try to configure anything.
  2. predeploy hooks are run after the application is configured and staged, but before it is deployed.
  3. postdeploy hooks are run at the very end, after the application is deployed and running (see the example script after this list).
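
As a minimal, hypothetical example, a postdeploy hook that simply records when a deployment finished could look like this; the script name and log path are assumptions, not part of the platform:

#!/bin/bash
# .platform/hooks/postdeploy/01_log_deploy.sh (hypothetical example)
# Runs after the application is deployed and running; hook scripts need to be executable.
echo "deployment completed at $(date)" >> /var/log/deploy-hooks.log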

Finally, take note of the .platform/nginx/ directory as well. This can be used to provide custom configuration additions or overrides for the on-instance NGINX proxy server. You can either override the provided configuration file completely, or just add a new configuration file that is imported by NGINX. Because all of the AL2 platforms use NGINX and the same base configuration, these customizations are now more portable across platforms. For a full explanation of platform hooks and related functionality, see our Extending Linux Platforms documentation page.

We’re excited to launch this new generation of Elastic Beanstalk platforms, and to hear feedback from you about how we can make them even better. If you have feedback about one of the AL2 beta platforms, please add a comment to the relevant issue on the public roadmap on GitHub. For example, here is the issue for the Corretto platform. Keep an eye on the roadmap and our release notes for announcements of the remaining platforms over the coming weeks.

 

How to run AWS CloudHSM workloads on AWS Lambda

Post Syndicated from Mohamed AboElKheir original https://aws.amazon.com/blogs/security/how-to-run-aws-cloudhsm-workloads-on-aws-lambda/

AWS CloudHSM is a cloud-based hardware security module (HSM) that enables you to generate and use your own encryption keys on the AWS Cloud. With CloudHSM, you can manage your own encryption keys using FIPS 140-2 Level 3 validated HSMs. CloudHSM also automatically manages synchronization, high availability and failover within a cluster.

When the service first launched, many customers ran CloudHSM workloads on Amazon Elastic Compute Cloud (Amazon EC2), which required the CloudHSM client to be installed on the Amazon EC2 instance in order to communicate with the CloudHSM cluster. Today, we see customers who are interested in leveraging CloudHSM for serverless workloads using AWS Lambda, but when using Lambda there is no “instance” to install the CloudHSM client on. This blog post shows a workaround that satisfies the CloudHSM client installation requirement for Lambda functions, so that you can run CloudHSM workloads within them.

The workaround is performed by first packaging the CloudHSM client and its requirements in a Lambda layer, and then running the CloudHSM client in a child process from within the Lambda function code to allow communication with the HSMs in your CloudHSM cluster. By leveraging this approach, you gain the benefits of serverless computing (such as increased scalability and decreased admin overhead), as well as the ability to integrate with other AWS services like Amazon CloudWatch Events, Amazon Simple Storage Service (Amazon S3) and AWS Config.

Why would I want to run CloudHSM workloads on Lambda?

Below are some specific use cases enabled by this solution:

  1. When a file is added to an Amazon S3 bucket, you can trigger a Lambda function to encrypt or decrypt the file using keys stored in CloudHSM.
  2. When a file is added to an Amazon S3 bucket, you can trigger a Lambda function to create a digital signature for the file using a private key stored in CloudHSM. This digital signature can then be used to ensure file integrity.
  3. You can create a custom AWS Config rule that checks to ensure files in a directory or a bucket have not been tampered with by verifying their digital signatures using keys stored in CloudHSM.

Solution overview

This solution shows you how to package the CloudHSM client binary and its dependencies (configuration files and libraries) as well as the CloudHSM Java JCE library to a Lambda layer which is attached to the Lambda function. This enables the function to run the CloudHSM client daemon in the background as a child process, allowing it to connect to the CloudHSM cluster and to perform cryptographic tasks such as encryption and decryption operations.

Using a Lambda layer decouples the code of the Lambda function from the CloudHSM client and the CloudHSM Java JCE library. This way, when a new version of the CloudHSM client and the CloudHSM Java JCE library is released, it can be included in a new Lambda layer version and attached to the Lambda function without needing to rebuild the Lambda function package.
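
For example, after publishing a new layer version, you can point the function at it with a single call; the layer version number here is a placeholder:

# Attach the new layer version without rebuilding the function package (version number is a placeholder)
$ aws lambda update-function-configuration \
    --function-name cloudhsm_lambda_example \
    --layers arn:aws:lambda:<REGION>:<ACCOUNT-ID>:layer:cloudhsm-client-layer:2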

The example solution below includes a complete Java sample for the Lambda function. It uses the CloudHSM Java JCE library to generate a symmetric key on the HSM, and it uses this key to encrypt and decrypt after starting the CloudHSM client. Maven (a build automation tool) will be used to build the Lambda function package.

The solution uses AWS Secrets Manager to store and retrieve the crypto user (CU) credentials that are needed to perform cryptographic operations. If the HSM IPs of the CloudHSM cluster are changed (for example, if the HSMs are deleted and re-created), the Lambda function will automatically update the configuration during runtime.

Note:

  1. The solution only works with version 2.0.4 or later of the CloudHSM client and CloudHSM Java JCE library.
  2. In this workaround, the client is started at the beginning of each Lambda invocation, and is stopped at the end of the invocation. Due to the way Lambda works, the client can’t persist through multiple invocations.
  3. Secrets Manager uses AWS Key Management Service to secure its data. If your workload requires that all data be secured using HSMs under your sole control, without reliance on IAM credentials, this solution may not be appropriate. You should work with your security or compliance officer to ensure you’re using a method of securing HSM login credentials that meets your application and security needs.

Prerequisites

Figure 1: Architectural diagram

Here are the resources you’ll need in order to follow along with the example in Figure 1:

  1. An Amazon Virtual Private Cloud (Amazon VPC) with the following components:
    1. Private subnets in multiple Availability Zones to be used for the HSM’s elastic network interfaces (ENIs).
    2. A public subnet that contains a network address translation (NAT) gateway.
    3. A private subnet with a route table that routes internet traffic (0.0.0.0/0) to the NAT gateway. You’ll use this subnet to run the Lambda function. The NAT gateway allows you to connect to the CloudHSM, CloudWatch Logs and Secrets Manager endpoints.

    Note: For high availability, you can add multiple instances of the public and private subnets mentioned in Prerequisites 1.b and 1.c. For more information about how to create an Amazon VPC with public and private subnets as well as a NAT gateway, refer to the Amazon VPC user guide.

  2. An active CloudHSM cluster with at least one active HSM. The HSMs should be created in the private subnets mentioned in Prerequisite 1.a. You can follow the Getting Started with AWS CloudHSM guide to create and initialize the CloudHSM cluster.
  3. An Amazon Linux 2 EC2 instance with the CloudHSM client installed and configured to connect to the CloudHSM cluster. The client instance should be launched in the public subnet mentioned in Prerequisite 1.b. You can again refer to Getting Started With AWS CloudHSM to configure and connect the client instance.

    Note: You only need the client instance to build the Lambda function package. You can terminate the instance after the package has been created.

  4. CU credentials. You can create a CU by following the steps in the user guide.
  5. A server/machine with AWS Command Line Interface (AWS CLI) installed and configured. You’ll need this to follow along, as the example uses AWS CLI to create and configure the necessary AWS resources. The IAM user/role should have at minimum the permissions in the below policy attached to it to follow this example. Make sure you replace the <REGION> and <ACCOUNT-ID> tags below with the actual Region and account ID you are using.
    
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "secretsmanager:CreateSecret",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "secretsmanager:Name": "CloudHSM_CU"
                    }
                }
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": [
                    "ec2:AuthorizeSecurityGroupEgress",
                    "lambda:CreateFunction",
                    "lambda:InvokeFunction",
                    "lambda:GetLayerVersion",
                    "lambda:PublishLayerVersion",
                    "iam:GetRole",
                    "iam:CreateRole",
                    "iam:AttachRolePolicy",
                    "iam:PutRolePolicy",
                    "iam:PassRole",
                    "secretsmanager:DescribeSecret",
                    "secretsmanager:GetResourcePolicy",
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:PutResourcePolicy",
                    "logs:FilterLogEvents"
                ],
                "Resource": [
                    "arn:aws:ec2:<REGION>:<ACCOUNT-ID>:security-group/outbound-443",
                    "arn:aws:lambda:<REGION>:<ACCOUNT-ID>:function:cloudhsm_lambda_example",
                    "arn:aws:lambda:<REGION>:<ACCOUNT-ID>:layer:cloudhsm-client-layer",
                    "arn:aws:lambda:<REGION>:<ACCOUNT-ID>:layer:cloudhsm-client-layer:*",
                    "arn:aws:iam::<ACCOUNT-ID>:role/cloudhsm_lambda_example_role",
                    "arn:aws:secretsmanager:<REGION>:<ACCOUNT-ID>:secret:CloudHSM_CU*",
                    "arn:aws:logs:<REGION>:<ACCOUNT-ID>:log-group:/aws/lambda/cloudhsm_lambda_example:log-stream:"
                ]
            },
            {
                "Sid": "VisualEditor3",
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeVpcs",
                    "ec2:CreateSecurityGroup",
                    "ec2:DescribeSubnets",
                    "cloudhsm:DescribeClusters",
                    "ec2:DescribeSecurityGroups",
                    "ec2:AuthorizeSecurityGroupEgress"
                ],
                "Resource": "*"
            }
        ]
    }
    	

Step 1: Build the Lambda function package

In this step, you’ll build the Lambda function package using Maven. For more information about using Maven to build an AWS Lambda Java package, refer to the AWS Lambda developer guide.

  1. On your CloudHSM client instance, install the CloudHSM Java JCE library by following the steps in the user guide.
  2. Install OpenJDK 8 and Maven:
    
    $ sudo yum install -y java maven
    	

  3. Download the sample code, unzip it and move to the created directory. The directory will have the name aws-cloudhsm-on-aws-lambda-sample-master and will include:
    • A file with the name pom.xml that contains the Maven project configuration.
    • A file with the name SymmetricKeys.java which is also available on the AWS CloudHSM Java JCE samples repo. This file contains the function that you’ll use to generate the advanced encryption standard (AES) key.
    • A file with the name AESGCMEncryptDecryptLambda.java, which will run when the Lambda function is invoked:
      
      $ wget https://github.com/aws-samples/aws-cloudhsm-on-aws-lambda-sample/archive/master.zip
      $ unzip master.zip
      $ cd aws-cloudhsm-on-aws-lambda-sample-master/
      	

  4. Create a Java Archive (JAR) package by running the below commands. This will create the JAR file under the target/ directory with the name cloudhsm_lambda_project-1.0-SNAPSHOT.jar.

    
    $ export CLOUDHSM_VER=$(ls /opt/cloudhsm/java/ | grep "cloudhsm-[0-9\.]\+.jar" | grep -o "[0-9\.]\+[0-9]")
    $ export LOG4JCORE_VER=$(ls /opt/cloudhsm/java/ | grep "log4j-core-[0-9\.]\+.jar" | grep -o "[0-9\.]\+[0-9]")
    $ export LOG4JAPI_VER=$(ls /opt/cloudhsm/java/ | grep "log4j-api-[0-9\.]\+.jar" | grep -o "[0-9\.]\+[0-9]")
    $ mvn validate && mvn clean package 
    	

Step 2: Create the Lambda layer

In this step, you’ll create the Lambda layer that contains the CloudHSM client and its dependencies and the CloudHSM Java library JARs.

  1. On your CloudHSM client instance, create a directory called “layer” and change directories to it:
    
    $ mkdir ~/layer && cd ~/layer
    	

  2. Create the following directories, which you’ll use in the next steps to hold the CloudHSM binary and its prerequisites such as configuration files and libraries, and the CloudHSM Java JCE JARs:
    
    $ mkdir -p lib cloudhsm/bin cloudhsm/etc java/lib
    	

  3. Copy the cloudhsm_client binary and the needed configuration files to the directories you created in the previous step.
    
    $ cp /opt/cloudhsm/bin/cloudhsm_client cloudhsm/bin
    $ cp -r /opt/cloudhsm/etc/{cloudhsm_client.cfg,customerCA.crt,client.crt,client.key,certs} cloudhsm/etc
    	

  4. Add the necessary libraries by running the commands below. These libraries are needed by the Lambda function to be able to run the cloudhsm_client binary.
    
    $ cp /opt/cloudhsm/lib/libcaviumjca.so lib/
    $ ldd /opt/cloudhsm/bin/cloudhsm_client | awk '{print $3}' | grep "^/" | xargs -I{} cp {} lib/
    	

  5. Add the CloudHSM Java JCE Jars by running the commands below. These JARs include the classes needed by the Lambda function code to run.
    
    $ cp /opt/cloudhsm/java/{cloudhsm-[0-9]*.jar,log4j-*-*.jar} java/lib/
    	

  6. Create the Lambda layer ZIP archive by running the command below. This will create the archive with the name layer.zip in the home directory.
    
    $ zip -r ~/layer.zip * 
    	

  7. Move the ZIP archive (layer.zip) to the server/machine with AWS CLI installed and configured, and run the below command to create the Lambda layer with the name cloudhsm-client-layer.
    
    $ aws lambda publish-layer-version --layer-name cloudhsm-client-layer --zip-file fileb://layer.zip --compatible-runtimes java8
    	

Step 3: Create a secret to store the CU credentials

In this step, you will use Secrets Manager to create a secret to store your CU credentials. You must perform this step on your server/machine that has AWS CLI installed and configured.

Run the following command to create a secret with the name CloudHSM_CU that contains your CU user name and password (Prerequisite 4). Make sure to replace the user name and password below with your actual CU user name and password.


$ export HSM_USER=<user>
$ export HSM_PASSWORD=<password>
$ aws secretsmanager create-secret --name CloudHSM_CU --secret-string "{ \"HSM_USER\": \"$HSM_USER\", \"HSM_PASSWORD\": \"$HSM_PASSWORD\"}"

Step 4: Create an IAM role for the Lambda function

In this step, you’ll create the IAM role that the Lambda function will assume, and attach the permissions the function needs.

  1. On the server/machine with AWS CLI installed and configured, create a new file with the name trust.json containing the following trust policy:
    
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "lambda.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
    	

  2. Create a role named cloudhsm_lambda_example_role using the following AWS CLI command:

    
    $ aws iam create-role --role-name cloudhsm_lambda_example_role --assume-role-policy-document file://trust.json
    	

  3. Run the commands below to create a new file named policy.json. The policy in this file allows the IAM role to perform the following actions:
    • Writing to CloudWatch Logs. This permission allows the IAM role to write to the CloudWatch Logs of the Lambda function. You can then use the logs for troubleshooting. For more information about accessing CloudWatch Logs for Lambda, refer to this guide.
    • Retrieving the CU secret value from Secrets Manager. The CU credentials stored in the CU secret are needed by the Lambda function to be able to log in to the CloudHSM cluster.
    • Describing CloudHSM clusters. This permission allows the Lambda function to check the current HSM IPs and update its configuration if the IPs have changed.
    
    $ export SECRET_ARN=$(aws secretsmanager describe-secret --secret-id "CloudHSM_CU" --query "ARN" --output text)
    
    $ cat <<EOF> policy.json
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CWLogs",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents"
                ],
                "Resource": "*"
            },
            {
                "Sid": "SecretsManager",
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": "$SECRET_ARN"
            },
            {
                "Sid": "CloudHSM",
                "Effect": "Allow",
                "Action": "cloudhsm:DescribeClusters",
                "Resource": "*"
            }
        ]
    }
    EOF
    	

  4. Attach the policy to the IAM role created in step 2 of this section by running the following command:
    
    $ aws iam put-role-policy --role-name cloudhsm_lambda_example_role --policy-name cloudhsm_lambda_example_policy --policy-document file://policy.json
    	

  5. Attach the AWS managed policy AWSLambdaVPCAccessExecutionRole to the created role by running the command below. This managed policy grants the permissions needed to create and manage the elastic network interfaces required to run the Lambda function inside your VPC and subnet.
    
    $ aws iam attach-role-policy --role-name cloudhsm_lambda_example_role --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole
    	

  6. To make sure the CU secret is only accessible to the Lambda function role, run the below commands to attach a resource-based policy to the secret:
    
    $ export ROLE_ARN=$(aws iam get-role --role-name cloudhsm_lambda_example_role --query Role.Arn --output text)
    $ export ASSUMED_ROLE_ARN=$(echo $ROLE_ARN | sed -e "s/:iam:/:sts:/" -e "s/:role/:assumed-role/" -e "s/$/\/cloudhsm_lambda_example/")
    $ export ROOT_ARN=$(echo $ROLE_ARN | sed "s/:role.*/:root/")
    $ cat <<EOF> sm_policy.json
    { "Version": "2012-10-17",
    	"Statement": [
    		{
    			"Effect": "Deny",
    			"Action": "secretsmanager:GetSecretValue",
    			"NotPrincipal": {"AWS": [
    				"$ASSUMED_ROLE_ARN",
    				"$ROLE_ARN",
    				"$ROOT_ARN"
    			]},
    				"Resource": "*"
    		}
    	]
    }
    EOF
    
    $ aws secretsmanager put-resource-policy --resource-policy file://sm_policy.json --secret-id CloudHSM_CU
    	

Step 5: Create the Lambda function

In this step, you will create a Lambda function with the necessary settings.

  1. On the server/machine with AWS CLI installed and configured, run the command below to create a security group with the name outbound-443. This security group will be attached to the Lambda function to allow it to connect to the CloudWatch Logs, Secrets Manager and CloudHSM endpoints. Make sure to replace the CLUSTER_ID below with the actual CloudHSM cluster ID of your environment.
    
    $ export CLUSTER_ID=<cluster-xxxxxxxxxx>
    $ export CLUSTER_VPC=$(aws cloudhsmv2 describe-clusters --filters clusterIds=$CLUSTER_ID --query Clusters[0].VpcId --output text)
    $ export OUTBOUND_SG=$(aws ec2 create-security-group --group-name outbound-443 --description "Allow outbound access to port 443" --vpc-id $CLUSTER_VPC --output text)
    $ aws ec2 authorize-security-group-egress --group-id $OUTBOUND_SG --protocol tcp --port 443 --cidr 0.0.0.0/0
    	

  2. Move the JAR package generated in step 4 of the Step 1 section to the current directory on the server/machine that has AWS CLI installed and configured (The file was generated on the CloudHSM client instance under ~/aws-cloudhsm-on-aws-lambda-sample-master/target/cloudhsm_lambda_project-1.0-SNAPSHOT.jar).
  3. Replace the cluster ID and subnet ID below with the CloudHSM cluster ID of your environment, and the ID of the private Lambda subnet in your environment (Prerequisite 1.c), then run the commands below. These commands set environment variables that you’ll need for the next command.
    
    $ export CLUSTER_ID=<cluster-xxxxxxxxxx>
    $ export SUBNET_ID=<subnet-xxxxxxxx>
    $ export CLUSTER_VPC=$(aws cloudhsmv2 describe-clusters --filters clusterIds=$CLUSTER_ID --query Clusters[0].VpcId --output text)
    $ export OUTBOUND_SG=$(aws ec2 describe-security-groups --filters Name=group-name,Values=outbound-443  --query SecurityGroups[0].GroupId --output text)
    $ export CLUSTER_SG=$(aws cloudhsmv2 describe-clusters --filters clusterIds=$CLUSTER_ID --query Clusters[0].SecurityGroup --output text)
    $ export ROLE_ARN=$(aws iam get-role --role-name cloudhsm_lambda_example_role --query Role.Arn --output text)
    $ export LAYER_ARN=$(aws lambda get-layer-version --layer-name cloudhsm-client-layer --version-number 1 --query LayerVersionArn --output text)
    	

  4. Create a Lambda function with the name cloudhsm_lambda_example by running the below command:
    
    $ aws lambda create-function --function-name "cloudhsm_lambda_example" \
    --runtime java8 \
    --role $ROLE_ARN \
    --handler "com.amazonaws.cloudhsm.examples.AESGCMEncryptDecryptLambda::myhandler" \
    --timeout 600 \
    --memory-size 512 \
    --vpc-config SubnetIds=$SUBNET_ID,SecurityGroupIds=$CLUSTER_SG,$OUTBOUND_SG \
    --environment "Variables={CLUSTER_ID=$CLUSTER_ID, SECRET_ID=CloudHSM_CU,liquidsecurity_daemon_id=1}" \
    --layers $LAYER_ARN \
    --zip-file fileb://cloudhsm_lambda_project-1.0-SNAPSHOT.jar
    	

The command will create a Lambda function with the following configuration:

  • Runtime: Java8
  • Execution Role: The role you created in the Step 4 section.
  • Handler: The name of the class and the function in the package created in the Step 1 section.
  • Timeout: 10 minutes.
  • Memory size: 512 MB.
  • Subnet: The private Lambda subnet in your environment (Prerequisite 1.c).
  • Security Groups: The CloudHSM cluster security group AND the security group created in step 1 of the Step 5 section for outbound access to port 443 (outbound-443).
  • Code/Package: The JAR package you created in step 4 of the Step 1 section.
  • Layer: The layer created in the Step 2 section.
  • Environment Variables:
    • CLUSTER_ID = the CloudHSM cluster ID in your environment
    • SECRET_ID = the ID of the secret you created in the Step 3 section
    • liquidsecurity_daemon_id = 1 (this is needed by the cloudhsm_client binary)
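
The CLUSTER_ID and SECRET_ID environment variables above are what the function reads at runtime. As a rough sketch of my own (the class name and helpers here are hypothetical; the actual logic lives in AESGCMEncryptDecryptLambda.java from the sample repo), a handler could pick them up and fetch the CU credentials with the AWS SDK for Java like this:

import com.amazonaws.services.secretsmanager.AWSSecretsManager;
import com.amazonaws.services.secretsmanager.AWSSecretsManagerClientBuilder;
import com.amazonaws.services.secretsmanager.model.GetSecretValueRequest;

public class HandlerConfig {

    // Reads the cluster ID passed in by the create-function command above.
    public static String clusterId() {
        return System.getenv("CLUSTER_ID");
    }

    // Fetches the CU credentials JSON ({"HSM_USER": "...", "HSM_PASSWORD": "..."}) from Secrets Manager.
    public static String cuCredentials() {
        String secretId = System.getenv("SECRET_ID"); // set to CloudHSM_CU above
        AWSSecretsManager client = AWSSecretsManagerClientBuilder.defaultClient();
        return client.getSecretValue(new GetSecretValueRequest().withSecretId(secretId))
                     .getSecretString();
    }
}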

Step 6: Run the Lambda function

In this step, you will invoke the Lambda function and check the logs to view the output.

  1. You can invoke the Lambda function using the following command. This will execute the code in the package you created in Step 1.
    
    $ aws lambda invoke --function-name cloudhsm_lambda_example out.txt
    	

  2. You can check the function’s CloudWatch Log group with a command like this one:
    
    $ aws logs filter-log-events --log-group-name "/aws/lambda/cloudhsm_lambda_example" --start-time "`date -d "now -5min" +%s`000" --query events[*].message --output text | sed "s/\t/\n/g" 
    	

    If the Lambda function was successful, the output of the function should look something like the example below:

    
    START RequestId: 39c627f2-3908-4424-97ef-038c28a72f9a Version: $LATEST
    
    * Running GetSecretValue to get the CU credentials ...
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    
    * Running DescribeClusters to get the HSM IP ...
    DescribeClusters returned the HSM IP = 1.2.3.4
    * Getting the HSM IP in the configuration file ...
    The configuration file has the HSM IP = 1.2.3.4
    * Starting the cloudhsm client ...
    * Waiting for the cloudhsm client to start ...
    * cloudhsm client started ...
    * Adding the Cavium provider ...
    ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
    
    * Using credentials to Login to the CloudHSM Cluster ...
    Login successful!
    * Generating AES Key ...
    * Generating Random data to encrypt ...
    Plain Text data = 3B0566E9A3FADA8FED7D6C88FE92ECBE8526922E84489AB48F1F3F3116235E69
    * Encrypting data ...
    Cipher Text data = CA6D80AD34BBADEF34275743F309E6730ABC66BA19C2EADC731899B0FB86564EDDB9F7FC103E1C9C2A6A1E64BF2D2C48
    * Decrypting ciphertext ...
    Decrypted Text data = 3B0566E9A3FADA8FED7D6C88FE92ECBE8526922E84489AB48F1F3F3116235E69
     * Successful decryption
    * Logging out the CloudHSM Cluster
    * Closing client ...
    END RequestId: 39c627f2-3908-4424-97ef-038c28a72f9a
    
    REPORT RequestId: 39c627f2-3908-4424-97ef-038c28a72f9a
    Duration: 11990.69 ms
    Billed Duration: 12000 ms
    Memory Size: 512 MB
    Max Memory Used: 103 MB
    	

Note: The StatusLogger No log4j2 configuration file found error above is normal and can be ignored. It is caused by the missing log4j2 configuration file, which is normally used to configure logging but isn’t needed here, because the function’s log messages are written to CloudWatch Logs by default.

Conclusion

This solution demonstrates how to run CloudHSM workloads on Lambda, which lets you leverage the flexibility of serverless computing while meeting security and compliance requirements by performing cryptographic tasks such as encryption and decryption operations. This approach also allows you to integrate with other AWS services like Amazon CloudWatch Events, Amazon Simple Storage Service (Amazon S3), or AWS Config for a seamless experience across your environment.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the AWS CloudHSM forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Mohamed AboElKheir

Mohamed AboElKheir is an Application Security Engineer who works with different teams to ensure AWS services, applications, and websites are designed and implemented to the highest security standards. He is a subject matter expert for CloudHSM and is always enthusiastic about assisting CloudHSM customers with advanced issues and use cases. Mohamed is passionate about InfoSec, specifically cryptography, penetration testing (he’s OSCP certified), application security, and cloud security (he’s AWS Security Specialty certified).

A Disk-Backed ArrayList

Post Syndicated from Bozho original https://techblog.bozho.net/a-disk-backed-arraylist/

It sometimes happens that your list becomes too big to fit in memory, and you have to do something to avoid running out of memory.

The proper way to do that is streaming – instead of fitting everything in memory, you should stream data from the source and discard the entries that are already processed.

However, there are cases when code that’s outside of your control requires a List and you can’t use streaming. These cases are rather rare but in case you hit them, you have to find a workaround. One is to re-implement the code to work with streaming, but depending on the way the library is written, it may not be possible. So the other option is to use a disk-backed list – one that works as a list, but underneath stores and loads elements from disk.

Searching for existing solutions turns up several repos that are 3+ years old, like this one and this one and this one.

And then there’s MapDB, which is great and supported. It’s mostly about maps, but it does support a List as well, as shown here.

And finally, you have the option to implement something simpler yourself, in case you need just iteration and almost nothing else. I’ve done it here – DiskBackedArrayList.java. It doesn’t support many things (not every unsupported method is overridden to throw an exception yet, though they should be). Most importantly, it doesn’t support random adds, random gets, or toArray(). It’s purely “fill the collection” and then “iterate the collection”. It relies on ObjectOutputStream, which is not terribly efficient, but is simple to use. Note that I’ve allowed a short in-memory prependList in case small amounts of data need to be prepended to the list.

The list gets filled in memory until a specified threshold and then gets flushed to disk, clearing the memory, which starts getting filled again. This could be made more efficient – with background flushing in another thread that doesn’t interfere with adding elements to the list – but optimizations complicate things, and in this case the total running time was not an issue. Most importantly, the iterator() method is overridden to return a custom iterator that first streams the prepended list, then reads everything from disk, and finally iterates over the latest batch which is still in memory. And finally, the clear() method should be called in the end in order to close the underlying stream. An output stream could be opened and closed on each flush, but ObjectOutputStream can’t be used in append mode because of an implementation detail: it writes a stream header when it is created.
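
To make the approach concrete, here is a minimal sketch of my own (not the actual DiskBackedArrayList source, and it skips the prependList) that follows the same idea: buffer adds in memory, flush to a single ObjectOutputStream that stays open for the lifetime of the list, and iterate disk first, then the in-memory tail.

import java.io.*;
import java.util.*;

public class DiskBackedList<E extends Serializable> implements Iterable<E> {

    private final int threshold;
    private final File file;
    private final List<E> memory = new ArrayList<>();
    private ObjectOutputStream out; // opened on first flush, kept open until clear()
    private long onDisk = 0;        // number of elements already flushed to disk

    public DiskBackedList(int threshold) throws IOException {
        this.threshold = threshold;
        this.file = File.createTempFile("disk-backed-list", ".bin");
    }

    public void add(E element) throws IOException {
        memory.add(element);
        if (memory.size() >= threshold) {
            flush();
        }
    }

    private void flush() throws IOException {
        if (out == null) {
            out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(file)));
        }
        for (E e : memory) {
            out.writeObject(e);
        }
        out.flush();
        onDisk += memory.size();
        memory.clear();
    }

    @Override
    public Iterator<E> iterator() {
        try {
            // Stream the flushed elements from disk first, then the in-memory tail.
            final ObjectInputStream in = onDisk > 0
                    ? new ObjectInputStream(new BufferedInputStream(new FileInputStream(file)))
                    : null;
            final Iterator<E> tail = memory.iterator();
            return new Iterator<E>() {
                long read = 0;
                @Override public boolean hasNext() { return read < onDisk || tail.hasNext(); }
                @Override @SuppressWarnings("unchecked")
                public E next() {
                    try {
                        if (read < onDisk) { read++; return (E) in.readObject(); }
                        return tail.next();
                    } catch (IOException | ClassNotFoundException ex) {
                        throw new RuntimeException(ex);
                    }
                }
            };
        } catch (IOException ex) {
            throw new RuntimeException(ex);
        }
    }

    // Close the underlying stream and delete the overflow file when you're done iterating.
    public void clear() throws IOException {
        if (out != null) {
            out.close();
            out = null;
        }
        memory.clear();
        onDisk = 0;
        file.delete();
    }
}

Usage is exactly the “fill, then iterate” pattern: add(...) everything, loop over it once, then call clear() to close the stream and delete the temporary file.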

So basically we hide the streaming approach underneath a List interface – it’s still streaming elements and discarding them when not needed. Ideally this should be done at the source of the data (e.g. a database, message queue, etc.) rather than using the disk as overflow space, but there are cases where using the disk is fine. This implementation is a starting point, as it’s not tested in production, but illustrates that you can adapt existing classes to use different data access patterns if needed.

The post A Disk-Backed ArrayList appeared first on Bozho's tech blog.

Java 11 runtime now available in AWS Lambda

Post Syndicated from Rob Sutter original https://aws.amazon.com/blogs/compute/java-11-runtime-now-available-in-aws-lambda/

We are excited to announce that you can now develop your AWS Lambda functions using the Java 11 runtime. Start using this runtime today by specifying a runtime parameter value of java11 when creating or updating your Lambda functions.

The Java 11 runtime does not introduce any changes in Lambda’s programming model, such as handler definition or logging statements. Customers can continue authoring their Lambda functions in Java as they have in the past while benefitting from the new features of Java 11.

New features in Java 11 runtime

Java 11 is a long-term support release and brings with it several new features, including a Java-native HTTP client with HTTP/2 support and the var keyword. The Java 11 runtime also benefits from Amazon Corretto running on Amazon Linux 2.

HTTP client (standard)

Java 11 introduces a native HTTP client, HttpClient. Previous versions of Java provided the HttpURLConnection class for accessing HTTP resources but, for more complex use cases, developers typically had to select and import a third-party library. HttpClient supports both synchronous and asynchronous HTTP requests.

Example: Synchronous HTTP request

Synchronous requests block execution while the HTTP client waits for a response. This is a common programming model for Lambda functions that are invoked synchronously themselves, for example, via Amazon API Gateway.

package helloworld;

import java.net.http.HttpClient;
import java.net.http.HttpHeaders;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpResponse.BodyHandlers;
import java.net.URI;
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

/**
 * Handler for requests to Lambda function.
 */
public class App implements RequestHandler<Object, Object> {

    public Object handleRequest(final Object input, final Context context) {
        Map<String, String> headers = new HashMap<>();
        headers.put("Content-Type", "application/json");
        headers.put("X-Custom-Header", "application/json");
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .GET()
            .version(HttpClient.Version.HTTP_2)
            .uri(URI.create("https://checkip.amazonaws.com"))
            .timeout(Duration.ofSeconds(15))
            .build();

        try {
            HttpResponse<String> response =
                client.send(request, BodyHandlers.ofString());

            String output = String.format("{ \"message\": \"hello world\", \"location\": \"%s\" }", response.body());
            return new GatewayResponse(output, headers, response.statusCode());    
        } catch (Exception e) {
            return new GatewayResponse("{}", headers, 500);
        }
    }
}
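
The same client can also issue asynchronous requests, which return a CompletableFuture instead of blocking the calling thread. Here is a small sketch of my own (not part of the SAM sample above):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse.BodyHandlers;
import java.util.concurrent.CompletableFuture;

public class AsyncExample {

    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .GET()
            .uri(URI.create("https://checkip.amazonaws.com"))
            .build();

        // sendAsync returns immediately; the response is processed when it arrives.
        CompletableFuture<String> body = client
            .sendAsync(request, BodyHandlers.ofString())
            .thenApply(response -> response.body());

        // join() is only here so the demo prints the result before exiting.
        System.out.println(body.join());
    }
}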

 

The var keyword

The var keyword allows you to declare local variables without an explicit type, letting the compiler infer the type at compile time. This helps reduce verbosity, especially with composite types, as you no longer have to explicitly define type information on both sides of the equal sign. For example, to create a map of key/value string pairs, you can now do:

var map = new HashMap<String, String>();
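
A slightly longer snippet of my own shows that the inferred types are still fully static, including inside a for-each loop:

import java.util.HashMap;

public class VarExample {

    public static void main(String[] args) {
        var map = new HashMap<String, String>();   // inferred as HashMap<String, String>
        map.put("runtime", "java11");

        for (var entry : map.entrySet()) {         // entry is inferred as Map.Entry<String, String>
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}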

Corretto benefits

The Java 11 runtime benefits from Amazon Corretto. Corretto is a no-cost, multiplatform, production-ready distribution of the Open Java Development Kit (OpenJDK). Corretto comes with long-term support that will include performance enhancements and security fixes. Amazon runs Corretto internally on thousands of production services.

Special considerations

Developers migrating to the new runtimes should consider the following known issues.

Java 8 to Java 11 migration

After migrating from Java 8 to Java 11, using internal packages such as sun.misc.* or sun.* now produces compiler errors instead of warnings.
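
One commonly cited example (mine, for illustration) is sun.misc.BASE64Encoder, which was removed from the JDK: code that used it compiled with a warning on Java 8 but fails to compile on Java 11, and the supported replacement is java.util.Base64, available since Java 8:

import java.util.Base64;

public class EncodingMigration {

    public static void main(String[] args) {
        // Java 8 and earlier (internal API, removed in later JDKs):
        // String encoded = new sun.misc.BASE64Encoder().encode("hello".getBytes());

        // Java 11 (supported public API since Java 8):
        String encoded = Base64.getEncoder().encodeToString("hello".getBytes());
        System.out.println(encoded);
    }
}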

Amazon Linux 2

Java 11, like Python 3.8 and Node.js 10 and 12, is based on an Amazon Linux 2 execution environment. Amazon Linux 2 provides a secure, stable, and high-performance execution environment to develop and run cloud and enterprise applications.

Next steps

Get started building with Java 11 today by specifying a runtime parameter value of java11 when creating or updating your Lambda functions.

Hope you enjoy building with the new features in Java 11!

Running Java applications on Amazon EC2 A1 instances with Amazon Corretto

Post Syndicated from Neelay Thaker original https://aws.amazon.com/blogs/compute/running-java-applications-on-amazon-ec2-a1-instances-with-amazon-corretto/

This post is contributed by Jeff Underhill | EC2 Principal Business Development Manager and Arthur Petitpierre | Arm Specialist Solutions Architect

 

Amazon EC2 A1 instances deliver up to 45% cost savings for scale-out applications and are powered by AWS Graviton Processors that feature 64-bit Arm Neoverse cores and custom silicon designed by AWS. Amazon Corretto is a no-cost, multiplatform, production-ready distribution of the Open Java Development Kit (OpenJDK).

Production-ready Arm 64-bit Linux builds of Amazon Corretto for JDK8 and JDK 11 were released Sep 17, 2019. This provided an additional Java runtime option when deploying your scale-out Java applications on Amazon EC2 A1 instances. We’re fortunate to have James Gosling, the designer of Java, as a member of the Amazon team, and he recently took to Twitter to announce the General Availability (GA) of Amazon Corretto for the Arm architecture:

For those of you that like playing with Linux on ARM, the Corretto build for ARM64 is now GA.  Fully production ready. Both JDK8 and JDK11

If you’re interested to experiment with Amazon Corretto on Amazon EC2 A1 instances then read on for step-by-step instructions that will have you up and running in no time.

Launching an A1 EC2 instance

The first step is to create a running Amazon EC2 A1 instance. In this example, we demonstrate how to boot your instance using Amazon Linux 2. Starting from the AWS Console, you need to log in to your AWS account or create a new account if you don’t already have one. Once you’ve logged into the AWS console, navigate to Amazon Elastic Compute Cloud (Amazon EC2) and click Launch a virtual machine:


Next, select the operating system and compute architecture of the EC2 instance you want to launch. In this case, use Amazon Linux 2, and because we want an AWS Graviton-based A1 instance, select the 64-bit (Arm) option:

On the next page, select an A1 instance type. Here, choose an a1.xlarge, which offers 4 vCPUs and 8 GB of memory (refer to the Amazon EC2 A1 page for more information). Then, select the “Review and Launch” button:


Next, you can review a summary of your instance details. This summary is shown in the following pictures. Note: the only network port exposed is SSH via TCP port 22. This allows you to remotely connect to the instance via an SSH terminal:


Before proceeding, be aware that you are about to start spending money (and don’t forget to terminate the instance at the end to avoid ongoing charges). As the warning in the screenshot above states, the selected A1 instance is not eligible for the free tier, so you are charged based on the pricing of the instance selected (refer to the Amazon EC2 on-demand pricing page for details; the a1.xlarge selected here is $0.102 per hour as of this writing).

Once you’re ready to proceed, select “Launch” to continue. At this point you need to create or supply an existing key-pair for use when connecting to the new instance via SSH. Details to establish this connection can be found in the EC2 documentation.

In this example, I connect from a Windows laptop using PuTTY. The details of converting EC2 keys into the right format can be found here. You can connect the same way. In the following screenshot, I use an existing key-pair that I generated. You can create or select a key-pair that best suits your workload and do the following:

While your instance launches, you can click on “View Instances” to see the status of the instances within your AWS account:


Once you click on “View Instances,” you can see that your newly launched instance is now in the Running state:


Now, you can connect to your new instance. Right click on the instance from within the console, then select “Connect” from the pop-up menu to get details and instructions on connecting to the instance. This is shown in the following screenshot:

The following screenshot provides you with instructions and specific details needed to connect to your running A1 instance:

You can now connect to the running a1.xlarge instance by following the instructions shown for your preferred SSH client.

Then, the Amazon Linux 2 command prompt pops up as follows:

Note: Run the ‘uname -a’ command to confirm that you are running on an ‘aarch64’ architecture, which is the Linux architecture name for 64-bit Arm.


Once you complete this step, your A1 instance is up and running. From here, you can install and use Corretto 8.

 

Installing Corretto 8

You can now install Amazon Corretto 8 on Amazon Linux 2 by following the instructions from the documentation. Use option 1 to install the application from the Amazon Linux 2 repository:

$ sudo amazon-linux-extras enable corretto8

$ sudo yum clean metadata

$ sudo yum install -y java-1.8.0-amazon-corretto

These commands initiate the installation. Once it completes, you can run the java -version command to confirm that you have the newest version of Amazon Corretto (your version may be more recent):

$ java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment Corretto-8.232.09.1 (build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM Corretto-8.232.09.1 (build 25.232-b09, mixed mode)

This output confirms that you have Amazon Corretto 8 version 8.232.09.1 installed and ready to go. If you see a version string that doesn’t mention Corretto, this means you have another version of Java already running. In this case, run the following command to change the default Java provider:

$ sudo alternatives --config java

Installing tomcat8.5 and a simple JSP application

Once the latest Amazon Corretto is installed, confirm that the Java installation works. You can do this by installing and running a simple Java application.

To run this test, you need to install Apache Tomcat, which is a Java based application web server. Then, open up a public port to make it accessible and connect to it from a browser to confirm it’s running as expected.

Then, install tomcat8.5 from amazon-linux-extras using the following code:

$ sudo amazon-linux-extras enable tomcat8.5
$ sudo yum clean metadata
$ sudo yum install -y tomcat 

Now configure tomcat to use /dev/urandom as an entropy source. This is important to do because otherwise tomcat might hang on a freshly booted instance if there’s not enough entropy depth. Note: there’s a kernel patch in flight to provide an alternate entropy mechanism:

$ sudo bash -c 'echo JAVA_OPTS=\"-Djava.security.egd=file:/dev/urandom\" >> /etc/tomcat/tomcat.conf' 

Next, add a simple JavaServer Pages (JSP) application that will display details about your system.

First, create the default web application directory:

$ sudo install -d -o root -g tomcat /var/lib/tomcat/webapps/ROOT 

Then, add the small JSP application:

$ sudo bash -c 'cat <<EOF > /var/lib/tomcat/webapps/ROOT/index.jsp
<html>
<head>
<title>Corretto8 - Tomcat8.5 - Hello world</title>
</head>
<body>
  <table>
    <tr>
      <td>Operating System</td>
      <td><%= System.getProperty("os.name") %></td>
    </tr>
    <tr>
      <td>CPU Architecture</td>
      <td><%= System.getProperty("os.arch") %></td>
    </tr>
    <tr>
      <td>Java Vendor</td>
      <td><%= System.getProperty("java.vendor") %></td>
    </tr>
    <tr>
      <td>Java URL</td>
      <td><%= System.getProperty("java.vendor.url") %></td>
    </tr>
    <tr>
      <td>Java Version</td>
      <td><%= System.getProperty("java.version") %></td>
    </tr>
    <tr>
      <td>JVM Version</td>
      <td><%= System.getProperty("java.vm.version") %></td>
    </tr>
    <tr>
      <td>Tomcat Version</td>
      <td><%= application.getServerInfo() %></td>
    </tr>
</table>

</body>
</html>
EOF
'

Finally, start the Tomcat service:

$ sudo systemctl start tomcat 

Now that the Tomcat service is running, you need to configure your EC2 instance to open TCP port 8080 (the default port that Tomcat listens on). This configuration allows you to access the instance from a browser and confirm that Tomcat is running and serving content.

To do this, return to the AWS console and select your EC2 a1.xlarge instance. Then, in the information panel below, select the associated security group so you can modify the inbound rules to allow TCP access on port 8080 as follows:


With these modifications you should now be able to connect to the Apache Tomcat default page by directing a browser to http://<your instance IPv4 Public IP>:8080 as follows:

 Don’t forget to terminate your EC2 instance(s) when you’re done to avoid ongoing charges!

 

To summarize, we spun up an Amazon EC2 A1 instance, installed and enabled Amazon Corretto and the Apache Tomcat server, configured the security group for the EC2 instance to accept connections on TCP port 8080, and then created and connected to a simple default JSP web page. Being able to display the JSP page confirms that you’re serving content and that you can see the underlying Java Virtual Machine and platform architecture specifications. These steps demonstrate setting up the Amazon Corretto + Apache Tomcat environment and running a demo JSP web application on AWS Graviton-based Amazon EC2 A1 instances using readily available open source software.

You can learn more at the Amazon Corretto website, and the downloads are all available here for Amazon Corretto 8 and Amazon Corretto 11; if you’re using containers, here’s the Docker Official image. If you have any questions about your own workloads running on Amazon EC2 A1 instances, contact us at [email protected].

 

How to run AWS CloudHSM workloads on Docker containers

Post Syndicated from Mohamed AboElKheir original https://aws.amazon.com/blogs/security/how-to-run-aws-cloudhsm-workloads-on-docker-containers/

AWS CloudHSM is a cloud-based hardware security module (HSM) that enables you to generate and use your own encryption keys on the AWS Cloud. With CloudHSM, you can manage your own encryption keys using FIPS 140-2 Level 3 validated HSMs. Your HSMs are part of a CloudHSM cluster. CloudHSM automatically manages synchronization, high availability, and failover within a cluster.

CloudHSM is part of the AWS Cryptography suite of services, which also includes AWS Key Management Service (KMS) and AWS Certificate Manager Private Certificate Authority (ACM PCA). KMS and ACM PCA are fully managed services that are easy to use and integrate. You’ll generally use AWS CloudHSM only if your workload needs a single-tenant HSM under your own control, or if you need cryptographic algorithms that aren’t available in the fully-managed alternatives.

CloudHSM offers several options for you to connect your application to your HSMs, including PKCS#11, Java Cryptography Extensions (JCE), or Microsoft CryptoNG (CNG). Regardless of which library you choose, you’ll use the CloudHSM client to connect to all HSMs in your cluster. The CloudHSM client runs as a daemon, locally on the same Amazon Elastic Compute Cloud (EC2) instance or server as your applications.

The deployment process is straightforward if you’re running your application directly on your compute resource. However, if you want to deploy applications using the HSMs in containers, you’ll need to make some adjustments to the installation and execution of your application and the CloudHSM components it depends on. Docker containers don’t typically include access to an init process like systemd or upstart. This means that you can’t start the CloudHSM client service from within the container using the general instructions provided by CloudHSM. You also can’t run the CloudHSM client service remotely and connect to it from the containers, as the client daemon listens to your application using a local Unix Domain Socket. You cannot connect to this socket remotely from outside the EC2 instance network namespace.

This blog post discusses the workaround that you’ll need in order to configure your container and start the client daemon so that you can utilize CloudHSM-based applications with containers. Specifically, in this post, I’ll show you how to run the CloudHSM client daemon from within a Docker container without needing to start the service. This enables you to use Docker to develop, deploy and run applications using the CloudHSM software libraries, and it also gives you the ability to manage and orchestrate workloads using tools and services like Amazon Elastic Container Service (Amazon ECS), Kubernetes, Amazon Elastic Container Service for Kubernetes (Amazon EKS), and Jenkins.

Solution overview

My solution shows you how to create a proof-of-concept sample Docker container that is configured to run the CloudHSM client daemon. When the daemon is up and running, it runs the AESGCMEncryptDecryptRunner Java class, available on the AWS CloudHSM Java JCE samples repo. This class uses CloudHSM to generate an AES key, then it uses the key to encrypt and decrypt randomly generated data.
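
Conceptually, the encrypt/decrypt round trip looks like the following plain-JCE sketch (my own simplification that uses the default Java provider; the actual sample registers the CloudHSM provider, logs in with the CU credentials, and generates the key on the HSM instead):

import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class AesGcmSketch {

    public static void main(String[] args) throws Exception {
        // Generate a 256-bit AES key (the sample generates it on the HSM).
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey();

        // Random plaintext and a fresh 12-byte IV.
        SecureRandom random = new SecureRandom();
        byte[] plaintext = new byte[32];
        byte[] iv = new byte[12];
        random.nextBytes(plaintext);
        random.nextBytes(iv);

        // Encrypt, then decrypt with the same key and IV.
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);

        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] decrypted = cipher.doFinal(ciphertext);

        System.out.println("Round trip successful: " + Arrays.equals(plaintext, decrypted));
    }
}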

Note: In my example, you must manually enter the crypto user (CU) credentials as environment variables when running the container. For any production workload, you’ll need to carefully consider how to provide, secure, and automate the handling and distribution of your HSM credentials. You should work with your security or compliance officer to ensure that you’re using an appropriate method of securing HSM login credentials for your application and security needs.

Figure 1: Architectural diagram

Prerequisites

To implement my solution, I recommend that you have basic knowledge of the following:

  • CloudHSM
  • Docker
  • Java

Here’s what you’ll need to follow along with my example:

  1. An active CloudHSM cluster with at least one active HSM. You can follow the Getting Started Guide to create and initialize a CloudHSM cluster. (Note that for any production cluster, you should have at least two active HSMs spread across Availability Zones.)
  2. An Amazon Linux 2 EC2 instance in the same Amazon Virtual Private Cloud in which you created your CloudHSM cluster. The EC2 instance must have the CloudHSM cluster security group attached—this security group is automatically created during the cluster initialization and is used to control access to the HSMs. You can learn about attaching security groups to allow EC2 instances to connect to your HSMs in our online documentation.
  3. A CloudHSM crypto user (CU) account created on your HSM. You can create a CU by following these user guide steps.

Solution details

  1. On your Amazon Linux EC2 instance, install Docker:
    
            # sudo yum -y install docker
            

  2. Start the docker service:
    
            # sudo service docker start
            

  3. Create a new directory and step into it. In my example, I use a directory named “cloudhsm_container.” You’ll use the new directory to configure the Docker image.
    
            # mkdir cloudhsm_container
            # cd cloudhsm_container           
            

  4. Copy the CloudHSM cluster’s CA certificate (customerCA.crt) to the directory you just created. You can find the CA certificate on any working CloudHSM client instance under the path /opt/cloudhsm/etc/customerCA.crt. This certificate is created during initialization of the CloudHSM Cluster and is needed to connect to the CloudHSM cluster.
  5. In your new directory, create a new file with the name run_sample.sh that includes the contents below. The script starts the CloudHSM client daemon, waits until the daemon process is running and ready, and then runs the Java class that is used to generate an AES key to encrypt and decrypt your data.
    
            #! /bin/bash
    
            # start cloudhsm client
            echo -n "* Starting CloudHSM client ... "
            /opt/cloudhsm/bin/cloudhsm_client /opt/cloudhsm/etc/cloudhsm_client.cfg &> /tmp/cloudhsm_client_start.log &
            
            # wait for startup
            while true
            do
                if grep 'libevmulti_init: Ready !' /tmp/cloudhsm_client_start.log &> /dev/null
                then
                    echo "[OK]"
                    break
                fi
                sleep 0.5
            done
            echo -e "\n* CloudHSM client started successfully ... \n"
            
            # start application
            echo -e "\n* Running application ... \n"
            
            java -ea -Djava.library.path=/opt/cloudhsm/lib/ -jar target/assembly/aesgcm-runner.jar --method environment
            
            echo -e "\n* Application completed successfully ... \n"                      
            

  6. In the new directory, create another new file and name it Dockerfile (with no extension). This file will specify that the Docker image is built with the following components:
    • The AWS CloudHSM client package.
    • The AWS CloudHSM Java JCE package.
    • OpenJDK 1.8. This is needed to compile and run the Java classes and JAR files.
    • Maven, a build automation tool that is needed to assist with building the Java classes and JAR files.
    • The AWS CloudHSM Java JCE samples that will be downloaded and built.
  7. Cut and paste the contents below into Dockerfile.

    Note: Make sure to replace the HSM_IP line with the IP of an HSM in your CloudHSM cluster. You can get your HSM IPs from the CloudHSM console, or by running the describe-clusters AWS CLI command.

    
            # Use the amazon linux image
            FROM amazonlinux:2
            
            # Install CloudHSM client
            RUN yum install -y https://s3.amazonaws.com/cloudhsmv2-software/CloudHsmClient/EL7/cloudhsm-client-latest.el7.x86_64.rpm
            
            # Install CloudHSM Java library
            RUN yum install -y https://s3.amazonaws.com/cloudhsmv2-software/CloudHsmClient/EL7/cloudhsm-client-jce-latest.el7.x86_64.rpm
            
            # Install Java, Maven, wget, unzip and ncurses-compat-libs
            RUN yum install -y java maven wget unzip ncurses-compat-libs
            
            # Create a work dir
            WORKDIR /app
            
            # Download sample code
            RUN wget https://github.com/aws-samples/aws-cloudhsm-jce-examples/archive/master.zip
            
            # unzip sample code
            RUN unzip master.zip
            
            # Change to the create directory
            WORKDIR aws-cloudhsm-jce-examples-master
            
            # Build JAR files
            RUN mvn validate && mvn clean package
            
            # Set HSM IP as an environmental variable
            ENV HSM_IP <insert the IP address of an active CloudHSM instance here>
            
            # Configure cloudhsm-client
            COPY customerCA.crt /opt/cloudhsm/etc/
            RUN /opt/cloudhsm/bin/configure -a $HSM_IP
            
            # Copy the run_sample.sh script
            COPY run_sample.sh .
            
            # Run the script
            CMD ["bash","run_sample.sh"]                        
            

  8. Now you’re ready to build the Docker image. Use the following command, with the name jce_sample_client. This command will let you use the Dockerfile you created in step 6 to create the image.
    
            # sudo docker build -t jce_sample_client .
            

  9. To run a Docker container from the Docker image you just created, use the following command. Make sure to replace the user and password with your actual CU username and password. (If you need help setting up your CU credentials, see prerequisite 3. For more information on how to provide CU credentials to the AWS CloudHSM Java JCE Library, refer to the steps in the CloudHSM user guide.)
    
            # sudo docker run --env HSM_PARTITION=PARTITION_1 \
            --env HSM_USER=<user> \
            --env HSM_PASSWORD=<password> \
            jce_sample_client
            

    If successful, the output should look like this:

    
            * Starting cloudhsm-client ... [OK]
            
            * cloudhsm-client started successfully ...
            
            * Running application ...
            
            ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors 
            to the console.
            70132FAC146BFA41697E164500000000
            Successful decryption
                SDK Version: 2.03
            
            * Application completed successfully ...          
            

Conclusion

My solution provides an example of how to run CloudHSM workloads on Docker containers. You can use it as a reference to implement your cryptographic application in a way that benefits from the high availability and load balancing built in to AWS CloudHSM without compromising on the flexibility that Docker provides for developing, deploying, and running applications. If you have comments about this post, submit them in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Mohamed AboElKheir

Mohamed AboElKheir joined AWS in September 2017 as a Security CSE (Cloud Support Engineer) based in Cape Town. He is a subject matter expert for CloudHSM and is always enthusiastic about assisting CloudHSM customers with advanced issues and use cases. Mohamed is passionate about InfoSec, specifically cryptography, penetration testing (he’s OSCP certified), application security, and cloud security (he’s AWS Security Specialty certified).

Bullet Updates – Windowing, Apache Pulsar PubSub, Configuration-based Data Ingestion, and More

Post Syndicated from rosaliebeevm original https://yahooeng.tumblr.com/post/183315480351


By Akshay Sarma, Principal Engineer, Verizon Media & Brian Xiao, Software Engineer, Verizon Media

This is the first of an ongoing series of blog posts sharing releases and announcements for Bullet, an open-sourced lightweight, scalable, pluggable, multi-tenant query system.

Bullet allows you to query, through its UI or API, any data flowing through a streaming system without having to store it first. The queries are injected into the running system and have minimal overhead. Running hundreds of queries generally fits into the overhead of just reading the streaming data. Bullet requires running an instance of its backend on your data. This backend runs on common stream processing frameworks (Storm and Spark Streaming are currently supported).

The data on which Bullet sits determines what it is used for. For example, our team runs an instance of Bullet on user engagement data (~1M events/sec) to let developers find their own events to validate their code that produces this data. We also use this instance to interactively explore data, throw up quick dashboards to monitor live releases, count unique users, debug issues, and more.

Since open sourcing Bullet in 2017, we’ve been hard at work adding many new features! We’ll highlight some of these here and continue sharing update posts for future releases.

Windowing

Bullet used to operate in a request-response fashion -- you would submit a query and wait for the query to meet its termination conditions (usually duration) before receiving results. For short-lived queries, say, a few seconds, this was fine. But as we started fielding more interactive and iterative queries, waiting even a minute for results became too cumbersome.

Enter windowing! Bullet now supports time- and record-based windowing. With time windowing, you can break up your query into chunks of time over its duration and retrieve results for each chunk. For example, you can calculate the average of a field and stream back results every second.

In the above example, the aggregation is operating on all the data since the beginning of the query, but you can also do aggregations on just the windows themselves. This is often called a Tumbling window.


With record windowing, you can get the intermediate aggregation for each record that matches your query (a Sliding window). Or you can do a Tumbling window on records rather than time. For example, you could get results back every three records.


Overlapping windows in other ways (Hopping windows) or windows that reset based on different criteria (Session windows, Cascading windows) are currently being worked on. Stay tuned!


Apache Pulsar support as a native PubSub

Bullet uses a PubSub (publish-subscribe) message queue to send queries and results between the Web Service and Backend. As with everything else in Bullet, the PubSub is pluggable. You can use your favorite pubsub by implementing a few interfaces if you don’t want to use the ones we provide. Until now, we’ve maintained and supported a REST-based PubSub and an Apache Kafka PubSub. Now we are excited to announce supporting Apache Pulsar as well! Bullet Pulsar will be useful to those users who want to use Pulsar as their underlying messaging service.

If you aren’t familiar with Pulsar, setting up a local standalone is very simple, and by default, any Pulsar topics written to will automatically be created. Setting up an instance of Bullet with Pulsar instead of REST or Kafka is just as easy. You can refer to our documentation for more details.


Plug your data into Bullet without code

While Bullet worked on any data source located in any persistence layer, you still had to implement an interface to connect your data source to the Backend and convert it into a record container format that Bullet understands. For instance, your data might be located in Kafka and be in the Avro format. If you were using Bullet on Storm, you would perhaps write a Storm Spout to read from Kafka, deserialize, and convert the Avro data into the Bullet record format. This was the only interface in Bullet that required our customers to write their own code. Not anymore! Bullet DSL is a text/configuration-based format for users to plug in their data to the Bullet Backend without having to write a single line of code.

Bullet DSL abstracts away the two major components for plugging data into the Bullet Backend. A Connector piece to read from arbitrary data-sources and a Converter piece to convert that read data into the Bullet record container. We currently support and maintain a few of these -- Kafka and Pulsar for Connectors and Avro, Maps and arbitrary Java POJOs for Converters. The Converters understand typed data and can even do a bit of minor ETL (Extract, Transform and Load) if you need to change your data around before feeding it into Bullet. As always, the DSL components are pluggable and you can write your own (and contribute it back!) if you need one that we don’t support.

We appreciate your feedback and contributions! Explore Bullet on GitHub, use and help contribute to the project, and chat with us on Google Groups. To get started, try our Quickstarts on Spark or Storm to set up an instance of Bullet on some fake data and play around with it.

New – AWS Toolkits for PyCharm, IntelliJ (Preview), and Visual Studio Code (Preview)

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-aws-toolkits-for-pycharm-intellij-preview-and-visual-studio-code-preview/

Software developers have their own preferred tools. Some use powerful editors, others Integrated Development Environments (IDEs) that are tailored for specific languages and platforms. In 2014 I created my first AWS Lambda function using the editor in the Lambda console. Now, you can choose from a rich set of tools to build and deploy serverless applications. For example, the editor in the Lambda console was greatly enhanced last year when AWS Cloud9 was released. For .NET applications, you can use the AWS Toolkit for Visual Studio and AWS Tools for Visual Studio Team Services.

AWS Toolkits for PyCharm, IntelliJ, and Visual Studio Code

Today, we are announcing the general availability of the AWS Toolkit for PyCharm. We are also announcing the developer preview of the AWS Toolkits for IntelliJ and Visual Studio Code, which are under active development in GitHub. These open source toolkits will enable you to easily develop serverless applications, including a full create, step-through debug, and deploy experience in the IDE and language of your choice, be it Python, Java, Node.js, or .NET.

For example, using the AWS Toolkit for PyCharm you can create SAM-based serverless applications, run and debug your Lambda functions locally, and deploy them to your AWS account, as the walkthrough below shows.

These toolkits are distributed under the open source Apache License, Version 2.0.

Installation

Some features use the AWS Serverless Application Model (SAM) CLI. You can find installation instructions for your system here.

The AWS Toolkit for PyCharm is available via the IDEA Plugin Repository. To install it, in the Settings/Preferences dialog, click Plugins, search for “AWS Toolkit”, use the checkbox to enable it, and click the Install button. You will need to restart your IDE for the changes to take effect.

The AWS Toolkits for IntelliJ and Visual Studio Code are currently in developer preview and under active development. You are welcome to build and install these from the GitHub repositories:

Building a Serverless application with PyCharm

After installing AWS SAM CLI and AWS Toolkit, I create a new project in PyCharm and choose SAM on the left to create a serverless application using the AWS Serverless Application Model. I call my project hello-world in the Location field. Expanding More Settings, I choose which SAM template to use as the starting point for my project. For this walkthrough, I select the “AWS SAM Hello World”.

In PyCharm you can use credentials and profiles from your AWS Command Line Interface (CLI) configuration. You can change AWS region quickly if you have multiple environments.
The AWS Explorer shows Lambda functions and AWS CloudFormation stacks in the selected AWS region. Starting from a CloudFormation stack, you can see which Lambda functions are part of it.

The function handler is in the app.py file. After I open the file, I click on the Lambda icon on the left of the function declaration to have the option to run the function locally or start a local step-by-step debugging session.

First, I run the function locally. I can configure the payload of the event that is provided in input for the local invocation, starting from the event templates provided for most services, such as the Amazon API Gateway, Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), and so on. You can use a file for the payload, or select the share checkbox to make it available to other team members. The function is executed locally, but here you can choose the credentials and the region to be used if the function is calling other AWS services, such as Amazon Simple Storage Service (S3) or Amazon DynamoDB.

A local container is used to emulate the Lambda execution environment. This function is implementing a basic web API, and I can check that the result is in the format expected by the API Gateway.

After that, I want to get more information on what my code is doing. I set a breakpoint and start a local debugging session. I use the same input event as before. Again, you can choose the credentials and region for the AWS services used by the function.

I step over the HTTP request in the code to inspect the response in the Variables tab. Here you have access to all local variables, including the event and the context provided in input to the function.

After that, I resume the program to reach the end of the debugging session.

Now I am confident enough to deploy the serverless application right-clicking on the project (or the SAM template file). I can create a new CloudFormation stack, or update an existing one. For now, I create a new stack called hello-world-prod. For example, you can have a stack for production, and one for testing. I select an S3 bucket in the region to store the package used for the deployment. If your template has parameters, here you can set up the values used by this deployment.

After a few minutes, the stack creation is complete and I can run the function in the cloud with a right-click in the AWS Explorer. Here there is also the option to jump to the source code of the function.

As expected, the result of the remote invocation is the same as the local execution. My serverless application is in production!

Using these toolkits, developers can test locally to find problems before deployment, change the code of their application or the resources they need in the SAM template, and update an existing stack, quickly iterating until they reach their goal. For example, they can add an S3 bucket to store images or documents, or a DynamoDB table to store their users, or change the permissions used by their functions.

I am really excited by how much faster and easier it is to build your ideas on AWS. Now you can use your preferred environment to accelerate even further. I look forward to seeing what you will do with these new tools!

New – Amazon Kinesis Data Analytics for Java

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-amazon-kinesis-data-analytics-for-java/

Customers are using Amazon Kinesis to collect, process, and analyze real-time streaming data. In this way, they can react quickly to new information from their business, their infrastructure, or their customers. For example, Epic Games ingests more than 1.5 million game events per second for its popular online game, Fortnite.

With Amazon Kinesis Data Analytics you can process data in real-time using standard SQL. While SQL provides an easy way to quickly query large volumes of streaming data without learning new frameworks or languages, many customers also want to build more sophisticated data processing applications using general-purpose programming languages.

Using Java with Amazon Kinesis Data Analytics

Today, we are introducing support for Java in Amazon Kinesis Data Analytics. Now, developers can use their own Java code to create powerful real-time applications that process streaming data: continuously transforming and loading data into their data lakes, generating metrics to feed real-time gaming leaderboards, applying machine learning models to data streams from connected devices, and more.

To use this new functionality, developers build applications using open source libraries that include built-in operators for common data processing functions, allowing applications to organize, transform, aggregate, and analyze data at any scale. Both libraries are open source, and you can run them anywhere:

  • Apache Flink, an open source framework and engine for processing data streams.
  • AWS SDK for Java, providing Java APIs for many AWS services.

Developers can use these Java libraries within their Integrated Development Environment (IDE) of choice. Using these libraries, the following AWS services can be integrated with as little as one line of code:

  • Streaming Data Sources: Amazon Kinesis Data Streams
  • Streaming Destinations: Amazon S3, Amazon DynamoDB, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose

In addition to the pre-built AWS integrations, the Java libraries include more connectors to tools like Cassandra, ElasticSearch, RabbitMQ, Redis, and more, and the ability to build custom integrations.

Building a Kinesis Data Streams Java Application

I prepared a simple Java application that implements the “mandatory” word count example for data processing. I send some paragraphs of text as input and, every five seconds, I get as output the number of times each word is used.

First, I create two Kinesis Data Streams:

  • TextInputStream, where I am going to send my input records
  • WordCountOutputStream, where I am going to read the output of the Java application
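
The post doesn’t show how the streams are created; a possible sketch using the AWS SDK for Java (one of the libraries listed above) is below. The single-shard count and the hard-coded region are assumptions made for this small example, not recommendations.

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;

public class CreateStreams {
    public static void main(String[] args) {
        // The region should match the one used by the Flink application.
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.standard()
            .withRegion("us-east-1")
            .build();

        // One shard each is enough for this word-count example.
        kinesis.createStream("TextInputStream", 1);
        kinesis.createStream("WordCountOutputStream", 1);
    }
}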

Here is the code of the word-count Java application. To read from and write to Kinesis Data Streams, I am using the Kinesis connector from the Apache Flink project.

// Imports for the Flink DataStream API and the Kinesis connector.
import java.util.Properties;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;
import org.apache.flink.util.Collector;

public class StreamingJob {

    private static final String region = "us-east-1";
    private static final String inputStreamName = "TextInputStream";
    private static final String outputStreamName = "WordCountOutputStream";

    private static DataStream<String> createSourceFromStaticConfig(
            StreamExecutionEnvironment env) {
        Properties inputProperties = new Properties();
        inputProperties.setProperty(ConsumerConfigConstants.AWS_REGION, region);
        inputProperties.setProperty(ConsumerConfigConstants.STREAM_INITIAL_POSITION,
            "LATEST");

        return env.addSource(new FlinkKinesisConsumer<>(inputStreamName,
            new SimpleStringSchema(), inputProperties));
    }

    private static FlinkKinesisProducer<String> createSinkFromStaticConfig() {
        Properties outputProperties = new Properties();
        outputProperties.setProperty(ConsumerConfigConstants.AWS_REGION, region);

        FlinkKinesisProducer<String> sink = new FlinkKinesisProducer<>(new
            SimpleStringSchema(), outputProperties);
        sink.setDefaultStream(outputStreamName);
        sink.setDefaultPartition("0");
        return sink;
    }

    public static void main(String[] args) throws Exception {

        final StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> input = createSourceFromStaticConfig(env);

        input.flatMap(new Tokenizer())
             .keyBy(0)
             .timeWindow(Time.seconds(5))
             .sum(1)
             .map(new MapFunction<Tuple2<String, Integer>, String>() {
                 @Override
                 public String map(Tuple2<String, Integer> value) throws Exception {
                     return value.f0 + "," + value.f1.toString();
                }
             })
             .addSink(createSinkFromStaticConfig());

        env.execute("Word Count");
    }

    public static final class Tokenizer
        implements FlatMapFunction<String, Tuple2<String, Integer>> {

        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            String[] tokens = value.toLowerCase().split("\\W+");
            for (String token : tokens) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<>(token, 1));
                }
            }
        }
    }

}

The most important part of the application is the manipulation of the input object, where I apply a few DataStream Transformations:

  1. I start with a DataStream containing the Strings from the input stream.
  2. I use a Tokenizer in a FlatMap to split each sentence into “words”, each word followed by the number “1”.
  3. I apply the KeyBy operator to logically partition the stream with respect to the “word”.
  4. I use a 5-second tumbling window.
  5. I aggregate within the window, summing up for each word the number “1” to count them.
  6. I use a simple Map for each record to join the word and the number into a comma-separated values (CSV) String that I send to the output stream.

One of the most powerful operators shown here is the KeyBy operator. It enables you to re-organize a particular stream by a specified key in real-time. This type of re-keying enables further downstream operations like aggregations, counts, and much more. This enables you to set up streaming map-reduce on different keys within the same application.

I build the Java application using Maven and upload the output JAR to an Amazon Simple Storage Service (S3) bucket in the region where I want to deploy the application. In the Kinesis Data Analytics console, I create a new application and select “Flink” as the runtime:

I then configure the application to use the code on my S3 bucket. The console updates the IAM role for the application to have permissions to read the code.

You can optionally add key/value properties to the configuration of the application. You can read those properties from within the application, to provide customization at deployment time.
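
The post doesn’t include an example of reading those properties; a minimal sketch, assuming the aws-kinesisanalytics-runtime library and a hypothetical property group called “WordCountConfig” with a key “output.stream.name”, could look like this:

import java.io.IOException;
import java.util.Map;
import java.util.Properties;

import com.amazonaws.services.kinesisanalytics.runtime.KinesisAnalyticsRuntime;

public class RuntimeConfig {
    // Reads the output stream name from a property group configured in the
    // console; the group and key names here are hypothetical.
    public static String outputStreamName() throws IOException {
        Map<String, Properties> groups = KinesisAnalyticsRuntime.getApplicationProperties();
        Properties props = groups.get("WordCountConfig");
        if (props == null) {
            return "WordCountOutputStream";
        }
        return props.getProperty("output.stream.name", "WordCountOutputStream");
    }
}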

For monitoring, I leave the default metrics. I enable logging to Amazon CloudWatch, for errors only.

Don’t forget to add permissions to the IAM role created by the console to allow the Kinesis Data Analytics application to read from and write to the streams used for input and output, TextInputStream and WordCountOutputStream in my case.

I can now start the application with the “Run” button, and when it is running, I use a script that I prepared to put some text (I am using a description of the Amazon Kinesis platform) in the input stream:

$ python put_records.py TextInputStream
Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data...

The behavior of my application is summarized in the console in the Application Graph, a visual representation of the data flow consisting of operators and intermediate results (complex applications, using multiple streams, have a much more interesting graph):

To read the output stream, I am using a Lambda function written in Python. I am using the one provided with the Kinesis Record Aggregation & Deaggregation Modules for AWS Lambda, which provides automatic “de-aggregation” of records aggregated by the Amazon Kinesis Producer Library (KPL).

As expected, in the CloudWatch Logs console I get the list of the words and the number of times they were used, updated every 5 seconds by the Lambda function:

Pricing and Availability

With Amazon Kinesis Data Analytics for Java, you pay only for what you use. Pricing is similar to Amazon Kinesis Data Analytics for SQL, but there are a few differences.

For Java applications, you are charged a single additional Amazon Kinesis Processing Unit (KPU) per application, used for application orchestration. Java applications are also charged for running application storage and durable application backups. Running application storage is used for Amazon Kinesis Data Analytics’ stateful processing capabilities and is charged per GB-month. Durable application backups are optional and provide a point-in-time recovery point for applications, charged per GB-month.

For example, pricing is $0.11 per KPU hour in US East (N. Virginia), and you are charged for running application storage ($0.10 per GB-month) and durable application backups ($0.023 per GB-month).
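
To put rough numbers on the listed rates: an application that needs a single KPU for processing is billed for two KPUs once the orchestration KPU is added, which comes to roughly 2 × 730 hours × $0.11 ≈ $160 per month in US East (N. Virginia), plus $0.10 for each GB-month of running application storage and $0.023 for each GB-month of durable backups. This is only a rough illustration; actual KPU usage depends on the application.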

Available Now

Amazon Kinesis Data Analytics for Java is available now in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland).

I only scratched the surface of the capabilities for stream processing enabled by the support of Java in Amazon Kinesis Data Analytics. I think this is a powerful tool that can enable new use cases. Let me know what you are going to build with it!

Re-affirming Long-Term Support for Java in Amazon Linux

Post Syndicated from Deepak Singh original https://aws.amazon.com/blogs/compute/re-affirming-long-term-support-for-java-in-amazon-linux/

In light of Oracle’s recent announcement indicating an end to free long-term support for OpenJDK after January 2019, we re-affirm that the OpenJDK 8 and OpenJDK 11 Java runtimes in Amazon Linux 2 will continue to receive free long-term support from Amazon until at least June 30, 2023. We are collaborating and contributing in the OpenJDK community to provide our customers with a free long-term supported Java runtime.

In addition, Amazon Linux AMI 2018.03, the last major release of Amazon Linux AMI, will receive support for the OpenJDK 8 runtime at least until June 30, 2020, to facilitate migration to Amazon Linux 2. Java runtimes provided by AWS services such as AWS Lambda, Amazon EMR, and AWS Elastic Beanstalk will also use the AWS-supported OpenJDK builds.

Amazon Linux users will not need to make any changes to get support for OpenJDK 8. OpenJDK 11 will be made available through the Amazon Linux 2 repositories at a future date. The Amazon Linux OpenJDK support posture will also apply to the on-premises virtual machine images and Docker base image of Amazon Linux 2.

Amazon Linux 2 provides a secure, stable, and high-performance execution environment. Amazon Linux AMI and Amazon Linux 2 include a Java runtime based on OpenJDK 8 and are available in all public AWS regions at no additional cost beyond the pricing for Amazon EC2 instance usage.

Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/756489/rss

Security updates have been issued by CentOS (procps, xmlrpc, and xmlrpc3), Debian (batik, prosody, redmine, wireshark, and zookeeper), Fedora (jasper, kernel, poppler, and xmlrpc), Mageia (git and wireshark), Red Hat (rh-java-common-xmlrpc), Slackware (git), SUSE (bzr, dpdk-thunderxdpdk, and ocaml), and Ubuntu (exempi).

Security updates for Thursday

Post Syndicated from ris original https://lwn.net/Articles/756164/rss

Security updates have been issued by CentOS (389-ds-base, corosync, firefox, java-1.7.0-openjdk, java-1.8.0-openjdk, kernel, librelp, libvirt, libvncserver, libvorbis, PackageKit, patch, pcs, and qemu-kvm), Fedora (asterisk, ca-certificates, gifsicle, ncurses, nodejs-base64-url, nodejs-mixin-deep, and wireshark), Mageia (thunderbird), Red Hat (procps), SUSE (curl, kvm, and libvirt), and Ubuntu (apport, haproxy, and tomcat7, tomcat8).

Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/755796/rss

Security updates have been issued by Debian (batik, cups, gitlab, ming, and xdg-utils), Fedora (dpdk, firefox, glibc, nodejs-deep-extend, strongswan, thunderbird, thunderbird-enigmail, wavpack, xdg-utils, and xen), Gentoo (ntp, rkhunter, and zsh), openSUSE (Chromium, GraphicsMagick, jasper, opencv, pdns, and wireshark), SUSE (jasper, java-1_7_1-ibm, krb5, libmodplug, and openstack-nova), and Ubuntu (thunderbird).

Security updates for Friday

Post Syndicated from ris original https://lwn.net/Articles/755667/rss

Security updates have been issued by Arch Linux (bind, libofx, and thunderbird), Debian (thunderbird, xdg-utils, and xen), Fedora (procps-ng), Mageia (gnupg2, mbedtls, pdns, and pdns-recursor), openSUSE (bash, GraphicsMagick, icu, and kernel), Oracle (thunderbird), Red Hat (java-1.7.1-ibm, java-1.8.0-ibm, and thunderbird), Scientific Linux (thunderbird), and Ubuntu (curl).

C is too low level

Post Syndicated from Robert Graham original https://blog.erratasec.com/2018/05/c-is-too-low-level.html

I’m in danger of contradicting myself, after previously pointing out that x86 machine code is a high-level language, but this article claiming that C is not a low-level language is bunk. C certainly has some problems, but it’s still the closest language to assembly. This is obvious from the fact that it’s still the fastest compiled language. What we see is a typical academic out of touch with the real world.

The author makes the (wrong) observation that we’ve been stuck emulating the PDP-11 for the past 40 years. C was written for the PDP-11, and since then CPUs have been designed to make C run faster. The author imagines a different world, one in which CPU designers instead target a language like LISP or Erlang as their preferred language. This misunderstands the state of the market. CPUs do indeed support lots of different abstractions, and C has evolved to accommodate this.

The author criticizes things like “out-of-order” execution, which has led to the Spectre side-channel vulnerabilities. Out-of-order execution is necessary to make C run faster. The author claims instead that those resources should be spent on having more, slower cores with more threads. This sacrifices single-threaded performance in exchange for a lot more threads executing in parallel. The author cites Sparc Tx CPUs as his ideal processor.

But here’s the thing: the Sparc Tx was a failure. To be fair, it was mostly a failure because, most of the time, people wanted to run old C code instead of new Erlang code. But it was still a failure at running Erlang.

Time after time, engineers keep finding that “out-of-order”, single-threaded performance is still the winner. A good example is ARM processors for both mobile phones and servers. All the theory points to in-order CPUs as being better, but all the products are out-of-order, because this theory is wrong. The custom ARM cores from Apple and Qualcomm used in most high-end phones are so deeply out-of-order they give Intel CPUs competition. The same is true on the server front with the latest Qualcomm Centriq and Cavium ThunderX2 processors, deeply out of order supporting more than 100 instructions in flight.

The Cavium is especially telling. Its ThunderX CPU had 48 simple cores; it was replaced by the ThunderX2, which has 32 complex, deeply out-of-order cores. The performance increase was massive, even on multithread-friendly workloads. Every competitor to Intel’s dominance in the server space has learned the lesson from Sparc Tx: many wimpy cores are a failure; you need fewer, beefier cores. Yes, they don’t need to be as beefy as Intel’s processors, but they need to be close.

Even Intel’s “Xeon Phi” custom chip learned this lesson. This is their GPU-like chip, running 60 cores with 512-bit wide “vector” (sic) instructions, designed for supercomputer applications. Its first version was purely in-order; its current version is slightly out-of-order. It supports four threads and focuses on basic number crunching, so in-order cores seem to be the right approach, but Intel found that even in this case out-of-order processing still provided a benefit. Practice is different than theory.

As an academic, the author of the above article focuses on abstractions. The criticism of C is that it has the wrong abstractions which are hard to optimize, and that if we instead expressed things in the right abstractions, it would be easier to optimize.

This is an intellectually compelling argument, but so far bunk.

The reason is that while the theoretical base language has issues, everyone programs using extensions to the language, like “intrinsics” (C ‘functions’ that map to assembly instructions). Programmers write libraries using these intrinsics, which the rest of the programmers then use. In other words, even if your criticism is that C itself is not low level enough, it still provides the best access to low-level capabilities.

Given that C can access new functionality in CPUs, CPU designers add new paradigms, from SIMD to transaction processing. In other words, while in the 1980s CPUs were designed to optimize C (stacks, scaled pointers), these days CPUs are designed to optimize tasks regardless of language.

The author of that article criticizes the memory/cache hierarchy, claiming it has problems. Yes, it has problems, but only compared to how well it normally works. The author praises the many-simple-cores/threads idea for hiding memory latency with little caching, but misses the point that caches also dramatically increase memory bandwidth. Intel processors are optimized to read a whopping 256 bits from L1 cache every clock cycle. Main memory bandwidth is orders of magnitude lower.

The author goes on to criticize cache coherency as a problem. C uses it, but other languages like Erlang don’t need it. But that’s largely due to the problems each language solves. Erlang solves the problem where a large number of threads work on largely independent tasks, needing to send only small messages to each other across threads. The problem C solves is when you need many threads working on a huge, common set of data.

For example, consider the “intrusion prevention system”. Any thread can process any incoming packet that corresponds to any region of memory. There’s no practical way of solving this problem without a huge coherent cache. It doesn’t matter which language or abstractions you use; it’s a fundamental constraint of the problem being solved. RDMA is an important concept that has moved from supercomputer applications to the data center, such as with memcached. Again, we have the problem of huge quantities of data (terabytes’ worth) shared among threads rather than small quantities (kilobytes).

The fundamental issue the author of the paper is ignoring is decreasing marginal returns. Moore’s Law has gifted us more transistors than we can usefully use. We can’t apply those additional transistors to just one thing, because the useful returns we get diminish.

For example, Intel CPUs have two hardware threads per core. That’s because there are good returns by adding a single additional thread. However, the usefulness of adding a third or fourth thread decreases. That’s why many CPUs have only two threads, or sometimes four threads, but no CPU has 16 threads per core.

You can apply the same discussion to any aspect of the CPU, from register count, to SIMD width, to cache size, to out-of-order depth, and so on. Rather than focusing on one of these things and increasing it to the extreme, CPU designers make each a bit larger every process tick that adds more transistors to the chip.

The same applies to cores. It’s why the “more, simpler cores” strategy fails: more cores have their own decreasing marginal returns. Instead of adding cores tied to limited memory bandwidth, it’s better to add more cache. That cache already increases the size of the cores, so at some point it’s more effective to add a few out-of-order features to each core rather than more cores. And so on.

The question isn’t whether we can change this paradigm and radically redesign CPUs to match some academic’s view of the perfect abstraction. Instead, the goal is to find new uses for those additional transistors. For example, “message passing” is a useful abstraction in languages like Go and Erlang that’s often more useful than sharing memory. It’s implemented with shared memory and atomic instructions, but I can’t help thinking it could be done better with direct hardware support.

Of course, as soon as they do that, it’ll become an intrinsic in C, then added to languages like Go and Erlang.

Summary

Academics live in an ideal world of abstractions; the rest of us live in practical reality. The reality is that the vast majority of programmers work with the C family of languages (JavaScript, Go, etc.), whereas academics love the epiphanies they learned using other languages, especially functional languages. CPUs are only superficially designed to run C and maintain “PDP-11 compatibility”. Instead, they keep adding features to support other abstractions, abstractions available to C. They are driven by decreasing marginal returns: they would love to add new abstractions to the hardware because it’s a cheap way to make use of additional transistors. Academics are wrong in believing that the entire system needs to be redesigned from scratch. Instead, they just need to come up with new abstractions that CPU designers can add.