AWS Lambda launched support for packaging and deploying functions as container images at re:Invent 2020. In this post you learn how to build container images that reduce image size as well as build, deployment, and update time. Lambda container images have unique characteristics to consider for optimization. This means that the techniques you use to optimize container images for Lambda functions are slightly different from those you use for other environments.
To understand how to optimize container images, it helps to understand how container images are packaged, as well as how the Lambda service retrieves, caches, deploys, and retires container images.
Pre-requisites and assumptions
This post assumes you have access to an IAM user or role in an AWS account and a version of the tar utility on your machine. You must also install Docker and the AWS SAM CLI and start Docker.
Lambda container image packaging
Lambda container images are packaged according to the Open Container Initiative (OCI) Image Format specification. The specification defines how programs build and package individual layers into a single container image. To explore an example of the OCI Image Format, open a terminal and perform the following steps:
Create an AWS SAM application.
sam init --name container-images
Choose 1 to select an AWS quick start template, then choose 2 to select container image as the packaging format, and finally choose 9 to use the amazon-go1.x-base image.
After the AWS SAM CLI generates the application, enter the following commands to change into the new directory and build the Lambda container image:
cd container-images
sam build
AWS SAM builds your function and packages it as helloworldfunction:go1.x-v1. Export this container image to a tar archive and extract the filesystem into a new directory to explore the image format.
docker save helloworldfunction:go1.x-v1 > oci-image.tar
mkdir -p image
tar xf oci-image.tar -C image
The image directory contains several subdirectories, a container metadata JSON file, a manifest JSON file, and a repositories JSON file. Each subdirectory represents a single layer, and contains a version file, its own metadata JSON file, and a tar archive of the files that make up the layer.
The manifest.json file contains a single JSON object with the name of the container metadata file, a list of repository tags, and a list of included layers. The list of included layers is ordered according to the build order in your Dockerfile. The metadata JSON file in each subfolder also contains a mapping from each layer to its parent layer or final container.
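For example, you can pretty-print the manifest to see the layer ordering (jq works equally well). The digests shown below are illustrative placeholders; your output will differ.
# Pretty-print the image manifest
python3 -m json.tool image/manifest.json
# Abridged, illustrative output:
# [{ "Config": "<config>.json",
#    "RepoTags": ["helloworldfunction:go1.x-v1"],
#    "Layers": ["5fc256be.../layer.tar", "c73e7f67.../layer.tar", "..."] }]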
Your function should have layers similar to the following. A separate layer is created any time files are added to the container image. This includes FROM, RUN, ADD, and COPY statements in your Dockerfile and base image Dockerfiles. Note that the specific layer IDs, layer sizes, number, and composition of layers may change over time.
ID | Size | Description | Your function’s Dockerfile step
5fc256be… | 641 MB | Amazon Linux |
c73e7f67… | 320 KB | Third-party licenses |
de5f5100… | 12 KB | Lambda entrypoint script |
2bd3c722… | 7.8 MB | AWS Lambda RIE |
5d9d381b… | 10.0 MB | AWS Lambda runtime |
cb832ffc… | 12 KB | Bootstrap link |
1fcc74e8… | 560 KB | Lambda runtime library | FROM public.ecr.aws/lambda/go:1
acb8da111… | 9.6 MB | Function code | COPY --from=build-image /go/bin/ /var/task/
Runtimes generate a filesystem image by destructively overlaying each image layer over its parent. This means that any changes to one layer require all child layers to be recreated. In the following example, if you change the layer cb832ffc... then the layers 1fcc74e8… and acb8da111… are also considered “dirty” and must be recreated from the new parent image. This results in a new container image with eight layers, the first five the same as the original image, and the last three newly built, each with new IDs and parents.
The layered structure of container images informs several decisions you make when optimizing your container images.
Strategies for optimizing container images
There are four main strategies for optimizing your container images. First, wherever possible, use the AWS-provided base images as a starting point for your container images. Second, use multi-stage builds to avoid adding unnecessary layers and files to your final image. Third, order the operations in your Dockerfile from most stable to most frequently changing. Fourth, if your application uses one or more large layers across all of your functions, store all of your functions in a single repository.
Use AWS-provided base images
If you have experience packaging traditional applications for container runtimes, using AWS-provided base images may seem counterintuitive. The AWS-provided base images are typically larger than other minimal container base images. For example, the AWS-provided base image for the Go runtime public.ecr.aws/lambda/go:1 is 670 MB, while alpine:latest, a popular starting point for building minimal container images, is only 5.58 MB. However, using the AWS-provided base images offers three advantages.
First, the AWS-provided base images are cached proactively by the Lambda service. This means that the base image is either nearby in another upstream cache or already in the worker instance cache. Although the base image is much larger, the deployment time may still be shorter when compared to third-party base images, which may not be cached. For additional details on how the Lambda service caches container images, see the re:Invent 2021 talk Deep dive into AWS Lambda security: Function isolation.
Second, the AWS-provided base images are stable. As the base image is at the bottom layer of the container image, any changes require every other layer to be rebuilt and redeployed. Fewer changes to your base image mean fewer rebuilds and redeployments, which can reduce build cost.
Finally, the AWS-provided base images are built on Amazon Linux and Amazon Linux 2. Depending on your chosen runtime, they may already contain a number of utilities and libraries that your functions need. This means that you do not need to add them later, which avoids additional layers, extra build steps, and the increased costs that come with them.
Use multi-stage builds
Multi-stage builds allow you to build your code in larger preliminary images, copy only the artifacts you need into your final container image, and discard the preliminary build steps. This means you can run any arbitrarily large number of commands and add or copy files into the intermediate image, but still only create one additional layer in your container image for the artifact. This reduces both the final size and the attack surface of your container image by excluding build-time dependencies from your runtime image.
The AWS SAM CLI generates Dockerfiles that use multi-stage builds:
FROM golang:1.14 as build-image
WORKDIR /go/src
COPY go.mod main.go ./
RUN go build -o ../bin
FROM public.ecr.aws/lambda/go:1
COPY --from=build-image /go/bin/ /var/task/
# Command can be overwritten by providing a different command in the template directly.
CMD ["hello-world"]
This Dockerfile defines a two-stage build. First, it pulls the golang:1.14 container image and names it build-image. Naming intermediate stages is optional, but it makes it easier to refer to previous stages when packaging your final container image. Note that the golang:1.14 image is 810 MB, is not likely to be cached by the Lambda service, and contains a number of build tools that you should not include in your production images. The build-image stage then builds your function and saves it in /go/bin.
The second and final stage begins from the public.ecr.aws/lambda/go:1 base image. This image is 670 MB, but because it is an AWS-provided image, it is more likely to be cached on worker instances. The COPY command copies the contents of /go/bin from the build-image stage into /var/task in the container image, and discards the intermediate stage.
Build from stable to frequently changing
Any time a layer in an image changes, all layers that follow must be rebuilt, repackaged, redeployed, and recached by the Lambda service. In practice, this means that you should make your most frequently occurring changes as late in your Dockerfile as possible.
For example, if you have a stable Lambda function that uses a frequently updated machine learning model to make predictions, add your function to the container image before adding the machine learning model. However, if you have a function that changes frequently but relies on a stable Lambda extension, copy the extension into the image first.
If you put the frequently changing component early in your Dockerfile, all the build steps that follow must be re-run every time that component changes. If one of those actions is costly, for example, compiling a large library or running a complex simulation, these repetitions add unnecessary time and cost to your deployment pipeline.
Use a single repository for functions with large layers
When you create an application with multiple Lambda functions, you either store the container images in a single Amazon ECR repository or in multiple repositories, one for each function. If your application uses one or more large layers across all of your functions, store all of your functions in a single repository.
ECR repositories compare each layer of a container image when it is pushed to avoid uploading and storing duplicates. If each function in your application uses the same large layer, such as a custom runtime or machine learning model, that layer is stored exactly once in a shared repository. If you use a separate repository for each function, that layer is duplicated across repositories and must be uploaded separately to each one. This costs you time and network bandwidth.
Conclusion
Packaging your Lambda functions as container images enables you to use familiar tooling and take advantage of larger deployment limits. In this post you learn how to build container images that reduce image size as well as build, deployment, and update time. You learn some of the unique characteristics of Lambda container images that impact optimization. Finally, you learn how to think differently about image optimization for Lambda functions when compared to packaging traditional applications for container runtimes.
For more information on how to build serverless applications, including source code, blogs, videos, and more, visit the Serverless Land website.
As previously discussed on the Netflix Tech Blog, Titus is the Netflix container orchestration system. It runs a wide variety of workloads from various parts of the company — everything from the frontend API for netflix.com, to machine learning training workloads, to video encoders. In Titus, the hosts that workloads run on are abstracted from our users. The Titus platform maintains large pools of homogenous node capacity to run user workloads, and the Titus scheduler places workloads. This abstraction allows the compute team to influence the reliability, efficiency, and operability of the fleet via the scheduler. The hosts that run workloads are called Titus “agents.” In this post, we describe how Titus agents leverage user namespaces to improve the overall security of the Titus agent fleet.
Titus’s Multi-Tenant Clusters
The Titus agent fleet appears to users as a homogenous pool of capacity. Titus internally employs a cellular bulkhead architecture for scalability, so the fleet is composed of multiple cells. Many bulkhead architectures partition their cells on tenants, where a tenant is defined as a team and their collection of applications. We do not take this approach, and instead, we partition our cells to balance load. We do this for reliability, scalability, and efficiency reasons.
Titus is a multi-tenant system, allowing multiple teams and users to run workloads on the system, and ensuring they can all co-exist while still providing guarantees about security and performance. Much of this comes down to isolation, which comes in multiple forms. These forms include performance isolation (ensuring workloads do not degrade one another’s performance), capacity isolation (ensuring that a given tenant can acquire resources when they ask for them), fault isolation (ensuring that the failure of a part of the system doesn’t cause the whole system to fail), and security isolation (ensuring that the compromise of one tenant’s workload does not affect the security of other tenants). This post focuses on our approaches to security isolation.
Secure Multi-tenancy
One of Titus’s biggest concerns with multi-tenancy is security isolation. We want to allow different kinds of containers from different tenants to run on the same instance. Security isolation in containers has been a contentious topic. Despite the risks, we’ve chosen to leverage containers as part of our security boundary. To offset the risks brought about by the container security boundary, we employ some additional protections.
The building blocks of multi-tenancy are Linux namespaces, the very technology that makes LXC, Docker, and other kinds of containers possible. For example, the PID namespace makes it so that a process can only see PIDs in its own namespace, and therefore cannot send kill signals to random processes on the host. In addition to the default Docker namespaces (mount, network, UTS, IPC, and PID), we employ user namespaces for added layers of isolation. Unfortunately, these default namespace boundaries are not sufficient to prevent container escape, as seen in CVEs like CVE-2015-2925. These vulnerabilities arise due to the complexity of interactions between namespaces, a large number of historical decisions during kernel development, and leaky abstractions like the proc filesystem in Linux. Composing these security isolation primitives correctly is difficult, so we’ve looked to other layers for additional protection.
Running many different workloads multi-tenant on a host necessitates preventing lateral movement, a technique in which the attacker compromises a single piece of software running in a container on the system, and uses that to compromise other containers on the same system. To mitigate this, we run containers as unprivileged users — making it so that users cannot use “root.” This is important because, in Linux, root’s privileges do not come from the mere fact that the user is UID 0, but from capabilities. These capabilities are tied to the current process’s credentials. Capabilities can be added via privilege escalation (e.g., sudo, file capabilities) or removed (e.g., setuid, or switching namespaces). Various capabilities control what the root user can do. For example, the CAP_SYS_BOOT capability controls the ability of a given user to reboot the machine. There are also more common capabilities that are granted to users like CAP_NET_RAW, which allows a process the ability to open raw sockets. A user can automatically have capabilities added when they execute specific files via file capabilities. For example, on a stock Ubuntu system, the ping command needs CAP_NET_RAW:
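The exact output depends on the distribution and ping version, but a getcap check looks roughly like this.
# Show the file capabilities attached to the ping binary
getcap /bin/ping
# Typical output: /bin/ping = cap_net_raw+ep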
One of the most powerful capabilities in Linux is CAP_SYS_ADMIN, which is effectively equivalent to having superuser access. It gives the user the ability to do everything from mounting arbitrary filesystems, to accessing tracepoints that can expose vital information about the Linux kernel. Other powerful capabilities include CAP_CHOWN and CAP_DAC_OVERRIDE, which grant the capability to manipulate file permissions.
In the kernel, you’ll often see capability checks spread throughout the code. For example, the pcrlock function calls capable(CAP_SYS_ADMIN) before proceeding. Notice that such a check doesn’t test whether the user is root, but whether the task has the CAP_SYS_ADMIN capability before allowing it to execute.
Docker takes the approach of using an allow-list to define which capabilities a container receives. These can be extended or attenuated by the user. Even the default capabilities that are defined in the Docker profile can be abused in certain situations. When we looked into running workloads as unprivileged users without many of these capabilities, we found that it was a non-starter. Various pieces of software used elevated capabilities for FUSE, low-level packet monitoring, and performance tracing amongst other use cases. Programs will usually start with capabilities, perform any activities that require those capabilities, and then “drop” them when the process no longer needs them.
User Namespaces
Fortunately, Linux has a solution — User Namespaces. Let’s go back to that kernel code example. The pcrlock function called the capable function to determine whether or not the task was capable; capable(cap) is simply defined as a check of that capability against the initial user namespace, ns_capable(&init_user_ns, cap). The init_user_ns is the namespace that processes are initially spawned in, as it’s the only user namespace that exists at kernel startup time. User namespaces are a mechanism to split up the init_user_ns UID space. The interface to set up the mappings is a “uid_map” and “gid_map” exposed via /proc. The mapping looks something like this:
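Here, a mapping of “0 100000 65536” would map UID 0 inside the namespace onto host UID 100000 for a range of 65536 UIDs (the ranges shown are illustrative).
# Each line is: <first UID inside the namespace> <first UID outside> <range length>
cat /proc/self/uid_map
# 0  100000  65536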
This allows UIDs in user-namespaced containers to be mapped to host UIDs. A variety of translations occur, but from the container’s perspective, everything is from the perspective of the UID ranges (otherwise known as extents) that are mapped. This is powerful in a few ways:
It allows you to make certain UIDs off-limits to the container — if a UID is not mapped in the user namespace to a real UID, and you try to examine a file on disk with it, it will show up as overflowuid / overflowgid, a UID and GID specified in /proc/sys to indicate that it cannot be mapped into the current working space. Also, the container cannot setuid to a UID that can access files owned by that “outside uid.”
From the user namespace’s perspective, the container’s root user appears to be UID 0, and the container can use the entire range of UIDs that are mapped into that namespace.
Kernel subsystems can then proceed to call ns_capable with the specific user namespace that is tied to the resource. Many capability checks are now done to a user namespace that is relative to the resource being manipulated. This, in turn, allows processes to exercise certain privileges without having any privileges in the init user namespace. Even if the mapping is the same across many different namespaces, capability checks are still done relative to a specific user namespace.
One critical aspect of understanding how permissions work is that every namespace belongs to a specific user namespace. For example, the UTS namespace, which is responsible for controlling the hostname, holds a reference to the user namespace that owns it. The ability for a user to manipulate the hostname is based on whether or not the process has the appropriate capability in that user namespace.
Let’s Get Into It
We can examine how the interaction of namespaces and users work ourselves. To set the hostname in the UTS namespace, you need to have CAP_SYS_ADMIN in its user namespace. We can see this in action here, where an unprivileged process doesn’t have permission to set the hostname:
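A session along the following lines (output paraphrased) shows the failure.
# As an unprivileged user in the initial namespaces
id -u
# 1000
hostname foo
# fails with EPERM (Operation not permitted)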
The reason for this is that the process does not have CAP_SYS_ADMIN. According to /proc/self/status, the effective capability set of this process is empty:
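You can check this yourself; an unprivileged shell typically reports all zeroes.
grep CapEff /proc/self/status
# CapEff: 0000000000000000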
Now, let’s try to set up a user namespace, and see what happens:
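One way to do this from a shell is unshare from util-linux, mapping the current user to root inside the new namespace (output illustrative).
# Create a new user namespace and map our UID to 0 inside it
unshare --user --map-root-user bash
id
# uid=0(root) gid=0(root) groups=0(root)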
Immediately, you’ll notice the command prompt says the current user is root, and that the id command agrees. Can we set the hostname now?
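Trying it from that shell:
hostname foo
# still fails with EPERM (Operation not permitted)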
We still cannot set the hostname. This is because the process is still in the initial UTS namespace. Let’s see if we can unshare the UTS namespace, and set the hostname:
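Continuing in the same shell (output illustrative):
# Create a new UTS namespace owned by the user namespace we just created
unshare --uts bash
hostname foo
hostname
# foo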
This is now successful, and the process is in an isolated UTS namespace with the hostname “foo.” This is because the process now has all of the capabilities that a traditional root user would have, except they are relative to the new user namespace we created:
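The exact capability mask depends on the kernel version, but the effective set is now full rather than empty.
grep CapEff /proc/self/status
# CapEff: 0000003fffffffff   (valid only relative to the new user namespace)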
If we inspect this process from the outside, we can see that the process still runs as the unprivileged user, and the hostname in the original outside namespace hasn’t changed:
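From a second shell on the host (the PID and hostname shown are illustrative):
# The shell inside the namespaces still runs as the unprivileged user
ps -o user,pid,cmd -C bash
# user      4242 bash
hostname
# original-host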
From here, we can do all sorts of things, like mount filesystems, create other new namespaces, and in fact, we can create an entire container environment. Notice how no privilege escalation mechanism was used to perform any of these actions. This approach is what some people refer to as “rootless containers.”
Road to Implementation
We began work to enable user namespaces in early 2017. At the time, we had a simpler, more naive model, which was possible because we were running without user namespaces.
This approach mirrored the process layout and boundaries of contemporary container orchestration systems. We had a shared metrics daemon on the machine that reached in and polled metrics from the container. User access was done by exposing an SSH daemon, and automatically doing nsenter on the user’s behalf to drop them into the container. To expose files to the container we would use bind mounts. The same mechanism was used to expose configuration, such as secrets.
This had the benefit that much of our software could be installed in the host namespace, and only manage files in that namespace. The container runtime management system (Titus) was then responsible for configuring Docker to expose the right files to the container via bind mounts. In addition to that, we could use our standard metrics daemons on the host.
Although this model was easy to reason about and write software for, it had several shortcomings that we addressed by shifting everything to running inside of the container’s unprivileged user namespace. The first shortcoming was that all of the host daemons now needed to be aware of the UID translation, and perform the proper setuid or chown calls to transition across the container boundary. Second, each of these transitions represented a security risk. If the SSH daemon only partially transitioned into the container namespace by changing into the container’s pid namespace, it would leave its /proc accessible. This could then be used by a malicious attacker to escape.
With user namespaces, we can improve our security posture and reduce the complexity of the system by running those daemons in the container’s unprivileged user namespace, which removes the need to cross namespace boundaries. In turn, this removes the need to correctly implement a cross-namespace transition mechanism, reducing the risk of introducing container escapes.
We did this by moving aspects of the container runtime environment into the container. For example, we run an SSH daemon per container and a metrics daemon per container. These run inside of the namespaces of the container, and they have the same capabilities and lifecycle as the workloads in the container. We call this model “System Services” — one can think of it as a primordial version of pods. By the end of 2018, we had moved all of our containers to run in unprivileged user namespaces successfully.
Why is this useful?
This may seem like another level of indirection that just introduces complexity, but instead, it allows us to leverage an extremely useful concept — “unprivileged containers.” In unprivileged containers, the root user starts from a baseline in which they don’t automatically have access to the entire system. This means that DAC, MAC, and seccomp policies are now an extra layer of defense against accessing privileged aspects of the system — not the only layer. As new privileges are added, we do not have to add them to an exclusion list. This allows our users to write software where they can control low-level system details in their own containers, rather than forcing all of the complexity up into the container runtime.
Use Case: FUSE
Netflix internally uses a purpose built FUSE filesystem called MezzFS. The purpose of this filesystem is to provide access to our content for a variety of encoding tools. Most of these encoding tools are designed to interact with the POSIX filesystem API. Our Media Cloud Engineering team wanted to leverage containers for a new platform they were building, called Archer. Archer, in turn, uses MezzFS, which needs FUSE, and at the time, FUSE required that the user have CAP_SYS_ADMIN in the initial user namespace. To accommodate the use case from our internal partner, we had to run them in a dedicated cluster where they could run privileged containers.
In 2017, we worked with our partner, Kinvolk, to have patches added to the Linux kernel that allowed users to safely use FUSE from non-init user namespaces. They were able to successfully upstream these patches, and we’ve been using them in production. From our user’s perspective, we were able to seamlessly move them into an unprivileged environment that was more secure. This simplified operations, as this workload was no longer considered exceptional, and could run alongside every other workload in the general node pool. In turn, this allowed the media encoding team access to a massive amount of compute capacity from the shared clusters, and better reliability due to the homogeneous nature of the deployment.
Use Case: Unintended Privileges
Many CVEs related to granting containers unintended privileges have been released in the past few years:
CVE-2019-5736: Privilege escalation via overwriting host runc binary
CVE-2018-10892: Access to /proc/acpi, allowing an attacker to modify hardware configuration
There will certainly be more vulnerabilities in the future, as is to be expected in any complex, quickly evolving system. We already use the default settings offered by Docker, such as AppArmor, and seccomp, but by adding user namespaces, we can achieve a superior defense-in-depth security model. These CVEs did not affect our infrastructure because we were using user namespaces for all of our containers. The attenuation of capabilities in the init user namespace performed as intended and stopped these attacks.
The Future
There are still many bits of the Kernel that are receiving support for user namespaces or enhancements making user namespaces easier to use. Much of the work left to do is focused on filesystems and container orchestration systems themselves. Some of these changes are slated for upcoming kernel releases. Work is being done to add unprivileged mounts to overlayfs allowing for nested container builds in a user namespace with layers. Future work is going on to make the Linux kernel VFS layer natively understand ID translation. This will make user namespaces with different ID mappings able to access the same underlying filesystem by shifting UIDs through a bind mount. Our partners at Kinvolk are also working on bringing user namespaces to Kubernetes.
Today, a variety of container runtimes support user namespaces. Docker can set up machine-wide UID mappings with separate user namespaces per container, as outlined in their docs. Any OCI-compliant runtime, such as containerd/runc, Podman, or systemd-nspawn, supports user namespaces. Various container orchestration engines also support user namespaces via their underlying container runtimes, such as Nomad and Docker Swarm.
As part of our move to Kubernetes, Netflix has been working with Kinvolk on getting user namespaces to work under Kubernetes. You can follow this work via the KEP discussion here, and Kinvolk has more information about running user namespaces under Kubernetes on their blog. We look forward to evolving container security together with the Kubernetes community.
At AWS re:Invent 2020, AWS Lambda released Container Image Support for Lambda functions. This new feature allows developers to package and deploy Lambda functions as container images of up to 10 GB in size. With this release, AWS SAM also added support to manage, build, and deploy Lambda functions using container images.
In this blog post, I walk through building a simple serverless application that uses Lambda functions packaged as container images with AWS SAM. I demonstrate creating a new application and highlight changes to the AWS SAM template specific to container image support. I then cover building the image locally for debugging in addition to eventual deployment. Finally, I show using AWS SAM to handle packaging and deploying Lambda functions from a developer’s machine or a CI/CD pipeline.
Push to invoke lifecycle
The process for creating a Lambda function packaged as a container requires only a few steps. A developer first creates the container image and tags that image with the appropriate label. The image is then uploaded to an Amazon Elastic Container Registry (ECR) repository using docker push.
During the Lambda create or update process, the Lambda service pulls the image from ECR, optimizes the image for use, and deploys the image to the Lambda service. Once this and any other configuration processes are complete, the Lambda function is in Active status and ready to be invoked. The AWS SAM CLI manages most of these steps for you.
Prerequisites
The following tools are required in this walkthrough: the AWS SAM CLI, Docker, and the AWS CLI.
Use the terminal and follow these steps to create a serverless application:
Enter sam init.
For Template source, select option one for AWS Quick Start Templates.
For Package type, choose option two for Image.
For Base image, select option one for amazon/nodejs12.x-base.
Name the application demo-app.
Demonstration of sam init
Exploring the application
Open the template.yaml file in the root of the project to see the new options available for container image support. The AWS SAM template has two new values that are required when working with container images. PackageType: Image tells AWS SAM that this function is using container images for packaging.
AWS SAM template
The second set of required data is in the Metadata section that helps AWS SAM manage the container images. When a container is created, a new tag is added to help identify that image. By default, Docker uses the tag, latest. However, AWS SAM passes an explicit tag name to help differentiate between functions. That tag name is a combination of the Lambda function resource name, and the DockerTag value found in the Metadata. Additionally, the DockerContext points to the folder containing the function code and Dockerfile identifies the name of the Dockerfile used in building the container image.
In addition to changes in the template.yaml file, AWS SAM also uses the Docker CLI to build container images. Each Lambda function has a Dockerfile that instructs Docker how to construct the container image for that function. The Dockerfile for the HelloWorldFunction is at hello-world/Dockerfile.
Local development of the application
AWS SAM provides local development support for zip-based and container-based Lambda functions. When using container-based images, as you modify your code, update the local container image using sam build. AWS SAM then calls docker build using the Dockerfile for instructions.
Dockerfile for Lambda function
For the HelloWorldFunction, which uses Node.js, you can rebuild the container image and invoke the function locally with the following command:
sam build --cached --parallel && sam local invoke HelloWorldFunction
Deploying the application
There are two ways to deploy container-based Lambda functions with AWS SAM. The first option is to deploy from AWS SAM using the sam deploy command. The deploy command tags the local container image, uploads it to ECR, and then creates or updates your Lambda function. The second method is the sam package command used in continuous integration and continuous delivery or deployment (CI/CD) pipelines, where the deployment process is separate from the artifact creation process.
AWS SAM package tags and uploads the container image to ECR but does not deploy the application. Instead, it creates a modified version of the template.yaml file with the newly created container image location. This modified template is later used to deploy the serverless application using AWS CloudFormation.
Deploying from AWS SAM with the guided flag
Before you can deploy the application, use the AWS CLI to create a new ECR repository to store the container image for the HelloWorldFunction.
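A command along these lines does this; add --region or --profile options as needed for your environment.
aws ecr create-repository \
  --repository-name demo-app-hello-world \
  --image-tag-mutability IMMUTABLE \
  --image-scanning-configuration scanOnPush=true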
This command creates a new ECR repository called demo-app-hello-world. The --image-tag-mutability IMMUTABLE option prevents overwriting tags. The --image-scanning-configuration scanOnPush=true enables automated vulnerability scanning whenever a new image is pushed to the repository. The output is:
Amazon ECR creation output
Make a note of the repositoryUri as you need it in the next step.
Before you can push your images to this new repository, ensure that you have logged in to the managed Docker service that ECR provides. Update the bracketed tokens with your information and run the following command in the terminal:
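With AWS CLI version 2, the login takes this form; the bracketed tokens are placeholders for your Region and account ID.
aws ecr get-login-password --region [your-region] | \
  docker login --username AWS --password-stdin [your-account-id].dkr.ecr.[your-region].amazonaws.com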
After building the application locally and creating a repository for the container image, you can deploy the application. The first time you deploy an application, use the guided version of the sam deploy command and follow these steps:
Type sam deploy --guided, or sam deploy -g.
For StackName, enter demo-app.
Choose the same Region that you created the ECR repository in.
Enter the Image Repository for the HelloWorldFunction (this is the repositoryUri of the ECR repository).
For Confirm changes before deploy and Allow SAM CLI IAM role creation, keep the defaults.
For HelloWorldFunction may not have authorization defined, Is this okay? Select Y.
Keep the defaults for the remaining prompts.
Results of sam deploy --guided
AWS SAM uploads the container images to the ECR repo and deploys the application. During this process, you see a changeset along with the status of the deployment. When the deployment is complete, the stack outputs are then displayed. Use the HelloWorldApi endpoint to test your application in production.
Deploy outputs
When you use the guided version, AWS SAM saves the entered data to the samconfig.toml file. For subsequent deployments with the same parameters, use sam deploy. If you want to make a change, use the guided deployment again.
This example demonstrates deploying a serverless application with a single, container-based Lambda function in it. However, most serverless applications contain more than one Lambda function. To work with an application that has more than one Lambda function, follow these steps to add a second Lambda function to your application:
Copy the hello-world directory using the terminal command cp -R hello-world hola-world
Replace the contents of the template.yaml file with the following
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: demo app
Globals:
  Function:
    Timeout: 3
Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      Events:
        HelloWorld:
          Type: Api
          Properties:
            Path: /hello
            Method: get
    Metadata:
      DockerTag: nodejs12.x-v1
      DockerContext: ./hello-world
      Dockerfile: Dockerfile
  HolaWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      Events:
        HolaWorld:
          Type: Api
          Properties:
            Path: /hola
            Method: get
    Metadata:
      DockerTag: nodejs12.x-v1
      DockerContext: ./hola-world
      Dockerfile: Dockerfile
Outputs:
  HelloWorldApi:
    Description: "API Gateway endpoint URL for Prod stage for Hello World function"
    Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/hello/"
  HolaWorldApi:
    Description: "API Gateway endpoint URL for Prod stage for Hola World function"
    Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/hola/"
Replace the contents of hola-world/app.js with the following
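A minimal handler along these lines works for this walkthrough; it assumes the generated Dockerfile’s CMD points at app.lambdaHandler, and the response message is illustrative.
# Overwrite hola-world/app.js with a minimal proxy-integration handler
cat > hola-world/app.js <<'EOF'
exports.lambdaHandler = async () => ({
    statusCode: 200,
    body: JSON.stringify({ message: 'Hola from Lambda!' }),
});
EOF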
Run the guided deploy to add the second repository:
sam deploy -g
The AWS SAM guided deploy process allows you to provide the information again but prepopulates the defaults with previous values. Update the following:
Keep the same stack name, Region, and Image Repository for HelloWorldFunction.
Use the new repository for HolaWorldFunction.
For the remaining prompts, use the same values as before. When asked whether it is okay that the Lambda functions may not have authorization defined, enter Y.
Results of sam deploy --guided
Deploying in a CI/CD pipeline
Companies use continuous integration and continuous delivery (CI/CD) pipelines to automate application deployment. Because the process is automated, using an interactive process like a guided AWS SAM deployment is not possible.
Developers can use the packaging process in AWS SAM to prepare the artifacts for deployment and produce a separate template usable by AWS CloudFormation. The package command is:
sam package --output-template-file packaged-template.yaml \
--image-repository 5555555555.dkr.ecr.us-west-2.amazonaws.com/demo-app
For multiple repositories:
sam package --output-template-file packaged-template.yaml \
--image-repositories HelloWorldFunction=5555555555.dkr.ecr.us-west-2.amazonaws.com/demo-app-hello-world \
--image-repositories HolaWorldFunction=5555555555.dkr.ecr.us-west-2.amazonaws.com/demo-app-hola-world
Both cases create a file called packaged-template.yaml. The Lambda functions in this template have an added property called ImageUri that points to the ECR repository and the tag for the Lambda function’s container image.
Packaged template
Using sam package to generate a separate CloudFormation template enables developers to separate artifact creation from application deployment. The deployment process can then be placed in an isolated stage allowing for greater customization and observability of the pipeline.
Conclusion
Container image support for Lambda enables larger application artifacts and the ability to use container tooling to manage Lambda images. AWS SAM simplifies application management by bringing these tools into the serverless development workflow.
In this post, you create a container-based serverless application using command lines in the terminal. You create ECR repositories and associate them with functions in the application. You deploy the application from your local machine and package the artifacts for separate deployment in a CI/CD pipeline.
To learn more about serverless and AWS SAM, visit the Sessions with SAM series at s12d.com/sws and find more resources at serverlessland.com.
This post is contributed by Bill Kerr and Raj Seshadri
For most customers, infrastructure is rarely provisioned with CI/CD in mind. However, Infrastructure as Code (IaC) should be a best practice for DevOps professionals when they provision cloud-native assets. Microservice apps that run inside an Amazon EKS cluster often use CI/CD, so why not the cluster and related cloud infrastructure as well?
This blog demonstrates how to spin up cluster infrastructure managed by CI/CD using CDK code and Cloud Resource Property Manager (CRPM) property files. Managing cloud resources is ultimately about managing properties, such as instance type, cluster version, etc. CRPM helps you organize all those properties by importing bite-sized YAML files, which are stitched together with CDK. It keeps all of what’s good about YAML in YAML, and places all of the logic in beautiful CDK code. Ultimately this improves productivity and reliability as it eliminates manual configuration steps.
Architecture Overview
In this architecture, we create a six node Amazon EKS cluster. The Amazon EKS cluster has a node group spanning private subnets across two Availability Zones. There are two public subnets in different Availability Zones available for use with an Elastic Load Balancer.
Changes to the primary (master) branch trigger a pipeline, which creates CloudFormation change sets for an Amazon EKS stack and a CI/CD stack. After human approval, the change sets are executed.
Prerequisites
Get ready to deploy the CloudFormation stacks with CDK
First, to get started with CDK you spin up an AWS Cloud9 environment, which gives you a code editor and terminal that runs in a web browser. Using AWS Cloud9 is optional but highly recommended since it speeds up the process.
In the AWS Cloud9 console, select Create environment, enter a name, and select Next step. Leave the default settings and select Next step again.
Select Create environment.
Download and install the dependencies and demo CDK application
In a terminal, let’s review the code used in this article and install it.
# Install TypeScript globally for CDK
npm i -g typescript
# If you are running these commands in Cloud9 or already have CDK installed,
# then skip this command
npm i -g aws-cdk
# Clone the demo CDK application code
git clone https://github.com/shi/crpm-eks
# Change directory
cd crpm-eks
# Install the CDK application
npm i
Create the IAM service role
When creating an EKS cluster, the IAM role that was used to create the cluster is also the role that will be able to access it afterwards.
Deploy the CloudFormation stack containing the role
Let’s deploy a CloudFormation stack containing a role that will later be used to create the cluster and also to access it. While we’re at it, let’s also add our current user ARN to the role, so that we can assume the role.
# Deploy the EKS management role CloudFormation stack
cdk deploy role --parameters AwsArn=$(aws sts get-caller-identity --query Arn --output text)
# It will ask, "Do you wish to deploy these changes (y/n)?"
# Enter y and then press enter to continue deploying
Notice the Outputs section that shows up in the CDK deploy results, which contains the role name and the role ARN. You will need to copy and paste the role ARN (ex. arn:aws:iam::123456789012:role/eks-role-us-east-1) from your Outputs when deploying the next stack.
Example Outputs:
role.ExportsOutputRefRoleFF41A16F = eks-role-us-east-1
role.ExportsOutputFnGetAttRoleArnED52E3F8 = arn:aws:iam::123456789012:role/eks-role-us-east-1
Create the EKS cluster
Now that we have a role created, it’s time to create the cluster using that role.
Deploy the stack containing the EKS cluster in a new VPC
Expect it to take over 20 minutes for this stack to deploy.
# Deploy the EKS cluster CloudFormation stack
# REPLACE ROLE_ARN WITH ROLE ARN FROM OUTPUTS IN ROLE STACK CREATED ABOVE
cdk deploy eks -r ROLE_ARN
# It will ask, "Do you wish to deploy these changes (y/n)?"
# Enter y and then press enter to continue deploying
Notice the Outputs section, which contains the cluster name (ex. eks-demo) and the UpdateKubeConfigCommand. The UpdateKubeConfigCommand is useful if you already have kubectl installed somewhere and would rather use your own to interact with the cluster instead of using Cloud9’s.
Navigate to this page in the AWS console if you would like to see your cluster, which is now ready to use.
Configure kubectl with access to cluster
If you are following along in Cloud9, you can skip configuring kubectl.
If you prefer to use kubectl installed somewhere else, now would be a good time to configure access to the newly created cluster by running the UpdateKubeConfigCommand mentioned in the Outputs section above. It requires that you have the AWS CLI installed and configured.
aws eks update-kubeconfig --name eks-demo --region us-east-1 \
  --role-arn arn:aws:iam::123456789012:role/eks-role-us-east-1
# Test access to cluster
kubectl get nodes
Leveraging Infrastructure CI/CD
Now that the VPC and cluster have been created, it’s time to turn on CI/CD. This will create a cloned copy of github.com/shi/crpm-eks in CodeCommit. Then, an AWS CloudWatch Events rule will start watching the CodeCommit repo for changes and triggering a CI/CD pipeline that builds and validates CloudFormation templates, and executes CloudFormation change sets.
Deploy the stack containing the code repo and pipeline
# Deploy the CI/CD CloudFormation stack
cdk deploy cicd
# It will ask, "Do you wish to deploy these changes (y/n)?"
# Enter y and then press enter to continue deploying
Notice the Outputs section, which contains the CodeCommit repo name (ex. eks-ci-cd). This is where the code now lives that is being watched for changes.
Example Outputs:
cicd.ExportsOutputFnGetAttLambdaRoleArn275A39EB = arn:aws:iam::774461968944:role/eks-ci-cd-LambdaRole-6PFYXVSLTQ0D
cicd.ExportsOutputFnGetAttRepositoryNameC88C868A = eks-ci-cd
Review the pipeline for the first time
Navigate to this page in the AWS console and you should see a new pipeline in progress. The pipeline is automatically run for the first time when it is created, even though no changes have been made yet. Open the pipeline and scroll down to the Review stage. You’ll see that two change sets were created in parallel (one for the EKS stack and the other for the CI/CD stack).
Select Review to open an approval popup where you can enter a comment.
Select Reject or Approve. Following the Review button, the blue link to the left of Fetch: Initial commit by AWS CodeCommit can be selected to see the infrastructure code changes that triggered the pipeline.
Go ahead and approve it.
Clone the new AWS CodeCommit repo
Now that the golden source that is being watched for changes lives in a AWS CodeCommit repo, we need to clone that repo and get rid of the repo we’ve been using up to this point.
If you are following along in AWS Cloud9, you can skip cloning the new repo because you are just going to discard the old AWS Cloud9 environment and start using a new one.
Now would be a good time to clone the newly created repo mentioned in the preceding Outputs section. Next, delete the old repo that was cloned from GitHub at the beginning of this blog. You can visit this repository to get the clone URL for the repo.
Review this documentation for help with accessing your private AWS CodeCommit repo using HTTPS.
Review this documentation for help with accessing your repo using SSH.
# Clone the CDK application code (this URL assumes the us-east-1 region)
git clone https://git-codecommit.us-east-1.amazonaws.com/v1/repos/eks-ci-cd
# Change directory
cd eks-ci-cd
# Install the CDK application
npm i
# Remove the old repo
rm -rf ../crpm-eks
Deploy the stack containing the Cloud9 IDE with kubectl and CodeCommit repo
If you are NOT using Cloud9, you can skip this section.
To make life easy, let’s create another Cloud9 environment that has kubectl preconfigured and ready to use, and also has the new CodeCommit repo checked out and ready to edit.
# Deploy the IDE CloudFormation stack
cdk deploy ide
Configuring the new Cloud9 environment
Although kubectl and the code are now ready to use, we still have to manually configure Cloud9 to stop using AWS managed temporary credentials in order for kubectl to be able to access the cluster with the management role. Here’s how to do that and test kubectl:
1. Navigate to this page in the AWS console.
2. In Your environments, select Open IDE for the newly created environment (possibly named eks-ide).
3. Once opened, navigate at the top to AWS Cloud9 -> Preferences.
4. Expand AWS SETTINGS, and under Credentials, disable AWS managed temporary credentials by selecting the toggle button. Then, close the Preferences tab.
5. In a terminal in Cloud9, enter aws configure. Then, answer the questions by leaving them set to None and pressing enter, except for Default region name. Set the Default region name to the current region that you created everything in. The output should look similar to:
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: us-east-1
Default output format [None]:
6. Test the environment
kubectl get nodes
If everything is working properly, you should see two nodes appear in the output similar to:
NAME STATUS ROLES AGE VERSION
ip-192-168-102-69.ec2.internal Ready <none> 4h50m v1.17.11-ekscfdc40
ip-192-168-69-2.ec2.internal Ready <none> 4h50m v1.17.11-ekscfdc40
You can use kubectl from this IDE to control the cluster. When you close the IDE browser window, the Cloud9 environment will automatically shutdown after 30 minutes and remain offline until the next time you reopen it from the AWS console. So, it’s a cheap way to have a kubectl terminal ready when needed.
Delete the old Cloud9 environment
If you have been following along using Cloud9 the whole time, then you should have two Cloud9 environments running at this point (one that was used to initially create everything from code in GitHub, and one that is now ready to edit the CodeCommit repo and control the cluster with kubectl). It’s now a good time to delete the old Cloud9 environment.
In Your environments, select the radio button for the old environment (you named it when creating it) and select Delete.
In the popup, enter the word Delete and select Delete.
Now you should be down to having just one AWS Cloud9 environment that was created when you deployed the ide stack.
Trigger the pipeline to change the infrastructure
Now that we have a cluster up and running that’s defined in code stored in a AWS CodeCommit repo, it’s time to make some changes:
We’ll commit and push the changes, which will trigger the pipeline to update the infrastructure.
We’ll go ahead and make one change to the cluster nodegroup and another change to the actual CI/CD build process, so that both the eks-cluster stack as well as the eks-ci-cd stack get changed.
1. In the code that was checked out from AWS CodeCommit, open up res/compute/eks/nodegroup/props.yaml. At the bottom of the file, try changing minSize from 1 to 4, desiredSize from 2 to 6, and maxSize from 3 to 6 as seen in the following screenshot. Then, save the file and close it. The res (resource) directory is your well organized collection of resource properties files.
2. Next, open up res/developer-tools/codebuild/project/props.yaml and find where it contains computeType: ‘BUILD_GENERAL1_SMALL’. Try changing BUILD_GENERAL1_SMALL to BUILD_GENERAL1_MEDIUM. Then, save the file and close it.
3. Commit and push the changes in a terminal.
cd eks-ci-cd
git add .
git commit -m "Increase nodegroup scaling config sizes and use larger build environment"
git push
4. Visit https://console.aws.amazon.com/codesuite/codepipeline/pipelines in the AWS console and you should see your pipeline in progress.
5. Wait for the Review stage to become Pending.
a. Following the Approve action box, click the blue link to the left of “Fetch: …” to see the infrastructure code changes that triggered the pipeline. You should see the two code changes you committed above.
6. After reviewing the changes, go back and select Review to open an approval popup.
7. In the approval popup, enter a comment and select Approve.
8. Wait for the pipeline to finish the Deploy stage as it executes the two change sets. You can refresh the page until you see it has finished. It should take a few minutes.
9. To see that the CodeBuild change has been made, scroll up to the Build stage of the pipeline and click on the AWS CodeBuild link as shown in the following screenshot.
10. Next, select the Build details tab, and you should see that your Compute size has been upgraded to 7 GB memory, 4 vCPUs as shown in the following screenshot.
11. By this time, the cluster nodegroup sizes are probably updated. You can confirm with kubectl in a terminal.
# Get nodes
kubectl get nodes
If everything is ready, you should see six (desired size) nodes appear in the output similar to:
NAME STATUS ROLES AGE VERSION
ip-192-168-102-69.ec2.internal Ready <none> 5h42m v1.17.11-ekscfdc40
ip-192-168-69-2.ec2.internal Ready <none> 5h42m v1.17.11-ekscfdc40
ip-192-168-43-7.ec2.internal Ready <none> 10m v1.17.11-ekscfdc40
ip-192-168-27-14.ec2.internal Ready <none> 10m v1.17.11-ekscfdc40
ip-192-168-36-56.ec2.internal Ready <none> 10m v1.17.11-ekscfdc40
ip-192-168-37-27.ec2.internal Ready <none> 10m v1.17.11-ekscfdc40
Cluster is now manageable by code
You now have a cluster that can be maintained by simply making changes to code! The only resources not being managed by CI/CD in this demo are the management role and the optional AWS Cloud9 IDE. You can log in to the AWS console and edit the role, adding other Trust relationships in the future, so others can assume the role and access the cluster.
Clean up
Do not try to delete all of the stacks at once! Wait for the stack(s) in a step to finish deleting before moving onto the next step.
1. Navigate to this page in the AWS console.
2. Delete the two IDE stacks first (the ide stack spawned another stack).
3. Delete the ci-cd stack.
4. Delete the cluster stack (this one takes a long time).
5. Delete the role stack.
Additional resources
Cloud Resource Property Manager (CRPM) is an open source project maintained by SHI, hosted on GitHub, and available through npm.
Conclusion
In this blog, we demonstrated how you can spin up an Amazon EKS cluster managed by CI/CD using CDK code and Cloud Resource Property Manager (CRPM) property files. Making updates to this cluster is as easy as modifying the property files and approving the change in AWS CodePipeline. Using CRPM can improve productivity and reliability because it eliminates manual configuration steps.
Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers
This post is authored by Deepthi Chelupati, Senior Product Manager for Amazon EC2 Spot Instances, and Chad Schmutzer, Principal Developer Advocate for Amazon EC2
Customers have been using EC2 Spot Instances to save money and scale workloads to new levels for over a decade. Launched in late 2009, Spot Instances are spare Amazon EC2 compute capacity in the AWS Cloud available for steep discounts off On-Demand Instance prices. One thing customers love about Spot Instances is their integration across many services on AWS, the AWS Partner Network, and open source software. These integrations unlock the ability to take advantage of the deep savings and scale Spot Instances provide for interruptible workloads. Some of the most popular services used with Spot Instances include Amazon EC2 Auto Scaling, Amazon EMR, AWS Batch, Amazon Elastic Container Service (Amazon ECS), and Amazon Elastic Kubernetes Service (EKS). When we talk with customers who are early in their cost optimization journey about the advantages of using Spot Instances with their favorite workloads, they typically can’t wait to get started. They often tell us that while they’d love to start using Spot Instances right away, flipping between documentation, blog posts, and the AWS Management Console is time consuming and eats up precious development cycles. They want to know the fastest way to start saving with Spot Instances, while feeling confident they’ve applied best practices. Customers have overwhelmingly told us that the best way for them to get started quickly is to see complete workload configuration examples in infrastructure as code templates for AWS CloudFormation and Hashicorp Terraform. To address this feedback, we launched Spot Blueprints.
Spot Blueprints overview
Today we are excited to tell you about Spot Blueprints, an infrastructure code template generator that lives right in the EC2 Spot Console. Built based on customer feedback, Spot Blueprints guides you through a few short steps. These steps are designed to gather your workload requirements while explaining and configuring Spot best practices along the way. Unlike traditional wizards, Spot Blueprints generates custom infrastructure as code in real time within each step so you can easily understand the details of the configuration. Spot Blueprints makes configuring EC2 instance type agnostic workloads easy by letting you express compute capacity requirements as vCPU and memory. Then the wizard automatically expands those requirements to a flexible list of EC2 instance types available in the AWS Region you are operating. Spot Blueprints also takes high-level Spot best practices like Availability Zone flexibility, and applies them to your workload compute requirements. For example, automatically including all of the required resources and dependencies for creating a Virtual Private Cloud (VPC) with all Availability Zones configured for use. Additional workload-specific Spot best practices are also configured, such as graceful interruption handling for load balancer connection draining, automatic job retries, or container rescheduling.
Spot Blueprints makes keeping up with the latest and greatest Spot features (like the capacity-optimized allocation strategy and Capacity Rebalancing for EC2 Auto Scaling) easy. Spot Blueprints is continually updated to support new features as they become available. Today, Spot Blueprints supports generating infrastructure as code workload templates for some of the most popular services used with Spot Instances: Amazon EC2 Auto Scaling, Amazon EMR, AWS Batch, and Amazon EKS. You can tell us what blueprint you’d like to see next right in Spot Blueprints. We are excited to hear from you!
What are Spot best practices?
We’ve mentioned Spot best practices a few times so far in this blog. Let’s quickly review the best practices and how they relate to Spot Blueprints. When using Spot Instances, it is important to understand a couple of points:
Spot Instances are interruptible and must be returned when EC2 needs the capacity back
The location and amount of spare capacity available at any given moment is dynamic and continually changes in real time
For these reasons, it is important to follow best practices when using Spot Instances in your workloads. We like to call these “Spot best practices.” These best practices can be summarized as follows:
Only run workloads that are truly interruption tolerant, meaning interruptible at both the individual instance level and overall application level
Spot workloads should be flexible, meaning they can be shifted in real time to where the spare capacity currently is, or otherwise be paused until spare capacity is available again
In practice, being flexible means qualifying a workload to run on multiple EC2 instance types (think big: multiple families, sizes, and generations), and in multiple Availability Zones, at any given time
Over the last few years, we’ve focused on making it easier to follow these best practices by adding features such as the following:
Mixed instances policy: an Auto Scaling group configuration to enhance availability by deploying across multiple instance types running in multiple Availability Zones
Amazon EC2 Instance Selector: a CLI tool and go library that recommends instance types based on resource criteria like vCPUs and memory
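For example, here is a minimal sketch of using the amazon-ec2-instance-selector CLI to expand vCPU and memory requirements into a diversified list of instance types; the requirement values and Region are illustrative, and the flags follow the tool’s published usage.
# List instance types that match 4 vCPUs and 16 GiB of memory in us-east-1 (values are illustrative)
ec2-instance-selector --vcpus 4 --memory 16 --region us-east-1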
In addition to the general Spot best practices we just reviewed, each workload that has integrated support for Spot Instances has its own set of native best practices. For example, the ability to implement graceful interruption handling for:
Load balancer connection draining
Automatic job retries
Container draining and rescheduling
Spot Blueprints is designed to quickly explain and generate templates with Spot best practices for each specific workload. The custom-generated workload templates can be downloaded in either AWS CloudFormation or HashiCorp Terraform format, allowing for further customization and learning before being deployed in your environment.
Next, let’s walk through configuring and deploying an example Spot blueprint.
Example tutorial
In this example tutorial, we use Spot Blueprints to configure an Apache Spark environment running on Amazon EMR, deploy the template as a CloudFormation stack, run a sample job, and then delete the CloudFormation stack.
First, we navigate to the EC2 Spot console and click on “Spot Blueprints”:
In the next screen, we select the EMR blueprint, and then click “Configure blueprint”:
A couple of notes here:
If you are in a hurry, you can grab a preconfigured template to get started quickly. The preconfigured template has default best practices in place and can be further customized as needed.
If you have a suggestion for a new blueprint you’d like to see, you can click on “Don’t see a blueprint you need?” to give us feedback. We’d love to hear from you!
In Step 1, we give the blueprint a name and configure permissions. We have the blueprint create new IAM roles to allow Amazon EMR and Amazon EC2 compute resources to make calls to the AWS APIs on your behalf:
We see a summary of resources that will be created, along with a code snippet preview. Unlike traditional wizards, you can see the code generated in every step!
In Step 2, the network is configured. We create a new VPC with public subnets in all Availability Zones in the Region. This is a Spot best practice because it increases the number of Spot capacity pools, which increases the possibility for EC2 to find and provision the required capacity:
We see a summary of resources that will be created, along with a code snippet preview. Here is the CloudFormation code for our template, where you can see the VPC creation, including all dependencies such as the internet gateway, route table, and subnets:
In Step 3, we configure the compute environment. Spot Blueprints makes this easy by allowing us to simply define our capacity units and required compute capacity for each type of node in our EMR cluster. In this walkthrough we will use vCPUs. We select 4 vCPUs for the core node capacity, and 12 vCPUs for the task node capacity. Based on these configurations, Spot Blueprints will apply best practices to the EMR cluster settings. These include using On-Demand Instances for core nodes since interruptions to core nodes can cause instability in the EMR cluster, and using Spot Instances for task nodes because the EMR cluster is typically more tolerant to task node interruptions. Finally, task nodes are provisioned using the capacity-optimized allocation strategy because it helps find the most optimal Spot capacity:
You’ll notice there is no need to spend time thinking about which instance types to select – Spot Blueprints takes our application’s minimum vCPUs per instance and minimum vCPU to memory ratio requirements and automatically selects the optimal instance types. This instance type selection process applies Spot best practices by a) using instance types across different families, generations, and sizes, and b) using the maximum number of instance types possible for core (5) and task (15) nodes:
Here is the CloudFormation code for our template. You can see the EMR cluster creation, the applications being installed (Spark, Hadoop, and Ganglia), the flexible list of instance types and Availability Zones, and the capacity-optimized allocation strategy enabled (along with all dependencies):
In Step 4, we have the option of enabling EMR managed scaling on the cluster. Enabling EMR managed scaling is a Spot best practice because this allows EMR to automatically increase or decrease the compute capacity in the cluster based on the workload, further optimizing performance and cost. EMR continuously evaluates cluster metrics to make scaling decisions that optimize the cluster for cost and speed. Spot Blueprints automatically configures minimum and maximum scaling values based on the compute requirements we defined in the previous step:
Here is the updated CloudFormation code for our template with managed scaling (ManagedScalingPolicy) enabled:
In Step 5, we can review and download the template code for further customization in either CloudFormation or Terraform format, review the instance configuration summary, and review a summary of the resources that would be created from the template. Spot Blueprints will also upload the CloudFormation template to an Amazon S3 bucket so we can deploy the template directly from the CloudFormation console or CLI. Let’s go ahead and click on “Deploy in CloudFormation,” copy the URL, and then click on “Deploy in CloudFormation” again:
Having copied the S3 URL, we go to the CloudFormation console to launch the CloudFormation stack:
We walk through the CloudFormation stack creation steps and the stack launches, creating all of the resources as configured in the blueprint. It takes roughly 15-20 minutes for our stack creation to complete:
Once the stack creation is complete, we navigate to the Amazon EMR console to view the EMR cluster configured with Spot best practices:
Next, let’s run a sample Spark application written in Python to calculate the value of pi on the cluster. We do this by uploading the sample application code to an S3 bucket in our account and then adding a step to the cluster that references the application code location, along with its arguments:
The step runs and completes successfully:
The results of the calculation are sent to our S3 bucket as configured in the arguments:
{"tries":1600000,"hits":1256253,"pi":3.1406325}
Cleanup
Now that our job is done, we delete the CloudFormation stack to remove the AWS resources we created. Note that as part of the EMR cluster creation, EMR creates some EC2 security groups that CloudFormation cannot remove, because they were created by the EMR cluster and not by CloudFormation. As a result, the CloudFormation stack fails to delete the VPC on the first try. To solve this, we can either delete the VPC (and those security groups) manually, or let the CloudFormation stack skip the VPC and leave it, along with the security groups, in place for future use. We then delete the CloudFormation stack a final time:
Conclusion
Whether you are a first-time Spot user learning how to take advantage of the savings and scale offered by Spot Instances, or a veteran Spot user expanding your Spot usage into a new architecture, Spot Blueprints has you covered. We hope you enjoy using Spot Blueprints to quickly generate example template frameworks for workloads like Kubernetes and Apache Spark with Spot best practices in place. Please tell us which blueprint you’d like to see next right in Spot Blueprints. We can’t wait to hear from you!
In this post, I explain how to use AWS Lambda layers and extensions with Lambda functions packaged and deployed as container images.
Previously, Lambda functions were packaged only as .zip archives. This includes functions created in the AWS Management Console. You can now also package and deploy Lambda functions as container images.
You can use familiar container tooling such as the Docker CLI with a Dockerfile to build, test, and tag images locally. Lambda functions built using container images can be up to 10 GB in size. You push images to an Amazon Elastic Container Registry (ECR) repository, a managed AWS container image registry service. You create your Lambda function, specifying the source code as the ECR image URL from the registry.
Lambda container image support
Lambda functions packaged as container images do not support adding Lambda layers to the function configuration. However, there are a number of ways to use the functionality of Lambda layers with container images. You take on the responsibility for packaging your preferred runtimes and dependencies as part of the container image during the build process.
Understanding how Lambda layers and extensions work as .zip archives
If you deploy function code using a .zip archive, you can use Lambda layers as a distribution mechanism for libraries, custom runtimes, and other function dependencies.
When you include one or more layers in a function, during initialization, the contents of each layer are extracted in order to the /opt directory in the function execution environment. Each runtime then looks for libraries in a different location under /opt, depending on the language. You can include up to five layers per function, which count towards the unzipped deployment package size limit of 250 MB. Layers are automatically set as private, but they can be shared with other AWS accounts, or shared publicly.
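For illustration, the following AWS CLI call shows how layers are attached to a .zip-packaged function by ARN; the function name and layer ARNs are placeholders.
# Attach layers (up to five) to a .zip-packaged function; function name and layer ARNs are placeholders
aws lambda update-function-configuration \
  --function-name my-zip-function \
  --layers arn:aws:lambda:us-east-1:123456789012:layer:shared-libs:1 arn:aws:lambda:us-east-1:123456789012:layer:example-extension:3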
Lambda Extensions are a way to augment your Lambda functions and are deployed as Lambda layers. You can use Lambda Extensions to integrate functions with your preferred monitoring, observability, security, and governance tools. You can choose from a broad set of tools provided by AWS, AWS Lambda Ready Partners, and AWS Partners, or create your own Lambda Extensions. For more information, see “Introducing AWS Lambda Extensions – In preview.”
Extensions can run in either of two modes: internal and external. An external extension runs as an independent process in the execution environment. It can start before the runtime process and can continue to run after the function invocation is fully processed. Internal extensions run as part of the runtime process, in-process with your code.
Lambda searches the /opt/extensions directory and starts initializing any extensions found. Extensions must be executable as binaries or scripts. As the function code directory is read-only, extensions cannot modify function code.
It helps to understand that Lambda layers and extensions are just files copied into specific file paths in the execution environment during the function initialization. The files are read-only in the execution environment.
Understanding container images with Lambda
A container image is a packaged template built from a Dockerfile. The image is assembled or built from commands in the Dockerfile, starting from a parent or base image, or from scratch. Each command then creates a new layer in the image, which is stacked in order on top of the previous layer. Once built from the packaged template, a container image is immutable and read-only.
For Lambda, a container image includes the base operating system, the runtime, any Lambda extensions, your application code, and its dependencies. Lambda provides a set of open-source base images that you can use to build your container image. Lambda uses the image to construct the execution environment during function initialization. You can use the AWS Serverless Application Model (AWS SAM) CLI or native container tools such as the Docker CLI to build and test container images locally.
Using Lambda layers in container images
Container layers are added to a container image, similar to how Lambda layers are added to a .zip archive function.
There are a number of ways to use container image layering to add the functionality of Lambda layers to your Lambda function container images.
Use a container image version of a Lambda layer
A Lambda layer publisher may have a container image format equivalent of a Lambda layer. To maintain the same file path as Lambda layers, the published container images must have the equivalent files located in the /opt directory. An image containing an extension must include the files in the /opt/extensions directory.
An example Lambda function, packaged as a .zip archive, is created with two layers. One layer contains shared libraries, and the other layer is a Lambda extension from an AWS Partner.
The corresponding Dockerfile syntax for a function packaged as a container image includes the following lines. These pull the container image versions of the Lambda layers and copy them into the function image. The shared library image is pulled from ECR and the extension image is pulled from Docker Hub.
FROM public.ecr.aws/myrepo/shared-lib-layer:1 AS shared-lib-layer
# Layer code
WORKDIR /opt
COPY --from=shared-lib-layer /opt/ .
FROM aws-partner/extensions-layer:1 as extensions-layer
# Extension code
WORKDIR /opt/extensions
COPY --from=extensions-layer /opt/extensions/ .
Copy the contents of a Lambda layer into a container image
You can use existing Lambda layers, and copy the contents of the layers into the function container image /opt directory during docker build.
This creates a container image containing the existing Lambda layer and extension files. This can be pushed to ECR and used in a function.
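For example, one way to fetch an existing layer’s contents into the Docker build context looks like the following; the layer name and version number are placeholders.
# Download the layer archive (layer name and version number are placeholders) and extract it into the build context
curl -o layer.zip "$(aws lambda get-layer-version --layer-name shared-libs --version-number 1 --query Content.Location --output text)"
unzip layer.zip -d layer
# A COPY layer/ /opt/ instruction in the Dockerfile then places these files under /opt during docker build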
Build a container image from a Lambda layer
You can repackage and publish Lambda layer file content as container images. Creating separate container images for different layers allows you to add them to multiple functions, and share them in a similar way as Lambda layers.
You can create a separate container image containing the files from a single layer, or combine the files from multiple layers into a single image. If you create separate container images for layer files, you then add these images into your function image.
There are two ways to manage language code dependencies. You can pre-build the dependencies and copy the files into the container image, or build the dependencies during docker build.
In this example, I migrate an existing Python application, comprising a Lambda function and an extension, from a .zip archive to separate function and extension container images. The extension writes logs to S3.
You can choose how to store images in repositories. You can either push both images to the same ECR repository with different image tags, or push to different repositories. In this example, I use separate ECR repositories.
To set up the example, visit the GitHub repo and follow the instructions in the README.md file.
The existing example extension uses a makefile to install boto3 using pip install with a requirements.txt file. This is migrated to the docker build process. I must add a Python runtime to be able to run pip install as part of the build process. I use python:3.8-alpine as a minimal base image.
I create separate Dockerfiles for the function and extension. The extension Dockerfile contains the following lines.
FROM python:3.8-alpine AS installer
#Layer Code
COPY extensionssrc /opt/
COPY extensionssrc/requirements.txt /opt/
RUN pip install -r /opt/requirements.txt -t /opt/extensions/lib
FROM scratch AS base
WORKDIR /opt/extensions
COPY --from=installer /opt/extensions .
I build, tag, login, and push the extension container image to an existing ECR repository.
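Those steps look roughly like the following; the account ID and Region are placeholders.
# Build and tag the extension image, authenticate to ECR, and push (account ID and Region are placeholders)
docker build -t log-extension-image:latest .
docker tag log-extension-image:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/log-extension-image:latest
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/log-extension-image:latest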
The function Dockerfile contains the following lines, which add the files from the previously created extension image to the function image. There is no need to run pip install for the function as it does not require any additional dependencies.
FROM 123456789012.dkr.ecr.us-east-1.amazonaws.com/log-extension-image:latest AS layer
FROM public.ecr.aws/lambda/python:3.8
# Layer code
WORKDIR /opt
COPY --from=layer /opt/ .
# Function code
WORKDIR /var/task
COPY app.py .
CMD ["app.lambda_handler"]
I build, tag, and push the function container image to a separate existing ECR repository. This creates an immutable image of the Lambda function.
The function requires a unique S3 bucket to store the log files, which I create in the S3 console. I create a Lambda function from the ECR repository image, and specify the bucket name as a Lambda environment variable.
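One way to create the function from the CLI is sketched below; the function name, role ARN, image URI, environment variable name, and bucket name are placeholders for this example.
# Create the function from the ECR image and pass the bucket name as an environment variable (all names are placeholders)
aws lambda create-function \
  --function-name log-extension-demo-function \
  --package-type Image \
  --code ImageUri=123456789012.dkr.ecr.us-east-1.amazonaws.com/log-extension-function:latest \
  --role arn:aws:iam::123456789012:role/lambda-execution-role \
  --environment "Variables={S3_BUCKET_NAME=my-extension-logs-bucket}"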
For subsequent extension code changes, I need to update both the extension and function images. If only the function code changes, I need to update the function image. I push the function image as the :latest image to ECR. I then update the function code deployment to use the updated :latest ECR image.
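A sketch of that update step, reusing the placeholder names from above:
# Push the rebuilt :latest image and point the function at it (repository and function names are placeholders)
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/log-extension-function:latest
aws lambda update-function-code \
  --function-name log-extension-demo-function \
  --image-uri 123456789012.dkr.ecr.us-east-1.amazonaws.com/log-extension-function:latest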
With .zip archive functions, custom runtimes are added using Lambda layers. With container images, you no longer need to copy in Lambda layer code for custom runtimes.
You can build your own custom runtime images starting with AWS provided base images for custom runtimes. You can add your preferred runtime, dependencies, and code to these images. To communicate with Lambda, the image must implement the Lambda Runtime API. We provide Lambda runtime interface clients for all supported runtimes, or you can implement your own for additional runtimes.
Running extensions in container images
A Lambda extension running in a function packaged as a container image works in the same way as a .zip archive function. You build a function container image including the extension files, or adding an extension image layer. Lambda looks for any external extensions in the /opt/extensions directory and starts initializing them. Extensions must be executable as binaries or scripts.
Internal extensions modify the Lambda runtime startup behavior using language-specific environment variables, or wrapper scripts. For language-specific environment variables, you can set the following environment variables in your function configuration to augment the runtime command line.
JAVA_TOOL_OPTIONS (Java Corretto 8 and 11)
NODE_OPTIONS (Node.js 10 and 12)
DOTNET_STARTUP_HOOKS (.NET Core 3.1)
An example Lambda environment variable for JAVA_TOOL_OPTIONS:
-javaagent:"/opt/ExampleAgent-0.0.jar"
Wrapper scripts delegate the runtime start-up to a script. The script can inject and alter arguments, set environment variables, or capture metrics, errors, and other diagnostic information. The following runtimes support wrapper scripts: Node.js 10 and 12, Python 3.8, Ruby 2.7, Java 8 and 11, and .NET Core 3.1.
You specify the script by setting the value of the AWS_LAMBDA_EXEC_WRAPPER environment variable as the file system path of an executable binary or script, for example:
/opt/wrapper_script
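As a minimal sketch, you could set this on an existing function with the AWS CLI; the function name is a placeholder, and note that this call replaces the function’s existing environment variables.
# Point the runtime at a wrapper script shipped in the image under /opt (function name is a placeholder)
aws lambda update-function-configuration \
  --function-name my-function \
  --environment "Variables={AWS_LAMBDA_EXEC_WRAPPER=/opt/wrapper_script}"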
Conclusion
You can now package and deploy Lambda functions as container images in addition to .zip archives. Lambda functions packaged as container images do not directly support adding Lambda layers to the function configuration as .zip archives do.
In this post, I show a number of solutions to use the functionality of Lambda layers and extensions with container images, including example Dockerfiles.
I show how to migrate an existing Lambda function and extension from a .zip archive to separate function and extension container images. Follow the instructions in the README.md file in the GitHub repository.
With AWS Lambda, you upload your code and run it without thinking about servers. Many customers enjoy the way this works, but if you’ve invested in container tooling for your development workflows, it’s not easy to use the same approach to build applications using Lambda.
To help you with that, you can now package and deploy Lambda functions as container images of up to 10 GB in size. In this way, you can also easily build and deploy larger workloads that rely on sizable dependencies, such as machine learning or data intensive workloads. Just like functions packaged as ZIP archives, functions deployed as container images benefit from the same operational simplicity, automatic scaling, high availability, and native integrations with many services.
We are providing base images for all the supported Lambda runtimes (Python, Node.js, Java, .NET, Go, Ruby) so that you can easily add your code and dependencies. We also have base images for custom runtimes based on Amazon Linux that you can extend to include your own runtime implementing the Lambda Runtime API.
You can deploy your own arbitrary base images to Lambda, for example images based on Alpine or Debian Linux. To work with Lambda, these images must implement the Lambda Runtime API. To make it easier to build your own base images, we are releasing Lambda Runtime Interface Clients implementing the Runtime API for all supported runtimes. These implementations are available via native package managers, so that you can easily pick them up in your images, and are being shared with the community using an open source license.
We are also releasing as open source a Lambda Runtime Interface Emulator that enables you to perform local testing of the container image and check that it will run when deployed to Lambda. The Lambda Runtime Interface Emulator is included in all AWS-provided base images and can be used with arbitrary images as well.
Your container images can also use the Lambda Extensions API to integrate monitoring, security and other tools with the Lambda execution environment.
To deploy a container image, you select one from an Amazon Elastic Container Registry (ECR) repository. Let’s see how this works in practice with a couple of examples, first using an AWS-provided image for Node.js, and then building a custom image for Python.
Using the AWS-Provided Base Image for Node.js
Here’s the code (app.js) for a simple Node.js Lambda function generating a PDF file using the PDFKit module. Each time it is invoked, it creates a new mail containing random data generated by the faker.js module. The output of the function is using the syntax of the Amazon API Gateway to return the PDF file.
I use npm to initialize the package and add the three dependencies I need in the package.json file. In this way, I also create the package-lock.json file. I am going to add it to the container image to have a more predictable result.
Now, I create a Dockerfile to create the container image for my Lambda function, starting from the AWS provided base image for the nodejs12.x runtime:
FROM amazon/aws-lambda-nodejs:12
COPY app.js package*.json ./
RUN npm install
CMD [ "app.lambdaHandler" ]
The Dockerfile is adding the source code (app.js) and the files describing the package and the dependencies (package.json and package-lock.json) to the base image. Then, I run npm to install the dependencies. I set the CMD to the function handler, but this could also be done later as a parameter override when configuring the Lambda function.
I use the Docker CLI to build the random-letter container image locally:
$ docker build -t random-letter .
To check if this is working, I start the container image locally using the Lambda Runtime Interface Emulator:
$ docker run -p 9000:8080 random-letter:latest
Now, I test a function invocation with cURL. Here, I am passing an empty JSON payload.
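The invocation uses the standard Lambda invoke path that the emulator exposes on the mapped port:
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'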
If there are errors, I can fix them locally. When it works, I move to the next step.
To upload the container image, I create a new ECR repository in my account and tag the local image to push it to ECR. To help me identify software vulnerabilities in my container images, I enable ECR image scanning.
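The commands look roughly like this; the account ID and Region are placeholders.
# Create the repository with scan on push enabled, then authenticate, tag, and push (account ID and Region are placeholders)
aws ecr create-repository --repository-name random-letter --image-scanning-configuration scanOnPush=true
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123412341234.dkr.ecr.us-east-1.amazonaws.com
docker tag random-letter:latest 123412341234.dkr.ecr.us-east-1.amazonaws.com/random-letter:latest
docker push 123412341234.dkr.ecr.us-east-1.amazonaws.com/random-letter:latest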
In the Lambda console, I click on Create function. I select Container image, give the function a name, and then Browse images to look for the right image in my ECR repositories.
After I select the repository, I use the latest image I uploaded. When I select the image, the Lambda console translates it to the underlying image digest (shown to the right of the tag in the image below). You can see the digests of your images locally with the docker images --digests command. In this way, the function keeps using the same image even if the latest tag is later reassigned to a newer image, so you are protected from unintentional deployments. To change the image, you update it in the function code settings; updating the function configuration has no impact on the image used, even if the tag was reassigned to another image in the meantime.
Optionally, I can override some of the container image values. I am not doing this now, but in this way I can create images that can be used for different functions, for example by overriding the function handler in the CMD value.
I leave all other options to their default and select Create function.
When creating or updating the code of a function, the Lambda platform optimizes new and updated container images to prepare them to receive invocations. This optimization takes a few seconds or minutes, depending on the size of the image. After that, the function is ready to be invoked. I test the function in the console.
It’s working! Now let’s add the API Gateway as trigger. I select Add Trigger and add the API Gateway using an HTTP API. For simplicity, I leave the authentication of the API open.
Now, I click on the API endpoint a few times and download a few random mails.
It works as expected! Here are a few of the PDF files that are generated with random data from the faker.js module.
Building a Custom Image for Python
Sometimes you need to use your custom container images, for example to follow your company guidelines or to use a runtime version that we don’t support.
In this case, I want to build an image to use Python 3.9. The code (app.py) of my function is very simple: it just says hello and reports the version of Python being used.
import sys
def handler(event, context):
    return 'Hello from AWS Lambda using Python' + sys.version + '!'
As I mentioned before, we are sharing with you open source implementations of the Lambda Runtime Interface Clients (which implement the Runtime API) for all the supported runtimes. In this case, I start with a Python image based on Alpine Linux. Then, I add the Lambda Runtime Interface Client for Python to the image. Here’s the Dockerfile:
# Define global args
ARG FUNCTION_DIR="/home/app/"
ARG RUNTIME_VERSION="3.9"
ARG DISTRO_VERSION="3.12"
# Stage 1 - bundle base image + runtime
# Grab a fresh copy of the image and install GCC
FROM python:${RUNTIME_VERSION}-alpine${DISTRO_VERSION} AS python-alpine
# Install GCC (Alpine uses musl but we compile and link dependencies with GCC)
RUN apk add --no-cache \
libstdc++
# Stage 2 - build function and dependencies
FROM python-alpine AS build-image
# Install aws-lambda-cpp build dependencies
RUN apk add --no-cache \
build-base \
libtool \
autoconf \
automake \
libexecinfo-dev \
make \
cmake \
libcurl
# Include global args in this stage of the build
ARG FUNCTION_DIR
ARG RUNTIME_VERSION
# Create function directory
RUN mkdir -p ${FUNCTION_DIR}
# Copy handler function
COPY app/* ${FUNCTION_DIR}
# Optional – Install the function's dependencies
# RUN python${RUNTIME_VERSION} -m pip install -r requirements.txt --target ${FUNCTION_DIR}
# Install Lambda Runtime Interface Client for Python
RUN python${RUNTIME_VERSION} -m pip install awslambdaric --target ${FUNCTION_DIR}
# Stage 3 - final runtime image
# Grab a fresh copy of the Python image
FROM python-alpine
# Include global arg in this stage of the build
ARG FUNCTION_DIR
# Set working directory to function root directory
WORKDIR ${FUNCTION_DIR}
# Copy in the built dependencies
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}
# (Optional) Add Lambda Runtime Interface Emulator and use a script in the ENTRYPOINT for simpler local runs
ADD https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie /usr/bin/aws-lambda-rie
COPY entry.sh /
RUN chmod 755 /usr/bin/aws-lambda-rie /entry.sh
ENTRYPOINT [ "/entry.sh" ]
CMD [ "app.handler" ]
The Dockerfile this time is more elaborate, building the final image in three stages and following the Docker best practice of multi-stage builds. You can use this three-stage approach to build your own custom images:
Stage 1 is building the base image with the runtime, Python 3.9 in this case, plus GCC that we use to compile and link dependencies in stage 2.
Stage 2 is installing the Lambda Runtime Interface Client and building function and dependencies.
Stage 3 is creating the final image adding the output from stage 2 to the base image built in stage 1. Here I am also adding the Lambda Runtime Interface Emulator, but this is optional, see below.
I create the entry.sh script below to use it as ENTRYPOINT. It executes the Lambda Runtime Interface Client for Python. If the execution is local, the Runtime Interface Client is wrapped by the Lambda Runtime Interface Emulator.
#!/bin/sh
if [ -z "${AWS_LAMBDA_RUNTIME_API}" ]; then
  exec /usr/bin/aws-lambda-rie /usr/local/bin/python -m awslambdaric "$@"
else
  exec /usr/local/bin/python -m awslambdaric "$@"
fi
Now, I can use the Lambda Runtime Interface Emulator to check locally if the function and the container image are working correctly:
$ docker run -p 9000:8080 lambda/python:3.9-alpine3.12
Not Including the Lambda Runtime Interface Emulator in the Container Image
It’s optional to add the Lambda Runtime Interface Emulator to a custom container image. If I don’t include it, I can test locally by installing the Lambda Runtime Interface Emulator on my local machine following these steps:
In Stage 3 of the Dockerfile, I remove the commands copying the Lambda Runtime Interface Emulator (aws-lambda-rie) and the entry.sh script. I don’t need the entry.sh script in this case.
I use this ENTRYPOINT to start the Lambda Runtime Interface Client by default: ENTRYPOINT [ "/usr/local/bin/python", "-m", "awslambdaric" ]
I run these commands to install the Lambda Runtime Interface Emulator in my local machine, for example under ~/.aws-lambda-rie:
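A minimal version of those commands, following the emulator’s published install instructions:
mkdir -p ~/.aws-lambda-rie
curl -Lo ~/.aws-lambda-rie/aws-lambda-rie https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie
chmod +x ~/.aws-lambda-rie/aws-lambda-rie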
When the Lambda Runtime Interface Emulator is installed on my local machine, I can mount it when starting the container. The command to start the container locally now is (assuming the Lambda Runtime Interface Emulator is at ~/.aws-lambda-rie):
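A sketch of that command, mounting the emulator into the container, overriding the entry point, and then invoking the function with cURL; the image name and handler match the earlier steps.
# Mount the locally installed emulator, use it as the entry point, and pass the Runtime Interface Client and handler as the command
docker run -d -v ~/.aws-lambda-rie:/aws-lambda -p 9000:8080 \
  --entrypoint /aws-lambda/aws-lambda-rie \
  lambda/python:3.9-alpine3.12 /usr/local/bin/python -m awslambdaric app.handler
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
The function responds with the Python version in use, for example: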
"Hello from AWS Lambda using Python3.9.0 (default, Oct 22 2020, 05:03:39) \n[GCC 9.3.0]!"
I push the image to ECR and create the function as before. Here’s my test in the console:
My custom container image based on Alpine is running Python 3.9 on Lambda!
Available Now
You can use container images to deploy your Lambda functions today in US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Tokyo), Asia Pacific (Singapore), Europe (Ireland), Europe (Frankfurt), South America (São Paulo). We are working to add support in more Regions soon. The container image support is offered in addition to ZIP archives and we will continue to support the ZIP packaging format.
This new capability opens up new scenarios, simplifies the integration with your development pipeline, and makes it easier to use custom images and your favorite programming platforms to build serverless applications.
Today, we are excited to announce the public preview of AWS Proton, a new service that helps you automate and manage infrastructure provisioning and code deployments for serverless and container-based applications.
Maintaining hundreds – or sometimes thousands – of microservices with constantly changing infrastructure resources and configurations is a challenging task for even the most capable teams.
AWS Proton enables infrastructure teams to define standard templates centrally and make them available for developers in their organization. This allows infrastructure teams to manage and update infrastructure without impacting developer productivity.
How AWS Proton Works
The process of defining a service template involves the definition of cloud resources, continuous integration and continuous delivery (CI/CD) pipelines, and observability tools. AWS Proton will integrate with commonly used CI/CD pipelines and observability tools such as CodePipeline and CloudWatch. It also provides curated templates that follow AWS best practices for common use cases such as web services running on AWS Fargate or stream processing apps built on AWS Lambda.
Infrastructure teams can visualize and manage the list of service templates in the AWS Management Console.
This is what the list of templates looks like.
AWS Proton also collects information about the deployment status of the application, such as the last date it was successfully deployed. When a template changes, AWS Proton identifies all the existing applications using the old version and allows infrastructure teams to upgrade them to the most recent definition, while monitoring application health during the upgrade so it can be rolled back in case of issues.
This is what a service template looks like, with its versions and running instances.
Once service templates have been defined, developers can select and deploy services in a self-service fashion. AWS Proton will take care of provisioning cloud resources, deploying the code, and health monitoring, while providing visibility into the status of all the deployed applications and their pipelines.
This way, developers can focus on building and shipping application code for serverless and container-based applications without having to learn, configure, and maintain the underlying resources.
This is what the list of deployed services looks like.
Available in Preview
AWS Proton is now available in preview in US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Tokyo), and Europe (Ireland); it’s free of charge, as you only pay for the underlying services and resources. Check out the technical documentation.
ECR Public allows you to store, manage, share, and deploy container images for anyone to discover and download globally. You have long been able to host private container images on AWS with Amazon Elastic Container Registry, and now with the release of ECR Public, you can host public ones too, enabling anyone (with or without an AWS account) to browse and pull your published container artifacts.
As part of the announcement, a new website allows you to browse and search for public container images, view developer provided details, and discover the commands you need to pull containers.
If you check the website out now, you will see we have hosted some of our container images on the registry, including the Amazon EKS Distro images. We also have hundreds of images from partners such as Bitnami, Canonical, and HashiCorp.
Publishing A Container
Before I can upload a container, I need to create a repository. There is now a Public tab in the Repositories section of the Elastic Container Registry console. In this tab, I click the button Create repository.
I am taken to the Create repository screen, where I select that I would like the repository to be public.
I give my repository the name news-blog and upload a logo that will be used on the gallery details page. I provide a description and select Linux as the Content type. There is also an option to select the CPU architectures that the image supports; if you have a multi-architecture image, you can select more than one architecture type. The content types are used for filtering on the gallery website, enabling people using the gallery to filter their searches by architecture and operating system support.
I enter some markdown in the About section. The text I enter here will be displayed on the gallery page and would ordinarily explain what’s included in the repository, any licensing details, or other relevant information. There is also a Usage section where I enter some sample text to explain how to use my image and that I created the repository for a demo. Finally, I click Create repository.
Back at the repository page, I now have one public repository. There is also a button that says View push commands. I click on this so I can learn more about how to push containers to the repository.
I follow the four steps contained in this section. The first step helps me authenticate with ECR Public so that I can push containers to my repository. Steps two, three, and four show me how to build, tag, and push my container to ECR Public.
The application that I have containerized is a simple app that runs and outputs a terminal message. I use the Docker CLI to push my container to my repository. It’s quite a small container, so it only takes a minute or two.
Once complete, I head over to the ECR Public gallery and can see that my image is now publicly available for anyone to pull.
Pulling A Container
You pull containers from ECR Public using the familiar docker pull command with the URL of the image.
You can easily find this URL on the ECR Public website, where the image URL is displayed along with other published information. Clicking on the URL copies the image URL to your clipboard for an easy copy-paste.
ECR Public image URLs are in the format public.ecr.aws/<namespace>/<image>:<tag>
For example, if I wanted to pull the image I uploaded earlier, I would open my terminal and run the following command (please note: you will need docker installed locally to run these commands).
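Using the namespace from my repository, the command looks like this:
docker pull public.ecr.aws/r6g9m2o3/news-blog:latest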
I have now downloaded the Docker image onto my machine, and I can run the container using the following command:
docker run public.ecr.aws/r6g9m2o3/news-blog:latest
My application runs and writes a message from Jeff Barr. If you are wondering about the switches and parameters I have used on my docker run command, it’s to make sure that the log is written in color because we wouldn’t want to miss out on Jeff’s glorious purple hair.
Nice to Know
ECR Public automatically replicates container images across two AWS Regions to reduce download times and improve availability. Therefore, using public images directly from ECR Public may simplify your build process if you were previously creating and managing local copies. ECR Public caches image layers in Amazon CloudFront, to improve pull performance for a global audience, especially for popular images.
ECR Public also has some nice integrations that will make working with containers easier on AWS. For example, ECR Public will automatically notify services such as AWS CodeBuild to rebuild an application when a public container image changes.
Pricing
All AWS customers will get 50 GB of free storage each month, and if you exceed that limit, you will pay nominal charges. Check out the pricing page for details.
Anyone who pulls images anonymously will get 500 GB of free data bandwidth each month, after which they can sign up or sign in to an AWS account to get more. Simply authenticating with an AWS account increases free data bandwidth up to 5 TB each month when pulling images from the internet.
Finally, workloads running in AWS will get unlimited data bandwidth from any region when pulling publicly shared images from ECR Public.
Verification and Namespaces
You can create a custom namespace, such as your organization or project name, to be used in an ECR Public URL subdomain, unless it’s a reserved namespace. Namespaces such as sagemaker and eks that identify AWS services are reserved. Namespaces that identify AWS Marketplace sellers are also reserved.
Available Today
ECR Public is available today and you can find out more over on the product page. Visit the gallery to explore and use available public container software and log into the ECR Console to share containers publicly.
Micro Focus is an AWS Advanced Technology Partner and a global infrastructure software company with 40 years of experience in delivering and supporting enterprise software.
We have seen that mainframe customers often encounter scalability constraints and can’t support their development and test workforce at the scale required to meet business requirements. These constraints can lead to delays, reduce product or feature releases, and make them unable to respond to market requirements. Furthermore, limits in capacity and scale often affect the quality of changes deployed, and are linked to unplanned or unexpected downtime in products or services.
The conventional approach to address these constraints is to scale up, meaning to increase MIPS/MSU capacity of the mainframe hardware available for development and testing. The cost of this approach, however, is excessively high, and to ensure time to market, you may reject this approach at the expense of quality and functionality. If you’re wrestling with these challenges, this post is written specifically for you.
In the APG, we introduce DevOps automation and AWS CI/CD architecture to support mainframe application development. Our solution enables you to embrace both Test Driven Development (TDD) and Behavior Driven Development (BDD). Mainframe developers and testers can automate the tests in CI/CD pipelines so they’re repeatable and scalable. To speed up automated mainframe application tests, the solution uses team pipelines to run functional and integration tests frequently, and uses systems test pipelines to run comprehensive regression tests on demand. For more information about the pipelines, see Mainframe Modernization: DevOps on AWS with Micro Focus.
In this post, we focus on how to automate and scale mainframe application tests in AWS. We show you how to use AWS services and Micro Focus products to automate mainframe application tests with best practices. The solution can scale your mainframe application CI/CD pipeline to run thousands of tests in AWS within minutes, and you only pay a fraction of your current on-premises cost.
The following diagram illustrates the solution architecture.
Figure: Mainframe DevOps On AWS Architecture Overview
Best practices
Before we get into the details of the solution, let’s recap the following mainframe application testing best practices:
Create a “test first” culture by writing tests for mainframe application code changes
Automate preparing and running tests in the CI/CD pipelines
Provide fast and quality feedback to project management throughout the SDLC
Assess and increase test coverage
Scale your test’s capacity and speed in line with your project schedule and requirements
Automated smoke test
In this architecture, mainframe developers can automate running functional smoke tests for new changes. This testing phase typically “smokes out” regression of core and critical business functions. You can achieve these tests using tools such as py3270 with x3270 or Robot Framework Mainframe 3270 Library.
The following code shows a feature test written in Behave and test step using py3270:
# home_loan_calculator.feature
Feature: calculate home loan monthly repayment
The bankdemo application provides a monthly home loan repayment calculator.
Users input the home loan amount, interest rate, and loan maturity (in months) into the transaction.
Users receive the monthly home loan repayment amount as output.
Scenario Outline: As a customer I want to calculate my monthly home loan repayment via a transaction
Given home loan amount is <amount>, interest rate is <interest rate> and maturity date is <maturity date in months> months
When the transaction is submitted to the home loan calculator
Then it shall show the monthly repayment of <monthly repayment>
Examples: Homeloan
| amount | interest rate | maturity date in months | monthly repayment |
| 1000000 | 3.29 | 300 | $4894.31 |
# home_loan_calculator_steps.py
import sys, os
from py3270 import Emulator
from behave import *

@given("home loan amount is {amount}, interest rate is {rate} and maturity date is {maturity_date} months")
def step_impl(context, amount, rate, maturity_date):
    context.home_loan_amount = amount
    context.interest_rate = rate
    context.maturity_date_in_months = maturity_date

@when("the transaction is submitted to the home loan calculator")
def step_impl(context):
    # Setup connection parameters
    tn3270_host = os.getenv('TN3270_HOST')
    tn3270_port = os.getenv('TN3270_PORT')
    # Setup TN3270 connection
    em = Emulator(visible=False, timeout=120)
    em.connect(tn3270_host + ':' + tn3270_port)
    em.wait_for_field()
    # Screen login
    em.fill_field(10, 44, 'b0001', 5)
    em.send_enter()
    # Input screen fields for home loan calculator
    em.wait_for_field()
    em.fill_field(8, 46, context.home_loan_amount, 7)
    em.fill_field(10, 46, context.interest_rate, 7)
    em.fill_field(12, 46, context.maturity_date_in_months, 7)
    em.send_enter()
    em.wait_for_field()
    # collect monthly repayment output from screen
    context.monthly_repayment = em.string_get(14, 46, 9)
    em.terminate()

@then("it shall show the monthly repayment of {amount}")
def step_impl(context, amount):
    print("expected amount is " + amount.strip() + ", and the result from screen is " + context.monthly_repayment.strip())
    assert amount.strip() == context.monthly_repayment.strip()
To run this functional test in Micro Focus Enterprise Test Server (ETS), we use AWS CodeBuild.
In the CodeBuild project, we need to create a buildspec to orchestrate the commands for preparing the Micro Focus Enterprise Test Server CICS environment and issue the test command. In the buildspec, we define the location for CodeBuild to look for test reports and upload them into the CodeBuild report group. The following buildspec code uses custom scripts DeployES.ps1 and StartAndWait.ps1 to start your CICS region, and runs Python Behave BDD tests:
version: 0.2
phases:
  build:
    commands:
      - |
        # Run Command to start Enterprise Test Server
        CD C:\
        .\DeployES.ps1
        .\StartAndWait.ps1
        py -m pip install behave
        Write-Host "waiting for server to be ready ..."
        do {
          Write-Host "..."
          sleep 3
        } until(Test-NetConnection 127.0.0.1 -Port 9270 | ? { $_.TcpTestSucceeded } )
        CD C:\tests\features
        MD C:\tests\reports
        $Env:Path += ";c:\wc3270"
        $address=(Get-NetIPAddress -AddressFamily Ipv4 | where { $_.IPAddress -Match "172\.*" })
        $Env:TN3270_HOST = $address.IPAddress
        $Env:TN3270_PORT = "9270"
        behave.exe --color --junit --junit-directory C:\tests\reports
reports:
  bankdemo-bdd-test-report:
    files:
      - '**/*'
    base-directory: "C:\\tests\\reports"
In the smoke test, the team may run both unit tests and functional tests. Ideally, these tests run in parallel to speed up the pipeline. In AWS CodePipeline, we can set up a stage to run multiple actions in parallel. In our example, the pipeline runs both BDD tests and Robot Framework (RPA) tests.
The following CloudFormation code snippet runs two different tests. You use the same RunOrder value to indicate the actions run in parallel.
The following screenshot shows the example actions on the CodePipeline console that use the preceding code.
Figure – Screenshot of CodePipeline parallel execution tests
Both BDD and RPA tests produce JUnit format reports, which CodeBuild can ingest and show on the CodeBuild console. This is a great way for project management and business users to track the quality trend of an application. The following screenshot shows the CodeBuild report generated from the BDD tests.
Figure – CodeBuild report generated from the BDD tests
Automated regression tests
After you test the changes in the project team pipeline, you can automatically promote them to another stream with other team members’ changes for further testing. The scope of this testing stream is significantly more comprehensive, with a greater number and wider range of tests and higher volume of test data. The changes promoted to this stream by each team member are tested in this environment at the end of each day throughout the life of the project. This provides a high-quality delivery to production, with new code and changes to existing code tested together with hundreds or thousands of tests.
In enterprise architecture, it’s commonplace to see an application client consuming web services APIs exposed from a mainframe CICS application. One approach to do regression tests for mainframe applications is to use Micro Focus Verastream Host Integrator (VHI) to record and capture 3270 data stream processing and encapsulate these 3270 data streams as business functions, which in turn are packaged as web services. When these web services are available, they can be consumed by a test automation product, which in our environment is Micro Focus UFT One. This uses the Verastream server as the orchestration engine that translates the web service requests into 3270 data streams that integrate with the mainframe CICS application. The application is deployed in Micro Focus Enterprise Test Server.
The following diagram shows the end-to-end testing components.
Figure – Regression Test Infrastructure end-to-end Setup
To ensure we have the coverage required for large mainframe applications, we sometimes need to run thousands of tests against very large production volumes of test data. We want the tests to run faster and complete as soon as possible so we reduce AWS costs, because we only pay for the infrastructure while the test environment is provisioned and the tests are running.
Therefore, the design of the test environment needs to scale out. The batch feature in CodeBuild allows you to run tests in batches and in parallel rather than serially. Furthermore, our solution needs to minimize interference between batches, so that a failure in one batch doesn’t affect another running in parallel. The following diagram depicts the high-level design, with each batch build running in its own independent infrastructure. Each infrastructure is launched as part of test preparation, and then torn down in the post-test phase.
Figure – Regression tests in CodeBuild project set up to use batch mode
Building and deploying regression test components
Following the design of the parallel regression test environment, let’s look at how we build each component and how they are deployed. The following steps to build our regression tests use a working-backward approach, starting from deployment in the Enterprise Test Server:
Create a batch build in CodeBuild.
Deploy to Enterprise Test Server.
Deploy the VHI model.
Deploy UFT One Tests.
Integrate UFT One into CodeBuild and CodePipeline and test the application.
Creating a batch build in CodeBuild
We update two components to enable a batch build. First, in the CodePipeline CloudFormation resource, we set BatchEnabled to be true for the test stage. The UFT One test preparation stage uses the CloudFormation template to create the test infrastructure. The following code is an example of the AWS CloudFormation snippet with batch build enabled:
Second, in the buildspec configuration of the test stage, we provide a build matrix setting. We use the custom environment variable TEST_BATCH_NUMBER to indicate which set of tests runs in each batch. See the following code:
After setting up the batch build, CodeBuild creates multiple batches when the build starts. The following screenshot shows the batches on the CodeBuild console.
Figure – Regression tests CodeBuild project run in batch mode
Deploying to Enterprise Test Server
ETS is the transaction engine that processes all the online (and batch) requests that are initiated through external clients, such as 3270 terminals, web services, and WebSphere MQ. This engine provides support for various mainframe subsystems, such as CICS, IMS TM, and JES, as well as code-level support for COBOL and PL/I. The following screenshot shows the Enterprise Test Server administration page.
Figure – Enterprise Server Administrator window
In this mainframe application testing use case, the regression tests are CICS transactions, initiated from 3270 requests (encapsulated in a web service). For more information about Enterprise Test Server, see the Enterprise Test Server and Micro Focus websites.
In the regression pipeline, after the mainframe artifact compile stage, we bake the artifact into an ETS Docker image and upload the image to an Amazon ECR repository. This way, we have an immutable artifact for all the tests.
During each batch’s test preparation stage, a CloudFormation stack is deployed to create an Amazon ECS service on Windows EC2 instances. The stack uses a Network Load Balancer as the integration point for VHI.
The following code is an example of the CloudFormation snippet to create an Amazon ECS service using an Enterprise Test Server Docker image:
In this architecture, the VHI is a bridge between mainframe and clients.
We use the VHI designer to capture the 3270 data streams and encapsulate the relevant data streams into a business function. We can then deliver this function as a web service that can be consumed by a test management solution, such as Micro Focus UFT One.
The following screenshot shows the setup for getCheckingDetails in VHI. Along with this procedure, we can also see other procedures (for example, calcCostLoan) that are generated as web services. The properties associated with this procedure are available on this screen, allowing you to define the mapping of fields between the associated 3270 screens and the exposed web service.
Figure – Setup for getCheckingDetails in VHI
The following screenshot shows the editor for this procedure, opened by selecting the Procedure Editor. This screen presents the 3270 screens that are involved in the business function that is generated as a web service.
Figure – VHI designer Procedure Editor shows the procedure
After you define the required functional web services in VHI designer, the resultant model is saved and deployed into a VHI Docker image. We use this image and the associated model (from VHI designer) in the pipeline outlined in this post.
For more information about VHI, see the VHI website.
The pipeline contains two steps to deploy a VHI service. First, it installs and sets up the VHI models into a VHI Docker image, which is pushed to Amazon ECR. Second, a CloudFormation stack is deployed to create an Amazon ECS Fargate service that uses the latest built Docker image. In AWS CloudFormation, the VHI ECS task definition defines an environment variable for the ETS Network Load Balancer’s DNS name, so the VHI can bootstrap and point to an ETS service. The VHI stack uses a Network Load Balancer as the integration point for UFT One tests.
The following code is an example of a ECS Task Definition CloudFormation snippet that creates a VHI service in Amazon ECS Fargate and integrates it with an ETS server:
UFT One is a test client that uses each of the web services created by the VHI designer to orchestrate running each of the associated business functions. Parameter data is supplied to each function, and validations are configured against the data returned. Multiple test suites are configured with different business functions with the associated data.
The following screenshot shows the test suite API_Bankdemo3, which is used in this regression test process.
Figure – API_Bankdemo3 in UFT One Test Editor Console
The last step is to integrate UFT One into CodeBuild and CodePipeline to test our mainframe application. First, we set up CodeBuild to use a UFT One container. The Docker image is available on Docker Hub. Then we author our buildspec. The buildspec has the following three phases:
Setting up a UFT One license and deploying the test infrastructure
Starting the UFT One test suite to run regression tests
Tearing down the test infrastructure after tests are complete
The following code is an example of a buildspec snippet in the pre_build stage. The snippet shows the command to activate the UFT One license:
Because ETS and VHI are both deployed with a load balancer, the build detects when the load balancers become healthy before starting the tests. The following AWS CLI commands detect the load balancer’s target group health:
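A sketch of that check using describe-target-health; the target group ARN is a placeholder, and the build simply polls until every returned state is healthy.
# Return the health state of every registered target (target group ARN is a placeholder)
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/ets-targets/0123456789abcdef \
  --query "TargetHealthDescriptions[*].TargetHealth.State" \
  --output text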
The next release of Micro Focus UFT One (November or December 2020) will provide an exit status to indicate a test’s success or failure.
When the tests are complete, the post_build stage tears down the test infrastructure. The following AWS CLI command tears down the CloudFormation stack:
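A minimal sketch, assuming the per-batch stack name is available to the build as an environment variable:
# Delete the per-batch test stack and wait for the deletion to finish (the stack name variable is a placeholder)
aws cloudformation delete-stack --stack-name "$TEST_STACK_NAME"
aws cloudformation wait stack-delete-complete --stack-name "$TEST_STACK_NAME"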
At the end of the build, the buildspec is set up to upload UFT One test reports as an artifact into Amazon Simple Storage Service (Amazon S3). The following screenshot is an example of a test report in HTML format generated by UFT One in CodeBuild and CodePipeline.
Figure – UFT One HTML report
A new release of Micro Focus UFT One will provide test report formats supported by CodeBuild test report groups.
Conclusion
In this post, we introduced the solution to use Micro Focus Enterprise Suite, Micro Focus UFT One, Micro Focus VHI, AWS developer tools, and Amazon ECS containers to automate provisioning and running mainframe application tests in AWS at scale.
The on-demand model allows you to create the same test capacity infrastructure in minutes at a fraction of your current on-premises mainframe cost. It also significantly increases your testing and delivery capacity to increase quality and reduce production downtime.
A demo of the solution is available on the AWS Partner Micro Focus website, AWS Mainframe CI/CD Enterprise Solution. If you're interested in modernizing your mainframe applications, visit Micro Focus and contact AWS mainframe business development at [email protected].
Peter has been with Micro Focus for almost 30 years, in a variety of roles and geographies including Technical Support, Channel Sales, Product Management, Strategic Alliances Management and Pre-Sales, primarily based in Europe but for the last four years in Australia and New Zealand. In his current role as Pre-Sales Manager, Peter is charged with driving and supporting sales activity within the Application Modernization and Connectivity team, based in Melbourne.
Leo Ervin
Leo Ervin is a Senior Solutions Architect working with Micro Focus Enterprise Solutions in the ANZ team. After completing a mathematics degree, Leo started as a PL/1 programmer with a local insurance company. The next step in Leo's career involved consulting work in PL/1 and COBOL before he joined a start-up company as a technical director and partner. This company became the first distributor of Micro Focus software in the ANZ region in 1986. Leo's involvement with Micro Focus technology has continued from this distributorship through to today, with his current focus on cloud strategies for both DevOps and re-platform implementations.
Kevin Yung
Kevin is a Senior Modernization Architect in the AWS Professional Services Global Mainframe and Midrange Modernization (GM3) team. Kevin currently focuses on leading and delivering mainframe and midrange application modernization for large enterprise customers.
This post provides a clear path for customers who are evaluating and adopting Graviton2 instance types for performance improvements and cost-optimization.
Graviton2 processors are custom designed by AWS using 64-bit Arm Neoverse N1 cores. They power the T4g*, M6g*, R6g*, and C6g* Amazon Elastic Compute Cloud (Amazon EC2) instance types and offer up to 40% better price performance over the current generation of x86-based instances in a variety of workloads, such as high-performance computing, application servers, media transcoding, in-memory caching, gaming, and more.
More and more customers want to make the move to Graviton2 to take advantage of these performance optimizations while saving money.
During the transition process, a great benefit AWS provides is the ability to perform native builds for each architecture, instead of attempting to cross-compile on homogeneous hardware. This decreases build time and reduces the complexity and cost of setup.
To see this benefit in action, we look at how to build a CI/CD pipeline using AWS CodePipeline and AWS CodeBuild that can build multi-architecture Docker images in parallel to aid you in evaluating and migrating to Graviton2.
Solution overview
With CodePipeline and CodeBuild, we can automate the creation of architecture-specific Docker images, which can be pushed to Amazon Elastic Container Registry (Amazon ECR). The following diagram illustrates this architecture.
The steps in this process are as follows:
Create a sample Node.js application and associated Dockerfile.
Create the buildspec files that contain the commands that CodeBuild runs.
Create three CodeBuild projects to automate each of the following steps:
CodeBuild for x86 – Creates an x86 Docker image and pushes it to Amazon ECR.
CodeBuild for arm64 – Creates an Arm64 Docker image and pushes it to Amazon ECR.
CodeBuild for manifest list – Creates a Docker manifest list, annotates the list, and pushes to Amazon ECR.
Automate the orchestration of these projects with CodePipeline.
Prerequisites
The prerequisites for this solution are as follows:
The correct AWS Identity and Access Management (IAM) role permissions for your account allowing for the creation of the CodePipeline pipeline, CodeBuild projects, and Amazon ECR repositories
An Amazon ECR repository named multi-arch-test
A source control service such as AWS CodeCommit or GitHub that CodeBuild and CodePipeline can interact with
Creating a sample Node.js application and associated Dockerfile
For this post, we create a sample “Hello World” application that self-reports the processor architecture. We work in the local folder that is cloned from our source repository as specified in the prerequisites.
In your preferred text editor, add a new file with the following Node.js code:
// Hello World sample app.
const http = require('http');

const port = 3000;

const server = http.createServer((req, res) => {
  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain');
  res.end(`Hello World. This processor architecture is ${process.arch}`);
});

server.listen(port, () => {
  console.log(`Server running on processor architecture ${process.arch}`);
});
Save the file in the root of your source repository and name it app.js.
Commit the changes to Git and push the changes to our source repository. See the following code:
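A typical sequence (the commit message is illustrative):
git add app.js
git commit -m "Adding the Node.js sample application."
git push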
We also need to create a sample Dockerfile that instructs the docker build command how to build the Docker images. We use the default Node.js image tag for version 14.
In a text editor, add a new file with the following code:
# Sample nodejs application
FROM node:14
WORKDIR /usr/src/app
COPY package*.json app.js ./
RUN npm install
EXPOSE 3000
CMD ["node", "app.js"]
Save the file in the root of the source repository and name it Dockerfile. Make sure it is Dockerfile with no extension.
Commit the changes to Git and push the changes to our source repository:
git add .
git commit -m "Adding Dockerfile to host the Node.js sample application."
git push
Creating a build specification file for your application
It’s time to create and add a buildspec file to our source repository. We want to use a single buildspec.yml file for building, tagging, and pushing the Docker images to Amazon ECR for both target native architectures, x86, and Arm64. We use CodeBuild to inject environment variables, some of which need to be changed for each architecture (such as image tag and image architecture).
A buildspec is a collection of build commands and related settings, in YAML format, that CodeBuild uses to run a build. For more information, see Build specification reference for CodeBuild.
The buildspec we add instructs CodeBuild to do the following:
install phase – Update the yum package manager
pre_build phase – Sign in to Amazon ECR using the IAM role assumed by CodeBuild
build phase – Build the Docker image using the Docker CLI and tag the newly created Docker image
post_build phase – Push the Docker image to our Amazon ECR repository
We first need to add the buildspec.yml file to our source repository.
In a text editor, add a new file with the following build specification:
version: 0.2

phases:
  install:
    commands:
      - yum update -y
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION)
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...
      - docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG .
      - docker tag $IMAGE_REPO_NAME:$IMAGE_TAG $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing the Docker image...
      - docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
Save the file in the root of the repository and name it buildspec.yml.
Because we specify environment variables in the CodeBuild project, we don’t need to hard code any values in the buildspec file.
Commit the changes to Git and push the changes to our source repository:
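For example (the commit message is illustrative):
git add buildspec.yml
git commit -m "Adding the application buildspec."
git push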
Creating a build specification file for your manifest list creation
Next we create a buildspec file that instructs CodeBuild to create a Docker manifest list, and associate that manifest list with the Docker images that the buildspec file builds.
A manifest list is a list of image manifests that is created by specifying one or more (ideally more than one) image names. You can then use it in the same way as an image name in docker pull and docker run commands, for example. For more information, see manifest create.
As of this writing, manifest creation is an experimental feature of the Docker command line interface (CLI).
Experimental features provide early access to future product functionality. These features are intended only for testing and feedback because they may change between releases without warning or be removed entirely from a future release. Experimental features must not be used in production environments. For more information, see Experimental features.
When creating the CodeBuild project for manifest list creation, we specify a buildspec file name override as buildspec-manifest.yml. This buildspec instructs CodeBuild to do the following:
install phase – Update the yum package manager
pre_build phase – Sign in to Amazon ECR using the IAM role assumed by CodeBuild
build phase – Perform three actions:
Set environment variable to enable Docker experimental features for the CLI
Create the Docker manifest list using the Docker CLI
Annotate the manifest list to add the architecture-specific Docker image references
post_build phase – Push the Docker image to our Amazon ECR repository and use docker manifest inspect to echo out the contents of the manifest list from Amazon ECR
We first need to add the buildspec-manifest.yml file to our source repository.
In a text editor, add a new file with the following build specification:
version: 0.2
# Based on the Docker documentation, must include the DOCKER_CLI_EXPERIMENTAL environment variable
# https://docs.docker.com/engine/reference/commandline/manifest/

phases:
  install:
    commands:
      - yum update -y
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION)
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker manifest...
      - export DOCKER_CLI_EXPERIMENTAL=enabled
      - docker manifest create $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest-arm64v8 $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest-amd64
      - docker manifest annotate --arch arm64 $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest-arm64v8
      - docker manifest annotate --arch amd64 $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest-amd64
  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing the Docker image...
      - docker manifest push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME
      - docker manifest inspect $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME
Save the file in the root of the repository and name it buildspec-manifest.yml.
Commit the changes to Git and push the changes to our source repository:
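For example (the commit message is illustrative):
git add buildspec-manifest.yml
git commit -m "Adding the manifest list buildspec."
git push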
Now we have created a single buildspec.yml file for building, tagging, and pushing the Docker images to Amazon ECR for both target native architectures: x86 and Arm64. This file is shared by two of the three CodeBuild projects that we create. We use CodeBuild to inject environment variables, some of which need to be changed for each architecture (such as image tag and image architecture). We also want to use a single Dockerfile, regardless of the architecture, and we need to ensure any third-party libraries are present and compiled correctly for the target architecture.
For more information about third-party libraries and software versions that have been optimized for Arm, see the Getting started with AWS Graviton GitHub repo.
We use the same environment variable names for the CodeBuild projects, but each project has specific values, as detailed in the following table. You need to modify these values to your numeric AWS account ID, the AWS Region where your Amazon ECR registry endpoint is located, and your Amazon ECR repository name. The instructions for adding the environment variables in the CodeBuild projects are in the following sections.
Environment Variable | x86 Project values | Arm64 Project values | manifest Project values
AWS_DEFAULT_REGION | us-east-1 | us-east-1 | us-east-1
AWS_ACCOUNT_ID | 111111111111 | 111111111111 | 111111111111
IMAGE_REPO_NAME | multi-arch-test | multi-arch-test | multi-arch-test
IMAGE_TAG | latest-amd64 | latest-arm64v8 | latest
The image we use in this post uses architecture-specific tags with the term latest. This is for demonstration purposes only; it’s best to tag the images with an explicit version or another meaningful reference.
CodeBuild for x86
We start with creating a new CodeBuild project for x86 on the CodeBuild console.
CodeBuild looks for a file named buildspec.yml by default, unless overridden. For these first two CodeBuild projects, we rely on that default and don’t specify the buildspec name.
On the CodeBuild console, choose Create build project.
For Project name, enter a unique project name for your build project, such as node-x86.
To add tags, add them under Additional Configuration.
Choose a Source provider (for this post, we choose GitHub).
For Environment image, choose Managed image.
Select Amazon Linux 2.
For Runtime(s), choose Standard.
For Image, choose aws/codebuild/amazonlinux2-x86_64-standard:3.0.
This is an x86 build image.
Select Privileged.
For Service role, choose New service role.
Enter a name for the new role (one is created for you), such as CodeBuildServiceRole-nodeproject.
We reuse this same service role for the other CodeBuild projects associated with this project.
Expand Additional configurations and go to the Environment variables section.
Create the following Environment variables:
Name | Value | Type
AWS_DEFAULT_REGION | us-east-1 | Plaintext
AWS_ACCOUNT_ID | 111111111111 | Plaintext
IMAGE_REPO_NAME | multi-arch-test | Plaintext
IMAGE_TAG | latest-amd64 | Plaintext
Choose Create build project.
Attaching the IAM policy
Now that we have created the CodeBuild project, we need to adjust the new service role that was just created and attach an IAM policy so that it can interact with the Amazon ECR API.
On the CodeBuild console, choose the node-x86 project.
Choose the Build details tab.
Under Service role, choose the link that looks like arn:aws:iam::111111111111:role/service-role/CodeBuildServiceRole-nodeproject.
A new browser tab should open.
Choose Attach policies.
In the Search field, enter AmazonEC2ContainerRegistryPowerUser.
Select AmazonEC2ContainerRegistryPowerUser.
Choose Attach policy.
CodeBuild for arm64
Now we move on to creating a new (second) CodeBuild project for Arm64.
On the CodeBuild console, choose Create build project.
For Project name, enter a unique project name, such as node-arm64.
If you want to add tags, add them under Additional Configuration.
Choose a Source provider (for this post, choose GitHub).
For Environment image, choose Managed image.
Select Amazon Linux 2.
For Runtime(s), choose Standard.
For Image, choose aws/codebuild/amazonlinux2-aarch64-standard:2.0.
This is an Arm build image and is different from the image selected in the previous CodeBuild project.
Select Privileged.
For Service role, choose Existing service role.
Choose CodeBuildServiceRole-nodeproject.
Select Allow AWS CodeBuild to modify this service role so it can be used with this build project.
Expand Additional configurations and go to the Environment variables section.
Create the following Environment variables:
Name | Value | Type
AWS_DEFAULT_REGION | us-east-1 | Plaintext
AWS_ACCOUNT_ID | 111111111111 | Plaintext
IMAGE_REPO_NAME | multi-arch-test | Plaintext
IMAGE_TAG | latest-arm64v8 | Plaintext
Choose Create build project.
CodeBuild for manifest list
For the last CodeBuild project, we create a Docker manifest list, associate it with the Docker images that the preceding projects create, and push the manifest list to Amazon ECR. This project uses the buildspec-manifest.yml file created earlier.
On the CodeBuild console, choose Create build project.
For Project name, enter a unique project name for your build project, such as node-manifest.
If you want to add tags, add them under Additional Configuration.
Choose a Source provider (for this post, choose GitHub).
For Environment image, choose Managed image.
Select Amazon Linux 2.
For Runtime(s), choose Standard.
For Image, choose aws/codebuild/amazonlinux2-x86_64-standard:3.0.
This is an x86 build image.
Select Privileged.
For Service role, choose Existing service role.
Choose CodeBuildServiceRole-nodeproject.
Select Allow AWS CodeBuild to modify this service role so it can be used with this build project.
Expand Additional configurations and go to the Environment variables section.
Create the following Environment variables:
Name | Value | Type
AWS_DEFAULT_REGION | us-east-1 | Plaintext
AWS_ACCOUNT_ID | 111111111111 | Plaintext
IMAGE_REPO_NAME | multi-arch-test | Plaintext
IMAGE_TAG | latest | Plaintext
For Buildspec name – optional, enter buildspec-manifest.yml to override the default.
Choose Create build project.
Setting up CodePipeline
Now we can move on to creating a pipeline to orchestrate the builds and manifest creation.
On the CodePipeline console, choose Create pipeline.
For Pipeline name, enter a unique name for your pipeline, such as node-multi-architecture.
For Service role, choose New service role.
Enter a name for the new role (one is created for you). For this post, we use the generated role name CodePipelineServiceRole-nodeproject.
Select Allow AWS CodePipeline to create a service role so it can be used with this new pipeline.
Choose Next.
Choose a Source provider (for this post, choose GitHub).
If you don’t have any existing Connections to GitHub, select Connect to GitHub and follow the wizard.
Choose your Branch name (for this post, I choose main, but your branch might be different).
For Output artifact format, choose CodePipeline default.
Choose Next.
You should now be on the Add build stage page.
For Build provider, choose AWS CodeBuild.
Verify the Region is your Region of choice (for this post, I use US East (N. Virginia)).
For Project name, choose node-x86.
For Build type, select Single build.
Choose Next.
You should now be on the Add deploy stage page.
Choose Skip deploy stage.
A pop-up appears that reads Your pipeline will not include a deployment stage. Are you sure you want to skip this stage?
Choose Skip.
Choose Create pipeline.
CodePipeline immediately attempts to run a build. You can let it continue without worry if it fails. We are only part of the way done with the setup.
Adding an additional build step
We need to add the additional build step for the Arm CodeBuild project in the Build stage.
On the CodePipeline console, choose the node-multi-architecture pipeline.
Choose Edit to start editing the pipeline stages.
You should now be on the Editing: node-multi-architecture page.
For the Build stage, choose Edit stage.
Choose + Add action.
For Action name, enter Build-arm64.
For Action provider, choose AWS CodeBuild.
Verify your Region is correct.
For Input artifacts, select SourceArtifact.
For Project name, choose node-arm64.
For Build type, select Single build.
Choose Done.
Choose Save.
A pop-up appears that reads Saving your changes cannot be undone. If the pipeline is running when you save your changes, that execution will not complete.
Choose Save.
Updating the first build action name
This step is optional. The CodePipeline wizard doesn’t allow you to enter your Build action name during creation, but you can update the Build stage’s first build action to have consistent naming.
Choose Edit to start editing the pipeline stages.
Choose the Edit icon.
For Action name, enter Build-x86.
Choose Done.
Choose Save.
A pop-up appears that says Saving your changes cannot be undone. If the pipeline is running when you save your changes, that execution will not complete.
Choose Save.
Adding the project
Now we add the CodeBuild project for manifest creation and publishing.
On the CodePipeline console, choose the node-multi-architecture pipeline.
Choose Edit to start editing the pipeline stages.
Choose + Add stage below the Build stage.
Set the Stage name to Manifest.
Choose +Add action group.
For Action name, enter Create-manifest.
For Action provider, choose AWS CodeBuild.
Verify your Region is correct.
For Input artifacts, select SourceArtifact.
For Project name, choose node-manifest.
For Build type, select Single build.
Choose Done.
Choose Save.
A pop-up appears that reads Saving your changes cannot be undone. If the pipeline is running when you save your changes, that execution will not complete.
Choose Save.
Testing the pipeline
Now let’s verify everything works as planned.
In the pipeline details page, choose Release change.
This runs the pipeline in stages. The process should take a few minutes to complete. The pipeline should show each stage as Succeeded.
Now we want to inspect the output of the Create-manifest action that runs the CodeBuild project for manifest creation.
Choose Details in the Create-manifest action.
This opens the CodeBuild build details.
Under Build logs, we should see the output from the manifest inspect command we ran as the last step in buildspec-manifest.yml. See the following sample log:
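The output should resemble the following manifest list JSON (the digests and sizes shown here are placeholders, not real values):
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1778,
         "digest": "sha256:1111111111111111111111111111111111111111111111111111111111111111",
         "platform": { "architecture": "amd64", "os": "linux" }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1778,
         "digest": "sha256:2222222222222222222222222222222222222222222222222222222222222222",
         "platform": { "architecture": "arm64", "os": "linux" }
      }
   ]
}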
Cleaning up
To avoid incurring future charges, clean up the resources created as part of this post.
On the CodePipeline console, choose the pipeline node-multi-architecture.
Choose Delete pipeline.
When prompted, enter delete.
Choose Delete.
On the CodeBuild console, choose the Build project node-x86.
Choose Delete build project.
When prompted, enter delete.
Choose Delete.
Repeat the deletion process for Build projects node-arm64 and node-manifest.
Next we delete the Docker images we created and pushed to Amazon ECR. Be careful to not delete a repository that is being used for other images.
On the Amazon ECR console, choose the repository multi-arch-test.
You should see a list of Docker images.
Select latest, latest-arm64v8, and latest-amd64.
Choose Delete.
When prompted, enter delete.
Choose Delete.
Finally, we remove the IAM roles that we created.
On the IAM console, choose Roles.
In the search box, enter CodePipelineServiceRole-nodeproject.
Select the role and choose Delete role.
When prompted, choose Yes, delete.
Repeat these steps for the role CodeBuildServiceRole-nodeproject.
Conclusion
To summarize, we successfully created a pipeline to create multi-architecture Docker images for both x86 and arm64. We referenced them via annotation in a Docker manifest list and stored them in Amazon ECR. The Docker images were based on a single Docker file that uses environment variables as parameters to allow for Docker file reuse.
For more information about these services, see the following:
This post was co-written with Brian Nutt, Senior Software Engineer and Kao Makino, Principal Performance Engineer, both at Snowflake.
Transactional databases are a key component of any production system. Maintaining data integrity while rows are read and written at a massive scale is a major technical challenge for these types of databases. To ensure their stability, it’s necessary to test many different scenarios and configurations. Simulating as many of these as possible allows engineers to quickly catch defects and build resilience. But the Holy Grail is to accomplish this at scale and within a timeframe that allows your developers to iterate quickly.
Snowflake is a single, integrated data platform delivered as a service. Built from the ground up for the cloud, Snowflake’s unique multi-cluster shared data architecture delivers the performance, scale, elasticity, and concurrency that today’s organizations require. It features storage, compute, and global services layers that are physically separated but logically integrated. Data workloads scale independently from one another, making it an ideal platform for data warehousing, data lakes, data engineering, data science, modern data sharing, and developing data applications.
Developing a simulation-based testing and validation framework
Snowflake’s cloud services layer is composed of a collection of services that manage virtual warehouses, query optimization, and transactions. This layer relies on rich metadata stored in FDB.
Prior to the creation of the simulation framework, Project Joshua, FDB developers ran tests on their laptops and were limited by the number they could run. Additionally, there was a scheduled nightly job for running further tests.
Amazon EKS as the foundation
Snowflake’s platform team decided to use Kubernetes to build Project Joshua. Their focus was on helping engineers run their workloads instead of spending cycles on the management of the control plane. They turned to Amazon EKS to achieve their scalability needs. This was a crucial success criterion for Project Joshua since at any point in time there could be hundreds of nodes running in the cluster. Snowflake utilizes the Kubernetes Cluster Autoscaler to dynamically scale worker nodes in minutes to work through Joshua’s queue of test requests.
With the integration of Amazon EKS and Amazon Virtual Private Cloud (Amazon VPC), Snowflake is able to control access to the required resources. For example: the database that serves Joshua’s test queues is external to the EKS cluster. By using the Amazon VPC CNI plugin, each pod receives an IP address in the VPC and Snowflake can control access to the test queue via security groups.
To achieve its desired performance, Snowflake created its own custom pod scaler, which responds quicker to changes than using a custom metric for pod scheduling.
The agent scaler is responsible for monitoring a test queue in the coordination database (which, coincidentally, is also FDB) to schedule Joshua agents. The agent scaler communicates directly with Amazon EKS using the Kubernetes API to schedule tests in parallel.
Joshua agents (one agent per pod) are responsible for pulling tests from the test queue, executing, and reporting results. Tests are run one at a time within the EKS Cluster until the test queue is drained.
Achieving scale and cost savings with Amazon EC2 Spot
A Spot Fleet is a collection—or fleet—of Amazon EC2 Spot instances that Joshua uses to make the infrastructure more reliable and cost effective. Spot Fleet is used to reduce the cost of worker nodes by running a variety of instance types.
With Spot Fleet, Snowflake requests a combination of different instance types to help ensure that demand gets fulfilled. These options make Fleet more tolerant of surges in demand for instance types. If a surge occurs it will not significantly affect tasks since Joshua is agnostic to the type of instance and can fall back to a different instance type and still be available.
For Spot capacity, Snowflake uses the capacity-optimized allocation strategy to automatically launch Spot Instances into the most available pools by looking at real-time capacity data and predicting which are the most available. This helps Snowflake quickly shift to the instance types that are most available in the Spot market, instead of spending time contending for the cheapest instances, at the cost of a potentially higher price.
Overcoming hurdles
Snowflake’s usage of a public container registry posed a scalability challenge. When starting hundreds of worker nodes, each node needs to pull images from the public registry. This can lead to a potential rate limiting issue when all outbound traffic goes through a NAT gateway.
For example, consider 1,000 nodes pulling a 10 GB image. Each pull request requires each node to download the image across the public internet. Some issues that need to be addressed are latency, reliability, and increased costs due to the additional time to download an image for each test. Also, container registries can become unavailable or may rate-limit download requests. Lastly, images are pulled through public internet and other services in the cluster can experience pulling issues.
For anything more than a minimal workload, a local container registry is needed. If an image is first pulled from the public registry and then pushed to a local registry (cache), it only needs to be pulled once from the public registry; all worker nodes then benefit from a local pull. That’s why Snowflake decided to replicate images to Amazon ECR, a fully managed Docker container registry, providing a reliable local registry to store images. An additional benefit of the local registry is that it’s not exclusive to Joshua; all platform components required for Snowflake clusters can be cached in the local ECR registry. For additional security and performance, Snowflake uses AWS PrivateLink to keep all network traffic from ECR to the worker nodes within the AWS network. It also resolved rate-limiting issues from pulling images from a public registry with unauthenticated requests, unblocking other cluster nodes from pulling critical images for operation.
Conclusion
Project Joshua allows Snowflake to enable developers to test more scenarios without having to worry about the management of the infrastructure. Snowflake’s engineers can schedule thousands of test simulations and configurations to catch bugs faster. FDB is a key component of the Snowflake stack and Project Joshua helps make FDB more stable and resilient. Additionally, Amazon EC2 Spot has provided non-trivial cost savings to Snowflake vs. running on-demand or buying reserved instances.
If you want to know more about how Snowflake built its high performance data warehouse as a Service on AWS, watch the This is My Architecture video below.
When I am delivering an introduction to the AWS Cloud for developers, I usually spend a bit of time to mention and to demonstrate Amazon Lightsail. It is by far the easiest way to get started on AWS. It allows you to get your application running on your own virtual server in a matter of minutes. Today, we are adding the possibility to deploy your container-based workloads on Amazon Lightsail. You can now deploy your container images to the cloud with the same simplicity and the same bundled pricing Amazon Lightsail provides for your virtual servers.
Amazon Lightsail is an easy-to-use cloud service that offers you everything needed to deploy an application or website, for a cost effective and easy to understand monthly plan. It is ideal to deploy simple workloads, websites, or to get started with AWS. The typical Lightsail customers range from developers to small businesses or startups who are looking to get quickly started in the cloud and AWS. At any time, you can later adopt the broad AWS Services when you are getting more familiar with the AWS cloud.
When deploying to Lightsail, you can choose between six operating systems (four Linux distributions, FreeBSD, or Windows), seven applications (such as WordPress, Drupal, Joomla, Plesk…), and seven stacks (such as Node.js, LAMP, GitLab, Django…). But what about Docker containers?
Starting today, Amazon Lightsail offers a simple way for developers to deploy their containers to the cloud. All you need to provide is a Docker image for your containers, and Lightsail runs it for you in the cloud. Amazon Lightsail gives you an HTTPS endpoint that is ready to serve your application running in the cloud container. It automatically sets up a load-balanced TLS endpoint and takes care of the TLS certificate. It replaces unresponsive containers for you automatically, it assigns a DNS name to your endpoint, it maintains the old version until the new version is healthy and ready to go live, and more.
Let’s see how it works by deploying a simple Python web app as a container. I assume you have the AWS Command Line Interface (CLI) and Docker installed on your laptop. Python is not required on your laptop; it is installed only inside the container.
I first create a Python REST API, using the Flask simple application framework. Any programming language and any framework that can run inside a container works too. I just choose Python and Flask because they are simple and elegant.
You can safely copy and paste the following commands:
mkdir helloworld-python
cd helloworld-python
# create a simple Flask application in helloworld.py
echo "
from flask import Flask, request
from flask_restful import Resource, Api
app = Flask(__name__)
api = Api(app)
class Greeting (Resource):
def get(self):
return { "message" : "Hello Flask API World!" }
api.add_resource(Greeting, '/') # Route_1
if __name__ == '__main__':
app.run('0.0.0.0','8080')
" > helloworld.py
Then I create a Dockerfile that contains the steps and information required to build the container image:
# create a Dockerfile
echo '
FROM python:3
ADD helloworld.py /
RUN pip install flask
RUN pip install flask_restful
EXPOSE 8080
CMD [ "python", "./helloworld.py"]
' > Dockerfile
Now I can build my container:
docker build -t lightsail-hello-world .
The build command outputs many lines while it builds the container; it eventually terminates with the following message (the actual ID differs):
Successfully built 7848e055edff
Successfully tagged lightsail-hello-world:latest
I test the container by launching it on my laptop:
docker run -it --rm -p 8080:8080 lightsail-hello-world
and connect a browser to localhost:8080
When I am satisfied with my app, I push the container to Docker Hub.
docker tag lightsail-hello-world sebsto/lightsail-hello-world
docker login
docker push sebsto/lightsail-hello-world
Now that I have a container ready on Docker Hub, let’s create a Lightsail Container Service.
I point my browser to the Amazon Lightsail console. I can see container services already deployed and can manage them. To create a new service, I click Create container service:
On the next screen, I select the size of the container I want to use, in terms of vCPU and memory available to my application. I also select the number of container instances I want to run in parallel for high availability or scalability reasons. I can change the number of container instances or their power (vCPU and RAM) at any time, without interrupting the service. Both these parameters impact the price AWS charges you per month. The price is indicated and dynamically adjusted on the screen, as shown on the following video.
Slightly lower on the screen, I choose to skip the deployment for now. I give a name for the service ("hello-world"). I click Create container service.
Once the service is created, I click Create your first deployment to create a deployment. A deployment is a combination of a specific container image and version to be deployed on the service I just created.
I choose a name for my image and give the address of the image on Docker Hub, using the format user/<my container name>:tag. This is also where I can enter environment variables, port mapping, or a launch command.
My container is offering a network service on port TCP 8080, so I add that port to the deployment configuration. The Open Ports configuration specifies which ports and protocols are open to other systems in my container’s network. Other containers or virtual machines can only connect to my container when the port is explicitly configured in the console or EXPOSE‘d in my Dockerfile. None of these ports are exposed to the public internet.
But in this example, I also want Lightsail to route the traffic from the public internet to this container. So, I add this container as an endpoint of the hello-world service I just created. The endpoint is automatically configured for TLS, there is no certificate to install or manage.
I can add up to 10 containers for one single deployment. When ready, I click Save and deploy.
After a while, my deployment is active and I can test the endpoint.
The endpoint DNS address is available on the top-right side of the console. If I want to, I can configure my own DNS domain name.
I open another tab in my browser and point it at the https endpoint URL:
When I want to deploy a new version, I use the console again to modify the deployment. I spare you the details of modifying the application code and building and pushing a new version of the container. Let’s say I have my second container image version available under the name sebsto/lightsail-hello-world:v2. Back in the Amazon Lightsail console, I click Deployments, then Modify your Deployments. I enter the full name, including the tag, of the new version of the container image and click Save and Deploy.
After a while, the new version is deployed and automatically activated.
I open a new tab in my browser and I point it to the endpoint URI available on the top-right corner of Amazon Lightsail console. I observe the JSON version is different. It now has a version attribute with a value of 2.
When something goes wrong during my deployment, Amazon Lightsail automatically keeps the last deployment active, to avoid any service interruption. I can also manually activate a previous deployment version to reverse any undesired changes.
I just deployed my first container image from Docker Hub. I can also manage my services and deploy local container images from my laptop using the AWS Command Line Interface (CLI). To push container images to my Amazon Lightsail container service directly from my laptop, I must install the lightsailctl plugin. (TL;DR: curl, cp, and chmod are your friends here; I also maintain a Dockerfile to use the CLI inside a container.)
To create, list, or delete a container service, I type:
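For example (the power and scale values here are illustrative choices for this demo):
aws lightsail create-container-service --service-name hello-world --power nano --scale 1
aws lightsail get-container-services
aws lightsail delete-container-service --service-name hello-world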
I can also use the CLI to deploy container images directly from my laptop. Be sure lightsailctl is installed.
# Build the new version of my image (v3)
docker build -t sebsto/lightsail-hello-world:v3 .

# Push the new image.
aws lightsail push-container-image --service-name hello-world --label hello-world --image sebsto/lightsail-hello-world:v3
After a while, I see the output:
Image "sebsto/lightsail-hello-world:v3" registered. Refer to this image as ":hello-world.hello-world.1" in deployments.
I create an lc.json file to hold the details of the deployment configuration. It is aligned with the options I see on the console. I use the name given by the previous command in the image property:
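A minimal sketch of what lc.json could contain, using the image name registered earlier; the exact keys and any health check settings should be checked against the create-container-service-deployment documentation:
{
    "serviceName": "hello-world",
    "containers": {
        "hello-world": {
            "image": ":hello-world.hello-world.1",
            "ports": {
                "8080": "HTTP"
            }
        }
    },
    "publicEndpoint": {
        "containerName": "hello-world",
        "containerPort": 8080
    }
}
I then create the deployment from this file:
aws lightsail create-container-service-deployment --cli-input-json file://lc.json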
After a while, the status becomes ACTIVE, and I can test my endpoint.
curl https://hello-world.nxxxxxxxxxxx.lightsail.ec2.aws.dev/
{"message": "Hello Flask API World!", "version": 3}
If you plan to later deploy your container to Amazon ECS or Amazon Elastic Kubernetes Service, no changes are required. You can pull the container image from your repository, just like you do with Amazon Lightsail.
You can deploy your containers on Lightsail in all AWS Regions where Amazon Lightsail is available. As of today, this is US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), and Europe (Paris).
As usual when using Amazon Lightsail, pricing is easy to understand and predictable. Amazon Lightsail Containers have a fixed price per month per container, depending on the size of the container (the vCPU/memory combination you use). You are charged on the prorated hours you keep the service running. The price per month is the maximum price you will be charged for running your service 24/7. The prices are identical in all AWS Regions. They range from $7 per month for a Nano container (512 MB of memory and 0.25 vCPU) to $160 per month for an X-Large container (8 GB of memory and 4 vCPU cores). This price includes not only the container itself, but also the load balancer, the DNS, and a generous data transfer tier. The full details are on the Lightsail pricing page.
I can’t wait to discover what solutions you will build and deploy on Amazon Lightsail Containers!
Earlier this year we hosted the first serverless themed virtual event, the Serverless-First Function. We enjoyed the opportunity to virtually connect with our customers so much that we want to do it again. This time, we’re expanding the scope to feature serverless, containers, and front-end development content. The Modern Applications Online Event is scheduled for November 4-5, 2020.
This free, two-day event addresses how to build and operate modern applications at scale across your organization, enabling you to become more agile and respond to change faster. The event covers topics including serverless application development, containers best practices, front-end web development and more. If you missed the containers or serverless virtual events earlier this year, this is great opportunity to watch the content and interact directly with expert moderators. The full agenda is listed below.
Move fast and ship things: Using serverless to increase speed and agility within your organization In this session, Adrian Cockcroft demonstrates how you can use serverless to build modern applications faster than ever. Cockcroft uses real-life examples and customer stories to debunk common misconceptions about serverless.
Eliminating busywork at the organizational level: Tips for using serverless to its fullest potential In this session, David Yanacek discusses key ways to unlock the full benefits of serverless, including building services around APIs and using service-oriented architectures built on serverless systems to remove the roadblocks to continuous innovation.
Faster Mobile and Web App Development with AWS Amplify In this session, Brice Pellé, introduces AWS Amplify, a set of tools and services that enables mobile and front-end web developers to build full stack serverless applications faster on AWS. Learn how to accelerate development with AWS Amplify’s use-case centric open-source libraries and CLI, and its fully managed web hosting service with built-in CI/CD.
Built Serverless-First: How Workgrid Software transformed from a Liberty Mutual project to its own global startup Connected through a central IT team, Liberty Mutual has embraced serverless since AWS Lambda’s inception in 2014. In this session, Gillian McCann discusses Workgrid’s serverless journey—from internal microservices project within Liberty Mutual to independent business entity, all built serverless-first. Introduction by AWS Principal Serverless SA, Sam Dengler.
Market insights: A conversation with Forrester analyst Jeffrey Hammond & Director of Product for Lambda Ajay Nair In this session, guest speaker Jeffrey Hammond and Director of Product for AWS Lambda, Ajay Nair, discuss the state of serverless, Lambda-based architectural approaches, Functions-as-a-Service platforms, and more. You’ll learn about the high-level and enduring enterprise patterns and advancements that analysts see driving the market today and determining the market in the future.
AWS Fargate Platform Version 1.4 In this session we will go through a brief introduction of AWS Fargate, what it is, its relation to EKS and ECS and the problems it addresses for customers. We will later introduce the concept of Fargate “platform versions” and we will then dive deeper into the new features that the new platform version 1.4 enables.
Persistent Storage on Containers Containerizing applications that require data persistence or shared storage is often challenging since containers are ephemeral in nature, are scaled in and out dynamically, and typically clear any saved state when terminated. In this session you will learn about Amazon Elastic File System (EFS), a fully managed, elastic, highly-available, scalable, secure, high-performance, cloud native, shared file system that enables data to be persisted separately from compute for your containerized applications.
Security Best Practices on Amazon ECR In this session, we will cover best practices with securing your container images using ECR. Learn how user access controls, image assurance, and image scanning contribute to securing your images.
Application Level Design
Thursday, November 5, 2020, 9:00 AM – 1:00 PM PT
Building a Live Streaming Platform with Amplify Video In this session, learn how to build a live-streaming platform using Amplify Video and the platform powering it, AWS Elemental Live. Amplify video is an open source plugin for the Amplify CLI that makes it easy to incorporate video streaming into your mobile and web applications powered by AWS Amplify.
Building Serverless Web Applications In this session, follow along as Ben Smith shows you how to build and deploy a completely serverless web application from scratch. The application will span from a mobile friendly front end to complex business logic on the back end.
Automating serverless application development workflows In this talk, Eric Johnson breaks down how to think about CI/CD when building serverless applications with AWS Lambda and Amazon API Gateway. This session will cover using technologies like AWS SAM to build CI/CD pipelines for serverless application back ends.
Observability for your serverless applications In this session, Julian Wood walks you through how to add monitoring, logging, and distributed tracing to your serverless applications. Join us to learn how to track platform and business metrics, visualize the performance and operations of your application, and understand which services should be optimized to improve your customer’s experience.
Happy Building with AWS Copilot The hard part’s done. You and your team have spent weeks pouring over pull requests, building micro-services and containerizing them. Congrats! But what do you do now? How do you get those services on AWS? Copilot is a new command line tool that makes building, developing and operating containerized apps on AWS a breeze. In this session, we’ll talk about how Copilot helps you and your team set up modern applications that follow AWS best practices
CDK for Kubernetes The CDK for Kubernetes (cdk8s) is a new open-source software development framework for defining Kubernetes applications and resources using familiar programming languages. Applications running on Kubernetes are composed of dozens of resources maintained through carefully maintained YAML files. As applications evolve and teams grow, these YAML files become harder and harder to manage. It’s also really hard to reuse and create abstractions through config files — copying & pasting from previous projects is not the solution! In this webinar, the creators of cdk8s show you how to define your first cdk8s application, define reusable components called “constructs” and generally say goodbye (and thank you very much) to writing in YAML.
Machine Learning on Amazon EKS Amazon EKS has quickly emerged as a leading choice for machine learning workloads. In this session, we’ll walk through some of the recent ML related enhancements the Kubernetes team at AWS has released. We will then dive deep with walkthroughs of how to optimize your machine learning workloads on Amazon EKS, including demos of the latest features we’ve contributed to popular open source projects like Spark and Kubeflow
Deep Dive on Amazon ECS Capacity Providers In this talk, we’ll dive into the different ways that ECS Capacity Providers can enable teams to focus more on their core business, and less on the infrastructure behind the scenes. We’ll look at the benefits and discuss scenarios where Capacity Providers can help solve the problems that customers face when using container orchestration. Lastly, we’ll review what features have been released with Capacity Providers, as well as look ahead at what’s to come.
As machine learning and deep learning models become more sophisticated, hardware acceleration is increasingly required to deliver fast predictions at high throughput. Today, we’re very happy to announce that AWS customers can now use the Amazon EC2 Inf1 instances on Amazon ECS, for high performance and the lowest prediction cost in the cloud. For a few weeks now, these instances have also been available on Amazon Elastic Kubernetes Service.
A primer on EC2 Inf1 instances
Inf1 instances were launched at AWS re:Invent 2019. They are powered by AWS Inferentia, a custom chip built from the ground up by AWS to accelerate machine learning inference workloads.
Inf1 instances are available in multiple sizes, with 1, 4, or 16 AWS Inferentia chips, with up to 100 Gbps network bandwidth and up to 19 Gbps EBS bandwidth. An AWS Inferentia chip contains four NeuronCores. Each one implements a high-performance systolic array matrix multiply engine, which massively speeds up typical deep learning operations such as convolution and transformers. NeuronCores are also equipped with a large on-chip cache, which helps cut down on external memory accesses, saving I/O time in the process. When several AWS Inferentia chips are available on an Inf1 instance, you can partition a model across them and store it entirely in cache memory. Alternatively, to serve multi-model predictions from a single Inf1 instance, you can partition the NeuronCores of an AWS Inferentia chip across several models.
Compiling Models for EC2 Inf1 Instances
To run machine learning models on Inf1 instances, you need to compile them to a hardware-optimized representation using the AWS Neuron SDK. All tools are readily available on the AWS Deep Learning AMI, and you can also install them on your own instances. You’ll find instructions in the Deep Learning AMI documentation, as well as tutorials for TensorFlow, PyTorch, and Apache MXNet in the AWS Neuron SDK repository.
In the demo below, I will show you how to deploy a Neuron-optimized model on an ECS cluster of Inf1 instances, and how to serve predictions with TensorFlow Serving. The model in question is BERT, a state of the art model for natural language processing tasks. This is a huge model with hundreds of millions of parameters, making it a great candidate for hardware acceleration.
Creating an Amazon ECS Cluster
Creating a cluster is the simplest thing: all it takes is a call to the CreateCluster API.
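For example (the cluster name here is one I picked for this walkthrough, not a required value):
aws ecs create-cluster --cluster-name ecs-inf1-demo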
Before we can add Inf1 instances to the cluster, we need a few things:
An Amazon Machine Image (AMI) containing the ECS agent and supporting Inf1 instances. You could build your own, or use the ECS-optimized AMI for Inferentia. In the us-east-1 Region, its ID is ami-04450f16e0cd20356.
A Security Group, opening network ports for TensorFlow Serving (8500 for gRPC, 8501 for HTTP). The identifier for mine is sg-0994f5c7ebbb48270.
If you’d like to have ssh access, your Security Group should also open port 22, and you should pass the name of an SSH key pair. Mine is called admin.
We also need to create a small user data file in order to let instances join our cluster. This is achieved by storing the name of the cluster in an environment variable, itself written to the configuration file of the ECS agent.
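A minimal sketch of that user data, assuming the cluster is named ecs-inf1-demo as above:
#!/bin/bash
# Tell the ECS agent which cluster this instance should join
echo ECS_CLUSTER=ecs-inf1-demo >> /etc/ecs/ecs.config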
$ docker tag neuron-tensorflow-inference 123456789012.dkr.ecr.us-east-1.amazonaws.com/ecs-inf1-demo:latest
$ docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/ecs-inf1-demo:latest
Now, we need to create a task definition in order to run this container on our cluster.
Creating a Task Definition for Inf1 Instances
If you don’t have one already, you should first create an execution role, i.e. a role allowing the ECS agent to perform API calls on your behalf. You can find more information in the documentation. Mine is called ecsTaskExecutionRole.
The full task definition is visible below. As you can see, it holds two containers:
The BERT container that I built,
A sidecar container called neuron-rtd, that allows the BERT container to access NeuronCores present on the Inf1 instance. The AWS_NEURON_VISIBLE_DEVICES environment variable lets you control which ones may be used by the container. You could use it to pin a container on one or several specific NeuronCores.
We’re now ready to run our container, and predict with it.
Running a Container on Inf1 Instances
As this is a prediction service, I want to make sure that it’s always available on the cluster. Instead of simply running a task, I create an ECS Service that will make sure the required number of container copies is running, relaunching them should any failure happen.
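A sketch of that call, assuming the cluster name used earlier and a task definition family registered from the task definition above (the service and family names are placeholders):
aws ecs create-service \
  --cluster ecs-inf1-demo \
  --service-name bert-inf1 \
  --task-definition ecs-inf1-demo \
  --desired-count 1 \
  --launch-type EC2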
A minute later, I see that both task containers are running on the cluster.
Predicting with BERT on ECS and Inf1
The inner workings of BERT are beyond the scope of this post. This particular model expects a sequence of 128 tokens, encoding the words of two sentences we’d like to compare for semantic equivalence.
Here, I’m only interested in measuring prediction latency, so dummy data is fine. I build 100 prediction requests storing a sequence of 128 zeros. Using the IP address of the BERT container, I send them to the TensorFlow Serving endpoint via grpc, and I compute the average prediction time.
Here is the full code.
import numpy as np
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
import time

if __name__ == '__main__':
    channel = grpc.insecure_channel('18.234.61.31:8500')
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'bert'
    i = np.zeros([1, 128], dtype=np.int32)
    request.inputs['input_ids'].CopyFrom(tf.contrib.util.make_tensor_proto(i, shape=i.shape))
    request.inputs['input_mask'].CopyFrom(tf.contrib.util.make_tensor_proto(i, shape=i.shape))
    request.inputs['segment_ids'].CopyFrom(tf.contrib.util.make_tensor_proto(i, shape=i.shape))

    latencies = []
    for i in range(100):
        start = time.time()
        result = stub.Predict(request)
        latencies.append(time.time() - start)
        print("Inference successful: {}".format(i))
    print("Ran {} inferences successfully. Latency average = {}".format(len(latencies), np.average(latencies)))
For convenience, I’m running this code on an EC2 instance based on the Deep Learning AMI. It comes pre-installed with a Conda environment for TensorFlow and TensorFlow Serving, saving me from installing any dependencies.
On average, prediction took 56.5ms. As far as BERT goes, this is pretty good!
Ran 100 inferences successfully. Latency average = 0.05647835493087769
Getting Started
You can now deploy Amazon Elastic Compute Cloud (Amazon EC2) Inf1 instances on Amazon ECS today in the US East (N. Virginia) and US West (Oregon) Regions. As Inf1 deployment progresses, you’ll be able to use them with Amazon ECS in more Regions.
Give this a try, and please send us feedback either through your usual AWS Support contacts, on the AWS Forum for Amazon ECS, or on the container roadmap on GitHub.
We’re sharing an update to the Logical Separation on AWS: Moving Beyond Physical Isolation in the Era of Cloud Computing whitepaper to help customers benefit from the security and innovation benefits of logical separation in the cloud. This paper discusses using a multi-pronged approach—leveraging identity management, network security, serverless and containers services, host and instance features, logging, and encryption—to build logical security mechanisms that meet and often exceed the security results of physical separation of resources and other on-premises security approaches. Public sector and commercial organizations worldwide can leverage these mechanisms to more confidently migrate sensitive workloads to the cloud without the need for physically dedicated infrastructure.
Amazon Web Services (AWS) addresses the concerns driving physical separation requirements through the logical security capabilities we provide customers and the security controls we have in place to protect customer data. The strength of that isolation combined with the automation and flexibility that the isolation provides is on par with or better than the security controls seen in traditional physically separated environments.
The paper also highlights a U.S. Department of Defense (DoD) use case demonstrating how the AWS logical separation capabilities met the intent behind a DoD requirement for dedicated, physically isolated infrastructure for its most sensitive unclassified workloads.
If you have questions or want to learn more, contact your account executive or contact AWS Support. If you have feedback about this post, submit comments in the Comments section below.
This post was contributed by James Bland, Sr. Partner Solutions Architect, AWS, Jay Yeras, Head of Cloud and Cloud Native Solution Architecture, Snyk, and Venkat Subramanian, Group Product Manager, Bitbucket
One of our goals at Atlassian is to make the software delivery and development process easier. This post explains how you can set up a software delivery pipeline using Bitbucket Pipelines and Snyk, a tool that finds and fixes vulnerabilities in open-source dependencies and container images, to deploy secured applications on Amazon Elastic Kubernetes Service (Amazon EKS). By presenting important development information directly on pull requests inside the product, you can proactively diagnose potential issues, shorten test cycles, and improve code quality.
Atlassian Bitbucket Cloud is a Git-based code hosting and collaboration tool, built for professional teams. Bitbucket Pipelines is an integrated CI/CD service that allows you to automatically build, test, and deploy your code. With its best-in-class integrations with Jira, Bitbucket Pipelines allows different personas in an organization to collaborate and get visibility into the deployments. Bitbucket Pipes are small chunks of code that you can drop into your pipeline to make it easier to build powerful, automated CI/CD workflows.
In this post, we go over the following topics:
The importance of security as practices shift-left in DevOps
How embedding security into pull requests helps developer workflows
Deploying an application on Amazon EKS using Bitbucket Pipelines and Snyk
Shift-left on security
Security is usually an afterthought. Developers tend to focus on delivering software first and addressing security issues later when IT Security, Ops, or InfoSec teams discover them. However, research from the 2016 State of DevOps Report shows that you can achieve better outcomes by testing for security earlier in the process within a developer’s workflow. This concept is referred to as shift-left, where left indicates earlier in the process, as illustrated in the following diagram.
There are two main challenges in shifting security left to developers:
Developers aren’t security experts – They develop software in the most efficient way they know how, which can mean importing libraries to take care of lower-level details. And sometimes these libraries import other libraries within them, and so on. This makes it almost impossible for a developer, who is not a security expert, to keep track of security.
It’s time-consuming – There is no automation. Developers have to run tests to understand what’s happening and then figure out how to fix it. This slows them down and takes them away from their core job: building software.
Enabling security into a developer’s workflow
Code Insights is a new feature in Bitbucket that provides contextual information as part of the pull request interface. It surfaces information relevant to a pull request so issues related to code quality or security vulnerabilities can be viewed and acted upon during the code review process. The following screenshot shows Code Insights on the pull request sidebar.
In the security space, we’ve partnered with Snyk, McAfee, Synopsys, and Anchore. When you use any of these integrations in your Bitbucket Pipeline, security vulnerabilities are automatically surfaced within your pull request, prompting developers to address them. By bringing the vulnerability information into the pull request interface before the actual deployment, it’s much easier for code reviewers to assess the impact of the vulnerability and provide actionable feedback.
When security issues are fixed as part of a developer’s workflow instead of post-deployment, it means fewer sev1 incidents, which saves developer time and IT resources down the line, and leads to a better user experience for your customers.
Securing your Atlassian Workflow with Snyk
To demonstrate how you can easily introduce a few steps to your workflow that improve your security posture, we take advantage of the new Snyk integration to Atlassian’s Code Insights and other Snyk integrations to Bitbucket Cloud, Amazon Elastic Container Registry (Amazon ECR; for more information, see Container security with Amazon Elastic Container Registry (ECR): integrate and test), and Amazon EKS (for more information, see Kubernetes workload and image scanning). We reference sample code in a publicly available Bitbucket repository. In this repository, you can find resources such as a multi-stage build Dockerfile for a sample Java web application, a sample bitbucket-pipelines.yml configured to perform Snyk scans and push container images to Amazon ECR, and a reference Kubernetes manifest to deploy your application.
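The pipeline definition itself isn’t reproduced in full in this excerpt; the following is an illustrative sketch of a build step that scans an image with the Snyk pipe and pushes it to Amazon ECR. The pipe versions and variable names shown are assumptions; consult each pipe’s documentation and the sample repository for the real configuration.

pipelines:
  default:
    - step:
        name: Build, scan, and push image
        services:
          - docker
        script:
          - docker build -t "$IMAGE_NAME" .
          # Scan the freshly built image and surface results via Code Insights
          - pipe: snyk/snyk-scan:0.4.3
            variables:
              SNYK_TOKEN: $SNYK_TOKEN
              LANGUAGE: "docker"
              IMAGE_NAME: $IMAGE_NAME
              TARGET_FILE: "Dockerfile"
              CODE_INSIGHTS_RESULTS: "true"
          # Push the scanned image to the private registry in Amazon ECR
          - pipe: atlassian/aws-ecr-push-image:1.1.0
            variables:
              AWS_ACCESS_KEY_ID: $AWS_ACCESS_KEY_ID
              AWS_SECRET_ACCESS_KEY: $AWS_SECRET_ACCESS_KEY
              AWS_DEFAULT_REGION: $AWS_DEFAULT_REGION
              IMAGE_NAME: $IMAGE_NAME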
In the preceding code, we invoke two Bitbucket Pipes to easily configure our pipeline and complete two critical tasks in just a few lines: scan our container image and push to our private registry. This saves time and allows for reusability across repositories while discovering innovative ways to automate our pipelines thanks to an extensive catalog of integrations.
Snyk pipe for Bitbucket Pipelines
In the following use case, we build a container image from the Dockerfile included in the Bitbucket repository and scan the image using the Snyk pipe. We also invoke the aws-ecr-push-image pipe to securely store our image in a private registry on Amazon ECR. When the pipeline runs, we see results as shown in the following screenshot.
If we choose the available report, we can view the detailed results of our Snyk scan. In the following screenshot, we see detailed insights into the content of that report: three high, one medium, and five low-severity vulnerabilities were found in our container image.
Snyk scans of Bitbucket and Amazon ECR repositories
Because we use Snyk’s integration to Amazon ECR and Snyk’s Bitbucket Cloud integration to scan and monitor repositories, we can dive deeper into these results by linking our Dockerfile stored in our Bitbucket repository to the results of our last container image scan. By doing so, we can view recommendations for upgrading our base image, as in the following screenshot.
As a result, we can move past informational insights and on to actionable recommendations. In the preceding screenshot, our current image of jboss/wildfly:11.0.0.Final contains 76 vulnerabilities. We also see two recommendations: a major upgrade to jboss/wildfly:18.0.1.Final, which brings our total vulnerabilities down to 65, and an alternative upgrade, which is less desirable.
We can investigate further by drilling down into the report to view additional context on how a potential vulnerability was introduced, and we can also create an issue in Atlassian Jira Software Cloud. The following screenshot shows a detailed report on the Issues tab.
We can also explore the Dependencies tab for a list of all the direct dependencies, transitive dependencies, and the vulnerabilities those may contain. See the following screenshot.
Snyk scan Amazon EKS configuration
The final step in securing our workflow involves integrating Snyk with Kubernetes and deploying to Amazon EKS with Bitbucket Pipelines. Sample Kubernetes manifest files and a bitbucket-pipelines.yml are available for you to use in the accompanying Bitbucket repository for this post. Our bitbucket-pipelines.yml contains a deployment step; a sketch of such a step is shown below.
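This sketch assumes plain kubectl in a script step with a hypothetical cluster name and Region; the pipeline file in the repository is the authoritative version.

- step:
    name: Deploy to Amazon EKS
    script:
      # Point kubectl at the target EKS cluster, then apply the sample manifest
      - aws eks update-kubeconfig --name my-eks-cluster --region us-east-1
      - kubectl apply -f java-app.yaml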
Now that we have provisioned the supporting infrastructure and invoked kubectl apply -f java-app.yaml to deploy our pods using our container images in Amazon ECR, we can monitor our project details and view some initial results. The following screenshot shows that our initial configuration isn’t secure.
The reason for this is that we didn’t explicitly define a few parameters in our Kubernetes manifest under securityContext. For example, parameters such as readOnlyRootFilesystem, runAsNonRoot, allowPrivilegeEscalation, and capabilities either aren’t defined or are set incorrectly in our template. As a result, we see this in our findings with the FAIL flag. Hovering over these on the Snyk console provides specific insights on how to fix these, for example:
Run as non-root – Whether any containers in the workload have securityContext.runAsNonRoot set to false or unset
Read-only root file system – Whether any containers in the workload have securityContext.readOnlyRootFilesystem set to false or unset
Drop capabilities – Whether all capabilities are dropped and CAP_SYS_ADMIN isn’t added
To save you the trouble of researching this, we provide another sample template, java-app-snyk.yaml, which you can apply against your running pods. The difference in this template is that it includes additional lines in the manifest that address the three failed findings in our report, along the lines of the sketch below.
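The following is an illustrative container-level securityContext that satisfies all three checks; it is a sketch, not the verbatim contents of java-app-snyk.yaml.

securityContext:
  runAsNonRoot: true
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL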
After a subsequent scan, we can validate our changes propagated successfully and our Kubernetes configuration is secure (see the following screenshot).
Conclusion
This post demonstrated how to secure your entire flow proactively with Atlassian Bitbucket Cloud and Snyk. Seamless integrations to Bitbucket Cloud provide you with actionable insights at each step of your development process.
Our customers are increasingly developing their new applications with containers and serverless technologies, and are using modern continuous integration and delivery (CI/CD) tools to automate the software delivery life cycle. They also maintain a large number of existing applications that are built and managed manually or using legacy systems. Maintaining these two sets of applications with disparate tooling adds to operational overhead and slows down the pace of delivering new business capabilities. As much as possible, they want to be able to standardize their management tooling and CI/CD processes across both their existing and new applications, and see the option of packaging their existing applications into containers as the first step towards accomplishing that goal.
However, containerizing existing applications requires a long list of manual tasks such as identifying application dependencies, writing dockerfiles, and setting up build and deployment processes for each application. These manual tasks are time consuming, error prone, and can slow down the modernization efforts.
Today, we are launching AWS App2Container, a new command-line tool that helps containerize existing applications that are running on-premises, in Amazon Elastic Compute Cloud (EC2), or in other clouds, without needing any code changes. App2Container discovers applications running on a server, identifies their dependencies, and generates relevant artifacts for seamless deployment to Amazon ECS and Amazon EKS. It also provides integration with AWS CodeBuild and AWS CodeDeploy to enable a repeatable way to build and deploy containerized applications.
AWS App2Container generates the following artifacts for each application component: application files and folders, Dockerfiles, container images in Amazon Elastic Container Registry (ECR), Amazon ECS task definitions, Kubernetes deployment YAML, CloudFormation templates to deploy the application to Amazon ECS or EKS, and templates to set up a build/release pipeline in AWS CodePipeline, which also leverages AWS CodeBuild and CodeDeploy.
Starting today, you can use App2Container to containerize ASP.NET (.NET 3.5+) web applications running in IIS 7.5+ on Windows, and Java applications running on Linux—standalone JBoss, Apache Tomcat, and generic Java applications such as Spring Boot, IBM WebSphere, Oracle WebLogic, etc.
By modernizing existing applications using containers, you can make them portable, increase development agility, standardize your CI/CD processes, and reduce operational costs. Now let’s see how it works!
AWS App2Container – Getting Started
AWS App2Container requires that the following prerequisites be installed on the server(s) hosting your application: AWS Command Line Interface (CLI) version 1.14 or later, Docker tools, and (in the case of ASP.NET) PowerShell 5.0 or later for applications running on Windows. You also need to grant App2Container the appropriate IAM permissions to interact with AWS services.
For example, let’s look at how you containerize an existing Java application. The App2Container CLI for Linux is packaged as a tar.gz archive, which includes an interactive shell script, install.sh, that installs the App2Container CLI. Running the script guides you through the installation steps and updates your path to include the App2Container CLI commands.
First, you can begin by running a one-time initialization on the installed server for the App2Container CLI with the init command.
$ sudo app2container init
Workspace directory path for artifacts[default: /home/ubuntu/app2container/ws]:
AWS Profile (configured using 'aws configure --profile')[default: default]:
Optional S3 bucket for application artifacts (Optional)[default: none]:
Report usage metrics to AWS? (Y/N)[default: y]:
Require images to be signed using Docker Content Trust (DCT)? (Y/N)[default: n]:
Configuration saved
This sets up a workspace to store application containerization artifacts (make sure at least 20 GB of disk space is available). Optionally, App2Container can upload those artifacts to the Amazon Simple Storage Service (S3) bucket you specify, using the AWS profile you configured.
Next, you can view Java processes that are running on the application server by using the inventory command. Each Java application process has a unique identifier (for example, java-tomcat-9e8e4799) which is the application ID. You can use this ID to refer to the application with other App2Container CLI commands.
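For example, a sketch of running the command follows; the JSON output shown is illustrative only, and the real inventory output and its field names come from the App2Container version you run.

$ sudo app2container inventory
{
    "java-tomcat-9e8e4799": {
        "processId": 2537,
        "cmdline": "java -classpath /opt/tomcat/bin/bootstrap.jar org.apache.catalina.startup.Bootstrap start",
        "applicationType": "java-tomcat"
    }
}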
You can also initialize App2Container for ASP.NET applications in an administrator PowerShell session on Windows Servers with IIS version 7.0 or later. Note that Docker tools and container support are available on Windows Server 2016 and later versions. You can choose to run all app2container operations on the application server with Docker tools installed, or use a worker machine with Docker tools based on the Amazon ECS-optimized Windows Server AMIs.
The inventory command displays all IIS websites on the application server that can be containerized. Each IIS website process has a unique identifier (for example, iis-smarts-51d2dbf8) which is the application ID. You can use this ID to refer to the application with other App2Container CLI commands.
You can choose a specific application by referring to its application ID and generate an analysis report for the application by using the analyze command.
$ sudo app2container analyze --application-id java-tomcat-9e8e4799
Created artifacts folder /home/ubuntu/app2container/ws/java-tomcat-9e8e4799
Generated analysis data in /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/analysis.json
Analysis successful for application java-tomcat-9e8e4799
Please examine the same, make appropriate edits and initiate containerization using "app2container containerize --application-id java-tomcat-9e8e4799"
The analysis.json template generated by the application analysis serves two purposes: the analysisInfo section captures what App2Container discovered about the application, including all of its system dependencies, and the containerParameters section lets you update containerization parameters to customize the container images generated for the application.
$ cat java-tomcat-9e8e4799/analysis.json
{
"a2CTemplateVersion": "1.0",
"createdTime": "2020-06-24 07:40:5424",
"containerParameters": {
"_comment1": "*** EDITABLE: The below section can be edited according to the application requirements. Please see the analyisInfo section below for deetails discoverd regarding the application. ***",
"imageRepository": "java-tomcat-9e8e4799",
"imageTag": "latest",
"containerBaseImage": "ubuntu:18.04",
"coopProcesses": [ 6446, 6549, 6646]
},
"analysisInfo": {
"_comment2": "*** NON-EDITABLE: Analysis Results ***",
"processId": 2537
"appId": "java-tomcat-9e8e4799",
"userId": "1000",
"groupId": "1000",
"cmdline": [...],
"os": {...},
"ports": [...]
}
}
You can also run the $ app2container extract --application-id java-tomcat-9e8e4799 command to generate an application archive for the analyzed application. This step relies on the analysis.json file generated earlier in the application’s workspace folder, and adheres to any containerization parameter updates you made there. By using the extract command, you can continue the workflow on a worker machine after running the first set of commands on the application server.
Next, use the containerize command to generate Docker images for the selected application.
$ sudo app2container containerize --application-id java-tomcat-9e8e4799
AWS pre-requisite check succeeded
Docker pre-requisite check succeeded
Extracted container artifacts for application
Entry file generated
Dockerfile generated under /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/Artifacts
Generated dockerfile.update under /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/Artifacts
Generated deployment file at /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/deployment.json
Containerization successful. Generated docker image java-tomcat-9e8e4799
You're all set to test and deploy your container image.
Next Steps:
1. View the container image with \"docker images\" and test the application.
2. When you're ready to deploy to AWS, please edit the deployment file as needed at /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/deployment.json.
3. Generate deployment artifacts using app2container generate app-deployment --application-id java-tomcat-9e8e4799
After running this command, you can view the generated container images by running docker images on the machine where the containerize command was run. You can then use the docker run command to launch a container and test application functionality.
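For example, using the image repository and tag from the analysis.json above (the port mapping is an assumption; use whichever port your application listens on):

docker images
docker run -d -p 8080:8080 java-tomcat-9e8e4799:latest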
Note that in addition to generating container images, the containerize command also generates a deployment.json template file that you use with the subsequent generate app-deployment command. You can edit the parameters in the deployment.json template file to change the image repository name to be registered in Amazon ECR, the ECS task definition parameters, or the Kubernetes app name.
At this point, the application workspace where the artifacts are generated serves as an iteration sandbox. You can edit the generated Dockerfile to make changes to your application and use the docker build command to build new container images as needed. You can then generate the artifacts needed to deploy the application containers to Amazon EKS by using the generate app-deployment command.
$ sudo app2container generate app-deployment --application-id java-tomcat-9e8e4799
AWS pre-requisite check succeeded
Docker pre-requisite check succeeded
Created ECR Repository
Uploaded Cloud Formation resources to S3 Bucket: none
Generated Cloud Formation Master template at: /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/EksDeployment/amazon-eks-master.template.yaml
EKS Cloudformation templates and additional deployment artifacts generated successfully for application java-tomcat-9e8e4799
You're all set to use AWS Cloudformation to manage your application stack.
Next Steps:
1. Edit the cloudformation template as necessary.
2. Create an application stack using the AWS CLI or the AWS Console. AWS CLI command:
aws cloudformation deploy --template-file /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/EksDeployment/amazon-eks-master.template.yaml --capabilities CAPABILITY_NAMED_IAM --stack-name java-tomcat-9e8e4799
3. Setup a pipeline for your application stack:
app2container generate pipeline --application-id java-tomcat-9e8e4799
This command works from the deployment.json template file produced by the containerize command. App2Container then generates the Amazon ECS/EKS CloudFormation templates and offers an option to deploy those stacks directly.
The command registers the container image in the ECR repository you specify and generates CloudFormation templates for Amazon ECS and EKS deployments. You can register the ECS task definition with Amazon ECS, or use kubectl to launch the containerized application on an existing Amazon EKS or self-managed Kubernetes cluster using the App2Container-generated amazon-eks-master.template.deployment.yaml.
Alternatively, you can deploy the containerized application directly to Amazon EKS by adding the --deploy option.
$ sudo app2container generate app-deployment --application-id java-tomcat-9e8e4799 --deploy
AWS pre-requisite check succeeded
Docker pre-requisite check succeeded
Created ECR Repository
Uploaded Cloud Formation resources to S3 Bucket: none
Generated Cloud Formation Master template at: /home/ubuntu/app2container/ws/java-tomcat-9e8e4799/EksDeployment/amazon-eks-master.template.yaml
Initiated Cloudformation stack creation. This may take a few minutes. Please visit the AWS Cloudformation Console to track progress.
Deploying application to EKS
Handling ASP.NET Applications with Windows Authentication
Containerizing ASP.NET applications follows almost the same process as for Java applications, but Windows containers cannot be directly domain joined. They can, however, still use Active Directory (AD) domain identities to support various authentication scenarios.
App2Container detects whether a site is using Windows authentication, configures the IIS site’s application pool to run as the network service identity accordingly, and generates new CloudFormation templates for Windows-authenticated IIS applications. Those templates take care of creating the gMSA and AD security group, domain joining the ECS nodes, and making the containers use the gMSA.
It also emits two PowerShell scripts as output of the app2container containerize command, along with an instruction file on how to use them.
The following is an example output:
PS C:\Windows\system32> app2container containerize --application-id iis-SmartStoreNET-a726ba0b
Running AWS pre-requisite check...
Running Docker pre-requisite check...
Container build complete. Please use "docker images" to view the generated container images.
Detected that the Site is using Windows Authentication.Generating powershell scripts into C:\Users\Admin\AppData\Local\app2container\iis-SmartStoreNET-a726ba0b\Artifacts required to setup Container host with Windows Authentication
Please look at C:\Users\Admin\AppData\Local\app2container\iis-SmartStoreNET-a726ba0b\Artifacts\WindowsAuthSetupInstructions.md for setup instructions on Windows Authentication.
A deployment file has been generated under C:\Users\Admin\AppData\Local\app2container\iis-SmartStoreNET-a726ba0b
Please edit the same as needed and generate deployment artifacts using "app2container generate-deployment"
The first PowerShell script, DomainJoinAddToSecGroup.ps1, joins the container host and adds it to an Active Directory security group. The second script, CreateCredSpecFile.ps1, creates a Group Managed Service Account (gMSA), grants it access to the Active Directory security group, generates the credential spec for this gMSA, and stores it locally on the container host. You execute these PowerShell scripts on the ECS host; the generated WindowsAuthSetupInstructions.md file walks you through using them.
Before executing the app2container generate-deployment command, edit the deployment.json file to change the value of dockerSecurityOption to the name of the CredentialSpec file that the CreateCredSpecFile script generated. For example, "dockerSecurityOption": "credentialspec:file://dominion_mygmsaforiis.json"
Effectively, any access to a network resource made by the IIS server inside the container for that site now authenticates using this gMSA. The final step is to authorize the gMSA account on the network resources that the IIS server will access. A common example is authorizing the gMSA in SQL Server.
Now Available
AWS App2Container is offered at no additional charge; you pay only for your actual usage of AWS services such as Amazon EC2, ECS, EKS, and S3. For details, see the App2Container FAQs and documentation. Give this a try, and please send us feedback either through your usual AWS Support contacts, on the AWS Forums for ECS and EKS, or on the container roadmap on GitHub.
If you’re building modern applications, you might be using containers, or have experimented with them. A container is a standard way to package your application’s code, configurations, and dependencies into a single object. In contrast to virtual machines (VMs), containers virtualize the operating system rather than the server. Thus, the images are orders of magnitude smaller, and they start up much more quickly.
Like VMs, containers need to be scanned for vulnerabilities and patched as appropriate. For VMs running on Amazon Elastic Compute Cloud (Amazon EC2), you can use Amazon Inspector, a managed vulnerability assessment service, and then patch your EC2 instances as needed. For containers, vulnerability management is a little different. Instead of patching, you destroy and redeploy the container.
Many container deployments use Docker. Docker uses Dockerfiles to define the commands you use to build the Docker image that forms the basis of your container. Instead of patching in place, you rewrite your Dockerfile to point to more up-to-date base images, dependencies, or both, and rebuild the Docker image. Trivy, an open-source vulnerability scanner from Aqua Security, lets you know which dependencies in the Docker image are vulnerable and which versions of those dependencies are no longer vulnerable, allowing you to quickly understand what to patch to get back to a secure state.
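As a quick local illustration, a scan might look like the following; the image name is hypothetical, and the flags follow the single-binary Trivy CLI syntax used later in this post.

# Fail with exit code 1 if the image contains critical vulnerabilities
trivy --exit-code 1 --severity CRITICAL myregistry/myapp:latest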
Solution architecture
Figure 1: Solution architecture
Here’s how the solution works, as shown in Figure 1:
Developers push Dockerfiles and other code to AWS CodeCommit.
AWS CodeBuild pushes the build logs in near real-time to an Amazon CloudWatch Logs group.
Trivy scans for all vulnerabilities and sends them to AWS Security Hub, regardless of severity.
If no critical vulnerabilities are found, the Docker images are deemed to have passed the scan and are pushed to Amazon Elastic Container Registry (ECR), so that they can be deployed.
Note: CodePipeline supports different sources, such as Amazon Simple Storage Service (Amazon S3) or GitHub. If you’re comfortable with those services, feel free to substitute them for this walkthrough of the solution.
To quickly deploy the solution, you’ll use an AWS CloudFormation template to deploy all needed services.
Prerequisites
You must have Security Hub enabled in the AWS Region where you deploy this solution. In the AWS Management Console, go to AWS Security Hub, and select Enable Security Hub.
You must have Aqua Security integration enabled in Security Hub in the Region where you deploy this solution. To do so, go to the AWS Security Hub console and, on the left, select Integrations, search for Aqua Security, and then select Accept Findings.
Setting up
For this stage, you’ll deploy the CloudFormation template and do preliminary setup of the CodeCommit repository.
After the CloudFormation stack completes, go to the CloudFormation console and select the Resources tab to see the resources created, as shown in Figure 2.
Figure 2: CloudFormation output
Setting up the CodeCommit repository
CodeCommit repositories need at least one file to initialize their master branch. Without a file, you can’t use a CodeCommit repository as a source for CodePipeline. To create a sample file, do the following.
Go to the CodeCommit console and, on the left, select Repositories, and then select your CodeCommit repository.
Scroll to the bottom of the page, select the Add File dropdown, and then select Create file.
In the Create a file screen, enter readme into the text body, name the file readme.md, enter your name as Author name and your Email address, and then select Commit changes, as shown in Figure 3.
Figure 3: Creating a file in CodeCommit
Simulate a vulnerable image build
For this stage, you’ll create the necessary files and add them to your CodeCommit repository to start an automated container vulnerability scan.
Note: In the buildspec.yml code, the values prepended with $ will be populated by the CodeBuild environment variables you created earlier. Also, the command trivy -f json -o results.json --exit-code 1 will fail your build by forcing Trivy to return exit code 1 upon finding a critical vulnerability. You can add additional severity levels here to force Trivy to fail your builds and ensure vulnerabilities of lower severity are not published to Amazon ECR.
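The buildspec.yml from the repository isn’t reproduced in this excerpt; a minimal sketch of its key phases might look like the following. The environment variable names and the exact Trivy invocation are assumptions, so use the buildspec.yml from the GitHub repository for the real pipeline.

version: 0.2
phases:
  build:
    commands:
      - docker build -t "$IMAGE_REPO_NAME:$IMAGE_TAG" .
      - trivy -f json -o results.json --exit-code 1 --severity CRITICAL "$IMAGE_REPO_NAME:$IMAGE_TAG"
  post_build:
    commands:
      # Runs even if the build phase failed, so findings still reach Security Hub
      - python3 sechub_parser.py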
Download the Python code file sechub_parser.py from the GitHub repository. This script parses vulnerability details from the JSON file that Trivy generates, maps the information to the AWS Security Finding Format (ASFF), and then imports it to Security Hub.
Next, download the Dockerfile from the GitHub repository. The code clones a GitHub repository maintained by the Trivy team that has purposely vulnerable packages that generate critical vulnerabilities.
Go back to your CodeCommit repository, select the Add file dropdown menu, and then select Upload file.
In the Upload file screen, select Choose file, select the build specification you just created (buildspec.yml), complete the Commit changes to master section by adding the Author name and Email address, and select Commit changes, as shown in Figure 4.
Figure 4: Uploading a file to CodeCommit
To upload your Dockerfile and sechub_parser.py script to CodeCommit, repeat steps 4 and 5 for each of these files.
Your pipeline will automatically start in response to every new commit to your repository. To check the status, go back to the pipeline status view of your CodePipeline pipeline.
When CodeBuild starts, select Details in the Build stage of the CodePipeline, under BuildAction, to go to the Build section on the CodeBuild console. To see a stream of logs as your build progresses, select Tail logs, as shown in Figure 5.
Figure 5: CodeBuild Tailed Logs
After Trivy has finished scanning your image, CodeBuild will fail due to the critical vulnerabilities found, as shown in Figure 6.
Note: The command specified in the post-build stage will run even if the CodeBuild build fails. This is by design and allows the sechub_parser.py script to run and send findings to Security Hub.
Figure 6: CodeBuild logs failure
You’ll now go to Security Hub to further analyze the findings and create saved searches for future use.
Analyze container vulnerabilities in Security Hub
For this stage, you’ll analyze your container vulnerabilities in Security Hub and use the findings view to locate information within the ASFF.
Go to the Security Hub console and select Integrations in the left-hand navigation pane.
Scroll down to the Aqua Security integration card and select See findings, as shown in Figure 7. This filters to only Aqua Security product findings in the Findings view.
Figure 7: Aqua Security integration card
You should now see critical vulnerabilities from your previous scan in the Findings view, as shown in Figure 8. To see more details of a finding, select the Title of any of the vulnerabilities, and you will see the details in the right side of the Findings view.
Figure 8: Security Hub Findings pane
Note: Within the Findings view, you can apply quick filters by checking the box next to certain fields, although you won’t do that for the solution in this post.
To open a new tab to a website about the Common Vulnerabilities and Exposures (CVE) for the finding, select the hyperlink within the Remediation section, as shown in Figure 9.
Figure 9: Remediation information
Note: The fields shown in Figure 9 are dynamically populated by Trivy in its native output, and the CVE website can differ greatly from vulnerability to vulnerability.
To see the JSON of the full ASFF, at the top right of the Findings view, select the hyperlink for Finding ID.
To find information mapped from Trivy, such as the CVE title and what the patched version of the vulnerable package is, scroll down to the Other section, as shown in Figure 10.
Figure 10: ASFF, other
This was a brief demonstration of exploring findings with Security Hub. You can use custom actions to define response and remediation actions, such as sending these findings to a ticketing system or aggregating them in a security information event management (SIEM) tool.
Push a non-vulnerable Dockerfile
Now that you’ve seen Trivy perform correctly with a vulnerable image, you’ll fix the vulnerabilities. For this stage, you’ll modify the Dockerfile to remove any vulnerable dependencies.
Open a text editor, paste in the code shown below, and save it as Dockerfile. You can overwrite your previous example if desired.
FROM alpine:3.7
RUN apk add --no-cache mysql-client
ENTRYPOINT ["mysql"]
Upload the new Dockerfile to CodeCommit, as shown earlier in this post.
Clean up
To avoid incurring additional charges from running these services, disable Security Hub and delete the CloudFormation stack after you’ve finished evaluating this solution. This will delete all resources created during this post. Deleting the CloudFormation stack will not remove the findings in Security Hub. If you don’t disable Security Hub, you can archive those findings and wait 90 days for them to be fully removed from your Security Hub instance.
Conclusion
In this post, you learned how to create a CI/CD Pipeline with CodeCommit, CodeBuild, and CodePipeline for building and scanning Docker images for critical vulnerabilities. You also used Security Hub to aggregate scan findings and take action on your container vulnerabilities.
You now have the skills to create similar CI/CD pipelines with AWS Developer Tools and to perform vulnerability scans as part of your container build process. Using these reports, you can efficiently identify CVEs and work with your developers to use up-to-date libraries and binaries for Docker images. To further build upon this solution, you can change the Trivy commands in your build specification to fail the builds on different severity levels of found vulnerabilities. As a final suggestion, you can use ECR as a source for a downstream CI/CD pipeline responsible for deploying your newly-scanned images to Amazon Elastic Container Service (Amazon ECS).
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Security Hub forum or contact AWS Support.
Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.
This blog post is a continuation of an existing article about optimizing your Java application for Amazon ECS with Quarkus. In this blog post, we examine the benefits of Quarkus in the context of AWS Lambda. Quarkus is a framework that uses the Open Java Development Kit (OpenJDK) with GraalVM and over 50 libraries like RESTEasy, Vert.x, Hibernate, and Netty. This blog post shows you an effective approach for implementing a Java-based application and compiling it into a native image through Quarkus. You can find the demo application code on GitHub.
Getting started
To build and deploy this application, you will need the AWS CLI, the AWS Serverless Application Model (AWS SAM), Git, Maven, OpenJDK 11, and Docker. AWS Cloud9 makes the setup easy. AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with just a browser. It comes with the AWS tools, Git, and Docker installed.
Create a new AWS Cloud9 EC2 environment based on Amazon Linux. Because the compilation process is very memory intensive, it is recommended to select an instance with at least 8 GiB of RAM (for example, m5.large).
Figure 1 – AWS Cloud9 environment
Launching the AWS Cloud9 environment from the AWS Management Console, you select the instance type. Pick an instance type with at least 8 GiB of RAM.
After creation, you are redirected automatically to your AWS Cloud9 environment’s IDE. You can navigate back to your IDE at any time through the AWS Cloud9 console.
All code blocks in this blog post refer to commands you enter into the terminal provided by the AWS Cloud9 IDE. AWS Cloud9 executes the commands on an underlying EC2 instance. If necessary, you can open a new Terminal in AWS Cloud9 by selecting Window → New Terminal.
Modify the EBS volume of the AWS Cloud9 EC2 instance to at least 20 GB to have enough space for the compilation of your application. Then, reboot the instance using the following command in the AWS Cloud9 IDE terminal, and wait for the AWS Cloud9 IDE to automatically reconnect to your instance.
sudo reboot
To satisfy the OpenJDK 11 requirement, run the following commands in the AWS Cloud9 IDE terminal to install Amazon Corretto 11. Amazon Corretto is a no-cost, multiplatform, production-ready distribution of the OpenJDK.
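The exact commands aren’t reproduced in this excerpt; one possible way to install Corretto 11 on Amazon Linux 2, following the Amazon Corretto Linux installation instructions, is sketched below. Verify the repository URL and package name against the current Corretto documentation before running it.

# Add the Amazon Corretto yum repository and install Corretto 11 (JDK)
sudo rpm --import https://yum.corretto.aws/corretto.key
sudo curl -L -o /etc/yum.repos.d/corretto.repo https://yum.corretto.aws/corretto.repo
sudo yum install -y java-11-amazon-corretto-devel
java -version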
After you clone the demo application code, you can build the application. Compiling the application to a self-contained JAR is straightforward: navigate to the aws-quarkus-demo/lambda directory and kick off the Apache Maven build.
git clone https://github.com/aws-samples/aws-quarkus-demo.git
cd aws-quarkus-demo/lambda/
mvn clean install
To compile the application to a native binary, you must add the parameter used by the Apache Maven build to run the necessary steps.
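A typical invocation is sketched below. The -Pnative profile is the convention in Quarkus-generated projects, and -Dquarkus.native.container-build=true performs the native compilation inside a container so the resulting binary targets Linux; check the demo project’s pom.xml for the profile it actually defines.

mvn clean install -Pnative -Dquarkus.native.container-build=true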
Create an AWS Lambda custom runtime from the application. A runtime is a program that runs an AWS Lambda function’s handler method when the function is invoked. Include a runtime in your function’s deployment package as an executable file named bootstrap.
A runtime is responsible for running the function’s setup code, reading the handler name from an environment variable, and reading invocation events from the AWS Lambda runtime API. The runtime passes the event data to the function handler, and posts the response from the handler back to AWS Lambda. Your custom runtime can be a shell script, a script in a language that’s included in Amazon Linux, or a binary executable file that’s compiled in Amazon Linux.
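For a native binary such as the one Quarkus produces, the bootstrap can be a minimal shell script that simply launches the executable, which then handles the runtime API loop itself. The sketch below assumes the native executable is packaged next to the script as runner; the demo project generates its own bootstrap, so treat this only as an illustration.

#!/bin/sh
# Minimal custom-runtime bootstrap: hand control to the native executable,
# which polls the Lambda runtime API for invocation events.
set -e
exec ./runner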
The architecture of the application is simple and consists of a few classes that implement a REST service storing all information in an Amazon DynamoDB table. Quarkus offers an extension for Amazon DynamoDB that is based on the AWS SDK for Java V2.
Setting up the infrastructure
After you build the AWS Lambda function (as a regular build or native build), package your application and deploy it. The command sam deploy creates a zip file of your code and dependencies, uploads it to an Amazon S3 bucket, creates an AWS CloudFormation template, and deploys its resources.
The following command guides you through all necessary steps for packaging and deployment.
sam deploy --template-file sam.jvm.yaml \
  --stack-name APIGatewayQuarkusDemo \
  --capabilities CAPABILITY_IAM --guided
If you want to deploy the native version of the application, you must use a different AWS SAM template.
sam deploy --template-file sam.native.yaml \
  --stack-name APIGatewayQuarkusDemo \
  --capabilities CAPABILITY_IAM --guided
During deployment, the AWS CloudFormation template creates the AWS Lambda function, an Amazon DynamoDB table, an Amazon API Gateway REST API, and all necessary IAM roles. The output of the AWS CloudFormation stack is the API Gateway’s DNS record.
Let’s investigate the impact of using a native build in comparison to the regular build of our sample Java application. In this benchmark, we focus on the performance of the application. We want to get the AWS services and architecture out of the equation as much as possible, so we measure the duration of the AWS Lambda function executions of a function including its downstream calls to Amazon DynamoDB. This duration is provided in the Amazon CloudWatch Logs of the function.
The following two charts illustrate 40 Create (POST), Read (GET), and Delete (DELETE) call iterations for a user, with the execution durations plotted on the vertical axis. The first graph shows how the duration develops over time for a single JVM instance. The second graph reports exclusively on iterations that each hit a fresh JVM.
This is an example application to demonstrate the use of a native build. When you start the optimization of your application make sure to read the best practices for working with AWS Lambda Functions first. Verify the effect of all your optimizations.
The “single cold call” graph starts with a cold call, and each consecutive call hits the same AWS Lambda function container and thus the same JVM. This graph shows both the regular build and the native build (denoted with *) of our application as an AWS Lambda function with 256 MB of memory on Java 11. The native build was compiled with Quarkus version 1.2.1.
Figure 2 – Single cold call, followed by warm calls only
The first executions of the regular build have long durations (the vertical axis has a logarithmic scale) but quickly drop below 100 ms. Still, you can observe an ongoing fluctuation between 10 and 100 ms. For the native build you can observe a consistent execution duration, except for the first call and one outlier in iteration 20. The first call is slow because it is a cold call. The outlier occurs because the Substrate VM still needs garbage collector pauses. Only the first call is slower than the calls to a warmed up regular build.
Let’s dive deeper into the cold calls of the application and their duration. The following chart shows 40 cold calls for both the regular build and the native build.
Figure 3 – Each of the 40 Create-Read-Delete iterations start with one cold call
You can observe a consistent and predictable duration of the first calls. In this example, the execution of the AWS Lambda function of the native build takes just 0.6–5% of the duration of the regular build.
Tradeoffs
GraalVM assumes that all code is known at build time of the image, which means that no new code will be loaded at runtime. This means that not all applications can be optimized using GraalVM. Read a list of limitations in the GraalVM documentation. If the native image build for your application fails, a fallback image is created that requires a full JVM for execution.
Cleaning up
After you are finished, you can easily destroy all of these resources with a single command to save costs.
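For example, deleting the AWS CloudFormation stack created by AWS SAM removes the Lambda function, the Amazon DynamoDB table, and the API; the stack name below matches the one used in the deployment commands above.

aws cloudformation delete-stack --stack-name APIGatewayQuarkusDemo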
In this post, we described how Java applications are compiled to a native image through Quarkus and run using AWS Lambda. Testing the demo application, we’ve seen a performance improvement of more than 95% compared to a regular build. Keep in mind that the potential performance benefits vary depending on your application and its downstream calls to other services. Due to the limitations of GraalVM, your application may not be a candidate for optimization.
We also demonstrated how AWS SAM deploys the native image as an AWS Lambda function with a custom runtime behind an Amazon API Gateway. We hope we’ve given you some ideas on how you can optimize your existing Java application to reduce startup time and memory consumption. Feel free to submit enhancements to the sample application in the source repository.
Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.