Do you encrypt your Amazon Machine Instances (AMIs) with AWS Key Management Service (AWS KMS) customer master keys (CMKs) for regulatory or compliance reasons? Do you launch instances with encrypted root volumes? Do you create a golden AMI and distribute it to other accounts in your organization for standardizing application-specific Amazon Elastic Compute Cloud (Amazon EC2) instance launches? If so, then we have good news for you!
We’re happy to announce that you can now share AMIs encrypted with customer-managed CMKs across accounts with a single API call. Additionally, you can launch an EC2 instance directly from an encrypted AMI that has been shared with you. Previously, this was possible for unencrypted AMIs only. Extending this capability to encrypted AMIs simplifies your AMI distribution process and reduces the snapshot storage cost associated with the older method of sharing encrypted AMIs that resulted in a copy in each of your accounts.
In this post, we demonstrate how you can share an encrypted AMI between accounts and launch an encrypted, Amazon Elastic Block Store (Amazon EBS) backed EC2 instance from the shared AMI.
Prerequisites for sharing AMIs encrypted with a customer-managed CMK
Before you can begin sharing the encrypted AMI and launching an instance from it, you’ll need to set up your AWS KMS key policy and AWS Identity and Access Management (IAM) policies.
For this walkthrough, you need two AWS accounts:
A source account in which you build a custom AMI and encrypt the associated EBS snapshots.
A target account in which you launch instances using the shared custom AMI with encrypted snapshots.
In this example, I’ll use fictitious account IDs 111111111111 and 999999999999 for the source account and the target account, respectively. In addition, you’ll need to create an AWS KMS customer master key (CMK) in the source account in the source region. For simplicity, I’ve created an AWS KMS CMK with the alias cmkSource under account 111111111111 in us-east-1. I’ll use this cmkSource to encrypt my AMI (ami-1234578) that I’ll share with account 999999999999. As you go through this post, be sure to change the account IDs, source account, AWS KMS CMK, and AMI ID to match your own.
Create the policy setting for the source account
First, an IAM user or role in the source account needs permission to share the AMI (an EC2 ModifyImageAttribute operation). The following JSON policy document shows an example of the permissions an IAM user or role policy needs to have. In order to share ami-12345678, you’ll need to create a policy like this:
Second, the target account needs permission to use cmkSource for re-encrypting the snapshots. The following steps walk you through how to add the target account’s ID to the cmkSource key policy.
From the IAM console, select Encryption Keys in the left pane, and then select the source account’s CMK, cmkSource, as shown in Figure 1:
Figure 1: Select “cmkSource”
Look for the External Accounts subsection, type the target account ID (for example, 999999999999), and then select Add External Account.
Figure 2: Enter the target account ID and then select “Add External Account”
Create a policy setting for the target account
Once you’ve configured the source account, you need to configure the target account. The IAM user or role in the target account needs to be able to perform the AWS KMS DescribeKey, CreateGrant, ReEncrypt* and Decrypt operations on cmkSource in order to launch an instance from a shared encrypted AMI. The following JSON policy document shows an example of these permissions:
Once you’ve completed the configuration steps in the source and target accounts, then you’re ready to share and launch encrypted AMIs. The actual sharing of the encrypted AMI is no different from sharing an unencrypted AMI. If you want to use the AWS Management Console, then follow the steps described in Sharing an AMI (Console). To use the AWS Command Line Interface (AWS CLI), follow the steps described in Sharing an AMI (AWS CLI).
Below is an example of what the CLI command to share the AMI would look like:
To launch an AMI that was shared with you, set the AMI ID of the shared AMI in the image-id parameter of Run-Instances API/CLI. Optionally, to re-encrypt the volumes with a custom CMK in your account, you can specify the KmsKeyId in the Block Device Mapping as follows:
While launching the instance from a shared encrypted AMI, you can specify a CMK of your choice. You may also choose cmkSource to encrypt volumes in your account. However, we recommend that you re-encrypt the volumes using a CMK in the target account. This protects you if the source CMK is compromised, or if the source account revokes permissions, which could cause you to lose access to any encrypted volumes you created using cmkSource.
Summary
In this blog post, we discussed how you can easily share encrypted AMIs across accounts to launch encrypted instances. AWS has extended the same capabilities to snapshots, allowing you to create encrypted EBS volumes from shared encrypted snapshots.
This feature is available through the AWS Management Console, AWS CLI, or AWS SDKs at no extra charge in all commercial AWS regions except China. If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the Amazon EC2 forum or contact AWS Support.
Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.
An Amazon Machine Image (AMI) provides the information that you need to launch an instance (a virtual server) in your AWS environment. There are a number of AMIs on the AWS Marketplace (such as Amazon Linux, Red Hat or Ubuntu) that you can use to launch an Amazon Elastic Compute Cloud (Amazon EC2) instance. When you launch an instance from these AMIs, the resulting volumes are unencrypted. However, for regulatory purposes or internal compliance reasons, you might need to launch instances with encrypted root volumes. Previously, this required you to follow a a multi-step process in which you maintained a separate, encrypted copy of the AMI in your account in order to launch instances with encrypted volumes. Now, you no longer need to maintain multiple AMI copies for encryption. To launch an encrypted Amazon Elastic Block Store (Amazon EBS) backed instance from an unencrypted AMI, you can directly specify the encryption properties in your existing workflow (such as with the RunInstances API or the Launch Instance Wizard). This simplifies the process of launching instances with encrypted volumes and reduces your associated AMI storage costs.
In this post, we demonstrate how you can start from an unencrypted AMI and launch an encrypted EBS-backed Amazon EC2 instance. We’ll show you how to do this from both the AWS Management Console, and using the RunInstances API with the AWS Command Line Interface (AWS CLI).
Launching an instance from the AWS Management Console
In step 3, provide additional configuration details. For details about configuring your instances, see Launching an Instance.
In step 4, specify your EBS volumes. The encryption properties of the volumes will be inherited from the AMI that you’ve chosen. If you’re using an unencrypted AMI, it will show up as “Not Encrypted.” From the dropdown, you can then select a customer master key (CMK) for encrypting the volume. You may select the same CMK for each volume that you want to create, or you may use a different CMK for each volume.
Figure 1: Specifying your EBS volumes
Select Review and then Launch. Your instance will launch with an encrypted Amazon EBS volume that uses the CMK you selected. To learn more about the launch wizard, see Launching an Instance with Launch Wizard.
Launching an instance from the RunInstances API
From the RunInstances API/CLI, you can provide the kmsKeyID for encrypting the volumes that will be created from the AMI by specifying encryption in the BlockDeviceMapping (BDM) object. If you don’t specify the kmsKeyID in BDM but set the encryption flag to “true,” then AWS Managed CMK will be used for encrypting the volume.
For example, to launch an encrypted instance from an Amazon Linux AMI with an additional empty 100 GB of data volume (/dev/sdb), the API call would be as follows:
You may specify different keys for different volumes. Providing a kmsKeyID without the encryption flag will result in an API error.
Conclusion
In this blog post, we demonstrated how you can quickly launch encrypted, EBS-backed instances from an unencrypted AMI in a few steps. You can also use the same process to launch EBS-backed encrypted volumes from unencrypted snapshots. This simplifies the process of launching instances with encrypted volumes and reduces your AMI storage costs.
This feature is available through the AWS Management Console, AWS CLI, or AWS SDKs at no extra charge in all commercial AWS regions except China. If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the Amazon EC2 forum or contact AWS Support.
Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.
Contributed by Manuel Manzano Hoss, Cloud Support Engineer
I remember playing around with graphics processing units (GPUs) workload examples in 2017 when the Deep Learning on AWS Batch post was published by my colleague Kiuk Chung. He provided an example of how to train a convolutional neural network (CNN), the LeNet architecture, to recognize handwritten digits from the MNIST dataset using Apache MXNet as the framework. Back then, to run such jobs with GPU capabilities, I had to do the following:
Identify the type of P2 EC2 instance that had the required amount of GPU for my job.
Check the amount of vCPUs that it offered (even if I was not interested on using them).
Specify that number of vCPUs for my job.
All that, when I didn’t have any certainty that the instance was going to have the GPU required available when my job was already running. Back then, there was no GPU pinning. Other jobs running on the same EC2 instance were able to use that GPU, making the orchestration of my jobs a tricky task.
Fast forward two years. Today, AWS Batch announced integrated support for Amazon EC2 Accelerated Instances. It is now possible to specify an amount of GPU as a resource that AWS Batch considers in choosing the EC2 instance to run your job, along with vCPU and memory. That allows me to take advantage of the main benefits of using AWS Batch, the compute resource selection algorithm and job scheduler. It also frees me from having to check the types of EC2 instances that have enough GPU.
Also, I can take advantage of the Amazon ECS GPU-optimized AMI maintained by AWS. It comes with the NVIDIA drivers and all the necessary software to run GPU-enabled jobs. When I allow the P2 or P3 instance types on my compute environment, AWS Batch launches my compute resources using the Amazon ECS GPU-optimized AMI automatically.
In other words, now I don’t worry about the GPU task list mentioned earlier. I can focus on deciding which framework and command to run on my GPU-accelerated workload. At the same time, I’m now sure that my jobs have access to the required performance, as physical GPUs are pinned to each job and not shared among them.
A GPU race against the past
As a kind of GPU-race exercise, I checked a similar example to the one from Kiuk’s post, to see how fast it could be to run a GPU-enabled job now. I used the AWS Management Console to demonstrate how simple the steps are.
In this case, I decided to use the deep neural network architecture called multilayer perceptron (MLP), not the LeNet CNN, to compare the validation accuracy between them.
To make the test even simpler and faster to implement, I thought I would use one of the recently announced AWS Deep Learning containers, which come pre-packed with different frameworks and ready-to-process data. I chose the container that comes with MXNet and Python 2.7, customized for Training and GPU. For more information about the Docker images available, see the AWS Deep Learning Containers documentation.
In the AWS Batch console, I created a managed compute environment with the default settings, allowing AWS Batch to create the required IAM roles on my behalf.
On the configuration of the compute resources, I selected the P2 and P3 families of instances, as those are the type of instance with GPU capabilities. You can select On-Demand Instances, but in this case I decided to use Spot Instances to take advantage of the discounts that this pricing model offers. I left the defaults for all other settings, selecting the AmazonEC2SpotFleetRole role that I created the first time that I used Spot Instances:
Finally, I also left the network settings as default. My compute environment selected the default VPC, three subnets, and a security group. They are enough to run my jobs and at the same time keep my environment safe by limiting connections from outside the VPC:
I created a job queue, GPU_JobQueue, attaching it to the compute environment that I just created:
Next, I registered the same job definition that I would have created following Kiuk’s post. I specified enough memory to run this test, one vCPU, and the AWS Deep Learning Docker image that I chose, in this case mxnet-training:1.4.0-gpu-py27-cu90-ubuntu16.04. The amount of GPU required was in this case, one. To have access to run the script, the container must run as privileged, or using the root user.
Finally, I submitted the job. I first cloned the MXNet repository for the train_mnist.py Python script. Then I ran the script itself, with the parameter –gpus 0 to indicate that the assigned GPU should be used. The job inherits all the other parameters from the job definition:
That’s all, and my GPU-enabled job was running. It took me less than two minutes to go from zero to having the job submitted. This is the log of my job, from which I removed the iterations from epoch 1 to 18 to make it shorter:
As you can see, after AWS Batch launched the instance, the job took slightly more than two minutes to run. I spent roughly five minutes from start to finish. That was much faster than the time that I was previously spending just to configure the AMI. Using the AWS CLI, one of the AWS SDKs, or AWS CloudFormation, the same environment could be created even faster.
From a training point of view, I lost on the validation accuracy, as the results obtained using the LeNet CNN are higher than when using an MLP network. On the other hand, my job was faster, with a time cost of 1.6 seconds in average for each epoch. As the software stack evolves, and increased hardware capabilities come along, these numbers keep improving, but that shouldn’t mean extra complexity. Using managed primitives like the one presented in this post enables a simpler implementation.
I encourage you to test this example and see for yourself how just a few clicks or commands lets you start running GPU jobs with AWS Batch. Then, it is just a matter of replacing the Docker image that I used for one with the framework of your choice, TensorFlow, Caffe, PyTorch, Keras, etc. Start to run your GPU-enabled machine learning, deep learning, computational fluid dynamics (CFD), seismic analysis, molecular modeling, genomics, or computational finance workloads. It’s faster and easier than ever.
If you decide to give it a try, have any doubt or just want to let me know what you think about this post, please write in the comments section!
IPSec (IP Security) is a protocol for in-transit data protection between hosts. Configuration of site-to-site IPSec between multiple hosts can be an error-prone and intensive task. If you need to protect N EC2 instances, then you need a full mesh of N*(N-1)IPSec tunnels. You must manually propagate every IP change to all instances, configure credentials and configuration changes, and integrate monitoring and metrics into the operation. The efforts to keep the full-mesh parameters in sync are enormous.
Full mesh IPSec, known as any-to-any, builds an underlying network layer that protects application communication. Common use cases are:
You’re migrating legacy applications to AWS, and they don’t support encryption. Examples of protocols without encryption are File Transfer Protocol (FTP), Hypertext Transfer Protocol (HTTP) or Lightweight Directory Access Protocol (LDAP).
You’re offloading protection to IPSec to take advantage of fast Linux kernel encryption and automated certificate management, the use case we focus on in this solution.
You want to segregate duties between your application development and infrastructure security teams.
You want to protect container or application communication that leaves an EC2 instance.
In this post, I’ll show you how to build an opportunistic IPSec mesh that sets up dynamic IPSec tunnels between your Amazon Elastic Compute Cloud (EC2) instances. IPSec is based on Libreswan, an open-source project implementing opportunistic IPSec encryption (IKEv2 and IPSec) on a large scale.
Solution benefits and deliverable
The solution delivers the following benefits (versus manual site-to-site IPSec setup):
Automatic configuration of opportunistic IPSec upon EC2 launch.
Generation of instance certificates and weekly re-enrollment.
IPSec Monitoring metrics in Amazon CloudWatch for each EC2 instance.
An initial generation of a CA root key if needed, including IAM Policies and two customer master keys (CMKs) that will protect the CA key and instance key.
Out of scope
This solution does not deliver IPSec protection between EC2 instances and hosts that are on-premises, or between EC2 instances and managed AWS components, like Elastic Load Balancing, Amazon Relational Database Service, or Amazon Kinesis. Your EC2 instances must have general IP connectivity that allows NACLs and Security Groups. This solution cannot deliver extra connectivity like VPC peering or Transit VPC can.
Prerequisites
You’ll need the following resources to deploy the solution:
A trusted Unix/Linux/MacOS machine with AWS SDK for Python and OpenSSL
AWS admin rights in your AWS account (including API access)
AWS Systems Manager on EC2
Linux RedHat, Amazon Linux 2, or CentOS installed on the EC2 instances you want to configure
Internet access on the EC2 instances for downloading Linux packages and reaching AWS Systems Manager endpoint
My solution does not require any additional charges to standard AWS services, since it uses well-established open source software. Involved AWS services are as follows:
AWS Lambda is used to issue the certificates. Per EC2 and per week, I estimate the use of two 30 second Lambda functions with 256 MB of allocated memory. For 100 EC2 instances, the cost will be several cents. See AWS Lambda Pricing for details.
Certificates have no charge, since they’re issued by the Lambda function.
CloudWatch Events and Amazon S3 Storage usage are within the free tier policy.
AWS Systems Manager has no additional charge.
AWS EC2 is a standard AWS service on which you deploy your workload. There are no charges for IPSec encryption.
EC2 CPU performance decrease due to encryption is negligible since we use hardware encryption support of the Linux kernel. The IKE negotiation that is done by the OS in your CPU may add minimal CPU overhead depending on the number of EC2 instances involved.
Installation (one-time setup)
To get started, on a trusted Unix/Linux/MacOS machine that has admin access to your AWS account and AWS SDK for Python already installed, complete the following steps:
Edit the following files in the package to match your network setup:
config/private should contain all networks with mandatory IPSec protection, such as EC2s that should only be communicated with via IPSec. All of these hosts must have IPSec installed.
config/clear-or-private should contain networks with optional IPSec protection. These networks will start clear and attempt to add IPSec.
config/private-or-clear should also contain networks with optional IPSec protection. However, these networks will start with IPSec and fail back to clear.
Execute ./aws_setup.py and carefully set and verify the parameters. Use -h to view help. If you don’t provide customized options, default values will be generated. The parameters are:
Region to install the solution (default: your AWS Command Line Interface region)
Buckets for configuration, sources, published host certificates and CA storage. (Default: random values that follow the pattern ipsec-{hostcerts|cacrypto|sources}-{stackname} will be generated.) If the buckets do not exist, they will be automatically created.
Reuse of an existing CA? (default: no)
Leave encrypted backup copy of the CA key? The password will be printed to stdout (default: no)
Restrict provisioning to certain VPC (default: any)
Here is an example output:
./aws_setup.py -r ca-central-1 -p ipsec-host-v -c ipsec-crypto-v -s ipsec-source-v
Provisioning IPsec-Mesh version 0.1
Use --help for more options
Arguments:
----------------------------
Region: ca-central-1
Vpc ID: any
Hostcerts bucket: ipsec-host-v
CA crypto bucket: ipsec-crypto-v
Conf and sources bucket: ipsec-source-v
CA use existing: no
Leave CA key in local folder: no
AWS stackname: ipsec-efxqqfwy
----------------------------
Do you want to proceed ? [yes|no]: yes
The bucket ipsec-source-v already exists
File config/clear uploaded in bucket ipsec-source-v
File config/private uploaded in bucket ipsec-source-v
File config/clear-or-private uploaded in bucket ipsec-source-v
File config/private-or-clear uploaded in bucket ipsec-source-v
File config/oe-cert.conf uploaded in bucket ipsec-source-v
File sources/enroll_cert_lambda_function.zip uploaded in bucket ipsec-source-v
File sources/generate_certifcate_lambda_function.zip uploaded in bucket ipsec-source-v
File sources/ipsec_setup_lambda_function.zip uploaded in bucket ipsec-source-v
File sources/cron.txt uploaded in bucket ipsec-source-v
File sources/cronIPSecStats.sh uploaded in bucket ipsec-source-v
File sources/ipsecSetup.yaml uploaded in bucket ipsec-source-v
File sources/setup_ipsec.sh uploaded in bucket ipsec-source-v
File README.md uploaded in bucket ipsec-source-v
File aws_setup.py uploaded in bucket ipsec-source-v
The bucket ipsec-host-v already exists
Stack ipsec-efxqqfwy creation started. Waiting to finish (ca 3-5 min)
Created CA CMK key arn:aws:kms:ca-central-1:123456789012:key/abcdefgh-1234-1234-1234-abcdefgh123456
Certificate generation lambda arn:aws:lambda:ca-central-1:123456789012:function:GenerateCertificate-ipsec-efxqqfwy
Generating RSA private key, 4096 bit long modulus
.............................++
.................................................................................................................................................................................................................................................................................................................................................................................++
e is 65537 (0x10001)
Certificate and key generated. Subject CN=ipsec.ca-central-1 Valid 10 years
The bucket ipsec-crypto-v already exists
Encrypted CA key uploaded in bucket ipsec-crypto-v
CA cert uploaded in bucket ipsec-crypto-v
CA cert and key remove from local folder
Lambda functionarn:aws:lambda:ca-central-1:123456789012:function:GenerateCertificate-ipsec-efxqqfwy updated
Resource policy for CA CMK hardened - removed action kms:encrypt
done :-)
Launching the EC2 Instance
Now that you’ve installed the solution, you can start launching EC2 instances. From the EC2 Launch Wizard, execute the following steps. The instructions assume that you’re using RedHat, Amazon Linux 2, or CentOS.
Note: Steps or details that I don’t explicitly mention can be set to default (or according to your needs).
Select the IAM Role already configured by the solution with the pattern Ec2IPsec-{stackname}
Figure 1: Select the IAM Role
(You can skip this step if you are using Amazon Linux 2.) Under Advanced Details, select User data as text and activate the AWS Systems Manager Agent (SSM Agent) by providing the following string (for RedHat and CentOS 64 Bits only):
Figure 2: Select User data as text and activate the AWS Systems Manager Agent
Set the tag name to IPSec with the value todo. This is the identifier that triggers the installation and management of IPsec on the instance.
Figure 3: Set the tag name to “IPSec” with the value “todo”
On the Configuration page for the security group, allow ESP (Protocol 50) and IKE (UDP 500) for your network, like 172.31.0.0/16. You need to enter these values as shown in following screen:
Figure 4: Enter values on the “Configuration” page
After 1-2 minutes, the value of the IPSec instance tag will change to enabled, meaning the instance is successfully set up.
Figure 5: Look for the “enabled” value for the IPSec key
So what’s happening in the background?
Figure 6: Architectural diagram
As illustrated in the solution architecture diagram, the following steps are executed automatically in the background by the solution:
An EC2 launch triggers a CloudWatch event, which launches an IPSecSetup Lambda function.
The IPSecSetup Lambda function checks whether the EC2 instance has the tag IPSec:todo. If the tag is present, the Lambda function issues a certificate calling a GenerateCertificate Lambda.
The GenerateCertificate Lambda function downloads the encrypted CA certificate and key.
The GenerateCertificate Lambda function decrypts the CA key with a customer master key (CMK).
The GenerateCertificate Lambda function issues a host certificate to the EC2 instance. It encrypts the host certificate and key with a KMS generated random secret in PKCS12 structure. The secret is envelope-encrypted with a dedicated CMK.
The GenerateCertificate Lambda function publishes the issued certificates to your dedicated bucket for documentation.
The IPSec Lambda function calls and runs the installation via SSM.
The installation downloads the configuration and installs python, aws-sdk, libreswan, and curl if needed.
The EC2 instance decrypts the host key with the dedicated CMK and installs it in the IPSec database.
A weekly scheduled event triggers reenrollment of the certificates via the Reenrollcertificates Lambda function.
The Reenrollcertificates Lambda function triggers the IPSecSetup Lambda (call event type: execution). The IPSecSetup Lambda will renew the certificate only, leaving the rest of the configuration untouched.
Testing the connection on the EC2 instance
You can log in to the instance and ping one of the hosts in your network. This will trigger the IPSec connection and you should see successful answers.
$ ping 172.31.1.26
PING 172.31.1.26 (172.31.1.26) 56(84) bytes of data.
64 bytes from 172.31.1.26: icmp_seq=2 ttl=255 time=0.722 ms
64 bytes from 172.31.1.26: icmp_seq=3 ttl=255 time=0.483 ms
To see a list of IPSec tunnels you can execute the following:
sudo ipsec whack --trafficstatus
Here is an example of the execution:
Figure 7: Example execution
Changing your configuration or installing it on already running instances
All configuration exists in the source bucket (default: ipsec-source prefix), in files for libreswan standard. If you need to change the configuration, follow the following instructions:
Review and update the following files:
oe-conf, which is the configuration for libreswan
clear, private, private-to-clear and clear-to-ipsec, which should contain your network ranges.
Change the tag for the IPSec instance to IPSec:todo.
Stop and Start the instance (don’t restart). This will retrigger the setup of the instance.
Figure 8: Stop and start the instance
As an alternative to step 3, if you prefer not to stop and start the instance, you can invoke the IPSecSetup Lambda function via Test Event with a test JSON event in the following format:
A sample of test event creation in the Lambda Design window is shown below:
Figure 9: Sample test event creation
Monitoring and alarms
The solution delivers and takes care of IPSec/IKE Metrics and SNS Alarms in the case of errors. To monitor your IPSec environment, you can use Amazon CloudWatch. You can see metrics for active IPSec sessions, IKE/ESP errors, and connection shunts.
Figure 10: View metrics for active IPSec sessions, IKE/ESP errors, and connection shunts
There are two SNS topics and alarms configured for IPSec setup failure or certificate reenrollment failure. You will see an alarm and an SNS message. It’s very important that your administrator subscribes to notifications so that you can react quickly. If you receive an alarm, please use the information in the “Troubleshooting” section of this post, below.
Figure 11: Alarms
Troubleshooting
Below, I’ve listed some common errors and how to troubleshoot them:
The IPSec Tag doesn’t change to IPSec:enabled upon EC2 launch.
Wait 2 minutes after the EC2 instance launches, so that it becomes reachable for AWS SSM.
Check that the EC2 Instance has the right role assigned for the SSM Agent. The role is provisioned by the solution named Ec2IPsec-{stackname}.
Check that the SSM Agent is reachable via a NAT gateway, an Internet gateway, or a private SSM endpoint.
For CenOS and RedHat, check that you’ve installed the SSM Agent. See “Launching the EC2 instance.”
Check the output of the SSM Agent command execution in the EC2 service.
Check the IPSecSetup Lambda logs in CloudWatch for details.
The IPSec connection is lost after a few hours and can only be established from one host (in one direction).
Check that your Security Groups allow ESP Protocol and UDP 500. Security Groups are stateful. They may only allow a single direction for IPSec establishment.
Check that your network ACL allows UDP 500 and ESP Protocol.
The SNS Alarm on IPSec reenrollment is trigged, but everything seems to work fine.
Certificates are valid for 30 days and rotated every week. If the rotation fails, you have three weeks to fix the problem.
Check that the EC2 instances are reachable over AWS SSM. If reachable, trigger the certificate rotation Lambda again.
See the IPSecSetup Lambda logs in CloudWatch for details.
DNS Route 53, RDS, and other managed services are not reachable.
DNS, RDS and other managed services do not support IPSec. You need to exclude them from encryption by listing them in the config/clear list. For more details see step 2 of Installation (one-time setup) in this blog.
Here are some additional general IPSec commands for troubleshooting:
Stopping IPSec can be done by executing the following unix command:
sudo ipsec stop
If you want to stop IPSec on all instances, you can execute this command via AWS Systems Manager on all instances with the tag IPSec:enabled. Stopping encryption means all traffic will be sent unencrypted.
If you want to have a fail-open case, meaning on IKE(IPSec) failure send the data unencrypted, then configure your network in config/private-or-clear as described in step 2 of Installation (one-time setup).
Debugging IPSec issues can be done using Libreswan commands . For example:
The CA key is encrypted using an Advanced Encryption Standard (AES) 256 CBC 128-byte secret and stored in a bucket with server-side encryption (SSE). The secret is envelope-encrypted with a CMK in AWS KMP pattern. Only the certificate-issuing Lambda function can decrypt the secret KMS resource policy. The encrypted secret for the CA key is set in an encrypted environment variable of the certificate-issuing Lambda function.
The IPSec host private key is generated by the certificate-issuing Lambda function. The private key and certificate are encrypted with AES 256 CBC (PKCS12) and protected with a 128-byte secret generated by KMS. The secret is envelope-encrypted with a user CMK. Only the EC2 instances with attached IPSec IAM policy can decrypt the secret and private key.
The issuing of the certificate is a full synchronous call: One request and one corresponding response without any polling or similar sync/callbacks. The host private key is not stored in a database or an S3 bucket.
The issued certificates are valid for 30 days and are stored for auditing purposes in a certificates bucket without a private key.
Alternate subject names and multiple interfaces or secondary IPs
The certificate subject name and AltSubjectName attribute contains the private Domain Name System (DNS) of the EC2 and all private IPs assigned to the instance (interfaces, primary, and secondary IPs).
The provided default libreswan configuration covers a single interface. You can adjust the configuration according to libreswan documentation for multiple interfaces, for example, to cover Amazon Elastic Container Service for Kubernetes (Amazon EKS).
Conclusion
With the solution in this blog post, you can automate the process of building an encryption IPSec layer for your EC2 instances to protect your workloads. You don’t need to worry about configuring certificates, monitoring, and alerting. The solution uses a combination of AWS KMS, IAM, AWS Lambda, CloudWatch and the libreswan implementation. If you need libreswan support, use the mailing list or github. AWS forums can give you more information on KMS for IAM. If you require a special enterprise enhancement, contact AWS professional services.
Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.
Amazon GuardDuty is a managed threat detection service that continuously monitors for malicious or unauthorized behavior to help you protect your AWS accounts and workloads. Given the many log types that Amazon GuardDuty analyzes (Amazon Virtual Private Cloud (VPC) Flow Logs, AWS CloudTrail, and DNS logs), you never know what it might discover in your AWS account. After enabling GuardDuty, you might quickly find serious threats lurking in your account or, preferably, just end up staring at a blank dashboard for weeks…or even longer.
A while back at an AWS Loft event, one of the customers enabled GuardDuty in their AWS account for a lab we were running. Soon after, GuardDuty alerts (findings) popped up that indicated multiple Amazon Elastic Compute Cloud (EC2) instances were communicating with known command and control servers. This means that GuardDuty detected activity commonly seen in the situation where an EC2 instance has been taken over as part of a botnet. The customer asked if this was part of the lab, and we explained it wasn’t and that the findings should be immediately investigated. This led to an investigation by that customer’s security team and luckily the issue was resolved quickly.
Then there was the time we spoke to a customer that had been running GuardDuty for a few days but had yet to see any findings in the dashboard. They were concerned that the service wasn’t working. We explained that the lack of findings was actually a good thing, and we discussed how to generate sample findings to test GuardDuty and their remediation pipeline.
This post, and the corresponding GitHub repository, will help prepare you for either type of experience by walking you through a threat detection and remediation scenario. The scenario will show you how to quickly enable GuardDuty, generate and examine test findings, and then review automated remediation examples using AWS Lambda.
Scenario overview
The instructions and AWS CloudFormation template for setting everything up are provided in a GitHub repository. The CloudFormation template sets up a test environment in your AWS Account, configures everything needed to run through the scenario, generates GuardDuty findings and provides automatic remediation for the simulated threats in the scenario. All you need to do is run the CloudFormation template in the GitHub repository and then follow the instructions to investigate what occurred.
The scenario presented is that you manage an IT organization and Alice, your security engineer, has enabled GuardDuty in a production AWS Account and configured a few automated remediations. In threat detection and remediation, the standard pattern starts with a threat which is then investigated and finally remediated. These remediations can be manual or automated. Alice focused on a few specific attack vectors, which represent a small sample of what GuardDuty is capable of detecting. Alice has set all this up on Thursday but isn’t in the office on Monday. Unfortunately, as soon as you arrive at the office, GuardDuty notifies you that multiple threats have been detected (and given the automated remediation setup, these threats have been addressed but you still need to investigate.) The documentation in GitHub will guide you through the analysis of the findings and discuss how the automatic remediation works. You will also have the opportunity to manually trigger a GuardDuty finding and view that automated remediation.
The GuardDuty findings generated in the scenario are listed here:
You can get started immediately by browsing to the GitHub repository for this scenario where you will find the instructions and AWS CloudFormation template. This scenario will show you how easy it is to enable GuardDuty in addition to demonstrating some of the threats GuardDuty can discover. To learn more about Amazon GuardDuty please see the GuardDuty site and GuardDuty documentation.
If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the Amazon GuardDuty forum or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
Earlier this month we launched the C5 Instances with Local NVMe Storage and I told you that we would be doing the same for additional instance types in the near future!
Today we are introducing M5 instances equipped with local NVMe storage. Available for immediate use in 5 regions, these instances are a great fit for workloads that require a balance of compute and memory resources. Here are the specs:
Instance Name
vCPUs
RAM
Local Storage
EBS-Optimized Bandwidth
Network Bandwidth
m5d.large
2
8 GiB
1 x 75 GB NVMe SSD
Up to 2.120 Gbps
Up to 10 Gbps
m5d.xlarge
4
16 GiB
1 x 150 GB NVMe SSD
Up to 2.120 Gbps
Up to 10 Gbps
m5d.2xlarge
8
32 GiB
1 x 300 GB NVMe SSD
Up to 2.120 Gbps
Up to 10 Gbps
m5d.4xlarge
16
64 GiB
1 x 600 GB NVMe SSD
2.210 Gbps
Up to 10 Gbps
m5d.12xlarge
48
192 GiB
2 x 900 GB NVMe SSD
5.0 Gbps
10 Gbps
m5d.24xlarge
96
384 GiB
4 x 900 GB NVMe SSD
10.0 Gbps
25 Gbps
The M5d instances are powered by Custom Intel® Xeon® Platinum 8175M series processors running at 2.5 GHz, including support for AVX-512.
You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe; this includes the latest Amazon Linux, Microsoft Windows (Server 2008 R2, Server 2012, Server 2012 R2 and Server 2016), Ubuntu, RHEL, SUSE, and CentOS AMIs.
Here are a couple of things to keep in mind about the local NVMe storage on the M5d instances:
Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.
Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.
Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.
Available Now M5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent M5 instances.
Join us this month to learn about AWS services and solutions. New this month, we have a fireside chat with the GM of Amazon WorkSpaces and our 2nd episode of the “How to re:Invent” series. We’ll also cover best practices, deep dives, use cases and more! Join us and register today!
AWS re:Invent June 13, 2018 | 05:00 PM – 05:30 PM PT – Episode 2: AWS re:Invent Breakout Content Secret Sauce – Hear from one of our own AWS content experts as we dive deep into the re:Invent content strategy and how we maintain a high bar. Compute
Containers June 25, 2018 | 09:00 AM – 09:45 AM PT – Running Kubernetes on AWS – Learn about the basics of running Kubernetes on AWS including how setup masters, networking, security, and add auto-scaling to your cluster.
June 19, 2018 | 11:00 AM – 11:45 AM PT – Launch AWS Faster using Automated Landing Zones – Learn how the AWS Landing Zone can automate the set up of best practice baselines when setting up new
June 21, 2018 | 01:00 PM – 01:45 PM PT – Enabling New Retail Customer Experiences with Big Data – Learn how AWS can help retailers realize actual value from their big data and deliver on differentiated retail customer experiences.
June 28, 2018 | 01:00 PM – 01:45 PM PT – Fireside Chat: End User Collaboration on AWS – Learn how End User Compute services can help you deliver access to desktops and applications anywhere, anytime, using any device. IoT
June 27, 2018 | 11:00 AM – 11:45 AM PT – AWS IoT in the Connected Home – Learn how to use AWS IoT to build innovative Connected Home products.
Mobile June 25, 2018 | 11:00 AM – 11:45 AM PT – Drive User Engagement with Amazon Pinpoint – Learn how Amazon Pinpoint simplifies and streamlines effective user engagement.
June 26, 2018 | 11:00 AM – 11:45 AM PT – Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway – Learn how you can reduce your on-premises infrastructure by using the AWS Storage Gateway to connecting your applications to the scalable and reliable AWS storage services. June 27, 2018 | 01:00 PM – 01:45 PM PT – Changing the Game: Extending Compute Capabilities to the Edge – Discover how to change the game for IIoT and edge analytics applications with AWS Snowball Edge plus enhanced Compute instances. June 28, 2018 | 11:00 AM – 11:45 AM PT – Big Data and Analytics Workloads on Amazon EFS – Get best practices and deployment advice for running big data and analytics workloads on Amazon EFS.
Previously, I showed you how to rotate Amazon RDS database credentials automatically with AWS Secrets Manager. In addition to database credentials, AWS Secrets Manager makes it easier to rotate, manage, and retrieve API keys, OAuth tokens, and other secrets throughout their lifecycle. You can configure Secrets Manager to rotate these secrets automatically, which can help you meet your compliance needs. You can also use Secrets Manager to rotate secrets on demand, which can help you respond quickly to security events. In this post, I show you how to store an API key in Secrets Manager and use a custom Lambda function to rotate the key automatically. I’ll use a Twitter API key and bearer token as an example; you can reference this example to rotate other types of API keys.
The instructions are divided into four main phases:
Store a Twitter API key and bearer token in Secrets Manager.
Create a custom Lambda function to rotate the bearer token.
Configure your application to retrieve the bearer token from Secrets Manager.
Configure Secrets Manager to use the custom Lambda function to rotate the bearer token automatically.
For the purpose of this post, I use the placeholder Demo/Twitter_Api_Key to denote the API key, the placeholder Demo/Twitter_bearer_token to denote the bearer token, and placeholder Lambda_Rotate_Bearer_Token to denote the custom Lambda function. Be sure to replace these placeholders with the resource names from your account.
Phase 1: Store a Twitter API key and bearer token in Secrets Manager
Twitter enables developers to register their applications and retrieve an API key, which includes a consumer_key and consumer_secret. Developers use these to generate a bearer token that applications can then use to authenticate and retrieve information from Twitter. At any given point of time, you can use an API key to create only one valid bearer token.
Start by storing the API key in Secrets Manager. Here’s how:
Figure 1: The “Store a new secret” button in the AWS Secrets Manager console
Select Other type of secrets (because you’re storing an API key).
Input the consumer_key and consumer_secret, and then select Next.
Figure 2: Select the consumer_key and the consumer_secret
Specify values for Secret Name and Description, then select Next. For this example, I use Demo/Twitter_API_Key.
Figure 3: Set values for “Secret Name” and “Description”
On the next screen, keep the default setting, Disable automatic rotation, because you’ll use the same API key to rotate bearer tokens programmatically and automatically. Applications and employees will not retrieve this API key. Select Next.
Figure 4: Keep the default “Disable automatic rotation” setting
Review the information on the next screen and, if everything looks correct, select Store. You’ve now successfully stored a Twitter API key in Secrets Manager.
Next, store the bearer token in Secrets Manager. Here’s how:
From the Secrets Manager console, select Store a new secret, select Other type of secrets, input details (access_token, token_type, and ARN of the API key) about the bearer token, and then select Next.
Figure 5: Add details about the bearer token
Specify values for Secret Name and Description, and then select Next. For this example, I use Demo/Twitter_bearer_token.
Figure 6: Again set values for “Secret Name” and “Description”
Keep the default rotation setting, Disable automatic rotation, and then select Next. You’ll enable rotation after you’ve updated the application to use Secrets Manager APIs to retrieve secrets.
Review the information and select Store. You’ve now completed storing the bearer token in Secrets Manager. I take note of the sample code provided on the review page. I’ll use this code to update my application to retrieve the bearer token using Secrets Manager APIs.
Figure 7: The sample code you can use in your app
Phase 2: Create a custom Lambda function to rotate the bearer token
While Secrets Manager supports rotating credentials for databases hosted on Amazon RDS natively, it also enables you to meet your unique rotation-related use cases by authoring custom Lambda functions. Now that you’ve stored the API key and bearer token, you’ll create a Lambda function to rotate the bearer token. For this example, I’ll create my Lambda function using Python 3.6.
Figure 8: In the Lambda console, select “Create function”
Select Author from scratch. For this example, I use the name Lambda_Rotate_Bearer_Token for my Lambda function. I also set the Runtime environment as Python 3.6.
Figure 9: Create a new function from scratch
This Lambda function requires permissions to call AWS resources on your behalf. To grant these permissions, select Create a custom role. This opens a console tab.
Select Create a new IAM Role and specify the value for Role Name. For this example, I use Role_Lambda_Rotate_Twitter_Bearer_Token.
Figure 10: For “IAM Role,” select “Create a new IAM role”
Next, to define the IAM permissions, copy and paste the following IAM policy in the View Policy Document text-entry field. Be sure to replace the placeholder ARN-OF-Demo/Twitter_API_Key with the ARN of your secret.
Figure 11: The IAM policy pasted in the “View Policy Document” text-entry field
Now, select Allow. This brings me back to the Lambda console with the appropriate Role selected.
Select Create function.
Figure 12: Select the “Create function” button in the lower-right corner
Copy the following Python code and paste it in the Function code section.
import base64
import json
import logging
import os
import boto3
from botocore.vendored import requests
logger = logging.getLogger()
logger.setLevel(logging.INFO)
def lambda_handler(event, context):
"""Secrets Manager Twitter Bearer Token Handler
This handler uses the master-user rotation scheme to rotate a bearer token of a Twitter app.
The Secret PlaintextString is expected to be a JSON string with the following format:
{
'access_token': ,
'token_type': ,
'masterarn':
}
Args:
event (dict): Lambda dictionary of event parameters. These keys must include the following:
- SecretId: The secret ARN or identifier
- ClientRequestToken: The ClientRequestToken of the secret version
- Step: The rotation step (one of createSecret, setSecret, testSecret, or finishSecret)
context (LambdaContext): The Lambda runtime information
Raises:
ResourceNotFoundException: If the secret with the specified arn and stage does not exist
ValueError: If the secret is not properly configured for rotation
KeyError: If the secret json does not contain the expected keys
"""
arn = event['SecretId']
token = event['ClientRequestToken']
step = event['Step']
# Setup the client and environment variables
service_client = boto3.client('secretsmanager', endpoint_url=os.environ['SECRETS_MANAGER_ENDPOINT'])
oauth2_token_url = os.environ['TWITTER_OAUTH2_TOKEN_URL']
oauth2_invalid_token_url = os.environ['TWITTER_OAUTH2_INVALID_TOKEN_URL']
tweet_search_url = os.environ['TWITTER_SEARCH_URL']
# Make sure the version is staged correctly
metadata = service_client.describe_secret(SecretId=arn)
if not metadata['RotationEnabled']:
logger.error("Secret %s is not enabled for rotation" % arn)
raise ValueError("Secret %s is not enabled for rotation" % arn)
versions = metadata['VersionIdsToStages']
if token not in versions:
logger.error("Secret version %s has no stage for rotation of secret %s." % (token, arn))
raise ValueError("Secret version %s has no stage for rotation of secret %s." % (token, arn))
if "AWSCURRENT" in versions[token]:
logger.info("Secret version %s already set as AWSCURRENT for secret %s." % (token, arn))
return
elif "AWSPENDING" not in versions[token]:
logger.error("Secret version %s not set as AWSPENDING for rotation of secret %s." % (token, arn))
raise ValueError("Secret version %s not set as AWSPENDING for rotation of secret %s." % (token, arn))
# Call the appropriate step
if step == "createSecret":
create_secret(service_client, arn, token, oauth2_token_url, oauth2_invalid_token_url)
elif step == "setSecret":
set_secret(service_client, arn, token, oauth2_token_url)
elif step == "testSecret":
test_secret(service_client, arn, token, tweet_search_url)
elif step == "finishSecret":
finish_secret(service_client, arn, token)
else:
logger.error("lambda_handler: Invalid step parameter %s for secret %s" % (step, arn))
raise ValueError("Invalid step parameter %s for secret %s" % (step, arn))
def create_secret(service_client, arn, token, oauth2_token_url, oauth2_invalid_token_url):
"""Get a new bearer token from Twitter
This method invalidates existing bearer token for the Twitter app and retrieves a new one from Twitter.
If a secret version with AWSPENDING stage exists, updates it with the newly retrieved bearer token and if
the AWSPENDING stage does not exist, creates a new version of the secret with that stage label.
Args:
service_client (client): The secrets manager service client
arn (string): The secret ARN or other identifier
token (string): The ClientRequestToken associated with the secret version
oauth2_token_url (string): The Twitter API endpoint to request a bearer token
oauth2_invalid_token_url (string): The Twitter API endpoint to invalidate a bearer token
Raises:
ValueError: If the current secret is not valid JSON
KeyError: If the secret json does not contain the expected keys
ResourceNotFoundException: If the current secret is not found
"""
# Make sure the current secret exists and try to get the master arn from the secret
try:
current_secret_dict = get_secret_dict(service_client, arn, "AWSCURRENT")
master_arn = current_secret_dict['masterarn']
logger.info("createSecret: Successfully retrieved secret for %s." % arn)
except service_client.exceptions.ResourceNotFoundException:
return
# create bearer token credentials to be passed as authorization string to Twitter
bearer_token_credentials = encode_credentials(service_client, master_arn, "AWSCURRENT")
# get the bearer token from Twitter
bearer_token_from_twitter = get_bearer_token(bearer_token_credentials,oauth2_token_url)
# invalidate the current bearer token
invalidate_bearer_token(oauth2_invalid_token_url,bearer_token_credentials,bearer_token_from_twitter)
# get a new bearer token from Twitter
new_bearer_token = get_bearer_token(bearer_token_credentials, oauth2_token_url)
# if a secret version with AWSPENDING stage exists, update it with the lastest bearer token
# if the AWSPENDING stage does not exist, then create the version with AWSPENDING stage
try:
pending_secret_dict = get_secret_dict(service_client, arn, "AWSPENDING", token)
pending_secret_dict['access_token'] = new_bearer_token
service_client.put_secret_value(SecretId=arn, ClientRequestToken=token, SecretString=json.dumps(pending_secret_dict), VersionStages=['AWSPENDING'])
logger.info("createSecret: Successfully invalidated the bearer token of the secret %s and updated the pending version" % arn)
except service_client.exceptions.ResourceNotFoundException:
current_secret_dict['access_token'] = new_bearer_token
service_client.put_secret_value(SecretId=arn, ClientRequestToken=token, SecretString=json.dumps(current_secret_dict), VersionStages=['AWSPENDING'])
logger.info("createSecret: Successfully invalidated the bearer token of the secret %s and and created the pending version." % arn)
def set_secret(service_client, arn, token, oauth2_token_url):
"""Validate the pending secret with that in Twitter
This method checks wether the bearer token in Twitter is the same as the one in the version with AWSPENDING stage.
Args:
service_client (client): The secrets manager service client
arn (string): The secret ARN or other identifier
token (string): The ClientRequestToken associated with the secret version
oauth2_token_url (string): The Twitter API endopoint to get a bearer token
Raises:
ResourceNotFoundException: If the secret with the specified arn and stage does not exist
ValueError: If the secret is not valid JSON or master credentials could not be used to login to DB
KeyError: If the secret json does not contain the expected keys
"""
# First get the pending version of the bearer token and compare it with that in Twitter
pending_secret_dict = get_secret_dict(service_client, arn, "AWSPENDING")
master_arn = pending_secret_dict['masterarn']
# create bearer token credentials to be passed as authorization string to Twitter
bearer_token_credentials = encode_credentials(service_client, master_arn, "AWSCURRENT")
# get the bearer token from Twitter
bearer_token_from_twitter = get_bearer_token(bearer_token_credentials, oauth2_token_url)
# if the bearer tokens are same, invalidate the bearer token in Twitter
# if not, raise an exception that bearer token in Twitter was changed outside Secrets Manager
if pending_secret_dict['access_token'] == bearer_token_from_twitter:
logger.info("createSecret: Successfully verified the bearer token of arn %s" % arn)
else:
raise ValueError("The bearer token of the Twitter app was changed outside Secrets Manager. Please check.")
def test_secret(service_client, arn, token, tweet_search_url):
"""Test the pending secret by calling a Twitter API
This method tries to use the bearer token in the secret version with AWSPENDING stage and search for tweets
with 'aws secrets manager' string.
Args:
service_client (client): The secrets manager service client
arn (string): The secret ARN or other identifier
token (string): The ClientRequestToken associated with the secret version
Raises:
ResourceNotFoundException: If the secret with the specified arn and stage does not exist
ValueError: If the secret is not valid JSON or pending credentials could not be used to login to the database
KeyError: If the secret json does not contain the expected keys
"""
# First get the pending version of the bearer token and compare it with that in Twitter
pending_secret_dict = get_secret_dict(service_client, arn, "AWSPENDING", token)
# Now verify you can search for tweets using the bearer token
if verify_bearer_token(pending_secret_dict['access_token'], tweet_search_url):
logger.info("testSecret: Successfully authorized with the pending secret in %s." % arn)
return
else:
logger.error("testSecret: Unable to authorize with the pending secret of secret ARN %s" % arn)
raise ValueError("Unable to connect to Twitter with pending secret of secret ARN %s" % arn)
def finish_secret(service_client, arn, token):
"""Finish the rotation by marking the pending secret as current
This method moves the secret from the AWSPENDING stage to the AWSCURRENT stage.
Args:
service_client (client): The secrets manager service client
arn (string): The secret ARN or other identifier
token (string): The ClientRequestToken associated with the secret version
Raises:
ResourceNotFoundException: If the secret with the specified arn and stage does not exist
"""
# First describe the secret to get the current version
metadata = service_client.describe_secret(SecretId=arn)
current_version = None
for version in metadata["VersionIdsToStages"]:
if "AWSCURRENT" in metadata["VersionIdsToStages"][version]:
if version == token:
# The correct version is already marked as current, return
logger.info("finishSecret: Version %s already marked as AWSCURRENT for %s" % (version, arn))
return
current_version = version
break
# Finalize by staging the secret version current
service_client.update_secret_version_stage(SecretId=arn, VersionStage="AWSCURRENT", MoveToVersionId=token, RemoveFromVersionId=current_version)
logger.info("finishSecret: Successfully set AWSCURRENT stage to version %s for secret %s." % (version, arn))
def encode_credentials(service_client, arn, stage):
"""Encodes the Twitter credentials
This helper function encodes the Twitter credentials (consumer_key and consumer_secret)
Args:
service_client (client):The secrets manager service client
arn (string): The secret ARN or other identifier
stage (stage): The stage identifying the secret version
Returns:
encoded_credentials (string): base64 encoded authorization string for Twitter
Raises:
KeyError: If the secret json does not contain the expected keys
"""
required_fields = ['consumer_key','consumer_secret']
master_secret_dict = get_secret_dict(service_client, arn, stage)
for field in required_fields:
if field not in master_secret_dict:
raise KeyError("%s key is missing from the secret JSON" % field)
encoded_credentials = base64.urlsafe_b64encode(
'{}:{}'.format(master_secret_dict['consumer_key'], master_secret_dict['consumer_secret']).encode('ascii')).decode('ascii')
return encoded_credentials
def get_bearer_token(encoded_credentials, oauth2_token_url):
"""Gets a bearer token from Twitter
This helper function retrieves the current bearer token from Twitter, given a set of credentials.
Args:
encoded_credentials (string): Twitter credentials for authentication
oauth2_token_url (string): REST API endpoint to request a bearer token from Twitter
Raises:
KeyError: If the secret json does not contain the expected keys
"""
headers = {
'Authorization': 'Basic {}'.format(encoded_credentials),
'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
}
data = 'grant_type=client_credentials'
response = requests.post(oauth2_token_url, headers=headers, data=data)
response_data = response.json()
if response_data['token_type'] == 'bearer':
bearer_token = response_data['access_token']
return bearer_token
else:
raise RuntimeError('unexpected token type: {}'.format(response_data['token_type']))
def invalidate_bearer_token(oauth2_invalid_token_url, bearer_token_credentials, bearer_token):
"""Invalidates a Bearer Token of a Twitter App
This helper function invalidates a bearer token of a Twitter app.
If successful, it returns the invalidated bearer token, else None
Args:
oauth2_invalid_token_url (string): The Twitter API endpoint to invalidate a bearer token
bearer_token_credentials (string): encoded consumer key and consumer secret to authenticate with Twitter
bearer_token (string): The bearer token to be invalidated
Returns:
invalidated_bearer_token: The invalidated bearer token
Raises:
ResourceNotFoundException: If the secret with the specified arn and stage does not exist
ValueError: If the secret is not valid JSON
KeyError: If the secret json does not contain the expected keys
"""
headers = {
'Authorization': 'Basic {}'.format(bearer_token_credentials),
'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
}
data = 'access_token=' + bearer_token
invalidate_response = requests.post(oauth2_invalid_token_url, headers=headers, data=data)
invalidate_response_data = invalidate_response.json()
if invalidate_response_data:
return
else:
raise RuntimeError('Invalidate bearer token request failed')
def verify_bearer_token(bearer_token, tweet_search_url):
"""Verifies access to Twitter APIs using a bearer token
This helper function verifies that the bearer token is valid by calling Twitter's search/tweets API endpoint
Args:
bearer_token (string): The current bearer token for the application
Returns:
True or False
Raises:
KeyError: If the response of search tweets API call fails
"""
headers = {
'Authorization' : 'Bearer {}'.format(bearer_token),
'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
}
search_results = requests.get(tweet_search_url, headers=headers)
try:
search_results.json()['statuses']
return True
except:
return False
def get_secret_dict(service_client, arn, stage, token=None):
"""Gets the secret dictionary corresponding for the secret arn, stage, and token
This helper function gets credentials for the arn and stage passed in and returns the dictionary by parsing the JSON string
Args:
service_client (client): The secrets manager service client
arn (string): The secret ARN or other identifier
token (string): The ClientRequestToken associated with the secret version, or None if no validation is desired
stage (string): The stage identifying the secret version
Returns:
SecretDictionary: Secret dictionary
Raises:
ResourceNotFoundException: If the secret with the specified arn and stage does not exist
ValueError: If the secret is not valid JSON
"""
# Only do VersionId validation against the stage if a token is passed in
if token:
secret = service_client.get_secret_value(SecretId=arn, VersionId=token, VersionStage=stage)
else:
secret = service_client.get_secret_value(SecretId=arn, VersionStage=stage)
plaintext = secret['SecretString']
# Parse and return the secret JSON string
return json.loads(plaintext)
Here’s what it will look like:
Figure 13: The Python code pasted in the “Function code” section
On the same page, provide the following environment variables:
Note: Resources used in this example are in US East (Ohio) region. If you intend to use another AWS Region, change the SECRETS_MANAGER_ENDPOINT set in the Environment variables to the appropriate region.
You’ve now created a Lambda function that can rotate the bearer token:
Figure 15: The new Lambda function
Before you can configure Secrets Manager to use this Lambda function, you need to update the function policy of the Lambda function. A function policy permits AWS services, such as Secrets Manager, to invoke a Lambda function on behalf of your application. You can attach a Lambda function policy from the AWS Command Line Interface (AWS CLI) or SDK. To attach a function policy, call the add-permission Lambda API from the AWS CLI.
Phase 3: Configure your application to retrieve the bearer token from Secrets Manager
Now that you’ve stored the bearer token in Secrets Manager, update the application to retrieve the bearer token from Secrets Manager instead of hard-coding this information in a configuration file or source code. For this example, I show you how to configure a Python application to retrieve this secret from Secrets Manager.
import config
def no_secrets_manager_sample()
# Get the bearer token from a config file.
Bearer_token = config.bearer_token
# Use the bearer token to authenticate requests to Twitter
Use the sample code from section titled Phase 1 and update the application to retrieve the bearer token from Secrets Manager. The following code sets up the client and retrieves and decrypts the secret Demo/Twitter_bearer_token.
# Use this code snippet in your app.
import boto3
from botocore.exceptions import ClientError
def get_secret():
secret_name = "Demo/Twitter_bearer_token"
endpoint_url = "https://secretsmanager.us-east-2.amazonaws.com"
region_name = "us-east-2"
session = boto3.session.Session()
client = session.client(
service_name='secretsmanager',
region_name=region_name,
endpoint_url=endpoint_url
)
try:
get_secret_value_response = client.get_secret_value(
SecretId=secret_name
)
except ClientError as e:
if e.response['Error']['Code'] == 'ResourceNotFoundException':
print("The requested secret " + secret_name + " was not found")
elif e.response['Error']['Code'] == 'InvalidRequestException':
print("The request was invalid due to:", e)
elif e.response['Error']['Code'] == 'InvalidParameterException':
print("The request had invalid params:", e)
else:
# Decrypted secret using the associated KMS CMK
# Depending on whether the secret was a string or binary, one of these fields will be populated
if 'SecretString' in get_secret_value_response:
secret = get_secret_value_response['SecretString']
else:
binary_secret_data = get_secret_value_response['SecretBinary']
# Your code goes here.
Applications require permissions to access Secrets Manager. My application runs on Amazon EC2 and uses an IAM role to get access to AWS services. I’ll attach the following policy to my IAM role, and you should take a similar action with your IAM role. This policy uses the GetSecretValue action to grant my application permissions to read secrets from Secrets Manager. This policy also uses the resource element to limit my application to read only the Demo/Twitter_bearer_token secret from Secrets Manager. Read the AWS Secrets Manager documentation to understand the minimum IAM permissions required to retrieve a secret.
{
"Version": "2012-10-17",
"Statement": {
"Sid": "RetrieveBearerToken",
"Effect": "Allow",
"Action": "secretsmanager:GetSecretValue",
"Resource": Input ARN of the secret Demo/Twitter_bearer_token here
}
}
Note: To improve the resiliency of your applications, associate your application with two API keys/bearer tokens. This is a higher availability option because you can continue to use one bearer token while Secrets Manager rotates the other token. Read the AWS documentation to learn how AWS Secrets Manager rotates your secrets.
Phase 4: Enable and verify rotation
Now that you’ve stored the secret in Secrets Manager and created a Lambda function to rotate this secret, configure Secrets Manager to rotate the secret Demo/Twitter_bearer_token.
From the Secrets Manager console, go to the list of secrets and choose the secret you created in the first step (in my example, this is named Demo/Twitter_bearer_token).
Scroll to Rotation configuration, and then select Edit rotation.
Figure 16: Select the “Edit rotation” button
To enable rotation, select Enable automatic rotation, and then choose how frequently you want Secrets Manager to rotate this secret. For this example, I set the rotation interval to 30 days. I also choose the rotation Lambda function, Lambda_Rotate_Bearer_Token, from the drop-down list.
Figure 17: “Edit rotation configuration” options
The banner on the next screen confirms that I have successfully configured rotation and the first rotation is in progress, which enables you to verify that rotation is functioning as expected. Secrets Manager will rotate this credential automatically every 30 days.
Figure 18: Confirmation notice
Summary
In this post, I showed you how to configure Secrets Manager to manage and rotate an API key and bearer token used by applications to authenticate and retrieve information from Twitter. You can use the steps described in this blog to manage and rotate other API keys, as well.
Secrets Manager helps you protect access to your applications, services, and IT resources without the upfront investment and on-going maintenance costs of operating your own secrets management infrastructure. To get started, open the Secrets Manager console. To learn more, read the Secrets Manager documentation.
If you have comments about this post, submit them in the Comments section below. If you have questions about anything in this post, start a new thread on the Secrets Manager forum or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
This post is courtesy of Alan Protasio, Software Development Engineer, Amazon Web Services
Just like compute and storage, messaging is a fundamental building block of enterprise applications. Message brokers (aka “message-oriented middleware”) enable different software systems, often written in different languages, on different platforms, running in different locations, to communicate and exchange information. Mission-critical applications, such as CRM and ERP, rely on message brokers to work.
A common performance consideration for customers deploying a message broker in a production environment is the throughput of the system, measured as messages per second. This is important to know so that application environments (hosts, threads, memory, etc.) can be configured correctly.
In this post, we demonstrate how to measure the throughput for Amazon MQ, a new managed message broker service for ActiveMQ, using JMS Benchmark. It should take between 15–20 minutes to set up the environment and an hour to run the benchmark. We also provide some tips on how to configure Amazon MQ for optimal throughput.
Benchmarking throughput for Amazon MQ
ActiveMQ can be used for a number of use cases. These use cases can range from simple fire and forget tasks (that is, asynchronous processing), low-latency request-reply patterns, to buffering requests before they are persisted to a database.
The throughput of Amazon MQ is largely dependent on the use case. For example, if you have non-critical workloads such as gathering click events for a non-business-critical portal, you can use ActiveMQ in a non-persistent mode and get extremely high throughput with Amazon MQ.
On the flip side, if you have a critical workload where durability is extremely important (meaning that you can’t lose a message), then you are bound by the I/O capacity of your underlying persistence store. We recommend using mq.m4.large for the best results. The mq.t2.micro instance type is intended for product evaluation. Performance is limited, due to the lower memory and burstable CPU performance.
Tip: To improve your throughput with Amazon MQ, make sure that you have consumers processing messaging as fast as (or faster than) your producers are pushing messages.
Because it’s impossible to talk about how the broker (ActiveMQ) behaves for each and every use case, we walk through how to set up your own benchmark for Amazon MQ using our favorite open-source benchmarking tool: JMS Benchmark. We are fans of the JMS Benchmark suite because it’s easy to set up and deploy, and comes with a built-in visualizer of the results.
Non-Persistent Scenarios – Queue latency as you scale producer throughput
Getting started
At the time of publication, you can create an mq.m4.large single-instance broker for testing for $0.30 per hour (US pricing).
Step 2 – Create an EC2 instance to run your benchmark Launch the EC2 instance using Step 1: Launch an Instance. We recommend choosing the m5.large instance type.
Step 3 – Configure the security groups Make sure that all the security groups are correctly configured to let the traffic flow between the EC2 instance and your broker.
From the broker list, choose the name of your broker (for example, MyBroker)
In the Details section, under Security and network, choose the name of your security group or choose the expand icon ( ).
From the security group list, choose your security group.
At the bottom of the page, choose Inbound, Edit.
In the Edit inbound rules dialog box, add a role to allow traffic between your instance and the broker: • Choose Add Rule. • For Type, choose Custom TCP. • For Port Range, type the ActiveMQ SSL port (61617). • For Source, leave Custom selected and then type the security group of your EC2 instance. • Choose Save.
Your broker can now accept the connection from your EC2 instance.
Step 4 – Run the benchmark Connect to your EC2 instance using SSH and run the following commands:
After the benchmark finishes, you can find the results in the ~/reports directory. As you may notice, the performance of ActiveMQ varies based on the number of consumers, producers, destinations, and message size.
Amazon MQ architecture
The last bit that’s important to know so that you can better understand the results of the benchmark is how Amazon MQ is architected.
Amazon MQ is architected to be highly available (HA) and durable. For HA, we recommend using the multi-AZ option. After a message is sent to Amazon MQ in persistent mode, the message is written to the highly durable message store that replicates the data across multiple nodes in multiple Availability Zones. Because of this replication, for some use cases you may see a reduction in throughput as you migrate to Amazon MQ. Customers have told us they appreciate the benefits of message replication as it helps protect durability even in the face of the loss of an Availability Zone.
Conclusion
We hope this gives you an idea of how Amazon MQ performs. We encourage you to run tests to simulate your own use cases.
To learn more, see the Amazon MQ website. You can try Amazon MQ for free with the AWS Free Tier, which includes up to 750 hours of a single-instance mq.t2.micro broker and up to 1 GB of storage per month for one year.
The adoption of Apache Spark has increased significantly over the past few years, and running Spark-based application pipelines is the new normal. Spark jobs that are in an ETL (extract, transform, and load) pipeline have different requirements—you must handle dependencies in the jobs, maintain order during executions, and run multiple jobs in parallel. In most of these cases, you can use workflow scheduler tools like Apache Oozie, Apache Airflow, and even Cron to fulfill these requirements.
Apache Oozie is a widely used workflow scheduler system for Hadoop-based jobs. However, its limited UI capabilities, lack of integration with other services, and heavy XML dependency might not be suitable for some users. On the other hand, Apache Airflow comes with a lot of neat features, along with powerful UI and monitoring capabilities and integration with several AWS and third-party services. However, with Airflow, you do need to provision and manage the Airflow server. The Cron utility is a powerful job scheduler. But it doesn’t give you much visibility into the job details, and creating a workflow using Cron jobs can be challenging.
What if you have a simple use case, in which you want to run a few Spark jobs in a specific order, but you don’t want to spend time orchestrating those jobs or maintaining a separate application? You can do that today in a serverless fashion using AWS Step Functions. You can create the entire workflow in AWS Step Functions and interact with Spark on Amazon EMR through Apache Livy.
In this post, I walk you through a list of steps to orchestrate a serverless Spark-based ETL pipeline using AWS Step Functions and Apache Livy.
Input data
For the source data for this post, I use the New York City Taxi and Limousine Commission (TLC) trip record data. For a description of the data, see this detailed dictionary of the taxi data. In this example, we’ll work mainly with the following three columns for the Spark jobs.
Column name
Column description
RateCodeID
Represents the rate code in effect at the end of the trip (for example, 1 for standard rate, 2 for JFK airport, 3 for Newark airport, and so on).
FareAmount
Represents the time-and-distance fare calculated by the meter.
TripDistance
Represents the elapsed trip distance in miles reported by the taxi meter.
The trip data is in comma-separated values (CSV) format with the first row as a header. To shorten the Spark execution time, I trimmed the large input data to only 20,000 rows. During the deployment phase, the input file tripdata.csv is stored in Amazon S3 in the <<your-bucket>>/emr-step-functions/input/ folder.
The following image shows a sample of the trip data:
Solution overview
The next few sections describe how Spark jobs are created for this solution, how you can interact with Spark using Apache Livy, and how you can use AWS Step Functions to create orchestrations for these Spark applications.
At a high level, the solution includes the following steps:
Trigger the AWS Step Function state machine by passing the input file path.
The first stage in the state machine triggers an AWS Lambda
The Lambda function interacts with Apache Spark running on Amazon EMR using Apache Livy, and submits a Spark job.
The state machine waits a few seconds before checking the Spark job status.
Based on the job status, the state machine moves to the success or failure state.
Subsequent Spark jobs are submitted using the same approach.
The state machine waits a few seconds for the job to finish.
The job finishes, and the state machine updates with its final status.
Let’s take a look at the Spark application that is used for this solution.
Spark jobs
For this example, I built a Spark jar named spark-taxi.jar. It has two different Spark applications:
MilesPerRateCode – The first job that runs on the Amazon EMR cluster. This job reads the trip data from an input source and computes the total trip distance for each rate code. The output of this job consists of two columns and is stored in Apache Parquet format in the output path.
The following are the expected output columns:
rate_code – Represents the rate code for the trip.
total_distance – Represents the total trip distance for that rate code (for example, sum(trip_distance)).
RateCodeStatus – The second job that runs on the EMR cluster, but only if the first job finishes successfully. This job depends on two different input sets:
csv – The same trip data that is used for the first Spark job.
miles-per-rate – The output of the first job.
This job first reads the tripdata.csv file and aggregates the fare_amount by the rate_code. After this point, you have two different datasets, both aggregated by rate_code. Finally, the job uses the rate_code field to join two datasets and output the entire rate code status in a single CSV file.
The output columns are as follows:
rate_code_id – Represents the rate code type.
total_distance – Derived from first Spark job and represents the total trip distance.
total_fare_amount – A new field that is generated during the second Spark application, representing the total fare amount by the rate code type.
Note that in this case, you don’t need to run two different Spark jobs to generate that output. The goal of setting up the jobs in this way is just to create a dependency between the two jobs and use them within AWS Step Functions.
Both Spark applications take one input argument called rootPath. It’s the S3 location where the Spark job is stored along with input and output data. Here is a sample of the final output:
The next section discusses how you can use Apache Livy to interact with Spark applications that are running on Amazon EMR.
Using Apache Livy to interact with Apache Spark
Apache Livy provides a REST interface to interact with Spark running on an EMR cluster. Livy is included in Amazon EMR release version 5.9.0 and later. In this post, I use Livy to submit Spark jobs and retrieve job status. When Amazon EMR is launched with Livy installed, the EMR master node becomes the endpoint for Livy, and it starts listening on port 8998 by default. Livy provides APIs to interact with Spark.
Let’s look at a couple of examples how you can interact with Spark running on Amazon EMR using Livy.
To list active running jobs, you can execute the following from the EMR master node:
curl localhost:8998/sessions
If you want to do the same from a remote instance, just change localhost to the EMR hostname, as in the following (port 8998 must be open to that remote instance through the security group):
Through Spark submit, you can pass multiple arguments for the Spark job and Spark configuration settings. You can also do that using Livy, by passing the S3 path through the args parameter, as shown following:
For a detailed list of Livy APIs, see the Apache Livy REST API page. This post uses GET /batches and POST /batches.
In the next section, you create a state machine and orchestrate Spark applications using AWS Step Functions.
Using AWS Step Functions to create a Spark job workflow
AWS Step Functions automatically triggers and tracks each step and retries when it encounters errors. So your application executes in order and as expected every time. To create a Spark job workflow using AWS Step Functions, you first create a Lambda state machine using different types of states to create the entire workflow.
First, you use the Task state—a simple state in AWS Step Functions that performs a single unit of work. You also use the Wait state to delay the state machine from continuing for a specified time. Later, you use the Choice state to add branching logic to a state machine.
The following is a quick summary of how to use different states in the state machine to create the Spark ETL pipeline:
Task state – Invokes a Lambda function. The first Task state submits the Spark job on Amazon EMR, and the next Task state is used to retrieve the previous Spark job status.
Wait state – Pauses the state machine until a job completes execution.
Choice state – Each Spark job execution can return a failure, an error, or a success state So, in the state machine, you use the Choice state to create a rule that specifies the next action or step based on the success or failure of the previous step.
Here is one of my Task states, MilesPerRateCode, which simply submits a Spark job:
"MilesPerRate Job": {
"Type": "Task",
"Resource":"arn:aws:lambda:us-east-1:xxxxxx:function:blog-miles-per-rate-job-submit-function",
"ResultPath": "$.jobId",
"Next": "Wait for MilesPerRate job to complete"
}
This Task state configuration specifies the Lambda function to execute. Inside the Lambda function, it submits a Spark job through Livy using Livy’s POST API. Using ResultPath, it tells the state machine where to place the result of the executing task. As discussed in the previous section, Spark submit returns the session ID, which is captured with $.jobId and used in a later state.
The following code section shows the Lambda function, which is used to submit the MilesPerRateCode job. It uses the Python request library to submit a POST against the Livy endpoint hosted on Amazon EMR and passes the required parameters in JSON format through payload. It then parses the response, grabs id from the response, and returns it. The Next field tells the state machine which state to go to next.
Just like in the MilesPerRate job, another state submits the RateCodeStatus job, but it executes only when all previous jobs have completed successfully.
Here is the Task state in the state machine that checks the Spark job status:
Just like other states, the preceding Task executes a Lambda function, captures the result (represented by jobStatus), and passes it to the next state. The following is the Lambda function that checks the Spark job status based on a given session ID:
In the Choice state, it checks the Spark job status value, compares it with a predefined state status, and transitions the state based on the result. For example, if the status is success, move to the next state (RateCodeJobStatus job), and if it is dead, move to the MilesPerRate job failed state.
To set up this entire solution, you need to create a few AWS resources. To make it easier, I have created an AWS CloudFormation template. This template creates all the required AWS resources and configures all the resources that are needed to create a Spark-based ETL pipeline on AWS Step Functions.
This CloudFormation template requires you to pass the following four parameters during initiation.
Parameter
Description
ClusterSubnetID
The subnet where the Amazon EMR cluster is deployed and Lambda is configured to talk to this subnet.
KeyName
The name of the existing EC2 key pair to access the Amazon EMR cluster.
VPCID
The ID of the virtual private cloud (VPC) where the EMR cluster is deployed and Lambda is configured to talk to this VPC.
S3RootPath
The Amazon S3 path where all required files (input file, Spark job, and so on) are stored and the resulting data is written.
IMPORTANT: These templates are designed only to show how you can create a Spark-based ETL pipeline on AWS Step Functions using Apache Livy. They are not intended for production use without modification. And if you try this solution outside of the us-east-1 Region, download the necessary files from s3://aws-data-analytics-blog/emr-step-functions, upload the files to the buckets in your Region, edit the script as appropriate, and then run it.
To launch the CloudFormation stack, choose Launch Stack:
Launching this stack creates the following list of AWS resources.
Logical ID
Resource Type
Description
StepFunctionsStateExecutionRole
IAM role
IAM role to execute the state machine and have a trust relationship with the states service.
SparkETLStateMachine
AWS Step Functions state machine
State machine in AWS Step Functions for the Spark ETL workflow.
LambdaSecurityGroup
Amazon EC2 security group
Security group that is used for the Lambda function to call the Livy API.
RateCodeStatusJobSubmitFunction
AWS Lambda function
Lambda function to submit the RateCodeStatus job.
MilesPerRateJobSubmitFunction
AWS Lambda function
Lambda function to submit the MilesPerRate job.
SparkJobStatusFunction
AWS Lambda function
Lambda function to check the Spark job status.
LambdaStateMachineRole
IAM role
IAM role for all Lambda functions to use the lambda trust relationship.
EMRCluster
Amazon EMR cluster
EMR cluster where Livy is running and where the job is placed.
During the AWS CloudFormation deployment phase, it sets up S3 paths for input and output. Input files are stored in the <<s3-root-path>>/emr-step-functions/input/ path, whereas spark-taxi.jar is copied under <<s3-root-path>>/emr-step-functions/.
The following screenshot shows how the S3 paths are configured after deployment. In this example, I passed a bucket that I created in the AWS account s3://tm-app-demos for the S3 root path.
If the CloudFormation template completed successfully, you will see Spark-ETL-State-Machine in the AWS Step Functions dashboard, as follows:
Choose the Spark-ETL-State-Machine state machine to take a look at this implementation. The AWS CloudFormation template built the entire state machine along with its dependent Lambda functions, which are now ready to be executed.
On the dashboard, choose the newly created state machine, and then choose New execution to initiate the state machine. It asks you to pass input in JSON format. This input goes to the first state MilesPerRate Job, which eventually executes the Lambda function blog-miles-per-rate-job-submit-function.
Pass the S3 root path as input:
{
“rootPath”: “s3://tm-app-demos”
}
Then choose Start Execution:
The rootPath value is the same value that was passed when creating the CloudFormation stack. It can be an S3 bucket location or a bucket with prefixes, but it should be the same value that is used for AWS CloudFormation. This value tells the state machine where it can find the Spark jar and input file, and where it will write output files. After the state machine starts, each state/task is executed based on its definition in the state machine.
At a high level, the following represents the flow of events:
Execute the first Spark job, MilesPerRate.
The Spark job reads the input file from the location <<rootPath>>/emr-step-functions/input/tripdata.csv. If the job finishes successfully, it writes the output data to <<rootPath>>/emr-step-functions/miles-per-rate.
If the Spark job fails, it transitions to the error state MilesPerRate job failed, and the state machine stops. If the Spark job finishes successfully, it transitions to the RateCodeStatus Job state, and the second Spark job is executed.
If the second Spark job fails, it transitions to the error state RateCodeStatus job failed, and the state machine stops with the Failed status.
If this Spark job completes successfully, it writes the final output data to the <<rootPath>>/emr-step-functions/rate-code-status/ It also transitions the RateCodeStatus job finished state, and the state machine ends its execution with the Success status.
This following screenshot shows a successfully completed Spark ETL state machine:
The right side of the state machine diagram shows the details of individual states with their input and output.
When you execute the state machine for the second time, it fails because the S3 path already exists. The state machine turns red and stops at MilePerRate job failed. The following image represents that failed execution of the state machine:
You can also check your Spark application status and logs by going to the Amazon EMR console and viewing the Application history tab:
I hope this walkthrough paints a picture of how you can create a serverless solution for orchestrating Spark jobs on Amazon EMR using AWS Step Functions and Apache Livy. In the next section, I share some ideas for making this solution even more elegant.
Next steps
The goal of this post is to show a simple example that uses AWS Step Functions to create an orchestration for Spark-based jobs in a serverless fashion. To make this solution robust and production ready, you can explore the following options:
In this example, I manually initiated the state machine by passing the rootPath as input. You can instead trigger the state machine automatically. To run the ETL pipeline as soon as the files arrive in your S3 bucket, you can pass the new file path to the state machine. Because CloudWatch Events supports AWS Step Functions as a target, you can create a CloudWatch rule for an S3 event. You can then set AWS Step Functions as a target and pass the new file path to your state machine. You’re all set!
You can also improve this solution by adding an alerting mechanism in case of failures. To do this, create a Lambda function that sends an alert email and assigns that Lambda function to a Fail That way, when any part of your state fails, it triggers an email and notifies the user.
If you want to submit multiple Spark jobs in parallel, you can use the Parallel state type in AWS Step Functions. The Parallel state is used to create parallel branches of execution in your state machine.
With Lambda and AWS Step Functions, you can create a very robust serverless orchestration for your big data workload.
Cleaning up
When you’ve finished testing this solution, remember to clean up all those AWS resources that you created using AWS CloudFormation. Use the AWS CloudFormation console or AWS CLI to delete the stack named Blog-Spark-ETL-Step-Functions.
Summary
In this post, I showed you how to use AWS Step Functions to orchestrate your Spark jobs that are running on Amazon EMR. You used Apache Livy to submit jobs to Spark from a Lambda function and created a workflow for your Spark jobs, maintaining a specific order for job execution and triggering different AWS events based on your job’s outcome. Go ahead—give this solution a try, and share your experience with us!
Tanzir Musabbir is an EMR Specialist Solutions Architect with AWS. He is an early adopter of open source Big Data technologies. At AWS, he works with our customers to provide them architectural guidance for running analytics solutions on Amazon EMR, Amazon Athena & AWS Glue. Tanzir is a big Real Madrid fan and he loves to travel in his free time.
Thanks to Greg Eppel, Sr. Solutions Architect, Microsoft Platform for this great blog that describes how to create a custom CodeBuild build environment for the .NET Framework. — AWS CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. CodeBuild provides curated build environments for programming languages and runtimes such as Android, Go, Java, Node.js, PHP, Python, Ruby, and Docker. CodeBuild now supports builds for the Microsoft Windows Server platform, including a prepackaged build environment for .NET Core on Windows. If your application uses the .NET Framework, you will need to use a custom Docker image to create a custom build environment that includes the Microsoft proprietary Framework Class Libraries. For information about why this step is required, see our FAQs. In this post, I’ll show you how to create a custom build environment for .NET Framework applications and walk you through the steps to configure CodeBuild to use this environment.
Build environments are Docker images that include a complete file system with everything required to build and test your project. To use a custom build environment in a CodeBuild project, you build a container image for your platform that contains your build tools, push it to a Docker container registry such as Amazon Elastic Container Registry (Amazon ECR), and reference it in the project configuration. When it builds your application, CodeBuild retrieves the Docker image from the container registry specified in the project configuration and uses the environment to compile your source code, run your tests, and package your application.
Step 1: Launch EC2 Windows Server 2016 with Containers
In the Amazon EC2 console, in your region, launch an Amazon EC2 instance from a Microsoft Windows Server 2016 Base with Containers AMI.
Increase disk space on the boot volume to at least 50 GB to account for the larger size of containers required to install and run Visual Studio Build Tools.
Run the following command in that directory. This process can take a while. It depends on the size of EC2 instance you launched. In my tests, a t2.2xlarge takes less than 30 minutes to build the image and produces an approximately 15 GB image.
docker build -t buildtools2017:latest -m 2GB .
Run the following command to test the container and start a command shell with all the developer environment variables:
docker run -it buildtools2017
Create a repository in the Amazon ECS console. For the repository name, type buildtools2017. Choose Next step and then complete the remaining steps.
Execute the following command to generate authentication details for our registry to the local Docker engine. Make sure you have permissions to the Amazon ECR registry before you execute the command.
aws ecr get-login
In the same command prompt window, copy and paste the following commands:
In the CodeCommit console, create a repository named DotNetFrameworkSampleApp. On the Configure email notifications page, choose Skip.
Clone a .NET Framework Docker sample application from GitHub. The repository includes a sample ASP.NET Framework that we’ll use to demonstrate our custom build environment.On the EC2 instance, open a command prompt and execute the following commands:
Navigate to the CodeCommit repository and confirm that the files you just pushed are there.
Step 4: Configure build spec
To build your .NET Framework application with CodeBuild you use a build spec, which is a collection of build commands and related settings, in YAML format, that AWS CodeBuild can use to run a build. You can include a build spec as part of the source code or you can define a build spec when you create a build project. In this example, I include a build spec as part of the source code.
In the root directory of your source directory, create a YAML file named buildspec.yml.
At this point, we have a Docker image with Visual Studio Build Tools installed and stored in the Amazon ECR registry. We also have a sample ASP.NET Framework application in a CodeCommit repository. Now we are going to set up CodeBuild to build the ASP.NET Framework application.
In the Amazon ECR console, choose the repository that was pushed earlier with the docker push command. On the Permissions tab, choose Add.
For Source Provider, choose AWS CodeCommit and then choose the called DotNetFrameworkSampleApp repository.
For Environment Image, choose Specify a Docker image.
For Environment type, choose Windows.
For Custom image type, choose Amazon ECR.
For Amazon ECR repository, choose the Docker image with the Visual Studio Build Tools installed, buildtools2017. Your configuration should look like the image below:
Choose Continue and then Save and Build to create your CodeBuild project and start your first build. You can monitor the status of the build in the console. You can also configure notifications that will notify subscribers whenever builds succeed, fail, go from one phase to another, or any combination of these events.
Summary
CodeBuild supports a number of platforms and languages out of the box. By using custom build environments, it can be extended to other runtimes. In this post, I showed you how to build a .NET Framework environment on a Windows container and demonstrated how to use it to build .NET Framework applications in CodeBuild.
We’re excited to see how customers extend and use CodeBuild to enable continuous integration and continuous delivery for their Windows applications. Feel free to share what you’ve learned extending CodeBuild for your own projects. Just leave questions or suggestions in the comments.
This post courtesy of Thiago Morais, AWS Solutions Architect
When you build web applications or expose any data externally, you probably look for a platform where you can build highly scalable, secure, and robust REST APIs. As APIs are publicly exposed, there are a number of best practices for providing a secure mechanism to consumers using your API.
Amazon API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, authorization and access control, monitoring, and API version management.
In this post, I show you how to take advantage of the regional API endpoint feature in API Gateway, so that you can create your own Amazon CloudFront distribution and secure your API using AWS WAF.
AWS WAF is a web application firewall that helps protect your web applications from common web exploits that could affect application availability, compromise security, or consume excessive resources.
As you make your APIs publicly available, you are exposed to attackers trying to exploit your services in several ways. The AWS security team published a whitepaper solution using AWS WAF, How to Mitigate OWASP’s Top 10 Web Application Vulnerabilities.
Regional API endpoints
Edge-optimized APIs are endpoints that are accessed through a CloudFront distribution created and managed by API Gateway. Before the launch of regional API endpoints, this was the default option when creating APIs using API Gateway. It primarily helped to reduce latency for API consumers that were located in different geographical locations than your API.
When API requests predominantly originate from an Amazon EC2 instance or other services within the same AWS Region as the API is deployed, a regional API endpoint typically lowers the latency of connections. It is recommended for such scenarios.
For better control around caching strategies, customers can use their own CloudFront distribution for regional APIs. They also have the ability to use AWS WAF protection, as I describe in this post.
Edge-optimized API endpoint
The following diagram is an illustrated example of the edge-optimized API endpoint where your API clients access your API through a CloudFront distribution created and managed by API Gateway.
Regional API endpoint
For the regional API endpoint, your customers access your API from the same Region in which your REST API is deployed. This helps you to reduce request latency and particularly allows you to add your own content delivery network, as needed.
Walkthrough
In this section, you implement the following steps:
Attach the web ACL to the CloudFront distribution.
Test AWS WAF protection.
Create the regional API
For this walkthrough, use an existing PetStore API. All new APIs launch by default as the regional endpoint type. To change the endpoint type for your existing API, choose the cog icon on the top right corner:
After you have created the PetStore API on your account, deploy a stage called “prod” for the PetStore API.
On the API Gateway console, select the PetStore API and choose Actions, Deploy API.
For Stage name, type prod and add a stage description.
Choose Deploy and the new API stage is created.
Use the following AWS CLI command to update your API from edge-optimized to regional:
{
"description": "Your first API with Amazon API Gateway. This is a sample API that integrates via HTTP with your demo Pet Store endpoints",
"createdDate": 1511525626,
"endpointConfiguration": {
"types": [
"REGIONAL"
]
},
"id": "{api-id}",
"name": "PetStore"
}
After you change your API endpoint to regional, you can now assign your own CloudFront distribution to this API.
Create a CloudFront distribution
To make things easier, I have provided an AWS CloudFormation template to deploy a CloudFront distribution pointing to the API that you just created. Click the button to deploy the template in the us-east-1 Region.
For Stack name, enter RegionalAPI. For APIGWEndpoint, enter your API FQDN in the following format:
{api-id}.execute-api.us-east-1.amazonaws.com
After you fill out the parameters, choose Next to continue the stack deployment. It takes a couple of minutes to finish the deployment. After it finishes, the Output tab lists the following items:
A CloudFront domain URL
An S3 bucket for CloudFront access logs
Output from CloudFormation
Test the CloudFront distribution
To see if the CloudFront distribution was configured correctly, use a web browser and enter the URL from your distribution, with the following parameters:
With the new CloudFront distribution in place, you can now start setting up AWS WAF to protect your API.
For this demo, you deploy the AWS WAF Security Automations solution, which provides fine-grained control over the requests attempting to access your API.
For more information about deployment, see Automated Deployment. If you prefer, you can launch the solution directly into your account using the following button.
For CloudFront Access Log Bucket Name, add the name of the bucket created during the deployment of the CloudFormation stack for your CloudFront distribution.
The solution allows you to adjust thresholds and also choose which automations to enable to protect your API. After you finish configuring these settings, choose Next.
To start the deployment process in your account, follow the creation wizard and choose Create. It takes a few minutes do finish the deployment. You can follow the creation process through the CloudFormation console.
After the deployment finishes, you can see the new web ACL deployed on the AWS WAF console, AWSWAFSecurityAutomations.
Attach the AWS WAF web ACL to the CloudFront distribution
With the solution deployed, you can now attach the AWS WAF web ACL to the CloudFront distribution that you created earlier.
To assign the newly created AWS WAF web ACL, go back to your CloudFront distribution. After you open your distribution for editing, choose General, Edit.
Select the new AWS WAF web ACL that you created earlier, AWSWAFSecurityAutomations.
Save the changes to your CloudFront distribution and wait for the deployment to finish.
Test AWS WAF protection
To validate the AWS WAF Web ACL setup, use Artillery to load test your API and see AWS WAF in action.
To install Artillery on your machine, run the following command:
$ npm install -g artillery
After the installation completes, you can check if Artillery installed successfully by running the following command:
$ artillery -V
$ 1.6.0-12
As the time of publication, Artillery is on version 1.6.0-12.
One of the WAF web ACL rules that you have set up is a rate-based rule. By default, it is set up to block any requesters that exceed 2000 requests under 5 minutes. Try this out.
First, use cURL to query your distribution and see the API output:
What you are doing is firing 2000 requests to your API from 10 concurrent users. For brevity, I am not posting the Artillery output here.
After Artillery finishes its execution, try to run the cURL request again and see what happens:
$ curl -s https://{distribution-name}.cloudfront.net/prod/pets
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
Request blocked.
<BR clear="all">
<HR noshade size="1px">
<PRE>
Generated by cloudfront (CloudFront)
Request ID: [removed]
</PRE>
<ADDRESS>
</ADDRESS>
</BODY></HTML>
As you can see from the output above, the request was blocked by AWS WAF. Your IP address is removed from the blocked list after it falls below the request limit rate.
Conclusion
In this first part, you saw how to use the new API Gateway regional API endpoint together with Amazon CloudFront and AWS WAF to secure your API from a series of attacks.
In the second part, I will demonstrate some other techniques to protect your API using API keys and Amazon CloudFront custom headers.
When I talk with customers and partners, I find that they are in different stages in the adoption of DevOps methodologies. They are automating the creation of application artifacts and the deployment of their applications to different infrastructure environments. In many cases, they are creating and supporting multiple applications using a variety of coding languages and artifacts.
The management of these processes and artifacts can be challenging, but using the right tools and methodologies can simplify the process.
In this post, I will show you how you can automate the creation and storage of application artifacts through the implementation of a pipeline and custom deploy action in AWS CodePipeline. The example includes a Node.js code base stored in an AWS CodeCommit repository. A Node Package Manager (npm) artifact is built from the code base, and the build artifact is published to a JFrogArtifactory npm repository.
I frequently recommend AWS CodePipeline, the AWS continuous integration and continuous delivery tool. You can use it to quickly innovate through integration and deployment of new features and bug fixes by building a workflow that automates the build, test, and deployment of new versions of your application. And, because AWS CodePipeline is extensible, it allows you to create a custom action that performs customized, automated actions on your behalf.
JFrog’s Artifactory is a universal binary repository manager where you can manage multiple applications, their dependencies, and versions in one place. Artifactory also enables you to standardize the way you manage your package types across all applications developed in your company, no matter the code base or artifact type.
If you already have a Node.js CodeCommit repository, a JFrog Artifactory host, and would like to automate the creation of the pipeline, including the custom action and CodeBuild project, you can use this AWS CloudFormationtemplate to create your AWS CloudFormation stack.
This figure shows the path defined in the pipeline for this project. It starts with a change to Node.js source code committed to a private code repository in AWS CodeCommit. With this change, CodePipeline triggers AWS CodeBuild to create the npm package from the node.js source code. After the build, CodePipeline triggers the custom action job worker to commit the build artifact to the designated artifact repository in Artifactory.
This blog post assumes you have already:
· Created a CodeCommit repository that contains a Node.js project.
· Configured a two-stage pipeline in AWS CodePipeline.
The Source stage of the pipeline is configured to poll the Node.js CodeCommit repository. The Build stage is configured to use a CodeBuild project to build the npm package using a buildspec.yml file located in the code repository.
If you do not have a Node.js repository, you can create a CodeCommit repository that contains this simple ‘Hello World’ project. This project also includes a buildspec.yml file that is used when you define your CodeBuild project. It defines the steps to be taken by CodeBuild to create the npm artifact.
If you do not already have a pipeline set up in CodePipeline, you can use this template to create a pipeline with a CodeCommit source action and a CodeBuild build action through the AWS Command Line Interface (AWS CLI). If you do not want to install the AWS CLI on your local machine, you can use AWS Cloud9, our managed integrated development environment (IDE), to interact with AWS APIs.
In your development environment, open your favorite editor and fill out the template with values appropriate to your project. For information, see the readme in the GitHub repository.
Use this CLI command to create the pipeline from the template:
It creates a pipeline that has a CodeCommit source action and a CodeBuild build action.
Integrating JFrog Artifactory
JFrog Artifactory provides default repositories for your project needs. For my NPM package repository, I am using the default virtual npm repository (named npm) that is available in Artifactory Pro. You might want to consider creating a repository per project but for the example used in this post, using the default lets me get started without having to configure a new repository.
I can use the steps in the Set Me Up -> npm section on the landing page to configure my worker to interact with the default NPM repository.
Describes the required values to run the custom action. I will define my custom action in the ‘Deploy’ category, identify the provider as ‘Artifactory’, of version ‘1’, and specify a variety of configurationProperties whose values will be defined when this stage is added to my pipeline.
Polls CodePipeline for a job, scanning for its action-definition properties. In this blog post, after a job has been found, the job worker does the work required to publish the npm artifact to the Artifactory repository.
{
"category": "Deploy",
"configurationProperties": [{
"name": "TypeOfArtifact",
"required": true,
"key": true,
"secret": false,
"description": "Package type, ex. npm for node packages",
"type": "String"
},
{ "name": "RepoKey",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Name of the repository in which this artifact should be stored"
},
{ "name": "UserName",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Username for authenticating with the repository"
},
{ "name": "Password",
"required": true,
"key": true,
"secret": true,
"type": "String",
"description": "Password for authenticating with the repository"
},
{ "name": "EmailAddress",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Email address used to authenticate with the repository"
},
{ "name": "ArtifactoryHost",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Public address of Artifactory host, ex: https://myexamplehost.com or http://myexamplehost.com:8080"
}],
"provider": "Artifactory",
"version": "1",
"settings": {
"entityUrlTemplate": "{Config:ArtifactoryHost}/artifactory/webapp/#/artifacts/browse/tree/General/{Config:RepoKey}"
},
"inputArtifactDetails": {
"maximumCount": 5,
"minimumCount": 1
},
"outputArtifactDetails": {
"maximumCount": 5,
"minimumCount": 0
}
}
There are seven sections to the custom action definition:
category: This is the stage in which you will be creating this action. It can be Source, Build, Deploy, Test, Invoke, Approval. Except for source actions, the category section simply allows us to organize our actions. I am setting the category for my action as ‘Deploy’ because I’m using it to publish my node artifact to my Artifactory instance.
configurationProperties: These are the parameters or variables required for your project to authenticate and commit your artifact. In the case of my custom worker, I need:
TypeOfArtifact: In this case, npm, because it’s for the Node Package Manager.
RepoKey: The name of the repository. In this case, it’s the default npm.
UserName and Password for the user to authenticate with the Artifactory repository.
EmailAddress used to authenticate with the repository.
Artifactory host name or IP address.
provider: The name you define for your custom action stage. I have named the provider Artifactory.
version: Version number for the custom action. Because this is the first version, I set the version number to 1.
entityUrlTemplate: This URL is presented to your users for the deploy stage along with the title you define in your provider. The link takes the user to their artifact repository page in the Artifactory host.
inputArtifactDetails: The number of artifacts to expect from the previous stage in the pipeline.
outputArtifactDetails: The number of artifacts that should be the result from the custom action stage. Later in this blog post, I define 0 for my output artifacts because I am publishing the artifact to the Artifactory repository as the final action.
After I define the custom action in a JSON file, I use the AWS CLI to create the custom action type in CodePipeline:
After I create the custom action type in the same region as my pipeline, I edit the pipeline to add a Deploy stage and configure it to use the custom action I created for Artifactory:
I have created a custom worker for the actions required to commit the npm artifact to the Artifactory repository. The worker is in Python and it runs in a loop on an Amazon EC2 instance. My custom worker polls for a deploy job and publishes the NPM artifact to the Artifactory repository.
The EC2 instance is running Amazon Linux and has an IAM instance role attached that gives the worker permission to access CodePipeline. The worker process is as follows:
Take the configuration properties from the custom worker and poll CodePipeline for a custom action job.
After there is a job in the job queue with the appropriate category, provider, and version, acknowledge the job.
Download the zipped artifact created in the previous Build stage from the provided S3 buckets with the provided temporary credentials.
Unzip the artifact into a temporary directory.
A user-defined Artifactory user name and password is used to receive a temporary API key from Artifactory.
To avoid having to write the password to a file, use that temporary API key and user name to authenticate with the NPM repository.
Publish the Node.js package to the specified repository.
Because I am running my custom worker on an Amazon Linux EC2 instance, I installed npm with the following command:
sudo yum install nodejs npm --enablerepo=epel
For my custom worker, I used pip to install the required Python libraries:
pip install boto3 requests
For a full Python package list, see requirements.txt in the GitHub repository.
Let’s take a look at some of the code snippets from the worker.
First, the worker polls for jobs:
def action_type():
ActionType = {
'category': 'Deploy',
'owner': 'Custom',
'provider': 'Artifactory',
'version': '1' }
return(ActionType)
def poll_for_jobs():
try:
artifactory_action_type = action_type()
print(artifactory_action_type)
jobs = codepipeline.poll_for_jobs(actionTypeId=artifactory_action_type)
while not jobs['jobs']:
time.sleep(10)
jobs = codepipeline.poll_for_jobs(actionTypeId=artifactory_action_type)
if jobs['jobs']:
print('Job found')
return jobs['jobs'][0]
except ClientError as e:
print("Received an error: %s" % str(e))
raise
When there is a job in the queue, the poller returns a number of values from the queue such as jobId, the input and output S3 buckets for artifacts, temporary credentials to access the S3 buckets, and other configuration details from the stage in the pipeline.
After successfully receiving the job details, the worker sends an acknowledgement to CodePipeline to ensure that the work on the job is not duplicated by other workers watching for the same job:
def job_acknowledge(jobId, nonce):
try:
print('Acknowledging job')
result = codepipeline.acknowledge_job(jobId=jobId, nonce=nonce)
return result
except Exception as e:
print("Received an error when trying to acknowledge the job: %s" % str(e))
raise
With the job now acknowledged, the worker publishes the source code artifact into the desired repository. The worker gets the value of the artifact S3 bucket and objectKey from the inputArtifacts in the response from the poll_for_jobs API request. Next, the worker creates a new directory in /tmp and downloads the S3 object into this directory:
def get_bucket_location(bucketName, init_client):
region = init_client.get_bucket_location(Bucket=bucketName)['LocationConstraint']
if not region:
region = 'us-east-1'
return region
def get_s3_artifact(bucketName, objectKey, ak, sk, st):
init_s3 = boto3.client('s3')
region = get_bucket_location(bucketName, init_s3)
session = Session(aws_access_key_id=ak,
aws_secret_access_key=sk,
aws_session_token=st)
s3 = session.resource('s3',
region_name=region,
config=botocore.client.Config(signature_version='s3v4'))
try:
tempdirname = tempfile.mkdtemp()
except OSError as e:
print('Could not write temp directory %s' % tempdirname)
raise
bucket = s3.Bucket(bucketName)
obj = bucket.Object(objectKey)
filename = tempdirname + '/' + objectKey
try:
if os.path.dirname(objectKey):
directory = os.path.dirname(filename)
os.makedirs(directory)
print('Downloading the %s object and writing it to disk in %s location' % (objectKey, tempdirname))
with open(filename, 'wb') as data:
obj.download_fileobj(data)
except ClientError as e:
print('Downloading the object and writing the file to disk raised this error: ' + str(e))
raise
return(filename, tempdirname)
Because the downloaded artifact from S3 is a zip file, the worker must unzip it first. To have a clean area in which to work, I extract the downloaded zip archive into a new directory:
def unzip_codepipeline_artifact(artifact, origtmpdir):
# create a new temp directory
# Unzip artifact into new directory
try:
newtempdir = tempfile.mkdtemp()
print('Extracting artifact %s into temporary directory %s' % (artifact, newtempdir))
zip_ref = zipfile.ZipFile(artifact, 'r')
zip_ref.extractall(newtempdir)
zip_ref.close()
shutil.rmtree(origtmpdir)
return(os.listdir(newtempdir), newtempdir)
except OSError as e:
if e.errno != errno.EEXIST:
shutil.rmtree(newtempdir)
raise
The worker now has the npm package that I want to store in my Artifactory NPM repository.
To authenticate with the NPM repository, the worker requests a temporary token from the Artifactory host. After receiving this temporary token, it creates a .npmrc file in the worker user’s home directory that includes a hash of the user name and temporary token. After it has authenticated, the worker runs npm config set registry <URL OF REPOSITORY> to configure the npm registry value to be the Artifactory host. Next, the worker runs npm publish –registry <URL OF REPOSITORY>, which publishes the node package to the NPM repository in the Artifactory host.
def push_to_npm(configuration, artifact_list, temp_dir, jobId):
reponame = configuration['RepoKey']
art_type = configuration['TypeOfArtifact']
print("Putting artifact into NPM repository " + reponame)
token, hostname, username = gen_artifactory_auth_token(configuration)
npmconfigfile = create_npmconfig_file(configuration, username, token)
url = hostname + '/artifactory/api/' + art_type + '/' + reponame
print("Changing directory to " + str(temp_dir))
os.chdir(temp_dir)
try:
print("Publishing following files to the repository: %s " % os.listdir(temp_dir))
print("Sending artifact to Artifactory NPM registry URL: " + url)
subprocess.call(["npm", "config", "set", "registry", url])
req = subprocess.call(["npm", "publish", "--registry", url])
print("Return code from npm publish: " + str(req))
if req != 0:
err_msg = "npm ERR! Recieved non OK response while sending response to Artifactory. Return code from npm publish: " + str(req)
signal_failure(jobId, err_msg)
else:
signal_success(jobId)
except requests.exceptions.RequestException as e:
print("Received an error when trying to commit artifact %s to repository %s: " % (str(art_type), str(configuration['RepoKey']), str(e)))
raise
return(req, npmconfigfile)
If the return value from publishing to the repository is not 0, the worker signals a failure to CodePipeline. If the value is 0, the worker signals success to CodePipeline to indicate that the stage of the pipeline has been completed successfully.
For the custom worker code, see npm_job_worker.py in the GitHub repository.
I run my custom worker on an EC2 instance using the command python npm_job_worker.py, with an optional --version flag that can be used to specify worker versions other than 1. Then I trigger a release change in my pipeline:
From my custom worker output logs, I have just committed a package named node_example at version 1.0.3:
On artifact: index.js
Committing to the repo: https://artifactory.myexamplehost.com/artifactory/api/npm/npm
Sending artifact to Artifactory URL: https:// artifactoryhost.myexamplehost.com/artifactory/api/npm/npm
npm config: 0
npm http PUT https://artifactory.myexamplehost.com/artifactory/api/npm/npm/node_example
npm http 201 https://artifactory.myexamplehost.com/artifactory/api/npm/npm/node_example
+ [email protected]
Return code from npm publish: 0
Signaling success to CodePipeline
After that has been built successfully, I can find my artifact in my Artifactory repository:
To help you automate this process, I have created this AWS CloudFormation template that automates the creation of the CodeBuild project, the custom action, and the CodePipeline pipeline. It also launches the Amazon EC2-based custom job worker in an AWS Auto Scaling group. This template requires you to have a VPC and CodeCommit repository for your Node.js project. If you do not currently have a VPC in which you want to run your custom worker EC2 instances, you can use this AWS QuickStart to create one. If you do not have an existing Node.js project, I’ve provided a sample project in the GitHub repository.
Conclusion
I‘ve shown you the steps to integrate your JFrog Artifactory repository with your CodePipeline workflow. I’ve shown you how to create a custom action in CodePipeline and how to create a custom worker that works in your CI/CD pipeline. To dig deeper into custom actions and see how you can integrate your Artifactory repositories into your AWS CodePipeline projects, check out the full code base on GitHub.
If you have any questions or feedback, feel free to reach out to us through the AWS CodePipeline forum.
Erin McGill is a Solutions Architect in the AWS Partner Program with a focus on DevOps and automation tooling.
As you can see from my EC2 Instance History post, we add new instance types on a regular and frequent basis. Driven by increasingly powerful processors and designed to address an ever-widening set of use cases, the size and diversity of this list reflects the equally diverse group of EC2 customers!
Near the bottom of that list you will find the new compute-intensive C5 instances. With a 25% to 50% improvement in price-performance over the C4 instances, the C5 instances are designed for applications like batch and log processing, distributed and or real-time analytics, high-performance computing (HPC), ad serving, highly scalable multiplayer gaming, and video encoding. Some of these applications can benefit from access to high-speed, ultra-low latency local storage. For example, video encoding, image manipulation, and other forms of media processing often necessitates large amounts of I/O to temporary storage. While the input and output files are valuable assets and are typically stored as Amazon Simple Storage Service (S3) objects, the intermediate files are expendable. Similarly, batch and log processing runs in a race-to-idle model, flushing volatile data to disk as fast as possible in order to make full use of compute resources.
New C5d Instances with Local Storage In order to meet this need, we are introducing C5 instances equipped with local NVMe storage. Available for immediate use in 5 regions, these instances are a great fit for the applications that I described above, as well as others that you will undoubtedly dream up! Here are the specs:
Instance Name
vCPUs
RAM
Local Storage
EBS Bandwidth
Network Bandwidth
c5d.large
2
4 GiB
1 x 50 GB NVMe SSD
Up to 2.25 Gbps
Up to 10 Gbps
c5d.xlarge
4
8 GiB
1 x 100 GB NVMe SSD
Up to 2.25 Gbps
Up to 10 Gbps
c5d.2xlarge
8
16 GiB
1 x 225 GB NVMe SSD
Up to 2.25 Gbps
Up to 10 Gbps
c5d.4xlarge
16
32 GiB
1 x 450 GB NVMe SSD
2.25 Gbps
Up to 10 Gbps
c5d.9xlarge
36
72 GiB
1 x 900 GB NVMe SSD
4.5 Gbps
10 Gbps
c5d.18xlarge
72
144 GiB
2 x 900 GB NVMe SSD
9 Gbps
25 Gbps
Other than the addition of local storage, the C5 and C5d share the same specs. Both are powered by 3.0 GHz Intel Xeon Platinum 8000-series processors, optimized for EC2 and with full control over C-states on the two largest sizes, giving you the ability to run two cores at up to 3.5 GHz using Intel Turbo Boost Technology.
You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe; this includes the latest Amazon Linux, Microsoft Windows (Server 2008 R2, Server 2012, Server 2012 R2 and Server 2016), Ubuntu, RHEL, SUSE, and CentOS AMIs.
Here are a couple of things to keep in mind about the local NVMe storage:
Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.
Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.
Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.
Available Now C5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent C5 instances.
AWS Identity and Access Management (IAM) now makes it easier for you to control access to your AWS resources by using the AWS organization of IAM principals (users and roles). For some services, you grant permissions using resource-based policies to specify the accounts and principals that can access the resource and what actions they can perform on it. Now, you can use a new condition key, aws:PrincipalOrgID, in these policies to require all principals accessing the resource to be from an account in the organization. For example, let’s say you have an Amazon S3 bucket policy and you want to restrict access to only principals from AWS accounts inside of your organization. To accomplish this, you can define the aws:PrincipalOrgID condition and set the value to your organization ID in the bucket policy. Your organization ID is what sets the access control on the S3 bucket. Additionally, when you use this condition, policy permissions apply when you add new accounts to this organization without requiring an update to the policy.
In this post, I walk through the details of the new condition and show you how to restrict access to only principals in your organization using S3.
Condition concepts
Before I introduce the new condition, let’s review the condition element of an IAM policy. A condition is an optional IAM policy element you can use to specify special circumstances under which the policy grants or denies permission. A condition includes a condition key, operator, and value for the condition. There are two types of conditions: service-specific conditions and global conditions. Service-specific conditions are specific to certain actions in an AWS service. For example, the condition key ec2:InstanceType supports specific EC2 actions. Global conditions support all actions across all AWS services.
Now that I’ve reviewed the condition element in an IAM policy, let me introduce the new condition.
AWS:PrincipalOrgID Condition Key
You can use this condition key to apply a filter to the Principal element of a resource-based policy. You can use any string operator, such as StringLike, with this condition and specify the AWS organization ID for as its value.
Condition key
Description
Operator(s)
Value
aws:PrincipalOrgID
Validates if the principal accessing the resource belongs to an account in your organization.
Example: Restrict access to only principals from my organization
Let’s consider an example where I want to give specific IAM principals in my organization direct access to my S3 bucket, 2018-Financial-Data, that contains sensitive financial information. I have two accounts in my AWS organization with multiple account IDs, and only some IAM users from these accounts need access to this financial report.
To grant this access, I author a resource-based policy for my S3 bucket as shown below. In this policy, I list the individuals who I want to grant access. For the sake of this example, let’s say that while doing so, I accidentally specify an incorrect account ID. This means a user named Steve, who is not in an account in my organization, can now access my financial report. To require the principal account to be in my organization, I add a condition to my policy using the global condition key aws:PrincipalOrgID. This condition requires that only principals from accounts in my organization can access the S3 bucket. This means that although Steve is one of the principals in the policy, he can’t access the financial report because the account that he is a member of doesn’t belong to my organization.
In the policy above, I specify the principals that I grant access to using the principal element of the statement. Next, I add s3:GetObject as the action and 2018-Financial-Data/* as the resource to grant read access to my S3 bucket. Finally, I add the new condition key aws:PrincipalOrgID and specify my organization ID in the condition element of the statement to make sure only the principals from the accounts in my organization can access this bucket.
Summary
You can now use the aws:PrincipalOrgID condition key in your resource-based policies to more easily restrict access to IAM principals from accounts in your AWS organization. For more information about this global condition key and policy examples using aws:PrincipalOrgID, read the IAM documentation.
If you have comments about this post, submit them in the Comments section below. If you have questions about or suggestions for this solution, start a new thread on the IAM forum or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
This post courtesy of Jeff Levine Solutions Architect for Amazon Web Services
Amazon Linux 2 is the next generation of Amazon Linux, a Linux server operating system from Amazon Web Services (AWS). Amazon Linux 2 offers a high-performance Linux environment suitable for organizations of all sizes. It supports applications ranging from small websites to enterprise-class, mission-critical platforms.
Amazon Linux 2 includes support for the LAMP (Linux/Apache/MariaDB/PHP) stack, one of the most popular platforms for deploying websites. To secure the transmission of data-in-transit to such websites and prevent eavesdropping, organizations commonly leverage Secure Sockets Layer/Transport Layer Security (SSL/TLS) services which leverage certificates to provide encryption. The LAMP stack provided by Amazon Linux 2 includes a self-signed SSL/TLS certificate. Such certificates may be fine for internal usage but are not acceptable when attestation by a certificate authority is required.
In this post, I discuss how to extend the capabilities of Amazon Linux 2 by installing Let’s Encrypt, a certificate authority provided by the Internet Security Research Group. Let’s Encrypt offers basic SSL/TLS certificates for DNS hosts at no charge that you can use to add encryption-in-transit to a single web server. For commercial or multi-server configurations, you should consider AWS Certificate Manager and Elastic Load Balancing.
Let’s Encrypt also requires the certbot package, which you install from EPEL, the Extra Packaged for Enterprise Linux collection. Although EPEL is not included with Amazon Linux 2, I show how you can install it from the Fedora Project.
Walkthrough
At a high level, you perform the following tasks for this walkthrough:
Provision a VPC, Amazon Linux 2 instance, and LAMP stack.
Install and enable the EPEL repository.
Install and configure Let’s Encrypt.
Validate the installation.
Clean up.
Prerequisites and costs
To follow along with this walkthrough, you need the following:
Accept all other default values including with regard to storage.
Create a new security group and accept the default rule that allows TCP port 22 (SSH) from everywhere (0.0.0.0/0 in IPv4). For the purposes of this walkthrough, permitting access from all IP addresses is reasonable. In a production environment, you may restrict access to different addresses.
Allocate and associate an Elastic IP address to the server when it enters the running state.
Respond “Y” to all requests for approval to install the software.
Step 3: Install and configure Let’s Encrypt
If you are no longer connected to the Amazon Linux 2 instance, connect to it at the Elastic IP address that you just created.
Install certbot, the Let’s Encrypt client to be used to obtain an SSL/TLS certificate and install it into Apache.
sudo yum install python2-certbot-apache.noarch
Respond “Y” to all requests for approval to install the software. If you see a message appear about SELinux, you can safely ignore it. This is a known issue with the latest version of certbot.
Create a DNS “A record” that maps a host name to the Elastic IP address. For this post, assume that the name of the host is lamp.example.com. If you are hosting your DNS in Amazon Route 53, do this by creating the appropriate record set.
After the “A record” has propagated, browse to lamp.example.com. The Apache test page should appear. If the page does not appear, use a tool such as nslookup on your workstation to confirm that the DNS record has been properly configured.
You are now ready to install Let’s Encrypt. Let’s Encrypt does the following:
Confirms that you have control over the DNS domain being used, by having you create a DNS TXT record using the value that it provides.
Obtains an SSL/TLS certificate.
Modifies the Apache-related scripts to use the SSL/TLS certificate and redirects users browsing the site in HTTP mode to HTTPS mode.
Use the following command to install certbot:
sudo certbot -i apache -a manual \
--preferred-challenges dns -d lamp.example.com
The options have the following meanings:
-i apache Use the Apache installer.
-a manual Authenticate domain ownership manually.
--preferred-challenges dns Use DNS TXT records for authentication challenge.
-d lamp.example.com Specify the domain for the SSL/TLS certificate.
You are prompted for the following information: E-mail address for renewals? Enter an email address for certificate renewals. Accept the terms of services? Respond as appropriate. Send your e-mail address to the EFF? Respond as appropriate. Log your current IP address? Respond as appropriate.
You are prompted to deploy a DNS TXT record with the name “_acme-challenge.lamp.example.com” with the supplied value, as shown below.
After you enter the record, wait until the TXT record propagates. To look up the TXT record to confirm the deployment, use the nslookup command in a separate command window, as shown below. Remember to use the set ty=txt command before entering the TXT record. You are prompted to select a virtual host. There is only one, so choose 1. The final prompt asks whether to redirect HTTP traffic to HTTPS. To perform the redirection, choose 2. That completes the configuration of Let’s Encrypt.
Browse to the http:// lamp.example.com site. You are redirected to the SSL/TLS page https://lamp.example.com.
To look at the encryption information, use the appropriate actions within your browser. For example, in Firefox, you can open the padlock and traverse the menus. In the encryption technical details, you can see from the “Connection Encrypted” line that traffic to the website is now encrypted using TLS 1.2.
Security note: As of the time of publication, this website also supports TLS 1.0. I recommend that you disable this protocol because of some known vulnerabilities associated with it. To do this:
Edit the file /etc/letsencrypt/options-ssl-apache.conf.
Look for the line beginning with SSLProtocol and change it to the following:
SSLProtocol all -SSLv2 -SSLv3 -TLSv1
Save the file. After you make changes to this file, Let’s Encrypt no longer automatically updates it. Periodically check your log files for recommended updates to this file.
Restart the httpd server with the following command:
sudo service httpd restart
Step 5: Cleanup
Use the following steps to avoid incurring any further costs.
Terminate the Amazon Linux 2 instance that you created.
Release the Elastic IP address that you allocated.
Revert any DNS changes that you made, including the A and TXT records.
Conclusion
Amazon Linux 2 is an excellent option for hosting websites through the LAMP stack provided by the Amazon-Linux-Extras feature. You can then enhance the security of the Apache web server by installing EPEL and Let’s Encrypt. Let’s Encrypt provisions an SSL/TLS certificate, optionally installs it for you on the Apache server, and enables data-in-transit encryption. You can get started with Amazon Linux 2 in just a few clicks.
The EU’s General Data Protection Regulation (GDPR) describes data processor and data controller roles, and some customers and AWS Partner Network (APN) partners are asking how this affects the long-established AWS Shared Responsibility Model. I wanted to take some time to help folks understand shared responsibilities for us and for our customers in context of the GDPR.
How does the AWS Shared Responsibility Model change under GDPR? The short answer – it doesn’t. AWS is responsible for securing the underlying infrastructure that supports the cloud and the services provided; while customers and APN partners, acting either as data controllers or data processors, are responsible for any personal data they put in the cloud. The shared responsibility model illustrates the various responsibilities of AWS and our customers and APN partners, and the same separation of responsibility applies under the GDPR.
AWS responsibilities as a data processor
The GDPR does introduce specific regulation and responsibilities regarding data controllers and processors. When any AWS customer uses our services to process personal data, the controller is usually the AWS customer (and sometimes it is the AWS customer’s customer). However, in all of these cases, AWS is always the data processor in relation to this activity. This is because the customer is directing the processing of data through its interaction with the AWS service controls, and AWS is only executing customer directions. As a data processor, AWS is responsible for protecting the global infrastructure that runs all of our services. Controllers using AWS maintain control over data hosted on this infrastructure, including the security configuration controls for handling end-user content and personal data. Protecting this infrastructure, is our number one priority, and we invest heavily in third-party auditors to test our security controls and make any issues they find available to our customer base through AWS Artifact. Our ISO 27018 report is a good example, as it tests security controls that focus on protection of personal data in particular.
AWS has an increased responsibility for our managed services. Examples of managed services include Amazon DynamoDB, Amazon RDS, Amazon Redshift, Amazon Elastic MapReduce, and Amazon WorkSpaces. These services provide the scalability and flexibility of cloud-based resources with less operational overhead because we handle basic security tasks like guest operating system (OS) and database patching, firewall configuration, and disaster recovery. For most managed services, you only configure logical access controls and protect account credentials, while maintaining control and responsibility of any personal data.
Customer and APN partner responsibilities as data controllers — and how AWS Services can help
Our customers can act as data controllers or data processors within their AWS environment. As a data controller, the services you use may determine how you configure those services to help meet your GDPR compliance needs. For example, AWS Services that are classified as Infrastructure as a Service (IaaS), such as Amazon EC2, Amazon VPC, and Amazon S3, are under your control and require you to perform all routine security configuration and management that would be necessary no matter where the servers were located. With Amazon EC2 instances, you are responsible for managing: guest OS (including updates and security patches), application software or utilities installed on the instances, and the configuration of the AWS-provided firewall (called a security group).
To help you realize data protection by design principles under the GDPR when using our infrastructure, we recommend you protect AWS account credentials and set up individual user accounts with Amazon Identity and Access Management (IAM) so that each user is only given the permissions necessary to fulfill their job duties. We also recommend using multi-factor authentication (MFA) with each account, requiring the use of SSL/TLS to communicate with AWS resources, setting up API/user activity logging with AWS CloudTrail, and using AWS encryption solutions, along with all default security controls within AWS Services. You can also use advanced managed security services, such as Amazon Macie, which assists in discovering and securing personal data stored in Amazon S3.
For more information, you can download the AWS Security Best Practices whitepaper or visit the AWS Security Resources or GDPR Center webpages. In addition to our solutions and services, AWS APN partners can provide hundreds of tools and features to help you meet your security objectives, ranging from network security and configuration management to access control and data encryption.
Many of my colleagues are fortunate to be able to spend a good part of their day sitting down with and listening to our customers, doing their best to understand ways that we can better meet their business and technology needs. This information is treated with extreme care and is used to drive the roadmap for new services and new features.
AWS customers in the financial services industry (often abbreviated as FSI) are looking ahead to the Fundamental Review of Trading Book (FRTB) regulations that will come in to effect between 2019 and 2021. Among other things, these regulations mandate a new approach to the “value at risk” calculations that each financial institution must perform in the four hour time window after trading ends in New York and begins in Tokyo. Today, our customers report this mission-critical calculation consumes on the order of 200,000 vCPUs, growing to between 400K and 800K vCPUs in order to meet the FRTB regulations. While there’s still some debate about the magnitude and frequency with which they’ll need to run this expanded calculation, the overall direction is clear.
Building a Big Grid In order to make sure that we are ready to help our FSI customers meet these new regulations, we worked with TIBCO to set up and run a proof of concept grid in the AWS Cloud. The periodic nature of the calculation, along with the amount of processing power and storage needed to run it to completion within four hours, make it a great fit for an environment where a vast amount of cost-effective compute power is available on an on-demand basis.
Our customers are already using the TIBCO GridServer on-premises and want to use it in the cloud. This product is designed to run grids at enterprise scale. It runs apps in a virtualized fashion, and accepts requests for resources, dynamically provisioning them on an as-needed basis. The cloud version supports Amazon Linux as well as the PostgreSQL-compatible edition of Amazon Aurora.
Working together with TIBCO, we set out to create a grid that was substantially larger than the current high-end prediction of 800K vCPUs, adding a 50% safety factor and then rounding up to reach 1.3 million vCPUs (5x the size of the largest on-premises grid). With that target in mind, the account limits were raised as follows:
Spot Instance Limit – 120,000
EBS Volume Limit – 120,000
EBS Capacity Limit – 2 PB
If you plan to create a grid of this size, you should also bring your friendly local AWS Solutions Architect into the loop as early as possible. They will review your plans, provide you with architecture guidance, and help you to schedule your run.
Running the Grid We hit the Go button and launched the grid, watching as it bid for and obtained Spot Instances, each of which booted, initialized, and joined the grid within two minutes. The test workload used the Strata open source analytics & market risk library from OpenGamma and was set up with their assistance.
The grid grew to 61,299 Spot Instances (1.3 million vCPUs drawn from 34 instance types spanning 3 generations of EC2 hardware) as planned, with just 1,937 instances reclaimed and automatically replaced during the run, and cost $30,000 per hour to run, at an average hourly cost of $0.078 per vCPU. If the same instances had been used in On-Demand form, the hourly cost to run the grid would have been approximately $93,000.
Despite the scale of the grid, prices for the EC2 instances did not move during the bidding process. This is due to the overall size of the AWS Cloud and the smooth price change model that we launched late last year.
To give you a sense of the compute power, we computed that this grid would have taken the #1 position on the TOP 500 supercomputer list in November 2007 by a considerable margin, and the #2 position in June 2008. Today, it would occupy position #360 on the list.
I hope that you enjoyed this AWS success story, and that it gives you an idea of the scale that you can achieve in the cloud!
EC2’s H1 instances offer 2 to 16 terabytes of fast, dense storage for big data applications, optimized to deliver high throughput for sequential I/O. Enhanced Networking, 32 to 256 gigabytes of RAM, and Intel Xeon E5-2686 v4 processors running at a base frequency of 2.3 GHz round out the feature set.
I am happy to announce that we are reducing the On-Demand and Reserved Instance prices for H1 instances in the US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland) Regions by 15%, effective immediately.
Today, we’re excited to announce local build support in AWS CodeBuild.
AWS CodeBuild is a fully managed build service. There are no servers to provision and scale, or software to install, configure, and operate. You just specify the location of your source code, choose your build settings, and CodeBuild runs build scripts for compiling, testing, and packaging your code.
In this blog post, I’ll show you how to set up CodeBuild locally to build and test a sample Java application.
By building an application on a local machine you can:
Test the integrity and contents of a buildspec file locally.
Test and build an application locally before committing.
Identify and fix errors quickly from your local development environment.
Prerequisites
In this post, I am using AWS Cloud9 IDE as my development environment.
If you would like to use AWS Cloud9 as your IDE, follow the express setup steps in the AWS Cloud9 User Guide.
The AWS Cloud9 IDE comes with Docker and Git already installed. If you are going to use your laptop or desktop machine as your development environment, install Docker and Git before you start.
Note: We need to provide three environment variables namely IMAGE_NAME, SOURCE and ARTIFACTS.
IMAGE_NAME: The name of your build environment image.
SOURCE: The absolute path to your source code directory.
ARTIFACTS: The absolute path to your artifact output folder.
When you run the sample project, you get a runtime error that says the YAML file does not exist. This is because a buildspec.yml file is not included in the sample web project. AWS CodeBuild requires a buildspec.yml to run a build. For more information about buildspec.yml, see Build Spec Example in the AWS CodeBuild User Guide.
Let’s add a buildspec.yml file with the following content to the sample-web-app folder and then rebuild the project.
version: 0.2
phases:
build:
commands:
- echo Build started on `date`
- mvn install
artifacts:
files:
- target/javawebdemo.war
This time your build should be successful. Upon successful execution, look in the /artifacts folder for the final built artifacts.zip file to validate.
Conclusion:
In this blog post, I showed you how to quickly set up the CodeBuild local agent to build projects right from your local desktop machine or laptop. As you see, local builds can improve developer productivity by helping you identify and fix errors quickly.
I hope you found this post useful. Feel free to leave your feedback or suggestions in the comments.
The collective thoughts of the interwebz
By continuing to use the site, you agree to the use of cookies. more information
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.