Resources are objects with limited availability within a computing system. These typically include objects managed by the operating system, such as file handles, database connections, and network sockets. Because the number of such resources in a system is limited, an application must release them as soon as it has finished using them. Otherwise, you will run out of resources and won’t be able to allocate new ones. The paradigm of acquiring a resource and releasing it is also followed by other categories of objects, such as metric wrappers and timers.
Resource leaks are bugs that arise when a program doesn’t release the resources it has acquired. Resource leaks can lead to resource exhaustion. In the worst case, they can cause the system to slow down or even crash.
Starting with Java 7, most classes holding resources implement the java.lang.AutoCloseable interface and provide a close() method to release them. However, a close() call in source code doesn’t guarantee that the resource is released along all program execution paths. For example, in the following sample code, resource r is acquired by calling its constructor and is closed along the path corresponding to the if branch, shown using green arrows. To ensure that the acquired resource doesn’t leak, you must also close r along the path corresponding to the else branch (the path shown using red arrows).
Often, resource leaks manifest themselves along code paths that aren’t frequently run, or under a heavy system load, or after the system has been running for a long time. As a result, such leaks are latent and can remain dormant in source code for long periods of time before manifesting themselves in production environments. This is the primary reason why resource leak bugs are difficult to detect or replicate during testing, and why automatically detecting these bugs during pull requests and code scans is important.
Detecting resource leaks in CodeGuru Reviewer
For this post, we consider the following Java code snippet. In this code, method getConnection() attempts to create a connection in the connection pool associated with a data source. Typically, a connection pool limits the maximum number of connections that can remain open at any given time. As a result, you must close connections after their use so as to not exhaust this limit.
1  private Connection getConnection(final BasicDataSource dataSource, ...)
       throws ValidateConnectionException, SQLException {
2    boolean connectionAcquired = false;
3    // Retrying three times to get the connection.
4    for (int attempt = 0; attempt < CONNECTION_RETRIES; ++attempt) {
5      Connection connection = dataSource.getConnection();
6      // validateConnection may throw ValidateConnectionException
7      if (! validateConnection(connection, ...)) {
8        // connection is invalid
9        DbUtils.closeQuietly(connection);
10     } else {
11       // connection is established
12       connectionAcquired = true;
13       return connection;
14     }
15   }
16   return null;
17 }
At first glance, it seems that the method getConnection() doesn’t leak connection resources. If a valid connection is established in the connection pool (else branch on line 10 is taken), the method getConnection() returns it to the client for use (line 13). If the established connection is invalid (if branch on line 7 is taken), it’s closed in line 9 before another attempt is made to establish a connection.
However, method validateConnection() at line 7 can throw a ValidateConnectionException. If this exception is thrown after a connection is established at line 5, the connection is neither closed in this method nor is it returned upstream to the client to be closed later. Furthermore, if this exceptional code path runs frequently, for instance, if the validation logic throws on a specific recurring service request, each new request causes a connection to leak in the connection pool. Eventually, the client can’t acquire new connections to the data source, impacting the availability of the service.
A typical recommendation to prevent resource leak bugs is to declare the resource objects in a try-with-resources statement block. However, we can’t use try-with-resources to fix the preceding method because this method is required to return an open connection for use in the upstream client. The CodeGuru Reviewer recommendation for the preceding code snippet is as follows:
“Consider closing the following resource: connection. The resource is referenced at line 7. The resource is closed at line 9. The resource is returned at line 13. There are other execution paths that don’t close the resource or return it, for example, when validateConnection throws an exception. To prevent this resource leak, close connection along these other paths before you exit this method.”
As mentioned in the Reviewer recommendation, to prevent this resource leak, you must close the established connection when method validateConnection() throws an exception. This can be achieved by inserting the validation logic (lines 7–14) in a try block. In the finally block associated with this try, the connection must be closed by calling DbUtils.closeQuietly(connection) if connectionAcquired == false. The method getConnection() after this fix has been applied is as follows:
private Connection getConnection(final BasicDataSource dataSource, ...)
        throws ValidateConnectionException, SQLException {
    boolean connectionAcquired = false;
    // Retrying three times to get the connection.
    for (int attempt = 0; attempt < CONNECTION_RETRIES; ++attempt) {
        Connection connection = dataSource.getConnection();
        try {
            // validateConnection may throw ValidateConnectionException
            if (! validateConnection(connection, ...)) {
                // connection is invalid
                DbUtils.closeQuietly(connection);
            } else {
                // connection is established
                connectionAcquired = true;
                return connection;
            }
        } finally {
            if (!connectionAcquired) {
                DbUtils.closeQuietly(connection);
            }
        }
    }
    return null;
}
As shown in this example, resource leaks in production services can be very disruptive. Furthermore, leaks that manifest along exceptional or less frequently run code paths can be hard to detect or replicate during testing and can remain dormant in the code for long periods of time before manifesting themselves in production environments. With the resource leak detector, you can detect such leaks on objects belonging to a large number of popular Java types, such as file streams, database connections, network sockets, timers, and metrics.
Combining static code analysis with machine learning for accurate resource leak detection
In this section, we dive deep into the inner workings of the resource leak detector. The resource leak detector in CodeGuru Reviewer uses static analysis algorithms and techniques. Static analysis algorithms perform code analysis without running the code. These algorithms are generally prone to high false positives (the tool might report correct code as having a bug). If the number of these false positives is high, it can lead to alarm fatigue and low adoption of the tool. As a result, the resource leak detector in CodeGuru Reviewer prioritizes precision over recall: the findings we surface are resource leaks with high accuracy, though CodeGuru Reviewer could potentially miss some resource leak findings.
The main reason for false positives in static code analysis is incomplete information available to the analysis. CodeGuru Reviewer requires only the Java source files and doesn’t require all dependencies or the build artifacts. Not requiring the external dependencies or the build artifacts reduces the friction to perform automated code reviews. As a result, static analysis only has access to the code in the source repository and doesn’t have access to its external dependencies. The resource leak detector in CodeGuru Reviewer combines static code analysis with a machine learning (ML) model. This ML model is used to reason about external dependencies to provide accurate recommendations.
To understand the use of the ML model, consider again the code above for method getConnection() that had a resource leak. In the code snippet, a connection to the data source is established by calling BasicDataSource.getConnection() method, declared in the Apache Commons library. As mentioned earlier, we don’t require the source code of external dependencies like the Apache library for code analysis during pull requests. Without access to the code of external dependencies, a pure static analysis-driven technique doesn’t know whether the Connection object obtained at line 5 will leak, if not closed. Similarly, it doesn’t know that DbUtils.closeQuietly() is a library function that closes the connection argument passed to it at line 9. Our detector combines static code analysis with ML that learns patterns over such external function calls from a large number of available code repositories. As a result, our resource leak detector knows that the connection doesn’t leak along the following code path:
A connection is established on line 5
Method validateConnection() returns false at line 7
DbUtils.closeQuietly() is called on line 9
This suppresses the possible false warning. At the same time, the detector knows that there is a resource leak when the connection is established at line 5, and validateConnection() throws an exception at line 7 that isn’t caught.
When we run CodeGuru Reviewer on this code snippet, it surfaces only the second leak scenario and makes an appropriate recommendation to fix this bug.
The ML model used in the resource leak detector has been trained on a large number of internal Amazon and GitHub code repositories.
Responses to the resource leak findings
Although closing an open resource in code isn’t difficult, doing so properly along all program paths is important to prevent resource leaks. This can easily be overlooked, especially along exceptional or less frequently run paths. As a result, resource leaks occur with relatively high frequency, and the resource leak detector in CodeGuru Reviewer has alerted developers within Amazon to thousands of resource leaks before they hit production.
The resource leak detections have witnessed a high developer acceptance rate, and developer feedback towards the resource leak detector has been very positive. Some of the feedback from developers includes “Very cool, automated finding,” “Good bot :),” and “Oh man, this is cool.” Developers have also concurred that the findings are important and need to be fixed.
Conclusion
Resource leak bugs are difficult to detect or replicate during testing. They can impact the availability of production services. As a result, it’s important to automatically detect these bugs early on in the software development workflow, such as during pull requests or code scans. The resource leak detector in CodeGuru Reviewer combines static code analysis algorithms with ML to surface only the high confidence leaks. It has a high developer acceptance rate and has alerted developers within Amazon to thousands of leaks before those leaks hit production.
Comparing machine learning algorithm performance is fundamental for machine learning practitioners and data scientists. The goal is to identify the most appropriate algorithm to implement for a given business problem.
Machine learning performance is often correlated with the usefulness of the deployed model. Improving the performance of the model typically results in increased prediction accuracy. Model accuracy is a key performance indicator (KPI) for businesses when evaluating production readiness and identifying the appropriate algorithm to select early in model development. Organizations benefit from reduced project expenses, accelerated project timelines, and improved customer experience. Nevertheless, some organizations have not introduced a model comparison process into their workflow, which negatively impacts cost and productivity.
First, I explain the use case addressed in this post. Then, I explain the design considerations for the solution. Finally, I provide access to a GitHub repository that includes all the necessary steps for you to replicate the solution I have described in your own AWS account.
Understanding the Use Case
Machine learning has many potential uses, and quite often the same use case can be addressed by different machine learning algorithms. Take the Amazon SageMaker built-in algorithms as an example: a regression use case can be addressed using the Linear Learner, XGBoost, or KNN algorithms; a classification use case can use algorithms such as XGBoost, KNN, Factorization Machines, or Linear Learner; and for anomaly detection there are Random Cut Forest and IP Insights.
This post addresses a regression use case: identifying the age of an abalone, which can be calculated from the number of rings on its shell (age equals the number of rings plus 1.5). Usually, the rings are counted by examining the shell under a microscope.
I use the abalone dataset in libsvm format, which contains nine fields: Rings, Sex, Length, Diameter, Height, Whole Weight, Shucked Weight, Viscera Weight, and Shell Weight.
The features from Sex through Shell Weight are physical measurements that can be taken with the correct tools. Therefore, by using machine learning algorithms (Linear Learner and XGBoost) to predict age from these measurements, you avoid the complexity of having to examine each abalone under a microscope to determine its age.
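As a small illustration only (the file name and split ratios below are assumptions; the pipeline itself performs this step later with a SageMaker Processing job), the label-first CSV layout that the SageMaker built-in algorithms expect can be produced from the libsvm file like this:

import numpy as np
import pandas as pd
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import train_test_split

# Load the abalone data: y holds the ring count, X the eight physical features.
X, y = load_svmlight_file("abalone.libsvm")
data = pd.DataFrame(np.column_stack([y, X.toarray()]))

# Split into train/validation/test and write label-first CSV files.
train, rest = train_test_split(data, test_size=0.3, random_state=42)
validation, test = train_test_split(rest, test_size=0.5, random_state=42)
for name, split in [("train", train), ("validation", validation), ("test", test)]:
    split.to_csv(f"abalone_{name}.csv", header=False, index=False)

# Age is then derived from the predicted ring count: age = rings + 1.5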
Benefits of the AWS Cloud Development Kit (AWS CDK)
The AWS Cloud Development Kit (AWS CDK) is an open source software development framework to define your cloud application resources.
The AWS CDK uses jsii, which is an interface developed by AWS that allows code in any language to naturally interact with JavaScript classes. It is the technology that enables the AWS Cloud Development Kit to deliver polyglot libraries from a single codebase.
This means that you can use the CDK and define your cloud application resources in TypeScript, for example. Then, by compiling your source module using jsii, you can package it as modules in one of the supported target languages (for example, JavaScript, Python, Java, and .NET). So if your developers or customers prefer any of those languages, you can easily package and export the code in their preferred language.
Also, the cdktf project provides constructs for defining Terraform configurations, and cdk8s enables you to use constructs for defining Kubernetes configuration in TypeScript, Python, and Java. By using the CDK, you get a faster development process and easier cloud onboarding, and your cloud resource definitions become easier to share.
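For example, here is a minimal sketch of a CDK app in Python (assuming CDK v2 and its Python bindings; the bucket is purely illustrative). The same construct classes are generated for the other supported languages through jsii:

import aws_cdk as cdk
from aws_cdk import aws_s3 as s3

class DataBucketStack(cdk.Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)
        # An S3 bucket such as the one the pipeline could use for its /Inputs prefix.
        s3.Bucket(self, "DatasetBucket", versioned=True)

app = cdk.App()
DataBucketStack(app, "DataBucketStack")
app.synth()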
This architecture serves as an example of how you can build an MLOps pipeline that orchestrates the comparison of results between the predictions of two algorithms.
The solution uses a completely serverless environment, so you don’t have to worry about managing the infrastructure. It also deletes resources that are no longer needed after the prediction results have been collected, so you don’t incur additional costs.
Figure 1: Solution Architecture
Walkthrough
In the preceding diagram, the serverless MLOps pipeline is deployed using an AWS Step Functions workflow. The architecture contains the following steps:
The dataset is uploaded to Amazon S3 under the /Inputs directory (prefix).
An AWS Lambda function then initiates the MLOps pipeline, which is built using a Step Functions state machine.
The starting Lambda function begins by collecting the Region-specific training image URIs for both the Linear Learner and XGBoost algorithms; these are used to train both algorithms on the dataset. It also retrieves the Amazon SageMaker Spark container image, which is used to run the SageMaker Processing job.
The dataset is in libsvm format, which is accepted by the XGBoost algorithm as per the Input/Output Interface for the XGBoost Algorithm. However, this format is not supported by the Linear Learner algorithm as per the Input/Output interface for the linear learner algorithm. Therefore, we run a processing job using Amazon SageMaker Data Processing with Apache Spark. The processing job transforms the data from libsvm to CSV and divides the dataset into train, validation, and test datasets. The output of the processing job is stored under the /Xgboost and /Linear directories (prefixes).
Figure 2: Train, validation and test samples extracted from dataset
6. The Step Functions workflow then performs the following steps in parallel:
Train both algorithms.
Create models from the trained algorithms.
Create endpoint configurations and deploy prediction endpoints for both models.
Invoke a Lambda function to describe the status of the deployed endpoints and wait until they are “InService” (a minimal sketch of this check follows the list).
Invoke a Lambda function to perform three live predictions using boto3 and the test samples taken from the dataset, and calculate the average accuracy of each model.
Invoke a Lambda function to delete the deployed endpoints so as not to incur additional charges.
7. Finally, a Lambda function is invoked to determine which model predicts with better accuracy.
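As an illustration of the endpoint status check mentioned above, the following is a minimal sketch (the event shape and endpoint name are assumptions, not the repository’s actual code) of a Lambda handler the state machine could poll until the endpoint reports InService:

import boto3

sagemaker = boto3.client("sagemaker")

def lambda_handler(event, context):
    endpoint_name = event["EndpointName"]   # e.g. "xgboost-endpoint" (assumed key/name)
    status = sagemaker.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
    # The state machine branches on this value and retries while it is still "Creating".
    return {"EndpointName": endpoint_name, "EndpointStatus": status}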
The following diagram shows the Step Functions workflow:
Figure 3: AWS Step Functions workflow graph
The code to provision this solution along with step by step instructions can be found at this GitHub repo.
Results and Next Steps
After the Step Functions workflow completes, the results are depicted in the following diagram:
Figure 4: Comparison results
This doesn’t necessarily mean that the XGBoost algorithm will always be the better-performing algorithm. It just means that, in this run, the performance was the result of the following factors:
the hyperparameters configured for each algorithm
the number of epochs performed
the amount of dataset samples used for training
To make sure that you are getting the best results from the models, you can run hyperparameter tuning jobs, which run many training jobs on your dataset using the algorithms and ranges of hyperparameters that you specify. This helps you identify the set of hyperparameters that gives the best results.
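As a hedged sketch (the job name, hyperparameter ranges, image URI, role ARN, objective, and S3 paths below are placeholders and assumptions, not values from the sample repository), a tuning job for the XGBoost variant could be launched with boto3 like this:

import boto3

sm = boto3.client("sagemaker")

sm.create_hyper_parameter_tuning_job(
    HyperParameterTuningJobName="abalone-xgboost-hpo",  # assumed name
    HyperParameterTuningJobConfig={
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {"Type": "Minimize", "MetricName": "validation:rmse"},
        "ResourceLimits": {"MaxNumberOfTrainingJobs": 20, "MaxParallelTrainingJobs": 2},
        "ParameterRanges": {
            "ContinuousParameterRanges": [{"Name": "eta", "MinValue": "0.05", "MaxValue": "0.5"}],
            "IntegerParameterRanges": [{"Name": "max_depth", "MinValue": "3", "MaxValue": "10"}],
        },
    },
    TrainingJobDefinition={
        "AlgorithmSpecification": {"TrainingImage": "<xgboost-image-uri>", "TrainingInputMode": "File"},
        "RoleArn": "<sagemaker-execution-role-arn>",
        "InputDataConfig": [
            {"ChannelName": channel, "ContentType": "text/csv",
             "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                             "S3Uri": f"s3://<bucket>/Xgboost/{channel}/",
                                             "S3DataDistributionType": "FullyReplicated"}}}
            for channel in ("train", "validation")
        ],
        "OutputDataConfig": {"S3OutputPath": "s3://<bucket>/Xgboost/hpo-output/"},
        "ResourceConfig": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1, "VolumeSizeInGB": 10},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # Static hyperparameters are assumptions for a regression objective.
        "StaticHyperParameters": {"objective": "reg:squarederror", "num_round": "100"},
    },
)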
Finally, you can use this comparison to determine which algorithm is best suited for your production environment. Then you can configure your Step Functions workflow to update the configuration of the production endpoint with the better-performing algorithm.
Figure 5: Update production endpoint workflow
Conclusion
This post showed you how to create a repeatable, automated pipeline to deliver the better-performing algorithm to your production predictions endpoint. This helps increase productivity and reduce the time spent on manual comparison. You also learned how to provision the solution using the AWS CDK and to perform regular cleanup of deployed resources to drive down business costs. If this post helps you or inspires you to solve a problem, share your thoughts and questions in the comments. You can use and extend the code on the GitHub repo.
Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.
AWS Professional Services is partnering with the NFL’s Player Health and Safety team to build the Digital Athlete Program. The Digital Athlete Program is working to drive progress in the prevention, diagnosis, and treatment of injuries; enhance medical protocols; and further improve the way football is taught and played. The NFL, in conjunction with AWS Professional Services, delivered an Amazon EC2 Image Builder pipeline for automating the production of Amazon Machine Images (AMIs). Following similar practices from the Digital Athlete Program, this post demonstrates how to deploy an automated Image Builder pipeline.
“AWS Professional Services faced unique environment constraints, but was able to deliver a modular pipeline solution leveraging EC2 Image Builder. The framework serves as a foundation to create hardened images for future use cases. The team also provided documentation and knowledge transfer sessions to ensure our team was set up to successfully manage the solution.”
—Joseph Steinke, Director, Data Solutions Architect, National Football League
A common scenario AWS customers face is how to build processes that configure secure AWS resources that can be leveraged throughout the organization. You need to move fast in the cloud without compromising security best practices. Amazon Elastic Compute Cloud (Amazon EC2) allows you to deploy virtual machines in the AWS Cloud. EC2 AMIs provide the configuration utilized to launch an EC2 instance. You can use AMIs for several use cases, such as configuring applications, applying security policies, and configuring development environments. Developers and system administrators can deploy preconfigured AMIs to bring up EC2 resources that require little-to-no setup. Oftentimes, multiple patterns are adopted for building and deploying AMIs. Because of this, you need the ability to create a centralized, automated pattern that can output secure, customizable AMIs.
In this post, we demonstrate how to create an automated process that builds and deploys Center for Internet Security (CIS) Level 1 hardened AMIs. The pattern that we deploy includes Image Builder, a CIS Level 1 hardened AMI, an application running on EC2 instances, and Amazon Inspector for security analysis. You deploy the AMI configured with the Image Builder pipeline to an application stack. The application stack consists of EC2 instances running Nginx. Lastly, we show you how to re-hydrate your application stack with a new AMI utilizing AWS CloudFormation and Amazon EC2 launch templates. You use Amazon Inspector to scan the EC2 instances launched from the Image Builder-generated AMI against the CIS Level 1 Benchmark.
After going through this exercise, you should understand how to build, manage, and deploy AMIs to an application stack. The infrastructure deployed with this pipeline includes a basic web application, but you can use this pattern to fit many needs. After running through this post, you should feel comfortable using this pattern to configure an AMI pipeline for your organization.
The project we create in this post addresses the following use case: you need a process for building and deploying CIS Level 1 hardened AMIs to an application stack running on Amazon EC2. In addition to demonstrating how to deploy the AMI pipeline, we also illustrate how to refresh a running application stack with a new AMI. You learn how to deploy this configuration with the AWS Command Line Interface (AWS CLI) and AWS CloudFormation.
AWS services used
Image Builder allows you to develop an automated workflow for creating AMIs to fit your organization’s needs. You can streamline the creation and distribution of secure images, automate your patching process, and define security and application configuration in custom AWS AMIs. In this post, you use the following AWS services to implement this solution:
AWS CloudFormation – AWS CloudFormation allows you to use domain-specific languages or simple text files to model and provision, in an automated and secure manner, all the resources needed for your applications across all Regions and accounts. You can deploy AWS resources in a safe, repeatable manner, and automate the provisioning of infrastructure.
AWS KMS – AWS Key Management Service (AWS KMS) is a fully managed service for creating and managing cryptographic keys. These keys are natively integrated with most AWS services. You use a KMS key in this post to encrypt resources.
Amazon S3 – Amazon Simple Storage Service (Amazon S3) is an object storage service utilized for storing and encrypting data. We use Amazon S3 to store our configuration files.
AWS Auto Scaling – AWS Auto Scaling allows you to build scaling plans that automate how groups of different resources respond to changes in demand. You can optimize availability, costs, or a balance of both. We use Auto Scaling to manage Nginx on Amazon EC2.
Launch templates – Launch templates contain configurations such as AMI ID, instance type, and security group. Launch templates enable you to store launch parameters so that they don’t have to be specified every time instances are launched.
Amazon Inspector – This automated security assessment service improves the security and compliance of applications deployed on AWS. Amazon Inspector automatically assesses applications for exposures, vulnerabilities, and deviations from best practices.
Architecture overview
We use Ansible as a configuration management component alongside Image Builder. The CIS Ansible Playbook applies a Level 1 set of rules to the local host on which the AMI is provisioned. For more information about the Ansible Playbook, see the GitHub repo. Image Builder also offers AMIs with Security Technical Implementation Guide (STIG) hardening at levels low through high as part of its pipeline build.
The following diagram depicts the phases of the Image Builder pipeline for building an Nginx web server. The numbers 1–6 represent the order in which each phase runs in the build process:
Source
Build components
Validate
Test
Distribute
AMI
Figure: Shows the EC2 Image Builder steps
The workflow includes the following steps:
Deploy the CloudFormation templates.
The template creates an Image Builder pipeline.
AWS Systems Manager completes the AMI build process.
Amazon EC2 starts an instance to build the AMI.
Systems Manager starts a test instance build after the first build is successful.
The AMI starts provisioning.
The Amazon Inspector CIS benchmark starts.
CloudFormation templates
You deploy the following CloudFormation templates. These templates contain a significant amount of configuration and deploy the following resources:
vpc.yml – Contains all the core networking configuration. It deploys the VPC, two private subnets, two public subnets, and the route tables. The private subnets use a NAT gateway to communicate with the internet. The public subnets have full outbound access through the internet gateway (IGW).
kms.yml – Contains the AWS KMS configuration that we use for encrypting resources. The KMS key policy is also configured in this template.
s3-iam-config.yml – Contains the launch configuration and autoscaling groups for the initial Nginx launch. For updates and patching to Nginx, we use Image Builder to build those changes.
infrastructure-ssm-params.yml – Contains the Systems Manager parameter store configuration. The parameters are populated by using outputs from other CloudFormation templates.
nginx-config.yml – Contains the configuration for Nginx. Additionally, this template contains the network load balancer, target groups, security groups, and EC2 instance AWS Identity and Access Management (IAM) roles.
nginx-image-builder.yml – Contains the configuration for the Image Builder pipeline that we use to build AMIs.
Prerequisites To follow the steps to provision the pipeline deployment, you must have the following prerequisites:
An AWS account with local credentials properly configured (typically under ~/.aws/credentials).
This solution uses a couple of service-linked roles. Let’s generate these roles using the AWS CLI.
1. Run the following commands:
aws iam create-service-linked-role --aws-service-name autoscaling.amazonaws.com
aws iam create-service-linked-role --aws-service-name imagebuilder.amazonaws.com
If you see a message similar to the following code, it means that you already have the service-linked role created in your account and you can move on to the next step:
An error occurred (InvalidInput) when calling the CreateServiceLinkedRole operation: Service role name AWSServiceRoleForImageBuilder has been taken in this account, please try a different suffix.
Now that you have generated the IAM roles used in this post, you add them to the KMS key policy. This allows the roles to encrypt and decrypt the KMS key.
2. Open the AnsibleConfig/component-nginx.yml file and update the <input_s3_bucket_name> value with the bucket name you generated from the s3-iam-config stack:
You now assume the EC2ImageBuilderRole IAM role from the command line. This role allows you to create objects in the S3 bucket generated from the s3-iam-config stack. Because this bucket is encrypted with AWS KMS, any user or IAM role requires specific permissions to decrypt the key. You have already accounted for this in a previous step by adding the EC2ImageBuilderRole IAM role to the KMS key policy.
2. Create the following environment variable to use the EC2ImageBuilderRole role. Update the values with the output from the previous step:
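The post’s exact export commands aren’t reproduced in this excerpt. As a sketch, the role can be assumed and the temporary credentials printed as environment variable exports like this (the account ID below is a placeholder):

import boto3

sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::111122223333:role/EC2ImageBuilderRole",  # placeholder account ID
    RoleSessionName="image-builder-upload",
)["Credentials"]

# Export these so the AWS CLI and subsequent commands use the assumed role.
print(f'export AWS_ACCESS_KEY_ID={creds["AccessKeyId"]}')
print(f'export AWS_SECRET_ACCESS_KEY={creds["SecretAccessKey"]}')
print(f'export AWS_SESSION_TOKEN={creds["SessionToken"]}')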
2. Open the Parameters/nginx-image-builder-params.json file and update the ImageBuilderBucketName parameter with the S3 bucket name generated in the s3-iam-config stack:
1. On the Image Builder console, choose Image pipelines to see the status of the pipeline.
Figure: Shows the EC2 Image Builder Pipeline status
2. Choose the pipeline (for this post, cis-image-builder-LinuxCis-Pipeline).
On the pipeline details page, you can view more information and make updates to its configuration.
Figure: Shows the Image Builder Pipeline metadata
At this point, the Image Builder pipeline has started running the automation document in Systems Manager. Here you can monitor the progress of the AMI build.
3. On the Systems Manager console, choose Automation.
4. Choose the execution ID of the arn:aws:ssm:us-east-1:123456789012:document/ImageBuilderBuildImageDocument document.
Figure: Shows the Image Builder Pipeline Systems Manager Automation steps
5. Choose the step ID to see what is happening in each step.
At this point, the Image Builder pipeline is bringing up an Amazon Linux 2 EC2 instance. From there, we run Ansible playbooks that configure the security and application settings. The automation is pulling its configuration from the S3 bucket you deployed in a previous step. When the Ansible run is complete, the instance stops and an AMI is generated from this instance. When this is complete, a cleanup is initiated that terminates the EC2 instance. The final result is a CIS Level 1 hardened Amazon Linux 2 AMI running Nginx.
Updating parameters
When the stack is complete, you retrieve some new parameter values.
1. On the Systems Manager console, choose Automation.
2. Choose the execution ID of the arn:aws:ssm:us-east-1:123456789012:document/ImageBuilderBuildImageDocument document.
3. Choose step 21.
The following screenshot shows the output of this step.
Figure: Shows step of EC2 Image Builder Pipeline
4. Open the Parameters/nginx-config.json file and update the AmiId parameter with the AMI ID generated from the previous step:
Let’s verify that our Nginx service is up and running properly. You use Session Manager to connect to a testing instance.
1. On the Amazon EC2 console, choose Instances.
You should see three instances, as in the following screenshot.
Figure: Shows the Nginx EC2 instances
You can connect to either one of the Nginx instances.
2. Select the testing instance.
3. On the Actions menu, choose Connect.
4. Choose Session Manager.
5. Choose Connect.
A terminal on the EC2 instance opens, similar to the following screenshot.
Figure: Shows the Session Manager terminal
6. Run the following command to ensure that Nginx is running properly:
curl localhost:8080
You should see an output similar to the following screenshot.
Figure: Shows Nginx output from terminal
Reviewing resources and configurations
Now that you have deployed the core services for the solution, take some time to review the services that you have just deployed.
IAM roles
This project creates several IAM roles that are used to manage AWS resources. For example, EC2ImageBuilderRole is used to configure new AMIs with the Image Builder pipeline. This role contains only the permissions required to manage the Image Builder process. Adopting this pattern enforces the practice of least privilege. Additionally, many of the IAM policies attached to the IAM roles are scoped down to specific AWS resources. Let’s look at a couple of examples of managing IAM permissions with this project.
The following policy restricts Amazon S3 access to a specific S3 bucket. This makes sure that the role this policy is attached to can only access this specific S3 bucket. If this role needs to access any additional S3 buckets, the resource has to be explicitly added.
Let’s look at the EC2ImageBuilderRole. A common scenario that occurs is when you need to assume a role locally in order to perform an action on a resource. In this case, because you’re using AWS KMS to encrypt the S3 bucket, you need to assume a role that has access to decrypt the KMS key so that artifacts can be uploaded to the S3 bucket. In the following AssumeRolePolicyDocument, we allow Amazon EC2 and Systems Manager services to be assumed by this role. Additionally, we allow IAM users to assume this role as well.
The principal !Sub 'arn:aws:iam::${AWS::AccountId}:root' allows any IAM user in this account to assume this role locally. Normally, this role should be scoped down to specific IAM users or roles. For the purposes of this post, we grant permissions to all users of the account.
Nginx configuration
The AMI built from the Image Builder pipeline contains all of the application and security configurations required to run Nginx as a web application. When an instance is launched from this AMI, no additional configuration is required.
We use Amazon EC2 launch templates to configure the application stack. The launch templates contain information such as the AMI ID, instance type, and security group. When a new AMI is provisioned, you simply update the launch template CloudFormation parameter with the new AMI and update the CloudFormation stack. From here, you can start an Auto Scaling group instance refresh to update the application stack to use the new AMI. The Auto Scaling group is updated with instances running on the updated AMI by bringing down one instance at a time and replacing it.
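Purely as an illustration of that refresh step (the post drives it through CloudFormation parameters; the launch template name, Auto Scaling group name, and AMI ID below are assumptions), the equivalent API calls look roughly like this:

import boto3

ec2 = boto3.client("ec2")
autoscaling = boto3.client("autoscaling")

new_ami_id = "ami-0123456789abcdef0"  # placeholder for the Image Builder output AMI

# Register a new launch template version that points at the new AMI.
ec2.create_launch_template_version(
    LaunchTemplateName="nginx-launch-template",  # assumed name
    SourceVersion="$Latest",
    LaunchTemplateData={"ImageId": new_ami_id},
)

# Start an instance refresh so the group replaces instances a batch at a time.
autoscaling.start_instance_refresh(
    AutoScalingGroupName="nginx-asg",            # assumed name
    Preferences={"MinHealthyPercentage": 90, "InstanceWarmup": 120},
)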
Amazon Inspector configuration
Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS. With Amazon Inspector, assessments are generated for exposure, vulnerabilities, and deviations from best practices.
After performing an assessment, Amazon Inspector produces a detailed list of security findings prioritized by level of severity. These findings can be reviewed directly or as part of detailed assessment reports that are available via the Amazon Inspector console or API. We can use Amazon Inspector to assess our security posture against the CIS Level 1 standard that we use our Image Builder pipeline to provision. Let’s look at how we configure Amazon Inspector.
A resource group defines a set of tags that, when queried, identify the AWS resources that make up the assessment target. Any EC2 instance that is launched with the tag specified in the resource group is in scope for Amazon Inspector assessment runs. The following code shows our configuration:
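The solution’s CloudFormation definition of this resource group isn’t reproduced in this excerpt. Purely as an illustration, an equivalent setup with the classic Amazon Inspector API might look like the following (the tag key/value and target name are assumptions):

import boto3

inspector = boto3.client("inspector")

# Any instance launched with this tag falls under the assessment target.
resource_group_arn = inspector.create_resource_group(
    resourceGroupTags=[{"key": "ResourceOwner", "value": "cis-image-builder"}]
)["resourceGroupArn"]

target_arn = inspector.create_assessment_target(
    assessmentTargetName="cis-level1-target",
    resourceGroupArn=resource_group_arn,
)["assessmentTargetArn"]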
In the following code, we specify the tag set in the resource group, which makes sure that when an instance is launched from this AMI, it’s under the scope of Amazon Inspector:
This starts a new AMI build with an Amazon Inspector evaluation. The process can take up to 2 hours to complete.
3. On the Amazon Inspector console, choose Assessment Runs.
Figure: Shows Amazon Inspector Assessment Run
4. Under Reports, choose Download report.
5. For Select report type, select Findings report.
6. For Select report format, select PDF.
7. Choose Generate report.
The following screenshot shows the findings report from the Amazon Inspector run.
This report generates an assessment against the CIS Level 1 standard. Any policies that don’t comply with the CIS Level 1 standard are explicitly called out in this report.
Section 3.1 lists any failed policies.
Figure: Shows Inspector findings
These failures are detailed later in the report, along with suggestions for remediation.
In section 4.1, locate the entry 1.3.2 Ensure filesystem integrity is regularly checked. This section shows the details of a failure from the Amazon Inspector findings report. You can also see suggestions on how to remediate the issue. Under Recommendation, the findings report suggests a specific command that you can use to remediate the issue.
Figure: Shows Inspector findings issue
You can simply update the Ansible playbooks with this setting, then run the Image Builder pipeline to build a new AMI, deploy the new AMI to an EC2 instance, and run the Amazon Inspector report to ensure that the issue has been resolved. Finally, the report shows the specific assessed instances that have this issue.
Organizations often customize security settings based on a given use case. Your organization may choose CIS Level 1 as a standard but elect not to apply all the recommendations. For example, you might choose not to use the FirewallD service on your Linux instances, because you feel that Amazon EC2 security groups provide enough network security that you don’t need an additional firewall. Disabling FirewallD causes a high-severity failure in the Amazon Inspector report. This is expected and can be ignored when evaluating the report.
Conclusion
In this post, we showed you how to use Image Builder to automate the creation of AMIs. Additionally, we showed you how to use the AWS CLI to deploy CloudFormation stacks. Finally, we walked through how to evaluate resources against the CIS Level 1 standard using Amazon Inspector.
About the Authors
Joe Keating is a Modernization Architect in Professional Services at Amazon Web Services. He works with AWS customers to design and implement a variety of solutions in the AWS Cloud. Joe enjoys cooking with a glass or two of wine and achieving mediocrity on the golf course.
Virginia Chu is a Sr. Cloud Infrastructure Architect in Professional Services at Amazon Web Services. She works with enterprise-scale customers around the globe to design and implement a variety of solutions in the AWS Cloud.
This post is authored by Brooke Chen, Senior Product Manager for AWS Compute Optimizer, Letian Feng, Principal Product Manager for AWS Compute Optimizer, and Chad Schmutzer, Principal Developer Advocate for Amazon EC2.
Optimizing compute resources is a critical component of any application architecture. Over-provisioning compute can lead to unnecessary infrastructure costs, while under-provisioning compute can lead to poor application performance.
Launched in December 2019, AWS Compute Optimizer is a recommendation service for optimizing the cost and performance of AWS compute resources. It generates actionable optimization recommendations tailored to your specific workloads. Over the last year, thousands of AWS customers reduced compute costs by up to 25% by using Compute Optimizer to help choose the optimal Amazon EC2 instance types for their workloads.
One of the most frequent requests from customers is for AWS Lambda recommendations in Compute Optimizer. Today, we announce that Compute Optimizer now supports memory size recommendations for Lambda functions. This allows you to reduce costs and increase performance for your Lambda-based serverless workloads. To get started, opt in for Compute Optimizer to start finding recommendations.
Overview
With Lambda, there are no servers to manage, it scales automatically, and you only pay for what you use. However, choosing the right memory size setting for a Lambda function is still an important task. Compute Optimizer uses ML-based memory recommendations to help with this task.
These recommendations are available through the Compute Optimizer console, AWS CLI, AWS SDK, and the Lambda console. Compute Optimizer continuously monitors Lambda functions, using historical performance metrics to improve recommendations over time. In this blog post, we walk through an example to show how to use this feature.
Using Compute Optimizer for Lambda
This tutorial uses the AWS CLI v2 and the AWS Management Console.
In this tutorial, we set up two compute jobs that run every minute in the US East (N. Virginia) Region. One job is more CPU intensive than the other. Initial tests show that invocations of both jobs typically last less than 60 seconds. The goal is to either reduce cost without much increase in duration, or reduce the duration in a cost-efficient manner.
Based on these requirements, a serverless solution can help with this task. Amazon EventBridge can schedule the Lambda functions using rules. To ensure that the functions are optimized for cost and performance, you can use the memory recommendation support in Compute Optimizer.
In your AWS account, opt in to Compute Optimizer to start analyzing AWS resources. Ensure you have the appropriate IAM permissions configured – follow these steps for guidance. If you prefer to use the console to opt in, follow these steps. To opt in, enter the following command in a terminal window:
$ aws compute-optimizer update-enrollment-status --status Active
Once you enable Compute Optimizer, it starts to scan for functions that have been invoked at least 50 times over the trailing 14 days. The next section shows two example scheduled Lambda functions for analysis.
Example Lambda functions
The code for the non-CPU intensive job is below. A Lambda function named lambda-recommendation-test-sleep is created with memory size configured as 1024 MB. An EventBridge rule is created to trigger the function on a recurring 1-minute schedule:
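The post’s code for this function isn’t reproduced in this excerpt; a minimal sketch of a non-CPU-intensive job like lambda-recommendation-test-sleep might look like the following (the 30-second sleep is an assumption chosen to match the roughly 31-second durations reported later in the post):

import json
import time

def lambda_handler(event, context):
    # Idle for most of the invocation; duration is dominated by the sleep, not CPU work.
    time.sleep(30)
    return {
        'statusCode': 200,
        'body': json.dumps('Slept for 30 seconds')
    }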
The code for the CPU intensive job is below. A Lambda function named lambda-recommendation-test-busy is created with memory size configured as 128 MB. An EventBridge rule is created to trigger the function on a recurring 1-minute schedule:
import json
import random

def lambda_handler(event, context):
    random.seed(1)
    x = 0
    for i in range(0, 20000000):
        x += random.random()
    return {
        'statusCode': 200,
        'body': json.dumps('Sum:' + str(x))
    }
Understanding the Compute Optimizer recommendations
Compute Optimizer needs a history of at least 50 invocations of a Lambda function over the trailing 14 days to deliver recommendations. Recommendations are created by analyzing function metadata such as memory size, timeout, and runtime, in addition to CloudWatch metrics such as number of invocations, duration, error count, and success rate.
Compute Optimizer will gather the necessary information to provide memory recommendations for Lambda functions, and make them available within 48 hours. Afterwards, these recommendations will be refreshed daily.
These are recent invocations for the non-CPU intensive function:
Function duration is approximately 31.3 seconds with a memory setting of 1024 MB, resulting in a duration cost of about $0.00052 per invocation. Here are the recommendations for this function in the Compute Optimizer console:
The function is Not optimized with a reason of Memory over-provisioned. You can also fetch the same recommendation information via the CLI:
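The CLI output isn’t reproduced in this excerpt; as a sketch, the same recommendation data can also be retrieved programmatically (the function ARN and account ID below are placeholders):

import boto3

co = boto3.client("compute-optimizer")
resp = co.get_lambda_function_recommendations(
    functionArns=["arn:aws:lambda:us-east-1:111122223333:function:lambda-recommendation-test-sleep"]
)
for rec in resp["lambdaFunctionRecommendations"]:
    print(rec["findingReasonCodes"])
    for option in rec["memorySizeRecommendationOptions"]:
        print(option["memorySize"])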
The Compute Optimizer recommendation contains useful information about the function. Most importantly, it has determined that the function is over-provisioned for memory. The attribute findingReasonCodes shows the value MemoryOverprovisioned. In memorySizeRecommendationOptions, Compute Optimizer has found that using a memory size of 900 MB results in an expected invocation duration of approximately 31.5 seconds.
For non-CPU intensive jobs, reducing the memory setting of the function often doesn’t have a negative impact on function duration. The recommendation confirms that you can reduce the memory size from 1024 MB to 900 MB, saving cost without significantly impacting duration. The new duration cost per invocation saves approximately 12%.
The Compute Optimizer console validates these calculations:
These are recent invocations for the second function which is CPU-intensive:
The function duration is about 37.5 seconds with a memory setting of 128 MB, resulting in a duration cost of about $0.000078 per invocation. The recommendations for this function appear in the Compute Optimizer console:
The function is also Not optimized with a reason of Memory under-provisioned. The same recommendation information is available via the CLI:
For this function, Compute Optimizer has determined that the function’s memory is under-provisioned. The value of findingReasonCodes is MemoryUnderprovisioned. The recommendation is to increase the memory from 128 MB to 160 MB.
This recommendation may seem counter-intuitive, since the function only uses 55 MB of memory per invocation. However, Lambda allocates CPU and other resources linearly in proportion to the amount of memory configured. This means that increasing the memory allocation to 160 MB also reduces the expected duration to around 28.7 seconds. This is because a CPU-intensive task also benefits from the increased CPU performance that comes with the additional memory.
After applying this recommendation, the new expected duration cost per invocation is approximately $0.000075. This means that for almost no change in duration cost, the job latency is reduced from 37.5 seconds to 28.7 seconds.
The Compute Optimizer console validates these calculations:
Applying the Compute Optimizer recommendations
To optimize the Lambda functions using Compute Optimizer recommendations, use the following CLI command:
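The exact command isn’t reproduced in this excerpt; the recommendation can be applied with aws lambda update-function-configuration --memory-size 900, or equivalently with boto3:

import boto3

boto3.client("lambda").update_function_configuration(
    FunctionName="lambda-recommendation-test-sleep",
    MemorySize=900,  # recommended memory size from Compute Optimizer
)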
After invoking the function multiple times, we can see metrics of these invocations in the console. This shows that the function duration has not changed significantly after reducing the memory size from 1024 MB to 900 MB. The Lambda function has been successfully cost-optimized without increasing job duration:
To apply the recommendation to the CPU-intensive function, use the following CLI command:
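Again, the exact command isn’t shown in this excerpt; the equivalent call raises the memory of the CPU-intensive function to the recommended 160 MB:

import boto3

boto3.client("lambda").update_function_configuration(
    FunctionName="lambda-recommendation-test-busy",
    MemorySize=160,  # recommended memory size from Compute Optimizer
)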
After invoking the function multiple times, the console shows that the invocation duration is reduced to about 28 seconds. This matches the recommendation’s expected duration. This shows that the function is now performance-optimized without a significant cost increase:
Final notes
A couple of final notes:
Not every function will receive a recommendation. Compute Optimizer only delivers recommendations when it has high confidence that these recommendations may help reduce cost or reduce execution duration.
As with any changes you make to an environment, we strongly advise that you test recommended memory size configurations before applying them into production.
Conclusion
You can now use Compute Optimizer for serverless workloads using Lambda functions. This can help identify the optimal Lambda function configuration options for your workloads. Compute Optimizer supports memory size recommendations for Lambda functions in all AWS Regions where Compute Optimizer is available. These recommendations are available to you at no additional cost. You can get started with Compute Optimizer from the console.
Artifact repositories are often used to share software packages for use in builds and deployments. Java developers using Apache Maven use artifact repositories to share and reuse Maven packages. For example, one team might own a web service framework that is used by multiple other teams to build their own services. The framework team can publish the framework as a Maven package to an artifact repository, where new versions can be picked up by the service teams as they become available. This post explains how you can set up a continuous integration pipeline with AWS CodePipeline and AWS CodeBuild to deploy Maven artifacts to AWS CodeArtifact. CodeArtifact is a fully managed pay-as-you-go artifact repository service with support for software package managers and build tools like Maven, Gradle, npm, yarn, twine, and pip.
Solution overview
The pipeline we build is triggered each time a code change is pushed to the AWS CodeCommit repository. The code is compiled using the Java compiler, unit tested, and deployed to CodeArtifact. After the artifact is published, it can be consumed by developers working in applications that have a dependency on the artifact or by builds running in other pipelines. The following diagram illustrates this architecture.
All the components in this pipeline are fully managed and you don’t pay for idle capacity or have to manage any servers.
Prerequisites
This post assumes you have the following tools installed and configured:
To create the CodeArtifact domain, CodeArtifact repository, CodeCommit, CodePipeline, CodeBuild, and associated resources, we use AWS CloudFormation. Save the provided CloudFormation template below as codeartifact-cicd-pipeline.yaml and create a stack:
Initialize a Git repository for the Maven project and add the CodeCommit repository that was created in the CloudFormation stack as a remote repository:
The Maven project’s POM file needs to be updated with the distribution management section. This lets Maven know where to publish artifacts. Add the distributionManagement section inside the project element of the POM. Be sure to update the URL with the correct URL for the CodeArtifact repository you created earlier. You can find the CodeArtifact repository URL with the get-repository-endpoint CLI command:
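The command itself isn’t reproduced in this excerpt; for illustration, the endpoint can also be retrieved with boto3 (the domain name below is a placeholder; the repository name myrepo is the one created by the CloudFormation stack):

import boto3

codeartifact = boto3.client("codeartifact")
endpoint = codeartifact.get_repository_endpoint(
    domain="my-domain",   # placeholder: use the domain created by the stack
    repository="myrepo",
    format="maven",
)["repositoryEndpoint"]
print(endpoint)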
<distributionManagement>
    <repository>
        <id>codeartifact</id>
        <name>codeartifact</name>
        <url>Replace with the URL from the get-repository-endpoint command</url>
    </repository>
</distributionManagement>
Creating a settings.xml file
Maven needs credentials to use to authenticate with CodeArtifact when it performs the deployment. CodeArtifact uses temporary authorization tokens. To pass the token to Maven, a settings.xml file is created in the top level of the Maven project. During the deployment stage, Maven is instructed to use the settings.xml in the top level of the project instead of the settings.xml that normally resides in $HOME/.m2. Create a settings.xml in the top level of the Maven project with the following contents:
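The settings.xml contents aren’t reproduced in this excerpt. The file typically points Maven’s codeartifact server entry at a temporary CodeArtifact authorization token; as a sketch (the domain name is a placeholder), that token can be fetched like this and exported for the build to use:

import boto3

token = boto3.client("codeartifact").get_authorization_token(
    domain="my-domain",   # placeholder: use the domain created by the stack
)["authorizationToken"]

# Export as an environment variable (for example CODEARTIFACT_AUTH_TOKEN) that the
# settings.xml server entry can reference during the deploy step.
print(token)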
CodeBuild uses a build specification file with commands and related settings that are used during the build, test, and delivery of the artifact. In the build specification file, we specify the CodeBuild runtime to use pre-build actions (update AWS CLI), and build actions (Maven build, test, and deploy). When Maven is invoked, it is provided the path to the settings.xml created in the previous step, instead of the default in $HOME/.m2/settings.xml. Create the buildspec.yaml as shown in the following code:
The final step is to add the files in the Maven project to the Git repository and push the changes to CodeCommit. This triggers the pipeline to run. See the following code:
git checkout -b main
git add settings.xml buildspec.yaml pom.xml src
git commit -a -m "Initial commit"
git push --set-upstream origin main
Checking the pipeline
At this point, the pipeline starts to run. To check its progress, sign in to the AWS Management Console and choose the Region where you created the pipeline. On the CodePipeline console, open the pipeline that the CloudFormation stack created. The pipeline’s name is prefixed with the stack name. If you open the CodePipeline console before the pipeline is complete, you can watch each stage run (see the following screenshot).
If you see that the pipeline failed, you can choose the details in the action that failed for more information.
Checking for new artifacts published in CodeArtifact
When the pipeline is complete, you should be able to see the artifact in the CodeArtifact repository you created earlier. The artifact we published for this post is a Maven snapshot. CodeArtifact handles snapshots differently than release versions. For more information, see Use Maven snapshots. To find the artifact in CodeArtifact, complete the following steps:
On the CodeArtifact console, choose Repositories.
Choose the repository we created earlier named myrepo.
Search for the package named my-app.
Choose the my-app package from the search results.
Choose the Dependencies tab to bring up a list of Maven dependencies that the Maven project depends on.
Cleaning up
To clean up the resources you created in this post, you need to remove them in the following order:
This post covered how to build a continuous integration pipeline to deliver Maven artifacts to AWS CodeArtifact. You can modify this solution for your specific needs. For more information about CodeArtifact or the other services used, see the following:
As applications become increasingly distributed and complex, operators need more automated practices to maintain application availability and reduce the time and effort spent on detecting, debugging, and resolving operational issues.
Amazon DevOps Guru is a machine learning (ML) powered service that gives you a simpler way to improve an application’s availability and reduce expensive downtime. Without involving any complex configuration setup, DevOps Guru automatically ingests operational data in your AWS Cloud. When DevOps Guru identifies a critical issue, it automatically alerts you with a summary of related anomalies, the likely root cause, and context on when and where the issue occurred. DevOps Guru also, when possible, provides prescriptive recommendations on how to remediate the issue.
Using Amazon DevOps Guru is easy and doesn’t require you to have any ML expertise. To get started, you need to configure DevOps Guru and specify which AWS resources to analyze. If your applications are distributed across multiple AWS accounts and AWS Regions, you need to configure DevOps Guru for each account-Region combination. Though this may sound complex, it’s in fact very simple to do so using AWS CloudFormation StackSets. This post walks you through the steps to configure DevOps Guru across multiple AWS accounts or organizational units, using AWS CloudFormation StackSets.
Solution overview
The goal of this post is to provide you with sample templates to facilitate onboarding Amazon DevOps Guru across multiple AWS accounts. Instead of logging into each account and enabling DevOps Guru, you use AWS CloudFormation StackSets from the primary account to enable DevOps Guru across multiple accounts in a single AWS CloudFormation operation. When it’s enabled, DevOps Guru monitors your associated resources and provides you with detailed insights for anomalous behavior along with intelligent recommendations to mitigate and incorporate preventive measures.
We consider various options in this post for enabling Amazon DevOps Guru across multiple accounts and Regions:
All resources across multiple accounts and Regions
Resources from specific CloudFormation stacks across multiple accounts and Regions
All resources in an organizational unit
In the following diagram, we launch the AWS CloudFormation StackSet from a primary account to enable Amazon DevOps Guru across two AWS accounts and carry out operations to generate insights. The StackSet uses a single CloudFormation template to configure DevOps Guru, and deploys it across multiple accounts and Regions, as specified in the command.
Figure: Shows enabling of DevOps Guru using CloudFormation StackSets
When Amazon DevOps Guru is enabled to monitor your resources within the account, it uses a combination of vended Amazon CloudWatch metrics, AWS CloudTrail logs, and specific patterns from its ML models to detect an anomaly. When the anomaly is detected, it generates an insight with the recommendations.
Figure: Shows DevOps Guru monitoring the resources and generating insights for anomalies detected
Prerequisites
To complete this post, you should have the following prerequisites:
Two AWS accounts. For this post, we use the account numbers 111111111111 (primary account) and 222222222222. We will carry out the CloudFormation operations and monitoring of the stacks from this primary account.
To use organizations instead of individual accounts, identify the organizational unit (OU) ID that contains at least one AWS account.
Access to a bash environment, either using an AWS Cloud9 environment or your local terminal with the AWS Command Line Interface (AWS CLI) installed.
(a) Using an AWS Cloud9 environment or AWS CLI terminal
We recommend using AWS Cloud9 to create an environment to get access to the AWS CLI from a bash terminal. Make sure you select Amazon Linux 2 as the operating system for the AWS Cloud9 environment.
Alternatively, you may use your bash terminal in your favorite IDE and configure your AWS credentials in your terminal.
(b) Creating IAM roles
If you are using AWS Organizations for account management, you don’t need to create the IAM roles manually; you can instead use Organizations-based trusted access and service-linked roles (SLRs), and skip sections (b), (c), and (d). If you’re not using Organizations, read on.
Before you can deploy AWS CloudFormation StackSets, you must have the following IAM roles:
AWSCloudFormationStackSetAdministrationRole
AWSCloudFormationStackSetExecutionRole
The AWSCloudFormationStackSetAdministrationRole role must be created in the primary account, whereas the AWSCloudFormationStackSetExecutionRole role must be created in every account in which you want to deploy stack set instances.
If you’re already using AWS CloudFormation StackSets, you should already have these roles in place. If not, complete the following steps to provision these roles.
(c) Creating the AWSCloudFormationStackSetAdministrationRole role
To create the AWSCloudFormationStackSetAdministrationRole role, sign in to your primary AWS account and go to the AWS Cloud9 terminal.
Execute the following command to download the file:
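The original command isn’t reproduced here, so the following is a minimal sketch. It assumes you use the administration role template referenced in the AWS StackSets prerequisites documentation and create it as a regular CloudFormation stack in the primary account:
# Download the StackSets administration role template
curl -O https://s3.amazonaws.com/cloudformation-stackset-sample-templates-us-east-1/AWSCloudFormationStackSetAdministrationRole.yml
# Create the administration role in the primary account (111111111111)
aws cloudformation create-stack \
  --stack-name AWSCloudFormationStackSetAdministrationRole \
  --template-body file://AWSCloudFormationStackSetAdministrationRole.yml \
  --capabilities CAPABILITY_NAMED_IAM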
(d) Creating the AWSCloudFormationStackSetExecutionRole role
You now create the role AWSCloudFormationStackSetExecutionRole in the primary account and other target accounts where you want to enable DevOps Guru. For this post, we create it for our two accounts and two Regions (us-east-1 and us-east-2).
Execute the following command to download the file:
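Again, a minimal sketch rather than the exact original command; the AdministratorAccountId parameter name matches the AWS-provided execution role template, and you repeat the create-stack call in each target account (for example, using a profile or role for account 222222222222):
# Download the StackSets execution role template
curl -O https://s3.amazonaws.com/cloudformation-stackset-sample-templates-us-east-1/AWSCloudFormationStackSetExecutionRole.yml
# Create the execution role, trusting the primary (administrator) account;
# repeat this in every target account
aws cloudformation create-stack \
  --stack-name AWSCloudFormationStackSetExecutionRole \
  --template-body file://AWSCloudFormationStackSetExecutionRole.yml \
  --parameters ParameterKey=AdministratorAccountId,ParameterValue=111111111111 \
  --capabilities CAPABILITY_NAMED_IAM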
Now that the roles are provisioned, you can use AWS CloudFormation StackSets in the next section.
Running AWS CloudFormation StackSets to enable DevOps Guru
With the required IAM roles in place, now you can deploy the stack sets to enable DevOps Guru across multiple accounts.
As a first step, go to your bash terminal and clone the GitHub repository to access the CloudFormation templates:
git clone https://github.com/aws-samples/amazon-devopsguru-samples
cd amazon-devopsguru-samples/enable-devopsguru-stacksets
(a) Configuring Amazon SNS topics for DevOps Guru to send notifications for operational insights
If you want to receive notifications for operational insights generated by Amazon DevOps Guru, you need to configure an Amazon Simple Notification Service (Amazon SNS) topic across multiple accounts. If you have already configured SNS topics and want to use them, identify the topic name and directly skip to the step to enable DevOps Guru.
Note for Central notification target: You may prefer to configure an SNS Topic in the central AWS account so that all Insight notifications are sent to a single target. In such a case, you would need to modify the central account SNS topic policy to allow other accounts to send notifications.
To create your stack set, enter the following command (provide an email for receiving insights):
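The exact command isn’t preserved here; the following sketch assumes the cloned repository contains an SNS topic template that takes the notification email as a parameter (the file name EnableSnsTopic.yml and the parameter key EmailAddress are assumptions, so adjust them to the actual template):
# Create the stack set for the SNS topic in the primary account
aws cloudformation create-stack-set \
  --stack-set-name EnableDevOpsGuruTopicStackSet \
  --template-body file://EnableSnsTopic.yml \
  --parameters ParameterKey=EmailAddress,ParameterValue=your-email@example.com
# Deploy the SNS topic into both accounts and both Regions
aws cloudformation create-stack-instances \
  --stack-set-name EnableDevOpsGuruTopicStackSet \
  --accounts 111111111111 222222222222 \
  --regions us-east-1 us-east-2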
After running this command, the SNS topic devops-guru is created across both the accounts. Go to the email address specified and confirm the subscription by clicking the Confirm subscription link in each of the emails that you receive. Your SNS topic is now fully configured for DevOps Guru to use.
Figure: Shows creation of SNS topic to receive insights from DevOps Guru
(b) Enabling DevOps Guru
Let us first examine the CloudFormation template used to enable DevOps Guru and configure it to send notifications to an SNS topic.
When the StackNames property is set to *, DevOps Guru is enabled for all CloudFormation stacks in the account. Alternatively, you can enable DevOps Guru for only specific CloudFormation stacks by providing the desired stack names as the value of this property.
For the CloudFormation template in this post, we provide the names of the stacks using the parameter inputs. To enable the AWS CLI to accept a list of inputs, we need to configure the input type as CommaDelimitedList, instead of a base string. We also provide the parameter SnsTopicName, which the template substitutes into the TopicArn property.
See the following code:
AWSTemplateFormatVersion: 2010-09-09
Description: Enable Amazon DevOps Guru
Parameters:
  CfnStackNames:
    Type: CommaDelimitedList
    Description: Comma separated names of the CloudFormation Stacks for DevOps Guru to analyze.
    Default: "*"
  SnsTopicName:
    Type: String
    Description: Name of SNS Topic
Resources:
  DevOpsGuruMonitoring:
    Type: AWS::DevOpsGuru::ResourceCollection
    Properties:
      ResourceCollectionFilter:
        CloudFormation:
          StackNames: !Ref CfnStackNames
  DevOpsGuruNotification:
    Type: AWS::DevOpsGuru::NotificationChannel
    Properties:
      Config:
        Sns:
          TopicArn: !Sub arn:aws:sns:${AWS::Region}:${AWS::AccountId}:${SnsTopicName}
Now that we reviewed the CloudFormation syntax, we will use this template to implement the solution. For this post, we will consider three use cases for enabling Amazon DevOps Guru:
(i) For all resources across multiple accounts and Regions
(ii) For all resources from specific CloudFormation stacks across multiple accounts and Regions
(iii) For all resources in an organization
Let us review each of the above points in detail.
(i) Enabling DevOps Guru for all resources across multiple accounts and Regions
Note: Carry out the following steps in your primary AWS account.
You can use the CloudFormation template (EnableDevOpsGuruForAccount.yml) from the current directory, create a stack set, and then instantiate AWS CloudFormation StackSets instances across desired accounts and Regions.
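Here is a minimal sketch of those commands; the stack set name EnableDevOpsGuruStackSet is an assumption, the SNS topic name is the devops-guru topic created earlier, and CfnStackNames is left at its default of * so that all stacks are analyzed:
# Create the stack set from the template in the repository
aws cloudformation create-stack-set \
  --stack-set-name EnableDevOpsGuruStackSet \
  --template-body file://EnableDevOpsGuruForAccount.yml \
  --parameters ParameterKey=SnsTopicName,ParameterValue=devops-guru
# Create stack instances in both accounts and both Regions
aws cloudformation create-stack-instances \
  --stack-set-name EnableDevOpsGuruStackSet \
  --accounts 111111111111 222222222222 \
  --regions us-east-1 us-east-2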
The following screenshot of the AWS CloudFormation console in the primary account running StackSet, shows the stack set deployed in both accounts.
Figure: Screenshot for deployed StackSet and Stack instances
The following screenshot of the Amazon DevOps Guru console shows DevOps Guru is enabled to monitor all CloudFormation stacks.
Figure: Screenshot of DevOps Guru dashboard showing DevOps Guru enabled for all CloudFormation stacks
(ii) Enabling DevOps Guru for specific CloudFormation stacks for individual accounts
Note: Carry out the following steps in your primary AWS account.
In this use case, we want to enable Amazon DevOps Guru only for specific CloudFormation stacks for individual accounts. We use the AWS CloudFormation StackSets override parameters feature to rerun the stack set with specific values for CloudFormation stack names as parameter inputs. For more information, see Override parameters on stack instances.
If you haven’t created the stack instances for individual accounts, use the create-stack-instances AWS CLI command and pass the parameter overrides. If you have already created stack instances, update the existing stack instances using update-stack-instances and pass the parameter overrides. Replace the required account number, Regions, and stack names as needed.
In account 111111111111, create instances with the parameter override with the following command, where CloudFormation stacks STACK-NAME-1 and STACK-NAME-2 belong to this account in the us-east-1 Region:
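A sketch of the command (it reuses the stack set name assumed earlier; the escaped quotes keep the comma-separated stack list as a single parameter value):
# Use update-stack-instances instead if the instances already exist
aws cloudformation create-stack-instances \
  --stack-set-name EnableDevOpsGuruStackSet \
  --accounts 111111111111 \
  --regions us-east-1 \
  --parameter-overrides ParameterKey=CfnStackNames,ParameterValue=\"STACK-NAME-1,STACK-NAME-2\"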
In account 222222222222, create instances with the parameter override with the following command, where CloudFormation stacks STACK-NAME-A and STACK-NAME-B belong to this account in the us-east-1 Region:
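And the corresponding sketch for the second account:
aws cloudformation create-stack-instances \
  --stack-set-name EnableDevOpsGuruStackSet \
  --accounts 222222222222 \
  --regions us-east-1 \
  --parameter-overrides ParameterKey=CfnStackNames,ParameterValue=\"STACK-NAME-A,STACK-NAME-B\"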
(iii) Enabling DevOps Guru for all resources in an organizational unit
The following example uses multiple Regions to demonstrate the usage. Update the OU ID as needed. If you use additional Regions, you may have to create an SNS topic in those Regions too.
To create a stack set for an OU and across multiple Regions, enter the following command:
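A sketch of the commands, assuming a service-managed stack set (trusted access between StackSets and Organizations must be enabled); the stack set name and the OU ID placeholder are assumptions:
# Create a service-managed stack set that targets an organizational unit
aws cloudformation create-stack-set \
  --stack-set-name EnableDevOpsGuruOrgStackSet \
  --template-body file://EnableDevOpsGuruForAccount.yml \
  --parameters ParameterKey=SnsTopicName,ParameterValue=devops-guru \
  --permission-model SERVICE_MANAGED \
  --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false
# Deploy stack instances to every account in the OU, across two Regions
aws cloudformation create-stack-instances \
  --stack-set-name EnableDevOpsGuruOrgStackSet \
  --deployment-targets OrganizationalUnitIds=ou-xxxx-xxxxxxxx \
  --regions us-east-1 us-east-2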
In this way, you can use CloudFormation StackSets to enable and configure DevOps Guru across multiple accounts and Regions with a few simple steps.
Reviewing DevOps Guru insights
Amazon DevOps Guru monitors for anomalies in the resources in the CloudFormation stacks that are enabled for monitoring. The following screenshot shows the initial dashboard.
Figure: Screenshot of DevOps Guru dashboard
After you enable DevOps Guru, it may take up to 24 hours to analyze the resources and baseline the normal behavior. When it detects an anomaly, it highlights the impacted CloudFormation stack, logs insights that provide details about the metrics indicating the anomaly, and provides actionable recommendations to mitigate it.
Figure: Screenshot of DevOps Guru dashboard showing ongoing reactive insight
The following screenshot shows an example of an insight (since resolved) that was generated for increased latency on an ELB. The insight includes sections detailing the metrics, the graphed anomaly with its time window, potentially related events, and recommendations to mitigate the issue and implement preventive measures.
Figure: Screenshot for an Insight generated about ELB Latency
Cleaning up
When you’re finished walking through this post, you should clean up or un-provision the resources to avoid incurring any further charges.
On the AWS CloudFormation StackSets console, choose the stack set to delete.
On the Actions menu, choose Delete stacks from StackSets.
After you delete the stacks from individual accounts, delete the stack set by choosing Delete StackSet.
Delete the AWS Cloud9 environment if you created one for this walkthrough.
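If you prefer the AWS CLI, the StackSets cleanup can also be scripted; a sketch using the stack set name assumed earlier:
# Delete the stack instances from both accounts and Regions first
aws cloudformation delete-stack-instances \
  --stack-set-name EnableDevOpsGuruStackSet \
  --accounts 111111111111 222222222222 \
  --regions us-east-1 us-east-2 \
  --no-retain-stacks
# Once the instances are deleted, remove the empty stack set
aws cloudformation delete-stack-set --stack-set-name EnableDevOpsGuruStackSet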
Conclusion
This post reviewed how to enable Amazon DevOps Guru using AWS CloudFormation StackSets across multiple AWS accounts or organizations to monitor the resources in existing CloudFormation stacks. Upon detecting an anomaly, DevOps Guru generates an insight that includes the vended CloudWatch metric, the CloudFormation stack in which the resource existed, and actionable recommendations.
We hope this post was useful to you to onboard DevOps Guru and that you try using it for your production needs.
About the Authors
Nikunj Vaidya is a Sr. Solutions Architect with Amazon Web Services, focusing on DevOps services. He builds technical content for field enablement and offers technical guidance to customers on AWS DevOps solutions and services that streamline the application development process, accelerate application delivery, and help maintain a high bar of software quality.
Nuatu Tseggai is a Cloud Infrastructure Architect at Amazon Web Services. He enjoys working with customers to design and build event-driven distributed systems that span multiple services.
This post is contributed by Bill Kerr and Raj Seshadri
For most customers, infrastructure is rarely built with CI/CD in mind. However, Infrastructure as Code (IaC) should be a best practice for DevOps professionals when they provision cloud-native assets. Microservice apps that run inside an Amazon EKS cluster often use CI/CD, so why not apply it to the cluster and related cloud infrastructure as well?
This blog demonstrates how to spin up cluster infrastructure managed by CI/CD using CDK code and Cloud Resource Property Manager (CRPM) property files. Managing cloud resources is ultimately about managing properties, such as instance type, cluster version, etc. CRPM helps you organize all those properties by importing bite-sized YAML files, which are stitched together with CDK. It keeps all of what’s good about YAML in YAML, and places all of the logic in beautiful CDK code. Ultimately this improves productivity and reliability as it eliminates manual configuration steps.
Architecture Overview
In this architecture, we create a six-node Amazon EKS cluster. The Amazon EKS cluster has a node group spanning private subnets across two Availability Zones. There are two public subnets in different Availability Zones available for use with an Elastic Load Balancer.
Changes to the primary (master) branch trigger a pipeline, which creates CloudFormation change sets for an Amazon EKS stack and a CI/CD stack. After human approval, the change sets are executed.
Prerequisites
Get ready to deploy the CloudFormation stacks with CDK
First, to get started with CDK you spin up an AWS Cloud9 environment, which gives you a code editor and terminal that run in a web browser. Using AWS Cloud9 is optional but highly recommended, since it speeds up the process.
Leave the default settings and select Next step again.
Select Create environment.
Download and install the dependencies and demo CDK application
In a terminal, let’s review the code used in this article and install it.
# Install TypeScript globally for CDK
npm i -g typescript
# If you are running these commands in Cloud9 or already have CDK installed, then skip this command
npm i -g aws-cdk
# Clone the demo CDK application code
git clone https://github.com/shi/crpm-eks
# Change directory
cd crpm-eks
# Install the CDK application
npm i
Create the IAM service role
When you create an EKS cluster, the IAM role that was used to create the cluster is also the role that will be able to access it afterwards.
Deploy the CloudFormation stack containing the role
Let’s deploy a CloudFormation stack containing a role that will later be used to create the cluster and also to access it. While we’re at it, let’s also add our current user ARN to the role, so that we can assume the role.
# Deploy the EKS management role CloudFormation stack
cdk deploy role --parameters AwsArn=$(aws sts get-caller-identity --query Arn --output text)
# It will ask, "Do you wish to deploy these changes (y/n)?"
# Enter y and then press enter to continue deploying
Notice the Outputs section that shows up in the CDK deploy results, which contains the role name and the role ARN. You will need to copy and paste the role ARN (ex. arn:aws:iam::123456789012:role/eks-role-us-east-1) from your Outputs when deploying the next stack.
Example Outputs:
role.ExportsOutputRefRoleFF41A16F = eks-role-us-east-1
role.ExportsOutputFnGetAttRoleArnED52E3F8 = arn:aws:iam::123456789012:role/eks-role-us-east-1
Create the EKS cluster
Now that we have a role created, it’s time to create the cluster using that role.
Deploy the stack containing the EKS cluster in a new VPC
Expect it to take over 20 minutes for this stack to deploy.
# Deploy the EKS cluster CloudFormation stack
# REPLACE ROLE_ARN WITH ROLE ARN FROM OUTPUTS IN ROLE STACK CREATED ABOVE
cdk deploy eks -r ROLE_ARN
# It will ask, "Do you wish to deploy these changes (y/n)?"
# Enter y and then press enter to continue deploying
Notice the Outputs section, which contains the cluster name (ex. eks-demo) and the UpdateKubeConfigCommand. The UpdateKubeConfigCommand is useful if you already have kubectl installed somewhere and would rather use your own to interact with the cluster instead of using Cloud9’s.
Navigate to this page in the AWS console if you would like to see your cluster, which is now ready to use.
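You can also check the cluster from the terminal; a quick sketch using the cluster name shown in the Outputs (eks-demo in this example):
# Confirm the cluster is ACTIVE
aws eks describe-cluster --name eks-demo --query cluster.status --output text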
Configure kubectl with access to cluster
If you are following along in Cloud9, you can skip configuring kubectl.
If you prefer to use kubectl installed somewhere else, now would be a good time to configure access to the newly created cluster by running the UpdateKubeConfigCommand mentioned in the Outputs section above. It requires that you have the AWS CLI installed and configured.
aws eks update-kubeconfig --name eks-demo --region us-east-1 --role-arn arn:aws:iam::123456789012:role/eks-role-us-east-1
# Test access to cluster
kubectl get nodes
Leveraging Infrastructure CI/CD
Now that the VPC and cluster have been created, it’s time to turn on CI/CD. This will create a cloned copy of github.com/shi/crpm-eks in CodeCommit. Then, an Amazon CloudWatch Events rule will start watching the CodeCommit repo for changes and triggering a CI/CD pipeline that builds and validates CloudFormation templates, and executes CloudFormation change sets.
Deploy the stack containing the code repo and pipeline
# Deploy the CI/CD CloudFormation stack
cdk deploy cicd
# It will ask, "Do you wish to deploy these changes (y/n)?"
# Enter y and then press enter to continue deploying
Notice the Outputs section, which contains the CodeCommit repo name (ex. eks-ci-cd). This is where the code now lives that is being watched for changes.
Example Outputs:
cicd.ExportsOutputFnGetAttLambdaRoleArn275A39EB = arn:aws:iam::774461968944:role/eks-ci-cd-LambdaRole-6PFYXVSLTQ0D
cicd.ExportsOutputFnGetAttRepositoryNameC88C868A = eks-ci-cd
Review the pipeline for the first time
Navigate to this page in the AWS console and you should see a new pipeline in progress. The pipeline is automatically run for the first time when it is created, even though no changes have been made yet. Open the pipeline and scroll down to the Review stage. You’ll see that two change sets were created in parallel (one for the EKS stack and the other for the CI/CD stack).
Select Review to open an approval popup where you can enter a comment.
Before approving, you can select the blue link to the left of Fetch: Initial commit by AWS CodeCommit to see the infrastructure code changes that triggered the pipeline. Then select Reject or Approve.
Go ahead and approve it.
Clone the new AWS CodeCommit repo
Now that the golden source being watched for changes lives in an AWS CodeCommit repo, we need to clone that repo and get rid of the one we’ve been using up to this point.
If you are following along in AWS Cloud9, you can skip cloning the new repo because you are just going to discard the old AWS Cloud9 environment and start using a new one.
Now would be a good time to clone the newly created repo mentioned in the preceding Outputs section. Next, delete the old repo that was cloned from GitHub at the beginning of this blog. You can visit this repository to get the clone URL for the repo.
Review this documentation for help with accessing your private AWS CodeCommit repo using HTTPS.
Review this documentation for help with accessing your repo using SSH.
# Clone the CDK application code (this URL assumes the us-east-1 region)
git clone https://git-codecommit.us-east-1.amazonaws.com/v1/repos/eks-ci-cd
# Change directory
cd eks-ci-cd
# Install the CDK application
npm i
# Remove the old repo
rm -rf ../crpm-eks
Deploy the stack containing the Cloud9 IDE with kubectl and CodeCommit repo
If you are NOT using Cloud9, you can skip this section.
To make life easy, let’s create another Cloud9 environment that has kubectl preconfigured and ready to use, and also has the new CodeCommit repo checked out and ready to edit.
# Deploy the IDE CloudFormation stack
cdk deploy ide
Configuring the new Cloud9 environment
Although kubectl and the code are now ready to use, we still have to manually configure Cloud9 to stop using AWS managed temporary credentials in order for kubectl to be able to access the cluster with the management role. Here’s how to do that and test kubectl:
1. Navigate to this page in the AWS console.
2. In Your environments, select Open IDE for the newly created environment (possibly named eks-ide).
3. Once opened, navigate at the top to AWS Cloud9 -> Preferences.
4. Expand AWS SETTINGS, and under Credentials, disable AWS managed temporary credentials by selecting the toggle button. Then, close the Preferences tab.
5. In a terminal in Cloud9, enter aws configure. Then, answer the questions by leaving them set to None and pressing enter, except for Default region name. Set the Default region name to the current region that you created everything in. The output should look similar to:
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: us-east-1
Default output format [None]:
6. Test the environment
kubectl get nodes
If everything is working properly, you should see two nodes appear in the output similar to:
NAME STATUS ROLES AGE VERSION
ip-192-168-102-69.ec2.internal Ready <none> 4h50m v1.17.11-ekscfdc40
ip-192-168-69-2.ec2.internal Ready <none> 4h50m v1.17.11-ekscfdc40
You can use kubectl from this IDE to control the cluster. When you close the IDE browser window, the Cloud9 environment will automatically shut down after 30 minutes and remain offline until the next time you reopen it from the AWS console. So, it’s a cheap way to have a kubectl terminal ready when needed.
Delete the old Cloud9 environment
If you have been following along using Cloud9 the whole time, then you should have two Cloud9 environments running at this point (one that was used to initially create everything from code in GitHub, and one that is now ready to edit the CodeCommit repo and control the cluster with kubectl). It’s now a good time to delete the old Cloud9 environment.
In Your environments, select the radio button for the old environment (you named it when creating it) and select Delete.
In the popup, enter the word Delete and select Delete.
Now you should be down to having just one AWS Cloud9 environment that was created when you deployed the ide stack.
Trigger the pipeline to change the infrastructure
Now that we have a cluster up and running that’s defined in code stored in a AWS CodeCommit repo, it’s time to make some changes:
We’ll commit and push the changes, which will trigger the pipeline to update the infrastructure.
We’ll go ahead and make one change to the cluster nodegroup and another change to the actual CI/CD build process, so that both the eks-cluster stack as well as the eks-ci-cd stack get changed.
1. In the code that was checked out from AWS CodeCommit, open up res/compute/eks/nodegroup/props.yaml. At the bottom of the file, try changing minSize from 1 to 4, desiredSize from 2 to 6, and maxSize from 3 to 6 as seen in the following screenshot. Then, save the file and close it. The res (resource) directory is your well organized collection of resource properties files.
2. Next, open up res/developer-tools/codebuild/project/props.yaml and find where it contains computeType: ‘BUILD_GENERAL1_SMALL’. Try changing BUILD_GENERAL1_SMALL to BUILD_GENERAL1_MEDIUM. Then, save the file and close it.
3. Commit and push the changes in a terminal.
cd eks-ci-cd
git add .
git commit -m "Increase nodegroup scaling config sizes and use larger build environment"
git push
4. Visit https://console.aws.amazon.com/codesuite/codepipeline/pipelines in the AWS console and you should see your pipeline in progress.
5. Wait for the Review stage to become Pending.
a. Below the Approve action box, click the blue link to the left of “Fetch: …” to see the infrastructure code changes that triggered the pipeline. You should see the two code changes you committed above.
6. After reviewing the changes, go back and select Review to open an approval popup.
7. In the approval popup, enter a comment and select Approve.
8. Wait for the pipeline to finish the Deploy stage as it executes the two change sets. You can refresh the page until you see it has finished. It should take a few minutes.
9. To see that the CodeBuild change has been made, scroll up to the Build stage of the pipeline and click on the AWS CodeBuild link as shown in the following screenshot.
10. Next, select the Build details tab, and you should see that your Compute size has been upgraded to 7 GB memory, 4 vCPUs as shown in the following screenshot.
11. By this time, the cluster nodegroup sizes are probably updated. You can confirm with kubectl in a terminal.
# Get nodes
kubectl get nodes
If everything is ready, you should see six (desired size) nodes appear in the output similar to:
NAME STATUS ROLES AGE VERSION
ip-192-168-102-69.ec2.internal Ready <none> 5h42m v1.17.11-ekscfdc40
ip-192-168-69-2.ec2.internal Ready <none> 5h42m v1.17.11-ekscfdc40
ip-192-168-43-7.ec2.internal Ready <none> 10m v1.17.11-ekscfdc40
ip-192-168-27-14.ec2.internal Ready <none> 10m v1.17.11-ekscfdc40
ip-192-168-36-56.ec2.internal Ready <none> 10m v1.17.11-ekscfdc40
ip-192-168-37-27.ec2.internal Ready <none> 10m v1.17.11-ekscfdc40
Cluster is now manageable by code
You now have a cluster that can be maintained by simply making changes to code! The only resources not being managed by CI/CD in this demo are the management role and the optional AWS Cloud9 IDE. You can log in to the AWS console and edit the role, adding other trust relationships in the future, so others can assume the role and access the cluster.
Clean up
Do not try to delete all of the stacks at once! Wait for the stack(s) in a step to finish deleting before moving onto the next step.
1. Navigate to this page in the AWS console.
2. Delete the two IDE stacks first (the ide stack spawned another stack).
3. Delete the ci-cd stack.
4. Delete the cluster stack (this one takes a long time).
5. Delete the role stack.
Additional resources
Cloud Resource Property Manager (CRPM) is an open source project maintained by SHI, hosted on GitHub, and available through npm.
Conclusion
In this blog, we demonstrated how you can spin up an Amazon EKS cluster managed by CI/CD using CDK code and Cloud Resource Property Manager (CRPM) property files. Making updates to this cluster is as easy as modifying the property files and pushing the changes through AWS CodePipeline. Using CRPM can improve productivity and reliability because it eliminates manual configuration steps.
Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.
AWS Launch Wizard is a console-based service to quickly and easily size, configure, and deploy third-party applications, such as Microsoft SQL Server Always On and HANA based SAP systems, on AWS without the need to identify and provision individual AWS resources. AWS Launch Wizard offers an easy way to deploy enterprise applications and optimize costs. Instead of selecting and configuring separate infrastructure services, you go through a few steps in the AWS Launch Wizard and it deploys a ready-to-use application on your behalf. It reduces the time you need to spend investigating how to provision, estimate costs for, and configure your application on AWS.
You can now use AWS Launch Wizard to deploy and configure self-managed Microsoft Windows Server Active Directory Domain Services running on Amazon Elastic Compute Cloud (EC2) instances. With Launch Wizard, you can have fully-functioning, production-ready domain controllers within a few hours—all without having to manually deploy and configure your resources.
You can use AWS Directory Service to run Microsoft Active Directory (AD) as a managed service, without the hassle of managing your own infrastructure. If you need to run your own AD infrastructure, you can use AWS Launch Wizard to simplify the deployment and configuration process.
In this post, I walk through creation of a cross-region Active Directory domain using Launch Wizard. First, I deploy a single Active Directory domain spanning two regions. Then, I configure Active Directory Sites and Services to match the network topology. Finally, I create a user account to verify replication of the Active Directory domain.
Figure 1: Diagram of resources deployed in this post
Prerequisites
You must have a VPC in your home Region and in each remote Region you use, and the VPC CIDRs must not overlap with each other. If you need to create VPCs and subnets that do not overlap, please refer here.
Each subnet used must have outbound internet connectivity. Feel free to either use a NAT Gateway or Internet Gateway.
The VPCs must be peered in order to complete the steps in this post. For information on creating a VPC Peering connection between regions, please refer here.
If you choose to deploy your Domain Controllers to a private subnet, you must have an RDP jump / bastion instance setup to allow you to RDP to your instance.
Deploy Your Domain Controllers in the Home Region using Launch Wizard
In this section, I deploy the first set of domain controllers into us-east-1, the home Region, using Launch Wizard. I refer to us-east-1 as the home Region and us-west-2 as the remote Region.
Assign a Controller IP address for each domain controller
Remote Desktop Gateway preferences: Disregard for now, this is set up later.
Select the check box labeled I confirm that a public subnet has been set up. Each of the selected private subnets have outbound connectivity enabled.
Select Next.
In the Define infrastructure requirements page, set the following inputs.
Storage and compute: Based on infrastructure requirements
Number of AD users: Up to 5000 users
Select Next.
In the Review and deploy page, review your selections. Then, select Deploy.
Note that it may take up to 2 hours for your domain to be deployed. Once the status has changed to Completed, you can proceed to the next section, in which I prepare Active Directory Sites and Services for the second set of domain controllers in the other Region.
Configure Active Directory Sites and Services
In this section, I configure the Active Directory Sites and Services topology to match my network topology. This step ensures proper Active Directory replication routing so that domain clients can find the closest domain controller. For more information on Active Directory Sites and Services, please refer here.
Retrieve your Administrator Credentials from Secrets Manager
From the AWS Secrets Manager Console in us-east-1, select the Secret that begins with LaunchWizard-UsEast1AD.
In the middle of the Secret page, select Retrieve secret value.
This will display the username and password key with their values.
You need these credentials when you RDP into one of the domain controllers in the next steps.
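If you prefer the command line, a sketch of retrieving the same secret with the AWS CLI (replace the secret ID placeholder with the full LaunchWizard-UsEast1AD… name shown in your console):
# Retrieve the domain administrator credentials from Secrets Manager
aws secretsmanager get-secret-value \
  --secret-id LaunchWizard-UsEast1AD-example \
  --query SecretString \
  --output text \
  --region us-east-1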
Rename the Default First Site
Log in to one of the domain controllers in us-east-1.
Select Start, type dssite and hit Enter on your keyboard.
The Active Directory Sites and Services MMC should appear.
Expand Sites. There is a site named Default-First-Site-Name.
Right click on Default-First-Site-Name select Rename.
Enter us-east-1 as the name.
Leave the Active Directory Sites and Services MMC open for the next set of steps.
Create a New Site and Subnet Definition for US-West-2
Using the Active Directory Sites and Services MMC from the previous steps, right click on Sites.
Select New Site… and enter the following inputs:
Name: us-west-2
Select DEFAULTIPSITELINK.
Select OK.
A pop up will appear telling you there will need to be some additional configuration. Select OK.
Expand Sites and right click on Subnets and select New Subnet.
Enter the following information:
Prefix: the CIDR of your us-west-2 VPC (for example, 10.1.0.0/24)
Site: select us-west-2
Select OK.
Leave the Active Directory Sites and Services MMC open for the following set of steps.
Configure Site Replication Settings
Using the Active Directory Sites and Services MMC from the previous steps, expand Sites, Inter-Site Transports, and select IP. You should see an object named DEFAULTIPSITELINK.
Right click on DEFAULTIPSITELINK.
Select Properties. Set or verify the following inputs on the General tab:
In the DEFAULTIPSITELINK Properties dialog box, select the Attribute Editor tab and modify the following:
Scroll down and double-click the options attribute. Enter 1 for the Value, then select OK twice.
For more information on these settings, please refer here.
Close the Active Directory Sites and Services MMC, as it is no longer needed.
Prepare Your Home Region Domain Controllers Security Group
In this section, I modify the Domain Controllers Security Group in us-east-1. This allows the domain controllers in us-east-1 and the domain controllers that will be deployed in us-west-2 to communicate with each other.
Deploy Your Domain Controllers in the Remote Region using Launch Wizard
Assign a Controller IP address for each domain controller
Remote Desktop Gateway preferences: disregard for now, as I set this later.
Select the check box labeled I confirm that a public subnet has been set up. Each of the selected private subnets have outbound connectivity enabled.
In the Define infrastructure requirements page set the following:
Storage and compute: Based on infrastructure requirements
Number of AD users: Up to 5000 users
In the Review and deploy page, review your selections. Then, select Deploy.
Note that it may take up to 2 hours to deploy the domain controllers. Once the status has changed to Completed, proceed to the next section, in which I modify the security group for the domain controllers in the remote Region.
Prepare Your Remote Region Domain Controllers Security Group
In this section, I modify the Domain Controllers Security Group in us-west-2. This allows the domain controllers in us-west-2 and the domain controllers in us-east-1 to communicate with each other.
Select the Domain Controllers Security Group that was created by your Launch Wizard Active Directory.
The Security Group name should start with LaunchWizard-UsWest2AD-EC2ADStackExistingVPC-. Select Edit inbound rules.
Choose Add rule and enter the following:
Type: Select All traffic
Protocol: All
Port range: All
Source: Select Custom
Enter the CIDR of your remote VPC (for example, 10.0.0.0/24).
Choose Save rules.
Create an AD User and Verify Replication
In this section, I create a user in one region and verify that it replicated to the other region. I also use AD replication diagnostics tools to verify that replication is working properly.
Create a Test User Account
Log in to one of the domain controllers in us-east-1.
Select Start, type dsa and press Enter on your keyboard. The Active Directory Users and Computers MMC should appear.
Right click on the Users container and select New > User.
Enter the following inputs:
First name: John
Last name: Doe
User logon name: jdoe and select Next
Password and Confirm password: Your choice of complex password
Uncheck User must change password at next logon
Select Next.
Select Finish.
Verify Test User Account Has Replicated
Log in to one of the domain controllers in us-west-2.
Select Start and type dsa.
Then, press Enter on your keyboard. The Active Directory Users and Computers MMC should appear.
Select Users. You should see a user object named John Doe.
Note that if the user is not present, it may not have been replicated yet. Replication should not take longer than 60 seconds from when the item was created.
Summary
Congratulations, you have created a cross-region Active Directory! In this post you:
Launched a new Active Directory forest in us-east-1 using AWS Launch Wizard.
Configured Active Directory Sites and Service for a multi-region configuration.
Launched a set of new domain controllers in the us-west-2 region using AWS Launch Wizard.
Created a test user and verified replication.
This post only touches on a couple of features that are available in the AWS Launch Wizard Active Directory deployment. AWS Launch Wizard also automates the creation of a Single Tier PKI infrastructure or trust creation. One of the prime benefits of this solution is the simplicity in deploying a fully functional Active Directory environment in just a few clicks. You no longer need to do the undifferentiated heavy lifting required to deploy Active Directory. For more information, please refer to AWS Launch Wizard documentation.
This post is authored by Deepthi Chelupati, Senior Product Manager for Amazon EC2 Spot Instances, and Chad Schmutzer, Principal Developer Advocate for Amazon EC2
Customers have been using EC2 Spot Instances to save money and scale workloads to new levels for over a decade. Launched in late 2009, Spot Instances are spare Amazon EC2 compute capacity in the AWS Cloud available for steep discounts off On-Demand Instance prices. One thing customers love about Spot Instances is their integration across many services on AWS, the AWS Partner Network, and open source software. These integrations unlock the ability to take advantage of the deep savings and scale Spot Instances provide for interruptible workloads. Some of the most popular services used with Spot Instances include Amazon EC2 Auto Scaling, Amazon EMR, AWS Batch, Amazon Elastic Container Service (Amazon ECS), and Amazon Elastic Kubernetes Service (EKS). When we talk with customers who are early in their cost optimization journey about the advantages of using Spot Instances with their favorite workloads, they typically can’t wait to get started. They often tell us that while they’d love to start using Spot Instances right away, flipping between documentation, blog posts, and the AWS Management Console is time consuming and eats up precious development cycles. They want to know the fastest way to start saving with Spot Instances, while feeling confident they’ve applied best practices. Customers have overwhelmingly told us that the best way for them to get started quickly is to see complete workload configuration examples in infrastructure as code templates for AWS CloudFormation and Hashicorp Terraform. To address this feedback, we launched Spot Blueprints.
Spot Blueprints overview
Today we are excited to tell you about Spot Blueprints, an infrastructure code template generator that lives right in the EC2 Spot Console. Built based on customer feedback, Spot Blueprints guides you through a few short steps. These steps are designed to gather your workload requirements while explaining and configuring Spot best practices along the way. Unlike traditional wizards, Spot Blueprints generates custom infrastructure as code in real time within each step so you can easily understand the details of the configuration. Spot Blueprints makes configuring EC2 instance type agnostic workloads easy by letting you express compute capacity requirements as vCPU and memory. Then the wizard automatically expands those requirements to a flexible list of EC2 instance types available in the AWS Region you are operating in. Spot Blueprints also takes high-level Spot best practices like Availability Zone flexibility and applies them to your workload compute requirements. For example, it automatically includes all of the required resources and dependencies for creating a Virtual Private Cloud (VPC) with all Availability Zones configured for use. Additional workload-specific Spot best practices are also configured, such as graceful interruption handling for load balancer connection draining, automatic job retries, or container rescheduling.
Spot Blueprints makes keeping up with the latest and greatest Spot features (like the capacity-optimized allocation strategy and Capacity Rebalancing for EC2 Auto Scaling) easy. Spot Blueprints is continually updated to support new features as they become available. Today, Spot Blueprints supports generating infrastructure as code workload templates for some of the most popular services used with Spot Instances: Amazon EC2 Auto Scaling, Amazon EMR, AWS Batch, and Amazon EKS. You can tell us what blueprint you’d like to see next right in Spot Blueprints. We are excited to hear from you!
What are Spot best practices?
We’ve mentioned Spot best practices a few times so far in this blog. Let’s quickly review the best practices and how they relate to Spot Blueprints. When using Spot Instances, it is important to understand a couple of points:
Spot Instances are interruptible and must be returned when EC2 needs the capacity back
The location and amount of spare capacity available at any given moment is dynamic and continually changes in real time
For these reasons, it is important to follow best practices when using Spot Instances in your workloads. We like to call these “Spot best practices.” These best practices can be summarized as follows:
Only run workloads that are truly interruption tolerant, meaning interruptible at both the individual instance level and overall application level
Spot workloads should be flexible, meaning they can be shifted in real time to where the spare capacity currently is, or otherwise be paused until spare capacity is available again
In practice, being flexible means qualifying a workload to run on multiple EC2 instance types (think big: multiple families, sizes, and generations), and in multiple Availability Zones, at any given time
Over the last few years, we’ve focused on making it easier to follow these best practices by adding features such as the following:
Mixed instances policy: an Auto Scaling group configuration to enhance availability by deploying across multiple instance types running in multiple Availability Zones
Amazon EC2 Instance Selector: a CLI tool and go library that recommends instance types based on resource criteria like vCPUs and memory
As mentioned earlier, in addition to the general Spot best practices we reviewed, there is a set of Spot best practices native to each workload that has integrated support for Spot Instances. For example, the ability to implement graceful interruption handling for:
Load balancer connection draining
Automatic job retries
Container draining and rescheduling
Spot Blueprints are designed to quickly explain and generate templates with Spot best practices for each specific workload. The custom-generated workload templates can be downloaded in either AWS CloudFormation or HashiCorp Terraform format, allowing for further customization and learning before being deployed in your environment.
Next, let’s walk through configuring and deploying an example Spot blueprint.
Example tutorial
In this example tutorial, we use Spot Blueprints to configure an Apache Spark environment running on Amazon EMR, deploy the template as a CloudFormation stack, run a sample job, and then delete the CloudFormation stack.
First, we navigate to the EC2 Spot console and click on “Spot Blueprints”:
In the next screen, we select the EMR blueprint, and then click “Configure blueprint”:
A couple of notes here:
If you are in a hurry, you can grab a preconfigured template to get started quickly. The preconfigured template has default best practices in place and can be further customized as needed.
If you have a suggestion for a new blueprint you’d like to see, you can click on “Don’t see a blueprint you need?” to give us feedback. We’d love to hear from you!
In Step 1, we give the blueprint a name and configure permissions. We have the blueprint create new IAM roles to allow Amazon EMR and Amazon EC2 compute resources to make calls to the AWS APIs on your behalf:
We see a summary of resources that will be created, along with a code snippet preview. Unlike traditional wizards, you can see the code generated in every step!
In Step 2, the network is configured. We create a new VPC with public subnets in all Availability Zones in the Region. This is a Spot best practice because it increases the number of Spot capacity pools, which increases the possibility for EC2 to find and provision the required capacity:
We see a summary of resources that will be created, along with a code snippet preview. Here is the CloudFormation code for our template, where you can see the VPC creation, including all dependencies such as the internet gateway, route table, and subnets:
In Step 3, we configure the compute environment. Spot Blueprints makes this easy by allowing us to simply define our capacity units and required compute capacity for each type of node in our EMR cluster. In this walkthrough we will use vCPUs. We select 4 vCPUs for the core node capacity, and 12 vCPUs for the task node capacity. Based on these configurations, Spot Blueprints will apply best practices to the EMR cluster settings. These include using On-Demand Instances for core nodes since interruptions to core nodes can cause instability in the EMR cluster, and using Spot Instances for task nodes because the EMR cluster is typically more tolerant to task node interruptions. Finally, task nodes are provisioned using the capacity-optimized allocation strategy because it helps find the most optimal Spot capacity:
You’ll notice there is no need to spend time thinking about which instance types to select – Spot Blueprints takes our application’s minimum vCPUs per instance and minimum vCPU to memory ratio requirements and automatically selects the optimal instance types. This instance type selection process applies Spot best practices by a) using instance types across different families, generations, and sizes, and b) using the maximum number of instance types possible for core (5) and task (15) nodes:
Here is the CloudFormation code for our template. You can see the EMR cluster creation, the applications being installed (Spark, Hadoop, and Ganglia), the flexible list of instance types and Availability Zones, and the capacity-optimized allocation strategy enabled (along with all dependencies):
In Step 4, we have the option of enabling EMR managed scaling on the cluster. Enabling EMR managed scaling is a Spot best practice because this allows EMR to automatically increase or decrease the compute capacity in the cluster based on the workload, further optimizing performance and cost. EMR continuously evaluates cluster metrics to make scaling decisions that optimize the cluster for cost and speed. Spot Blueprints automatically configures minimum and maximum scaling values based on the compute requirements we defined in the previous step:
Here is the updated CloudFormation code for our template with managed scaling (ManagedScalingPolicy) enabled:
In Step 5, we can review and download the template code for further customization in either CloudFormation or Terraform format, review the instance configuration summary, and review a summary of the resources that would be created from the template. Spot Blueprints will also upload the CloudFormation template to an Amazon S3 bucket so we can deploy the template directly from the CloudFormation console or CLI. Let’s go ahead and click on “Deploy in CloudFormation,” copy the URL, and then click on “Deploy in CloudFormation” again:
Having copied the S3 URL, we go to the CloudFormation console to launch the CloudFormation stack:
We walk through the CloudFormation stack creation steps and the stack launches, creating all of the resources as configured in the blueprint. It takes roughly 15-20 minutes for our stack creation to complete:
Once the stack creation is complete, we navigate to the Amazon EMR console to view the EMR cluster configured with Spot best practices:
Next, let’s run a sample Spark application written in Python to calculate the value of pi on the cluster. We’ll do this by uploading the sample application code to an S3 bucket in our account and then adding a step to the cluster that references the application code location, along with its arguments:
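For reference, here is a sketch of adding such a step with the AWS CLI; the cluster ID, bucket name, and script name are placeholders rather than values from this walkthrough:
# Add a Spark step that runs the sample pi-calculation script from S3
aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps 'Type=Spark,Name=Calculate Pi,ActionOnFailure=CONTINUE,Args=[s3://my-sample-bucket/calculate_pi.py,s3://my-sample-bucket/results/]'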
The step runs and completes successfully:
The results of the calculation are sent to our S3 bucket as configured in the arguments:
{"tries":1600000,"hits":1256253,"pi":3.1406325}
Cleanup
Now that our job is done, we delete the CloudFormation stack in order to remove the AWS resources created. Please note that as a part of the EMR cluster creation, EMR creates some EC2 security groups that cannot be removed by CloudFormation since they were created by the EMR cluster and not by CloudFormation. As a result, the deletion of the CloudFormation stack will fail to delete the VPC on the first try. To solve this, we have the option of deleting the VPC manually, or we can let the CloudFormation stack ignore the VPC and leave it (along with the security groups) in place for future use. We can then delete the CloudFormation stack a final time:
Conclusion
No matter if you are a first-time Spot user learning how to take advantage of the savings and scale offered by Spot Instances, or a veteran Spot user expanding your Spot usage into a new architecture, Spot Blueprints has you covered. We hope you enjoy using Spot Blueprints to quickly get you started generating example template frameworks for workloads like Kubernetes and Apache Spark with Spot best practices in place. Please tell us which blueprint you’d like to see next right in Spot Blueprints. We can’t wait to hear from you!
Customers who maintain manufacturing facilities often find it challenging to ingest, centralize, and visualize IoT data that is emitted in flat-file format from their factory equipment. While modern IoT-enabled industrial devices can communicate over standard protocols like MQTT, there are still some legacy devices that generate useful data but are only capable of writing it locally to a flat file. This results in siloed data that is either analyzed in a vacuum without the broader context, or it is not available to business users to be analyzed at all.
AWS provides a suite of IoT and Edge services that can be used to solve this problem. In this blog, I walk you through one method of leveraging these services to ingest hard-to-reach data into the AWS cloud and extract business value from it.
Overview of solution
This solution provides a working example of an edge device running AWS IoT Greengrass with an AWS Lambda function that watches a Samba file share for new .csv files (presumably containing device or assembly line data). When it finds a new file, it will transform it to JSON format and write it to AWS IoT Core. The data is then sent to AWS IoT Analytics for processing and storage, and Amazon QuickSight is used to visualize and gain insights from the data.
Since we don’t have an actual on-premises environment to use for this walkthrough, we’ll simulate pieces of it:
In place of the legacy factory equipment, an EC2 instance running Windows Server 2019 will generate data in .csv format and write it to the Samba file share.
We’re using a Windows Server for this function to demonstrate that the solution is platform-agnostic. As long as the flat file is written to a file share, AWS IoT Greengrass can ingest it.
An EC2 instance running Amazon Linux will act as the edge device and will host AWS IoT Greengrass Core and the Samba share.
In the real world, these could be two separate devices, and the device running AWS IoT Greengrass could be as small as a Raspberry Pi.
Prerequisites
For this walkthrough, you should have the following prerequisites:
Basic knowledge of Windows and Linux server administration
If you’re unfamiliar with AWS IoT Greengrass concepts like Subscriptions and Cores, review the AWS IoT Greengrass documentation for a detailed description.
Walkthrough
First, we’ll show you the steps to launch the AWS IoT Greengrass resources using AWS CloudFormation. The AWS CloudFormation template is derived from the template provided in this blog post. Review the post for a detailed description of the template and its various options.
Create a key pair. This will be used to access the EC2 instances created by the CloudFormation template in the next step.
For EC2KeyPairName, select the EC2 key pair you just created from the drop-down menu.
For SecurityAccessCIDR, use your public IP with a /32 CIDR (for example, 1.1.1.1/32).
You can also accept the default of 0.0.0.0/0 if you are comfortable having SSH and RDP open to all sources on the EC2 instances in this demo environment.
Accept the defaults for the remaining parameters.
View the Resources tab after stack creation completes. The stack creates the following resources:
A VPC with two subnets, two route tables with routes, an internet gateway, and a security group.
Two EC2 instances, one running Amazon Linux and the other running Windows Server 2019.
An IAM role, policy, and instance profile for the Amazon Linux instance.
A Lambda function called GGSampleFunction, which we’ll update with code to parse our flat-files with AWS IoT Greengrass in a later step.
An AWS IoT Greengrass Group, Subscription, and Core.
Other supporting objects and custom resource types.
View the Outputs tab and copy the IPs somewhere easy to retrieve. You’ll need them for multiple provisioning steps below.
3. Review the AWS IoT Greengrass resources created on your behalf by CloudFormation:
Search for IoT Greengrass in the Services drop-down menu and select it.
Click Manage your Groups.
Click file_ingestion.
Navigate through the Subscriptions, Cores, and other tabs to review the configurations.
Leveraging a device running AWS IoT Greengrass at the edge, we can now interact with flat-file data that was previously difficult to collect, centralize, aggregate, and analyze.
Set up the Samba file share
Now, we set up the Samba file share where we will write our flat-file data. In our demo environment, we’re creating the file share on the same server that runs the Greengrass software. In the real world, this file share could be hosted elsewhere as long as the device that runs Greengrass can access it via the network.
Follow the instructions in setup_file_share.md to set up the Samba file share on the AWS IoT Greengrass EC2 instance.
Keep your terminal window open. You’ll need it again for a later step.
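For reference, a condensed sketch of what a minimal Samba setup on the Greengrass instance can look like; the share name, path, and permissive demo settings are assumptions, and setup_file_share.md remains the authoritative set of steps:
# Install Samba and create the shared directory (demo-only permissions)
sudo yum install -y samba
sudo mkdir -p /samba/iot-share
sudo chmod 0777 /samba/iot-share
# Append a share definition and start the service
sudo tee -a /etc/samba/smb.conf <<'EOF'
[iot-share]
    path = /samba/iot-share
    writable = yes
    guest ok = yes
EOF
sudo systemctl enable --now smb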
Configure Lambda Function for AWS IoT Greengrass
AWS IoT Greengrass provides a Lambda runtime environment for user-defined code that you author in AWS Lambda. Lambda functions that are deployed to an AWS IoT Greengrass Core run in the Core’s local Lambda runtime. In this example, we update the Lambda function created by CloudFormation with code that watches for new files on the Samba share, parses them, and writes the data to an MQTT topic.
Update the Lambda function:
Search for Lambda in the Services drop-down menu and select it.
Select the file_ingestion_lambda function.
From the Function code pane, click Actions then Upload a .zip file.
Upload the provided zip file containing the Lambda code.
Select Actions > Publish new version > Publish.
2. Update the Lambda Alias to point to the new version.
Select the Version: X drop-down (“X” being the latest version number).
Choose the Aliases tab and select gg_file_ingestion.
Scroll down to Alias configuration and select Edit.
Choose the newest version number and click Save.
Do NOT use $LATEST as it is not supported by AWS IoT Greengrass.
3. Associate the Lambda function with AWS IoT Greengrass.
Search for IoT Greengrass in the Services drop-down menu and select it.
Select Groups and choose file_ingestion.
Select Lambdas > Add Lambda.
Click Use existing Lambda.
Select file_ingestion_lambda > Next.
Select Alias: gg_file_ingestion > Finish.
You should now see your Lambda associated with the AWS IoT Greengrass group.
Still on the Lambda function tab, click the ellipsis and choose Edit configuration.
Change the following Lambda settings then click Update:
Set Containerization to No container (always).
Set Timeout to 25 seconds (or longer if you have large files to process).
Set Lambda lifecycle to Make this function long-lived and keep it running indefinitely.
Deploy AWS IoT Greengrass Group
Restart the AWS IoT Greengrass daemon:
A daemon restart is required after changing containerization settings. Run the following commands on the Greengrass instance to restart the AWS IoT Greengrass daemon:
cd /greengrass/ggc/core/
sudo ./greengrassd stop
sudo ./greengrassd start
2. Deploy the AWS IoT Greengrass Group to the Core device.
Return to the file_ingestion AWS IoT Greengrass Group in the console.
Select Actions > Deploy.
Select Automatic detection.
After a few minutes, you should see a Status of Successfully completed. If the deployment fails, check the logs, fix the issues, and deploy again.
Generate test data
You can now generate test data that is ingested by AWS IoT Greengrass, written to AWS IoT Core, and then sent to AWS IoT Analytics and visualized by Amazon QuickSight.
Verify that the data is being written to AWS IoT Core following these instructions (Use iot/data for the MQTT Subscription Topic instead of hello/world).
Setup AWS IoT Analytics
Now that our data is in IoT Cloud, it only takes a few clicks to configure AWS IoT Analytics to process, store, and analyze our data.
Search for IoT Analytics in the Services drop-down menu and select it.
Set Resources prefix to file_ingestion and Topic to iot/data. Click Quick Create.
Populate the data set by selecting Data sets > file_ingestion_dataset > Actions > Run now. If you don’t get data on the first run, you may need to wait a couple of minutes and run it again.
Visualize the Data from AWS IoT Analytics in Amazon QuickSight
We can now use Amazon QuickSight to visualize the IoT data in our AWS IoT Analytics data set.
Search for QuickSight in the Services drop-down menu and select it.
If your account is not signed up for QuickSight yet, follow these instructions to sign up (use Standard Edition for this demo)
Build a new report:
Click New analysis > New dataset.
Select AWS IoT Analytics.
Set Data source name to iot-file-ingestion and select file_ingestion_dataset. Click Create data source.
Click Visualize. Wait a moment while your rows are imported into SPICE.
You can now drag and drop data fields onto field wells. Review the QuickSight documentation for detailed instructions on creating visuals.
Following is an example of a QuickSight dashboard you can build using the demo data we generated in this walkthrough.
Cleaning up
Be sure to clean up the objects you created to avoid ongoing charges to your account.
In AWS IoT Analytics, delete the datastore, channel, pipeline, data set, role, and topic rule you created.
In CloudFormation, delete the IoTGreengrass stack.
In Amazon CloudWatch, delete the log files associated with this solution.
Conclusion
Gaining valuable insights from device data that was once out of reach is now possible thanks to AWS’s suite of IoT services. In this walkthrough, we collected and transformed flat-file data at the edge and sent it to IoT Cloud using AWS IoT Greengrass. We then used AWS IoT Analytics to process, store, and analyze that data, and we built an intelligent dashboard to visualize and gain insights from the data using Amazon QuickSight. You can use this data to discover operational anomalies, enable better compliance reporting, monitor product quality, and many other use cases.
For more information on AWS IoT services, check out the overviews, use cases, and case studies on our product page. If you’re new to IoT concepts, I’d highly encourage you to take our free Internet of Things Foundation Series training.
Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.
We are pleased to announce the launch of Python support for Amazon CodeGuru, a service for automated code reviews and application performance recommendations. CodeGuru is powered by program analysis and machine learning, and trained on best practices and hard-learned lessons across millions of code reviews and thousands of applications profiled on open-source projects and internally at Amazon.
Amazon CodeGuru has two services:
Amazon CodeGuru Reviewer – Helps you improve source code quality by detecting hard-to-find defects during application development and recommending how to remediate them.
Amazon CodeGuru Profiler – Helps you find your most expensive lines of code, reduce your infrastructure cost, and fine-tune your application performance.
The launch of Python support extends CodeGuru beyond its original Java support. Python is a widely used language for various use cases, including web app development and DevOps. Python’s growth in data analysis and machine learning areas is driven by its rich frameworks and libraries. In this post, we discuss how to use CodeGuru Reviewer and Profiler to improve your code quality for Python applications.
CodeGuru Reviewer for Python
CodeGuru Reviewer now allows you to analyze your Python code through pull requests and full repository analysis. For more information, see Automating code reviews and application profiling with Amazon CodeGuru. We analyzed large code corpuses and Python documentation to source hard-to-find coding issues and trained our detectors to provide best practice recommendations. We expect such recommendations to benefit beginners as well as expert Python programmers.
CodeGuru Reviewer generates recommendations in the following categories:
AWS SDK API best practices
Data structures and control flow, including exception handling
Resource leaks
Secure coding practices to protect from potential shell injections
In the following sections, we provide real-world examples of bugs that can be detected in each of these categories.
AWS SDK API best practices
AWS has hundreds of services and thousands of APIs. Developers can now benefit from CodeGuru Reviewer recommendations related to AWS APIs. AWS recommendations in CodeGuru Reviewer cover a wide range of scenarios such as detecting outdated or deprecated APIs, warning about API misuse, authentication and exception scenarios, and efficient API alternatives.
Consider the pagination trait, implemented by over 1,000 APIs from more than 150 AWS services. The trait is commonly used when the response object is too large to return in a single response. To get the complete set of results, iterated calls to the API are required, until the last page is reached. If developers aren’t aware of this, they may write code like the following (this example is patterned after actual code):
def sync_ddb_table(source_ddb, destination_ddb):
    response = source_ddb.scan(TableName="table1")
    for item in response['Items']:
        ...
        destination_ddb.put_item(TableName="table2", Item=item)
    ...
Here the scan API is used to read items from one Amazon DynamoDB table and the put_item API to save them to another DynamoDB table. The scan API implements the Pagination trait. However, the developer missed iterating on the results beyond the first scan, leading to only partial copying of data.
The following screenshot shows what CodeGuru Reviewer recommends:
The developer fixed the code based on this recommendation and added complete handling of paginated results by checking the LastEvaluatedKey value in the response object of the paginated API scan as follows:
def sync_ddb_table(source_ddb, destination_ddb):
    response = source_ddb.scan(TableName="table1")
    for item in response['Items']:
        ...
        destination_ddb.put_item(TableName="table2", Item=item)
    # Keeps scanning until LastEvaluatedKey is no longer returned
    while "LastEvaluatedKey" in response:
        response = source_ddb.scan(
            TableName="table1",
            ExclusiveStartKey=response["LastEvaluatedKey"]
        )
        for item in response['Items']:
            destination_ddb.put_item(TableName="table2", Item=item)
    ...
The CodeGuru Reviewer recommendation is rich and offers multiple options for implementing a paginated scan. We can also initialize the ExclusiveStartKey value to None and iteratively update it based on the LastEvaluatedKey value obtained from the scan response object in a loop. The fix below conforms to the usage shown in the official documentation.
def sync_ddb_table(source_ddb, destination_ddb):
    table = source_ddb.Table("table1")
    scan_kwargs = {
        ...
    }
    done = False
    start_key = None
    while not done:
        if start_key:
            scan_kwargs['ExclusiveStartKey'] = start_key
        response = table.scan(**scan_kwargs)
        for item in response['Items']:
            destination_ddb.put_item(TableName="table2", Item=item)
        start_key = response.get('LastEvaluatedKey', None)
        done = start_key is None
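As another option (not part of the CodeGuru recommendation shown above), boto3 ships with built-in paginators that follow LastEvaluatedKey for you. The following is a minimal sketch using the low-level DynamoDB client with the same hypothetical table names:

import boto3

def sync_ddb_table(source_ddb, destination_ddb):
    # The Scan paginator transparently follows LastEvaluatedKey across pages.
    paginator = source_ddb.get_paginator("scan")
    for page in paginator.paginate(TableName="table1"):
        for item in page["Items"]:
            destination_ddb.put_item(TableName="table2", Item=item)

# Usage with low-level DynamoDB clients:
source_ddb = boto3.client("dynamodb")
destination_ddb = boto3.client("dynamodb")
sync_ddb_table(source_ddb, destination_ddb)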
Data structures and control flow
Python’s coding style differs from that of many other languages. For code that doesn’t conform to Python idioms, CodeGuru Reviewer provides a variety of suggestions for efficient and correct handling of data structures and control flow in the Python 3 standard library:
Using collections.defaultdict for compact handling of missing dictionary keys instead of using the setdefault() API or handling a KeyError exception (see the short sketch after this list)
Using the subprocess module instead of outdated APIs for subprocess handling
Detecting improper exception handling, such as catching and passing generic exceptions that can hide latent issues
Detecting simultaneous iteration and modification in loops, which might lead to unexpected bugs because the iterator expression is only evaluated one time and doesn’t account for subsequent index changes
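For example, here is a minimal sketch of the first recommendation, using collections.defaultdict to count occurrences without checking for missing keys or catching KeyError:

from collections import defaultdict

word_counts = defaultdict(int)  # missing keys default to 0
for word in ["alpha", "beta", "alpha"]:
    word_counts[word] += 1

print(dict(word_counts))  # {'alpha': 2, 'beta': 1}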
The following code is a specific example that can confuse novice developers.
def list_sns(region, creds, sns_topics=[]):
    sns = boto_session('sns', creds, region)
    response = sns.list_topics()
    for topic_arn in response["Topics"]:
        sns_topics.append(topic_arn["TopicArn"])
    return sns_topics

def process():
    ...
    for region, creds in jobs["auth_config"]:
        arns = list_sns(region, creds)
        ...
The process() method iterates over different AWS Regions and collects Regional ARNs by calling the list_sns() method. The developer might expect that each call to list_sns() with a Region parameter returns only the corresponding Regional ARNs. However, the preceding code actually leaks ARNs from prior calls into subsequent Regions. This happens due to an idiosyncrasy of Python relating to the use of mutable objects as default argument values. A Python default value is created exactly one time, and if that object is mutated, subsequent calls refer to the mutated object instead of a freshly initialized one.
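A minimal standalone illustration of this behavior (separate from the SNS example above):

def append_item(value, items=[]):   # the default list is created only once
    items.append(value)
    return items

print(append_item("a"))  # ['a']
print(append_item("b"))  # ['a', 'b'] -- the same list object is reused across calls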
The following screenshot shows what CodeGuru Reviewer recommends:
The developer accepted the recommendation and applied the following fix.
def list_sns(region, creds, sns_topics=None):
    sns = boto_session('sns', creds, region)
    response = sns.list_topics()
    if sns_topics is None:
        sns_topics = []
    for topic_arn in response["Topics"]:
        sns_topics.append(topic_arn["TopicArn"])
    return sns_topics
Resource leaks
A Pythonic practice for resource handling is using context managers. Our analysis shows that resource leaks are rampant in Python code where a developer may open external files or windows and forget to close them eventually. A resource leak can slow down or crash your system. Even when a resource is eventually closed, using a context manager is the more Pythonic approach. For example, CodeGuru Reviewer detects resource leaks in the following code:
def read_lines(file):
    lines = []
    f = open(file, 'r')
    for line in f:
        lines.append(line.strip('\n').strip('\r\n'))
    return lines
The following screenshot shows that CodeGuru Reviewer recommends that the developer either use a with statement (context manager) or a try-finally block to explicitly close the resource.
The developer accepted the recommendation and fixed the code as shown below.
def read_lines(file):
    lines = []
    with open(file, 'r') as f:
        for line in f:
            lines.append(line.strip('\n').strip('\r\n'))
    return lines
Secure coding practices
Python is often used for scripting. An integral part of such scripts is the use of subprocesses. As of this writing, CodeGuru Reviewer makes a limited, but important set of recommendations to make sure that your use of eval functions or subprocesses is secure from potential shell injections. It issues a warning if it detects that the command used in eval or subprocess scenarios might be influenced by external factors. For example, see the following code:
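The original post shows the flagged code as a screenshot; the following is an illustrative sketch of the kind of pattern the detector warns about (not the exact code from the screenshot), along with a safer alternative:

import subprocess

def list_user_files(event):
    # The directory name comes straight from an external request. Passing it to
    # the shell with shell=True lets input such as "; rm -rf /" be executed --
    # the kind of pattern the security detector flags as a potential shell injection.
    directory = event["directory"]
    return subprocess.check_output("ls " + directory, shell=True)

def list_user_files_safe(event):
    # Safer: avoid the shell entirely and pass arguments as a list.
    return subprocess.check_output(["ls", "--", event["directory"]])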
As shown in the preceding recommendations, not only are the code issues detected, but a detailed recommendation is also provided on how to fix the issues, along with a link to the Python official documentation. You can provide feedback on recommendations in the CodeGuru Reviewer console or by commenting on the code in a pull request. This feedback helps improve the performance of Reviewer so that the recommendations you see get better over time.
Now let’s take a look at CodeGuru Profiler.
CodeGuru Profiler for Python
Amazon CodeGuru Profiler analyzes your application’s performance characteristics and provides interactive visualizations to show you where your application spends its time. These visualizations, known as flame graphs, are a powerful tool to help you troubleshoot which code methods have high latency or are overutilizing your CPU.
Thanks to the new Python agent, you can now use CodeGuru Profiler on your Python applications to investigate performance issues.
The following list summarizes the supported versions as of this writing.
Other environments: Python 3.9, Python 3.8, Python 3.7, Python 3.6
Onboarding your Python application
For this post, let’s assume you have a Python application running on Amazon Elastic Compute Cloud (Amazon EC2) hosts that you want to profile. To onboard your Python application, complete the following steps:
An easy way to profile your application is to start your script through the codeguru_profiler_agent module; refer to the CodeGuru Profiler documentation for the exact command to use with your app.py script.
Alternatively, you can start the agent manually inside the code. This must be done only one time, preferably in your startup code:
from codeguru_profiler_agent import Profiler

if __name__ == "__main__":
    # Start the profiler before running your application code.
    Profiler(profiling_group_name='ProfilingGroupForMyApplication').start()
    start_application()  # your code in there....
Onboarding your Python Lambda function
Onboarding for an AWS Lambda function is quite similar.
Create a profiling group called ProfilingGroupForMyLambdaFunction, this time selecting Lambda as the compute platform. Give your Lambda function’s execution role access to submit to this profiling group. See the documentation for details about how to create a profiling group.
Include the codeguru_profiler_agent module in your Lambda function code.
Add the with_lambda_profiler decorator to your handler function:
from codeguru_profiler_agent import with_lambda_profiler
@with_lambda_profiler(profiling_group_name='ProfilingGroupForMyLambdaFunction')
def handler_function(event, context):
    # Your code here
Alternatively, you can profile an existing Lambda function without updating the source code by adding a layer and changing the configuration. For more information, see Profiling your applications that run on AWS Lambda.
Profiling a Lambda function helps you see what is slowing down your code so you can reduce the duration, which reduces the cost and improves latency. You need to have continuous traffic on your function in order to produce a usable profile.
Viewing your profile
After running your profile for some time, you can view it on the CodeGuru console.
Each frame in the flame graph shows how much that function contributes to latency. In this example, an outbound call that crosses the network takes up most of the Lambda function’s duration; caching its result would improve the latency.
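For example, if the expensive outbound call returns data that changes infrequently, a simple in-memory cache on a warm Lambda execution environment can avoid repeating it. A minimal sketch, where call_external_service is a hypothetical stand-in for the call identified in the flame graph:

import functools
import time

def call_external_service(key):
    # Hypothetical stand-in for the slow network call seen in the flame graph.
    time.sleep(0.5)
    return {"key": key, "value": "example"}

@functools.lru_cache(maxsize=128)
def fetch_reference_data(key):
    # On a warm execution environment, repeated calls with the same key are
    # served from memory instead of crossing the network again.
    return call_external_service(key)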
Supportability for CodeGuru Profiler is documented here.
If you don’t have an application to try CodeGuru Profiler on, you can use the demo application in the following GitHub repo.
Conclusion
This post introduced how to leverage CodeGuru Reviewer to identify hard-to-find code defects in various issue categories and how to onboard your Python applications or Lambda function in CodeGuru Profiler for CPU profiling. Combining both services can help you improve code quality for Python applications. CodeGuru is now available for you to try. For more pricing information, please see Amazon CodeGuru pricing.
About the Authors
Neela Sawant is a Senior Applied Scientist in the Amazon CodeGuru team. Her background is building AI-powered solutions to customer problems in a variety of domains such as software, multimedia, and retail. When she isn’t working, you’ll find her exploring the world anew with her toddler and hacking away at AI for social good.
Pierre Marieu is a Software Development Engineer in the Amazon CodeGuru Profiler team in London. He loves building tools that help the day-to-day life of other software engineers. Previously, he worked at Amadeus IT, building software for the travel industry.
Ran Fu is a Senior Product Manager in the Amazon CodeGuru team. He has deep customer empathy, and loves exploring who the customers are, what their needs are, and why those needs matter. Besides work, you may find him snowboarding in Keystone or Vail, Colorado.
Amazon FSx provides AWS customers with the native compatibility of third-party file systems with feature sets for workloads such as Windows-based storage, high performance computing (HPC), machine learning, and electronic design automation (EDA). Amazon FSx automates the time-consuming administration tasks such as hardware provisioning, software configuration, patching, and backups. Since Amazon FSx integrates the file systems with cloud-native AWS services, this makes them even more useful for a broader set of workloads.
Amazon FSx for Windows File Server provides fully managed file storage that is accessible over the industry-standard Server Message Block (SMB) protocol. Built on Windows Server, Amazon FSx delivers a wide range of administrative features such as data deduplication, end-user file restore, and Microsoft Active Directory (AD) integration.
In this post, I explain how to migrate files and file shares from on-premises servers to Amazon FSx with AWS DataSync in a domain migration scenario. Customers migrate their file servers to Amazon FSx as part of a migration from an on-premises Active Directory to AWS Managed Microsoft AD, planning to replace their file servers with Amazon FSx during the Active Directory migration.
Prerequisites
Before you begin, perform the steps outlined in this blog to migrate the user accounts and groups to the managed Active Directory.
Walkthrough
There are numerous ways to perform the Active Directory migration. Generally, the following five steps are taken:
Establish two-way forest trust between on-premises AD and AWS Managed AD
Migrate user accounts and group with the ADMT tool
Duplicate Access Control List (ACL) permissions in the file server
Migrate files and folders with existing ACL to Amazon FSx using AWS DataSync
Migrate User Computers
In this post, I focus on duplication of ACL permissions and migration of files and folders using Amazon FSx and AWS DataSync. In order to perform duplication of ACL permission in file servers, I use SubInACL tool, which is available from the Microsoft website.
Duplication of the ACL is required because users want to seamlessly access file shares once their computers are migrated to AWS Managed AD. Therefore, all migrated files and folders must carry permissions for the Managed AD user and group objects. For enterprises, the migration of user computers does not happen overnight; it normally takes place in batches or phases. With ACL duplication, both migrated and non-migrated users can access their respective file shares seamlessly during and after the migration.
Duplication of Access Control List (ACL)
Before we proceed with ACL duplication, we must ensure that the migration of user accounts and groups is complete. In my demo environment, I have already migrated the on-premises users to the Managed Active Directory, and we assume that identical users exist in the Managed Active Directory. There might be a scenario where migrated user accounts have different naming, such as a different sAMAccountName. In that case, you need to handle this during ACL duplication with SubInACL. For more information about the syntax, refer to the SubInACL documentation.
As indicated in following screenshots, I have two users created in the on-premises Active Directory (onprem.local) and those two identical users have been created in the Managed Active Directory too (corp.example.com).
In the following screenshot, I have a shared folder called “HR_Documents” in an on-premises file server. Different users have different access rights to that folder. For example, John Smith has “Full Control” but Onprem User1 only has “Read & Execute”. Our plan is to grant the same access rights to the identical users from the Managed Active Directory (corp.example.com), so that once John Smith is migrated to the Managed AD, he can access the shared folders in Amazon FSx using his Managed Active Directory credentials.
Let’s verify the existing permission in the “HR_Documents” folder. Two users from onprem.local are found with different access rights.
Now it’s time to install SubInACL.
We install it on our on-premises file server. After the SubInACL tool is installed, it can be found under the "C:\Program Files (x86)\Windows Resource Kits\Tools" folder by default. To perform the ACL duplication, open Command Prompt as administrator and run the SubInACL command with the following parameters:
Outputlog = where log file is saved
ErrorLog = where error log file is saved
Subdirectories = to apply permissions including subfolders and files
Migratetodomain = NetBIOS name of the source domain and destination domain
If the command runs successfully, you should be able to see a summary of the results. If there are no errors or failures, you can verify whether the ACL permissions were duplicated as expected by looking at the folders and files. In our case, we can see that one ACL entry for the identical account from corp.example.com has been added.
Note: You will always see two ACL entries, one from the onprem.local domain and another from the corp.example.com domain, on all the files and folders included in the migration. Permissions are now applied at both the folder and file level.
Migrate files and folders using AWS DataSync
AWS DataSync is an online data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems and AWS Storage services such as Amazon S3, Amazon Elastic File System (Amazon EFS), or Amazon FSx for Windows File Server. Manual tasks related to data transfers can slow down migrations and burden IT operations. AWS DataSync reduces or automatically handles many of these tasks, including scripting copy jobs, scheduling and monitoring transfers, validating data, and optimizing network utilization.
Create an AWS DataSync agent
An AWS DataSync agent deploys as a virtual machine in an on-premises data center. An AWS DataSync agent can run on ESXi, KVM, and Microsoft Hyper-V hypervisors. The AWS DataSync agent is used to access on-premises storage systems and transfer data to the AWS DataSync managed service running on AWS. AWS DataSync always performs incremental copies by comparing the source to the destination and only copying files that are new or have changed.
AWS DataSync supports the following location types to migrate data from:
Network File System (NFS)
Server Message Block (SMB)
In this blog, I use SMB as the source location, since I am migrating from an on-premises Windows File server. AWS DataSync supports SMB 2.1 and SMB 3.0 protocols.
AWS DataSync saves metadata and special files when copying to and from file systems. When files are copied between an SMB file share and Amazon FSx for Windows File Server, AWS DataSync copies the following metadata:
File timestamps: access time, modification time, and creation time
File owner and file group security identifiers (SIDs)
Standard file attributes
NTFS discretionary access lists (DACLs): access control entries (ACEs) that determine whether to grant access to an object
Data Synchronization with AWS DataSync
When a task starts, AWS DataSync goes through different stages. It begins by examining the file system, followed by the data transfer to the destination. Once the data transfer is complete, it performs a verification for consistency between the source and destination file systems. You can review detailed information about the data synchronization stages.
DataSync Endpoints
You can activate your agent by using one of the following endpoint types:
Public endpoints – If you use public endpoints, all communication from your DataSync agent to AWS occurs over the public internet.
Federal Information Processing Standard (FIPS) endpoints – If you need to use FIPS 140-2 validated cryptographic modules when accessing the AWS GovCloud (US-East) or AWS GovCloud (US-West) Region, use this endpoint to activate your agent. You use the AWS CLI or API to access this endpoint.
Virtual private cloud (VPC) endpoints – If you use a VPC endpoint, all communication from AWS DataSync to AWS services occurs through the VPC endpoint in your VPC in AWS. This approach provides a private connection between your self-managed data center, your VPC, and AWS services. It increases the security of your data as it is copied over the network.
In my demo environment, I have implemented AWS DataSync as indicated in the following diagram. The DataSync agent can run on a VMware, Hyper-V, or KVM platform in the customer’s on-premises data center.
Once the AWS DataSync agent setup is complete and the task that defines the source file server and destination Amazon FSx file system has been created, you can verify the agent status in the AWS Management Console.
Select Task and then choose Start to start copying files and folders. This will start the replication task (or you can wait until the task runs hourly). You can check the History tab to see a history of the replication task executions.
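If you prefer to start and monitor the transfer with a script instead of the console, the following is a minimal boto3 sketch; the task ARN is a placeholder for the task you created:

import boto3

datasync = boto3.client("datasync")

# Placeholder ARN for the SMB-to-FSx task created earlier.
task_arn = "arn:aws:datasync:ap-southeast-1:111122223333:task/task-EXAMPLE"

execution = datasync.start_task_execution(TaskArn=task_arn)
status = datasync.describe_task_execution(
    TaskExecutionArn=execution["TaskExecutionArn"]
)["Status"]
print(status)  # for example LAUNCHING, TRANSFERRING, VERIFYING, or SUCCESS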
Congratulations! You have replicated the contents of an on-premises file server to Amazon FSx. Let’s check that the ACL permissions are still intact at the destination after the migration. As shown in the following screenshots, the ACL permissions in the Payroll folder remain as they were; both the on-premises users and the Managed AD users are present. Once the users’ computers are migrated to the Managed AD, they can access the same file share on the Amazon FSx server using their Managed AD credentials.
Cleaning up
If you are performing testing by following the preceding steps in your own account, delete the following resources to avoid incurring future charges:
EC2 instances
Managed AD
Amazon FSx file system
AWS DataSync
Conclusion
You have learned how to duplicate ACL permissions and shared folder permissions during a migration of file servers to Amazon FSx. This process provides a seamless migration experience for users. Once the users’ computers are migrated to the Managed AD, they only need to remap the shared folders from Amazon FSx. This can be automated by pushing down the shared folder mappings with a Group Policy. If new files or folders are created in the source file server, AWS DataSync synchronizes them to the Amazon FSx server.
For customers who are planning a domain migration from on-premises Active Directory to AWS Managed Microsoft AD, migrating resources like file servers is common. Handling ACL permissions plays a vital role in providing a seamless migration experience. Duplicating the ACL is one option; otherwise, the ADMT tool can be used to migrate SID information from the source domain to the destination domain. To migrate SID history, SID filtering needs to be disabled during the migration.
If you want to provide feedback about this post, you are welcome to submit it in the comments section below.
Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.
Today, software development practices are constantly evolving to empower developers with tools to maintain a high bar of code quality. Amazon CodeGuru Reviewer offers this capability by carrying out automated code-reviews for developers, based on the trained machine learning models that can detect complex defects and providing intelligent actionable recommendations to mitigate those defects. A quick overview of CodeGuru is covered in this blog post.
Security analysis is a critical part of a code review and CodeGuru Reviewer offers this capability with a new set of security detectors. These security detectors introduced in CodeGuru Reviewer are geared towards identifying security risks from the top 10 OWASP categories and ensures that your code follows best practices for AWS Key Management Service (AWS KMS), Amazon Elastic Compute Cloud (Amazon EC2) API, and common Java crypto and TLS/SSL libraries. As of today, CodeGuru security analysis supports Java language, thus we will take an example of a Java application.
In this post, we will walk through the on-boarding workflow to carry out the security analysis of the code repository and generate recommendations for a Java application.
Security workflow overview:
The new security workflow, introduced for CodeGuru Reviewer, utilizes the source code and build artifacts to generate recommendations. The security detector evaluates build artifacts to generate security-related recommendations, whereas other detectors continue to scan the source code to generate recommendations. With the use of build artifacts for evaluation, the detector can carry out a whole-program inter-procedural analysis to discover issues that are caused across your code (for example, hardcoded credentials in one file that are passed to an API in another) and can reduce false positives by checking whether an execution path is valid or not. You must provide the source code .zip file as well as the build artifact .zip file for a complete analysis.
Customers can run a security scan when they create a repository analysis. CodeGuru Reviewer provides an additional option to get both code and security recommendations. As explained in the following sections, CodeGuru Reviewer will create an Amazon Simple Storage Service (Amazon S3) bucket in your AWS account for that region to upload or copy your source code and build artifacts for the analysis. This repository analysis option can be run on Java code from any repository.
Prerequisites
Prepare the source code and artifact zip files: If you do not have your Java code locally, download the source code that you want to evaluate for security and zip it. Similarly, if needed, download the build artifact .jar file for your source code and zip it. You must upload the source code and the build artifact as separate .zip files, as per the instructions in the subsequent sections; even if it is a single file (for example, a single .jar file), you still need to zip it. If the .zip file includes multiple files, the right files are discovered and analyzed by CodeGuru. For our sample test, we use src.zip and jar.zip files, saved locally.
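If you prefer to stage these files with a script once the S3 bucket exists (it is created by the console flow in the next section), the following is a minimal boto3 sketch; the bucket name is a placeholder for the bucket CodeGuru Reviewer creates in your account:

import boto3

s3 = boto3.client("s3")

# Placeholder name; use the bucket created by CodeGuru Reviewer in your Region.
bucket = "codeguru-reviewer-example-bucket"

s3.upload_file("src.zip", bucket, "src.zip")  # source code archive
s3.upload_file("jar.zip", bucket, "jar.zip")  # build artifact archive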
Creating an S3 bucket repository association:
This section summarizes the high-level steps to create the association of your S3 bucket repository.
1. On the CodeGuru console, choose Code reviews.
2. On the Repository analysis tab, choose Create repository analysis.
Figure: Screenshot of initiating the repository analysis
3. For the source code analysis, select Code and security recommendations.
4. For Repository name, enter a name for your repository.
5. Under Additional settings, for Code review name, enter a name for trackability purposes.
6. Choose Create S3 bucket and associate.
Figure: Screenshot to show selection of Security Code Analysis
It takes a few seconds to create a new S3 bucket in the current Region. When it completes, you see the following screen.
Figure: Screenshot for Create repository analysis showing S3 bucket created
7. Choose Upload to the S3 bucket option and under that choose Upload source code zip file and select the zip file (src.zip) from your local machine to upload.
Figure: Screenshot of popup to upload code and artifacts from S3 bucket
8. Similarly, choose Upload build artifacts zip file and select the zip file (jar.zip) from your local machine and upload.
Figure: Screenshot for Create repository analysis showing S3 paths populated
Alternatively, you can always upload the source code and build artifacts as zip file from any of your existing S3 bucket as below.
9. Choose Browse S3 buckets for existing artifacts and upload from there as shown below:
Figure: Screenshot to upload code and artifacts from an existing S3 bucket
10. Now click Create repository analysis and trigger the code review.
A new pending entry is created as shown below.
Figure: Screenshot of code review in Pending state
After a few minutes, you see the generated recommendations, which include the security analysis too. In the following case, 10 recommendations are generated.
Figure: Screenshot of repository analysis being completed
For the subsequent code reviews, you can use the same repository and upload new files or create a new repository as shown below:
Figure: Screenshot of subsequent code review making repository selection
Recommendations
Apart from detecting security risks from the top 10 OWASP categories, the security detector detects deep security issues by analyzing data flow across multiple methods, procedures, and files.
The recommendations generated in the area of security are labeled as Security. In the following example, we see a recommendation to remove hard-coded credentials and a non-security-related recommendation about refactoring code for better maintainability.
Figure: Screenshot of Recommendations generated
The following is another example of recommendations pointing out a potential resource leak as well as a security issue indicating a potential risk of a path traversal attack.
Figure: More examples of deep security recommendations
As this blog is focused on on-boarding aspects, we will cover the explanation of recommendations in more detail in a separate blog.
Disassociation of Repository (optional):
The association of CodeGuru with the S3 bucket repository can be removed by following the steps below. Navigate to the Repositories page, select the repository, and choose Disassociate repository.
Figure: Screenshot of disassociating the S3 bucket repo with CodeGuru
Conclusion
This post reviewed the support for on-boarding workflow to carry out the security analysis in CodeGuru Reviewer. We initiated a full repository analysis for the Java code using a separate UI workflow and generated recommendations.
We hope this post was useful and would enable you to conduct code analysis using Amazon CodeGuru Reviewer.
About the Author
Nikunj Vaidya is a Sr. Solutions Architect with Amazon Web Services, focusing in the area of DevOps services. He builds technical content for the field enablement and offers technical guidance to the customers on AWS DevOps solutions and services that would streamline the application development process, accelerate application delivery, and enable maintaining a high bar of software quality.
Amazon Simple Queue Service (Amazon SQS) is a fully managed message queuing service. It enables you to decouple and scale microservices, distributed systems, and serverless applications. A commonly used feature of Amazon SQS is dead-letter queues. The DLQ (dead-letter queue) is used to store messages that can’t be processed (consumed) successfully.
This post describes how to add automated resilience to an existing SQS queue. It monitors the dead-letter queue and moves a message back to the main queue to see if it can be processed again. It also uses a specific algorithm to make sure this is not repeated forever. Each time it attempts to reprocess the message, the replay time increases until the message is finally considered dead.
The main task of a dead-letter queue (DLQ) is to handle message failure. It allows you to set aside and isolate non-processed messages to determine why processing failed. Often these failed messages are caused by application errors. For example, a consumer application fails to parse a message correctly and throws an unhandled exception. This exception then triggers an error response that sends the message to the DLQ. The AWS documentation contains a tutorial detailing the configuration of an Amazon SQS dead-letter queue.
To process the failed messages, I build a retry mechanism by implementing an exponential backoff algorithm. The idea behind exponential backoff is to use progressively longer waits between retries for consecutive error responses. Most exponential backoff algorithms use jitter (randomized delay) to prevent successive collisions. This spreads the message retries more evenly across time, allowing them to be processed more efficiently.
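As a rough illustration of the “full jitter” variant used later in this post, the delay for a given retry attempt can be computed as follows; the base and cap values here are illustrative, not the application’s actual configuration:

import random

def full_jitter_delay(attempt, base=2, cap=900):
    # Exponential backoff capped at `cap`, with the actual delay drawn
    # uniformly between 0 and that bound (full jitter).
    return random.uniform(0, min(cap, base * 2 ** attempt))

# Delays grow with each attempt but remain randomized to avoid collisions.
print([round(full_jitter_delay(n), 1) for n in range(1, 6)])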
Solution overview
The flow of the message sent by the producer to SQS is as follows:
The producer application sends a message to an SQS queue
The consumer application fails to process the message in the same SQS queue
The message is moved from the main SQS queue to the default dead-letter queue as per the component settings.
A Lambda function is configured with the SQS main dead-letter queue as an event source. It receives the message and sends it back to the original queue, adding a message timer.
The message timer is defined by the exponential backoff and jitter algorithm.
You can limit the number of retries. If the message exceeds this limit, the message is moved to a second DLQ where an operator processes it manually.
How the replay function works
Each time the SQS dead-letter queue receives a message, it triggers Lambda to run the replay function. The replay code uses an SQS message attribute `sqs-dlq-replay-nb` as a persistent counter for the current number of retries attempted. The number of retries is compared to the maximum number (defined in the application configuration file). If it exceeds the maximum, the message is moved to the human-operated queue. If not, the function uses the AWS Lambda event data to build a new message for the Amazon SQS main queue. Finally, it updates the retry counter, adds a new message timer to the message, and sends the message back (replays it) to the main queue.
def handler(event, context):
    """Lambda function handler."""
    for record in event['Records']:
        nbReplay = 0
        # number of replay
        if 'sqs-dlq-replay-nb' in record['messageAttributes']:
            nbReplay = int(record['messageAttributes']['sqs-dlq-replay-nb']["stringValue"])

        nbReplay += 1
        if nbReplay > config.MAX_ATTEMPS:
            raise MaxAttempsError(replay=nbReplay, max=config.MAX_ATTEMPS)

        # SQS attributes
        attributes = record['messageAttributes']
        attributes.update({'sqs-dlq-replay-nb': {'StringValue': str(nbReplay), 'DataType': 'Number'}})
        _sqs_attributes_cleaner(attributes)

        # Backoff
        b = backoff.ExpoBackoffFullJitter(base=config.BACKOFF_RATE, cap=config.MESSAGE_RETENTION_PERIOD)
        delaySeconds = b.backoff(n=int(nbReplay))

        # SQS
        SQS.send_message(
            QueueUrl=config.SQS_MAIN_URL,
            MessageBody=record['body'],
            DelaySeconds=int(delaySeconds),
            MessageAttributes=record['messageAttributes']
        )
How to use the application
You can use this serverless application via:
The Lambda console: choose the “Browse serverless app repository” option to create a function. Select “amazon-sqs-dlq-replay-backoff” application in the public applications repository. Then, configure the application with the default SQS parameters and the replay feature parameters.
I describe how an exponential backoff algorithm (with jitter) enhances the message processing capabilities of an Amazon SQS queue. You can now find the amazon-sqs-dlq-replay-backoff application in the AWS Serverless Application Repository. Download the code from this GitHub repository.
To get started with dead-letter queues in Amazon SQS, see the Amazon SQS Developer Guide.
Recovering your mission-critical workloads from outages is essential for business continuity and providing services to customers with little or no interruption. That’s why many customers replicate their mission-critical workloads in multiple places using a Disaster Recovery (DR) strategy suited for their needs.
With AWS, a customer can achieve this by deploying a multi-Availability Zone high-availability setup, or a multi-Region setup that replicates critical components of an application to another Region. Depending on the RPO and RTO of the mission-critical workload, the requirement for disaster recovery ranges from simple backup and restore to a multi-site, active-active setup. In this blog post, I explain how AWS Outposts can be used for DR on AWS.
In many geographies, it is possible to set up disaster recovery for a workload running in one AWS Region by using another AWS Region in the same country (for example, in the US, between us-east-1 and us-west-2). For countries with only one AWS Region, it is possible to set up disaster recovery in another country where an AWS Region is present. This method can be designed for the continuity, resumption, and recovery of critical business processes at an agreed level, and it limits the impact on people, processes, and infrastructure (including IT). It also minimizes the operational, financial, legal, reputational, and other material consequences arising from such events.
However, for mission-critical workloads handling critical user data (PII, PHI, or financial data), countries like India and Canada have regulations that mandate a disaster recovery setup at a “safe distance” within the same country. This ensures compliance with any data sovereignty or data localization requirements mandated by the regulators. “Safe distance” means that the distance between the DR site and the primary site is such that the business can continue to operate in the event of any natural disaster or industrial event affecting the primary site. Depending on the geography, this safe distance could be 50 km or more. These regulations limit the options customers have to use an AWS Region in another country as the disaster recovery site for their primary workload running on AWS.
In this blog post, I describe an architecture using AWS Outposts which helps set up disaster recovery on AWS within the same country at a distance that can meet the requirements set by regulators. This architecture also helps customers to comply with various data sovereignty regulations in a given country. Another advantage of this architecture is the homogeneity of the primary and disaster recovery site. Your existing IT teams can set up and operate the disaster recovery site using familiar AWS tools and technology in a homogenous environment.
Prerequisites
Readers of this blog post should be familiar with basic networking concepts like WAN connectivity and BGP, and with the AWS services used in this architecture.
I explain the architecture using an example customer scenario in India, where a customer is using AWS Mumbai Region for their mission-critical workload. This workload needs a DR setup to comply with local regulation and the DR setup needs to be in a different seismic zone than the one for Mumbai. Also, because of the nature of the regulated business, the user/sensitive data needs to be stored within India.
Following is the architecture diagram showing the logical setup.
This solution is similar to a typical AWS Outposts use case where a customer orders the Outposts to be installed in their own Data Centre (DC) or a CoLocation site (Colo). It will follow the shared responsibility model described in AWS Outposts documentation.
The only difference is that the AWS Outpost parent Region will be the closest Region other than AWS Mumbai, in this case Singapore. Customers will then provision an AWS Direct Connect public VIF locally for a Service Link to the Singapore Region. This ensures that the control plane stays available via the AWS Singapore Region even if there is an outage in AWS Mumbai Region affecting control plane availability. You can then launch and manage AWS Outposts supported resources in the AWS Outposts rack.
For data plane traffic, which should not go out of the country, the following options are available:
Provision a self-managed virtual private network (VPN) between an EC2 instance running a router AMI in a subnet of AWS Outposts and an AWS Transit Gateway (TGW) in the primary Region.
Provision a self-managed virtual private network (VPN) between an EC2 instance running a router AMI in a subnet of AWS Outposts and a Virtual Private Gateway (VGW) in the primary Region.
Note: The primary Region in this example is the AWS Mumbai Region. This VPN is provisioned via the Local Gateway and the DX public VIF. This ensures that data plane traffic does not traverse any network outside of the country (India), to comply with the data localization mandated by the regulators.
Create an Outpost and order Outpost capacity as described in the documentation. Make sure that you do this step while logged in to the AWS Outposts console of the AWS Singapore Region.
For the WAN connectivity between your DC/Colo and AWS Direct Connect location you can choose any telco provider of your choice or work with one of AWS Direct Connect partners.
This public VIF will be used to attach AWS Outposts to its parent Region in Singapore over AWS Outposts service link. It will also be used to establish an IPsec GRE tunnel between AWS Outposts subnet and a TGW or VGW for data plane traffic (explained in subsequent steps).
Alternatively, you can provision a separate Direct Connect connection and public VIFs for the Service Link and data plane traffic for better segregation between the two. You have to provision sufficient bandwidth on the Direct Connect connection for the Service Link traffic as well as the data plane traffic (such as data replication between the primary Region and AWS Outposts).
For an optimal experience and resiliency, AWS recommends that you use dual 1Gbps connections to the AWS Region. This connectivity can also be achieved over Internet transit; however, I recommend using AWS Direct Connect because it provides private connectivity between AWS and your DC/Colo environment, which in many cases can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.
Create a subnet in AWS Outposts and launch an EC2 instance running a router AMI of your choice from AWS Marketplace in this subnet. This EC2 instance is used to establish the IPsec GRE tunnel to the TGW or VGW in primary Region.
Choose an EC2 instance type that can support your bandwidth requirement as per the AMI provider, and disable the source/destination check for this EC2 instance.
Assign an Elastic IP address to this EC2 instance from the customer-owned pool provisioned for your AWS Outposts.
Add rules to the security group of this EC2 instance to allow ISAKMP (UDP 500), NAT Traversal (UDP 4500), and ESP (IP protocol 50) from the VGW or TGW endpoint public IP addresses.
NAT (Network Address Translation) the EIP assigned in step 5 to a public IP address at your edge router connecting to AWS Direct Connect or internet transit. This public IP is used as the customer gateway to establish the IPsec GRE tunnel to the primary Region.
Create a customer gateway using the public IP address used to NAT the EC2 instance in step 7 (see the boto3 sketch after these steps). Follow a process similar to the one found at Create a Customer Gateway.
Configure the customer gateway (the EC2 instance running a router AMI in the AWS Outposts subnet) side for VPN connectivity. You can base this configuration on the sample configuration suggested by AWS during the creation of the VPN in step 9. This suggested sample configuration can be downloaded from the AWS console after the VPN setup, as discussed in this document.
Modify the route table of the AWS Outposts subnets to point to the EC2 instance launched in step 5 as the target for any destination in your VPCs in the primary Region, which is AWS Mumbai in this example.
At this point, you have end-to-end connectivity between the VPCs in the primary Region and the resources on AWS Outposts. This connectivity can now be used to replicate data from your primary site to AWS Outposts for DR purposes, which keeps the setup compliant with any internal or external data localization requirements.
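For reference, the customer gateway and VPN connection described in the preceding steps can also be created programmatically. The following is a hedged boto3 sketch in which the public IP, ASN, and gateway IDs are placeholders:

import boto3

ec2 = boto3.client("ec2", region_name="ap-south-1")

# Placeholder values: the public IP used to NAT the router instance,
# its BGP ASN, and the gateway in the primary (Mumbai) Region.
cgw = ec2.create_customer_gateway(
    BgpAsn=65000,
    PublicIp="203.0.113.10",
    Type="ipsec.1",
)["CustomerGateway"]

vpn = ec2.create_vpn_connection(
    CustomerGatewayId=cgw["CustomerGatewayId"],
    Type="ipsec.1",
    VpnGatewayId="vgw-0123456789abcdef0",  # or pass TransitGatewayId for a TGW
)
print(vpn["VpnConnection"]["VpnConnectionId"])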
Conclusion
In this blog post, I described an architecture using AWS Outposts for Disaster Recovery on AWS in countries without a second AWS Region. To set up disaster recovery, your existing IT teams can set up and operate the disaster recovery site using the familiar AWS tools and technology in a homogeneous environment. To learn more about AWS Outposts, refer to the documentation and FAQ.
Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.
Managing NuGet packages for .NET development can be a challenge. Tasks such as initial configuration, ongoing maintenance, and scaling inefficiencies are the biggest pain points for developers and organizations. With its addition of NuGet package support, AWS CodeArtifact now provides easy-to-configure and scalable package management for .NET developers. You can use NuGet packages stored in CodeArtifact in Visual Studio, allowing you to use the tools you already know.
In this post, we show how you can provision NuGet repositories in 5 minutes. Then we demonstrate how to consume packages from your new NuGet repositories, all while using .NET native tooling.
Two core resource types make up CodeArtifact: domains and repositories. Domains provide an easy way to manage multiple repositories within an organization. Repositories store packages and their assets. You can connect repositories to other CodeArtifact repositories, or to popular public package repositories such as nuget.org, using upstream and external connections. For more information about these concepts, see AWS CodeArtifact Concepts.
The following diagram illustrates this architecture.
Figure: AWS CodeArtifact core concepts
Creating CodeArtifact resources with AWS CloudFormation
The AWS CloudFormation template provided in this post provisions three CodeArtifact resources: a domain, a team repository, and a shared repository. The team repository is configured to use the shared repository as an upstream repository, and the shared repository has an external connection to nuget.org.
The following diagram illustrates this architecture.
Figure: Example AWS CodeArtifact architecture
The CloudFormation template used in this walkthrough is included in the GitHub repo referenced below.
To use the CloudFormation stack, we recommend you clone the following GitHub repo so you also have access to the example projects. See the following code:
git clone https://github.com/aws-samples/aws-codeartifact-samples.git
cd aws-codeartifact-samples/getting-started/dotnet/cloudformation/
Alternatively, you can copy the template into a file on your local filesystem named deploy.yml.
Provisioning the CloudFormation stack
Now that you have a local copy of the template, you need to provision the resources using a CloudFormation stack. You can deploy the stack using the AWS CLI or on the AWS CloudFormation console.
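If you’d rather script the deployment with boto3 instead of using the CLI or the console steps that follow, the following is a minimal sketch, assuming the template is saved locally as deploy.yml as described above:

import boto3

cfn = boto3.client("cloudformation")

with open("deploy.yml") as f:
    template_body = f.read()

cfn.create_stack(
    StackName="CodeArtifact-GettingStarted-DotNet",
    TemplateBody=template_body,
)

# Wait until the domain and repositories have been provisioned.
cfn.get_waiter("stack_create_complete").wait(
    StackName="CodeArtifact-GettingStarted-DotNet"
)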
To use the AWS CloudFormation console, complete the following steps:
On the AWS CloudFormation console, choose Create stack.
Choose With new resources (standard).
Select Upload a template file.
Choose Choose file.
Name the stack CodeArtifact-GettingStarted-DotNet.
Continue to choose Next until prompted to create the stack.
Configuring your local development experience
We use the CodeArtifact credential provider to connect the Visual Studio IDE to a CodeArtifact repository. You need to download and install the AWS Toolkit for Visual Studio to configure the credential provider. The toolkit is an extension for Microsoft Visual Studio on Microsoft Windows that makes it easy to develop, debug, and deploy .NET applications to AWS. The credential provider automates fetching and refreshing the authentication token required to pull packages from CodeArtifact. For more information about the authentication process, see AWS CodeArtifact authentication and tokens.
To connect to a repository, you complete the following steps:
Configure an account profile in the AWS Toolkit.
Copy the source endpoint from the AWS Explorer.
Set the NuGet package source as the source endpoint.
Add packages for your project via your CodeArtifact repository.
Configuring an account profile in the AWS Toolkit
Before you can use the Toolkit for Visual Studio, you must provide a set of valid AWS credentials. In this step, we set up a profile that has access to interact with CodeArtifact. For instructions, see Providing AWS Credentials.
Figure: Visual Studio Toolkit for AWS Account Profile Setup
Copying the NuGet source endpoint
After you set up your profile, you can see your provisioned repositories.
In the AWS Explorer pane, navigate to the repository you want to connect to.
Choose your repository (right-click).
Choose Copy NuGet Source Endpoint.
Figure: AWS CodeArtifact repositories shown in the AWS Explorer
You use the source endpoint later to configure your NuGet package sources.
Setting the package source using the source endpoint
Now that you have your source endpoint, you can set up the NuGet package source.
In Visual Studio, under Tools, choose Options.
Choose NuGet Package Manager.
Under Options, choose the + icon to add a package source.
For Name, enter codeartifact.
For Source, enter the source endpoint you copied from the previous step.
Figure: Configuring NuGet package sources for AWS CodeArtifact
Adding packages via your CodeArtifact repository
After the package source is configured against your team repository, you can pull packages via the upstream connection to the shared repository.
Choose Manage NuGet Packages for your project.
You can now see packages from nuget.org.
Choose any package to add it to your project.
Exploring packages while connected to an AWS CodeArtifact repository
Viewing packages stored in your CodeArtifact team repository
Packages are stored in a repository you pull from, or referenced via the upstream connection. Because we’re pulling packages from nuget.org through an external connection, you can see cached copies of those packages in your repository. To view the packages, navigate to your repository on the CodeArtifact console.
Packages stored in an AWS CodeArtifact repository
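You can also list these packages programmatically. The following is a minimal boto3 sketch; the domain and repository names are placeholders for the ones created by your CloudFormation stack:

import boto3

codeartifact = boto3.client("codeartifact")

response = codeartifact.list_packages(
    domain="my-domain",          # placeholder domain name
    repository="my-team-repo",   # placeholder repository name
    format="nuget",
)

for package in response["packages"]:
    print(package["package"], package["format"])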
Cleaning Up
When you’re finished with this walkthrough, you may want to remove any provisioned resources. To remove the resources that the CloudFormation template created, navigate to the stack on the AWS CloudFormation console and choose Delete Stack. It may take a few minutes to delete all provisioned resources.
After the resources are deleted, there are no more cleanup steps.
Conclusion
We have shown you how to set up CodeArtifact in minutes and easily integrate it with NuGet. You can build and push your package faster, from hours or days to minutes. You can also integrate CodeArtifact directly in your Visual Studio environment with four simple steps. With CodeArtifact repositories, you inherit the durability and security posture from the underlying storage of CodeArtifact for your packages.
As of November 2020, CodeArtifact is available in the following AWS Regions:
US: US East (Ohio), US East (N. Virginia), US West (Oregon)
AP: Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo)
EU: Europe (Frankfurt), Europe (Ireland), Europe (Stockholm)
For an up-to-date list of Regions where CodeArtifact is available, see AWS CodeArtifact FAQ.
About the Authors
John Standish
John Standish is a Solutions Architect at AWS and spent over 13 years as a Microsoft .Net developer. Outside of work, he enjoys playing video games, cooking, and watching hockey.
Nuatu Tseggai
Nuatu Tseggai is a Cloud Infrastructure Architect at Amazon Web Services. He enjoys working with customers to design and build event-driven distributed systems that span multiple services.
Neha Gupta
Neha Gupta is a Solutions Architect at AWS and has 16 years of experience as a database architect/DBA. Apart from work, she’s outdoorsy and loves to dance.
Elijah Batkoski
Elijah is a Technical Writer for Amazon Web Services. Elijah has produced technical documentation and blogs for a variety of tools and services, primarily focused around DevOps.
This post demonstrates how to create, publish, and download private npm packages using AWS CodeArtifact, allowing you to share code across your organization without exposing your packages to the public.
The ability to control CodeArtifact repository access using AWS Identity and Access Management (IAM) removes the need to manage additional credentials for a private npm repository when developers already have IAM roles configured.
You can use private npm packages for a variety of use cases, such as:
In this post, you create a private scoped npm package containing a sample function that can be used across your organization. You create a second project to download the npm package. You also learn how to structure your npm package to make logging in to CodeArtifact automatic when you want to build or publish the package.
The code covered in this post is available on GitHub:
You can create your npm package in three easy steps: set up the project, create your npm script for authenticating with CodeArtifact, and publish the package.
Setting up your project
Create a directory for your new npm package. We name this directory my-package because it serves as the name of the package. We use an npm scope for this package, where @myorg represents the scope that all of our organization’s packages are published under. This helps us distinguish our internal private package from external packages.
Next, add a co:login script to your package.json that runs the aws codeartifact login command (shown in full later in this post). Running this script updates your npm configuration to use your CodeArtifact repository and sets your authentication token, which expires after 12 hours.
To test our new script, enter the following command:
npm run co:login
The following code is the output:
> aws codeartifact login --tool npm --repository my-repo --domain my-domain
Successfully configured npm to use AWS CodeArtifact repository https://my-domain-<ACCOUNT ID>.d.codeartifact.us-east-1.amazonaws.com/npm/my-repo/
Login expires in 12 hours at 2020-09-04 02:16:17-04:00
Add a prepare script ("prepare": "npm run co:login") to our package.json to run our login command.
This configures our project to automatically authenticate and generate an access token anytime npm install or npm publish run on the project.
If you see an error containing Invalid choice, valid choices are:, you need to update the AWS CLI according to the versions listed in the prerequisites of this post.
Publishing your package
To publish our new package for the first time, run npm publish.
The following screenshot shows the output.
If we navigate to our CodeArtifact repository on the CodeArtifact console, we now see our new private npm package ready to be downloaded.
Installing your private npm package
To install your private npm package, you first set up the project and add the CodeArtifact configs. After you install your package, it’s ready to use.
Setting up your project
Create a directory for a new application and name it my-app. This is a sample project to download our private npm package published in the previous step. You can apply this pattern to all repositories in which you intend to install your organization’s npm packages.
npm init -y
{
  "name": "my-app",
  "version": "1.0.0",
  "description": "A sample application consuming a private scoped npm package",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  }
}
Adding CodeArtifact configs
Copy the npm scripts prepare and co:login created earlier to your new project:
{
  "name": "my-app",
  "version": "1.0.0",
  "description": "A sample application consuming a private scoped npm package",
  "main": "index.js",
  "scripts": {
    "prepare": "npm run co:login",
    "co:login": "aws codeartifact login --tool npm --repository my-repo --domain my-domain",
    "test": "echo \"Error: no test specified\" && exit 1"
  }
}
Installing your new private npm package
Enter the following command:
npm install @myorg/my-package
Your package.json should now list @myorg/my-package in your dependencies:
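For example (the version number shown here is illustrative):

{
  "dependencies": {
    "@myorg/my-package": "^1.0.0"
  }
}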
Remove the changes made to your user profile’s npm configuration by running npm config delete registry. This removes the CodeArtifact repository as your default npm registry.
Conclusion
In this post, you successfully published a private scoped npm package stored in CodeArtifact, which you can reuse across multiple teams and projects within your organization. You can use npm scripts to streamline the authentication process and apply this pattern to save time.
About the Author
Ryan Sonshine is a Cloud Application Architect at Amazon Web Services. He works with customers to drive digital transformations while helping them architect, automate, and re-engineer solutions to fully leverage the AWS Cloud.
This post provides a clear path for customers who are evaluating and adopting Graviton2 instance types for performance improvements and cost-optimization.
Graviton2 processors are custom designed by AWS using 64-bit Arm Neoverse N1 cores. They power the T4g*, M6g*, R6g*, and C6g* Amazon Elastic Compute Cloud (Amazon EC2) instance types and offer up to 40% better price performance over the current generation of x86-based instances in a variety of workloads, such as high-performance computing, application servers, media transcoding, in-memory caching, gaming, and more.
More and more customers want to make the move to Graviton2 to take advantage of these performance optimizations while saving money.
During the transition process, a great benefit AWS provides is the ability to perform native builds for each architecture, instead of attempting to cross-compile on homogeneous hardware. This decreases build time and reduces the complexity and cost of setup.
To see this benefit in action, we look at how to build a CI/CD pipeline using AWS CodePipeline and AWS CodeBuild that can build multi-architecture Docker images in parallel to aid you in evaluating and migrating to Graviton2.
Solution overview
With CodePipeline and CodeBuild, we can automate the creation of architecture-specific Docker images, which can be pushed to Amazon Elastic Container Registry (Amazon ECR). The following diagram illustrates this architecture.
The steps in this process are as follows:
Create a sample Node.js application and associated Dockerfile.
Create the buildspec files that contain the commands that CodeBuild runs.
Create three CodeBuild projects to automate each of the following steps:
CodeBuild for x86 – Creates an x86 Docker image and pushes it to Amazon ECR.
CodeBuild for arm64 – Creates an Arm64 Docker image and pushes it to Amazon ECR.
CodeBuild for manifest list – Creates a Docker manifest list, annotates the list, and pushes to Amazon ECR.
Automate the orchestration of these projects with CodePipeline.
Prerequisites
The prerequisites for this solution are as follows:
The correct AWS Identity and Access Management (IAM) role permissions for your account allowing for the creation of the CodePipeline pipeline, CodeBuild projects, and Amazon ECR repositories
An Amazon ECR repository named multi-arch-test
A source control service such as AWS CodeCommit or GitHub that CodeBuild and CodePipeline can interact with
Creating a sample Node.js application and associated Dockerfile
For this post, we create a sample “Hello World” application that self-reports the processor architecture. We work in the local folder that is cloned from our source repository as specified in the prerequisites.
In your preferred text editor, add a new file with the following Node.js code:
// Hello World sample app.
const http = require('http');
const port = 3000;

const server = http.createServer((req, res) => {
  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain');
  res.end(`Hello World. This processor architecture is ${process.arch}`);
});

server.listen(port, () => {
  console.log(`Server running on processor architecture ${process.arch}`);
});
Save the file in the root of your source repository and name it app.js.
Commit the changes to Git and push the changes to our source repository. See the following code:
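For example, with an illustrative commit message:

git add app.js
git commit -m "Adding the Node.js sample application."
git push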
We also need to create a sample Dockerfile that instructs the docker build command how to build the Docker images. We use the default Node.js image tag for version 14.
In a text editor, add a new file with the following code:
# Sample nodejs application
FROM node:14
WORKDIR /usr/src/app
COPY package*.json app.js ./
RUN npm install
EXPOSE 3000
CMD ["node", "app.js"]
Save the file in the root of the source repository and name it Dockerfile. Make sure it is Dockerfile with no extension.
Commit the changes to Git and push the changes to our source repository:
git add .
git commit -m "Adding Dockerfile to host the Node.js sample application."
git push
Creating a build specification file for your application
It’s time to create and add a buildspec file to our source repository. We want to use a single buildspec.yml file for building, tagging, and pushing the Docker images to Amazon ECR for both target native architectures, x86 and Arm64. We use CodeBuild to inject environment variables, some of which need to be changed for each architecture (such as the image tag and image architecture).
A buildspec is a collection of build commands and related settings, in YAML format, that CodeBuild uses to run a build. For more information, see Build specification reference for CodeBuild.
The buildspec we add instructs CodeBuild to do the following:
install phase – Update the yum package manager
pre_build phase – Sign in to Amazon ECR using the IAM role assumed by CodeBuild
build phase – Build the Docker image using the Docker CLI and tag the newly created Docker image
post_build phase – Push the Docker image to our Amazon ECR repository
We first need to add the buildspec.yml file to our source repository.
In a text editor, add a new file with the following build specification:
version: 0.2

phases:
  install:
    commands:
      - yum update -y
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION)
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...
      - docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG .
      - docker tag $IMAGE_REPO_NAME:$IMAGE_TAG $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing the Docker image...
      - docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
Save the file in the root of the repository and name it buildspec.yml.
Because we specify environment variables in the CodeBuild project, we don’t need to hard code any values in the buildspec file.
Commit the changes to Git and push the changes to our source repository:
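As before, the commit message here is illustrative:

git add buildspec.yml
git commit -m "Adding the buildspec.yml file for the application build."
git push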
Creating a build specification file for your manifest list creation
Next we create a buildspec file that instructs CodeBuild to create a Docker manifest list, and associate that manifest list with the Docker images that the buildspec file builds.
A manifest list is a list of image layers that is created by specifying one or more (ideally more than one) image names. You can then use it in the same way as an image name in docker pull and docker run commands, for example. For more information, see manifest create.
As of this writing, manifest creation is an experimental feature of the Docker command line interface (CLI).
Experimental features provide early access to future product functionality. These features are intended only for testing and feedback because they may change between releases without warning or be removed entirely from a future release. Experimental features must not be used in production environments. For more information, see Experimental features.
When creating the CodeBuild project for manifest list creation, we specify a buildspec file name override as buildspec-manifest.yml. This buildspec instructs CodeBuild to do the following:
install phase – Update the yum package manager
pre_build phase – Sign in to Amazon ECR using the IAM role assumed by CodeBuild
build phase – Perform three actions:
Set environment variable to enable Docker experimental features for the CLI
Create the Docker manifest list using the Docker CLI
Annotate the manifest list to add the architecture-specific Docker image references
post_build phase – Push the Docker manifest list to our Amazon ECR repository and use docker manifest inspect to echo the contents of the manifest list from Amazon ECR
We first need to add the buildspec-manifest.yml file to our source repository.
In a text editor, add a new file with the following build specification:
version: 0.2
# Based on the Docker documentation, must include the DOCKER_CLI_EXPERIMENTAL environment variable
# https://docs.docker.com/engine/reference/commandline/manifest/

phases:
  install:
    commands:
      - yum update -y
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION)
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker manifest...
      - export DOCKER_CLI_EXPERIMENTAL=enabled
      - docker manifest create $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest-arm64v8 $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest-amd64
      - docker manifest annotate --arch arm64 $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest-arm64v8
      - docker manifest annotate --arch amd64 $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest-amd64
  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing the Docker image...
      - docker manifest push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME
      - docker manifest inspect $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME
Save the file in the root of the repository and name it buildspec-manifest.yml.
Commit the changes to Git and push the changes to our source repository:
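Again, the commit message here is illustrative:

git add buildspec-manifest.yml
git commit -m "Adding the buildspec-manifest.yml file for manifest list creation."
git push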
Now we have created a single buildspec.yml file for building, tagging, and pushing the Docker images to Amazon ECR for both target native architectures: x86 and Arm64. This file is shared by two of the three CodeBuild projects that we create; CodeBuild injects the environment variables that differ for each architecture (such as the image tag and image architecture). We also use a single Dockerfile regardless of the architecture, and we need to ensure that any third-party libraries are present and compiled correctly for the target architecture.
For more information about third-party libraries and software versions that have been optimized for Arm, see the Getting started with AWS Graviton GitHub repo.
We use the same environment variable names for the CodeBuild projects, but each project has specific values, as detailed in the following table. You need to modify these values to your numeric AWS account ID, the AWS Region where your Amazon ECR registry endpoint is located, and your Amazon ECR repository name. The instructions for adding the environment variables in the CodeBuild projects are in the following sections.
Environment Variable    x86 Project values    Arm64 Project values    manifest Project values
AWS_DEFAULT_REGION      us-east-1             us-east-1               us-east-1
AWS_ACCOUNT_ID          111111111111          111111111111            111111111111
IMAGE_REPO_NAME         multi-arch-test       multi-arch-test         multi-arch-test
IMAGE_TAG               latest-amd64          latest-arm64v8          latest
The image we use in this post uses architecture-specific tags with the term latest. This is for demonstration purposes only; it’s best to tag the images with an explicit version or another meaningful reference.
CodeBuild for x86
We start with creating a new CodeBuild project for x86 on the CodeBuild console.
CodeBuild looks for a file named buildspec.yml by default, unless overridden. For these first two CodeBuild projects, we rely on that default and don’t specify the buildspec name.
On the CodeBuild console, choose Create build project.
For Project name, enter a unique project name for your build project, such as node-x86.
To add tags, add them under Additional Configuration.
Choose a Source provider (for this post, we choose GitHub).
For Environment image, choose Managed image.
Select Amazon Linux 2.
For Runtime(s), choose Standard.
For Image, choose aws/codebuild/amazonlinux2-x86_64-standard:3.0.
This is an x86 build image.
Select Privileged.
For Service role, choose New service role.
Enter a name for the new role (one is created for you), such as CodeBuildServiceRole-nodeproject.
We reuse this same service role for the other CodeBuild projects associated with this project.
Expand Additional configuration and go to the Environment variables section.
Create the following Environment variables:
Name                  Value              Type
AWS_DEFAULT_REGION    us-east-1          Plaintext
AWS_ACCOUNT_ID        111111111111       Plaintext
IMAGE_REPO_NAME       multi-arch-test    Plaintext
IMAGE_TAG             latest-amd64       Plaintext
Choose Create build project.
Attaching the IAM policy
Now that we have created the CodeBuild project, we need to adjust the new service role that was just created and attach an IAM policy so that it can interact with the Amazon ECR API.
On the CodeBuild console, choose the node-x86 project.
Choose the Build details tab.
Under Service role, choose the link that looks like arn:aws:iam::111111111111:role/service-role/CodeBuildServiceRole-nodeproject.
A new browser tab should open.
Choose Attach policies.
In the Search field, enter AmazonEC2ContainerRegistryPowerUser.
Select AmazonEC2ContainerRegistryPowerUser.
Choose Attach policy.
CodeBuild for arm64
Now we move on to creating a new (second) CodeBuild project for Arm64.
On the CodeBuild console, choose Create build project.
For Project name, enter a unique project name, such as node-arm64.
If you want to add tags, add them under Additional Configuration.
Choose a Source provider (for this post, choose GitHub).
For Environment image, choose Managed image.
Select Amazon Linux 2.
For Runtime(s), choose Standard.
For Image, choose aws/codebuild/amazonlinux2-aarch64-standard:2.0.
This is an Arm build image and is different from the image selected in the previous CodeBuild project.
Select Privileged.
For Service role, choose Existing service role.
Choose CodeBuildServiceRole-nodeproject.
Select Allow AWS CodeBuild to modify this service role so it can be used with this build project.
Expand Additional configuration and go to the Environment variables section.
Create the following Environment variables:
Name                  Value              Type
AWS_DEFAULT_REGION    us-east-1          Plaintext
AWS_ACCOUNT_ID        111111111111       Plaintext
IMAGE_REPO_NAME       multi-arch-test    Plaintext
IMAGE_TAG             latest-arm64v8     Plaintext
Choose Create build project.
CodeBuild for manifest list
For the last CodeBuild project, we create a Docker manifest list, associating that manifest list with the Docker images that the preceding projects create, and pushing the manifest list to ECR. This project uses the buildspec-manifest.yml file created earlier.
On the CodeBuild console, choose Create build project.
For Project name, enter a unique project name for your build project, such as node-manifest.
If you want to add tags, add them under Additional Configuration.
Choose a Source provider (for this post, choose GitHub).
For Environment image, choose Managed image.
Select Amazon Linux 2.
For Runtime(s), choose Standard.
For Image, choose aws/codebuild/amazonlinux2-x86_64-standard:3.0.
This is an x86 build image.
Select Privileged.
For Service role, choose Existing service role.
Choose CodeBuildServiceRole-nodeproject.
Select Allow AWS CodeBuild to modify this service role so it can be used with this build project.
Expand Additional configuration and go to the Environment variables section.
Create the following Environment variables:
Name                  Value              Type
AWS_DEFAULT_REGION    us-east-1          Plaintext
AWS_ACCOUNT_ID        111111111111       Plaintext
IMAGE_REPO_NAME       multi-arch-test    Plaintext
IMAGE_TAG             latest             Plaintext
For Buildspec name – optional, enter buildspec-manifest.yml to override the default.
Choose Create build project.
Setting up CodePipeline
Now we can move on to creating a pipeline to orchestrate the builds and manifest creation.
On the CodePipeline console, choose Create pipeline.
For Pipeline name, enter a unique name for your pipeline, such as node-multi-architecture.
For Service role, choose New service role.
Enter a name for the new role (one is created for you). For this post, we use the generated role name CodePipelineServiceRole-nodeproject.
Select Allow AWS CodePipeline to create a service role so it can be used with this new pipeline.
Choose Next.
Choose a Source provider (for this post, choose GitHub).
If you don’t have any existing Connections to GitHub, select Connect to GitHub and follow the wizard.
Choose your Branch name (for this post, I choose main, but your branch might be different).
For Output artifact format, choose CodePipeline default.
Choose Next.
You should now be on the Add build stage page.
For Build provider, choose AWS CodeBuild.
Verify the Region is your Region of choice (for this post, I use US East (N. Virginia)).
For Project name, choose node-x86.
For Build type, select Single build.
Choose Next.
You should now be on the Add deploy stage page.
Choose Skip deploy stage.
A pop-up appears that reads Your pipeline will not include a deployment stage. Are you sure you want to skip this stage?
Choose Skip.
Choose Create pipeline.
CodePipeline immediately attempts to run a build. You can let it continue without worry if it fails. We are only part of the way done with the setup.
Adding an additional build step
We need to add the additional build step for the Arm CodeBuild project in the Build stage.
On the CodePipeline console, choose the node-multi-architecture pipeline.
Choose Edit to start editing the pipeline stages.
You should now be on the Editing: node-multi-architecture page.
For the Build stage, choose Edit stage.
Choose + Add action.
For Action name, enter Build-arm64.
For Action provider, choose AWS CodeBuild.
Verify your Region is correct.
For Input artifacts, select SourceArtifact.
For Project name, choose node-arm64.
For Build type, select Single build.
Choose Done.
Choose Save.
A pop-up appears that reads Saving your changes cannot be undone. If the pipeline is running when you save your changes, that execution will not complete.
Choose Save.
Updating the first build action name
This step is optional. The CodePipeline wizard doesn’t allow you to enter your Build action name during creation, but you can update the Build stage’s first build action to have consistent naming.
Choose Edit to start editing the pipeline stages.
Choose the Edit icon.
For Action name, enter Build-x86.
Choose Done.
Choose Save.
A pop-up appears that says Saving your changes cannot be undone. If the pipeline is running when you save your changes, that execution will not complete.
Choose Save.
Adding the project
Now we add the CodeBuild project for manifest creation and publishing.
On the CodePipeline console, choose the node-multi-architecture pipeline.
Choose Edit to start editing the pipeline stages.
Choose + Add stage below the Build stage.
Set the stage name to Manifest.
Choose +Add action group.
For Action name, enter Create-manifest.
For Action provider, choose AWS CodeBuild.
Verify your Region is correct.
For Input artifacts, select SourceArtifact.
For Project name, choose node-manifest.
For Build type, select Single build.
Choose Done.
Choose Save.
A pop-up appears that reads Saving your changes cannot be undone. If the pipeline is running when you save your changes, that execution will not complete.
Choose Save.
Testing the pipeline
Now let’s verify everything works as planned.
In the pipeline details page, choose Release change.
This runs the pipeline in stages. The process should take a few minutes to complete. The pipeline should show each stage as Succeeded.
Now we want to inspect the output of the Create-manifest action that runs the CodeBuild project for manifest creation.
Choose Details in the Create-manifest action.
This opens the CodeBuild build details for the node-manifest project.
Under Build logs, we should see the output from the manifest inspect command we ran as the last step in the buildspec-manifest.yml file.
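The output is a Docker manifest list similar to the following; the digests and sizes shown here are illustrative placeholders:

{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 2628,
         "digest": "sha256:<digest-of-the-amd64-image>",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 2628,
         "digest": "sha256:<digest-of-the-arm64-image>",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      }
   ]
}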
Cleaning up
To avoid incurring future charges, clean up the resources created as part of this post.
On the CodePipeline console, choose the pipeline node-multi-architecture.
Choose Delete pipeline.
When prompted, enter delete.
Choose Delete.
On the CodeBuild console, choose the Build project node-x86.
Choose Delete build project.
When prompted, enter delete.
Choose Delete.
Repeat the deletion process for Build projects node-arm64 and node-manifest.
Next we delete the Docker images we created and pushed to Amazon ECR. Be careful to not delete a repository that is being used for other images.
On the Amazon ECR console, choose the repository multi-arch-test.
You should see a list of Docker images.
Select latest, latest-arm64v8, and latest-amd64.
Choose Delete.
When prompted, enter delete.
Choose Delete.
Finally, we remove the IAM roles that we created.
On the IAM console, choose Roles.
In the search box, enter CodePipelineServiceRole-nodeproject.
Select the role and choose Delete role.
When prompted, choose Yes, delete.
Repeat these steps for the role CodeBuildServiceRole-nodeproject.
Conclusion
To summarize, we successfully created a pipeline to build multi-architecture Docker images for both x86 and arm64, referenced them via annotations in a Docker manifest list, and stored them in Amazon ECR. The images were built from a single Dockerfile and a shared buildspec, with CodeBuild environment variables supplying the architecture-specific values.
For more information about these services, see the following:
Enterprise AWS customers are often managing many accounts under a payer account, and sometimes accounts are closed before Reserved Instances (RI) or Savings Plans (SP) are fully used. Manually tracking account closures and requesting RI and SP migration from the closed accounts can become complex and error prone.
This blog post describes a solution for automating the process of requesting migration of unused RI and SP from closed accounts that are linked to a payer account. The solution helps reduce manual work and loss of unused RI and SP due to human error.
The solution automatically detects newly closed accounts that are linked to a payer account. If the closed accounts have unused RI and SP attached, it creates support cases with the required information for RI and SP migration. It can optionally send email notifications with the support case IDs. After a support case is created, the AWS Support team starts a workflow for processing the request, and updates the support case when the request is completed.
Solution Overview
The following diagram shows the overview of the solution.
Walkthrough
An Amazon CloudWatch Events time-based rule will create an event with the configured interval, such as one hour.
The event triggers an AWS Lambda function (refresh.js), which will retrieve the current status of all the linked accounts for the payer account from AWS Organizations. It saves the account status as an item in an Amazon DynamoDB table.
The DynamoDB table has Streams enabled and it triggers another Lambda function (process.js) when there are updates in the table. The stream view type is set to NEW_AND_OLD_IMAGES.
The Lambda function performs the following steps:
Compares the differences between the new and the old images in the stream record. If an account’s status is SUSPENDED in the new image and is ACTIVE in the old image, the account is newly closed.
Retrieves the RI and SP utilization information from AWS Cost Explorer for the closed account.
If there are unused RI and SP in the account, it creates support cases to request RI and SP migration and/or sends email notifications through Amazon SNS.
Next, let’s take a deeper look at the Lambda functions, which were written in JavaScript. The code for the Lambda functions can be downloaded from this repository.
Lambda function: refresh.js
This Lambda function is triggered by a CloudWatch Events rule at the predefined interval, for example, one hour. It retrieves the account statuses of all the linked accounts under a payer account from AWS Organizations. Then it saves the account statuses to a DynamoDB table.
It calls the AWS Organizations ListAccounts API to get the current status of all the linked accounts for the given payer account. The following sample code is for the ListAccounts API call.
const AWS = require('aws-sdk')
// Environment variable
const organizationsEndpoint = process.env.ORGANIZATIONS_ENDPOINT;
const orgEndpoint = new AWS.Endpoint(organizationsEndpoint);
// Get region from the endpoint hostname, e.g. organizations.us-east-1.amazonaws.com
const orgRegion = organizationsEndpoint.split(".")[1];
const organizations = new AWS.Organizations({
endpoint: orgEndpoint,
region: orgRegion
});
// Call the ListAccounts API to retrieve the status for all the linked accounts
function refreshAccounts() {
console.log("Enter refreshAccounts");
var newData = {Accounts:[]};
var handleError = function(response) {
console.log(response.err, response.err.stack); // an error occurred
throw response.err;
};
var params = {};
organizations.listAccounts(params).on('success', function handlePage(response) {
console.log("ListAccounts response: \n" + JSON.stringify(response.data));
var account;
// extract each account's Id and Status
for (account of response.data.Accounts) {
var accountStatus = {Id: account.Id, Status: account.Status};
newData.Accounts.push(accountStatus);
}
// handle response pagination
if (response.hasNextPage()) {
response.nextPage().on('success', handlePage).on('error', handleError)
.send();
} else {
console.log("Account Status: \n" + JSON.stringify(newData));
// insert code here to save the data to DynamoDB
}
}).on('error', handleError).send();
}
The following sample output is from the ListAccounts API call.
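The account IDs, names, email addresses, and timestamps in this example are placeholders:

{
  "Accounts": [
    {
      "Id": "111111111111",
      "Arn": "arn:aws:organizations::999999999999:account/o-exampleorgid/111111111111",
      "Email": "account1@example.com",
      "Name": "account1",
      "Status": "ACTIVE",
      "JoinedMethod": "INVITED",
      "JoinedTimestamp": "2020-01-01T00:00:00.000Z"
    },
    {
      "Id": "222222222222",
      "Arn": "arn:aws:organizations::999999999999:account/o-exampleorgid/222222222222",
      "Email": "account2@example.com",
      "Name": "account2",
      "Status": "SUSPENDED",
      "JoinedMethod": "CREATED",
      "JoinedTimestamp": "2020-01-01T00:00:00.000Z"
    }
  ]
}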
Next, it calls the DynamoDB UpdateItem API to save the current account status data as an item in the DynamoDB table. The API is configured with conditional update.
If the DynamoDB table already has an item with the same key, the item is not updated unless there is a different value. When the table is updated, DynamoDB Streams triggers a call to the next Lambda function. Below is the sample code for the UpdateItem API call.
const tableName = process.env.TABLE_NAME; // environment variable
const dynamodb = new AWS.DynamoDB();
// Save the account status JSON to the DynamoDB table
// if it is different than the existing data in the table
function saveDataToDynamoDB(data) {
console.log("Enter saveDataToDynamoDB");
var params = {
ExpressionAttributeNames: {
"#ACC": "Accounts"
},
ExpressionAttributeValues: {
":t": {
S: data
}
},
Key: {
"Organization": {
S: "default"
}
},
ConditionExpression: "#ACC <> :t",
ReturnValues: "ALL_NEW",
TableName: tableName,
UpdateExpression: "SET #ACC = :t"
};
dynamodb.updateItem(params, function(err, data) {
console.log("DynamoDB callback...");
if (err) {
console.log(err, err.stack); // an error occurred
if (err.code != "ConditionalCheckFailedException") {
throw err;
}
} else {
console.log(data); // successful response
}
});
}
Lambda function: process.js
This Lambda function processes DynamoDB stream records and compares the old and new item data in the stream record. If any of the linked accounts changed statuses to SUSPENDED from ACTIVE, it calls the Cost Explorer APIs – GetReservationUtilization API and GetSavingsPlansUtilizationDetails API– to check if the newly closed accounts have unused RI and SP attached. The following sample code is for the GetReservationUtilization API call.
const costexplorerEndpoint = process.env.COST_EXPLORER_ENDPOINT; // environment variable
const ceEndpoint = new AWS.Endpoint(costexplorerEndpoint);
// Get region from the endpoint hostname, e.g. ce.us-east-1.amazonaws.com
const ceRegion = costexplorerEndpoint.split(".")[1];
const costexplorer = new AWS.CostExplorer({
endpoint:ceEndpoint,
region:ceRegion
});
// Check if the source account has unused RI. If yes, send email notification
// or create support case based on ACTION_TYPE
// Params
// startDate: String with 'YYYY-MM-DD' format
// endDate: String with 'YYYY-MM-DD' format
// sourceAccountId: the AWS account to migrate RI from
// destinationAccountId: the AWS account to migrate RI to
function processReservationUtilization(startDate,endDate,
sourceAccountId,destinationAccountId) {
console.log("Enter processReservationUtilization");
// array to hold all the unused RI lease IDs from the source account
var riLeaseIds = [];
const now = new Date();
var params = {
TimePeriod: {
End: endDate,
Start: startDate
},
Filter: {
"Dimensions": {
"Key": "LINKED_ACCOUNT",
"Values": [
sourceAccountId
]
}
},
GroupBy: [
{
Key: 'SUBSCRIPTION_ID',
Type: "DIMENSION"
}
]
};
var handleError = function(response) {
console.log(response.err, response.err.stack); // an error occurred
throw response.err;
};
costexplorer.getReservationUtilization(params)
.on('success', function handlePage(response) {
console.log("GetReservationUtilization response \n" + response.data);
var utilizationsByTime;
var group;
// Find unused RI in the results
for (utilizationsByTime of response.data.UtilizationsByTime) {
for (group of utilizationsByTime.Groups) {
const endDateTime = new Date(group.Attributes.endDateTime);
if (endDateTime.getTime() > now.getTime()) {
riLeaseIds.push(group.Attributes.leaseId);
}
}
}
// Handle response pagination
if (response.hasNextPage()) {
response.nextPage().on('success', handlePage)
.on('error', handleError).send();
} else if (riLeaseIds.length > 0) {
var requestBody = riRequestBody + "\nSource Account: " + sourceAccountId
+ "\nDestination Account: " + destinationAccountId + "\nRI Lease IDs: " + JSON.stringify(riLeaseIds);
// Insert code here to create support case and/or send message to SNS
}
}).on('error', handleError).send();
}
// Check if the source account has unused SP. If yes, send email notification
// or create support case based on ACTION_TYPE
// Params
// startDate: String with 'YYYY-MM-DD' format
// endDate: String with 'YYYY-MM-DD' format
// sourceAccountId: the AWS account to migrate SP from
// destinationAccountId: the AWS account to migrate SP to
function processSavingsPlansUtilization(startDate,endDate,
sourceAccountId,destinationAccountId) {
console.log("Enter processSavingsPlansUtilization");
//array to hold all the unused SP ARNs from the source account
var savingsPlansArns = [];
const now = new Date();
var params = {
TimePeriod: { /* required */
End: endDate,
Start: startDate
},
Filter: {
"Dimensions": {
"Key": "LINKED_ACCOUNT",
"Values": [
sourceAccountId
]
}
}
};
var handleError = function(response) {
console.log(response.err, response.err.stack); // an error occurred
throw response.err;
};
costexplorer.getSavingsPlansUtilizationDetails(params).on('success', function handlePage(response) {
console.log("getSavingsPlansUtilizationDetails response \n" + response.data);
var utilizationDetail;
// Find unused SP in the results
for (utilizationDetail of response.data.SavingsPlansUtilizationDetails) {
const endDateTime = new Date(utilizationDetail.Attributes.EndDateTime);
if (endDateTime.getTime() > now.getTime()) {
savingsPlansArns.push(utilizationDetail.SavingsPlanArn);
}
}
// handle response pagination
if (response.hasNextPage()) {
response.nextPage().on('success', handlePage).on('error', handleError).send();
} else if (savingsPlansArns.length > 0) {
var requestBody = spRequestBody + "\nSource Account: " + sourceAccountId + "\nDestination Account: " + destinationAccountId + "\nSavings Plans Arns: " + JSON.stringify(savingsPlansArns);
// insert code here to create support case and/or send message to SNS
}
}).on('error', handleError).send();
}
For those accounts that require RI or SP migration, it calls the AWS Support CreateCase API to create support cases, sends email notifications, or does both, depending on the value of the environment variable ACTION_TYPE:
EMAIL_ONLY: Send the RI and SP details to an SNS topic for email notification.
SUPPORT_CASE_ONLY: Create support cases to request RI and SP migration.
SUPPORT_CASE_AND_EMAIL: Create support cases to request RI and SP migration and send the request details to an SNS topic for email notification. The support case IDs are included in the email subject.
The following sample code is for the CreateCase API call.
const supportEndpoint = process.env.SUPPORT_ENDPOINT; // environment variable
const supEndpoint = new AWS.Endpoint(supportEndpoint);
// get region from the endpoint hostname, e.g. support.us-east-1.amazonaws.com
const supRegion = supportEndpoint.split(".")[1];
const support = new AWS.Support({
endpoint:supEndpoint,
region:supRegion
});
// Create an AWS Support Case
// Params:
// emailAddresses: email address to CC on the support case
// caseSubject: the subject of the support case
// caseBody: the body of the support case
// notify: whether to publish a notification to an SNS topic after the case is created
function createSupportCase(emailAddresses, caseSubject, caseBody, notify) {
console.log("Enter createSupportCase");
var params = {
communicationBody: caseBody, /* required */
subject: caseSubject, /* required */
categoryCode: 'reserved-instances',
ccEmailAddresses: [
emailAddresses
],
issueType: 'customer-service',
serviceCode: 'billing',
severityCode: 'low'
};
support.createCase(params, function(err, data) {
console.log("Support callback...");
if (err) {
console.log(err, err.stack); // an error occurred
throw err;
} else {
console.log(data); // successful response
if (notify) {
// insert code here to publish the message to a SNS topic
}
}
});
}
The following is sample support case content generated by the Lambda function:
Subject:
Savings Plans Migration
Body:
Please migrate the following Savings Plans from the source account to the destination account.
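The rest of the body follows the format produced by the Lambda function; the account IDs and the Savings Plans ARN shown here are placeholders:

Source Account: 111111111111
Destination Account: 222222222222
Savings Plans Arns: ["arn:aws:savingsplans::111111111111:savingsplan/<savings-plan-id>"]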
The AWS RI Operations Team has provided the following guidelines for RI and SP migration support cases.
1) If a closed account is still linked to a payer account, create one support case from the payer account. Otherwise, create one support case from each account – source and destination.
2) When creating a support case, use a user ID or a role with RI and SP purchase IAM permissions.
3) In the support case body, provide RI lease IDs or SP ARNs, source account ID, destination account ID, and an explicit request to migrate the RI and SP. If there are many RI lease IDs or SP ARNs, provide the data in CSV format.
Note: Do not use PDF or images.
4) Set severity code to low, service code to billing, category code to reserved-instances, and issue type to customer-service.
Deploy the stack
Download the template for deploying the solution used in this post.
Use this template to launch the solution and all associated components. The default configuration deploys AWS Lambda functions, a DynamoDB table, an Amazon CloudWatch Events rule, an SNS topic, and AWS Identity and Access Management (IAM) roles necessary to set up the solution in your account, but you can customize the template to meet your specific needs.
Before you launch the solution, review the architecture, configuration, network security, and other considerations. Follow the step-by-step instructions in this section to configure and deploy the solution into your account. You need to have authority to launch resources in the payer account before launching the stack.
Note: You are responsible for the cost of the AWS services used while running this solution. For more details, refer to the pricing web page for each AWS service used in this solution.
Time to deploy: Approximately five minutes.
1. Sign in to the AWS Management Console and use the following button to launch the AWS CloudFormation stack. Optionally, you can download the template as a starting point for your own implementation. Note: the CloudFormation stack needs to be created in the payer account.
2. The template launches in the US East (N. Virginia) Region by default. You can choose to launch it in other regions.
3. On the Create stack page, verify that the correct template URL is in the Amazon S3 URL text box and choose Next.
4. On the Specify stack details page, assign a name to your solution stack.
5. Under Parameters, review the parameters for this solution template and modify them as necessary. This solution uses the following default values.
Recommended documentation for reviewing these parameters:
6. Choose Next.
7. On the Configure stack options page, choose Next.
8. On the Review page, review and confirm the settings. Check the box acknowledging that the template will create AWS Identity and Access Management (IAM) resources.
9. Choose Create stack to deploy the stack.
You can view the status of the stack in the AWS CloudFormation console in the Status column. You should receive a CREATE_COMPLETE status in approximately five minutes.
Clean Up
To clean up the stack, select the stack on the AWS CloudFormation console and choose Delete. In the pop-up window, choose Delete stack.
Conclusion
This post described a solution for automating the process of detecting account closures and requesting RI and SP migration. The solution helps enterprise customers reduce manual work and avoid losing unused RI and SP due to human error.
If you have any comments or questions, please don’t hesitate to leave them in the comments section.
Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.
One of the use cases we hear from customers is that they want to provide very limited access to Amazon WorkSpaces users (for example, contractors or consultants) in an AWS account. At the same time, they want to allow those users to query Amazon Simple Storage Service (Amazon S3) data in another account using Amazon Athena over a JDBC connection.
For example, marketing companies might provide private access to the first party data to media agencies through this mechanism.
The restrictions they want to put in place are:
For security reasons, these Amazon WorkSpaces should not have internet connectivity, so access to Amazon Athena must be over AWS PrivateLink.
Public access to Athena is not allowed using the credentials used for the JDBC connection. This is to prevent the users from leveraging the credentials to query the data from anywhere else.
In this post, we show how to use Amazon Virtual Private Cloud (Amazon VPC) endpoints for Athena, along with AWS Identity and Access Management (AWS IAM) policies. This provides private access to query the Amazon S3 data while preventing users from querying the data from outside their Amazon WorkSpaces or using the Athena public endpoint.
Let’s review the steps to achieve this:
Initial setup of two AWS accounts (AccountA and AccountB)
Set up Amazon S3 bucket with sample data in AccountA
Set up an IAM user with Amazon S3 and Athena access in AccountA
Use DbVisualizer to query the Amazon S3 data in AccountA using the Athena public endpoint
Update IAM policy for user in AccountA to restrict private only access
Prerequisites
To follow the steps in this post, you need two AWS Accounts. The Amazon VPC and subnet requirements are specified in the detailed steps.
Note: The AWS CloudFormation template used in this blog post is for the US-EAST-1 (N. Virginia) Region, so ensure that the Region setting for both accounts is set to US-EAST-1 (N. Virginia).
Walkthrough
The two AWS accounts are:
AccountA – Contains the Amazon S3 bucket where the data is stored. For AccountA you can create a new Amazon VPC or use the default Amazon VPC.
AccountB – Contains the Amazon WorkSpaces from which users query the data over the JDBC connection.
The AWS CloudFormation template will create a new Amazon VPC in AccountB with CIDR 10.10.0.0/16 and set up one public subnet and two private subnets.
It will also create a NAT Gateway in the public subnet and create both public and private route tables.
Since we will be launching Amazon WorkSpaces in these private subnets and not all Availability Zones (AZ) are supported by Amazon WorkSpaces, it is important to choose the right AZ when creating them.
Review the documentation to learn which AWS Regions/AZ are supported.
We have provided two parameters in the AWS CloudFormation template:
Create a new Amazon S3 bucket in AccountA with a bucket name that starts with ‘athena-’.
Next, you can download a sample file and upload it to the Amazon S3 bucket you just created.
Use the following statements to create an AWS Glue database and an external table for the data in the Amazon S3 bucket so that you can query it from Athena.
Go to the Athena console and define a new database:
CREATE DATABASE IF NOT EXISTS sampledb
Once the database is created, create a new table in sampledb (by selecting sampledb from the “Database” drop-down menu). Replace <<your bucket name>> with the bucket you just created:
CREATE EXTERNAL TABLE IF NOT EXISTS sampledb.amazon_reviews_tsv(
marketplace string,
customer_id string,
review_id string,
product_id string,
product_parent string,
product_title string,
product_category string,
star_rating int,
helpful_votes int,
total_votes int,
vine string,
verified_purchase string,
review_headline string,
review_body string,
review_date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
LOCATION
's3://<<your bucket name>>/'
TBLPROPERTIES ("skip.header.line.count"="1")
Step 3
In AccountA, create a new IAM user with programmatic access.
Save the access key and secret access key.
For the same user, add an inline policy that allows the Athena, AWS Glue, and Amazon S3 actions needed to run queries against the data.
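A minimal sketch of such a policy is shown below. The action list is an assumption based on what the queries in this post require, and the bucket name athena-workspaces-blogpost matches the AWS CLI examples later in this post; adjust both to your environment:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AthenaAccess",
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:ListQueryExecutions",
        "athena:StopQueryExecution",
        "athena:GetWorkGroup"
      ],
      "Resource": "*"
    },
    {
      "Sid": "GlueCatalogAccess",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetTable",
        "glue:GetPartitions"
      ],
      "Resource": "*"
    },
    {
      "Sid": "S3Access",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::athena-workspaces-blogpost",
        "arn:aws:s3:::athena-workspaces-blogpost/*"
      ]
    }
  ]
}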
Step 4
In this step, we create an Interface VPC endpoint (AWS PrivateLink) for Athena in AccountA. When you use an interface VPC endpoint, communication between your Amazon VPC and Athena is conducted entirely within the AWS network.
Each VPC endpoint is represented by one or more Elastic Network Interfaces (ENIs) with private IP addresses in your VPC subnets.
To create an Interface VPC endpoint follow the instructions and select Athena in the AWS Services list. Do not select the checkbox for Enable Private DNS Name.
Ensure the security group that is attached to the Amazon VPC endpoint is open to inbound traffic on port 443 and 444 for source AccountB VPC CIDR 10.10.0.0/16. Port 444 is used by Athena to stream query results.
Once you create the VPC endpoint, you will get a DNS endpoint name in the format vpce-<endpoint id>.athena.us-east-1.vpce.amazonaws.com. We are going to use this in the JDBC connection from the SQL client.
Step 5
Each Amazon WorkSpace is associated with the specific Amazon VPC and AWS Directory Service construct that you used to create it. All Directory Service constructs (Simple AD, AD Connector, and Microsoft AD) require two subnets to operate, each in different Availability Zones. This is why we created two private subnets at the beginning.
For this blog post I have used Simple AD as the directory service for the Amazon WorkSpaces.
By default, IAM users don’t have permissions for Amazon WorkSpaces resources and operations.
To allow IAM users to manage Amazon WorkSpaces resources, you must create an IAM policy that explicitly grants them permissions.
To start, go to the Amazon WorkSpaces console and select Advanced Setup.
Set up a new directory using the SimpleAD option.
Use the “small” directory size and choose the Amazon VPC and private subnets you created in Step 1 for AccountB.
Once you create the directory, register the directory with Amazon WorkSpaces by selecting “Register” from the “Action” menu.
Select private subnets you created in Step 1 for AccountB.
Next, launch Amazon WorkSpaces by following the Launch WorkSpaces button.
Select the directory you created and create a new user.
For the bundle, choose Standard with Windows 10 (PCoIP).
After the Amazon WorkSpace is created, you can log in to it using the Amazon WorkSpaces client software. You can download the client from https://clients.amazonworkspaces.com/
Log in to your Amazon WorkSpace and install a SQL client of your choice. At this point, your Amazon WorkSpace still has internet access via the NAT gateway.
I have used DbVisualizer (the free version) as the SQL client. Once you have that installed, install the JDBC driver for Athena by following the instructions.
Now you can set up the JDBC connections to Athena using the access key and secret key you set up for an IAM user in AccountA.
Step 6
To test out both the Athena public endpoint and the Athena VPC endpoint, create two connections using the same credentials.
For the Athena public endpoint, you need to use the athena.us-east-1.amazonaws.com service endpoint. (jdbc:awsathena://athena.us-east-1.amazonaws.com:443;S3OutputLocation=s3://<athena-bucket-name>/)
For the VPC Endpoint Connection, use the VPC Endpoint you created in Step 4 (jdbc:awsathena://vpce-<>.athena.us-east-1.vpce.amazonaws.com:443;S3OutputLocation=s3://<athena-bucket-name>/)
Now run a simple query to select records from the amazon_reviews_tsv table using both the connections.
SELECT * FROM sampledb.amazon_reviews_tsv limit 10
You should be able to see results using both the connections. Since the private subnets are still connected to the internet via the NAT Gateway, you can query using the Athena public endpoint.
From your workstation, run the following AWS Command Line Interface (AWS CLI) commands using the credentials used for the JDBC connection. You should be able to list the Amazon S3 bucket objects and the Athena query executions.
aws s3 ls s3://athena-workspaces-blogpost
aws athena list-query-executions
Step 7
Now we lock down the access as described in the beginning of this blog post by taking the following actions:
Update the route table for the private subnets by removing the route to the internet, so that the Amazon WorkSpaces can no longer reach the Athena public endpoint. The only access allowed is through the Athena VPC endpoint.
Add conditional checks to the IAM user access policy that will restrict access to the Amazon S3 buckets and Athena only if:
The request came in through the VPC endpoint. For this we use the “aws:SourceVpce” check and provide the VPC Endpoint ID value.
The request for Amazon S3 data is through Athena. For this we use the condition “aws:CalledVia” and provide a value of “athena.amazonaws.com”.
In the IAM access policy below, replace <<your vpce id>> with your VPC endpoint ID, and use it to update the previous inline policy that was added to the IAM user in Step 3.
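A sketch of the updated policy with these conditional checks, using the same assumed action list and bucket name as the earlier inline policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AthenaAccessViaVpce",
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:ListQueryExecutions",
        "athena:StopQueryExecution",
        "athena:GetWorkGroup"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:SourceVpce": "<<your vpce id>>"
        }
      }
    },
    {
      "Sid": "GlueAccessViaAthena",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetTable",
        "glue:GetPartitions"
      ],
      "Resource": "*",
      "Condition": {
        "ForAnyValue:StringEquals": {
          "aws:CalledVia": "athena.amazonaws.com"
        }
      }
    },
    {
      "Sid": "S3AccessViaAthena",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::athena-workspaces-blogpost",
        "arn:aws:s3:::athena-workspaces-blogpost/*"
      ],
      "Condition": {
        "ForAnyValue:StringEquals": {
          "aws:CalledVia": "athena.amazonaws.com"
        }
      }
    }
  ]
}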
Once you have applied the changes, try to reconnect using both the Athena VPC endpoint and Athena public endpoint connections. The Athena VPC endpoint connection should work, but the public endpoint connection will time out. Also try the same Amazon S3 and Athena AWS CLI commands; you should get access denied for both operations.
Clean Up
To avoid incurring costs, remember to delete the resources that you created.
For AWS AccountA:
Delete the S3 buckets
Delete the database you created in AWS Glue
Delete the Amazon VPC endpoint you created for Amazon Athena
For AccountB:
Delete the Amazon WorkSpace you created, along with the Simple AD directory. You can review more information on how to delete your WorkSpaces.
Conclusion
In this blog post, I showed how to leverage Amazon VPC endpoints and IAM policies to privately connect to Amazon Athena from Amazon WorkSpaces that don’t have internet connectivity.
Give this solution a try and share your feedback in the comments!