Tag Archives: Technical How-to

Raising code quality for Python applications using Amazon CodeGuru

2020-12-05 Ran Fu

Post Syndicated from Ran Fu original https://aws.amazon.com/blogs/devops/raising-code-quality-for-python-applications-using-amazon-codeguru/

We are pleased to announce the launch of Python support for Amazon CodeGuru, a service for automated code reviews and application performance recommendations. CodeGuru is powered by program analysis and machine learning, and trained on best practices and hard-learned lessons across millions of code reviews and thousands of applications profiled on open-source projects and internally at Amazon.

Amazon CodeGuru has two services:

Amazon CodeGuru Reviewer – Helps you improve source code quality by detecting hard-to-find defects during application development and recommending how to remediate them.
Amazon CodeGuru Profiler – Helps you find the most expensive lines of code, helps reduce your infrastructure cost, and fine-tunes your application performance.

The launch of Python support extends CodeGuru beyond its original Java support. Python is a widely used language for various use cases, including web app development and DevOps. Python’s growth in data analysis and machine learning areas is driven by its rich frameworks and libraries. In this post, we discuss how to use CodeGuru Reviewer and Profiler to improve your code quality for Python applications.

CodeGuru Reviewer for Python

CodeGuru Reviewer now allows you to analyze your Python code through pull requests and full repository analysis. For more information, see Automating code reviews and application profiling with Amazon CodeGuru. We analyzed large code corpuses and Python documentation to source hard-to-find coding issues and trained our detectors to provide best practice recommendations. We expect such recommendations to benefit beginners as well as expert Python programmers.

CodeGuru Reviewer generates recommendations in the following categories:

AWS SDK APIs best practices
Data structures and control flow, including exception handling
Resource leaks
Secure coding practices to protect from potential shell injections

In the following sections, we provide real-world examples of bugs that can be detected in each of the categories:

AWS SDK API best practices

AWS has hundreds of services and thousands of APIs. Developers can now benefit from CodeGuru Reviewer recommendations related to AWS APIs. AWS recommendations in CodeGuru Reviewer cover a wide range of scenarios such as detecting outdated or deprecated APIs, warning about API misuse, authentication and exception scenarios, and efficient API alternatives.

Consider the pagination trait, implemented by over 1,000 APIs from more than 150 AWS services. The trait is commonly used when the response object is too large to return in a single response. To get the complete set of results, iterated calls to the API are required, until the last page is reached. If developers were not aware of this, they would write the code as the following (this example is patterned after actual code):

def sync_ddb_table(source_ddb, destination_ddb):
    response = source_ddb.scan(TableName=“table1”)
    for item in response['Items']:
        ...
        destination_ddb.put_item(TableName=“table2”, Item=item)
    …

Here the scan API is used to read items from one Amazon DynamoDB table and the put_item API to save them to another DynamoDB table. The scan API implements the Pagination trait. However, the developer missed iterating on the results beyond the first scan, leading to only partial copying of data.

The following screenshot shows what CodeGuru Reviewer recommends:

The following screenshot shows CodeGuru Reviewer recommends on the need for pagination

The developer fixed the code based on this recommendation and added complete handling of paginated results by checking the LastEvaluatedKey value in the response object of the paginated API scan as follows:

def sync_ddb_table(source_ddb, destination_ddb):
    response = source_ddb.scan(TableName==“table1”)
    for item in response['Items']:
        ...
        destination_ddb.put_item(TableName=“table2”, Item=item)
    # Keeps scanning util LastEvaluatedKey is null
    while "LastEvaluatedKey" in response:
        response = source_ddb.scan(
            TableName="table1",
            ExclusiveStartKey=response["LastEvaluatedKey"]
        )
        for item in response['Items']:
            destination_ddb.put_item(TableName=“table2”, Item=item)
    …

CodeGuru Reviewer recommendation is rich and offers multiple options for implementing Paginated scan. We can also initialize the ExclusiveStartKey value to None and iteratively update it based on the LastEvaluatedKey value obtained from the scan response object in a loop. This fix below conforms to the usage mentioned in the official documentation.

def sync_ddb_table(source_ddb, destination_ddb):
    table = source_ddb.Table(“table1”)
    scan_kwargs = {
                  …
    }
    done = False
    start_key = None
    while not done:
        if start_key:
            scan_kwargs['ExclusiveStartKey'] = start_key
        response = table.scan(**scan_kwargs)
        for item in response['Items']:
            destination_ddb.put_item(TableName=“table2”, Item=item)
        start_key = response.get('LastEvaluatedKey', None)
        done = start_key is None

Data structures and control flow

Python’s coding style is different from other languages. For code that does not conform to Python idioms, CodeGuru Reviewer provides a variety of suggestions for efficient and correct handling of data structures and control flow in the Python 3 standard library:

Using DefaultDict for compact handling of missing dictionary keys over using the setDefault() API or dealing with KeyError exception
Using a subprocess module over outdated APIs for subprocess handling
Detecting improper exception handling such as catching and passing generic exceptions that can hide latent issues.
Detecting simultaneous iteration and modification to loops that might lead to unexpected bugs because the iterator expression is only evaluated one time and does not account for subsequent index changes.

The following code is a specific example that can confuse novice developers.

def list_sns(region, creds, sns_topics=[]):
    sns = boto_session('sns', creds, region)
    response = sns.list_topics()
    for topic_arn in response["Topics"]:
        sns_topics.append(topic_arn["TopicArn"])
    return sns_topics
  
def process():
    ...
    for region, creds in jobs["auth_config"]:
        arns = list_sns(region, creds)
        ...

The process() method iterates over different AWS Regions and collects Regional ARNs by calling the list_sns() method. The developer might expect that each call to list_sns() with a Region parameter returns only the corresponding Regional ARNs. However, the preceding code actually leaks the ARNs from prior calls to subsequent Regions. This happens due to an idiosyncrasy of Python relating to the use of mutable objects as default argument values. Python default value are created exactly one time, and if that object is mutated, subsequent references to the object refer to the mutated value instead of re-initialization.

The following screenshot shows what CodeGuru Reviewer recommends:

The following screenshot shows CodeGuru Reviewer recommends about initializing a value for mutable objects

The developer accepted the recommendation and issued the below fix.

def list_sns(region, creds, sns_topics=None):
    sns = boto_session('sns', creds, region)
    response = sns.list_topics()
    if sns_topics is None: 
        sns_topics = [] 
    for topic_arn in response["Topics"]:
        sns_topics.append(topic_arn["TopicArn"])
    return sns_topics

Resource leaks

A Pythonic practice for resource handling is using Context Managers. Our analysis shows that resource leaks are rampant in Python code where a developer may open external files or windows and forget to close them eventually. A resource leak can slow down or crash your system. Even if a resource is closed, using Context Managers is Pythonic. For example, CodeGuru Reviewer detects resource leaks in the following code:

def read_lines(file):
    lines = []
    f = open(file, ‘r’)
    for line in f:
        lines.append(line.strip(‘\n’).strip(‘\r\n’))
    return lines

The following screenshot shows that CodeGuru Reviewer recommends that the developer either use the ContextLib with statement or use a try-finally block to explicitly close a resource.

The following screenshot shows CodeGuru Reviewer recommend about fixing the potential resource leak

The developer accepted the recommendation and fixed the code as shown below.

def read_lines(file):
    lines = []
    with open(file, ‘r’) as f: 
        for line in f:
            lines.append(line.strip(‘\n’).strip(‘\r\n’))
    return lines

Secure coding practices

Python is often used for scripting. An integral part of such scripts is the use of subprocesses. As of this writing, CodeGuru Reviewer makes a limited, but important set of recommendations to make sure that your use of eval functions or subprocesses is secure from potential shell injections. It issues a warning if it detects that the command used in eval or subprocess scenarios might be influenced by external factors. For example, see the following code:

def execute(cmd):
    try:
        retcode = subprocess.call(cmd, shell=True)
        ...
    except OSError as e:
        ...

The following screenshot shows the CodeGuru Reviewer recommendation:

The following screenshot shows CodeGuru Reviewer recommends about potential shell injection vulnerability

The developer accepted this recommendation and made the following fix.

def execute(cmd):
    try:
        retcode = subprocess.call(shlex.quote(cmd), shell=True)
        ...
    except OSError as e:
        ...

As shown in the preceding recommendations, not only are the code issues detected, but a detailed recommendation is also provided on how to fix the issues, along with a link to the Python official documentation. You can provide feedback on recommendations in the CodeGuru Reviewer console or by commenting on the code in a pull request. This feedback helps improve the performance of Reviewer so that the recommendations you see get better over time.

Now let’s take a look at CodeGuru Profiler.

CodeGuru Profiler for Python

Amazon CodeGuru Profiler analyzes your application’s performance characteristics and provides interactive visualizations to show you where your application spends its time. These visualizations a. k. a. flame graphs are a powerful tool to help you troubleshoot which code methods have high latency or are over utilizing your CPU.

Thanks to the new Python agent, you can now use CodeGuru Profiler on your Python applications to investigate performance issues.

The following list summarizes the supported versions as of this writing.

AWS Lambda functions: Python3.8, Python3.7, Python3.6
Other environments: Python3.9, Python3.8, Python3.7, Python3.6

Onboarding your Python application

For this post, let’s assume you have a Python application running on Amazon Elastic Compute Cloud (Amazon EC2) hosts that you want to profile. To onboard your Python application, complete the following steps:

1. Create a new profiling group in CodeGuru Profiler console called ProfilingGroupForMyApplication. Give access to your Amazon EC2 execution role to submit to this profiling group. See the documentation for details about how to create a Profiling Group.

2. Install the codeguru_profiler_agent module:

pip3 install codeguru_profiler_agent

3. Start the profiler in your application.

An easy way to profile your application is to start your script through the codeguru_profiler_agent module. If you have an app.py script, use the following code:

python -m codeguru_profiler_agent -p ProfilingGroupForMyApplication app.py

Alternatively, you can start the agent manually inside the code. This must be done only one time, preferably in your startup code:

from codeguru_profiler_agent import Profiler

if __name__ == "__main__":
     Profiler(profiling_group_name='ProfilingGroupForMyApplication')
     start_application()    # your code in there....

Onboarding your Python Lambda function

Onboarding for an AWS Lambda function is quite similar.

Create a profiling group called ProfilingGroupForMyLambdaFunction, this time we select “Lambda” for the compute platform. Give access to your Lambda function role to submit to this profiling group. See the documentation for details about how to create a Profiling Group.
Include the codeguru_profiler_agent module in your Lambda function code.
Add the with_lambda_profiler decorator to your handler function:

from codeguru_profiler_agent import with_lambda_profiler

@with_lambda_profiler(profiling_group_name='ProfilingGroupForMyLambdaFunction')
def handler_function(event, context):
      # Your code here

Alternatively, you can profile an existing Lambda function without updating the source code by adding a layer and changing the configuration. For more information, see Profiling your applications that run on AWS Lambda.

Profiling a Lambda function helps you see what is slowing down your code so you can reduce the duration, which reduces the cost and improves latency. You need to have continuous traffic on your function in order to produce a usable profile.

Viewing your profile

After running your profile for some time, you can view it on the CodeGuru console.

Screenshot of Flame graph visualization by CodeGuru Profiler

Each frame in the flame graph shows how much that function contributes to latency. In this example, an outbound call that crosses the network is taking most of the duration in the Lambda function, caching its result would improve the latency.

For more information, see Investigating performance issues with Amazon CodeGuru Profiler.

Supportability for CodeGuru Profiler is documented here.

If you don’t have an application to try CodeGuru Profiler on, you can use the demo application in the following GitHub repo.

Conclusion

This post introduced how to leverage CodeGuru Reviewer to identify hard-to-find code defects in various issue categories and how to onboard your Python applications or Lambda function in CodeGuru Profiler for CPU profiling. Combining both services can help you improve code quality for Python applications. CodeGuru is now available for you to try. For more pricing information, please see Amazon CodeGuru pricing.

About the Authors

Neela Sawant is a Senior Applied Scientist in the Amazon CodeGuru team. Her background is building AI-powered solutions to customer problems in a variety of domains such as software, multimedia, and retail. When she isn’t working, you’ll find her exploring the world anew with her toddler and hacking away at AI for social good.

Pierre Marieu is a Software Development Engineer in the Amazon CodeGuru Profiler team in London. He loves building tools that help the day-to-day life of other software engineers. Previously, he worked at Amadeus IT, building software for the travel industry.

Ran Fu is a Senior Product Manager in the Amazon CodeGuru team. He has a deep customer empathy, and love exploring who are the customers, what are their needs, and why those needs matter. Besides work, you may find him snowboarding in Keystone or Vail, Colorado.

Field Notes: Migrating File Servers to Amazon FSx and Integrating with AWS Managed Microsoft AD

2020-12-02 Kyaw Soe Hlaing

Post Syndicated from Kyaw Soe Hlaing original https://aws.amazon.com/blogs/architecture/field-notes-migrating-file-servers-to-amazon-fsx-and-integrating-with-aws-managed-microsoft-ad/

Amazon FSx provides AWS customers with the native compatibility of third-party file systems with feature sets for workloads such as Windows-based storage, high performance computing (HPC), machine learning, and electronic design automation (EDA). Amazon FSx automates the time-consuming administration tasks such as hardware provisioning, software configuration, patching, and backups. Since Amazon FSx integrates the file systems with cloud-native AWS services, this makes them even more useful for a broader set of workloads.

Amazon FSx for Windows File Server provides fully managed file storage that is accessible over the industry-standard Server Message Block (SMB) protocol. Built on Windows Server, Amazon FSx delivers a wide range of administrative features such as data deduplication, end-user file restore, and Microsoft Active Directory (AD) integration.

In this post, I explain how to migrate files and file shares from on-premises servers to Amazon FSx with AWS DataSync in a domain migration scenario. Customers are migrating their file servers to Amazon FSx as part of their migration from an on-premises Active Directory to AWS managed Active Directory. Their plan is to replace their file servers with Amazon FSx during Active Directory migration to AWS Managed AD.

Arhictecture diagram

Prerequisites

Before you begin, perform the steps outlined in this blog to migrate the user accounts and groups to the managed Active Directory.

Walkthrough

There are numerous ways to perform the Active Directory migration. Generally, the following five steps are taken:

Establish two-way forest trust between on-premises AD and AWS Managed AD
Migrate user accounts and group with the ADMT tool
Duplicate Access Control List (ACL) permissions in the file server
Migrate files and folders with existing ACL to Amazon FSx using AWS DataSync
Migrate User Computers

In this post, I focus on duplication of ACL permissions and migration of files and folders using Amazon FSx and AWS DataSync. In order to perform duplication of ACL permission in file servers, I use SubInACL tool, which is available from the Microsoft website.

Duplication of the ACL is required because users want to seamlessly access file shares once their computers are migrated to AWS Managed AD. Thus all migrated files and folders have permission with Managed AD users and group objects. For enterprises, the migration of user computers does not happen overnight. Normally, migration takes place in batches or phases. With ACL duplication, both migrated and non-migrated users can access their respective file shares seamlessly during and after migration.

Duplication of Access Control List (ACL)

Before we proceed with ACL duplication, we must ensure that the migration of user accounts and groups was completed. In my demo environment, I have already migrated on-premises users to the Managed Active Directory. In the meantime, we presume that we are migrating identical users to the Managed Active Directory. There might be a scenario where migrated user accounts have different naming such as samAccount name. In this case, we will need to handle this during ACL duplication with SubInACL. For more information about syntax, refer to the SubInACL documentation.

As indicated in following screenshots, I have two users created in the on-premises Active Directory (onprem.local) and those two identical users have been created in the Managed Active Directory too (corp.example.com).

Screenshot of on-premises Active Directory (onprem.local)

Screenshot of Active Directory

In the following screenshot, I have a shared folder called “HR_Documents” in an on-premises file server. Different users have different access rights to that folder. For example, John Smith has “Full Control” but Onprem User1 only have “Read & Execute”. Our plan is to add same access right to identical users from the Managed Active Directory, here corp.example.com, so that once John Smith is migrated to managed AD, he can access to shared folders in Amazon FSx using his Managed Active Directory credential.

Let’s verify the existing permission in the “HR_Documents” folder. Two users from onprem.local are found with different access rights.

Screenshot of HR docs

Now it’s time to install SubInACL.

We install it in our on-premises file server. After the SubInACL tool is installed, it can be found under “C:\Program Files (x86)\Windows Resource Kits\Tools” folder by default. To perform an ACL duplication, run command prompt as administrator and run the following command;

Subinacl /outputlog=C:\temp\HR_document_log.txt /errorlog=C:\temp\HR_document_Err_log.txt /Subdirectories C:\HR_Documents\* /migratetodomain=onprem=corp

There are several parameters that I am using in the command:

Outputlog = where log file is saved
ErrorLog = where error log file is saved
Subdirectories = to apply permissions including subfolders and files
Migratetodomain= NetBIOS name of source domain and destination domain

Screenshot windows resources kits

screenshot of windows resources kit

If the command is run successfully, you should able to see a summary of the results. If there is no error or failure, you can verify whether ACL permissions are duplicated as expected by looking at the folders and files. In our case, we can see that there is one ACL entry of identical account from corp.example.com is added.

Note: you will always see two ACL entries, one from onprem.local and another one from corp.example.com domain in all the files and folders that you used during migration. Permissions are now applied to both at the folder and file level.

screenshot of payroll properties

screenshot of doc 1 properties

Migrate files and folders using AWS DataSync

AWS DataSync is an online data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems and AWS Storage services such as Amazon S3, Amazon Elastic File System (Amazon EFS), or Amazon FSx for Windows File Server. Manual tasks related to data transfers can slow down migrations and burden IT operations. AWS DataSync reduces or automatically handles many of these tasks, including scripting copy jobs, scheduling and monitoring transfers, validating data, and optimizing network utilization.

Create an AWS DataSync agent

An AWS DataSync agent deploys as a virtual machine in an on-premises data center. An AWS DataSync agent can be run on ESXi, KVM, and Microsoft Hyper-V hypervisors. The AWS DataSync agent is used to access on-premises storage systems and transfer data to the AWS DataSync managed service running on AWS. AWS DataSync always performs incremental copies by comparing from a source to a destination and only copying files that are new or have changed.

AWS DataSync supports the following SMB (Server Message Block) locations to migrate data from:

Network File System (NFS)
Server Message Block (SMB)

In this blog, I use SMB as the source location, since I am migrating from an on-premises Windows File server. AWS DataSync supports SMB 2.1 and SMB 3.0 protocols.

AWS DataSync saves metadata and special files when copying to and from file systems. When files are copied from a SMB file share and Amazon FSx for Windows File Server, AWS DataSync copies the following metadata:

File timestamps: access time, modification time, and creation time
File owner and file group security identifiers (SIDs)
Standard file attributes
NTFS discretionary access lists (DACLs): access control entries (ACEs) that determine whether to grant access to an object

Data Synchronization with AWS DataSync

When a task starts, AWS DataSyc goes through different stages. It begins with examining file system follows by data transfer to destination. Once data transfer is completed, it performs verification for consistency between source and destination file systems. You can review detailed information about the data synchronization stages.

DataSync Endpoints

You can activate your agent by using one of the following endpoint types:

Public endpoints – If you use public endpoints, all communication from your DataSync agent to AWS occurs over the public internet.
Federal Information Processing Standard (FIPS) endpoints – If you need to use FIPS 140-2 validated cryptographic modules when accessing the AWS GovCloud (US-East) or AWS GovCloud (US-West) Region, use this endpoint to activate your agent. You use the AWS CLI or API to access this endpoint.
Virtual private cloud (VPC) endpoints – If you use a VPC endpoint, all communication from AWS DataSync to AWS services occurs through the VPC endpoint in your VPC in AWS. This approach provides a private connection between your self-managed data center, your VPC, and AWS services. It increases the security of your data as it is copied over the network.

In my demo environment, I have implemented AWS DataSync as indicated in following diagram. The DataSync Agent can be run either on VMware or Hyper-V and KVM platform in a customer on-premises data center.

Datasync Agent Arhictecture

Once the AWS DataSync Agent setup is completed and the task that defined the source file servers and destination Amazon FSx server is added, you can verify agent status in the AWS Management Console.

Console screenshot

Select Task and then choose Start to start copying files and folders. This will start the replication task (or you can wait until the task runs hourly). You can check the History tab to see a history of the replication task executions.

Console screenshot

Congratulations! You have replicated the contents of an on-premises file server to Amazon FSx. Let’s look and make sure the ACL permissions are still intact in their destination after migration. As shown in the following screenshots, the ACL permissions in the Payroll folder still remains as is, both on-premises users and Managed AD users are inside. Once the user’s computers are migrated to the Managed AD, they can access the same file share in Amazon FSx server using Managed AD credentials.

Payroll properties screenshot

Cleaning up

If you are performing testing by following the preceding steps in your own account, delete the following resources, to avoid incurring future charges:

EC2 instances
Managed AD
Amazon FSx file system
AWS Datasync

Conclusion

You have learned how to duplicate ACL permissions and shared folder permissions during migration of file servers to Amazon FSx. This process provides a seamless migration experience for users. Once the user’s computers are migrated to the Managed AD, they only need to remap shared folders from Amazon FSx. This can be automated by pushing down shared folders mapping with a Group Policy. If new files or folders are created in the source file server, AWS Datasync will synchronize to Amazon FSx server.

For customers who are planning to do a domain migration from on-premises to AWS Managed Microsoft AD, migration of resources like file servers are common. Handling ACL permissions plays a vital role in providing a seamless migration experience. The duplication of ACL can be an option, otherwise, the ADMT tool can be used to migrate SID information from the source Domain to destination Domain. To migrate SID history, SID filtering needs to be disabled during migration.

If you want to provide feedback about this post, you are welcome to submit in the comments section below.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Incorporating security in code-reviews using Amazon CodeGuru Reviewer

2020-12-01 Nikunj Vaidya

Post Syndicated from Nikunj Vaidya original https://aws.amazon.com/blogs/devops/incorporating-security-in-code-reviews-using-amazon-codeguru-reviewer/

Today, software development practices are constantly evolving to empower developers with tools to maintain a high bar of code quality. Amazon CodeGuru Reviewer offers this capability by carrying out automated code-reviews for developers, based on the trained machine learning models that can detect complex defects and providing intelligent actionable recommendations to mitigate those defects. A quick overview of CodeGuru is covered in this blog post.

Security analysis is a critical part of a code review and CodeGuru Reviewer offers this capability with a new set of security detectors. These security detectors introduced in CodeGuru Reviewer are geared towards identifying security risks from the top 10 OWASP categories and ensures that your code follows best practices for AWS Key Management Service (AWS KMS), Amazon Elastic Compute Cloud (Amazon EC2) API, and common Java crypto and TLS/SSL libraries. As of today, CodeGuru security analysis supports Java language, thus we will take an example of a Java application.

In this post, we will walk through the on-boarding workflow to carry out the security analysis of the code repository and generate recommendations for a Java application.

Security workflow overview:

The new security workflow, introduced for CodeGuru reviewer, utilizes the source code and build artifacts to generate recommendations. The security detector evaluates build artifacts to generate security-related recommendations whereas other detectors continue to scan the source code to generate recommendations. With the use of build artifacts for evaluation, the detector can carry out a whole-program inter-procedural analysis to discover issues that are caused across your code (e.g., hardcoded credentials in one file that are passed to an API in another) and can reduce false-positives by checking if an execution path is valid or not. You must provide the source code .zip file as well as the build artifact .zip file for a complete analysis.

Customers can run a security scan when they create a repository analysis. CodeGuru Reviewer provides an additional option to get both code and security recommendations. As explained in the following sections, CodeGuru Reviewer will create an Amazon Simple Storage Service (Amazon S3) bucket in your AWS account for that region to upload or copy your source code and build artifacts for the analysis. This repository analysis option can be run on Java code from any repository.

Prerequisites

Prepare the source code and artifact zip files: If you do not have your Java code locally, download the source code that you want to evaluate for security and zip it. Similarly, if needed, download the build artifact .jar file for your source code and zip it. It will be required to upload the source code and build artifact as separate .zip files as per the instructions in subsequent sections. Thus even if it is a single file (eg. single .jar file), you will still need to zip it. Even if the .zip file includes multiple files, the right files will be discovered and analyzed by CodeGuru. For our sample test, we will use src.zip and jar.zip file, saved locally.

Creating an S3 bucket repository association:

This section summarizes the high-level steps to create the association of your S3 bucket repository.

1. On the CodeGuru console, choose Code reviews.

2. On the Repository analysis tab, choose Create repository analysis.

Figure: Screenshot of initiating the repository analysis

3. For the source code analysis, select Code and security recommendations.

4. For Repository name, enter a name for your repository.

5. Under Additional settings, for Code review name, enter a name for trackability purposes.

6. Choose Create S3 bucket and associate.

Figure: Screenshot to show selection of Security Code Analysis

It takes a few seconds to create a new S3 bucket in the current Region. When it completes, you will see the below screen.

Figure: Screenshot for Create repository analysis showing S3 bucket created

7. Choose Upload to the S3 bucket option and under that choose Upload source code zip file and select the zip file (src.zip) from your local machine to upload.

Figure: Screenshot of popup to upload code and artifacts from S3 bucket

8. Similarly, choose Upload build artifacts zip file and select the zip file (jar.zip) from your local machine and upload.

Figure: Screenshot for Create repository analysis showing S3 paths populated

Alternatively, you can always upload the source code and build artifacts as zip file from any of your existing S3 bucket as below.

9. Choose Browse S3 buckets for existing artifacts and upload from there as shown below:

Screenshot to upload code and artifacts from S3 bucket

Figure: Screenshot to upload code and artifacts from an existing S3 bucket

10. Now click Create repository analysis and trigger the code review.

A new pending entry is created as shown below.

Figure: Screenshot of code review in Pending state

After a few minutes, you would see the recommendations generate that would include security analysis too. In the below case, there are 10 recommendations generated.

Figure: Screenshot of repository analysis being completed

For the subsequent code reviews, you can use the same repository and upload new files or create a new repository as shown below:

Figure: Screenshot of subsequent code review making repository selection

Recommendations

Apart from detecting the security risks from the top 10 OWASP categories, the security detector, detects the deep security issues by analyzing data flow across multiple methods, procedures, and files.

The recommendations generated in the area of security are labelled as Security. In the below example we see a recommendation to remove hard-coded credentials and a non-security-related recommendation about refactoring of code for better maintainability.

Figure: Screenshot of Recommendations generated

Below is another example of recommendations pointing out the potential resource-leak as well as a security issue pointing to a potential risk of path traversal attack.

Screenshot of deep security recommendations

Figure: More examples of deep security recommendations

As this blog is focused on on-boarding aspects, we will cover the explanation of recommendations in more detail in a separate blog.

Disassociation of Repository (optional):

The association of CodeGuru to the S3 bucket repository can be removed by following below steps. Navigate to the Repositories page, select the repository and choose Disassociate repository.

Figure: Screenshot of disassociating the S3 bucket repo with CodeGuru

Conclusion

This post reviewed the support for on-boarding workflow to carry out the security analysis in CodeGuru Reviewer. We initiated a full repository analysis for the Java code using a separate UI workflow and generated recommendations.

We hope this post was useful and would enable you to conduct code analysis using Amazon CodeGuru Reviewer.

About the Author

Nikunj Vaidya is a Sr. Solutions Architect with Amazon Web Services, focusing in the area of DevOps services. He builds technical content for the field enablement and offers technical guidance to the customers on AWS DevOps solutions and services that would streamline the application development process, accelerate application delivery, and enable maintaining a high bar of software quality.

Using Amazon SQS dead-letter queues to replay messages

2020-11-25 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-amazon-sqs-dead-letter-queues-to-replay-messages/

Amazon Simple Queue Service (Amazon SQS) is a fully managed message queuing service. It enables you to decouple and scale microservices, distributed systems, and serverless applications. A commonly used feature of Amazon SQS is dead-letter queues. The DLQ (dead-letter queue) is used to store messages that can’t be processed (consumed) successfully.

This post describes how to add automated resilience to an existing SQS queue. It monitors the dead-letter queue and moves a message back to the main queue to see if it can be processed again. It also uses a specific algorithm to make sure this is not repeated forever. Each time it attempts to reprocess the message, the replay time increases until the message is finally considered dead.

I use Amazon SQS dead-letter queues, AWS Lambda, and a specific algorithm to decrease the rate of retries for failed messages. I then package and publish this serverless solution in the AWS Serverless Application Repository.

Dead-letter queues and message replay

The main task of a dead-letter queue (DLQ) is to handle message failure. It allows you to set aside and isolate non-processed messages to determine why processing failed. Often these failed messages are caused by application errors. For example, a consumer application fails to parse a message correctly and throws an unhandled exception. This exception then triggers an error response that sends the message to the DLQ. The AWS documentation contains a tutorial detailing the configuration of an Amazon SQS dead-letter queue.

To process the failed messages, I build a retry mechanism by implementing an exponential backoff algorithm. The idea behind exponential backoff is to use progressively longer waits between retries for consecutive error responses. Most exponential backoff algorithms use jitter (randomized delay) to prevent successive collisions. This spreads the message retries more evenly across time, allowing them to be processed more efficiently.

Solution overview

The flow of the message sent by the producer to SQS is as follows:

The producer application sends a message to an SQS queue
The consumer application fails to process the message in the same SQS queue
The message is moved from the main SQS queue to the default dead-letter queue as per the component settings.
A Lambda function is configured with the SQS main dead-letter queue as an event source. It receives and sends back the message to the original queue adding a message timer.
The message timer is defined by the exponential backoff and jitter algorithm.
You can limit the number of retries. If the message exceeds this limit, the message is moved to a second DLQ where an operator processes it manually.

How the replay function works

Each time the SQS dead-letter queue receives a message, it triggers Lambda to run the replay function. The replay code uses an SQS message attribute `sqs-dlq-replay-nb` as a persistent counter for the current number of retries attempted. The number of retries is compared to the maximum number (defined in the application configuration file). If it exceeds the maximum, the message is moved to the human operated queue. If not, the function uses the AWS Lambda event data to build a new message for the Amazon SQS main queue. Finally it updates the retry counter, adds a new message timer to the message, and it sends the message back (replays) to the main queue.

def handler(event, context):
    """Lambda function handler."""
    for record in event['Records']:
        nbReplay = 0
        # number of replay
        if 'sqs-dlq-replay-nb' in record['messageAttributes']:
            nbReplay = int(record['messageAttributes']['sqs-dlq-replay-nb']["stringValue"])

        nbReplay += 1
        if nbReplay > config.MAX_ATTEMPS:
            raise MaxAttempsError(replay=nbReplay, max=config.MAX_ATTEMPS)

        # SQS attributes
        attributes = record['messageAttributes']
        attributes.update({'sqs-dlq-replay-nb': {'StringValue': str(nbReplay), 'DataType': 'Number'}})

        _sqs_attributes_cleaner(attributes)

        # Backoff
        b = backoff.ExpoBackoffFullJitter(base=config.BACKOFF_RATE, cap=config.MESSAGE_RETENTION_PERIOD)
        delaySeconds = b.backoff(n=int(nbReplay))

        # SQS
        SQS.send_message(
            QueueUrl=config.SQS_MAIN_URL,
            MessageBody=record['body'],
            DelaySeconds=int(delaySeconds),
            MessageAttributes=record['messageAttributes']
        )

How to use the application

You can use this serverless application via:

The Lambda console: choose the “Browse serverless app repository” option to create a function. Select “amazon-sqs-dlq-replay-backoff” application in the public applications repository. Then, configure the application with the default SQS parameters and the replay feature parameters.
The Serverless Framework, as described by Yan Cui in this blog post.
An AWS CloudFormation template by using the AWS::ServerlessRepo::Application resource, as described in the documentation.

Here is an example of a CloudFormation template using the AWS Serverless Application Repository application:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  ReplaySqsQueue:
    Type: AWS::Serverless::Application
    Properties:
      Location: 
        ApplicationId: arn:aws:serverlessrepo:eu-west-1:1234123412:applications~sqs-dlq-replay
        SemanticVersion: 1.0.0
      Parameters:
        BackoffRate: "2"
        MaxAttempts: "3"

Conclusion

I describe how an exponential backoff algorithm (with jitter) enhances the message processing capabilities of an Amazon SQS queue. You can now find the amazon-sqs-dlq-replay-backoff application in the AWS Serverless Application Repository. Download the code from this GitHub repository.

To get started with dead-letter queues in Amazon SQS, read:

To implement replay mechanisms, see:

For more serverless learning resources, visit https://serverlessland.com.

Field Notes: Setting Up Disaster Recovery in a Different Seismic Zone Using AWS Outposts

2020-11-24 Vijay Menon

Post Syndicated from Vijay Menon original https://aws.amazon.com/blogs/architecture/field-notes-setting-up-disaster-recovery-in-a-different-seismic-zone-using-aws-outposts/

Recovering your mission-critical workloads from outages is essential for business continuity and providing services to customers with little or no interruption. That’s why many customers replicate their mission-critical workloads in multiple places using a Disaster Recovery (DR) strategy suited for their needs.

With AWS, a customer can achieve this by deploying multi Availability Zone High-Availability setup or a multi-region setup by replicating critical components of an application to another region. Depending on the RPO and RTO of the mission-critical workload, the requirement for disaster recovery ranges from simple backup and restore, to multi-site, active-active, setup. In this blog post, I explain how AWS Outposts can be used for DR on AWS.

In many geographies, it is possible to set up your disaster recovery for a workload running in one AWS Region to another AWS Region in the same country (for example in US between us-east-1 and us-west-2). For countries where there is only one AWS Region, it’s possible to set up disaster recovery in another country where AWS Region is present. This method can be designed for the continuity, resumption and recovery of critical business processes at an agreed level and limits the impact on people, processes and infrastructure (including IT). Other reasons include to minimize the operational, financial, legal, reputational and other material consequences arising from such events.

However, for mission-critical workloads handling critical user data (PII, PHI or financial data), countries like India and Canada have regulations which mandate to have a disaster recovery setup at a “safe distance” within the same country. This ensures compliance with any data sovereignty or data localization requirements mandated by the regulators. “Safe distance” means the distance between the DR site and the primary site is such that the business can continue to operate in the event of any natural disaster or industrial events affecting the primary site. Depending on the geography, this safe distance could be 50KM or more. These regulations limit the options customers have to use another AWS Region in another country as a disaster recovery site of their primary workload running on AWS.

In this blog post, I describe an architecture using AWS Outposts which helps set up disaster recovery on AWS within the same country at a distance that can meet the requirements set by regulators. This architecture also helps customers to comply with various data sovereignty regulations in a given country. Another advantage of this architecture is the homogeneity of the primary and disaster recovery site. Your existing IT teams can set up and operate the disaster recovery site using familiar AWS tools and technology in a homogenous environment.

Prerequisites

Readers of this blog post should be familiar with basic networking concepts like WAN connectivity, BGP and the following AWS services:

Amazon EC2
Amazon VPC
AWS Outposts
AWS Transit Gateway
AWS Managed VPN
AWS Direct Connect
AWS Marketplace Amazon Machine Images (AMI) for Network Infrastructure

Architecture Overview

I explain the architecture using an example customer scenario in India, where a customer is using AWS Mumbai Region for their mission-critical workload. This workload needs a DR setup to comply with local regulation and the DR setup needs to be in a different seismic zone than the one for Mumbai. Also, because of the nature of the regulated business, the user/sensitive data needs to be stored within India.

Following is the architecture diagram showing the logical setup.

This solution is similar to a typical AWS Outposts use case where a customer orders the Outposts to be installed in their own Data Centre (DC) or a CoLocation site (Colo). It will follow the shared responsibility model described in AWS Outposts documentation.

The only difference is that the AWS Outpost parent Region will be the closest Region other than AWS Mumbai, in this case Singapore. Customers will then provision an AWS Direct Connect public VIF locally for a Service Link to the Singapore Region. This ensures that the control plane stays available via the AWS Singapore Region even if there is an outage in AWS Mumbai Region affecting control plane availability. You can then launch and manage AWS Outposts supported resources in the AWS Outposts rack.

For data plane traffic, which should not go out of the country, the following options are available:

Provision a self-managed Virtual Private Network (VPN) between an EC2 instances running router AMI in a subnet of AWS Outposts and AWS Transit Gateway (TGW) in the primary Region.
Provision a self-managed Virtual Private Network (VPN) between an EC2 instances running router AMI in a subnet of AWS Outposts and Virtual Private Gateway (VGW) in the primary Region.

Note: The Primary Region in this example is AWS Mumbai Region. This VPN will be provisioned via Local Gateway and DX public VIF. This ensures that data plane traffic will not traverse any network out of the country (India) to comply with data localization mandated by the regulators.

Architecture Walkthrough

Make sure your data center (DC) or the choice of collocate facility (Colo) meets the requirements for AWS Outposts.
Create an Outpost and order Outpost capacity as described in the documentation. Make sure that you do this step while logged into AWS Outposts console of the AWS Singapore Region.
Provision connectivity between AWS Outposts and network of your DC/Colo as mentioned in AWS Outpost documentation. This includes setting up VLANs for service links and Local Gateway (LGW).
Provision an AWS Direct Connect connection and public VIF between your DC/Colo and the primary Region via the closest AWS Direct Connect location.
- For the WAN connectivity between your DC/Colo and AWS Direct Connect location you can choose any telco provider of your choice or work with one of AWS Direct Connect partners.
- This public VIF will be used to attach AWS Outposts to its parent Region in Singapore over AWS Outposts service link. It will also be used to establish an IPsec GRE tunnel between AWS Outposts subnet and a TGW or VGW for data plane traffic (explained in subsequent steps).
- Alternatively, you can provision separate Direct Connect connection and public VIFs for Service Link and data plane traffic for better segregation between the two. You will have to provision sufficient bandwidth on Direct Connect connection for the Service Link traffic as well as the Data Plane traffic (like data replication between primary Region and AWS outposts).
- For an optimal experience and resiliency, AWS recommends that you use dual 1Gbps connections to the AWS Region. This connectivity can also be achieved over Internet transit; however, I recommend using AWS Direct Connect because it provides private connectivity between AWS and your DC/Colo environment, which in many cases can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.
Create a subnet in AWS Outposts and launch an EC2 instance running a router AMI of your choice from AWS Marketplace in this subnet. This EC2 instance is used to establish the IPsec GRE tunnel to the TGW or VGW in primary Region.
- Choose an EC2 instance which can support your bandwidth requirement as per the AMI provider and disable source/destination check for this EC2 instance.
- Assign an Elastic IP address to this EC2 instance from the customer-owned pool provisioned for your AWS Outposts.
Add rules in security group of these EC2 instances to allow ISAKMP (UDP 500), NAT Traversal (UDP 4500), and ESP (IP Protocol 50) from VGW or TGW endpoint public IP addresses.
NAT (Network Address Translation) the EIP assigned in step 5 to a public IP address at your edge router connecting to AWS Direct connect or internet transit. This public IP will be used as the customer gateway to establish IPsec GRE tunnel to the primary Region.
Create a customer gateway using the public IP address used to NAT the EC2 instances step 7. Follow the steps in similar process found at Create a Customer Gateway.
Create a VPN attachment for the transit gateway using the customer gateway created in step 8. This VPN must be a dynamic route-based VPN. For steps, review Transit Gateway VPN Attachments. If you are connecting the customer gateway to VPC using VGW in primary Region then follow the steps mentioned at How do I create a secure connection between my office network and Amazon Virtual Private Cloud?.
Configure the customer gateway (EC2 instance running a router AMI in AWS Outposts subnet) side for VPN connectivity. You can base this configuration suggested by AWS during the creation of VPN in step 9. This suggested sample configuration can be downloaded from AWS console post VPN setup as discussed in this document.
Modify the route table of AWS outpost Subnets to point to the EC2 instance launched in step 5 as the target for any destination in your VPCs in the primary Region, which is AWS Mumbai in this example.

At this point, you will have end-to-end connectivity between VPCs in a primary Region and resources in an AWS Outposts. This connectivity can now be used to replicate data from your primary site to AWS Outposts for DR purposes. This keeps the setup compliant with any internal or external data localization requirements.

Conclusion

In this blog post, I described an architecture using AWS Outposts for Disaster Recovery on AWS in countries without a second AWS Region. To set up disaster recovery, your existing IT teams can set up and operate the disaster recovery site using the familiar AWS tools and technology in a homogeneous environment. To learn more about AWS Outposts, refer to the documentation and FAQ.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Using NuGet with AWS CodeArtifact

2020-11-24 John Standish

Post Syndicated from John Standish original https://aws.amazon.com/blogs/devops/using-nuget-with-aws-codeartifact/

Managing NuGet packages for .NET development can be a challenge. Tasks such as initial configuration, ongoing maintenance, and scaling inefficiencies are the biggest pain points for developers and organizations. With its addition of NuGet package support, AWS CodeArtifact now provides easy-to-configure and scalable package management for .NET developers. You can use NuGet packages stored in CodeArtifact in Visual Studio, allowing you to use the tools you already know.

In this post, we show how you can provision NuGet repositories in 5 minutes. Then we demonstrate how to consume packages from your new NuGet repositories, all while using .NET native tooling.

All relevant code for this post is available in the aws-codeartifact-samples GitHub repo.

Prerequisites

For this walkthrough, you should have the following prerequisites:

AWS Account
AWS Command Line Interface (AWS CLI). For instructions, see Installing, updating, and uninstalling the AWS CLI version 2
AWS Toolkit for Visual Studio
The latest version of NuGet. For instructions, see Install NuGet client tools
- Alternatively, use .NET Core. For instructions, see Download .NET
Visual Studio IDE (Community, Professional or Enterprise)

Architecture overview

Two core resource types make up CodeArtifact: domains and repositories. Domains provide an easy way manage multiple repositories within an organization. Repositories store packages and their assets. You can connect repositories to other CodeArtifact repositories, or popular public package repositories such as nuget.org, using upstream and external connections. For more information about these concepts, see AWS CodeArtifact Concepts.

The following diagram illustrates this architecture.

Figure: AWS CodeArtifact core concepts

Creating CodeArtifact resources with AWS CloudFormation

The AWS CloudFormation template provided in this post provisions three CodeArtifact resources: a domain, a team repository, and a shared repository. The team repository is configured to use the shared repository as an upstream repository, and the shared repository has an external connection to nuget.org.

The following diagram illustrates this architecture.

Figure: Example AWS CodeArtifact architecture

The following CloudFormation template used in this walkthrough:

AWSTemplateFormatVersion: '2010-09-09'
Description: AWS CodeArtifact resources for dotnet

Resources:
  # Create Domain
  ExampleDomain:
    Type: AWS::CodeArtifact::Domain
    Properties:
      DomainName: example-domain
      PermissionsPolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              AWS: 
              - !Sub arn:aws:iam::${AWS::AccountId}:root
            Resource: "*"
            Action:
              - codeartifact:CreateRepository
              - codeartifact:DescribeDomain
              - codeartifact:GetAuthorizationToken
              - codeartifact:GetDomainPermissionsPolicy
              - codeartifact:ListRepositoriesInDomain

  # Create External Repository
  MyExternalRepository:
    Type: AWS::CodeArtifact::Repository
    Condition: ProvisionNugetTeamAndUpstream
    Properties:
      DomainName: !GetAtt ExampleDomain.Name
      RepositoryName: my-external-repository       
      ExternalConnections:
        - public:nuget-org
      PermissionsPolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              AWS: 
              - !Sub arn:aws:iam::${AWS::AccountId}:root
            Resource: "*"
            Action:
              - codeartifact:DescribePackageVersion
              - codeartifact:DescribeRepository
              - codeartifact:GetPackageVersionReadme
              - codeartifact:GetRepositoryEndpoint
              - codeartifact:ListPackageVersionAssets
              - codeartifact:ListPackageVersionDependencies
              - codeartifact:ListPackageVersions
              - codeartifact:ListPackages
              - codeartifact:PublishPackageVersion
              - codeartifact:PutPackageMetadata
              - codeartifact:ReadFromRepository

  # Create Repository
  MyTeamRepository:
    Type: AWS::CodeArtifact::Repository
    Properties:
      DomainName: !GetAtt ExampleDomain.Name
      RepositoryName: my-team-repository
      Upstreams:
        - !GetAtt MyExternalRepository.Name
      PermissionsPolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              AWS: 
              - !Sub arn:aws:iam::${AWS::AccountId}:root
            Resource: "*"
            Action:
              - codeartifact:DescribePackageVersion
              - codeartifact:DescribeRepository
              - codeartifact:GetPackageVersionReadme
              - codeartifact:GetRepositoryEndpoint
              - codeartifact:ListPackageVersionAssets
              - codeartifact:ListPackageVersionDependencies
              - codeartifact:ListPackageVersions
              - codeartifact:ListPackages
              - codeartifact:PublishPackageVersion
              - codeartifact:PutPackageMetadata
              - codeartifact:ReadFromRepository

Getting the CloudFormation template

To use the CloudFormation stack, we recommend you clone the following GitHub repo so you also have access to the example projects. See the following code:

git clone https://github.com/aws-samples/aws-codeartifact-samples.git
cd aws-codeartifact-samples/getting-started/dotnet/cloudformation/

Alternatively, you can copy the previous template into a file on your local filesystem named deploy.yml.

Provisioning the CloudFormation stack

Now that you have a local copy of the template, you need to provision the resources using a CloudFormation stack. You can deploy the stack using the AWS CLI or on the AWS CloudFormation console.

To use the AWS CLI, enter the following code:

aws cloudformation deploy \
--template-file deploy.yml \
--region <YOUR_PREFERRED_REGION> \
--stack-name CodeArtifact-GettingStarted-DotNet

To use the AWS CloudFormation console, complete the following steps:

On the AWS CloudFormation console, choose Create stack.
Choose With new resources (standard).
Select Upload a template file.
Choose Choose file.
Name the stack CodeArtifact-GettingStarted-DotNet.
Continue to choose Next until prompted to create the stack.

Configuring your local development experience

We use the CodeArtifact credential provider to connect the Visual Studio IDE to a CodeArtifact repository. You need to download and install the AWS Toolkit for Visual Studio to configure the credential provider. The toolkit is an extension for Microsoft Visual Studio on Microsoft Windows that makes it easy to develop, debug, and deploy .NET applications to AWS. The credential provider automates fetching and refreshing the authentication token required to pull packages from CodeArtifact. For more information about the authentication process, see AWS CodeArtifact authentication and tokens.

To connect to a repository, you complete the following steps:

Configure an account profile in the AWS Toolkit.
Copy the source endpoint from the AWS Explorer.
Set the NuGet package source as the source endpoint.
Add packages for your project via your CodeArtifact repository.

Configuring an account profile in the AWS Toolkit

Before you can use the Toolkit for Visual Studio, you must provide a set of valid AWS credentials. In this step, we set up a profile that has access to interact with CodeArtifact. For instructions, see Providing AWS Credentials.

Figure: Visual Studio Toolkit for AWS Account Profile Setup

Copying the NuGet source endpoint

After you set up your profile, you can see your provisioned repositories.

In the AWS Explorer pane, navigate to the repository you want to connect to.
Choose your repository (right-click).
Choose Copy NuGet Source Endpoint.

Figure: AWS CodeArtifact repositories shown in the AWS Explorer

You use the source endpoint later to configure your NuGet package sources.

Setting the package source using the source endpoint

Now that you have your source endpoint, you can set up the NuGet package source.

In Visual Studio, under Tools, choose Options.
Choose NuGet Package Manager.
Under Options, choose the + icon to add a package source.
For Name , enter codeartifact.
For Source, enter the source endpoint you copied from the previous step.

Figure: Configuring NuGet package sources for AWS CodeArtifact

Adding packages via your CodeArtifact repository

After the package source is configured against your team repository, you can pull packages via the upstream connection to the shared repository.

Choose Manage NuGet Packages for your project.
- You can now see packages from nuget.org.
Choose any package to add it to your project.

Exploring packages while connected to a AWS CodeArtifact repository

Viewing packages stored in your CodeArtifact team repository

Packages are stored in a repository you pull from, or referenced via the upstream connection. Because we’re pulling packages from nuget.org through an external connection, you can see cached copies of those packages in your repository. To view the packages, navigate to your repository on the CodeArtifact console.

Packages stored in a AWS CodeArtifact repository

Cleaning Up

When you’re finished with this walkthrough, you may want to remove any provisioned resources. To remove the resources that the CloudFormation template created, navigate to the stack on the AWS CloudFormation console and choose Delete Stack. It may take a few minutes to delete all provisioned resources.

After the resources are deleted, there are no more cleanup steps.

Conclusion

We have shown you how to set up CodeArtifact in minutes and easily integrate it with NuGet. You can build and push your package faster, from hours or days to minutes. You can also integrate CodeArtifact directly in your Visual Studio environment with four simple steps. With CodeArtifact repositories, you inherit the durability and security posture from the underlying storage of CodeArtifact for your packages.

As of November 2020, CodeArtifact is available in the following AWS Regions:

US: US East (Ohio), US East (N. Virginia), US West (Oregon)
AP: Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo)
EU: Europe (Frankfurt), Europe (Ireland), Europe (Stockholm)

For an up-to-date list of Regions where CodeArtifact is available, see AWS CodeArtifact FAQ.

About the Authors

John Standish

John Standish is a Solutions Architect at AWS and spent over 13 years as a Microsoft .Net developer. Outside of work, he enjoys playing video games, cooking, and watching hockey.

Nuatu Tseggai

Nuatu Tseggai is a Cloud Infrastructure Architect at Amazon Web Services. He enjoys working with customers to design and build event-driven distributed systems that span multiple services.

Neha Gupta

Neha Gupta is a Solutions Architect at AWS and have 16 years of experience as a Database architect/ DBA. Apart from work, she’s outdoorsy and loves to dance.

Elijah Batkoski

Elijah is a Technical Writer for Amazon Web Services. Elijah has produced technical documentation and blogs for a variety of tools and services, primarily focused around DevOps.

Publishing private npm packages with AWS CodeArtifact

2020-11-20 Ryan Sonshine

Post Syndicated from Ryan Sonshine original https://aws.amazon.com/blogs/devops/publishing-private-npm-packages-aws-codeartifact/

This post demonstrates how to create, publish, and download private npm packages using AWS CodeArtifact, allowing you to share code across your organization without exposing your packages to the public.

The ability to control CodeArtifact repository access using AWS Identity and Access Management (IAM) removes the need to manage additional credentials for a private npm repository when developers already have IAM roles configured.

You can use private npm packages for a variety of use cases, such as:

Reducing code duplication
Configuration such as code linting and styling
CLI tools for internal processes

This post shows how to easily create a sample project in which we publish an npm package and install the package from CodeArtifact. For more information about pipeline integration, see AWS CodeArtifact and your package management flow – Best Practices for Integration.

Solution overview

The following diagram illustrates this solution.

Diagram showing npm package publish and install with CodeArtifact

In this post, you create a private scoped npm package containing a sample function that can be used across your organization. You create a second project to download the npm package. You also learn how to structure your npm package to make logging in to CodeArtifact automatic when you want to build or publish the package.

The code covered in this post is available on GitHub:

aws-codeartifact-npm-example – Sample npm package and application

Prerequisites

Before you begin, you need to complete the following:

Create an AWS account.
Install the AWS Command Line Interface (AWS CLI). CodeArtifact is supported in these CLI versions:
1. 18.83 or later: install the AWS CLI version 1
2. 0.54 or later: install the AWS CLI version 2
Create a CodeArtifact repository.
Add required IAM permissions for CodeArtifact.

Creating your npm package

You can create your npm package in three easy steps: set up the project, create your npm script for authenticating with CodeArtifact, and publish the package.

Setting up your project

Create a directory for your new npm package. We name this directory my-package because it serves as the name of the package. We use an npm scope for this package, where @myorg represents the scope all of our organization’s packages are published under. This helps us distinguish our internal private package from external packages. See the following code:

npm init --scope=@myorg -y

{
  "name": "@myorg/my-package",
  "version": "1.0.0",
  "description": "A sample private scoped npm package",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  }
}

The package.json file specifies that the main file of the package is called index.js. Next, we create that file and add our package function to it:

module.exports.helloWorld = function() {
  console.log('Hello world!');
}

Creating an npm script

To create your npm script, complete the following steps:

On the CodeArtifact console, choose the repository you created as part of the prerequisites.

If you haven’t created a repository, create one before proceeding.

CodeArtifact repository details console

Select your CodeArtifact repository and choose Details to view the additional details for your repository.

We use two items from this page:

Repository name (my-repo)
Domain (my-domain)

Create a script named co:login in our package.json. The package.json contains the following code:

{
  "name": "@myorg/my-package",
  "version": "1.0.0",
  "description": "A sample private scoped npm package",
  "main": "index.js",
  "scripts": {
    "co:login": "aws codeartifact login --tool npm --repository my-repo --domain my-domain",
    "test": "echo \"Error: no test specified\" && exit 1"
  }
}

Running this script updates your npm configuration to use your CodeArtifact repository and sets your authentication token, which expires after 12 hours.

To test our new script, enter the following command:

npm run co:login

The following code is the output:

> aws codeartifact login --tool npm --repository my-repo --domain my-domain
Successfully configured npm to use AWS CodeArtifact repository https://my-domain-<ACCOUNT ID>.d.codeartifact.us-east-1.amazonaws.com/npm/my-repo/
Login expires in 12 hours at 2020-09-04 02:16:17-04:00

Add a prepare script to our package.json to run our login command:

{
  "name": "@myorg/my-package",
  "version": "1.0.0",
  "description": "A sample private scoped npm package",
  "main": "index.js",
  "scripts": {
    "prepare": "npm run co:login",
    "co:login": "aws codeartifact login --tool npm --repository my-repo --domain my-domain",
    "test": "echo \"Error: no test specified\" && exit 1"
  }
}

This configures our project to automatically authenticate and generate an access token anytime npm install or npm publish run on the project.

If you see an error containing Invalid choice, valid choices are:, you need to update the AWS CLI according to the versions listed in the perquisites of this post.

Publishing your package

To publish our new package for the first time, run npm publish.

The following screenshot shows the output.

Terminal showing npm publish output

If we navigate to our CodeArtifact repository on the CodeArtifact console, we now see our new private npm package ready to be downloaded.

CodeArtifact console showing published npm package

Installing your private npm package

To install your private npm package, you first set up the project and add the CodeArtifact configs. After you install your package, it’s ready to use.

Setting up your project

Create a directory for a new application and name it my-app. This is a sample project to download our private npm package published in the previous step. You can apply this pattern to all repositories you intend on installing your organization’s npm packages in.

npm init -y

{
  "name": "my-app",
  "version": "1.0.0",
  "description": "A sample application consuming a private scoped npm package",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  }
}

Adding CodeArtifact configs

Copy the npm scripts prepare and co:login created earlier to your new project:

{
  "name": "my-app",
  "version": "1.0.0",
  "description": "A sample application consuming a private scoped npm package",
  "main": "index.js",
  "scripts": {
    "prepare": "npm run co:login",
    "co:login": "aws codeartifact login --tool npm --repository my-repo --domain my-domain",
    "test": "echo \"Error: no test specified\" && exit 1"
  }
}

Installing your new private npm package

Enter the following command:

npm install @myorg/my-package

Your package.json should now list @myorg/my-package in your dependencies:

{
  "name": "my-app",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "prepare": "npm run co:login",
    "co:login": "aws codeartifact login --tool npm --repository my-repo --domain my-domain",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "dependencies": {
    "@myorg/my-package": "^1.0.0"
  }
}

Using your new npm package

In our my-app application, create a file named index.js to run code from our package containing the following:

const { helloWorld } = require('@myorg/my-package');

helloWorld();

Run node index.js in your terminal to see the console print the message from our @myorg/my-package helloWorld function.

Cleaning Up

If you created a CodeArtifact repository for the purposes of this post, use one of the following methods to delete the repository:

Remove the changes made to your user profile’s npm configuration by running npm config delete registry, this will remove the CodeArtifact repository from being set as your default npm registry.

Conclusion

In this post, you successfully published a private scoped npm package stored in CodeArtifact, which you can reuse across multiple teams and projects within your organization. You can use npm scripts to streamline the authentication process and apply this pattern to save time.

About the Author

Ryan Sonshine is a Cloud Application Architect at Amazon Web Services. He works with customers to drive digital transformations while helping them architect, automate, and re-engineer solutions to fully leverage the AWS Cloud.

Creating multi-architecture Docker images to support Graviton2 using AWS CodeBuild and AWS CodePipeline

2020-11-20 Tyler Lynch

Post Syndicated from Tyler Lynch original https://aws.amazon.com/blogs/devops/creating-multi-architecture-docker-images-to-support-graviton2-using-aws-codebuild-and-aws-codepipeline/

This post provides a clear path for customers who are evaluating and adopting Graviton2 instance types for performance improvements and cost-optimization.

Graviton2 processors are custom designed by AWS using 64-bit Arm Neoverse N1 cores. They power the T4g*, M6g*, R6g*, and C6g* Amazon Elastic Compute Cloud (Amazon EC2) instance types and offer up to 40% better price performance over the current generation of x86-based instances in a variety of workloads, such as high-performance computing, application servers, media transcoding, in-memory caching, gaming, and more.

More and more customers want to make the move to Graviton2 to take advantage of these performance optimizations while saving money.

During the transition process, a great benefit AWS provides is the ability to perform native builds for each architecture, instead of attempting to cross-compile on homogenous hardware. This has the benefit of decreasing build time as well as reducing complexity and cost to set up.

To see this benefit in action, we look at how to build a CI/CD pipeline using AWS CodePipeline and AWS CodeBuild that can build multi-architecture Docker images in parallel to aid you in evaluating and migrating to Graviton2.

Solution overview

With CodePipeline and CodeBuild, we can automate the creation of architecture-specific Docker images, which can be pushed to Amazon Elastic Container Registry (Amazon ECR). The following diagram illustrates this architecture.

Solution overview architectural diagram

The steps in this process are as follows:

Create a sample Node.js application and associated Dockerfile.
Create the buildspec files that contain the commands that CodeBuild runs.
Create three CodeBuild projects to automate each of the following steps:
- CodeBuild for x86 – Creates a x86 Docker image and pushes to Amazon ECR.
- CodeBuild for arm64 – Creates a Arm64 Docker image and pushes to Amazon ECR.
- CodeBuild for manifest list – Creates a Docker manifest list, annotates the list, and pushes to Amazon ECR.
Automate the orchestration of these projects with CodePipeline.

Prerequisites

The prerequisites for this solution are as follows:

The correct AWS Identity and Access Management (IAM) role permissions for your account allowing for the creation of the CodePipeline pipeline, CodeBuild projects, and Amazon ECR repositories
An Amazon ECR repository named multi-arch-test
A source control service such as AWS CodeCommit or GitHub that CodeBuild and CodePipeline can interact with
The source code repository initialized and cloned locally

Creating a sample Node.js application and associated Dockerfile

For this post, we create a sample “Hello World” application that self-reports the processor architecture. We work in the local folder that is cloned from our source repository as specified in the prerequisites.

In your preferred text editor, add a new file with the following Node.js code:


# Hello World sample app.
const http = require('http');

const port = 3000;

const server = http.createServer((req, res) => {
  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain');
  res.end(`Hello World. This processor architecture is ${process.arch}`);
});

server.listen(port, () => {
  console.log(`Server running on processor architecture ${process.arch}`);
});

Save the file in the root of your source repository and name it app.js.
Commit the changes to Git and push the changes to our source repository. See the following code:


git add .
git commit -m "Adding Node.js sample application."
git push

We also need to create a sample Dockerfile that instructs the docker build command how to build the Docker images. We use the default Node.js image tag for version 14.

In a text editor, add a new file with the following code:


# Sample nodejs application
FROM node:14
WORKDIR /usr/src/app
COPY package*.json app.js ./
RUN npm install
EXPOSE 3000
CMD ["node", "app.js"]

Save the file in the root of the source repository and name it Dockerfile. Make sure it is Dockerfile with no extension.
Commit the changes to Git and push the changes to our source repository:


git add .
git commit -m "Adding Dockerfile to host the Node.js sample application."
git push

Creating a build specification file for your application

It’s time to create and add a buildspec file to our source repository. We want to use a single buildspec.yml file for building, tagging, and pushing the Docker images to Amazon ECR for both target native architectures, x86, and Arm64. We use CodeBuild to inject environment variables, some of which need to be changed for each architecture (such as image tag and image architecture).

A buildspec is a collection of build commands and related settings, in YAML format, that CodeBuild uses to run a build. For more information, see Build specification reference for CodeBuild.

The buildspec we add instructs CodeBuild to do the following:

install phase – Update the yum package manager
pre_build phase – Sign in to Amazon ECR using the IAM role assumed by CodeBuild
build phase – Build the Docker image using the Docker CLI and tag the newly created Docker image
post_build phase – Push the Docker image to our Amazon ECR repository

We first need to add the buildspec.yml file to our source repository.

In a text editor, add a new file with the following build specification:


version: 0.2
phases:
    install:
        commands:
            - yum update -y
    pre_build:
        commands:
            - echo Logging in to Amazon ECR...
            - $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION)
    build:
        commands:
            - echo Build started on `date`
            - echo Building the Docker image...          
            - docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG .
            - docker tag $IMAGE_REPO_NAME:$IMAGE_TAG $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG      
    post_build:
        commands:
            - echo Build completed on `date`
            - echo Pushing the Docker image...
            - docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG

Save the file in the root of the repository and name it buildspec.yml.

Because we specify environment variables in the CodeBuild project, we don’t need to hard code any values in the buildspec file.

Commit the changes to Git and push the changes to our source repository:


git add .
git commit -m "Adding CodeBuild buildspec.yml file."
git push

Creating a build specification file for your manifest list creation

Next we create a buildspec file that instructs CodeBuild to create a Docker manifest list, and associate that manifest list with the Docker images that the buildspec file builds.

A manifest list is a list of image layers that is created by specifying one or more (ideally more than one) image names. You can then use it in the same way as an image name in docker pull and docker run commands, for example. For more information, see manifest create.

As of this writing, manifest creation is an experimental feature of the Docker command line interface (CLI).

Experimental features provide early access to future product functionality. These features are intended only for testing and feedback because they may change between releases without warning or be removed entirely from a future release. Experimental features must not be used in production environments. For more information, Experimental features.

When creating the CodeBuild project for manifest list creation, we specify a buildspec file name override as buildspec-manifest.yml. This buildspec instructs CodeBuild to do the following:

install phase – Update the yum package manager
pre_build phase – Sign in to Amazon ECR using the IAM role assumed by CodeBuild
build phase – Perform three actions:
- Set environment variable to enable Docker experimental features for the CLI
- Create the Docker manifest list using the Docker CLI
- Annotate the manifest list to add the architecture-specific Docker image references
post_build phase – Push the Docker image to our Amazon ECR repository and use docker manifest inspect to echo out the contents of the manifest list from Amazon ECR

We first need to add the buildspec-manifest.yml file to our source repository.

In a text editor, add a new file with the following build specification:


version: 0.2
# Based on the Docker documentation, must include the DOCKER_CLI_EXPERIMENTAL environment variable
# https://docs.docker.com/engine/reference/commandline/manifest/    

phases:
    install:
        commands:
            - yum update -y
    pre_build:
        commands:
            - echo Logging in to Amazon ECR...
            - $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION)
    build:
        commands:
            - echo Build started on `date`
            - echo Building the Docker manifest...   
            - export DOCKER_CLI_EXPERIMENTAL=enabled       
            - docker manifest create $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest-arm64v8 $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest-amd64    
            - docker manifest annotate --arch arm64 $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest-arm64v8
            - docker manifest annotate --arch amd64 $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:latest-amd64

    post_build:
        commands:
            - echo Build completed on `date`
            - echo Pushing the Docker image...
            - docker manifest push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME
            - docker manifest inspect $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME

Save the file in the root of the repository and name it buildspec-manifest.yml.
Commit the changes to Git and push the changes to our source repository:


git add .
git commit -m "Adding CodeBuild buildspec-manifest.yml file."
git push

Setting up your CodeBuild projects

Now we have created a single buildspec.yml file for building, tagging, and pushing the Docker images to Amazon ECR for both target native architectures: x86 and Arm64. This file is shared by two of the three CodeBuild projects that we create. We use CodeBuild to inject environment variables, some of which need to be changed for each architecture (such as image tag and image architecture). We also want to use the single Docker file, regardless of the architecture. We also need to ensure any third-party libraries are present and compiled correctly for the target architecture.

For more information about third-party libraries and software versions that have been optimized for Arm, see the Getting started with AWS Graviton GitHub repo.

We use the same environment variable names for the CodeBuild projects, but each project has specific values, as detailed in the following table. You need to modify these values to your numeric AWS account ID, the AWS Region where your Amazon ECR registry endpoint is located, and your Amazon ECR repository name. The instructions for adding the environment variables in the CodeBuild projects are in the following sections.

	Environment Variable	x86 Project values	Arm64 Project values	manifest Project values
1	AWS_DEFAULT_REGION	us-east-1	us-east-1	us-east-1
2	AWS_ACCOUNT_ID	111111111111	111111111111	111111111111
3	IMAGE_REPO_NAME	multi-arch-test	multi-arch-test	multi-arch-test
4	IMAGE_TAG	latest-amd64	latest-arm64v8	latest

The image we use in this post uses architecture-specific tags with the term latest. This is for demonstration purposes only; it’s best to tag the images with an explicit version or another meaningful reference.

CodeBuild for x86

We start with creating a new CodeBuild project for x86 on the CodeBuild console.

CodeBuild looks for a file named buildspec.yml by default, unless overridden. For these first two CodeBuild projects, we rely on that default and don’t specify the buildspec name.

On the CodeBuild console, choose Create build project.
For Project name, enter a unique project name for your build project, such as node-x86.
To add tags, add them under Additional Configuration.
Choose a Source provider (for this post, we choose GitHub).
For Environment image, choose Managed image.
Select Amazon Linux 2.
For Runtime(s), choose Standard.
For Image, choose aws/codebuild/amazonlinux2-x86_64-standard:3.0.

This is a x86 build image.

Select Privileged.
For Service role, choose New service role.
Enter a name for the new role (one is created for you), such as CodeBuildServiceRole-nodeproject.

We reuse this same service role for the other CodeBuild projects associated with this project.

Expand Additional configurations and move to the Environment variables
Create the following Environment variables:

	Name	Value	Type
1	AWS_DEFAULT_REGION	us-east-1	Plaintext
2	AWS_ACCOUNT_ID	111111111111	Plaintext
3	IMAGE_REPO_NAME	multi-arch-test	Plaintext
4	IMAGE_TAG	latest-amd64	Plaintext

Choose Create build project.

Attaching the IAM policy

Now that we have created the CodeBuild project, we need to adjust the new service role that was just created and attach an IAM policy so that it can interact with the Amazon ECR API.

On the CodeBuild console, choose the node-x86 project
Choose the Build details
Under Service role, choose the link that looks like arn:aws:iam::111111111111:role/service-role/CodeBuildServiceRole-nodeproject.

A new browser tab should open.

Choose Attach policies.
In the Search field, enter AmazonEC2ContainerRegistryPowerUser.
Select AmazonEC2ContainerRegistryPowerUser.
Choose Attach policy.

CodeBuild for arm64

Now we move on to creating a new (second) CodeBuild project for Arm64.

On the CodeBuild console, choose Create build project.
For Project name, enter a unique project name, such as node-arm64.
If you want to add tags, add them under Additional Configuration.
Choose a Source provider (for this post, choose GitHub).
For Environment image, choose Managed image.
Select Amazon Linux 2.
For Runtime(s), choose Standard.
For Image, choose aws/codebuild/amazonlinux2-aarch64-standard:2.0.

This is an Arm build image and is different from the image selected in the previous CodeBuild project.

Select Privileged.
For Service role, choose Existing service role.
Choose CodeBuildServiceRole-nodeproject.
Select Allow AWS CodeBuild to modify this service role so it can be used with this build project.
Expand Additional configurations and move to the Environment variables
Create the following Environment variables:

	Name	Value	Type
1	AWS_DEFAULT_REGION	us-east-1	Plaintext
2	AWS_ACCOUNT_ID	111111111111	Plaintext
3	IMAGE_REPO_NAME	multi-arch-test	Plaintext
4	IMAGE_TAG	latest-arm64v8	Plaintext

Choose Create build project.

CodeBuild for manifest list

For the last CodeBuild project, we create a Docker manifest list, associating that manifest list with the Docker images that the preceding projects create, and pushing the manifest list to ECR. This project uses the buildspec-manifest.yml file created earlier.

On the CodeBuild console, choose Create build project.
For Project name, enter a unique project name for your build project, such as node-manifest.
If you want to add tags, add them under Additional Configuration.
Choose a Source provider (for this post, choose GitHub).
For Environment image, choose Managed image.
Select Amazon Linux 2.
For Runtime(s), choose Standard.
For Image, choose aws/codebuild/amazonlinux2-x86_64-standard:3.0.

This is a x86 build image.

Select Privileged.
For Service role, choose Existing service role.
Choose CodeBuildServiceRole-nodeproject.
Select Allow AWS CodeBuild to modify this service role so it can be used with this build project.
Expand Additional configurations and move to the Environment variables
Create the following Environment variables:

	Name	Value	Type
1	AWS_DEFAULT_REGION	us-east-1	Plaintext
2	AWS_ACCOUNT_ID	111111111111	Plaintext
3	IMAGE_REPO_NAME	multi-arch-test	Plaintext
4	IMAGE_TAG	latest	Plaintext

For Buildspec name – optional, enter buildspec-manifest.yml to override the default.
Choose Create build project.

Setting up CodePipeline

Now we can move on to creating a pipeline to orchestrate the builds and manifest creation.

On the CodePipeline console, choose Create pipeline.
For Pipeline name, enter a unique name for your pipeline, such as node-multi-architecture.
For Service role, choose New service role.
Enter a name for the new role (one is created for you). For this post, we use the generated role name CodePipelineServiceRole-nodeproject.
Select Allow AWS CodePipeline to create a service role so it can be used with this new pipeline.
Choose Next.
Choose a Source provider (for this post, choose GitHub).
If you don’t have any existing Connections to GitHub, select Connect to GitHub and follow the wizard.
Choose your Branch name (for this post, I choose main, but your branch might be different).
For Output artifact format, choose CodePipeline default.
Choose Next.

You should now be on the Add build stage page.

For Build provider, choose AWS CodeBuild.
Verify the Region is your Region of choice (for this post, I use US East (N. Virginia)).
For Project name, choose node-x86.
For Build type, select Single build.
Choose Next.

You should now be on the Add deploy stage page.

Choose Skip deploy stage.

A pop-up appears that reads Your pipeline will not include a deployment stage. Are you sure you want to skip this stage?

Choose Skip.
Choose Create pipeline.

CodePipeline immediately attempts to run a build. You can let it continue without worry if it fails. We are only part of the way done with the setup.

Adding an additional build step

We need to add the additional build step for the Arm CodeBuild project in the Build stage.

On the CodePipeline console, choose node-multi-architecture pipeline
Choose Edit to start editing the pipeline stages.

You should now be on the Editing: node-multi-architecture page.

For the Build stage, choose Edit stage.
Choose + Add action.

Editing node-multi-architecture

For Action name, enter Build-arm64.
For Action provider, choose AWS CodeBuild.
Verify your Region is correct.
For Input artifacts, select SourceArtifact.
For Project name, choose node-arm64.
For Build type, select Single build.
Choose Done.
Choose Save.

A pop-up appears that reads Saving your changes cannot be undone. If the pipeline is running when you save your changes, that execution will not complete.

Choose Save.

Updating the first build action name

This step is optional. The CodePipeline wizard doesn’t allow you to enter your Build action name during creation, but you can update the Build stage’s first build action to have consistent naming.

Choose Edit to start editing the pipeline stages.
Choose the Edit icon.
For Action name, enter Build-x86.
Choose Done.
Choose Save.

A pop-up appears that says Saving your changes cannot be undone. If the pipeline is running when you save your changes, that execution will not complete.

Choose Save.

Adding the project

Now we add the CodeBuild project for manifest creation and publishing.

On the CodePipeline console, choose node-multi-architecture pipeline.
Choose Edit to start editing the pipeline stages.
Choose +Add stage below the Build
Set the Stage name to Manifest
Choose +Add action group.
For Action name, enter Create-manifest.
For Action provider, choose AWS CodeBuild.
Verify your Region is correct.
For Input artifacts, select SourceArtifact.
For Project name, choose node-manifest.
For Build type, select Single build.
Choose Done.
Choose Save.

A pop-up appears that reads Saving your changes cannot be undone. If the pipeline is running when you save your changes, that execution will not complete.

Choose Save.

Testing the pipeline

Now let’s verify everything works as planned.

In the pipeline details page, choose Release change.

This runs the pipeline in stages. The process should take a few minutes to complete. The pipeline should show each stage as Succeeded.

Pipeline visualization

Now we want to inspect the output of the Create-manifest action that runs the CodeBuild project for manifest creation.

Choose Details in the Create-manifest

This opens the CodeBuild pipeline.

Under Build logs, we should see the output from the manifest inspect command we ran as the last step in the buildspec-manifest.yml See the following sample log:


[Container] 2020/10/07 16:47:39 Running command docker manifest inspect $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1369,
         "digest": "sha256:238c2762212ff5d7e0b5474f23d500f2f1a9c851cdd3e7ef0f662efac508cd04",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1369,
         "digest": "sha256:0cc9e96921d5565bdf13274e0f356a139a31d10e95de9ad3d5774a31b8871b05",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      }
   ]
}

Cleaning up

To avoid incurring future charges, clean up the resources created as part of this post.

On the CodePipeline console, choose the pipeline node-multi-architecture.
Choose Delete pipeline.
When prompted, enter delete.
Choose Delete.
On the CodeBuild console, choose the Build project node-x86.
Choose Delete build project.
When prompted, enter delete.
Choose Delete.
Repeat the deletion process for Build projects node-arm64 and node-manifest.

Next we delete the Docker images we created and pushed to Amazon ECR. Be careful to not delete a repository that is being used for other images.

On the Amazon ECR console, choose the repository multi-arch-test.

You should see a list of Docker images.

Select latest, latest-arm64v8, and latest-amd64.
Choose Delete.
When prompted, enter delete.
Choose Delete.

Finally, we remove the IAM roles that we created.

On the IAM console, choose Roles.
In the search box, enter CodePipelineServiceRole-nodeproject.
Select the role and choose Delete role.
When prompted, choose Yes, delete.
Repeat these steps for the role CodeBuildServiceRole-nodeproject.

Conclusion

To summarize, we successfully created a pipeline to create multi-architecture Docker images for both x86 and arm64. We referenced them via annotation in a Docker manifest list and stored them in Amazon ECR. The Docker images were based on a single Docker file that uses environment variables as parameters to allow for Docker file reuse.

For more information about these services, see the following:

About the Authors

Tyler Lynch photo

Tyler Lynch
Tyler Lynch is a Sr. Solutions Architect focusing on EdTech at AWS.

Alistair McLean photo

Alistair McLean

Alistair is a Principal Solutions Architect focused on State and Local Government and K12 customers at AWS.

Field Notes: Automating Migration Requests for Reserved Instances and Savings Plans in Closed Accounts

2020-11-19 Daniel Li

Post Syndicated from Daniel Li original https://aws.amazon.com/blogs/architecture/field-notes-automating-migration-requests-for-reserved-instances-and-savings-plans-in-closed-accounts/

Enterprise AWS customers are often managing many accounts under a payer account, and sometimes accounts are closed before Reserved Instances (RI) or Savings Plans (SP) are fully used. Manually tracking account closures and requesting RI and SP migration from the closed accounts can become complex and error prone.

This blog post describes a solution for automating the process of requesting migration of unused RI and SP from closed accounts that are linked to a payer account. The solution helps reduce manual work and loss of unused RI and SP due to human error.

The solution automatically detects newly closed accounts that are linked to a payer account. If the closed accounts have unused RI and SP attached, it creates support cases with the required information for RI and SP migration. It can optionally send email notifications with the support case IDs. After a support case is created, the AWS Support team starts a workflow for processing the request, and updates the support case when the request is completed.

Solution Overview

The following diagram shows the overview of the solution.

Walkthrough

An Amazon CloudWatch Events time-based rule will create an event with the configured interval, such as one hour.
The event triggers an AWS Lambda function (refresh.js), which will retrieve the current status of all the linked accounts for the payer account from AWS Organizations. It saves the account status as an item in an Amazon DynamoDB table.
The DynamoDB table has Streams enabled and it triggers another Lambda function (process.js) when there are updates in the table. The stream view type is set to NEW_AND_OLD_IMAGES.
The Lambda function performs the following steps:
- Compares the differences between the new and the old images in the stream record. If an account’s status is SUSPENDED in the new image and is ACTIVE in the old image, the account is newly closed.
- Retrieves the RI and SP utilization information from AWS Cost Explorer for the closed account.
- If there are unused RI and SP in the account, it creates support cases to request RI and SP migration and/or send email notifications through Amazon SNS.

Next, let’s take a deeper look at the Lambda functions, which were written in JavaScript. The code for the Lambda functions can be downloaded from this repository.

Lambda function: refresh.js

This Lambda function is triggered by a CloudWatch Events rule at the predefined interval, for example, one hour. It retrieves the account statuses of all the linked accounts under a payer account from AWS Organizations. Then it saves the account statuses to a DynamoDB table.

It calls the AWS Organizations ListAccounts API to get the current status of all the linked accounts for the given payer account. The following sample code is for the ListAccounts API call.

const AWS = require('aws-sdk')
// Environment variable
const organizationsEndpoint = process.env.ORGANIZATIONS_ENDPOINT; 
const orgEndpoint = new AWS.Endpoint(organizationsEndpoint);
// Get region from the endpoint hostname, e.g. organizations.us-east-1.amazonaws.com
const orgRegion = organizationsEndpoint.split(".")[1];

const organizations = new AWS.Organizations({
    endpoint: orgEndpoint,
    region: orgRegion 
});

// Call the ListAccounts API to retrieve the status for all the linked accounts
function refreshAccounts() {
    console.log("Enter refreshAccounts");

    var newData = {Accounts:[]};
    
    var handleError = function(response) {
        console.log(response.err, response.err.stack); // an error occurred
        throw response.err;
    };
    
    var params = {};
    organizations.listAccounts(params).on('success', function handlePage(response) {
        console.log("ListAccounts response: \n" + JSON.stringify(response.data));    
        var account;
            
        // extract each account's Id and Status 
        for (account of response.data.Accounts) {
            var accountStatus = {Id: account.Id, Status: account.Status};
            newData.Accounts.push(accountStatus);
        }
    
        // handle response pagination
        if (response.hasNextPage()) {
            response.nextPage().on('success', handlePage).on('error', handleError)
            .send();
        } else {
            console.log("Account Status: \n" + JSON.stringify(newData)); 
            // insert code here to save the data to DynamoDB
        }
    }).on('error', handleError).send();
}

The following sample output is from the ListAccounts API call.

{
  Accounts: [
    {
      Id: '123456789000',
      Arn: 'arn:aws:organizations::123456789000:account/o-x3ub1dr53f/123456789000',
      Email: '[email protected]',
      Name: 'Payer',
      Status: 'ACTIVE',
      JoinedMethod: 'INVITED',
      JoinedTimestamp: 2020-01-09T06:49:36.035Z
    },
    {
      Id: '987654321000',
      Arn: 'arn:aws:organizations::123456789000:account/o-x3ub1dr53f/987654321000',
      Email: '[email protected]',
      Name: 'Linked Account 1',
      Status: 'SUSPENDED',
      JoinedMethod: 'INVITED',
      JoinedTimestamp: 2020-08-11T23:46:18.560Z
    }
  ]
}

From the API output, the Lambda function creates a smaller JSON with only the account ID and status information.

{
    "Accounts": [
        {
            "Id": "123456789000",
            "Status": "ACTIVE"
        },
        {
            "Id": "987654321000",
            "Status": "SUSPENDED"
        }
    ]
}

Next, it calls the DynamoDB UpdateItem API to save the current account status data as an item in the DynamoDB table. The API is configured with conditional update.

If the DynamoDB table already has an item with the same key, the item is not updated unless there is a different value. When the table is updated, DynamoDB Streams triggers a call to the next Lambda function. Below is the sample code for the UpdateItem API call.

const tableName = process.env.TABLE_NAME; // environment variable
const dynamodb = new AWS.DynamoDB();

// Save the account status JSON to the DynamoDB table 
// if it is different than the existing data in the table
function saveDataToDynamoDB(data) {
    console.log("Enter saveDataToDynamoDB");

    var params = {
        ExpressionAttributeNames: {
            "#ACC": "Accounts"
        }, 
        ExpressionAttributeValues: {
            ":t": {
                S: data
            } 
        }, 
        Key: {
            "Organization": {
                S: "default"
            }
        }, 
        ConditionExpression: "#ACC <> :t",
        ReturnValues: "ALL_NEW", 
        TableName: tableName, 
        UpdateExpression: "SET #ACC = :t"
    };
    
    dynamodb.updateItem(params, function(err, data) {
        console.log("DynamoDB callback...");  
        if (err) {
            console.log(err, err.stack); // an error occurred
        
            if (err.code != "ConditionalCheckFailedException") {
                throw err;
            }
        } else {
            console.log(data);           // successful response
        }
    });
}

Lambda function: process.js

This Lambda function processes DynamoDB stream records and compares the old and new item data in the stream record. If any of the linked accounts changed statuses to SUSPENDED from ACTIVE, it calls the Cost Explorer APIs – GetReservationUtilization API and GetSavingsPlansUtilizationDetails API– to check if the newly closed accounts have unused RI and SP attached. The following sample code is for the GetReservationUtilization API call.

const costexplorerEndpoint = process.env.COST_EXPLORER_ENDPOINT; // environment variable
const ceEndpoint = new AWS.Endpoint(costexplorerEndpoint);
// Get region from the endpoint hostname, e.g. ce.us-east-1.amazonaws.com
const ceRegion = costexplorerEndpoint.split(".")[1];

const costexplorer = new AWS.CostExplorer({
    endpoint:ceEndpoint,
    region:ceRegion
});

// Check if the source account has unused RI. If yes, send email notification 
// or create support case based on ACTION_TYPE
// Params
//   startDate: String with 'YYYY-MM-DD' format
//   endDate: String with 'YYYY-MM-DD' format
//   sourceAccountId: the AWS account to migrate RI from
//   destinationAccountId: the AWS account to migrate RI to
function processReservationUtilization(startDate,endDate,
            sourceAccountId,destinationAccountId) {
  
  console.log("Enter processReservationUtilization");
  
  // array to hold all the unused RI lease IDs from the source account
  var riLeaseIds = []; 
  
  const now = new Date();

  var params = {
    TimePeriod: {
      End: endDate, 
      Start: startDate
    },
    Filter: { 
      "Dimensions": {
          "Key": "LINKED_ACCOUNT",
          "Values": [
              sourceAccountId
          ]
      }
    },
    GroupBy: [
      {
        Key: 'SUBSCRIPTION_ID',
        Type: "DIMENSION"
      }
    ]
  };
    
  var handleError = function(response) {
      console.log(response.err, response.err.stack); // an error occurred
      throw response.err;
  };

  costexplorer.getReservationUtilization(params)
    .on('success', function handlePage(response) {
    console.log("GetReservationUtilization response \n" + response.data); 
    
    var utilizationsByTime;
    var group;
    
    // Find unused RI in the results
    for (utilizationsByTime of response.data.UtilizationsByTime) {
      for (group of utilizationsByTime.Groups) {
        const endDateTime = new Date(group.Attributes.endDateTime);
        if (endDateTime.getTime() > now.getTime()) {
          riLeaseIds.push(group.Attributes.leaseId);
        }        
      }
    }
    
    // Handle response pagination
    if (response.hasNextPage()) {
        response.nextPage().on('success', handlePage)
          .on('error', handleError).send();
    } else if (riLeaseIds.length > 0) {
      var requestBody = riRequestBody + "\nSource Account: " + sourceAccountId 
        + "\nDestination Account: " + destinationAccountId + "\nSavings Plans Arns: " + JSON.stringify(riLeaseIds);
      // Insert code here to create support case and/or send message to SNS
    }      
  }).on('error', handleError).send();
}

The following sample code is for the GetSavingsPlansUtilizationDetails API call.

// Check if the source account has unused SP. If yes, send email notification 
// or create support case based on ACTION_TYPE
// Params
//   startDate: String with 'YYYY-MM-DD' format
//   endDate: String with 'YYYY-MM-DD' format
//   sourceAccountId: the AWS account to migrate SP from
//   destinationAccountId: the AWS account to migrate SP to
function processSavingsPlansUtilization(startDate,endDate,
        sourceAccountId,destinationAccountId) {
  
  console.log("Enter processSavingsPlansUtilization");
  
  //array to hold all the unused SP ARNs from the source account
  var savingsPlansArns = []; 
  
  const now = new Date();  
  
  var params = {
    TimePeriod: { /* required */
      End: endDate, 
      Start: startDate    
    },
    Filter: { 
      "Dimensions": {
          "Key": "LINKED_ACCOUNT",
          "Values": [
            sourceAccountId
          ]
      }
    }
  };

  var handleError = function(response) {
      console.log(response.err, response.err.stack); // an error occurred
      throw response.err;
  };

  costexplorer.getSavingsPlansUtilizationDetails(params).on('success', function handlePage(response) {
    console.log("getSavingsPlansUtilizationDetails response \n" + response.data);  

    var utilizationDetail;
    
    // Find unused SP in the results
    for (utilizationDetail of response.data.SavingsPlansUtilizationDetails) {
      const endDateTime = new Date(utilizationDetail.Attributes.EndDateTime);
      if (endDateTime.getTime() > now.getTime()) {
        savingsPlansArns.push(utilizationDetail.SavingsPlanArn);
      }
    }

    // handle response pagination
    if (response.hasNextPage()) {
        response.nextPage().on('success', handlePage).on('error', handleError).send();
    } else if (savingsPlansArns.length > 0) {
      var requestBody = spRequestBody + "\nSource Account: " + sourceAccountId + "\nDestination Account: " + destinationAccountId + "\nSavings Plans Arns: " + JSON.stringify(savingsPlansArns);
      // insert code here to create support case and/or send message to SNS
    }
  }).on('error', handleError).send();
}

For those accounts that require RI or SP migration, it calls the AWS Support CreateCase API to create support cases. It can either/or send email notifications based on the value of the environment variable ACTION_TYPE.

EMAIL_ONLY: Send the RI and SP details to an SNS topic for email notification.
SUPPORT_CASE_ONLY: Create support cases to request RI and SP migration.
SUPPORT_CASE_AND_EMAIL: Create support cases to request RI and SP migration and send the request details to an SNS topic for email notification. The support case IDs are included in the email subject.

The following sample code is for the CreateCase API call.

const supportEndpoint = process.env.SUPPORT_ENDPOINT; // environment variable
const supEndpoint = new AWS.Endpoint(supportEndpoint);
// get region from the endpoint hostname, e.g. support.us-east-1.amazonaws.com
const supRegion = supportEndpoint.split(".")[1];

const support = new AWS.Support({
    endpoint:supEndpoint,
    region:supRegion
});

// Create an AWS Support Case
// Params:
//   emailAddress: email address for the support case
//   caseSubject: the subject of the support case
//   caseBody: the body of the support case
function createSupportCase(emailAddresses, caseSubject, caseBody, notify) {
  
  console.log("Enter createSupportCase");
  
  var params = {
    communicationBody: caseBody, /* required */
    subject: caseSubject, /* required */
    categoryCode: 'reserved-instances',
    ccEmailAddresses: [
      emailAddresses
    ],
    issueType: 'customer-service',
    serviceCode: 'billing',
    severityCode: 'low'
  };
  
  support.createCase(params, function(err, data) {
    console.log("Support callback...");  
    
    if (err) {
      console.log(err, err.stack); // an error occurred
      throw err;
    } else {
      console.log(data);           // successful response
      if (notify) {
        // insert code here to publish the message to a SNS topic
      }
    }
  });
}

The following sample support case is content from the Lambda function:

Subject:

Savings Plans Migration

Body:

Please migrate the following Savings Plans from the source account to the destination account.

Source Account: 987654321000

Destination Account: 123456789000

Savings Plans Arns: [“arn:aws:savingsplans::987654321000:savingsplan/bb242919-11ba-429c-b1dc-101010101010”]

AWS Support case guidelines

The AWS RI Operations Team has provided the following guidelines for RI and SP migration support cases.

1) If a closed account is still linked to a payer account, create one support case from the payer account. Otherwise, create one support case from each account – source and destination.

2) When creating a support case, use a user ID or a role with RI and SP purchase IAM permissions.

3) In the support case body, provide RI lease IDs or SP ARNs, source account ID, destination account ID, and an explicit request to migrate the RI and SP. If there are many RI lease IDs or SP ARNs, provide the data in CSV format.

Note: Do not use PDF or images.

4) Set severity code to low, service code to billing, category code to reserved-instances, and issue type to customer-service.

Deploy the stack

Download the template for deploying the solution used in this post.

Use this template to launch the solution and all associated components. The default configuration deploys AWS Lambda functions, a DynamoDB table, an Amazon CloudWatch Events rule, an SNS topic, and AWS Identity and Access Management (IAM) roles necessary to set up the solution in your account, but you can customize the template to meet your specific needs.

Before you launch the solution, review the architecture, configuration, network security, and other considerations. Follow the step-by-step instructions in this section to configure and deploy the solution into your account. You need to have authority to launch resources in the payer account before launching the stack.

Note: You are responsible for the cost of the AWS services used while running this solution. For more details, refer to the pricing web page for each AWS service used in this solution.

Time to deploy: Approximately five minutes.

1. Sign in to the AWS Management Console and use the following button to launch the AWS CloudFormation stack. Optionally, you can download the template as a starting point for your own implementation. Note: the CloudFormation stack needs to be created in the payer account.

2. The template launches in the US East (N. Virginia) Region by default. You can choose to launch it in other regions.

3. On the Create stack page, verify that the correct template URL is in the Amazon S3 URL text box and choose Next.

4. On the Specify stack details page, assign a name to your solution stack.

5. Under Parameters, review the parameters for this solution template and modify them as necessary. This solution uses the following default values.

Recommended documentation for reviewing these parameters:

6. Choose Next.

7. On the Configure stack options page, choose Next.

8. On the Review page, review and confirm the settings. Check the box acknowledging that the template will create AWS Identity and Access Management (IAM) resources.

9. Choose Create stack to deploy the stack.

You can view the status of the stack in the AWS CloudFormation console in the Status column. You should receive a CREATE_COMPLETE status in approximately five minutes.

Clean Up

To clean up the stack, select the stack in the AWS CloudFormation console and click the Delete button. In the popup window, choose Delete stack.

Conclusion

This post described a solution for automating the process of detecting account closures and requesting RI and SP migration. The solution will help enterprise customers reduce manual work and avoid losing unused RI and SP due to human errors.

If you have any comments or questions, please don’t hesitate to leave them in the comments section.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Restricting Amazon WorkSpaces Users to Run Amazon Athena Queries

2020-11-18 Somdeb Bhattacharjee

Post Syndicated from Somdeb Bhattacharjee original https://aws.amazon.com/blogs/architecture/field-notes-restricting-amazon-workspaces-users-to-run-amazon-athena-queries/

One of the use cases we hear from customers is that they want to provide very limited access to Amazon Workspaces users (for example contractors, consultants) in an AWS account. At the same time they want to allow them to query Amazon Simple Storage Service (Amazon S3) data in another account using Amazon Athena over a JDBC connection.

For example, marketing companies might provide private access to the first party data to media agencies through this mechanism.

The restrictions they want to put in place are:

For security reasons these Amazon WorkSpaces should not have internet connectivity. So the access to Amazon Athena must be over AWS PrivateLink.
Public access to Athena is not allowed using the credentials used for the JDBC connection. This is to prevent the users from leveraging the credentials to query the data from anywhere else.

In this post, we show how to use Amazon Virtual Private Cloud (Amazon VPC) endpoints for Athena, along with AWS Identity and Access Management (AWS IAM) policies. This provides private access to query the Amazon S3 data while preventing users from querying the data from outside their Amazon WorkSpaces or using the Athena public endpoint.

Let’s review the steps to achieve this:

Initial setup of two AWS accounts (AccountA and AccountB)
Set up Amazon S3 bucket with sample data in AccountA
Set up an IAM user with Amazon S3 and Athena access in AccountA
Create an Amazon VPC endpoint for Athena in AccountA
Set up Amazon WorkSpaces for a user in AccountB
Install a SQL client tool (we will use DbVisualizer Free) and Athena JDBC driver in Amazon WorkSpaces in AccountB
Use DbVisualizer to the query the Amazon S3 data in AccountA using the Athena public endpoint
Update IAM policy for user in AccountA to restrict private only access

Prerequisites

To follow the steps in this post, you need two AWS Accounts. The Amazon VPC and subnet requirements are specified in the detailed steps.

Note: The AWS CloudFormation template used in this blog post is for US-EAST-1 (N. Virginia) Region so ensure the Region setting for both the accounts are set to US-EAST-1 (N. Virginia).

Walkthrough

The two AWS accounts are:

AccountA – Contains the Amazon S3 bucket where the data is stored. For AccountA you can create a new Amazon VPC or use the default Amazon VPC.

AccountB – Amazon WorkSpaces account. Use the following AWS CloudFormation template for AccountB:

The AWS CloudFormation template will create a new Amazon VPC in AccountB with CIDR 10.10.0.0/16 and set up one public subnet and two private subnets.
It will also create a NAT Gateway in the public subnet and create both public and private route tables.
Since we will be launching Amazon WorkSpaces in these private subnets and not all Availability Zones (AZ) are supported by Amazon WorkSpaces, it is important to choose the right AZ when creating them.

Review the documentation to learn which AWS Regions/AZ are supported.

We have provided two parameters in the AWS CloudFormation template:

AZName1
AZName2

Step 1

Before launching the CloudFormation stack:

Log in to AccountB
Search for AWS Resource Access Manager
On the right-hand side, you will notice the AZ ID to AZ Name mapping. Note down the AZ Name corresponding to AZ ID use1-az2 and use1-az4
Now launch the CloudFormation template and remember to choose the AZ names you noted down earlier
- https://athena-workspaces-blogpost.s3.amazonaws.com/vpc.yaml
Enter the CloudFormation Stack Name as – ‘AthenaWorkspaces’ and leave everything default.
Once the CloudFormation stack creation is complete, create a peering connection from AccountB to AccountA.
Update the associated route tables for the private subnets with the new peering connection.

For information on how to create a VPC peering connection, refer to AWS documentation on VPC Peering.

AccountB VPC Route Table:

AccountA VPC Route Table:

AccountA VPC Route Table

Step 2

Create a new Amazon S3 bucket in AccountA with a bucket name that starts with ‘athena-’.
Next, you can download a sample file and upload it to the Amazon S3 bucket you just created.
Use the following statements to create AWS Glue database. Use an external table for the data in the Amazon S3 bucket so that you can query it from Athena.
Go to Athena console and define a new database:

CREATE DATABASE IF NOT EXISTS sampledb

Once the database is created, create a new table in sampledb (by selecting sampledb from the “Database” drop down menu). Replace the <<your bucket name>> with the bucket you just created:

CREATE EXTERNAL TABLE IF NOT EXISTS sampledb.amazon_reviews_tsv(
  marketplace string, 
  customer_id string, 
  review_id string, 
  product_id string, 
  product_parent string, 
  product_title string,
  product_category string,
  star_rating int, 
  helpful_votes int, 
  total_votes int, 
  vine string, 
  verified_purchase string, 
  review_headline string, 
  review_body string, 
  review_date string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  ESCAPED BY '\\'
  LINES TERMINATED BY '\n'
LOCATION
  's3://<<your bucket name>>/'
TBLPROPERTIES ("skip.header.line.count"="1")

Step 3

In AccountA, create a new IAM user with programmatic access.
Save the access key and secret access key.
For the same user add an Inline Policy which allows the following actions:

IAM summary

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAthenaReadActions",
            "Effect": "Allow",
            "Action": [
                "athena:ListWorkGroups",
                "athena:ListDataCatalogs",
                "athena:GetExecutionEngine",
                "athena:GetExecutionEngines",
                "athena:GetNamespace",
                "athena:GetCatalogs",
                "athena:GetNamespaces",
                "athena:GetTables",
                "athena:GetTable"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowAthenaWorkgroupActions",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryResults",
                "athena:DeleteNamedQuery",
                "athena:GetNamedQuery",
                "athena:ListQueryExecutions",
                "athena:StopQueryExecution",
                "athena:GetQueryResultsStream",
                "athena:ListNamedQueries",
                "athena:CreateNamedQuery",
                "athena:GetQueryExecution",
                "athena:BatchGetNamedQuery",
                "athena:BatchGetQueryExecution",
                "athena:GetWorkGroup"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowGlueActionsViaVPCE",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateDatabase",
                "glue:GetTables",
                "glue:GetTable"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowGlueActionsViaAthena",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateDatabase",
                "glue:GetTables",
                "glue:GetTable"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowS3ActionsViaAthena",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload",
                "s3:CreateBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::athena-*"
            ]
        }
    ]
}

Step 4

In this step, we create an Interface VPC endpoint (AWS PrivateLink) for Athena in AccountA. When you use an interface VPC endpoint, communication between your Amazon VPC and Athena is conducted entirely within the AWS network.
Each VPC endpoint is represented by one or more Elastic Network Interfaces (ENIs) with private IP addresses in your VPC subnets.
To create an Interface VPC endpoint follow the instructions and select Athena in the AWS Services list. Do not select the checkbox for Enable Private DNS Name.
Ensure the security group that is attached to the Amazon VPC endpoint is open to inbound traffic on port 443 and 444 for source AccountB VPC CIDR 10.10.0.0/16. Port 444 is used by Athena to stream query results.
Once you create the VPC endpoint, you will get a DNS endpoint name which is in the following format. We are going to use this in JDBC connection from the SQL client.

VPC_Endpoint_ID.athena.Region.vpce.amazonaws.com

Step 5

In this step we set up Amazon WorkSpaces in AccountB.
Each Amazon WorkSpace is associated with the specific Amazon VPC and AWS Directory Service construct that you used to create it. All Directory Service constructs (Simple AD, AD Connector, and Microsoft AD) require two subnets to operate, each in different Availability Zones. This is why we created 2 private subnets at the beginning.
For this blog post I have used Simple AD as the directory service for the Amazon WorkSpaces.
By default, IAM users don’t have permissions for Amazon WorkSpaces resources and operations.
To allow IAM users to manage Amazon WorkSpaces resources, you must create an IAM policy that explicitly grants them permissions
Then attach the policy to the IAM users or groups that require those permissions.
To start, go to the Amazon WorkSpaces console and select Advanced Setup.
- Set up a new directory using the SimpleAD option.
- Use the “small” directory size and choose the Amazon VPC and private subnets you created in Step 1 for AccountB.
- Once you create the directory, register the directory with Amazon WorkSpaces by selecting “Register” from the “Action” menu.
- Select private subnets you created in Step 1 for AccountB.

Directory info

Next, launch Amazon WorkSpaces by following the Launch WorkSpaces button.
Select the directory you created and create a new user.
For the bundle, choose Standard with Windows 10 (PCoIP).
After the Amazon WorkSpaces is created, you can log in to the Amazon WorkSpaces using a client software. You can download it from https://clients.amazonworkspaces.com/
Login to your Amazon WorkSpace, install a SQL Client of your choice. At this point your Amazon WorkSpace still has Internet access via the NAT Gateway
I have used DbVisualizer (the free version) as the SQL client. Once you have that installed, install the JDBC driver for Athena following the instructions
Now you can set up the JDBC connections to Athena using the access key and secret key you set up for an IAM user in AccountA.

Step 6

To test out both the Athena public endpoint and the Athena VPC endpoint, create two connections using the same credentials.

For the Athena public endpoint, you need to use athena.us-east-1.amazonaws.com service endpoint. (jdbc:awsathena://athena.us-east-1.amazonaws.com:443;S3OutputLocation=s3://<athena-bucket-name>/)

Athena public

For the VPC Endpoint Connection, use the VPC Endpoint you created in Step 4 (jdbc:awsathena://vpce-<>.athena.us-east-1.vpce.amazonaws.com:443;S3OutputLocation=s3://<athena-bucket-name>/)

Database connection Athena

Now run a simple query to select records from the amazon_reviews_tsv table using both the connections.

SELECT * FROM sampledb.amazon_reviews_tsv limit 10

You should be able to see results using both the connections. Since the private subnets are still connected to the internet via the NAT Gateway, you can query using the Athena public endpoint.

Run the AWS Command Line Interface (AWS CLI) command using the credentials used for the JDBC connection from your workstation. You should be able to access the Amazon S3 bucket objects and the Athena query run list using the following commands.

aws s3 ls s3://athena-workspaces-blogpost

aws athena list-query-executions

Step 7

Now we lock down the access as described in the beginning of this blog post by taking the following actions:
Update the route table for the private subnets by removing the route for the internet so access to the Athena public endpoint is restricted from the Amazon WorkSpaces. The only access will be allowed through the Athena VPC Endpoint.
Add conditional checks to the IAM user access policy that will restrict access to the Amazon S3 buckets and Athena only if:
- The request came in through the VPC endpoint. For this we use the “aws:SourceVpce” check and provide the VPC Endpoint ID value.
- The request for Amazon S3 data is through Athena. For this we use the condition “aws:CalledVia” and provide a value of “athena.amazonaws.com”.
In the IAM access policy below replace <<your vpce id>> with your VPC endpoint id and update the previous inline policy which was added to the IAM user in Step 3.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAthenaReadActions",
            "Effect": "Allow",
            "Action": [
                "athena:ListWorkGroups",
                "athena:ListDataCatalogs",
                "athena:GetExecutionEngine",
                "athena:GetExecutionEngines",
                "athena:GetNamespace",
                "athena:GetCatalogs",
                "athena:GetNamespaces",
                "athena:GetTables",
                "athena:GetTable"
            ],
            "Resource": "*",
            "Condition":{
               "StringEquals":{
                  "aws:SourceVpce":[
                     "<<your vpce id>>"
                  ]
               }
            }
        },
        {
            "Sid": "AllowAthenaWorkgroupActions",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryResults",
                "athena:DeleteNamedQuery",
                "athena:GetNamedQuery",
                "athena:ListQueryExecutions",
                "athena:StopQueryExecution",
                "athena:GetQueryResultsStream",
                "athena:ListNamedQueries",
                "athena:CreateNamedQuery",
                "athena:GetQueryExecution",
                "athena:BatchGetNamedQuery",
                "athena:BatchGetQueryExecution",
                "athena:GetWorkGroup"
            ],
            "Resource": "*",
            "Condition":{
               "StringEquals":{
                  "aws:SourceVpce":[
                     "<<your vpce id>>"
                  ]
               }
            }
        },
        {
            "Sid": "AllowGlueActionsViaVPCE",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateDatabase",
                "glue:GetTables",
                "glue:GetTable"
            ],
            "Resource": "*",
            "Condition":{
               "StringEquals":{
                  "aws:SourceVpce":[
                     "<<your vpce id>>"
                  ]
               }
            }
        },
        {
            "Sid": "AllowGlueActionsViaAthena",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateDatabase",
                "glue:GetTables",
                "glue:GetTable"
            ],
            "Resource": "*",
            "Condition":{
               "ForAnyValue:StringEquals":{
                  "aws:CalledVia":[
                     "athena.amazonaws.com"
                  ]
               }
            }
        },
        {
            "Sid": "AllowS3ActionsViaAthena",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload",
                "s3:CreateBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::athena-*"
            ],
            "Condition":{
               "ForAnyValue:StringEquals":{
                  "aws:CalledVia":[
                     "athena.amazonaws.com"
                  ]
               }
            }
        }
    ]
}

Once you applied the changes, try to reconnect using both the Athena VPC endpoint as well Athena public endpoint connections. The Athena VPC endpoint connection should work but the public endpoint connection will time out. Also try the same Amazon S3 and Athena AWS CLI commands. You should get access denied for both the operations.

Clean Up

To avoid incurring costs, remember to delete the resources that you created.

For AWS AccountA:

Delete the S3 buckets
Delete the database you created in AWS Glue
Delete the Amazon VPC endpoint you created for Amazon Athena

For AccountB:

Delete the Amazon Workspace you created along with the Simple AD directory. You can review more information on how to delete your Workspaces.

Conclusion

In this blog post, I showed how to leverage Amazon VPC endpoints and IAM policies to privately connect to Amazon Athena from Amazon Workspaces that don’t have internet connectivity.

Give this solution a try and share your feedback in the comments!

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Managing your AWS Outposts capacity using Amazon CloudWatch and AWS Lambda

2020-11-17 Shubha Kumbadakone

Post Syndicated from Shubha Kumbadakone original https://aws.amazon.com/blogs/compute/managing-your-aws-outposts-capacity-using-amazon-cloudwatch-and-aws-lambda/

This post is authored by Carlos Castro, AWS Principal Solutions Architect and Kevin Wang, AWS Associate Solutions Architect

Customers are excited about using AWS Outposts to bring services and capabilities available on AWS to on-premises for workloads that require low-latency, local data processing, or local data storage. As part of this, the responsibility for managing capacity shifts from the AWS to the customer. Customers can purchase AWS Outposts configurations from a broad selection of instance types based on the use cases they plan to run on AWS Outposts. When additional capacity is needed, customers can expand their AWS Outposts. When operating in an AWS Region, customers don’t have to worry about redundant capacity. However, with AWS Outposts, customers must ensure that there is sufficient compute and storage capacity to restore services in the event of a hardware or software failure. This allows customers to meet the business continuity and disaster recovery (BCDR) requirements for the workloads that run on their Outpost.

This post shows how you can combine Region-based AWS services for a common management task, ensuring there is available capacity to support an N+M high availability model, where N is the required capacity and M is the spare capacity designed into the solution. I monitor the Amazon EC2 and Amazon EBS capacity on an Outpost and show how to automate governance to ensure that there are available resources to recover from failures. With this, customers can create a “hot-spare” failover mechanism.

Overview of the solution

I use integrations between services in the AWS Outposts’ home Region with the services deployed on the Outpost to accomplish the goal. AWS Outposts provides available and utilized capacity metrics in Amazon CloudWatch. It is a best practice for administrators to create Amazon CloudWatch Events to monitor capacity. For this use case, events trigger fan-out actions to inform (via Amazon Simple Notification Service) and act (via an AWS Lambda function) on the condition. The action from an event can include triggering a workflow to stop low-priority resources, restricting the use of remaining capacity to a subset of the administrators or sending a notification to the appropriate team to plan adding incremental capacity. In this case, I act on the event by updating the permissions policy associated with administrators in AWS Identity and Access Management (IAM). If the threshold has been exceeded, I remove the capability to create more resources by updating an AWS IAM policy. This action allows me to reserve that capacity in the event of a failure. Once the threshold normalizes, I return permissions to the administrators. AWS administrators have the flexibility of assigning the IAM policies you control with the solution to selected IAM users, groups, or roles. For example, application system administrators can be assigned the policies with dynamic resource creation permissions while AWS Outposts infrastructure administrators maintain full access to creating resources.

The following image represents this flow.

Solution Architecture Diagram and Workflow

Tutorial

Now, you are ready to set up this environment. First, create resources using the AWS Management Console and AWS CloudFormation. You can also use the AWS Command Line Interface (CLI) or your preferred infrastructure as code framework.

The suggested sequence of steps is as follows:

Create AWS IAM policies
Launch AWS CloudFormation stack
Review created resources
Validate your work

Prerequisites

For this walkthrough, you must have the following prerequisites:

An AWS account
A fully configured AWS Outposts
An AWS IAM Role that you use to assign permissions to your AWS Outposts administrators
An Amazon Virtual Private Cloud (VPC) subnet associated with your AWS Outposts
An Amazon S3 bucket in the same AWS Region as your AWS Outpost. Please refer to the AWS Region Table for more information.

Before you deploy the solution’s resources, clone the solution’s GitHub repository. Once you’ve cloned the repository, you must upload a copy of the compressed AWS Lambda code artifacts (.zip files) to the top-level folder of an S3 bucket in your account.

Create AWS IAM policies

This approach updates AWS IAM policies to reflect the current capacity available in your AWS Outpost. This enables users to create resources (such as EC2 instances) when there is capacity available and denies them the ability to do so when the capacity threshold has been exceeded. Since this solution automates remediation via AWS Lambda, you must create a policy that allows the function to make changes in IAM. These permissions are restricted to creating and deleting policy versions. Please review the IAM service documentation for additional information or more in-depth examples.

To create an IAM policy

Log in to AWS IAM console.
Select Policies from the left side navigation tree and click the Create Policy button.

Navigate to the JSON tab and replace the contents with the code below.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": "ec2:RunInstances",
            "Resource": [
                "arn:aws:ec2:*:*:subnet/",
                "arn:aws:ec2:*:*:network-interface/*",
                "arn:aws:ec2:*:*:instance/*",
                "arn:aws:ec2:*:*:volume/*",
                "arn:aws:ec2:*::image/ami-*",
                "arn:aws:ec2:*:*:key-pair/*",
                "arn:aws:ec2:*:*:security-group/*"
            ]
        }
    ]
 }

Select Review Policy.
Provide a name for your policy, e.g., outposts-ec2-capacity and select Create Policy.
Record the Amazon Resource Name (ARN) for the created policy in your text editor of choice.

Repeat steps 2-6 to create a capacity policy for EBS, e.g., outposts-ebs-capacity using the JSON code below:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": "ec2:CreateVolume",
            "Resource": [
                "arn:aws:ec2:*:*:subnet/",
                "arn:aws:ec2:*:*:network-interface/*",
                "arn:aws:ec2:*:*:instance/*",
                "arn:aws:ec2:*:*:volume/*",
                "arn:aws:ec2:*::image/ami-*",
                "arn:aws:ec2:*:*:key-pair/*",
                "arn:aws:ec2:*:*:security-group/*"
            ]
        }
    ]
 }

Assign the created policies to a role mapped to your Outposts users.

Now, you have the correct IAM policies in place, and you can move on to launch an AWS CloudFormation stack.

Launch AWS CloudFormation stack

Create an AWS CloudFormation stack using the outposts-manage-capacity.yml template that is included in your local clone of the repository. For illustration purposes, the template creates resources to monitor a subset of the Amazon EC2 instance types supported on AWS Outposts. If you have a different configuration, adjust the template by modifying the CloudWatch alarm resources accordingly (Type: AWS::CloudWatch::Alarm).

Provide all necessary parameters using the following descriptions:

Review created resources

Now that you have launched your AWS CloudFormation stack, you are ready to review the resources you created. In order to automate the monitoring of our capacity, I use Amazon CloudWatch alarms to monitor the default metrics published by AWS Outposts and trigger an action when a particular condition has been met. Review the metric alarms created by navigating to the Amazon CloudWatch console. There are alarms for sample instance types in your Outposts. There is also an alarm for EBS capacity. The thresholds for these alarms are based on the parameters (ComputeThreshold, StorageThreshold) provided to your AWS CloudFormation stack.

I use an Amazon CloudWatch Event rule to fan-out notifications about capacity events in our AWS Outposts. Review the event rule created by navigating to the Amazon CloudWatch console. The rule triggers an action to send a message to an SNS topic to inform administrators via email. Review the created topics by navigating to the Amazon SNS console. After your email subscription is created, you must confirm it using the confirmation email message delivered to the account provided.

You then automate remediation by creating a second action in the event rule to trigger the AWS Lambda function. Remediation is based on the current capacity conditions and the state of our CloudWatch alarms. These functions use the boto3 python library to update the version of our previously created Amazon IAM policy and reflect the correct privilege for our administrators based on the system state. Review the created AWS Lambda functions using the console. Navigate to the function’s Configuration tab to see that Amazon CloudWatch Events are configured as triggers as shown in the figure below.

Lambda Function’s Configuration tab to check if CloudWatch events are configured as a trigger

Validate your work

To validate your work, assign the IAM policy created for Amazon EC2 and Amazon EBS to the IAM users or IAM roles assigned to your AWS Outposts administrators. Once assigned, use one of these IAM principals to create enough EC2 or EBS resources on your AWS Outpost to breach the thresholds you previously established. This depends on your AWS Outposts configuration and the capacity it is populated with. When capacity is exceeded, your administrators should receive an error similar to the following image when attempting to create resources in addition to an email notification of the condition. You can also circumvent this by using the ‘Set_Alarm_State’ function through the SDK/CLI to modify the state of your target alarms.

This error should be removed once the capacity returns to the normal state. Please note that AWS Outposts CloudWatch metrics have a default resolution period of five minutes so allow time for a period update to register before changes are made.

Cleaning up

To avoid incurring future charges, delete the resources created during this blog by deleting the AWS CloudFormation stack.

Conclusion

One of the benefits of AWS Outposts is a truly consistent hybrid experience. This post showed how you can use the same management and monitoring services in the AWS Cloud across on-premises and in-Region AWS environments. This simplifies the responsibility to manage capacity to meet your workload requirements. For additional information on the benefits of AWS Outposts, visit its product detail page.

Getting started with RPA using AWS Step Functions and Amazon Textract

2020-11-16 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/getting-started-with-rpa-using-aws-step-functions-and-amazon-textract/

This post is courtesy of Joe Tringali, Solutions Architect.

Many organizations are using robotic process automation (RPA) to automate workflow, back-office processes that are labor-intensive. RPA, as software bots, can often handle many of these activities. Often RPA workflows contain repetitive manual tasks that must be done by humans, such as viewing invoices to find payment details.

AWS Step Functions is a serverless function orchestrator and workflow automation tool. Amazon Textract is a fully managed machine learning service that automatically extracts text and data from scanned documents. Combining these services, you can create an RPA bot to automate the workflow and enable employees to handle more complex tasks.

In this post, I show how you can use Step Functions and Amazon Textract to build a workflow that enables the processing of invoices. Download the code for this solution from https://github.com/aws-samples/aws-step-functions-rpa.

Overview

The following serverless architecture can process scanned invoices in PDF or image formats for submitting payment information to a database.

To implement this architecture, I use single-purpose Lambda functions and Step Functions to build the workflow:

Invoices are scanned and loaded into an Amazon Simple Storage Service (S3) bucket.
The loading of an invoice into Amazon S3 triggers an AWS Lambda function to be invoked.
The Lambda function starts an asynchronous Amazon Textract job to analyze the text and data of the scanned invoice.
The Amazon Textract job publishes a completion notification message with a status of “SUCCEEDED” or “FAILED” to an Amazon Simple Notification Service (SNS) topic.
SNS sends the message to an Amazon Simple Queue Service (SQS) queue that is subscribed to the SNS topic.
The message in the SQS queue triggers another Lambda function.
The Lambda function initiates a Step Functions state machine to process the results of the Amazon Textract job.
For an Amazon Textract job that completes successfully, a Lambda function saves the document analysis into an Amazon S3 bucket.
The loading of the document analysis to Amazon S3 triggers another Lambda function.
The Lambda function retrieves the text and data of the scanned invoice to find the payment information. It writes an item to an Amazon DynamoDB table with a status indicating if the invoice can be processed.
If the DynamoDB item contains the payment information, another Lambda function is invoked.
The Lambda function archives the processed invoice into another S3 bucket.
If the DynamoDB item does not contain the payment information, a message is published to an Amazon SNS topic requesting that the invoice be reviewed.

Amazon Textract can extract information from the various invoice images and associate labels with the data. You must then handle the various labels that different invoices may associate with the payee name, due date, and payment amount.

Determining payee name, due date and payment amount

After the document analysis has been saved to S3, a Lambda function retrieves the text and data of the scanned invoice to find the information needed for payment. However, invoices can use a variety of labels for the same piece of data, such a payment’s due date.

In the example invoices included with this blog, the payment’s due date is associated with the labels “Pay On or Before”, “Payment Due Date” and “Payment Due”. Payment amounts can also have different labels, such as “Total Due”, “New Balance Total”, “Total Current Charges”, and “Please Pay”. To address this, I use a series of helper functions in the app.py file in the process_document_analysis folder of the GitHub repo.

In app.py, there is the following get_ky_map helper function:

def get_kv_map(blocks):
    key_map = {}
    value_map = {}
    block_map = {}
    for block in blocks:
        block_id = block['Id']
        block_map[block_id] = block
        if block['BlockType'] == "KEY_VALUE_SET":
            if 'KEY' in block['EntityTypes']:
                key_map[block_id] = block
            else:
                value_map[block_id] = block
    return key_map, value_map, block_map

The get_kv_map function is invoked by the Lambda function handler. It iterates over the “Blocks” element of the document analysis produced by Amazon Textract to create dictionaries of keys (labels) and values (data) associated with each block identified by Amazon Textract. It then invokes the following get_kv_relationship helper function:

def get_kv_relationship(key_map, value_map, block_map):
    kvs = {}
    for block_id, key_block in key_map.items():
        value_block = find_value_block(key_block, value_map)
        key = get_text(key_block, block_map)
        val = get_text(value_block, block_map)
        kvs[key] = val
    return kvs

The get_kv_relationship function merges the key and value dictionaries produced by the get_kv_map function to create a single Python key value dictionary where labels are the keys to the dictionary and the invoice’s data are the values. The handler then invokes the following get_line_list helper function:

def get_line_list(blocks):
    line_list = []
    for block in blocks:
        if block['BlockType'] == "LINE":
            if 'Text' in block: 
                line_list.append(block["Text"])
    return line_list

Extracting payee names is more complex because the data may not be labeled. The payee may often differ from the entity sending the invoice. With the Amazon Textract analysis in a format more easily consumable by Python, I use the following get_payee_name helper function to parse and extract the payee:

def get_payee_name(lines):
    payee_name = ""
    payable_to = "payable to"
    payee_lines = [line for line in lines if payable_to in line.lower()]
    if len(payee_lines) > 0:
        payee_line = payee_lines[0]
        payee_line = payee_line.strip()
        pos = payee_line.lower().find(payable_to)
        if pos > -1:
            payee_line = payee_line[pos + len(payable_to):]
            if payee_line[0:1] == ':':
                payee_line = payee_line[1:]
            payee_name = payee_line.strip()
    return payee_name

The get_amount helper function searches the key value dictionary produced by the get_kv_relationship function to retrieve the payment amount:

def get_amount(kvs, lines):
    amount = None
    amounts = [search_value(kvs, amount_tag) for amount_tag in amount_tags if search_value(kvs, amount_tag) is not None]
    if len(amounts) > 0:
        amount = amounts[0]
    else:
        for idx, line in enumerate(lines):
            if line.lower() in amount_tags:
                amount = lines[idx + 1]
                break
    if amount is not None:
        amount = amount.strip()
        if amount[0:1] == '$':
            amount = amount[1:]
    return amount

The amount_tags variable contains a list of possible labels associated with the payment amount:

amount_tags = ["total due", "new balance total", "total current charges", "please pay"]

Similarly, the get_due_date helper function searches the key value dictionary produced by the get_kv_relationship function to retrieve the payment due date:

def get_due_date(kvs):
    due_date = None
    due_dates = [search_value(kvs, due_date_tag) for due_date_tag in due_date_tags if search_value(kvs, due_date_tag) is not None]
    if len(due_dates) > 0:
        due_date = due_dates[0]
    if due_date is not None:
        date_parts = due_date.split('/')
        if len(date_parts) == 3:
            due_date = datetime(int(date_parts[2]), int(date_parts[0]), int(date_parts[1])).isoformat()
        else:
            date_parts = [date_part for date_part in re.split("\s+|,", due_date) if len(date_part) > 0]
            if len(date_parts) == 3:
                datetime_object = datetime.strptime(date_parts[0], "%b")
                month_number = datetime_object.month
                due_date = datetime(int(date_parts[2]), int(month_number), int(date_parts[1])).isoformat()
    else:
        due_date = datetime.now().isoformat()
    return due_date

The due_date_tag contains a list of possible labels associated with the payment due:

due_date_tags = ["pay on or before", "payment due date", "payment due"]

If all required elements needed to issue a payment are found, it adds an item to the DynamoDB table with a status attribute of “Approved for Payment”. If the Lambda function cannot determine the value of one or more required elements, it adds an item to the DynamoDB table with a status attribute of “Pending Review”.

Payment Processing

If the item in the DynamoDB table is marked “Approved for Payment”, the processed invoice is archived. If the item’s status attribute is marked “Pending Review”, an SNS message is published to an SNS Pending Review topic. You can subscribe to this topic so that you can add additional labels to the Python code for determining payment due dates and payment amounts.

Note that the Lambda functions are single-purpose functions, and all workflow logic is contained in the Step Functions state machine. This diagram shows the various tasks (states) of a successful workflow.

For more information about this solution, download the code from the GitHub repo (https://github.com/aws-samples/aws-step-functions-rpa).

Prerequisites

Before deploying the solution, you must install the following prerequisites:

Python.
AWS Command Line Interface (AWS CLI) – for instructions, see Installing the AWS CLI.
AWS Serverless Application Model Command Line Interface (AWS SAM CLI) – for instructions, see Installing the AWS SAM CLI.

Deploying the solution

The solution creates the following S3 buckets with names suffixed by your AWS account ID to prevent a global namespace collision of your S3 bucket names:

scanned-invoices-<YOUR AWS ACCOUNT ID>
invoice-analyses-<YOUR AWS ACCOUNT ID>
processed-invoices-<YOUR AWS ACCOUNT ID>

The following steps deploy the example solution in your AWS account. The solution deploys several components including a Step Functions state machine, Lambda functions, S3 buckets, a DynamoDB table for payment information, and SNS topics.

AWS CloudFormation requires an S3 bucket and stack name for deploying the solution. To deploy:

Download code from GitHub repo (https://github.com/aws-samples/aws-step-functions-rpa).
Run the following command to build the artifacts locally on your workstation:sam build
Run the following command to create a CloudFormation stack and deploy your resources:sam deploy --guided --capabilities CAPABILITY_NAMED_IAM

Monitor the progress and wait for the completion of the stack creation process from the AWS CloudFormation console before proceeding.

Testing the solution

To test the solution, upload the PDF test invoices to the S3 bucket named scanned-invoices-<YOUR AWS ACCOUNT ID>.

A Step Functions state machine with the name <YOUR STACK NAME>-ProcessedScannedInvoiceWorkflow runs the workflow. Amazon Textract document analyses are stored in the S3 bucket named invoice-analyses-<YOUR AWS ACCOUNT ID>, and processed invoices are stored in the S3 bucket named processed-invoices-<YOUR AWS ACCOUNT ID>. Processed payments are found in the DynamoDB table named <YOUR STACK NAME>-invoices.

You can monitor the status of the workflows from the Step Functions console. Upon completion of the workflow executions, review the items added to DynamoDB from the Amazon DynamoDB console.

Cleanup

To avoid ongoing charges for any resources you created in this blog post, delete the stack:

Empty the three S3 buckets created during deployment using the S3 console:
– scanned-invoices-<YOUR AWS ACCOUNT ID>
– invoice-analyses-<YOUR AWS ACCOUNT ID>
– processed-invoices-<YOUR AWS ACCOUNT ID>
Delete the CloudFormation stack created during deployment using the CloudFormation console.

Conclusion

In this post, I showed you how to use a Step Functions state machine and Amazon Textract to automatically extract data from a scanned invoice. This eliminates the need for a person to perform the manual step of reviewing an invoice to find payment information to be fed into a backend system. By replacing the manual steps of a workflow with automation, an organization can free up their human workforce to handle more value-added tasks.

To learn more, visit AWS Step Functions and Amazon Textract for more information. For more serverless learning resources, visit https://serverlessland.com.

Field Notes: Optimize your Java application for Amazon ECS with Quarkus

2020-11-11 Sascha Moellering

Post Syndicated from Sascha Moellering original https://aws.amazon.com/blogs/architecture/field-notes-optimize-your-java-application-for-amazon-ecs-with-quarkus/

In this blog post, I show you an interesting approach to implement a Java-based application and compile it to a native image using Quarkus. This native image is the main application, which is containerized, and runs in an Amazon Elastic Container Service and Amazon Elastic Kubernetes Service cluster on AWS Fargate.

Amazon ECS is a fully managed container orchestration service, Amazon EKS is a fully managed Kubernetes service, both services support Fargate to provide serverless compute for containers. Fargate removes the need to provision and manage servers, lets you specify and pay for resources per application, and improves security through application isolation by design. AWS Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you.

Quarkus is a Supersonic Subatomic Java framework that uses OpenJDK HotSpot as well as GraalVM and over fifty different libraries like RESTEasy, Vertx, Hibernate, and Netty. In a previous blog post, I demonstrated how GraalVM can be used to optimize the size of Docker images. GraalVM is an open source, high-performance polyglot virtual machine from Oracle. I use it to compile native images ahead of time to improve startup performance, and reduce the memory consumption and file size of Java Virtual Machine (JVM)-based applications. The framework that allows ahead-of-time-compilation (AOT) is called Substrate.

Application Architecture

First, review the GitHub repository containing the demo application.

Our application is a simple REST-based Create Read Update Delete (CRUD) service that implements basic user management functionalities. All data is persisted in an Amazon DynamoDB table. Quarkus offers an extension for Amazon DynamoDB that is based on AWS SDK for Java V2. This Quarkus extension supports two different programming models: blocking access and asynchronous programming. For local development, DynamoDB Local is also supported. DynamoDB Local is the downloadable version of DynamoDB that lets you write and test applications without accessing the DynamoDB service. Instead, the database is self-contained on your computer. When you are ready to deploy your application in production, you can make a few minor changes to the code so that it uses the DynamoDB service.

The REST-functionality is located in the class UserResource which uses the JAX-RS implementation RESTEasy. This class invokes the UserService that implements the functionalities to access a DynamoDB table with the AWS SDK for Java. All user-related information is stored in a Plain Old Java Object (POJO) called User.

Building the application

To create a Docker container image that can be used in the task definition of my ECS cluster, follow these three steps: build the application, create the Docker Container Image, and push the created image to my Docker image registry.

To build the application, I used Maven with different profiles. The first profile (default profile) uses a standard build to create an uber JAR – a self-contained application with all dependencies. This is very useful if you want to run local tests with your application, because the build time is much shorter compared to the native-image build. When you run the package command, it also execute all tests, which means you need DynamoDB Local running on your workstation.

$ docker run -p 8000:8000 amazon/dynamodb-local -jar DynamoDBLocal.jar -inMemory -sharedDb

$ mvn package

The second profile uses GraalVM to compile the application into a native image. In this case, you use the native image as base for a Docker container. The Dockerfile can be found under src/main/docker/Dockerfile.native and uses a build-pattern called multi-stage build.

$ mvn package -Pnative -Dquarkus.native.container-build=true

An interesting aspect of multi-stage builds is that you can use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base image, and begins a new stage of the build. You can pick the necessary files and copy them from one stage to another, thereby limiting the number of files you have to copy. Use this feature to build your application in one stage and copy your compiled artifact and additional files to your target image. In this case, you use ubi-quarkus-native-image:20.1.0-java11 as base image and copy the necessary TLS-files (SunEC library and the certificates) and point your application to the necessary files with JVM properties.

FROM quay.io/quarkus/ubi-quarkus-native-image:20.1.0-java11 as nativebuilder
RUN mkdir -p /tmp/ssl-libs/lib \
  && cp /opt/graalvm/lib/security/cacerts /tmp/ssl-libs \
  && cp /opt/graalvm/lib/libsunec.so /tmp/ssl-libs/lib/

FROM registry.access.redhat.com/ubi8/ubi-minimal
WORKDIR /work/
COPY target/*-runner /work/application
COPY --from=nativebuilder /tmp/ssl-libs/ /work/
RUN chmod 775 /work
EXPOSE 8080
CMD ["./application", "-Dquarkus.http.host=0.0.0.0", "-Djava.library.path=/work/lib", "-Djavax.net.ssl.trustStore=/work/cacerts"]

In the second and third steps, I have to build and push the Docker image to a Docker registry of my choice which is straight forward:

$ docker build -f src/main/docker/Dockerfile.native -t

$ docker push <repo/image:tag>

Setting up the infrastructure

You’ve compiled the application to a native-image and have built a Docker image. Now, you set up the basic infrastructure consisting of an Amazon Virtual Private Cloud (VPC), an Amazon ECS or Amazon EKS cluster with on AWS Fargate launch type, an Amazon DynamoDB table, and an Application Load Balancer.

Figure 1: Architecture of the infrastructure (for Amazon ECS)

Codifying your infrastructure allows you to treat your infrastructure just as code. In this case, you use the AWS Cloud Development Kit (AWS CDK), an open source software development framework, to model and provision your cloud application resources using familiar programming languages. The code for the CDK application can be found in the demo application’s code repository under eks_cdk/lib/ecs_cdk-stack.ts or ecs_cdk/lib/ecs_cdk-stack.ts. Set up the infrastructure in the AWS Region us-east-1:

$ npm install -g aws-cdk // Install the CDK
$ cd ecs_cdk
$ npm install // retrieves dependencies for the CDK stack
$ npm run build // compiles the TypeScript files to JavaScript
$ cdk deploy  // Deploys the CloudFormation stack

The output of the AWS CloudFormation stack is the load balancer’s DNS record. The heart of our infrastructure is an Amazon ECS or Amazon EKS cluster with AWS Fargate launch type. The Amazon ECS cluster is set up as follows:

const cluster = new ecs.Cluster(this, "quarkus-demo-cluster", {
      vpc: vpc
    });
    
    const logging = new ecs.AwsLogDriver({
      streamPrefix: "quarkus-demo"
    })

    const taskRole = new iam.Role(this, 'quarkus-demo-taskRole', {
      roleName: 'quarkus-demo-taskRole',
      assumedBy: new iam.ServicePrincipal('ecs-tasks.amazonaws.com')
    });
    
    const taskDef = new ecs.FargateTaskDefinition(this, "quarkus-demo-taskdef", {
      taskRole: taskRole
    });
    
    const container = taskDef.addContainer('quarkus-demo-web', {
      image: ecs.ContainerImage.fromRegistry("<repo/image:tag>"),
      memoryLimitMiB: 256,
      cpu: 256,
      logging
    });
    
    container.addPortMappings({
      containerPort: 8080,
      hostPort: 8080,
      protocol: ecs.Protocol.TCP
    });

    const fargateService = new ecs_patterns.ApplicationLoadBalancedFargateService(this, "quarkus-demo-service", {
      cluster: cluster,
      taskDefinition: taskDef,
      publicLoadBalancer: true,
      desiredCount: 3,
      listenerPort: 8080
    });

Cleaning up

After you are finished, you can easily destroy all of these resources with a single command to save costs.

$ cdk destroy

Conclusion

In this post, I described how Java applications can be implemented using Quarkus, compiled to a native-image, and ran using Amazon ECS or Amazon EKS on AWS Fargate. I also showed how AWS CDK can be used to set up the basic infrastructure. I hope I’ve given you some ideas on how you can optimize your existing Java application to reduce startup time and memory consumption. Feel free to submit enhancements to the sample template in the source repository or provide feedback in the comments.

We also encourage you to explore how you to optimize your Java application for AWS Lambda with Quarkus.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Integrating AWS Outposts with existing security zones

2020-11-05 Shubha Kumbadakone

Post Syndicated from Shubha Kumbadakone original https://aws.amazon.com/blogs/compute/integrating-aws-outposts-with-existing-security-zones/

This post is contributed by Santiago Freitas and Matt Lehwess.

AWS Outposts is a fully managed service that extends AWS infrastructure, services, APIs, and tools to your on-premises facility. This blog post explains how the resources created on an Outpost can be integrated with security zones of an existing infrastructure. Integrating Outposts with existing security zones is important for customers that segment their environment into multiple domains based on their security policy.

Background

It’s common for customers to have different security zones within their infrastructure. For example, a customer might have a DMZ security zone where internet facing systems are located, an extra-net security zone where systems used to communicate with business partners are hosted, and an internal security zone where the systems accessible only by employees operate.

As illustrated in the following figure, customers often use firewalls and network ACLs to perform packet inspection and filtering for traffic that flows between security zones. Customers also usually use similar security controls for traffic flowing between different application tiers.

Example of firewall being used for security zone segmentation

Enabling connectivity from an Outposts VPC to your on-premises network

During your Outpost deployment, AWS works with you to establish connectivity to your local network. During this installation process, AWS creates an address pool based on an IP range you assign to the Outpost out of your local network’s IP ranges. This address pool is known as a customer-owned IP address pool or CoIP pool.

When resources running on Outposts (like an EC2 instance) need to communicate with existing on-premises systems, you must assign an Elastic IP address to them. This Elastic IP address comes from the CoIP pool. The resources in your local area network (LAN), including firewalls and network ACLs, see the Elastic IP address as the source IP of the packets sent from the resources running on Outposts. This Elastic IP address is also used as the destination IP for traffic from your on-premises systems to the instance on the Outpost.

How to enable connectivity from an Outposts VPC to your on-premises network

With a typical Outpost deployment, as shown in the following diagram, the following items are required to enable connectivity from the Outpost’s VPC to your on-premises network:

VPC routing: Routes to your on-premises network in the route table of the VPC’s Outpost subnet.
CoIP pool: Assigned by you, the customer, and created by AWS at time of install, for example, 192.168.1.0/26 in the diagram. It’s used to allocate Elastic IP addresses that can then be associated with instances or other resources on the Outpost.
Elastic IP address association: Customer configured, association of Elastic IP addresses from the CoIP pool with EC2 instances and other resources on the Outpost. For example, VPC IP 10.0.3.112 from instance Y has Elastic IP address 192.168.1.15 associated with it.
Routing advertisement: The Outpost advertises the assigned CoIP range to the local devices within your on-premises network via Border Gateway Protocol (BGP). This is configured by AWS during installation and based on the CoIP pool IP range you provide.

AWS Outposts – typical network topology

Integrating Outposts with existing security zones

CoIP pools, and AWS Resource Access Manager (RAM) are used to facilitate the integration of Outposts with your existing security zones. AWS RAM lets you share your resources with AWS account or through AWS Organizations.

When AWS deploys your Outposts, you can ask AWS to create multiple CoIP pools. Each CoIP pool is configured with an IP range assigned by you during initial installation. If after the initial installation you need AWS to create additional CoIP pools for an existing Outpost, you can open a case with AWS Support and request the creation of additional pools.

CoIP pools are owned initially by the AWS account that owns the Outpost. In a multi-account Outpost deployment, you can share your customer-owned IP pool(s) with other AWS accounts in your AWS Organization using AWS RAM.

VPCs that have subnets on the Outpost must be created in the same account that owns the Outpost. After creation of these subnets within the Outpost account, AWS RAM can then be used to share these subnets with other accounts or organizational units that are in the same organization from AWS Organizations.

After you’ve shared the CoIP pool with the account where you’d like to run your workloads, and you’ve shared an Outpost subnet with that same account, users of that account can deploy resources in the shared Outpost subnet. For the Outposts resources that must communicate with resources in your existing infrastructure, users can assign Elastic IP addresses from one or more of the shared CoIP pools to those Outposts resources.

Multiple CoIP pools and the ability to share them and Outpost subnets with a particular account, gives you granular control over the IP range used by Outpost resources requiring connectivity to your on-premises LAN.

The following figure illustrates the sharing of a subnet and CoIP pool from Account 1, which is the account where the Outpost was initially deployed, with Account 2. As the CoIP pool was shared with Account 2, the users in Account 2 are able to assign an Elastic IP address to the EC2 instance running within Account 2.

AWS Outposts and VPC Sharing

Creating an AWS RAM resource share

The following screenshots demonstrate how to use AWS RAM to share a CoIP pool and a subnet with another account in the same organization from AWS Organizations.

Step 1. Navigate to the AWS RAM service page within the AWS Management Console, and click Create Resource Share. This step is done within the AWS account that owns the Outpost.

Step 2. Next, give the share an appropriate name.

Step 3. Select one or more subnets you’d like to share with your application owner’s account. In this case, select a subnet that exists in your VPC on the Outpost.

Step 4. In this case, share the CoIP range. You can find that in the resources type list.

Step 5. Select the Customer-owned Ipv4Pool resource type, and then select the CoIP to be shared. Selecting the the CoIP adds it to the list of shared resources along with the subnet selected earlier.

Step 6. Add the account number for the account you’d like to share the Outposts subnet and CoIP pool with.

Step 7. Click Create Resource. After clicking the Create Resource share button, you should see the resource share listed and be in the state “active.”

Now that you’ve successfully shared the subnet and CoIP pool with your application account (that’s in the same organization), you can now go in to that account and allocate an Elastic IP address from the CoIP pool. You can then assign the Elastic IP address to an instance launched in the subnet previously shared, which is on the Outpost.

Allocating and associating the Elastic IP address from your CoIP pool

Step 1. From within the application account, allocate an Elastic IP address from the CoIP pool. This is done by navigating to the VPC console, then to Elastic IP addresses, and then click Allocate Elastic IP address.

Step 2. Select Customer owned pool of IPv4 addresses, select the CoIP pool that’s been shared with this account previously, then click Allocate. You can see this step in the following image.

Step 3. You can now see the new Elastic IP address that has been allocated from the shared CoIP pool. You can now associate that Elastic IP address with an instance in our shared subnet.

Note: When using the AWS Management Console to allocate an Elastic IP address, the ElasticIP address is automatically allocated from the CoIPpool. If you’re using the AWS CLI, you have the option to select a specific Elastic IP address from the CoIP pool by using the --customer-owned-ipv4-pool <value> option.

Step 4. You can now associate the Elastic IP address of type CoIP with an instance. Click Associate after selecting the instance that is running on your Outpost in the shared subnet. This step is represented in the following image.

Step 5 – Once the operation is complete, you can see the CoIP Elastic IP address in the Elastic IP address console, and that it’s assigned to the instance that’s running in your shared subnet.

The steps preceding demonstrated how to share the Outpost with other accounts through the use of AWS Resource Access Manager, subnet sharing, and CoIP sharing.

Use case

Let’s apply the capabilities described prior into a design. A customer is running a 3-tier web application and would like to host the web and application tiers on Outposts. The database tier is hosted on the existing infrastructure. The application’s user traffic comes from the internet, through an external firewall towards the web tier hosted on the Outpost. Traffic between web and application tiers is secured with security groups and remains within the Outpost. Application servers connect to the database servers via the existing internal firewalls. The customer uses independent external and internal firewalls.

During the Outposts installation, the customer asks AWS to create two CoIP pools. The first CoIP pool, referenced in the following image as CoIP Web, is assigned the IP range 172.0.1.0/24. The second CoIP pool, referenced in the following image as CoIP App, is assigned the IP range 172.0.2.0/24. The customer then creates two additional AWS accounts, one for the web tier and another for the app tier. Within the account that owns the Outpost, the customer creates two VPCs and within each of those VPCs a subnet is created.

The subnet with IP address 10.0.1.0/24 is shared with the web tier account and the subnet with IP address 10.1.2.0/24 is shared with the app tier account. The customer then configures VPC peering between the VPCs to enable communication between the web and app tiers. VPC peering configuration is performed within the account that owns the Outposts.

Note: With VPC peering traffic between the VPCs remains within the Outpost. However, data transfer charges for VPC peering applies. This use-case could also be built with the web tier and app tier using different subnets within the same VPC to save on data transfer costs over VPC peering.

The following image shows initial setup with two CoIP pools, two accounts, and two VPCs.

Initial setup with two CoIP pools, two accounts and two VPCs

After the initial setup the customer then shares each CoIP pool with the appropriate account using AWS RAM. The customer can then create their web and application servers within the respective accounts. As shown in the prior image, the web server is assigned IP address 10.0.1.5 and the application server is assigned IP address 10.1.2.10. Each IP address is part of their respective VPC IP range and are reachable only within the Outpost at this point.

To enable the integration of the web and application servers with the existing infrastructure, the customer assigns an Elastic IP address from the CoIP pool shared with each account to their respective Amazon EC2 instances.

With this configuration, the customer can integrate the web and application servers into the existing security zones. To integrate the web servers, path “A” in the following image, the customer creates a rule in their external firewall that allows communication from any source (internet) to Elastic IP address 172.0.1.5 on port 443 (HTTPS) which has been assigned to the web server.

For the communication between the web and application servers, path “B” in the following image, the customer has configured VPC peering between the two VPCs and created the required security groups.

To integrate the application servers into the Internal Security Zone, path “C” in the following image, the customer has assigned Elastic IP address 172.0.2.10 to the application server and has configured a rule on the Internal Firewall allowing the IP range 172.0.2.0/24 that is assigned to the app CoIP pool to communicate with the database server.

End to end traffic flow from users to web tier and from app tier to database tier

In addition to setting up their Outposts as covered in the preceding details, to enable the communication between the Outposts hosted resources and the existing infrastructure you must create a custom route table and associate it with an Outpost subnet.

Conclusion

AWS Outposts extends AWS infrastructure, services, APIs, and tools to your on-premises facility. During deployment of your Outposts, AWS works with you to establish connectivity to your local network. This blog post builds upon this initial deployment of an Outpost to allow the Outpost’s resources to integrate into your existing security zones. The design is applicable for customers that segment their environment into multiple security zones.

Santiago Freitas is AWS Head of Technology for Central Eastern Europe (CEE), Middle East, North Africa (MENA), Sub Saharan Africa (SSA), Turkey (TUR), and Russia and Commonwealth of Independent States (RUS-CIS). Previously he was an AWS Global Solutions Architect for financial services. Before joining AWS, Santiago was a Distinguished Engineer at Cisco. He is based in Dubai, United Arab Emirates.

Matt is a Principal Developer Advocate for Amazon Web Services where he has spent the last 7 years working with AWS customers and partners, helping them to build best-in-class AWS networking solutions. He is co-author of the AWS Certified Advanced Networking Official Study Guide, as well as an Amazon Re:Invent speaker extraordinaire (5 years and counting!). His primary focus at AWS is to help evangelize AWS networking solutions and more recently, AWS Outposts.

Field Notes: How to Identify and Block Fake Crawler Bots Using AWS WAF

2020-11-04 Vijay Menon

Post Syndicated from Vijay Menon original https://aws.amazon.com/blogs/architecture/field-notes-how-to-identify-and-block-fake-crawler-bots-using-aws-waf/

In this blog post, we focus on how to identify fake bots using these AWS services: AWS WAF, Amazon Kinesis Data Firehose, Amazon S3 and AWS Lambda. We use fake Google/Bing bots to demonstrate, but the principles can be applied to other popular crawlers like Slurp Bot from Yahoo, DuckDuckBot from DuckDuckGo, Alexa crawler from Alexa internet ranking service.

For industries like media, online retailors, news or social websites, content is critical and often sets them apart from other competitors. These companies put in a significant amount of effort to make the content as visible and accessible as possible. To do that these companies rely on crawler bots, so that legitimate users searching for content can find the content easily. Crawler bots are useful for indexing the site pages and helping make the content more searchable and improve rankings.

However, this capability can be misused. So it is important to distinguish between genuine crawler bots and fake ones that are doing more than just indexing your site. It’s important to properly identify good and bad actors so that you can stop the bad ones without impacting the ability of good ones, and at scale. This helps in driving more traffic, visitors, and more revenue from your websites.

Identifying bots

There are two primary sources of information required to identify a fake bot:

HTTP Header User-Agent: Fake bots try to present themselves as real bots, for example as Google or Bing, by using the same user agent string used by Google or Bing.
IP Address: You can look at the source IP address of the incoming request and determine if it belongs to the search engine provider network like Google or Bing. You can do this by performing a forward and reverse look up and comparing the results. These methods are well documented by the search engine providers Google and Bing.

Solution Overview

The solution leverages the capabilities of AWS WAF. Our demonstration application is a static website hosted on Amazon S3 fronted by Amazon CloudFront. This means that we can provide permission of access to the S3 bucket only to CloudFront using origin access identity.

The logs are streamed in near real time using Amazon Kinesis Data Firehose, inspected using a Lambda function to help identify fake bots before storing the logs on Amazon S3. The Lambda function does two things:

Inspect the traffic using the rules to identify bad or fake bots. In this case it uses forward and reverse DNS lookup results of the Client IP address of packets with User-Agent string resembling GoogleBot or BingBot.
Once identified as a fake bot, the Lambda function updates AWS WAF IP-Set to permanently block the requests coming from IP addresses of fake bots.

Note: For the sake of this demonstration, we are using a static website hosted on Amazon S3 with CloudFront. This requires the AWS WAF and IP-Set used by AWS WAF to be of scope ‘CLOUDFRONT’. You can modify it to use scope ‘REGIONAL’ if you chose to protect your web properties behind Application Load Balancer or API Gateway.

Solution Architecture

Prerequisites

Readers of this blog post should be familiar with HTTP and the following AWS services:

For this walkthrough, you should have the following:

AWS Account
AWS Command Line Interface (AWS CLI): You need AWS CLI installed and configured on the workstation from where you are going to try the steps mentioned below.
Credentials configured in AWS CLI should have the required IAM permissions to spin up and modify the resources mentioned in this post.
Make sure that you deploy the solution to us-east-1 Region and your AWS CLI default Region is us-east-1. If us-east-1 is not the default Region, reference the Regions explicitly while executing AWS CLI commands using --region us-east-1 switch.
Amazon S3 bucket in us-east-1 region

Walkthrough

1. Create an Amazon S3 static Website and put it behind an Amazon CloudFront distribution. You can follow these steps. Note down the CloudFront distribution ID. We will use it in subsequent steps.

2. Download and unzip the file containing CloudFormation template and lambda function code from here to a folder in your local workstation. You will need to run all of the subsequent commands from this folder.

3. Zip the lambda function lambda_function.py and upload it to an Amazon S3 bucket of your choice in us-east-1. Note the bucket name as it will be used in the subsequent steps. The blog uses waf-logs-fake-bots-us-east-1 for reference as the S3 bucket name.

$ zip lambda_function.py lambda_function.py.zip $ aws s3 cp lambda_function.py.zip s3://waf-logs-fake-bots-us-east-1/

4. Create the resources required for this blog post by deploying the AWS CloudFormation template and running the below command:

aws cloudformation create-stack \
--stack-name FakeBotBlockBlog \
--template-body file://BotBlog.yml \
--parameters ParameterKey=KinesisBufferIntervalSeconds,ParameterValue=900 ParameterKey=KinesisBufferSizeMB,ParameterValue=3 ParameterKey=IPSetName,ParameterValue=BlockFakeBotIPSet ParameterKey=IPSetScope,ParameterValue=CLOUDFRONT ParameterKey=S3BucketWithDeploymentPackage,ParameterValue=waf-logs-fake-bots-us-east-1 ParameterKey=DeploymentPackageZippedFilename,ParameterValue=lambda_function.py.zip \
--capabilities CAPABILITY_IAM \
--region us-east-1

You need to provide the following information, and you can change the parameters based on your specific needs:

a. KinesisBufferIntervalSeconds and KinesisBufferSizeMB. These will define the interval at which Kinesis Firehose ships the logs to Amazon S3, the Default is 900 seconds and 3MB respectively, whichever is met first.

b. IPSetName. Name of the IP Set which will be used to record the client IP address of fake bots. Default value is BlockFakeBotIPSet.

c. IPSetScope. Scope of the IP Set. I am using CLOUDFRONT and associate it with the CloudFront distribution created in step 1. You can choose to make it REGIONAL in which case WebACLassociation will need to be with an ALB or an API Gateway.

d. S3BucketWithDeploymentPackage. Name of S3 bucket used in step 3. The blog assumes waf-logs-fake-bots-us-east-1.

e. DeploymentPackageZipedFilename. Lambda function filename without the file extension. For example, the blog assumes lambda_function.py.zip is available on the Amazon S3 bucket and uses this value for this parameter.

Some stack templates might include resources that can affect permissions in your AWS account, for example, by creating new AWS Identity and Access Management (IAM) role. For those stacks, you must explicitly acknowledge this by specifying CAPABILITY_IAM or CAPABILITY_NAMED_IAM value for the –capabilities parameter.

Stack creation will take you approximately 5-7 minutes. Check the status of the stack by executing the below command every few minutes. You should see StackStatus value as CREATE_COMPLETE.

Example:

aws cloudformation describe-stacks --stack-name FakeBotBlockBlog | grep StackStatus

The CloudFormation template will create the following resources:

IP Set for AWS WAF
WebACL with rules to block the client IP addresses of fake bots, and an AWS-managed common rule set.
Lambda function to help detect fake bots and modify the AWS WAF IP Set to block them
Kinesis Firehose delivery stream, which will use the above Lambda function for processing
IAM roles with required permissions for the Lambda function and Kinesis Firehose
S3 bucket for AWS WAF logs

5. Enable logging for the WebACL using AWS CLI. For this you need the ARN of the WebACL and Kinesis Firehose. You can find that information from the output of the CloudFormation stack created in step 4 using the below AWS CLI command

aws cloudformation describe-stacks --stack-name FakeBotBlockBlog

Please note the 2 ARNs and run the following commands by replacing (1) ResourceArn value with WebACL ARN and (2) LogDestinationConfigs value with Kinesis Firehose delivery stream ARN.

Example:

aws wafv2 put-logging-configuration –-logging-configuration ResourceArn=arn:aws:wafv2:us-east-1:123456789012:global/webacl/FakeBotWebACL/259ea98f-24ba-4acd-8803-3e7d02e8d482,LogDestinationConfigs=arn:aws:firehose:us-east-1:123456789012:deliverystream/aws-waf-logs-FakeBotBlockBlog --region us-east-1

6. Associate CloudFront distribution with this WebACL: Sign in to the AWS Management Console and open the AWS WAF and Shield console at https://console.aws.amazon.com/wafv2/homev2/web-acls?region=global .

Click on the WebACL created earlier in this procedure
Navigate to ‘Associated AWS resources’ tab and select Add AWS resources
In the subsequent screen, select the CloudFront distribution created in step 1
Select Add

Note: If you are using an ALB or API Gateway for your web property, then you need to use REGIONAL WebACL and IP Set. Review the procedure to associate an ALB or API Gateway to the WebACL.

You can monitor WebACL performance from the Overview section of WebACL from the AWS WAF and Shield console.

Testing

To test, you will need to generate some traffic which will trigger the lambda function to detect and block the fake bots created earlier in this blog. The web traffic can be generated from the local machine or from an EC2 instance with access to the internet using curl. Manually set the user agent to resemble Googlebot by running the following command from shell:

Replace http://www.awsdemodesign.com/ with the URL of your CloudFront distribution you created in step 1 of the walkthrough.

for i in {1..1000}; do curl -I -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" http://www.awsdemodesign.com/; done

Initially you will see HTTP1.1/ 200 OK response. This will trigger Lambda and modify the IP Set to include your public IP address to be blocked. You can verify that by inspecting the IP set from AWS WAF and Shield console.

Sign in to the AWS Management Console and open the AWS WAF and Shield console
Click on IP Set created earlier in this blog. In the subsequent screen you can see your public IP address in the list of IP addresses.

If you run the curl command again, you will see that the response now is HTTP/1.1 403 Forbidden.

Clean Up

Disassociate the CloudFront Distribution from WebACL
Delete the S3 bucket and CloudFront Distribution created in Step 1
Empty and delete the S3 bucket created by the CloudFormation stack for AWS WAF logs.
Delete CloudFormation stack created in Step 4.

Conclusion

In this blog post, we demonstrated how you can set up and inspect incoming web traffic using AWS Lambda, AWS WAF native logging capabilities, and Kinesis Firehose to help detect and block bad or fake bots at scale. Furthermore, the solution outlined in this post provides a framework which can be extended to identify similar unwanted traffic impersonating as other good bots.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Announcing Outposts and local gateway sharing for multi-account access

2020-11-03 Shubha Kumbadakone

Post Syndicated from Shubha Kumbadakone original https://aws.amazon.com/blogs/compute/announcing-outposts-and-local-gateway-sharing-for-multi-account-access/

This post was contributed by James Devine, Sr. Outposts SA

AWS Outposts enables customers to run AWS services in their on-premises environments. With the release of Outposts and local gateway (LGW) sharing, customers can now configure multi-account access and sharing within an AWS Organization.

Prior to this release, an Outpost was only viable within a single AWS account. VPC sharing was the main way to enable multiple accounts to use Outposts capacity. With the release of Outposts and LGW sharing support, there is now additional functionality to enable multi-account access Outpost capacity within an AWS Organization.

Outposts and LGW sharing is facilitated through AWS Resource Access Manager (RAM). It enables Outposts and LGWs to be shared with AWS accounts within the same AWS Organization. The account that orders Outposts is the owner account that can create resource shares. The accounts that have access to the share are called consumer accounts. Each consumer account can create its own VPCs with subnets that reside on the shared Outpost.

This post will discuss how to start using this new functionality and considerations to take into account.

Use cases

Per AWS best practices, customers typically deploy a number of AWS accounts. Utilizing multiple accounts allows for reduced blast radius and the ability to provide infrastructure isolation by line of business, environment type, and even down to individual workloads. Outposts sharing enables customers to extend their existing AWS account structures to seamlessly integrate with Outposts.

Getting started – creating a resource share

Before any resources can be shared, the first step is to configure an AWS Organization (if one does not already exist). Outposts resources can only be shared with accounts under the same AWS Organization. The Outposts can reside in any account under an organization. For centralized management of Outposts, it is recommend to create a dedicated account, or set of accounts, to host Outposts.

Once an organization is created with member AWS accounts, resources shares can then be created. It’s possible to place multiple resources into a resource share. To facilitate Outposts, LGW, and customer-owned IP (CoIP) sharing, a single resource share can be created that includes all three resources. Principals can then be added to the resource share. The principals can be both organizational units (OUs) and individual AWS account IDs within the AWS Organization. In this case, I’ve shared all three resources with a consumer account ID as a principal, as demonstrated in the following screenshot.

Screen shot of resource sharing.

Sharing an Outpost

After an Outposts is provisioned, the logical Outpost ID can be shared with any account under the AWS Organization. The consumer account then has access to provision resources on the Outposts, such as Amazon EBS volumes and Outposts subnets, as well as launching instances on the shared Outpost.

From the AWS Management Console in the consumer account, I can see the shared Outposts ID, its associated Availability Zone, and the owner account ID.

Once the Outposts ID is selected, I can use the Actions drop down menu to create Outposts subnets and EBS volumes. I can also select Launch instance to provision instances on the Outpost.

Sharing an LGW

Each consumer account can create their own Outposts subnets within their own VPCs. LGW sharing enables the consumer account to create routes an Outposts subnet route table that has a shared LGW as the destination. This enables Outposts subnets in the consumer account to have communication with the on-premises network through the shared LGW.

The consumer account view shared LGWs, as shown in the following screenshot.

The consumer account can then select VPCs within the account to associate with the LGW route table. This enables routing to on-premises if a CoIP is assigned to an instance.

This enables routing to on-premises if a CoIP is assigned to an instance.

Considerations

LGW and Outposts sharing is meant to enable sharing of resources between various accounts within a larger organizational structure. It is not suitable for multi-tenancy outside of an AWS Organization. Additional considerations around capacity planning, access, and local network connectivity should be taken into account.

Resources created in the consumer account are only visible from within the consumer account. The AWS account that owns the Outpost does not have the ability to view instances, EBS volumes, VPCs, subnets, or any other resource created within the consumer account. Since the consumer account is part of an AWS Organization, it is possible to use the default OrganizationAccountAccessRole role that is created by AWS Organizations. This allows for visibility and management of Outpost resources across the AWS Organization.

Capacity information is not shared with the consumer account. However, it is possible to use cross-account CloudWatch metric sharing. Outposts utilization metrics from the account that owns the Outpost can be shared with the consumer account. This allows the consumer account to see what capacity is available on the shared Outposts. I’ve configured the cross-account sharing, and from my consumer account I can see that there is ample c5.xlarge capacity on the shared Outposts.

I’ve configured the cross-account sharing, and from my consumer account I can see that there is ample c5.xlarge capacity on the shared Outposts.

If a principal (consumer account or organizational unit) no longer requires access to Outposts capacity, the resource share can be deleted through RAM in the primary Outposts account. It is important to note that this does not delete subnets, EBS volumes, instances, or other resources running on the shared Outposts. Proper cleanup of Outposts resources within the consumer account (EBS volumes, instances, subnets, etc.) should be planned for whenever removing principals from a resource share to ensure that the capacity is released.

Conclusion

In the blog post, I described the Outposts and LGW sharing capabilities and demonstrated how they can be used to enable multi-account sharing of an Outpost within an AWS Organization. These new capabilities unlock even more customer use cases and allow for stronger blast-radius and account isolation. It’s exciting to see continued functionality come to Outposts! You can start using LGW and Outposts sharing today. There’s no need to upgrade or modify your Outposts in any way to take advantage of this new and exciting functionality.

Fine-tuning blue/green deployments on application load balancer

2020-10-31 Raghavarao Sodabathina

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/devops/blue-green-deployments-with-application-load-balancer/

In a traditional approach to application deployment, you typically fix a failed deployment by redeploying an older, stable version of the application. Redeployment in traditional data centers is typically done on the same set of resources due to the cost and effort of provisioning additional resources. Applying the principles of agility, scalability, and automation capabilities of AWS can shift the paradigm of application deployment. This enables a better deployment technique called blue/green deployment.

Blue/green deployments provide near-zero downtime release and rollback capabilities. The fundamental idea behind blue/green deployment is to shift traffic between two identical environments that are running different versions of your application. The blue environment represents the current application version serving production traffic. In parallel, the green environment is staged running a newer version of your application. After the green environment is ready and tested, production traffic is redirected from blue to green. If any problems are identified, you can roll back by reverting traffic to the blue environment.

Canary deployments are a pattern for the slow rollout of new version of an existing application. The canary deployments incrementally deploy the new version, making it visible to new users in a slow fashion. As you gain confidence in the deployment, you can deploy it to replace the current version in its entirety.

AWS provides several options to help you automate and streamline your blue/green deployments and canary deployments, one such approach is using Application Load Balancer weighted target group feature. In this post, we will cover the concepts of target group stickiness, load balancer stickiness, connection draining and how they influence traffic shifting for canary and blue/green deployments when using the Application Load Balancer weighted target group feature .

Application Load Balancer weighted target groups

A target group is used to route requests to one or more registered targets like Amazon Elastic Compute Cloud (Amazon EC2) instances, fixed IP addresses, or AWS Lambda functions, among others. When creating a load balancer, you create one or more listeners and configure listener rules to direct the traffic to a target group.

Application Load Balancers now support weighted target groups routing. With this feature, you can add more than one target group to the forward action of a listener rule, and specify a weight for each group. For example, when you define a rule as having two target groups with weights of 9 and 1, the load balancer routes 90% of the traffic to the first target group and 10% to the other target group. You can create and configure your weighted target groups by using AWS Console , AWS CLI or AWS SDK.

For more information, see How do I set up weighted target groups for my Application Load Balancer?

Target group level stickiness

You can set target group stickiness to make sure clients get served from a specific target group for a configurable duration of time to ensure consistent experience. Target group stickiness is different from the already existing load balancer stickiness (also known as sticky sessions). Sticky sessions make sure that the requests from a client are always sticking to a particular target within a target group. Target group stickiness only ensures the requests are sent to a particular target group.

You can enable target group level stickiness using the AWS Command Line Interface (AWS CLI) with the TargetGroupStickinessConfig parameter, as shown in the following CLI command:

aws elbv2 modify-listener \
    --listener-arn " < LISTENER ARN > " \
    --default-actions \
    '[{
       "Type": "forward",
       "Order": 1,
       "ForwardConfig": {
          "TargetGroups": [
             {"TargetGroupArn": "<Blue Target Group ARN>", "Weight": 90}, \
             {"TargetGroupArn": "<Green Target Group ARN>", "Weight": 10}, \
          ],
          "TargetGroupStickinessConfig": {
             "Enabled": true,
             "DurationSeconds": 120
          }
       }
    }]'

In the next sections, we will see how to fine-tune weighted target group configuration to achieve effective canary deployments and blue/green deployments.

Canary deployments with Application Load Balancer weighted target group

The canary deployment pattern allows you to roll out a new version of your application to a subset of users before making it widely available. This can be helpful in validating the stability of a new version of the application or performing A/B testing.

For this use case, you want to perform canary deployment for your application and test it by driving only 10% of the incoming traffic to your new version for 12 hours. You need to create two weighted target groups for your Application Load Balancer and use target group stickiness set to a duration of 12 hours. When target group stickiness is enabled, the requests from a client are sent to the same target group for the specified time duration.

Figure 1: Blue and green target groups with weights 90 and 10 for canary deployment

We can define a rule as having two target groups, blue and green, with weights of 90 and 10, respectively, and enable target group level stickiness with a duration of 12 hours (43,200 seconds). The following table summarizes this configuration. See the following CLI command:

aws elbv2 modify-listener \
    --listener-arn " < LISTENER ARN > " \
    --default-actions \
    '[{
       "Type": "forward",
       "Order": 1,
       "ForwardConfig": {
          "TargetGroups": [
             {"TargetGroupArn": "<Blue Target Group ARN>", "Weight": 90}, \
             {"TargetGroupArn": "<Green Target Group ARN>", "Weight": 10}, \
          ],
          "TargetGroupStickinessConfig": {
             "Enabled": true,
             "DurationSeconds": 43200
          }
       }
    }]'

At this point, the users with existing sessions continue to be sent to the blue target group running version 1, and 10% of the new users without a session are sent to the green target group up to 12 hours running version 2, as illustrated in the following diagram.

Figure 2: Blue-green deployment architecture with 90% blue traffic and 10% green traffic.

When you’re confident that the new version is performing well and stable, you can update the target group weights for your blue and green target groups to be 0% and 100%, respectively, to ensure that all the traffic is shifted to your green target group. You may still see some traffic flowing into the blue target group for existing users with active session whose target group stickiness duration (in this case target group stickiness duration is 12 hours) has not expired.

Recommendation: As illustrated above, target group stickiness duration still influences the traffic shift between blue and green targets. So we recommend you to reduce the target group stickiness duration from 12 hours to 5 minutes or less depending upon your use case to ensure that the existing users going to the blue target group also fully transition to the green target group at the earliest. Some of our customers are using target group stickiness duration as 5 minutes to shift their traffic to green target group after successful canary testing.

The recommended value of stickiness may vary across application types. For example, for a typical 3-tier front-end deployment, lower target group stickiness value is desirable. However, for middle tier deployment, the target group stickiness duration value may need to be higher to account for longer transactions.

Blue/green deployments with Application Load Balancer weighted target group

For this use case, you want you perform blue/green deployment for your application to provide near-zero downtime release and rollback capabilities. You can create two weighted target groups called blue and green with the following weights applied as an initial configuration.

Figure 3: Blue/green deployment configuration with blue target group 100% and green target group 0%

When you’re ready to perform the deployment, you can change the weights for blue and green targets groups to be 0% and 100%, respectively, to shift the traffic completely to your newer version of the application.

Figure 4: Blue/green deployment configuration with blue target group 0% and green target group 100%

When you’re performing blue/green deployment using weighted target groups, the recommendation is to not enable target group level stickiness so that traffic shifts immediately from the blue target group to the green target group. See the following CLI command:

aws elbv2 modify-listener \
    --listener-arn "<LISTENER ARN>" \
    --default-actions \
    '[{
       "Type": "forward",
       "Order": 1,
       "ForwardConfig": {
          "TargetGroups": [
             {"TargetGroupArn": "<Blue Target Group>", "Weight": 0}, \
             {"TargetGroupArn": "<Green Target Group>", "Weight": 100}, \
          ]
       }
    }]'

The following diagram shows the updated architecture.

Figure 5: Blue-green deployment architecture with 0% blue traffic and 100% green traffic

If you need to enable target group level stickiness, you can ensure that all traffic transitions from the blue target group to the green target group by keeping the target group level stickiness duration as low as possible (5 minutes or less).

In the following code, the target group level stickiness is enabled for a duration of 5 minutes and traffic is completely shifted from the blue target group to the green target group:

aws elbv2 modify-listener \
    --listener-arn "<LISTENER ARN> " \
    --default-actions \
    '[{
       "Type": "forward",
       "Order": 1,
       "ForwardConfig": {
          "TargetGroups": [
             {"TargetGroupArn": "<Blue Target Group>", "Weight": 0}, \
             {"TargetGroupArn": "<Green Target Group>", "Weight": 100}, \
          ],
          "TargetGroupStickinessConfig": {
             "Enabled": true,
             "DurationSeconds": 300
          }
       }
    }]'

The existing users with connection stickiness to the blue target group continue to the blue target group until the 5-minute duration elapses from the last request time.

As recommended above, the recommended value of stickiness may vary across application types.

Connection draining

To provide near-zero downtime release with blue/green deployment, you want to avoid breaking open network connections while taking an instance out of service, updating its software, or replacing it with a fresh instance that contains updated software. In the above use cases, you can ensure graceful transition between blue and green target groups by enabling the connection draining feature for your Elastic Load Balancers. You can do this from the AWS Management Console, the AWS CLI, or by calling the ModifyLoadBalancerAttributes function in the Elastic Load Balancing API. You can enable the feature and enter a timeout between 1 second and 1 hour. The connection time out duration depends upon your application profile. If your application is stateless like your customers are using your website, connection time out duration of lowest value is preferable. Applications that are transactions heavy and connection oriented sessions like web sockets, we recommend you to choose relatively high connection draining duration as it will impact the customer experience adversely.

Load balancer stickiness

In addition to the target group level stickiness, Application Load Balancer also supports load balancer level stickiness. When a load balancer first receives a request from a client, it routes the request to a target, generates a cookie named AWSALB that encodes information about the selected target, encrypts the cookie, and includes the cookie in the response to the client. The client should include the cookie that it receives in subsequent requests to the load balancer. When the load balancer receives a request from a client that contains the cookie, if sticky sessions are enabled for the target group and the request goes to the same target, the load balancer detects the cookie and routes the request to the same target. If the cookie is present but can’t be decoded, or if it refers to a target that was deregistered or is unhealthy, the load balancer selects a new target and updates the cookie with information about the new target.

You can enable Application Load Balancer stickiness using the AWS CLI or the console. You can specify a value between 1 second–7 days.

In the context of blue/green and canary deployments, the load balancer stickiness has no influence on the traffic shifting behavior using the weighted target groups because target group stickiness takes precedence over load balancer stickiness.

Conclusion

In this post, we showed how to perform canary and blue/green deployments with Application Load Balancer’s weighted target group feature and how target group level stickiness impacts your canary and blue/green deployments. We also demonstrated how quickly you can enable ELB connection draining to provide near-zero downtime release with blue/green deployment. We hope that you find these recommendations helpful when you build a blue/green deployment with Application Load Balancer. You can reach out to AWS Solutions Architects and AWS Support teams for further assistance.

Raghavarao Sodabathina is an Enterprise Solutions Architect at AWS, focusing on Data Analytics, AI/ML, and Serverless Platform. He engages with customers to create innovative solutions that address customer business problems and accelerate the adoption of AWS services. In his spare time, Raghavarao enjoys spending time with his family, reading books, and watching movies.

Siva Rajamani is a Boston-based Enterprise Solutions Architect for AWS. He enjoys working closely with customers, supporting their digital transformation and AWS adoption journey. His core areas of focus are Serverless, Application Integration, and Security. Outside of work, he enjoys outdoor activities and watching documentaries.

TAGS: blue-green deployments

Rapid and flexible Infrastructure as Code using the AWS CDK with AWS Solutions Constructs

2020-10-31 Biff Gaut

Post Syndicated from Biff Gaut original https://aws.amazon.com/blogs/devops/rapid-flexible-infrastructure-with-solutions-constructs-cdk/

Introduction

As workloads move to the cloud and all infrastructure becomes virtual, infrastructure as code (IaC) becomes essential to leverage the agility of this new world. JSON and YAML are the powerful, declarative modeling languages of AWS CloudFormation, allowing you to define complex architectures using IaC. Just as higher level languages like BASIC and C abstracted away the details of assembly language and made developers more productive, the AWS Cloud Development Kit (AWS CDK) provides a programming model above the native template languages, a model that makes developers more productive when creating IaC. When you instantiate CDK objects in your Typescript (or Python, Java, etc.) application, those objects “compile” into a YAML template that the CDK deploys as an AWS CloudFormation stack.

AWS Solutions Constructs take this simplification a step further by providing a library of common service patterns built on top of the CDK. These multi-service patterns allow you to deploy multiple resources with a single object, resources that follow best practices by default – both independently and throughout their interaction.

Comparison of an Application stack with Assembly Language, 4th generation language and Object libraries such as Hibernate with an IaC stack of CloudFormation, AWS CDK and AWS Solutions Constructs

Application Development Stack vs. IaC Development Stack

Solution overview

To demonstrate how using Solutions Constructs can accelerate the development of IaC, in this post you will create an architecture that ingests and stores sensor readings using Amazon Kinesis Data Streams, AWS Lambda, and Amazon DynamoDB.

An architecture diagram showing sensor readings being sent to a Kinesis data stream. A Lambda function will receive the Kinesis records and store them in a DynamoDB table.

Prerequisite – Setting up the CDK environment

Tip – If you want to try this example but are concerned about the impact of changing the tools or versions on your workstation, try running it on AWS Cloud9. An AWS Cloud9 environment is launched with an AWS Identity and Access Management (AWS IAM) role and doesn’t require configuring with an access key. It uses the current region as the default for all CDK infrastructure.

To prepare your workstation for CDK development, confirm the following:

Node.js 10.3.0 or later is installed on your workstation (regardless of the language used to write CDK apps).
You have configured credentials for your environment. If you’re running locally you can do this by configuring the AWS Command Line Interface (AWS CLI).
TypeScript 2.7 or later is installed globally (npm -g install typescript)

Before creating your CDK project, install the CDK toolkit using the following command:

npm install -g aws-cdk

Create the CDK project

First create a project folder called stream-ingestion with these two commands:

mkdir stream-ingestion
cd stream-ingestion

Now create your CDK application using this command:

npx [email protected] init app --language=typescript

Tip – This example will be written in TypeScript – you can also specify other languages for your projects.

At this time, you must use the same version of the CDK and Solutions Constructs. We’re using version 1.68.0 of both based upon what’s available at publication time, but you can update this with a later version for your projects in the future.

Let’s explore the files in the application this command created:

bin/stream-ingestion.ts – This is the module that launches the application. The key line of code is:

new StreamIngestionStack(app, 'StreamIngestionStack');

This creates the actual stack, and it’s in StreamIngestionStack that you will write the CDK code that defines the resources in your architecture.

lib/stream-ingestion-stack.ts – This is the important class. In the constructor of StreamIngestionStack you will add the constructs that will create your architecture.

During the deployment process, the CDK uploads your Lambda function to an Amazon S3 bucket so it can be incorporated into your stack.

To create that S3 bucket and any other infrastructure the CDK requires, run this command:

cdk bootstrap

The CDK uses the same supporting infrastructure for all projects within a region, so you only need to run the bootstrap command once in any region in which you create CDK stacks.

To install the required Solutions Constructs packages for our architecture, run the these two commands from the command line:

npm install @aws-solutions-constructs/[email protected]
npm install @aws-solutions-constructs/[email protected]

Write the code

First you will write the Lambda function that processes the Kinesis data stream messages.

Create a folder named lambda under stream-ingestion
Within the lambda folder save a file called lambdaFunction.js with the following contents:

var AWS = require("aws-sdk");

// Create the DynamoDB service object
var ddb = new AWS.DynamoDB({ apiVersion: "2012-08-10" });

AWS.config.update({ region: process.env.AWS_REGION });

// We will configure our construct to 
// look for the .handler function
exports.handler = async function (event) {
  try {
    // Kinesis will deliver records 
    // in batches, so we need to iterate through
    // each record in the batch
    for (let record of event.Records) {
      const reading = parsePayload(record.kinesis.data);
      await writeRecord(record.kinesis.partitionKey, reading);
    };
  } catch (err) {
    console.log(`Write failed, err:\n${JSON.stringify(err, null, 2)}`);
    throw err;
  }
  return;
};

// Write the provided sensor reading data to the DynamoDB table
async function writeRecord(partitionKey, reading) {

  var params = {
    // Notice that Constructs automatically sets up 
    // an environment variable with the table name.
    TableName: process.env.DDB_TABLE_NAME,
    Item: {
      partitionKey: { S: partitionKey },  // sensor Id
      timestamp: { S: reading.timestamp },
      value: { N: reading.value}
    },
  };

  // Call DynamoDB to add the item to the table
  await ddb.putItem(params).promise();
}

// Decode the payload and extract the sensor data from it
function parsePayload(payload) {

  const decodedPayload = Buffer.from(payload, "base64").toString(
    "ascii"
  );

  // Our CLI command will send the records to Kinesis
  // with the values delimited by '|'
  const payloadValues = decodedPayload.split("|", 2)
  return {
    value: payloadValues[0],
    timestamp: payloadValues[1]
  }
}

We won’t spend a lot of time explaining this function – it’s pretty straightforward and heavily commented. It receives an event with one or more sensor readings, and for each reading it extracts the pertinent data and saves it to the DynamoDB table.

You will use two Solutions Constructs to create your infrastructure:

The aws-kinesisstreams-lambda construct deploys an Amazon Kinesis data stream and a Lambda function.

aws-kinesisstreams-lambda creates the Kinesis data stream and Lambda function that subscribes to that stream. To support this, it also creates other resources, such as IAM roles and encryption keys.

The aws-lambda-dynamodb construct deploys a Lambda function and a DynamoDB table.

aws-lambda-dynamodb creates an Amazon DynamoDB table and a Lambda function with permission to access the table.

To deploy the first of these two constructs, replace the code in lib/stream-ingestion-stack.ts with the following code:

import * as cdk from "@aws-cdk/core";
import * as lambda from "@aws-cdk/aws-lambda";
import { KinesisStreamsToLambda } from "@aws-solutions-constructs/aws-kinesisstreams-lambda";

import * as ddb from "@aws-cdk/aws-dynamodb";
import { LambdaToDynamoDB } from "@aws-solutions-constructs/aws-lambda-dynamodb";

export class StreamIngestionStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const kinesisLambda = new KinesisStreamsToLambda(
      this,
      "KinesisLambdaConstruct",
      {
        lambdaFunctionProps: {
          // Where the CDK can find the lambda function code
          runtime: lambda.Runtime.NODEJS_10_X,
          handler: "lambdaFunction.handler",
          code: lambda.Code.fromAsset("lambda"),
        },
      }
    );

    // Next Solutions Construct goes here
  }
}

Let’s explore this code:

It instantiates a new KinesisStreamsToLambda object. This Solutions Construct will launch a new Kinesis data stream and a new Lambda function, setting up the Lambda function to receive all the messages in the Kinesis data stream. It will also deploy all the additional resources and policies required for the architecture to follow best practices.
The third argument to the constructor is the properties object, where you specify overrides of default values or any other information the construct needs. In this case you provide properties for the encapsulated Lambda function that informs the CDK where to find the code for the Lambda function that you stored as lambda/lambdaFunction.js earlier.

Now you’ll add the second construct that connects the Lambda function to a new DynamoDB table. In the same lib/stream-ingestion-stack.ts file, replace the line // Next Solutions Construct goes here with the following code:

    // Define the primary key for the new DynamoDB table
    const primaryKeyAttribute: ddb.Attribute = {
      name: "partitionKey",
      type: ddb.AttributeType.STRING,
    };

    // Define the sort key for the new DynamoDB table
    const sortKeyAttribute: ddb.Attribute = {
      name: "timestamp",
      type: ddb.AttributeType.STRING,
    };

    const lambdaDynamoDB = new LambdaToDynamoDB(
      this,
      "LambdaDynamodbConstruct",
      {
        // Tell construct to use the Lambda function in
        // the first construct rather than deploy a new one
        existingLambdaObj: kinesisLambda.lambdaFunction,
        tablePermissions: "Write",
        dynamoTableProps: {
          partitionKey: primaryKeyAttribute,
          sortKey: sortKeyAttribute,
          billingMode: ddb.BillingMode.PROVISIONED,
          removalPolicy: cdk.RemovalPolicy.DESTROY
        },
      }
    );

    // Add autoscaling
    const readScaling = lambdaDynamoDB.dynamoTable.autoScaleReadCapacity({
      minCapacity: 1,
      maxCapacity: 50,
    });

    readScaling.scaleOnUtilization({
      targetUtilizationPercent: 50,
    });

Let’s explore this code:

The first two const objects define the names and types for the partition key and sort key of the DynamoDB table.
The LambdaToDynamoDB construct instantiated creates a new DynamoDB table and grants access to your Lambda function. The key to this call is the properties object you pass in the third argument.
- The first property sent to LambdaToDynamoDB is existingLambdaObj – by setting this value to the Lambda function created by KinesisStreamsToLambda, you’re telling the construct to not create a new Lambda function, but to grant the Lambda function in the other Solutions Construct access to the DynamoDB table. This illustrates how you can chain many Solutions Constructs together to create complex architectures.
- The second property sent to LambdaToDynamoDB tells the construct to limit the Lambda function’s access to the table to write only.
- The third property sent to LambdaToDynamoDB is actually a full properties object defining the DynamoDB table. It provides the two attribute definitions you created earlier as well as the billing mode. It also sets the RemovalPolicy to DESTROY. This policy setting ensures that the table is deleted when you delete this stack – in most cases you should accept the default setting to protect your data.
The last two lines of code show how you can use statements to modify a construct outside the constructor. In this case we set up auto scaling on the new DynamoDB table, which we can access with the dynamoTable property on the construct we just instantiated.

That’s all it takes to create the all resources to deploy your architecture.

Save all the files, then compile the Typescript into a CDK program using this command:

npm run build

Finally, launch the stack using this command:

cdk deploy

(Enter “y” in response to Do you wish to deploy all these changes (y/n)?)

You will see some warnings where you override CDK default values. Because you are doing this intentionally you may disregard these, but it’s always a good idea to review these warnings when they occur.

Tip – Many mysterious CDK project errors stem from mismatched versions. If you get stuck on an inexplicable error, check package.json and confirm that all CDK and Solutions Constructs libraries have the same version number (with no leading caret ^). If necessary, correct the version numbers, delete the package-lock.json file and node_modules tree and run npm install. Think of this as the “turn it off and on again” first response to CDK errors.

You have now deployed the entire architecture for the demo – open the CloudFormation stack in the AWS Management Console and take a few minutes to explore all 12 resources that the program deployed (and the 380 line template generated to created them).

Feed the Stream

Now use the CLI to send some data through the stack.

Go to the Kinesis Data Streams console and copy the name of the data stream. Replace the stream name in the following command and run it from the command line.

aws kinesis put-records \
--stream-name StreamIngestionStack-KinesisLambdaConstructKinesisStreamXXXXXXXX-XXXXXXXXXXXX \
--records \
PartitionKey=1301,'Data=15.4|2020-08-22T01:16:36+00:00' \
PartitionKey=1503,'Data=39.1|2020-08-22T01:08:15+00:00'

Tip – If you are using the AWS CLI v2, the previous command will result in an “Invalid base64…” error because v2 expects the inputs to be Base64 encoded by default. Adding the argument --cli-binary-format raw-in-base64-out will fix the issue.

To confirm that the messages made it through the service, open the DynamoDB console – you should see the two records in the table.

Now that you’ve got it working, pause to think about what you just did. You deployed a system that can ingest and store sensor readings and scale to handle heavy loads. You did that by instantiating two objects – well under 60 lines of code. Experiment with changing some property values and deploying the changes by running npm run build and cdk deploy again.

Cleanup

To clean up the resources in the stack, run this command:

cdk destroy

Conclusion

Just as languages like BASIC and C allowed developers to write programs at a higher level of abstraction than assembly language, the AWS CDK and AWS Solutions Constructs allow us to create CloudFormation stacks in Typescript, Java, or Python instead JSON or YAML. Just as there will always be a place for assembly language, there will always be situations where we want to write CloudFormation templates manually – but for most situations, we can now use the AWS CDK and AWS Solutions Constructs to create complex and complete architectures in a fraction of the time with very little code.

AWS Solutions Constructs can currently be used in CDK applications written in Typescript, Javascript, Java and Python and will be available in C# applications soon.

About the Author

Biff Gaut has been shipping software since 1983, from small startups to large IT shops. Along the way he has contributed to 2 books, spoken at several conferences and written many blog posts. He is now a Principal Solutions Architect at AWS working on the AWS Solutions Constructs team, helping customers deploy better architectures more quickly.

Field Notes: Integrating IoT and ITSM using AWS IoT Greengrass and AWS Secrets Manager – Part 2

2020-10-29 Gary Emmerton

Post Syndicated from Gary Emmerton original https://aws.amazon.com/blogs/architecture/field-notes-integrating-iot-and-itsm-using-aws-iot-greengrass-and-aws-secrets-manager-part-2/

In part 1 of this blog I introduced the need for organizations to securely connect thousands of IoT devices with many different systems in the hyperconnected world that exists today, and how that can be addressed using AWS IoT Greengrass and AWS Secrets Manager. We walked through the creation of ServiceNow credentials in AWS Secrets Manager, the creation of IAM roles and the Lambda functions that will run on our edge device (a Raspberry Pi).

In this second part of the blog, we will setup AWS IoT Greengrass, on our Raspberry Pi, and AWS IoT Core so that we can run the AWS Lambda functions and access our ServiceNow credentials, retrieved securely from AWS Secrets Manager.

Setting up AWS IoT Core and AWS IoT Greengrass

The overall sequence for configuring AWS IoT Core and AWS IoT Greengrass is:

Create a certificate, and IoT Thing and link them
Create AWS IoT Greengrass group
Associate IAM role to the AWS IoT Greengrass group
Create and attach a policy to the certificate
Create an AWS IoT Greengrass Resource Definition for our ‘Secret’
Create an AWS IoT Greengrass Function Definition for our Lambda functions
Create an AWS IoT Greengrass Subscription Definition for IoT Topics to be used
Finally associate our Resource, Function and Subscription Definitions with our AWS IoT Greengrass Core

Steps

For this walkthrough, I have selected the AWS region “eu-west-1”, however, feel free to use other Regions where AWS IoT Core and AWS IoT Greengrass are available.

First, let’s install Greengrass on the Raspberry Pi:

Follow the instructions to configure the pre-requisites on the Raspberry Pi
Then we download the AWS IoT Greengrass software
And then we unzip the AWS IoT Greengrass software using the following command (note, this command is for version 1.10.0 of Greengrass and will change as later versions are released):

sudo tar -xzvf greengrass-linux-armv6l-1.10.0.tar.gz -C /

Note that AWS IoT Greengrass must be compatible with the version of the AWS Greengrass SDK installed to identify what versions are compatible and use sudo pip3 install greengrasssdk==<version_number> to install the SDK compatible with the version of AWS IoT Greengrass that we installed.

Our AWS IoT Greengrass core will authenticate with AWS IoT Core in AWS using certificates, so we need to generate these first using the following command:

aws iot create-keys-and-certificate --set-as-active --certificate-pem-outfile "iot-ge.cert.pem" --public-key-outfile "iot-ge.public.key" --private-key-outfile "iot-ge.private.key"

This command will generate three files containing the private key, public key and certificate. All of these files need to be copied to the /greengrass/certs folder on the Raspberry Pi. Also, the output of the preceding command will give the ARN of the certificate – we need to make a note of this ARN as we will use it in the next steps.

We also need to download a copy of the Amazon Root CA into the /greegrass/certs folder using the command below:

sudo wget -O root.ca.pem https://www.amazontrust.com/repository/AmazonRootCA1.pem

For the next step we need our AWS account number and IoT Host address unique to our account – we get the IoT Host address using the command:

aws iot describe-endpoint --endpoint-type iot:Data-ATS

Now we need to create a config.json file on the Raspberry Pi in the /greengrass/config folder, with the account number and IoT Host address obtained in the previous step;

{
  "coreThing" : {
    "caPath" : "root.ca.pem",
    "certPath" : "iot-ge.cert.pem",
    "keyPath" : "iot-ge.private.key",
    "thingArn" : "arn:aws:iot:eu-west-1:<aws_account_number>:thing/IoT-blog_Core",
    "iotHost" : "<endpoint_address>",
    "ggHost" : "greengrass-ats.iot.eu-west-1.amazonaws.com",
    "keepAlive" : 600
  },
  "runtime" : {
    "cgroup" : {
      "useSystemd" : "yes"
    },
    "allowFunctionsToRunAsRoot" : "yes"
  },
  "managedRespawn" : false,
  "crypto" : {
    "principals" : {
      "SecretsManager" : {
        "privateKeyPath" : "file:///greengrass/certs/iot-ge.private.key"
      },
      "IoTCertificate" : {
        "privateKeyPath" : "file:///greengrass/certs/iot-ge.private.key",
        "certificatePath" : "file:///greengrass/certs/iot-ge.cert.pem"
      }
    },
    "caPath" : "file:///greengrass/certs/root.ca.pem"
  }
}

Note that the line "allowFunctionsToRunAsRoot" : "yes" allows the Lambda functions to easily access the SenseHat on the Raspberry Pi. This configuration should normally be avoided in Production environments for security reasons but has been used here for simplicity.

Next we create the IoT Thing to represent our Raspberry Pi to match the entry we added into the config.json file previously:

aws iot create-thing --thing-name IoT-blog_Core

Now that our config.json file is in place and our IoT ‘thing’ created we can start the AWS IoT Greengrass software using the following commands:

cd /greengrass/ggc/core/
sudo ./greengrassd start

Then we attach the certificate to our new Thing – we need the ARN of the certificate that was noted in the earlier steps when we created the certificates:

aws iot attach-thing-principal --thing-name "IoT-blog_Core" --principal "<certificate_arn>"

Now we create the AWS IoT Greengrass group – make a note of the Group ID in the output of this command as we use it later:

aws greengrass create-group --name IoT-blog-group

Next we create the AWS IoT Greengrass Core definition file – create this using a text editor and save as core-def.json

{
  "Cores": [
    {
      "CertificateArn": "<certificate_arn>",
      "Id": "<IoT Thing Name>",
      "SyncShadow": true,
      "ThingArn": "<thing_arn>"
    }
  ]
}

Then, using the preceding file we just created, we create the core definition using the following command:

aws greengrass create-core-definition --name "IoT-blog_Core" --initial-version file://core-def.json

Now we associate the AWS IoT Greengrass core with the AWS IoT Greengrass group – we need the LatestVersionARN from the output of the command above and the group ID of your existing AWS IoT Greengrass group (in the output from the command for creation of the group in previous steps):

aws greengrass create-group-version --group-id "<greengrass_group_id>" --core-definition-version-arn "<core_definition_version_arn>"

Then we associate the IAM role (created earlier) to the AWS IoT Greengrass group;

aws greengrass associate-role-to-group --group-id "<greengrass_group_id>" --role-arn "arn:aws:iam::<aws_account_number>:role/IoTGGRole"

We need to create a policy to associate with the certificate so that our AWS IoT Greengrass Core (authenticated/authorized by our certificates) has rights to interact with AWS IoT Core. To do this we create the policy.json file:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iot:Publish",
        "iot:Subscribe",
        "iot:Connect",
        "iot:Receive"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "iot:GetThingShadow",
        "iot:UpdateThingShadow",
        "iot:DeleteThingShadow"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "greengrass:*"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

Then create the policy using the policy file using the command below:

aws iot create-policy --policy-name myGGPolicy --policy-document file://policy.json

And finally attach our new policy to the certificate – as the certificate is attached to our AWS IoT Greengrass Core, this gives the rights defined in the policy to our AWS IoT Greengrass Core;

aws iot attach-policy --target "<certificate_arn>" --policy-name "myGGPolicy"

Now we have the AWS IoT Greengrass Core and permissions in place, it’s time to add our Secret as a resource for AWS IoT Greengrass.

First, we need to create a resource definition that refers to the ARN of the secret we created earlier. Get the ARN of the secret using the following command:

aws secretsmanager describe-secret --secret-id "greengrass-snow-creds"

And then we create a text file containing the following and save it as resource.json:

{
"Resources": [
    {
      "Id": "SNOW-Credentials",
      "Name": "SNOW-Credentials",
      "ResourceDataContainer": {
        "SecretsManagerSecretResourceData": {
          "ARN": "<secret_arn>"
        }
      }
    }
  ]
}

Now we command to create the resource reference in IoT to the Secret:

aws greengrass create-resource-definition --name "MySNOWSecret" --initial-version file://resource.json

Note the Resource ID from the output as it is needed as it has to be added to the Lambda definition json file in the next steps. The function definition file contains the details of the Lambda function(s) that we will attach to our AWS IoT Greengrass group. We create a text file with the content below and save as lambda-def.json.

We also specify a couple of variables in the definition file; these are the same as the environment variables that can be specified for Lambda, but they make the variables available in AWS IoT Greengrass.

Note, if we specify environment variables for the functions in the Lambda console then these will NOT be available when the function is running under AWS IoT Greengrass. We will need our ServiceNow API URL to add to the configuration below, and this will be in the form of https://devXXXXX.service-now.com/api/now/table/incident, where XXXXX is the developer instance number assigned by ServiceNow when our instance is created.

We need the ARNs of the Lambda functions that we created in part 1 of the blog – these appear in the output after successfully creating the functions from the command line, or can be obtained using the aws lambda list-functions command – we need to have the ‘:1’ at the end of the ARN as AWS IoT Greengrass needs to reference published function versions.

{
  "DefaultConfig": {
    "Execution": {
      "IsolationMode": "NoContainer",
      "RunAs": {
        "Gid": 0,
        "Uid": 0
      }
    }
  },
  "Functions": [
    {
      "FunctionArn": "<lambda_function1_arn>:1",
      "FunctionConfiguration": {
        "EncodingType": "json",
        "Environment": {
          "Execution": {
            "IsolationMode": "NoContainer"
          },
          "Variables": { 
            "tempLimit": "30",
            "humidLimit": "50"
          }
        },
        "ExecArgs": "string",
        "Executable": "lambda_function.lambda_handler",
        "Pinned": true,
        "Timeout": 10
      },
    "Id": "sensorLambda"
    },
    {
      "FunctionArn": "<lambda_function2_arn>:1",
      "FunctionConfiguration": {
        "EncodingType": "json",
        "Environment": {
          "Execution": {
            "IsolationMode": "NoContainer"
          },
          "ResourceAccessPolicies": [
            {
              "Permission": "ro",
              "ResourceId": "SNOW-Credentials"
            }
          ],
          "Variables": { 
            "snowUrl": "<service_now_api_url>"
          }
        },
        "ExecArgs": "string",
        "Executable": "lambda_function.lambda_handler",
        "Pinned": false,
        "Timeout": 10
      },
    "Id": "anomalyLambda"
    }
  ]
}

The Lambda function now needs to be registered within our AWS IoT Greengrass core using the definition file just created, using the following command:

aws greengrass create-function-definition --name "IoT-blog-lambda" --initial-version file://lambda-def.json

Create Subscriptions

We now need to create some IoT Topics to pass data between the two Lambda functions and also to submit all sensor data to AWS IoT Core, which gives us visibility of the successful collection of sensor data.cd.

First, let’s create a subscription configuration file (subscriptions.json) for sensor data and anomaly data:

{
  "Subscriptions": [
    {
      "Id": "SensorData",
      "Source": "<lambda_function1_arn>:1",
      "Subject": "IoTBlog/sensorData",
      "Target": "cloud"
    },
    {
      "Id": "AnomalyData",
      "Source": "<lambda_function1_arn>:1",
      "Subject": "IoTBlog/anomaly",
      "Target": "<lambda_function2_arn>:1"
    },
    {
      "Id": "AnomalyDataB",
      "Source": "<lambda_function1_arn>:1",
      "Subject": "IoTBlog/anomaly",
      "Target": "cloud"
    }
  ]
}

And next, we run the command to create the subscription from this configuration:

aws greengrass create-subscription-definition --name "IoT-sensor-subs" --initial-version file://subscriptions.json

Update AWS IoT Greengrass Group Associations and Deploy

Now that the functions, subscriptions and resources have been defined, we run the following command to update our AWS IoT Greengrass group to the new version with those components included:

aws greengrass create-group-version --group-id <gg_group_id> --core-definition-version-arn "<core_def_version_arn>" --function-definition-version-arn "<function_def_version_arn>" --resource-definition-version-arn "<resource_def_version_arn>" --subscription-definition-version-arn "<subscription_def_version_arn>"

And finally, we can deploy our configuration. Use the following command to deploy the Greengrass group to our device, using the group-version-id from the output of the previous command and also the group-id:

aws greengrass create-deployment --deployment-type NewDeployment --group-id <gg_group_id> --group-version-id <gg_group_version_id>

Summarized below is the integration between the different functions and components that we have now deployed to get from our sensor data through to an incident being raised in ServiceNow:

Raspberry PI

Create an Incident

Everything is setup now from an IoT perspective, so we can attempt to trigger a threshold breach on the sensors to trigger the creation of an incident in ServiceNow. In order to trigger the incident creation, let’s raise the humidity around the sensor so that it breaches the threshold defined in the environment variables of the Lambda function.

Under normal conditions we will just see the data published by the first Lambda function in the IoTBlog/sensorData topic:

IoTblog sensordata

However, when a threshold is breached (in our example, humidity above 50%), the data is published to the IoTBlog/anomaly topic as shown below:

ioTblog Anomaly

Via the AWS IoT Greengrass subscriptions created earlier, this message arriving in the anomaly topic also triggers the second Lambda function to create the ticket in ServiceNow.

The log for the second Lambda function on AWS IoT Greengrass (stored in /greengrass/ggc/var/log/user/<region>/<aws_account_number>/ on the Raspberry Pi) will show a ‘201’ return code if the incident is successfully created in ServiceNow.

Now let’s log on to ServiceNow and check out our new incident. Good news, our new incident appears correctly:

And when we click on our incident we can see the detail, including the full data from the IoT topic in the Activities section;

This is only a basic use of the ServiceNow API and there are many other parameters that you can use to increase the richness of the incident, refer to the ServiceNow API documentation for more details.

Cleaning up

To avoid incurring future charges, delete the resources that you created in the walkthrough.

Conclusion

We have built an IoT device (Raspberry Pi), running AWS IoT Greengrass, AWS Lambda, and using ServiceNow credentials managed in AWS Secrets Manager. Using this we have triggered an anomaly event that has created an incident automatically in ServiceNow, directly from the Lambda function running on our Pi. You can use this architecture as the foundation to integrate your edge devices and ITSM solution to automate ticket generation in your organization.

Look out for follow-up blogs that will extend this solution to provide a real-time dashboard for the sensor data and store the sensor data in a Data Lake for historical visualization.

Find out more about deploying Secrets to AWS IoT Greengrass Core.

Check out the AWS IoT Blog for more examples of how to use AWS to integrate your edge devices with the AWS Cloud.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Building, bundling, and deploying applications with the AWS CDK

2020-10-28 Cory Hall

Post Syndicated from Cory Hall original https://aws.amazon.com/blogs/devops/building-apps-with-aws-cdk/

The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to model and provision your cloud application resources using familiar programming languages.

The post CDK Pipelines: Continuous delivery for AWS CDK applications showed how you can use CDK Pipelines to deploy a TypeScript-based AWS Lambda function. In that post, you learned how to add additional build commands to the pipeline to compile the TypeScript code to JavaScript, which is needed to create the Lambda deployment package.

In this post, we dive deeper into how you can perform these build commands as part of your AWS CDK build process by using the native AWS CDK bundling functionality.

If you’re working with Python, TypeScript, or JavaScript-based Lambda functions, you may already be familiar with the PythonFunction and NodejsFunction constructs, which use the bundling functionality. This post describes how to write your own bundling logic for instances where a higher-level construct either doesn’t already exist or doesn’t meet your needs. To illustrate this, I walk through two different examples: a Lambda function written in Golang and a static site created with Nuxt.js.

Concepts

A typical CI/CD pipeline contains steps to build and compile your source code, bundle it into a deployable artifact, push it to artifact stores, and deploy to an environment. In this post, we focus on the building, compiling, and bundling stages of the pipeline.

The AWS CDK has the concept of bundling source code into a deployable artifact. As of this writing, this works for two main types of assets: Docker images published to Amazon Elastic Container Registry (Amazon ECR) and files published to Amazon Simple Storage Service (Amazon S3). For files published to Amazon S3, this can be as simple as pointing to a local file or directory, which the AWS CDK uploads to Amazon S3 for you.

When you build an AWS CDK application (by running cdk synth), a cloud assembly is produced. The cloud assembly consists of a set of files and directories that define your deployable AWS CDK application. In the context of the AWS CDK, it might include the following:

AWS CloudFormation templates and instructions on where to deploy them
Dockerfiles, corresponding application source code, and information about where to build and push the images to
File assets and information about which S3 buckets to upload the files to

Use case

For this use case, our application consists of front-end and backend components. The example code is available in the GitHub repo. In the repository, I have split the example into two separate AWS CDK applications. The repo also contains the Golang Lambda example app and the Nuxt.js static site.

Golang Lambda function

To create a Golang-based Lambda function, you must first create a Lambda function deployment package. For Go, this consists of a .zip file containing a Go executable. Because we don’t commit the Go executable to our source repository, our CI/CD pipeline must perform the necessary steps to create it.

In the context of the AWS CDK, when we create a Lambda function, we have to tell the AWS CDK where to find the deployment package. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  runtime: lambda.Runtime.GO_1_X,
  handler: 'main',
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-go-executable')),
});

In the preceding code, the lambda.Code.fromAsset() method tells the AWS CDK where to find the Golang executable. When we run cdk synth, it stages this Go executable in the cloud assembly, which it zips and publishes to Amazon S3 as part of the PublishAssets stage.

If we’re running the AWS CDK as part of a CI/CD pipeline, this executable doesn’t exist yet, so how do we create it? One method is CDK bundling. The lambda.Code.fromAsset() method takes a second optional argument, AssetOptions, which contains the bundling parameter. With this bundling parameter, we can tell the AWS CDK to perform steps prior to staging the files in the cloud assembly.

Breaking down the BundlingOptions parameter further, we can perform the build inside a Docker container or locally.

Building inside a Docker container

For this to work, we need to make sure that we have Docker running on our build machine. In AWS CodeBuild, this means setting privileged: true. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-source-code'), {
    bundling: {
      image: lambda.Runtime.GO_1_X.bundlingDockerImage,
      command: [
        'bash', '-c', [
          'go test -v',
          'GOOS=linux go build -o /asset-output/main',
      ].join(' && '),
    },
  })
  ...
});

We specify two parameters:

image (required) – The Docker image to perform the build commands in
command (optional) – The command to run within the container

The AWS CDK mounts the folder specified as the first argument to fromAsset at /asset-input inside the container, and mounts the asset output directory (where the cloud assembly is staged) at /asset-output inside the container.

After we perform the build commands, we need to make sure we copy the Golang executable to the /asset-output location (or specify it as the build output location like in the preceding example).

This is the equivalent of running something like the following code:

docker run \
  --rm \
  -v folder-containing-source-code:/asset-input \
  -v cdk.out/asset.1234a4b5/:/asset-output \
  lambci/lambda:build-go1.x \
  bash -c 'GOOS=linux go build -o /asset-output/main'

Building locally

To build locally (not in a Docker container), we have to provide the local parameter. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-source-code'), {
    bundling: {
      image: lambda.Runtime.GO_1_X.bundlingDockerImage,
      command: [],
      local: {
        tryBundle(outputDir: string) {
          try {
            spawnSync('go version')
          } catch {
            return false
          }

          spawnSync(`GOOS=linux go build -o ${path.join(outputDir, 'main')}`);
          return true
        },
      },
    },
  })
  ...
});

The local parameter must implement the ILocalBundling interface. The tryBundle method is passed the asset output directory, and expects you to return a boolean (true or false). If you return true, the AWS CDK doesn’t try to perform Docker bundling. If you return false, it falls back to Docker bundling. Just like with Docker bundling, you must make sure that you place the Go executable in the outputDir.

Typically, you should perform some validation steps to ensure that you have the required dependencies installed locally to perform the build. This could be checking to see if you have go installed, or checking a specific version of go. This can be useful if you don’t have control over what type of build environment this might run in (for example, if you’re building a construct to be consumed by others).

If we run cdk synth on this, we see a new message telling us that the AWS CDK is bundling the asset. If we include additional commands like go test, we also see the output of those commands. This is especially useful if you wanted to fail a build if tests failed. See the following code:

$ cdk synth
Bundling asset GolangLambdaStack/MyGoFunction/Code/Stage...
✓  . (9ms)
✓  clients (5ms)

DONE 8 tests in 11.476s
✓  clients (5ms) (coverage: 84.6% of statements)
✓  . (6ms) (coverage: 78.4% of statements)

DONE 8 tests in 2.464s

Cloud Assembly

If we look at the cloud assembly that was generated (located at cdk.out), we see something like the following code:

$ cdk synth
Bundling asset GolangLambdaStack/MyGoFunction/Code/Stage...
✓  . (9ms)
✓  clients (5ms)

DONE 8 tests in 11.476s
✓  clients (5ms) (coverage: 84.6% of statements)
✓  . (6ms) (coverage: 78.4% of statements)

DONE 8 tests in 2.464s

It contains our GolangLambdaStack CloudFormation template that defines our Lambda function, as well as our Golang executable, bundled at asset.01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952/main.

Let’s look at how the AWS CDK uses this information. The GolangLambdaStack.assets.json file contains all the information necessary for the AWS CDK to know where and how to publish our assets (in this use case, our Golang Lambda executable). See the following code:

{
  "version": "5.0.0",
  "files": {
    "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952": {
      "source": {
        "path": "asset.01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952",
        "packaging": "zip"
      },
      "destinations": {
        "current_account-current_region": {
          "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}",
          "objectKey": "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952.zip",
          "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}"
        }
      }
    }
  }
}

The file contains information about where to find the source files (source.path) and what type of packaging (source.packaging). It also tells the AWS CDK where to publish this .zip file (bucketName and objectKey) and what AWS Identity and Access Management (IAM) role to use (assumeRoleArn). In this use case, we only deploy to a single account and Region, but if you have multiple accounts or Regions, you see multiple destinations in this file.

The GolangLambdaStack.template.json file that defines our Lambda resource looks something like the following code:

{
  "Resources": {
    "MyGoFunction0AB33E85": {
      "Type": "AWS::Lambda::Function",
      "Properties": {
        "Code": {
          "S3Bucket": {
            "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}"
          },
          "S3Key": "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952.zip"
        },
        "Handler": "main",
        ...
      }
    },
    ...
  }
}

The S3Bucket and S3Key match the bucketName and objectKey from the assets.json file. By default, the S3Key is generated by calculating a hash of the folder location that you pass to lambda.Code.fromAsset(), (for this post, folder-containing-source-code). This means that any time we update our source code, this calculated hash changes and a new Lambda function deployment is triggered.

Nuxt.js static site

In this section, I walk through building a static site using the Nuxt.js framework. You can apply the same logic to any static site framework that requires you to run a build step prior to deploying.

To deploy this static site, we use the BucketDeployment construct. This is a construct that allows you to populate an S3 bucket with the contents of .zip files from other S3 buckets or from a local disk.

Typically, we simply tell the BucketDeployment construct where to find the files that it needs to deploy to the S3 bucket. See the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-directory')),
  ],
  destinationBucket: myBucket
});

To deploy a static site built with a framework like Nuxt.js, we need to first run a build step to compile the site into something that can be deployed. For Nuxt.js, we run the following two commands:

yarn install – Installs all our dependencies
yarn generate – Builds the application and generates every route as an HTML file (used for static hosting)

This creates a dist directory, which you can deploy to Amazon S3.

Just like with the Golang Lambda example, we can perform these steps as part of the AWS CDK through either local or Docker bundling.

Building inside a Docker container

To build inside a Docker container, use the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-nuxtjs-project'), {
      bundling: {
        image: cdk.BundlingDockerImage.fromRegistry('node:lts'),
        command: [
          'bash', '-c', [
            'yarn install',
            'yarn generate',
            'cp -r /asset-input/dist/* /asset-output/',
          ].join(' && '),
        ],
      },
    }),
  ],
  ...
});

For this post, we build inside the publicly available node:lts image hosted on DockerHub. Inside the container, we run our build commands yarn install && yarn generate, and copy the generated dist directory to our output directory (the cloud assembly).

The parameters are the same as described in the Golang example we walked through earlier.

Building locally

To build locally, use the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-nuxtjs-project'), {
      bundling: {
        local: {
          tryBundle(outputDir: string) {
            try {
              spawnSync('yarn --version');
            } catch {
              return false
            }

            spawnSync('yarn install && yarn generate');

       fs.copySync(path.join(__dirname, ‘path-to-nuxtjs-project’, ‘dist’), outputDir);
            return true
          },
        },
        image: cdk.BundlingDockerImage.fromRegistry('node:lts'),
        command: [],
      },
    }),
  ],
  ...
});

Building locally works the same as the Golang example we walked through earlier, with one exception. We have one additional command to run that copies the generated dist folder to our output directory (cloud assembly).

Conclusion

This post showed how you can easily compile your backend and front-end applications using the AWS CDK. You can find the example code for this post in this GitHub repo. If you have any questions or comments, please comment on the GitHub repo. If you have any additional examples you want to add, we encourage you to create a Pull Request with your example!

Our code also contains examples of deploying the applications using CDK Pipelines, so if you’re interested in deploying the example yourself, check out the example repo.

About the author

Cory Hall

Cory is a Solutions Architect at Amazon Web Services with a passion for DevOps and is based in Charlotte, NC. Cory works with enterprise AWS customers to help them design, deploy, and scale applications to achieve their business goals.