Tag Archives: Technical How-to

Field Notes: Analyze Cross-Account AWS KMS Call Usage with AWS CloudTrail and Amazon Athena

Post Syndicated from Abhijit Rajeshirke original https://aws.amazon.com/blogs/architecture/field-notes-analyze-cross-account-aws-kms-call-usage-with-aws-cloudtrail-and-amazon-athena/

Businesses are expanding their footprint on Amazon Web Services (AWS) and are adopting a multi-account strategy to help isolate and manage business applications and data. In the multi-account strategy, it is common to have business applications deployed in one account accessing an Amazon Simple Storage Service (Amazon S3) encrypted bucket from another AWS account.

When an application in an AWS account uses a AWS Key Management Service (AWS KMS) key owned by a different account, it’s known as a cross-account call. For cross-account requests, AWS KMS throttles the account that makes the requests, not the account that owns the AWS KMS key. These requests count toward the request quota of the caller account. Sometimes it’s essential to identify or track cross-account AWS KMS API usage. In this blog, you will learn about use cases to track these requests and steps to identify cross-account AWS KMS calls.

To understand the problem better, consider a scenario where you have multiple AWS accounts set up in a hub and spoke configuration as shown in the following diagram.  Each account is administered by a different administrator. Amazon S3 data lake is located in the centralized hub account. The data lake bucket is encrypted using server-side encryption with AWS KMS (SSE-KMS) with customer-managed keys. Multiple spoke accounts access datasets from this data lake bucket. When a spoke account uploads or downloads objects from the data lake, Amazon S3 makes a GenerateDataKey (for uploads) or Decrypt (for downloads) API request to AWS KMS on behalf of the spoke account. These API requests get applied toward AWS KMS quota of the spoke account.

In the following diagram (figure 1), spoke accounts B, C, and D are uploading/downloading files from the encrypted data lake located in hub account A. Related AWS KMS API quotas will get applied to spoke accounts even though encryption/decryption is happening at the data lake S3 bucket. For example, the centralized Amazon S3 data lake is located in hub account A with an account ID 111111111111. Amazon S3 data lake bucket is encrypted using AWS KMS key ARN ending in 3aa3c82a2174.

Spoke account B with account ID 222222222222 is downloading 1,811 files and uploading 749 files from the centralized data lake. A total of 2,560 AWS KMS API calls will be counted against the request quota for account B.

Spoke account C with account ID 33333333333 is downloading 997 files and uploading 271 files from centralized data lake. A total of 1,268 AWS KMS API calls will be counted against the request quota for account C.

Spoke account D with account ID 444444444444 is downloading 638 files and uploading 306 files from centralized data lake from centralized data lake. The total 944 AWS KMS API quotas will get applied to account D.

Spoke and hub accounts are owned by separate business units and owned by different account administrators.

Note: when you configure your bucket to use an S3 Bucket Key for SSE-KMS, you may not see separate Decrypt or GenerateDataKey for each file upload or download.

Figure 1: Architecture outlining the hub and spoke accounts

Figure 1: Architecture outlining the hub and spoke accounts

This architecture design works for the following three use cases.

Use case #1:

A spoke account administrator wants to track the individual AWS KMS key-wise encryption/decryption costs using AWS Cost Explorer and cost allocation tags. Tracking costs this way works well for the AWS KMS API calls made within the same spoke account and related costs will be displayed under appropriate cost allocation tags. However, for the cross account AWS KMS API calls, cost allocation tags will not be visible outside of the hub account and will be displayed under cost allocation tag “None.” Analyzing cross-account AWS KMS API calls will help administrator determine approximate percentage usage by each cross account KMS key.

Use case # 2:

The spoke account has multiple applications, and each application has a unique AWS Identity and Access Management (IAM) principal. The spoke account administrator would like to track encryption/decryption usage. Identifying IAM principal-wise cross account calls will help the administrator determine approximate percent usage by each IAM principal /each application.

Use case # 3:

The spoke account administrator wants to understand how much AWS KMS quota is used for the cross-account specific KMS keys.

Solution Overview

Let’s discuss how we can track cross-account AWS KMS calls using AWS CloudTrail and Amazon Athena. For this solution, we will reuse your existing CloudTrail or create new CloudTrail in a Region where the hub account Amazon S3 data lake is located. As shown in the following diagram, we will use Athena to query the CloudTrail data to identify cross account AWS KMS calls used for S3 encryption/decryption.

Figure 2- Spoke and hub architecture

Figure 2: Architecture outlining the CloudTrail and Athena Solution

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • AWS accounts (one for hub and at least one for spoke)
  • AWS KMS (SSE-KMS) with customer managed keys encrypted S3 bucket

Walkthrough

Step 1: Activate AWS CloudTrail for the hub account

CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS account. With CloudTrail, you can log, continuously monitor, and retain account activity across your AWS accounts. If you have already activated CloudTrail, you can reuse the same. If you haven’t, you can activate it using the steps in this tutorial. For the proposed solution, you must enable CloudTrail for management events only. You don’t require CloudTrail for data events or insight events. Also, be aware that you need only single CloudTrail and creating duplicate cloud trails can increase the service cost.

Note:  you can analyze the data in Athena only when the CloudTrail data is available. Any access requests made prior to enabling CloudTrail cannot be analyzed. It takes up to 15 minutes for events to get to CloudTrail, and up to 5 minutes for CloudTrail to write to S3.

Step 2: Create Amazon Athena table to query the CloudTrail data

Amazon Athena is an interactive query service that analyzes data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. Create an Athena table in any database or default database in a Region where your hub account S3 data lake bucket resides.

If you are using Athena for the first time, follow these steps to create a database. Once the database is created you need to create Athena table. Follow these steps to create a table:

  1. Open the Athena built-in query editor,
  2. copy the following query,
  3. modify as suggested,
  4. run the query.

In the LOCATION and storage.location.template clauses, replace the bucket with CloudTrail bucket. Replace accountId with hub account’s ID and replace awsRegion with region where data lake S3 bucket is located. For projection.timestamp.range, replace 2020/01/01 with the start date you want to use.

After successful initiation of the query, you will see the CloudTrail_logs table created in Athena.

CREATE EXTERNAL TABLE cloudtrail_logs_region(

    eventVersion STRING,

    userIdentity STRUCT<

        type: STRING,

        principalId: STRING,

        arn: STRING,

        accountId: STRING,

        invokedBy: STRING,

        accessKeyId: STRING,

        userName: STRING,

        sessionContext: STRUCT<

            attributes: STRUCT<

                mfaAuthenticated: STRING,

                creationDate: STRING>,

            sessionIssuer: STRUCT<

                type: STRING,

                principalId: STRING,

                arn: STRING,

                accountId: STRING,

                userName: STRING>>>,

    eventTime STRING,

    eventSource STRING,

    eventName STRING,

    awsRegion STRING,

    sourceIpAddress STRING,

    userAgent STRING,

    errorCode STRING,

    errorMessage STRING,

    requestParameters STRING,

    responseElements STRING,

    additionalEventData STRING,

    requestId STRING,

    eventId STRING,

    readOnly STRING,

    resources ARRAY<STRUCT<

        arn: STRING,

        accountId: STRING,

        type: STRING>>,

    eventType STRING,

    apiVersion STRING,

    recipientAccountId STRING,

    serviceEventDetails STRING,

    sharedEventID STRING,

    vpcEndpointId STRING

  )

PARTITIONED BY (

   `timestamp` string)

ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'

STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'

OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

  's3://bucket/AWSLogs/account-id/CloudTrail/aws-region'

TBLPROPERTIES (

  'projection.enabled'='true',

  'projection.timestamp.format'='yyyy/MM/dd',

  'projection.timestamp.interval'='1',

  'projection.timestamp.interval.unit'='DAYS',

  'projection.timestamp.range'='2020/01/01,NOW',

  'projection.timestamp.type'='date',

  'storage.location.template'='s3://bucket/AWSLogs/account-id/CloudTrail/aws-region/${timestamp}')

Athena screenshot

Step 3: Identify cross account Amazon S3 encryption/decryption calls

Once the Athena table is created, you can run the following query to find out cross-account AWS KMS calls made for S3 encryption /decryption.

Query:

SELECT useridentity.accountid as requestor_account_id,

              resources[1].accountid as owner_account_id,

              resources[1].arn as key_arn,       

          count(resources) as count

  FROM CloudTrail_logs_us_east_2

  WHERE eventsource='kms.amazonaws.com'

          AND timestamp between '2021/04/01' and '2021/08/30'

          AND eventname in ('Decrypt','Encrypt','GenerateDataKey')

          AND useridentity.accountid!= resources[1].accountid

    AND json_extract(json_extract(requestparameters , '$.encryptionContext'),'$.aws:s3:arn') is not null

  GROUP BY  useridentity.accountid,resources[1].accountid,resources[1].arn

  ORDER BY  key_arn,count desc

Result:

Result displays the cross account AWS KMS calls made for S3 encryption /decryption i.e. where caller account is not the key owner account for time period between April 1, 2021 and August 30, 2021.

Athena screenshot 2

The preceding example shows cross-account AWS KMS API calls generated by downloading /uploading files from centralized Amazon S3 data lake located in account A (111111111111) from spoke accounts B (222222222222), C (333333333333), and D (444444444444).

These AWS KMS quotas will get applied to caller (spoke) accounts even though key owner is hub account.

For example:

  • 2,560 AWS KMS API call quotas will be applied to account B.
  • 1,644 AWS KMS API call quotas will be applied to account C.
  • 944 AWS KMS API call quotas will be applied to account D.

Step 4: Identify IAM principal-wise cross account Amazon S3 encryption/decryption calls.

To identify IAM principal-wise cross account Amazon S3 encryption /decryption calls, you can run following query.

Query:

SELECT useridentity.accountid as requestor_account_id,
useridentity.principalid as requestor_principal,
resources[1].accountid as owner_account_id,
resources[1].arn as key_arn,
count(resources) as count
FROM CloudTrail_logs_us_east_2
WHERE eventsource='kms.amazonaws.com'
AND timestamp between '2021/04/01' and '2021/08/30'
AND eventname in ('Decrypt','Encrypt','GenerateDataKey')
AND useridentity.accountid!= resources[1].accountid
AND json_extract(json_extract(requestparameters , '$.encryptionContext'),'$.aws:s3:arn') is not null
GROUP BY useridentity.accountid,useridentity.principalid,resources[1].accountid,resources[1].arn
ORDER BY requestor_account_id,count desc

Result:

Athena screenshot 3

The preceding result shows  AWS Identity and Access Management (IAM) principal-wise cross-account AWS KMS API calls made between hub and spoke accounts. For example, Account B (22222222222) has two applications configured with IAM principals ids ending with 4C5VIMGI2, 4YFPRTQMP are accessing the centralized S3 bucket located in hub account A (111111111111).

For the time period between ‘2021/04/01’ and ‘2021/08/30’, the application configured with IAM principal ending in 4C5VIMGI2 made 1622 cross-account AWS KMS API calls. During this same time period, the application configured with IAM principal 4YFPRTQMP made 936 cross-account AWS KMS API calls.

We can further filter the results to see only KMS key ARN ending with 3aa3c82a2174 to get application- wise % of AWS KMS API calls made to the Amazon S3 centralized data lake from all the spoke accounts.

account ID table

Note:  we assume that each application is configured with a unique IAM principal.

Step 5: Identify cross account Amazon S3 encryption/decryption calls by events.

Query:

SELECT useridentity.accountid as requestor_account_id,

              resources[1].accountid as owner_account_id,

              resources[1].arn as key_arn,

              eventname as eventname,

          count(resources) as count

  FROM CloudTrail_logs_us_east_2

  WHERE eventsource='kms.amazonaws.com'

          AND timestamp between '2021/04/01' and '2021/08/30'

          AND useridentity.accountid!= resources[1].accountid

        AND json_extract(json_extract(requestparameters , '$.encryptionContext'),'$.aws:s3:arn') is not null

  GROUP BY useridentity.accountid, resources [1].accountid,resources[1].arn,eventname

  ORDER BY requestor_account_id,count desc

Result:

Athena screenshot 4

Amazon S3 makes decrypt API requests when you download the files and GenerateDataKey API request when you upload the file to encrypted S3 bucket. The result shows that:

  • Spoke account B (22222222222) made 1811 decrypt API requests to download 1811 files and 749 GenerateDataKey API requests to uploaded 749 files.
  • Spoke account C (33333333333) made 1373 decrypt API requests to download 1373 files and 271 GenerateDataKey API requests to uploaded 271 files.
  • Spoke account D (444444444444) made 638 decrypt API requests to download 638 files and 306 GenerateDataKey API requests to uploaded 306 files.

Note: When you configure your bucket to use an S3BucketKey for SSE-KMS, you may not have a separate Decrypt or GenerateDataKey for each file upload or download.

Step 6: Identify all the AWS KMS Calls.

To analyze the hub account for all the AWS KMS API calls made, run following query.

Query:

SELECT useridentity.accountid as requestor_account_id,

              resources[1].accountid as owner_account_id,

              resources[1].arn as key_arn,

          count(resources) as count

  FROM CloudTrail_logs_us_east_2

  WHERE eventsource='kms.amazonaws.com'

          AND timestamp between '2021/04/01' and '2021/08/30'

  GROUP BY useridentity.accountid, resources [1].accountid,resources[1].arn

  ORDER BY requestor_account_id,count desc

Result:

Athena screenshot 5

Results show all the AWS KMS API calls made in the hub account both within the account and across accounts. From this result, we can analyze that for centralized S3 data lake (KMS key ARN ending with 3aa3c82a2174), the majority of the calls are cross account AWS KMS API call and only 303 calls are made within account. You can do further analysis by refining the Amazon Athena queries based on your needs.

Cleaning up

To avoid incurring future charges, delete the resources that are no longer required.

Step 1: Delete the CloudTrail created in hub account

If you have created CloudTrail specifically for this solution, you can delete the CloudTrail by following the instructions in this user guide.

Step 2: Drop the Amazon Athena table

Log in to the Amazon Athena console and run the following drop table query:

Drop table < CloudTrail_logs_aws_region_1>

Conclusion

Tracking use of the cross-account AWS KMS APIs can be challenging in a multi-account scenario. In this blog, we learned how to use AWS CloudTrail and Amazon Athena to analyze AWS KMS API usage. In a hub and spoke account model, cross-account AWS KMS API quotas are applied to the spoke account when the spoke account accesses SSE-KMS encrypted S3 bucket in the hub account. You learned to analyze cross-account AWS KMS API quotas using AWS CloudTrail and Amazon Athena.  Finally, we learned how we can identify all the AWS KMS API call within account for period of time and analyze AWS KMS API traffic within account and across account. You can repeat the process and aggregate the data across Regions.

Additional Reading:

Manage your AWS KMS API request rates using Service Quotas and Amazon CloudWatch

Why did my CloudTrail cost and usage increase unexpectedly?

User Guide: Managing CloudTrail Costs

Field Notes: Understanding Carrier Codes, Message Structure, and Interaction Analytics with Amazon Pinpoint

Post Syndicated from Edward Schaefer original https://aws.amazon.com/blogs/architecture/field-notes-understanding-carrier-codes-message-structure-and-interaction-analytics-with-amazon-pinpoint/

IT developers are frequently looking for an analytics system that tracks app user behavior and engagement with various marketing campaigns. It can be challenging to differentiate between use cases and advantages of utilizing Long Codes, Short Codes and Toll-Free numbers to feed into interaction analytics. With Amazon Pinpoint, developers can learn how each user prefers to engage and can personalize their end-user’s experience to increase engagement.

In this blog post, we’ll evaluate the differences between Long Codes, Short Codes and Toll-Free and we’ll also discuss messaging templates and creating journeys to customize events handling in Amazon Pinpoint.

Typical use cases of Amazon Pinpoint include:

  • Sending timely and targeted message to your customers promoting your products and services with basic templates or highly-personalized messages.
  • Event-based campaigns can be used to send a message when a customer creates a new account or when they add an item to their cart but don’t purchase it. These communications are transactional in nature and can be sent on customer activities within your application.
  • You can create customer outreach with millions in user communities and use built-in analytics to observe your campaign performance.

SMS messaging forms one of the most critical communication channels with customers. Both one way and two-way messaging are supported by Amazon Pinpoint when you enable the SMS channel in your project.

Architecture overview

The following diagram illustrates a typical architecture for how Pinpoint integrates with various AWS services.

Architecture outlining how Pinpoint intrgatios with various AWS services.

Long codes, short codes, and toll-free numbers

Dedicated long codes and 10DLC

A dedicated long code, also referred to as a long virtual number or LVN, is a standard phone number that contains up to 12 digits, depending on the country that it’s based in. You cannot request a long code for A2P messaging within the United States. Instead, you’ll need to request a 10DLC.

Typically used for customer service-related communications, long codes also allow businesses to establish consistent experiences. This is done by using the same number to send both SMS text messages as well as voice messages to customers.

In the United States, 10-digit long code (10DLC) numbers are designed specifically for high-volume Application-to-Person (A2P) messaging. Before purchasing a 10DLC, you must first register your company and create a campaign using the Amazon Pinpoint console. Since Jun 1, 2021, United States mobile carriers require 10DLC for A2P messages.

Common use cases: Customer service, appointment reminders, two-way communications, fraud, or emergency notifications (10DLC), promotional messaging (10DLC).

Advantages:

  • Customer Trust and Brand recognition: 10DLC and dedicated long codes are registered and dedicated to individual companies and campaigns. This means that if you send multiple messages to a recipient each message will appear to come from the same number.
  • SMS and Voice capable: Businesses can use the same long code numbers for both SMS and voice communications to provide consistent contact experiences to their customers.
  • High reliability and delivery rate (10DLC): 10DLC has been adopted by United States wireless carriers specifically for business messaging. By requiring a registration and pre-vetting process, wireless carriers can support higher message volumes and better deliverability while protecting consumers from potential abuse.
  • Low costs: 10DLC and dedicated long code numbers cost less than dedicate short codes.
  • Fast provisioning: Dedicated long codes are typically provisioned within 24-hours. 10DLC numbers are provisioned in about 1 week. These timelines are much shorter than the 10+ weeks needed for short code provisioning.

Disadvantages:

  • Carrier specific limits (10DLC): United States mobile carriers have announced varying throughput and volume limits depending on the tier assigned to your company. This can make planning more difficult. See 10DLC capabilities for an outline of carrier-specific limits.
  • Limited Throughput (Long Codes): Mobile carriers limit the throughput rate of dedicated long codes to 1 message part per second (MPS) in Canada and 10 MPS in all other countries and regions. This creates a bottleneck when messaging thousands of recipients.
  • Transactional messaging only (Long Codes): Mobile carriers prohibit the use of long codes for promotional or marketing messages. Violations could cause carriers to block messages, impose fines, or shut down service.
  • Difficult to remember: Compared to 5-digit to 6-digit short codes, 10-digit numbers are more difficult for customers to remember and to enter into their devices.

Toll-free numbers

Similar to dedicated long codes, toll-free numbers are 10-digit phone numbers beginning with one of the following area codes: 800, 888, 877, 866, 855, 844, or 833. Toll-free numbers are primarily used for transactional messages but they can be used promotional messaging if recipients opt-in to receiving messages and the opt-out rate is low.

Common use cases: Customer service, appointment reminders, two-way communications, fraud or emergency notifications.

Advantages:

  • SMS and Voice capable: Businesses can use the same toll-free numbers for both SMS and voice communications to provide consistent contact experiences to their customers.
  • Low costs: Like 10DLC and dedicated long code numbers, toll-free numbers cost significantly less than dedicate short codes.
  • Fast Provisioning: Typically, toll-free numbers are available immediately after request submission.

Disadvantages:

  • Supported in United States Only: Currently Amazon Pinpoint only supports SMS-enabled toll-free numbers in the United States. A toll-free number cannot be used to send messages outside of the US.
  • Limited Throughput: Mobile carriers limit the throughput rate of toll-free numbers to 3 MPS.

Dedicated short codes

Dedicated short codes are 3-digit to 8-digit numbers are commonly used for high-throughput application-to-person (A2P) messaging workloads. Mobile carriers review and approve all new short code requests before making them active. This vetting process allows messages sent using short codes to bypass carrier filters.

Common use-cases: Promotional messaging, emergency alert systems, mass communications, contest and voting submissions, and two-factor authentication.

Advantages:

  • High-throughput and high-volume messaging: Short codes are designed to be used for high volume messaging campaigns. The vetting and approval process required before activating short codes allows carriers to provide increased MPS throughput and daily message volume quotas while still protecting consumers from abuse.
  • High reliability and delivery rate: Due to the lengthy approval and audit process required to acquire a dedicated short code, short codes are not subject to carrier filtering resulting in reliable message delivery.
  • Easy to remember: Short codes are commonly 5–6 digits in length. This short length allows users to easily recognize and remember codes, increasing campaign effectiveness.

Disadvantages:

  • Lengthy Provisioning Time: It can take 8–12 weeks for short codes to become active on all carrier networks.
  • High Cost: Short code numbers have high one-time setup and monthly fees compared to other originating identities.  For example, in the United States, there’s a $650 one-time setup fee plus an additional recurring charge of $995.00 per month for each short code.
  • Strict rules and regulations: Short codes are governed by different regulatory bodies depending on the country and region. For example, in the United States short codes are regulated by the Federal Communications Commission (FCC), Federal Trade Commission (FTC), and Cellular Telecommunications Industry Association (CTIA). Review Best Practices to learn about the key SMS messaging laws around the world.

 

Number type Number format Channel support Two-way capable Requires registration Estimated Provisioning time SMS throughput (message segments per-second)² Pricing³
1 Long Code/10DLC 10DLC:
10 digits
Long Codes: up to 12 digits
SMS
Voice
Yes* Yes 1 week United States (US): Varies¹
Canada (CA): 1 MPS
All other countries and regions: 10 MPS
$1/mo + registration
2 Short code 3–8 digits SMS Yes* Yes United States (US): 12 weeks
Canada (CA):
16 weeks
All other countries and regions: Varies
United States (US) 100 MPS
Canada (CA): 100 MPS
All other countries and regions: Varies by country
United States (US): $650 setup + $995/mo
Canada (CA): $3,000 setup + $995/mo
All other countries and regions: Varies
3 Toll-free 10 digits SMS
Voice
Yes* No Available immediately United States (US): 3 MPS
Canada (CA): N/A
All other countries and regions: N/A
$2/mo

¹ https://docs.aws.amazon.com/pinpoint/latest/userguide/settings-10dlc.html
² https://docs.aws.amazon.com/pinpoint/latest/userguide/channels-sms-limitations-characters.html
³ https://aws.amazon.com/pinpoint/pricing/#Numbers

*visit Supported countries and regions (SMS channel) to check the SMS capabilities in your recipients’ countries.

Message templates

When you start using Amazon Pinpoint, you’ll want to think about your message templates.  These templates are the context of your messages and can be used for any of the four supported messaging channels–SMS, Voice, Push Notifications, and Voice.

When creating your templates, you can use custom attributes that you imported when building your segment.  We won’t be going into building your segments as that’s covered in Building segments.

We’ll outline how to create a template for SMS messages as that’s focus for this blog post.  The first section we have to fill out is our template details. You can access template creation page by navigating to the Message templates section in the left ‘hamburger’ menu when on the pinpoint service page.

Here you’ll start by naming your template and giving this initial version a description.  A sample format you can follow for naming your template names is channel_message-type_segment_target-campaign. For our example we’ll be using the name SMS_Transactional_NEUSA-Customers_WelcomeNewCustomer.

Next, we’ll build our template. We open the attribute finder by clicking on ‘Case attribute finder’ on the top right of that section and then loading our “Custom Attributes.”

When writing your message templates, you need to think through how these are perceived by the reader and if any words or phrases appear to be a spam message.  We’ll examine what makes a message to appear like spam in the next section as we cover message structure.

Visit the documentation to learn the latest guidance on templates. 

Message structure

The key part of building an Amazon Pinpoint template is the message.

First, there are a few components that make up your SMS message; the greeting, body, and closing.  We’ll start with the greeting section as this is the first thing our recipients will read. When you build out your campaign in Amazon Pinpoint, more on Amazon Pinpoint campaigns by visiting Amazon Pinpoint campaigns, you choose the type of message you’ll be sending.  This is one of the reasons why our naming convention contains “message-type” portion. You choose between Transactional or Promotional Messages.

When sending out promotional messages you’ll want to avoid directly addressing the person by name in the greeting.

The reason for this is when sending out promotional messages they are traditionally sent in bulk through various carriers around the world.  Therefore, using the recipient’s name in the greeting could be viewed as a targeted spam message and something you should avoid when creating templates for your promotional messages. Instead consider using your company name for the greeting as this quickly tells readers who the message is from upfront.

Transactional messages have a bit more leeway when it comes to the greeting section and the recipient’s name can be used, with some caveats.  You’ll still want to avoid using language that indicates spam, for example, punctuation like “Hi Bob-” or “Greetings Jane!” and instead greet your customers with a purpose “Bob’s order record:” or “Jane: Account Update”. Otherwise, you can leave out the name all together or substitute it with your company name. The goal with the greeting in a transactional message is to give the user an idea what the message is about.

Now that we have an idea of our greeting, we move onto the body which will contain the context of your message and any links or data you may want to pass on to the customer. Let’s start off with introducing randomness and taking advantage of the “Attributes Finder” panel we enabled earlier.  We can use attributes for common things like the customer’s name, account identifier, payment due, city, or any other information we may have on the recipient as part of our segment for our template as well.

If a URL is needed for the user to perform some action once they receive that message, this is an easy way to achieve randomness. We’ll want each URL to be unique which we can do by using a URL shortener like the one described by Eric Johnson @ https://aws.amazon.com/blogs/compute/building-a-serverless-url-shortener-app-without-lambda-part-1/. Other options to add uniqueness to our message might be a timestamp with the seconds, an anti-phishing phrase, city or zip-code, or any other unique information that’s ideally not something personally identifiable.

In the example below, we’ve used the following template:

[Attributes.CompanyName] Purchase the items today and receive
[Attributes.DiscountAmount] of your next order.  Visit
[Attributes.ShortURL] to checkout and complete your order.

When messages are sent that use this template, Amazon Pinpoint will use the custom attributes provided in the segment data to populate the custom fields.

Proactive interactions

When sending messages to customers, it’s important to think about how to handle certain events that may come back from a message sent to a customer.  These include common events such as a customer opting out of messages or a message failure.  We’ll start by handling opt-outs as this is a common one and sometimes unintentional.

In the below screenshot, we’ve setup a Amazon Pinpoint journey (for more information about journeys, visit Amazon Pinpoint journeys to take actions based a customer’s response to a message.  If the customer responds back to the message with “stop”, we’ve setup the yes/no split to invoke a Lambda function and send the customer an email.  In this case our email may include a message like “You have opted out of SMS messaging, if this was unintentional, please visit your account to reactivate.”

We invoke a Lambda function to update the customer record with their SMS status.  On the customer portal, you could then have a button next to the SMS status, that when clicked, calls the appropriate APIs to opt-in the customer.  Another way to achieve this is to validate phone number using Pinpoint’s validation API even before the page loads and then show the same button the customer.

Another common scenario are message failures, which we can address in the same manner. Our journey is configured with the same entry, SMS send, and yes/no split. However, this time we evaluate if the message failed, and if it does, we send an email to the customer. Similar to opt-outs, we can use a Lambda trigger to update the customer database and then prompt the user to update their phone number. This scenario is commonly caused when phone numbers are incorrect or entered in wrong format. So, a great way to reduce errors is to validate the phone number after the customer enters it in (more information about how to validate numbers in Amazon Pinpoint can be found in the Developer Guide)

The following flow shows a typical Pinpoint journey and integration with your application backend.

Diagram showing Journeys workflow

Conclusion

In this blog post, we showed you how to use features of Amazon Pinpoint like templates, and then journeys, to automatically resolve failed messages, opt-outs, and other events. We also reviewed the various types of phone number options available in Amazon Pinpoint, as well as how to monitor your usage.

From here, navigate to the console, setup your project in Amazon Pinpoint, and begin testing the various channels. We recommend watching a video on building immersive experiences with Amazon Pinpoint and Journeys. We have only just begun showing you what is possible with Amazon Pinpoint today.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Use Amazon EC2 for cost-efficient cloud gaming with pay-as-you-go pricing

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/use-amazon-ec2-for-cost-efficient-cloud-gaming-with-pay-as-you-go-pricing/

This post is written by Markus Ziller, Solutions Architect

Since AWS launched in 2006, cloud computing disrupted traditional IT operations by providing a more cost-efficient, scalable, and secure alternative to owning hardware and data centers. Similarly, cloud gaming today enables gamers to play video games with pay-as-you go pricing. This removes the need of high upfront investments in gaming hardware. Cloud gaming platforms like Amazon Luna are an entryway, but customers are limited to the games available on the service. Furthermore, many customers also prefer to own their games, or they already have a sizable collection. For those use cases, vendor-neutral software like NICE DCV or Parsec are powerful solutions for streaming your games from anywhere.

This post shows a way to stream video games from the AWS Cloud to your local machine. I will demonstrate how you can provision a powerful gaming machine with pay-as-you-go pricing that allows you to play even the most demanding video games with zero upfront investment into gaming hardware.

The post comes with code examples on GitHub that let you follow along and replicate this post in your account.

In this example, I use the AWS Cloud Development Kit (AWS CDK), an open source software development framework, to model and provision cloud application resources. Using the CDK can reduce the complexity and amount of code needed to automate resource deployment.

Overview of the solution

The main services in this solution are Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Block Store (Amazon EBS). The architecture is as follows:

The architecture of the solution. It shows an EC2 instance of the G4 family deployed in a public subnet. The EC2 instances communicates with S3. Also shown is how a security group controls access from users to the EC2 instance

The key components of this solution are described in the following list. During this post, I will explain each component in detail. Deploy this architecture with the sample CDK project that comes with this blog post.

  1. An Amazon Virtual Private Cloud (Amazon VPC) that lets you launch AWS resources in a logically isolated virtual network. This includes a network configuration that lets you connect with instances in your VPC on ports 3389 and 8443.
  2. An Amazon EC2 instance of the G4 instance family. Amazon EC2 G4 instances are the most cost-effective and versatile GPU instances. Utilize G4 instances to run your games.
  3. Access to an Amazon Simple Storage Service (Amazon S3) bucket. S3 is an object storage that contains the graphics drivers required for GPU instances.
  4. A way to create a personal gaming Amazon Machine Images (AMI). AMIs mean that you only need to conduct the initial configuration of your gaming instance once. After that, create new gaming instances on demand from your AMI.

Walkthrough

The following sections walk through the steps required to set up your personal gaming AMI. You will only have to do this once.

For this walkthrough, you need:

  • An AWS account
  • Installed and authenticated AWS CLI
  • Installed Node.js, TypeScript
  • Installed git
  • Installed AWS CDK

You will also need an EC2 key pair for the initial instance setup. List your available key pairs with the following CLI command:

aws ec2 describe-key-pairs --query 'KeyPairs[*].KeyName' --output table

Alternatively, create a new key pair with the CLI. You will need the created .pem file later, so make sure to keep it.

KEY_NAME=Gaming
aws ec2 create-key-pair --key-name $KEY_NAME –query 'KeyMaterial' --output text > $KEY_NAME.pem

Checkout and deploy the sample stack

  1. After completing the prerequisites, clone the associated GitHub repository by running the following command in a local directory:git clone [email protected]:aws-samples/cloud-gaming-on-ec2-instances
  1. Open the repository in your preferred local editor, and then review the contents of the *.ts files in cdk/bin/ and cdk/lib/
  1. In gaming-on-g4-instances.ts you will find two CDK stacks: G4DNStack and G4ADStack. A CDK stack is a unit of deployment. All AWS resources defined within the scope of a stack, either directly or indirectly, are provisioned as a single unit.

The architecture for both stacks is similar, and it only differs in the instance type that will be deployed.

G4DNStack and G4ADStack share parameters that determine the configuration of the EC2 instance, the VPC network and the preinstalled software. The stacks come with defaults for some parameters – I recommend keeping the default values.

EC2

  • instanceSize: Sets the EC2 size. Defaults to g4dn.xlarge and g4ad.4xlarge, respectively. Check the service page for a list of valid configurations.
  • sshKeyName: The name of the EC2 key pair you will use to connect to the instance. Ensure you have access to the respective .pem file.
  • volumeSizeGiB: The root EBS volume size. Around 20 GB will be used for the Windows installation, the rest will be available for your software.

VPC

  • openPorts: Access from these ports will be allowed. Per default, this will allow access for Remote Desktop Protocol (RDP) (3389) and NICE DCV (8443).
  • associateElasticIp: Controls if an Elastic IP address will be created and added to the EC2 instance. An Elastic IP address is a static public IPv4 address. Contrary to dynamic IPv4 addresses managed by AWS, it will not change after an instance restart. This is an optional convenience that comes at a small cost for reserving the IP address for you.
  • allowInboundCidr: Access from this CIDR range will be allowed. This limits the IP space from where your instance is routable. It can be used to add an additional security layer. Per default, traffic from any IP address will be allowed to reach your instance. Notwithstanding allowInboundCidr, an RDP or NICE DCV connection to your instance requires valid credentials and will be rejected otherwise.

Software

For g4dn instances, one more parameter is required (see process of installing NVIDIA drivers):

  • gridSwCertUrl: The NVIDIA driver requires a certificate file, which can be downloaded from Amazon S3.

Choose an instance type based on your performance and cost requirements, and then adapt the config accordingly. I recommend starting with the G4DNStack, as it comes at the lowest hourly instance cost. If you need more graphics performance, then choose the G4ADStack for up to 40% improvement in graphics performance.

After choosing an instance, follow the instructions in the README.md in order to deploy the stack.

The CDK will create a new Amazon VPC with a public subnet. It will also configure security groups to allow inbound traffic from the CIDR range and ports specific in the config. By default, this will allow inbound traffic from any IP address. I recommend restricting access to your current IP address by going to checkip.amazonaws.com and replacing 0.0.0.0/0 with <YOUR_IP>/32 for the allowInboundCidr config parameter.

Besides the security groups, the CDK also manages all required IAM permissions.

It creates the following resources in your VPC:

  • An EC2 GPU instance running Windows Server 2019. Depending on the stack you chose, the default instance type will be g4dn.xlarge or g4ad.4xlarge. You can override this in the template. When the EC2 instance is launched, the CDK runs an instance-specific PowerShell script. This PowerShell script downloads all drivers and NICE DCV. Check the code to see the full script or add your own commands.
  • An EBS gp3 volume with your defined size (default: 150 GB).
  • An EC2 launch template.
  • (Optionally) An Elastic IP address as a static public IP address for your gaming instances.

After the stack has been deployed, you will see the following output:

The output of a CDK deployment of the CloudGamingOnG4DN stack. Shows values for Credentials, InstanceId, KeyName, LaunchTemplateId, PublicIp. Values for Credentials, InstanceId and PublicIp are redacted

Click the first link, and download the remote desktop file in order to connect to your instance. Use the .pem file from the previous step to receive the instance password.

Install drivers on EC2

You will use the RDP to initially connect to your instance. RDP provides a user with a graphical interface to connect to another computer over a network connection. RDP clients are available for a large number of platforms. Use the public IP address provided by the CDK output, the username Administrator, and the password displayed by the EC2 dialogue.

Ensure that you note the password for later steps.

Most configuration process steps are automated by the CDK. However, a few actions (e.g., driver installation) cannot be properly automated. For those, the following manual steps are required once.

Navigate to $home\Desktop\InstallationFiles. If the folder contains an empty file named “OK”, then everything was downloaded correctly and you can proceed with the installation. If you connect while the setup process is still in progress, then wait until the OK file gets created before proceeding. This typically takes 2-3 minutes.

The next step differs slightly for g4ad and g4dn instances.

g4ad instances with AMD Radeon Pro V520

Follow the instructions in the EC2 documentation to install the AMD driver from the InstallationFiles folder. The installation may take a few minutes, and it will display the following output when successfully finished.

The output of the PowerShell command that installs the graphics driver for AMD Radeon Pro V520

The contents of the InstallationFiles folder. It contains a folder 1_AMD_driver and files 2_NICEDCV-Server, 3_NICEDCV-DisplayDriver, OK

Next, install NICE DCV Server and Display driver by double-clicking the respective files. Finally, restart the instance by running the Restart-Computer PowerShell command.

 

g4dn instances with NVIDIA T4

Navigate to 1_NVIDIA_drivers, run the NVIDIA driver installer for Windows Server 2019 and follow the instructions.

The contents of the InstallationFiles folder. It contains a folder 1_NVIDIA_drivers and files 2_NICEDCV-Server, 3_NICEDCV-DisplayDriver, 4_update_registry, OK

Next, double-click the respective files in the InstallationFiles folder in order to install NICE DCV Server and Display driver.

Finally, right-click on 4_update_registry.ps1, and select “Run with PowerShell” to activate the driver. In order to complete the setup, restart the instance.

Install your software

After the instance restart, you can connect from your local machine to your EC2 instance with NICE DCV. The NICE DCV bandwidth-adaptive streaming protocol allows near real-time responsiveness for your applications without compromising the image accuracy. This is the recommended way to stream latency sensitive applications.

Download the NICE DCV viewer client and connect to your EC2 instance with the same credentials that you used for the RDP connection earlier. After testing the NICE DCV connection, I recommend disabling RDP by removing the corresponding rule in your security group.

You are now all set to install your games and tools on the EC2 instance. Make sure to install it on the C: drive, as you will create an AMI with the contents of C: later.

Start and stop your instance on demand

At this point, you have fully set up the EC2 instance for cloud gaming on AWS. You can now start and stop the instance when needed. The following CLI commands are all you need to remember:

aws ec2 start-instances --instance-ids <INSTANCE_ID>

aws ec2 stop-instances --instance-ids <INSTANCE_ID>

This will use the regular On-Demand instance capacity of EC2, and you will be billed hourly charges for the time that your instance is running. If your instance is stopped, you will only be charged for the EBS volume and the Elastic IP address if you chose to use one.

Launch instances from AMI

Make sure you have installed all of the applications you require on your EC2, and then create your personal gaming AMI by running the following AWS CLI command.

aws ec2 create-image --instance-id <YOUR_INSTANCE_ID> --name <THE_NAME_OF_YOUR_AMI>

Use the following command to get details about your AMI creation. When the status changes from pending to available, then your AMI is ready.

aws ec2 describe-images --owners self --query 'Images[*].[Name, ImageId, BlockDeviceMappings[0].Ebs.SnapshotId, State]' --output table

The CDK created an EC2 launch template when it deployed the stack. Run the following CLI command to spin up an EC2 instance from this template.

aws ec2 run-instances --image-id <YOUR_AMI_ID> --launch-template LaunchTemplateName=<LAUNCH_TEMPLATE_NAME> --query "Instances[*].[InstanceId, PublicIpAddress]" --output table

This command will start a new EC2 instance with the exact same configuration as your initial EC2, but with all your software already installed.

Conclusion

This post walked you through creating your personal cloud gaming stack. You are now all set to lean back and enjoy the benefits of per second billing while playing your favorite video games in the AWS cloud.

Visit the Amazon EC2 G4 instances service page to learn more about how AWS continues to push the boundaries of cost-effectiveness for graphic-intensive applications.

Copy large datasets from Google Cloud Storage to Amazon S3 using Amazon EMR

Post Syndicated from Andrew Lee original https://aws.amazon.com/blogs/big-data/copy-large-datasets-from-google-cloud-storage-to-amazon-s3-using-amazon-emr/

Many organizations have data sitting in various data sources in a variety of formats. Even though data is a critical component of decision-making, for many organizations this data is spread across multiple public clouds. Organizations are looking for tools that make it easy and cost-effective to copy large datasets across cloud vendors. With Amazon EMR and the Hadoop file copy tools Apache DistCp and S3DistCp, we can migrate large datasets from Google Cloud Storage (GCS) to Amazon Simple Storage Service (Amazon S3).

Apache DistCp is an open-source tool for Hadoop clusters that you can use to perform data transfers and inter-cluster or intra-cluster file transfers. AWS provides an extension of that tool called S3DistCp, which is optimized to work with Amazon S3. Both these tools use Hadoop MapReduce to parallelize the copy of files and directories in a distributed manner. Data migration between GCS and Amazon S3 is possible by utilizing Hadoop’s native support for S3 object storage and using a Google-provided Hadoop connector for GCS. This post demonstrates how to configure an EMR cluster for DistCp and S3DistCP, goes over the settings and parameters for both tools, performs a copy of a test 9.4 TB dataset, and compares the performance of the copy.

Prerequisites

The following are the prerequisites for configuring the EMR cluster:

  1. Install the AWS Command Line Interface (AWS CLI) on your computer or server. For instructions, see Installing, updating, and uninstalling the AWS CLI.
  2. Create an Amazon Elastic Compute Cloud (Amazon EC2) key pair for SSH access to your EMR nodes. For instructions, see Create a key pair using Amazon EC2.
  3. Create an S3 bucket to store the configuration files, bootstrap shell script, and the GCS connector JAR file. Make sure that you create a bucket in the same Region as where you plan to launch your EMR cluster.
  4. Create a shell script (sh) to copy the GCS connector JAR file and the Google Cloud Platform (GCP) credentials to the EMR cluster’s local storage during the bootstrapping phase. Upload the shell script to your bucket location: s3://<S3 BUCKET>/copygcsjar.sh. The following is an example shell script:
#!/bin/bash
sudo aws s3 cp s3://<S3 BUCKET>/gcs-connector-hadoop3-latest.jar /tmp/gcs-connector-hadoop3-latest.jar
sudo aws s3 cp s3://<S3 BUCKET>/gcs.json /tmp/gcs.json
  1. Download the GCS connector JAR file for Hadoop 3.x (if using a different version, you need to find the JAR file for your version) to allow reading of files from GCS.
  2. Upload the file to s3://<S3 BUCKET>/gcs-connector-hadoop3-latest.jar.
  3. Create GCP credentials for a service account that has access to the source GCS bucket. The credentials should be named json and be in JSON format.
  4. Upload the key to s3://<S3 BUCKET>/gcs.json. The following is a sample key:
{
   "type":"service_account",
   "project_id":"project-id",
   "private_key_id":"key-id",
   "private_key":"-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
   "client_email":"service-account-email",
   "client_id":"client-id",
   "auth_uri":"https://accounts.google.com/o/oauth2/auth",
   "token_uri":"https://accounts.google.com/o/oauth2/token",
   "auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs",
   "client_x509_cert_url":"https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
}
  1. Create a JSON file named gcsconfiguration.json to enable the GCS connector in Amazon EMR. Make sure the file is in the same directory as where you plan to run your AWS CLI commands. The following is an example configuration file:
[
   {
      "Classification":"core-site",
      "Properties":{
         "fs.AbstractFileSystem.gs.impl":"com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
         "google.cloud.auth.service.account.enable":"true",
         "google.cloud.auth.service.account.json.keyfile":"/tmp/gcs.json",
         "fs.gs.status.parallel.enable":"true"
      }
   },
   {
      "Classification":"hadoop-env",
      "Configurations":[
         {
            "Classification":"export",
            "Properties":{
               "HADOOP_USER_CLASSPATH_FIRST":"true",
               "HADOOP_CLASSPATH":"$HADOOP_CLASSPATH:/tmp/gcs-connector-hadoop3-latest.jar"
            }
         }
      ]
   },
   {
      "Classification":"mapred-site",
      "Properties":{
         "mapreduce.application.classpath":"/tmp/gcs-connector-hadoop3-latest.jar"
      }
   }
]

Launch and configure Amazon EMR

For our test dataset, we start with a basic cluster consisting of one primary node and four core nodes for a total of five c5n.xlarge instances. You should iterate on your copy workload by adding more core nodes and check on your copy job timings in order to determine the proper cluster sizing for your dataset.

  1. We use the AWS CLI to launch and configure our EMR cluster (see the following basic create-cluster command):
aws emr create-cluster \
--name "My First EMR Cluster" \
--release-label emr-6.3.0 \
--applications Name=Hadoop \
--ec2-attributes KeyName=myEMRKeyPairName \
--instance-type c5n.xlarge \
--instance-count 5 \
--use-default-roles

  1. Create a custom bootstrap action to be performed at cluster creation to copy the GCS connector JAR file and GCP credentials to the EMR cluster’s local storage. You can add the following parameter to the create-cluster command to configure your custom bootstrap action:
--bootstrap-actions Path="s3://<S3 BUCKET>/copygcsjar.sh"

Refer to Create bootstrap actions to install additional software for more details about this step.

  1. To override the default configurations for your cluster, you need to supply a configuration object. You can add the following parameter to the create-cluster command to specify the configuration object:
--configurations file://gcsconfiguration.json

Refer to Configure applications when you create a cluster for more details on how to supply this object when creating the cluster.

Putting it all together, the following code is an example of a command to launch and configure an EMR cluster that can perform migrations from GCS to Amazon S3:

aws emr create-cluster \
--name "My First EMR Cluster" \
--release-label emr-6.3.0 \
--applications Name=Hadoop \
--ec2-attributes KeyName=myEMRKeyPairName \
--instance-type c5n.xlarge \
--instance-count 5 \
--use-default-roles \
--bootstrap-actions Path="s3:///copygcsjar.sh" \
--configurations file://gcsconfiguration.json

Submit S3DistCp or DistCp as a step to an EMR cluster

You can run the S3DistCp or DistCp tool in several ways.

When the cluster is up and running, you can SSH to the primary node and run the command in a terminal window, as mentioned in this post.

You can also start the job as part of the cluster launch. After the job finishes, the cluster can either continue running or be stopped. You can do this by submitting a step directly via the AWS Management Console when creating a cluster. Provide the following details:

  • Step type – Custom JAR
  • NameS3DistCp Step
  • JAR locationcommand-runner.jar
  • Argumentss3-dist-cp --src=gs://<GCS BUCKET>/ --dest=s3://<S3 BUCKET>/
  • Action of failure – Continue

We can always submit a new step to the existing cluster. The syntax here is slightly different than in previous examples. We separate arguments by commas. In the case of a complex pattern, we shield the whole step option with single quotation marks:

aws emr add-steps \
--cluster-id j-ABC123456789Z \
--steps 'Name=LoadData,Jar=command-runner.jar,ActionOnFailure=CONTINUE,Type=CUSTOM_JAR,Args=s3-dist-cp,--src=gs://<GCS BUCKET>/, --dest=s3://<S3 BUCKET>/'

DistCp settings and parameters

In this section, we optimize the cluster copy throughput by adjusting the number of maps or reducers and other related settings.

Memory settings

We use the following memory settings:

-Dmapreduce.map.memory.mb=1536
-Dyarn.app.mapreduce.am.resource.mb=1536

Both parameters determine the size of the map containers that are used to parallelize the transfer. Setting this value in line with the cluster resources and the number of maps defined is key to ensuring efficient memory usage. You can calculate the number of launched containers by using the following formula:

Total number of launched containers = Total memory of cluster / Map container memory

Dynamic strategy settings

We use the following dynamic strategy settings:

-Ddistcp.dynamic.max.chunks.tolerable=4000
-Ddistcp.dynamic.split.ratio=3 -strategy dynamic

The dynamic strategy settings determine how DistCp splits up the copy task into dynamic chunk files. Each of these chunks is a subset of the source file listing. The map containers then draw from this pool of chunks. If a container finishes early, it can get another unit of work. This makes sure that containers finish the copy job faster and perform more work than slower containers. The two tunable settings are split ratio and max chunks tolerable. The split ratio determines how many chunks are created from the number of maps. The max chunks tolerable setting determines the maximum number of chunks to allow. The setting is determined by the ratio and the number of maps defined:

Number of chunks = Split ratio * Number of maps
Max chunks tolerable must be > Number of chunks

Map settings

We use the following map setting:

-m 640

This determines the number of map containers to launch.

List status settings

We use the following list status setting:

-numListstatusThreads 15

This determines the number of threads to perform the file listing of the source GCS bucket.

Sample command

The following is a sample command when running with 96 core or task nodes in the EMR cluster:

hadoop distcp
-Dmapreduce.map.memory.mb=1536 \
-Dyarn.app.mapreduce.am.resource.mb=1536 \
-Ddistcp.dynamic.max.chunks.tolerable=4000 \
-Ddistcp.dynamic.split.ratio=3 \
-strategy dynamic \
-update \
-m 640 \
-numListstatusThreads 15 \
gs://<GCS BUCKET>/ s3://<S3 BUCKET>/

S3DistCp settings and parameters

When running large copies from GCS using S3DistCP, make sure you have the parameter fs.gs.status.parallel.enable (also shown earlier in the sample Amazon EMR application configuration object) set in core-site.xml. This helps parallelize getFileStatus and listStatus methods to reduce latency associated with file listing. You can also adjust the number of reducers to maximize your cluster utilization. The following is a sample command when running with 24 core or task nodes in the EMR cluster:

s3-dist-cp -Dmapreduce.job.reduces=48 --src=gs://<GCS BUCKET>/--dest=s3://<S3 BUCKET>/

Testing and performance

To test the performance of DistCp with S3DistCp, we used a test dataset of 9.4 TB (157,000 files) stored in a multi-Region GCS bucket. Both the EMR cluster and S3 bucket were located in us-west-2. The number of core nodes that we used in our testing varied from 24–120.

The following are the results of the DistCp test:

  • Workload – 9.4 TB and 157,098 files
  • Instance types – 1x c5n.4xlarge (primary), c5n.xlarge (core)
Nodes Throughput Transfer Time Maps
24 1.5GB/s 100 mins 168
48 2.9GB/s 53 mins 336
96 4.4GB/s 35 mins 640
120 5.4GB/s 29 mins 840

The following are the results of the S3DistCp test:

  • Workload – 9.4 TB and 157,098 files
  • Instance types – 1x c5n.4xlarge (primary), c5n.xlarge (core)
Nodes Throughput Transfer Time Reducers
24 1.9GB/s 82 mins 48
48 3.4GB/s 45 mins 120
96 5.0GB/s 31 mins 240
120 5.8GB/s 27 mins 240

The results show that S3DistCP performed slightly better than DistCP for our test dataset. In terms of node count, we stopped at 120 nodes because we were satisfied with the performance of the copy. Increasing nodes might yield better performance if required for your dataset. You should iterate through your node counts to determine the proper number for your dataset.

Using Spot Instances for task nodes

Amazon EMR supports the capacity-optimized allocation strategy for EC2 Spot Instances for launching Spot Instances from the most available Spot Instance capacity pools by analyzing capacity metrics in real time. You can now specify up to 15 instance types in your EMR task instance fleet configuration. For more information, see Optimizing Amazon EMR for resilience and cost with capacity-optimized Spot Instances.

Clean up

Make sure to delete the cluster when the copy job is complete unless the copy job was a step at the cluster launch and the cluster was set up to stop automatically after the completion of the copy job.

Conclusion

In this post, we showed how you can copy large datasets from GCS to Amazon S3 using an EMR cluster and two Hadoop file copy tools: DistCp and S3DistCp.

We also compared the performance of DistCp with S3DistCp with a test dataset stored in a multi-Region GCS bucket. As a follow-up to this post, we will run the same test on Graviton instances to compare the performance/cost of the latest x86 based instances vs. Graviton 2 instances.

You should conduct your own tests to evaluate both tools and find the best one for your specific dataset. Try copying a dataset using this solution and let us know your experience by submitting a comment or starting a new thread on one of our forums.


About the Authors

Hammad Ausaf is a Sr Solutions Architect in the M&E space. He is a passionate builder and strives to provide the best solutions to AWS customers.

Andrew Lee is a Solutions Architect on the Snap Account, and is based in Los Angeles, CA.

Replace traditional email mailbox polling with real-time reads using Amazon SES and Lambda

Post Syndicated from agardezi original https://aws.amazon.com/blogs/messaging-and-targeting/replace-traditional-email-mailbox-polling-with-real-time-reads-using-amazon-ses-and-lambda/

Integrating emails into an automated workflow for automated processing can be challenging. Traditionally, applications have had to use the POP protocol to connect to mail servers and poll for emails to arrive in a mailbox and then process the messages inline and perform actions on the message. This can be an inefficient mechanism and prone to errors that result in the workflow missing messages. Since this method requires polling it’s not great if you need real-time processing of messages and introduces inefficiencies in the design. Amazon Simple Email Service (Amazon SES) is a cost effective, scalable and flexible email service with support for different workflows including the ability to perform spam checks and virus scans. In this blog you will see how to use Amazon SES with AWS Lambda and Amazon S3 in order to automate the processing of emails in real time and integrate with an application without the need for polling.

The use case explored in this blog focuses on automation for CRM or order processing platforms and processing of email related to customer contact or direct email requests. An example of this use case is copying a client engagement email to Salesforce (or any other database) where it is recorded and can later be categorized or attached to the appropriate client account or opportunity. When designing an application that needs to read emails from a mailbox, a developer would traditionally have to use a mail library (like JavaMail if using Java) to make a call to the mailbox, authenticate and then pull messages into an application object. This would mean polling the mailbox every 10 – 15 minutes to check for new messages, handle errors when the mailbox is unavailable and maintaining a fully functioning mailbox. This solution can help you implement automated processing of emails arriving in a mailbox without the need to poll the mailbox. The entire solution will be implemented in a serverless fashion.

Solution

This blog post shows how to use SES to perform automated processing of email in an application workflow. I will use the option in SES to save received emails in S3 and trigger a Lambda function to process the message without having to poll a mailbox. This sample application demo is using email to receive simple orders which get automatically processed and the details stored in DynamoDB. The following diagram shows the high-level architecture:

Step 1: Create an S3 Bucket for Email Storage

Start by creating an S3 bucket where received emails will be stored in order for the full email to be processed by the lambda. The bucket must have a policy attached so SES can put objects in the bucket on your behalf:

{
  "Version":"2012-10-17",
  "Statement":[
    {
      "Sid":"AllowSESPuts",
      "Effect":"Allow",
      "Principal":{
        "Service":"ses.amazonaws.com"
      },
      "Action":"s3:PutObject",
      "Resource":"arn:aws:s3:::myBucket/*",
      "Condition":{
        "StringEquals":{
          "aws:Referer":"111122223333"
        }
      }
    }
  ]
}

Make the following changes to the preceding policy example:

  1. Replace myBucket with the name of the Amazon S3 bucket that you want to write to.
  2. Replace 111122223333 with your AWS account ID.

You can find out more about the policy here.

Step 2: Create DynamoDB Table to Simulate Application

Next, add a DynamoDB table. The DynamoDB table will store the incoming order info. For this sample we will keep it simple and have a table with email as a partition key. Here is the data model:

{   
    "email_order_received": {
        "email": "string",
        "itemname": "string",
        "quantity": "number"
    }   
}

Step 3: Create Lambda Function triggered by SES to Process Email

Now the DynamoDB table is ready, create the Lambda function to process the email and send data to the DynamoDB table. The lambda function needs an execution role that has permissions to access the S3 bucket, the DynamoDB table and create the CloudWatch log group. It also needs a Resource-based Policy so SES can invoke the Lambda function. In the final step when we configure SES to call the lambda, SES automatically adds the necessary permissions to the function as detailed here.  This is a sample policy statement:

{
  "Version": "2012-10-17",
  "Id": "default",
  "Statement": [
    {
      "Sid": "allowSesInvoke",
      "Effect": "Allow",
      "Principal": {
        "Service": "ses.amazonaws.com"
      },
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:eu-west-1:111122223333:function:email-event-ses",
      "Condition": {
        "StringEquals": {
          "AWS:SourceAccount": "111122223333"
        }
      }
    }
  ]
}

Sample Lambda code in python:

import boto3
import email


def lambda_handler(event, context):
    s3 = boto3.client("s3")
    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table('email_order_received')
    
    print("Spam filter")
    # Check the SES spam and virus filter settings
    if (
        event["Records"][0]["ses"]["receipt"]["spfVerdict"]["status"] == "FAIL" or
        event["Records"][0]["ses"]["receipt"]["dkimVerdict"]["status"] == "FAIL" or
        event["Records"][0]["ses"]["receipt"]["spamVerdict"]["status"] == "FAIL" or
        event["Records"][0]["ses"]["receipt"]["virusVerdict"]["status"] == "FAIL"
       ):
        print("Dropping Spam")
    else:
        print("Not Spam")
        email_bucket = "email-handling-test"
        bucketkey = "monitor/" + event["Records"][0]["ses"]["mail"]["messageId"]
    
        fileObj = s3.get_object(Bucket = email_bucket, Key=bucketkey)
    
        msg = email.message_from_bytes(fileObj['Body'].read())
        From = msg['From']
        itemname = msg['Subject']
        body = ""
        if msg.is_multipart():
            for part in msg.walk():
                type = part.get_content_type()
                disp = str(part.get('Content-Disposition'))
                # look for plain text parts, but skip attachments
                if type == 'text/plain' and 'attachment' not in disp:
                    charset = part.get_content_charset()
                    # decode the base64 unicode bytestring into plain text
                    body = part.get_payload(decode=True).decode(encoding=charset, errors="ignore")
                    # if we've found the plain/text part, stop looping thru the parts
                    break
        else:
            # not multipart - i.e. plain text, no attachments
            charset = msg.get_content_charset()
            body = msg.get_payload(decode=True).decode(encoding=charset, errors="ignore")
            
        table.put_item(
            Item={
                'email': From,
                'itemname': itemname,
                'quantity': body
            }
        )
        print("inserted data into dynamodb")

When you add a Lambda action to a receipt rule, Amazon SES sends an event record to Lambda every time it receives an incoming message. This event contains information about the email headers for the incoming message, as well as the results of tests (spam filtering and virus scanning) that Amazon SES performs on incoming messages, however it omits the body of the incoming email. This is why the lambda has to process the body form the email stored in S3. You can see details of the event here. In this demo app we assume the item name is in the subject and the body of the email has the quantity of the items and this data is written to the DynamoDB table.

Step 4: Configure SES to Send Emails to S3 and Trigger Lambda Function

The final step is to configure Amazon SES. Start by verifying a domain so SES can use it to send and receive emails. Domain verification helps ensure you are the owner of the domain and are thus authorised to manage the sending and receiving of the emails from addresses in the domain. To verify your domain:

  1. In the SES console in the navigation pane under Identity Management, choose Domains.
  2. Choose Verify new Domain
  3. In the Verify new Domain dialog enter your domain name
  4. Choose Verify This Domain
  5. In the dialogue box you will see a Domain verification record set. You need to add this record to your domain DNS server. You will also have to add the email receiving record (MX Record) to you domain DNS server.
  6. If your DNS server is Route53 and it is registered under the same account then SES also gives you the option to update your DNS server from within the SES console.

Once the domain is verified its status goes from “pending verification” to “verified” and now it can used it to send and receive emails.

Next, create a recipient rule set. The Rule Set lets you specify what SES does with emails it receives on domains you own. You can create rules for individual addresses or any address under the domain. To create the Rule Set:

  1. In the left navigation pane, under Email Receiving, choose Rule Sets.
  2. Choose Create Rule.
  3. Enter the recipient email address you want to configure the rule for. You can add up to a maximum of 100 recipient addresses or just set it up for any address in the domain using just the domain name as a wildcard.
  4. Once the addresses have been added, add the actions for the rule. Add two actions:
    1. First one is of type S3, this is to save a copy of the email to the S3 bucket created in step 1. Select the bucket name created in step 1 from the drop-down list. You can add a prefix to the filename as well to categorise the output of different rules.
    2. Second is of type Lambda to trigger the lambda for processing the email. Select the lambda created in step 3 from the drop-down list.

Once the SES Rule is configured, we have the full workflow in place. Now any email sent to the [email protected] address will be processed by the Lambda. In this way you can configure email processing to be part of your application workflow without having to perform polling.

Clean-up

To clean up the resources used in your account:

  1. Navigate to Amazon S3 and delete the contents of the bucket you created where your emails are stored.
  2. Once the bucket is empty, delete the bucket.
  3. Navigate to the DynamoDB console and delete the table you created above. Make sure you select the option to “Delete all CloudWatch alarms for this table”
  4. Remove the domain from Amazon SES. To do this, navigate to the Amazon SES Console and choose Domains from the left navigation. Select the domain you want to remove and choose Remove button to remove it from Amazon SES.
  5. From the Amazon SES Console, navigate to the Rule Sets from the left navigation. On the Active Rule Set section, choose View Active Rule Set button and delete all the rules you have created, by selecting the rule and choosing Action, Delete.
  6. On the Rule Sets page choose Disable Active Rule Set button to disable listening for incoming email messages.
  7. On the Rule Sets page, Inactive Rule Sets section, delete the only rule set, by selecting the rule set and choosing Action, Delete.
  8. Navigate to the Lambda console and delete the Lambda you created earlier. Select the Lambda and choose Delete from the Actions menu.
  9. Navigate to CloudWatch console and from the left navigation choose Logs, Log groups. Find the log group that belongs to the resources and delete it by selecting it and choosing Actions, Delete log group(s).

Conclusion

In this post, we have shown you how to integrate email processing into an application workflow without having to resort to polling a mail box.

By using SES to receive emails you can create a modular serverless architecture that allows emails to be processed and checked for spam plus viruses and the output can then be sent to any downstream system or stored in a database for application use.


About the Author

Syed Ali Abbas Gardezi is a Sr. Solution Architect for AWS based in London, United Kingdom. He works with AWS GSI Partners architecting, designing and implementing various large-scale IT solution. Before joining AWS he worked in several Architecture roles in a tier 1 financial organisation in London.

AWS Control Tower Account vending through Amazon Lex ChatBot

Post Syndicated from Marco Fischer original https://aws.amazon.com/blogs/devops/aws-control-tower-account-vending-through-amazon-lex-chatbot/

In this blog post you will learn about a multi-environment solution that uses a cloud native CICD pipeline to build, test, and deploy a Serverless ChatOps bot that integrates with AWS Control Tower Account Factory for AWS account vending. This solution can be used and integrated with any of your favourite request portal or channel that allows to call a RESTFUL API endpoint, for you to offer AWS Account vending at scale for your enterprise.

Introduction

Most of the AWS Control Tower customers use the AWS Control Tower Account Factory (a Service Catalog product), and the ServiceCatalog service to vend standardized AWS Services and Products into AWS Accounts. ChatOps is a collaboration model that interconnects a process with people, tools, and automation. It combines a Bot that can fulfill service requests (the work needed) and be augmented by Ops and Engineering staff in order to allow approval processes or corrections in the case of exception request. Major tasks in the public Cloud go toward building a proper foundation (the so called LandingZone). The main goals of this foundation are providing not only an AWS Account access (with the right permissions), but also the correct Cloud Center of Excellence (CCoE) approved products and services. This post demonstrates how to utilize the existing AWS Control Tower Account Factory, extending the Service Catalog portfolio in Control Tower with additional products, and executing Account vending and Product vending through an easy ChatBot interface. You will also learn how to utilize this Solution with Slack. But it can also be easily utilized with Chime/MS Teams or a normal Web-frontend, as the integration is channel-agnostig through an API Gateway integration layer. Then, you will combine all of this, integrating a ChatBot frontend where users can issue requests against the CCoE and Ops team to fulfill AWS services easily and transparently. As a result, you experience a more efficient process for vending AWS Accounts and Products and taking away the burden on your Cloud Operations team.

Background

  • An AWS Account Factory Account account is an AWS account provisioned using account factory in AWS Control Tower.
  • AWS Service Catalog lets you to centrally manage commonly deployed IT services. For this blog, account factory utilizes AWS Service Catalog to provision new AWS accounts.
  • Control Tower provisioned product is an instance of the Control Tower Account Factory product that is provisioned by AWS Service Catalog. In this post, any new AWS account created through the ChatOps solution will be a provisioned product and visible in Service Catalog.
  • Amazon Lex: is a service for building conversational interfaces into any application using voice and text

Architecture Overview

The following architecture shows the overview of the solution which will be built with the code provided through Github.

Multi-Environment CICD Architecture

The multi-environment pipeline is building 3 environments (Dev, Staging, Production) with different quality gates to push changes on this solution from a “Development Environment” up to a “Production environment”. This will make sure that your AWS ChatBot and the account vending is scalable and fully functional before you release it to production and make it available to your end-users.

  • AWS Code Commit: There are two repositories used, one repository where Amazon Lex bot is created through a Java-Lambda function and installed in STEP 1. And one for the Amazon Lex bot APIs that are running and capturing the Account vending requests behind API Gateway and then communicating with the Amazon Lex Bot.
  • AWS Code Pipeline: It integrates CodeCommit and CodeBuild and CodeDeploy, to be manage your release pipelines moving from Dev to Production.
  • AWS Code Build: Each different activity executed inside the pipeline is a CodeBuild activity. Inside the source code repository there are different files with the prefix buildspec-. Each of these files contains the exact commands that the code build must execute on each of the stages: build/test.
  • AWS Code Deploy: Tthis is an AWS service that manages the deployment of the serverless application stack. In this solution it implements a canary deployment where in the first minute we switch 10% of the requests to the new version of it which will allow to test the scaling of the solution. (CodeDeployDefaultLambdaCanary10Percent5Minutes)

AWS ControlTower Account Vending integration and ChatOps bot architecture

AWS ControlTower Account Vending integration and ChatOps bot architecture

The actual Serverless Application architecture built with Amazon Lex and the Application code in Lambda accessible through Amazon API Gateway, which will allow you to integrate this solution with almost any front-end (Slack, MS Teams, Website).

  • Amazon Lex: With Amazon Lex, the same deep learning technologies that power Amazon Alexa are now available to any developer, enabling you to quickly and easily build sophisticated, natural language, conversational bots (“chatbots”). As Amazon lex is not available yet in all AWS regions that currently AWS Control Tower is supported, it may be that you want to deploy Amazon Lex in another region than you have AWS Control Tower deployed.
  • Amazon API Gateway / AWS Lambda: The API Gateway is used as a central entry point for the Lambda functions (AccountVendor) that are capturing the Account vending requests from a frontend (e.g. Slack or Website). As Lambda functions can not be exposed directly as a REST service, they need a trigger which in this case API Gateway does.
  • Amazon SNS: Amazon Simple Notification Service (Amazon SNS) is a fully managed messaging service. SNS is used to send notifications via e-mail channel to an approver mailbox.
  • Amazon DynamoDB: Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It’s a fully managed, multi-region, multi-active, durable database. Amazon DynamoDB will store the Account vending requests from the Lambda code that get triggered by the Lex-bot interaction.

Solution Overview and Prerequisites

Solution Overview

Start with building these 2 main components of the Architecture through an automated script. This will be split into “STEP 1”, and “STEP 2” in this walkthrough. “STEP 3” and “STEP 4” will be testing the solution and then integrating the solution with a frontend, in this case we use Slack as an example and also provide you with the Slack App manifest file to build the solution quickly.

  • STEP 1) “Install Amazon Lex Bot”: The key part of the left side of the Architecture, the Amazon Lex Bot called (“ChatOps” bot) will be built in a first step, then
  • STEP 2) “Build of the multi-environment CI/CD pipeline”: Build and deploy a full load testing DevOps pipeline that will stresstest the Lex bot and its capabilities to answer to requests. This will build the supporting components that are needed to integrate with Amazon Lex and are described below (Amazon API Gateway, AWS Lambda, Amazon DynamoDB, Amazon SNS).
  • STEP 3) “Testing the ChatOps Bot”: We will execute some test scripts through Postman, that will trigger Amazon API Gateway and trigger a sample Account request that will require a feedback from the ChatOps Lex Bot.
  • STEP 4) “Integration with Slack”: The final step is an end-to-end integration with an communication platform solution such as Slack.

The DevOps pipeline (using CodePipeline, CodeCommit, CodeBuild and CodeDeploy) is automatically triggered when the stack is deployed and the AWS CodeCommit repository is created inside the account. The pipeline builds the Amazon Lex ChatOps bot from the source code. The Step 2 integrates the surrounding components with the ChatOps Lex bot in 3 different environments: Dev/Staging/Prod. In addition to that, we use canary deployment to promote updates in the lambda code from the AWS CodeCommit repository. During the canary deployment we implemented the rollback procedure using a log metric filter that scans the word Exception inside the log file in CloudWatch. When the word is found, an alarm is triggered and deployment is automatically rolled back. Usually, the rollback will occur automatically during the load test phase. This would prevent faulty code from being promoted into the production environment.

Prerequisites

For this walkthrough, you should have the following prerequisites ready. What you’ll need:

  • An AWS account
  • A ready AWS ControlTower deployment (needs 3 AWS Accounts/e-mail addresses)
  • AWS Cloud9 IDE or a development environment with access to download/run the scripts provided through Github
  • You need to log into the AWS Control Tower management account with AWSAdministratorAccess role if using AWS SSO or equivalent permissions if you are using other federations.

Walkthrough

To get started, you can use Cloud9 IDE or log into your AWS SSO environment within AWS Control Tower.

  1. Prepare: Set up the sample solution

Log in to your AWS account and open Cloud9.

1.1. Clone the GitHub repository to your Cloud9 environment.

The complete solution can be found at the GitHub repository here. The actual deployment and build are scripted in shell, but the Serverless code is in Java and uses Amazon Serverless services to build this solution (Amazon API Gateway, Amazon DynamoDB, Amazon SNS).

git clone https://github.com/aws-samples/multi-environment-chatops-bot-for-controltower

  1. STEP 1: Install Amazon Lex Bot

Amazon Lex is currently not deployable natively with Amazon CloudFormation. Therefore the solution is using a custom Lambda resource in Amazon CloudFormation to create the Amazon Lex bot. We will create the Lex bot, along some sample utterances, three custom slots (Account Type, Account E-Mail and Organizational OU) and one main intent (“Control Tower Account Vending Intent”) to capture the request to trigger an AWS Account vending process.

2.1. Start the script, “deploy.sh” and provide the below inputs. Select a project name. You can override it if you wan’t to choose a custom name and select the bucket name accordingly (we recommend to use the default names)

./deploy.sh

Choose a project name [chatops-lex-bot-xyz]:

Choose a bucket name for source code upload [chatops-lex-bot-xyz]:

2.2. To confirm, double check the AWS region you have specificed.

Attention: Make sure you have configured your AWS CLI region! (use either 'aws configure' or set your 'AWS_DEFAULT_REGION' variable).

Using region from $AWS_DEFAULT_REGION: eu-west-1

2.3. Then, make sure you choose the region where you want to install Amazon Lex (make sure you use an available AWS region where Lex is available), or use the default and leave empty. The Amazon Lex AWS region can be different as where you have AWS ControlTower deployed.

Choose a region where you want to install the chatops-lex-bot [eu-west-1]:

Using region eu-west-1

2.4. The script will create a new S3 bucket in the specified region in order to upload the code to create the Amazon Lex bot.

Creating a new S3 bucket on eu-west-1 for your convenience...
make_bucket: chatops-lex-bot-xyz
Bucket chatops-lex-bot-xyz successfully created!

2.5. We show a summary of the bucket name and the project being used.

Using project name................chatops-lex-bot-xyz
Using bucket name.................chatops-lex-bot-xyz

2.6 Make sure that if any of these names or outputs are wrong, you can still stop here by pressing Ctrl+c.

If these parameters are wrong press ctrl+c to stop now...

2.7 The script will upload the source code to the S3 bucket specified, you should see a successful upload.

Waiting 9 seconds before continuing
upload: ./chatops-lex-bot-xyz.zip to s3://chatops-lex-bot-xyz/chatops-lex-bot-xyz.zip

2.8 Then, the script will trigger an aws cloudformation package command, that will use the uploaded zip file, reference it and generate a ready CloudFormation yml file for deployment. The output of the generated package-file (devops-packaged.yml) will be stored locally and used to executed the aws cloudformation deploy command.

Successfully packaged artifacts and wrote output template to file devops-packaged.yml.

Note: You can ignore this part below as the shell script will execute the “aws cloudformation deploy” command for you.

Execute the following command to deploy the packaged template

aws cloudformation deploy --template-file devops-packaged.yml --stack-name <YOUR STACK NAME>

2.9 The AWS CloudFormation scripts should be running in the background

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - chatops-lex-bot-xyz-cicd

2.10 Once you see the successful output of the CloudFormation script “chatops-lex-bot-xyz-cicd”, everything is ready to continue.

------------------------------------------
ChatOps Lex Bot Pipeline is installed
Will install the ChatOps API as an Add-On to the Vending Machine
------------------------------------------

2.11 Before we continue, confirm the output of the AWS CloudFormation called “chatops-lex-bot-xyz-cicd”. You should find three outputs from the CloudFormation template.

  • A CodePipeline, CodeCommit Repository with the same naming convention (chatops-lex-bot-xyz), and a CodeBuild execution with one stage (Prod). The execution of this pipeline should show as “Succeeded” within CodePipeline.
  • As a successful result of the execution of the Pipeline, you should find another CloudFormation that was triggered, which you should find in the output of CodeBuild or the CloudFormation Console (chatops-lex-bot-xyz-Prod).
  • The created resource of this CloudFormation will be the Lambda function (chatops-lex-bot-xyz-Prod-AppFunction-abcdefgh) that will create the Amazon Lex Bot. You can find the details in Amazon Lambda in the Mgmt console. For more information on CloudFormation and custom resources, see the CloudFormation documentation.
  • You can find the successful execution in the CloudWatch Logs:

Adding Slot Type:: AccountTypeValues
Adding Slot Type:: AccountOUValues
Adding Intent:: AWSAccountVending
Adding LexBot:: ChatOps
Adding LexBot Alias:: AWSAccountVending

  • Check if the Amazon Lex bot has been created in the Amazon Lex console, you should see an Amazon Lex bot called “ChatOps” with the status “READY”.

2.12. This means you have successfully installed the ChatOps Lex Bot. You can now continue with STEP 2.

  1. STEP 2. Build of the multi-environment CI/CD pipeline

In this section, we will finalize the set up by creating a full CI/CD Pipeline, the API Gateway and Lambda functions that can capture requests for Account creation (AccountVendor) and interact with Amazon Lex, and a full testing cycle to do a Dev-Staging-Production build pipeline that does a stress test on the whole set of Infrastructure created.

3.1 You should see the same name of the bucket and project as used previously. If not, please override the input here. Otherwise, leave empty (we recommend to use the default names).

Choose a bucket name for source code upload [chatops-lex-xyz]:

3.2. This means that the Amazon Lex Bot was successfully deployed, and we just confirm the deployed AWS region.

ChatOps-Lex-Bot is already deployed in region eu-west-1

3.3 Please specify a mailbox that you have access in order to approve new ChatOps (e.g. Account vending) vending requests as a manual approver step.

Choose a mailbox to receive approval e-mails for new accounts: [email protected]

3.4 Make sure you have the right AWS region where AWS Control Tower has deployed its Account Factory Portfolio product in Service Catalog (to double check you can log into AWS Service Catalog and confirm that you see the AWS Control Tower Account Factory)

Choose the AWS region where your vending machine is installed [eu-west-1]:
Using region eu-west-1

Creating a new S3 bucket on eu-west-1 for your convenience...
{
"Location": "http://chatops-lex-xyz.s3.amazonaws.com/"
}

Bucket chatops-lex-xyz successfully created!

3.5 Now the script will identify if you have Control Tower deployed and if it can identify the Control Tower Account Factory Product.

Trying to find the AWS Control Tower Account Factory Portfolio

Using project name....................chatops-lex-xyz
Using bucket name.....................chatops-lex-xyz
Using mailbox for approvals...........approvermail+chatops-lex-bot-xyz@yourdomain.com
Using lexbot region...................eu-west-1
Using service catalog portfolio-id....port-abcdefghijklm

If these parameters are wrong press ctrl+c to stop now…

3.6 If something is wrong or has not been set and you see an empty line for any of the, stop here and press ctr+c. Check the Q&A section if you might have missed some errors previously. These values need to be filled to proceed.

Waiting 1 seconds before continuing
[INFO] Scanning for projects...
[INFO] Building Serverless Jersey API 1.0-SNAPSHOT

3.7 You should see a “BUILD SUCCESS” message.

[INFO] BUILD SUCCESS
[INFO] Total time:  0.190 s

3.8 Then the package built locally will be uploaded to the S3 bucket, and then again prepared for Amazon CloudFormation to package- and deploy.

upload: ./chatops-lex-xyz.zip to s3://chatops-lex-xyz/chatops-lex-xyz.zip

Successfully packaged artifacts and wrote output template to file devops-packaged.yml.
Execute the following command to deploy the packaged template
aws cloudformation deploy --template-file devops-packaged.yml --stack-name <YOUR STACK NAME>

3.9 You can neglect the above message, as the shell script will execute the Cloudformation API for you. The AWS CloudFormation scripts should be running in the background, and you can double check in the AWS Mgmt Console.

Waiting for changeset to be created..
Waiting for stack create/update to complete

Successfully created/updated stack - chatops-lex-xyz-cicd
------------------------------------------
ChatOps Lex Pipeline and Chatops Lex Bot Pipelines successfully installed
------------------------------------------

3.10 This means that the Cloud Formation scripts have executed successfully. Lets confirm in the Amazon CloudFormation console, and in Code Pipeline if we have a successful outcome and full test-run of the CICD pipeline. To remember, have a look at the AWS Architecture overview and the resources / components created.

You should find the successful Cloud Formation artefacts named:

  • chatops-lex-xyz-cicd: This is the core CloudFormation that we created and uploaded that built a full CI/CD pipeline with three phases (DEV/STAGING/PROD). All three stages will create a similar set of AWS resources (e.g. Amazon API Gateway, AWS Lambda, Amazon DynamoDB), but only the Staging phase will run an additional Load-Test prior to doing the production release.
  • chatops-lex-xyz-DEV: A successful build, creation and deployment of the DEV environment.
  • chatops-lex-xyz-STAGING: The staging phase will run a set of load tests, for a full testing and through io (an open-source load testing framework)
  • chatops-lex-xyz-PROD: A successful build, creation and deployment of the Production environment.

3.11 For further confirmation, you can check the Lambda-Functions (chatops-lex-xyz-pipeline-1-Prod-ChatOpsLexFunction-), Amazon DynamoDB (chatops-lex-xyz-pipeline-1_account_vending_) and Amazon SNS (chatops-lex-xyz-pipeline-1_aws_account_vending_topic_Prod) if all the resources as shown in the Architecture picture have been created.

Within Lambda and/or Amazon API Gateway, you will find the API Gateway execution endpoints, same as in the Output section from CloudFormation:

  • ApiUrl: https://apiId.execute-api.eu-west-1.amazonaws.com/Prod/account
  • ApiApproval https://apiId.execute-api.eu-west-1.amazonaws.com/Prod/account/confirm

3.11 This means you have successfully installed the Amazon Lex ChatOps bot, and the surrounding test CI/CD pipeline. Make sure you have accepted the SNS subscription confirmation.

AWS Notification - Subscription Confirmation

You have chosen to subscribe to the topic:
arn:aws:sns:eu-west-1:12345678901:chatops-lex-xyz-pipeline_aws_account_vending_topic_Prod
To confirm this subscription, click or visit the link below (If this was in error no action is necessary)

  1. STEP 3: Testing the ChatOps Bot

In this section, we provided a test script to test if the Amazon Lex Bot is up and if Amazon API Gateway/Lambda are correctly configured to handle the requests.

4.1 Use the Postman script under the /test folder postman-test.json, before you start integrating this solution with a Chat or Web- frontend such as Slack or a custom website in Production.

4.2. You can import the JSON file into Postman and execute a RESTful test call to the API Gateway endpoint.

4.3 Once the script is imported in Postman, you should execute the two commands below and replace the HTTP URL of the two requests (Vending API and Confirmation API) by the value of APIs recently created in the Production environment. Alternatively, you can also access these values directly from the Output tab in the CloudFormation stack with a name similar to chatops-lex-xyz-Prod:

aws cloudformation describe-stacks --query "Stacks[0].Outputs[?OutputKey=='ApiUrl'].OutputValue" --output text

aws cloudformation describe-stacks --query "Stacks[0].Outputs[?OutputKey=='ApiApproval'].OutputValue" --output text

4.4 Execute an API call against the PROD API

  • Use the Amazon API Gateaway endpoint to trigger a REST call against the endpoint, an example would be https://apiId.execute-api.eu-west-1.amazonaws.com/Prod/account/. Make sure you change the “apiId” with your Amazon Gateway API ID endpoint found in the above sections (CloudFormation Output or within the Lambda), see here the start of the parameters that you have to change in the postman-test.json file:

"url": {
"raw": "https://apiId.execute-api.us-east-1.amazonaws.com/Prod/account",
"protocol": "https",

  • Request Input, fill out and update the values on each of the JSON sections:

{ “UserEmail”: “[email protected]”, “UserName”:“TestUser-Name”, “UserLastname”: “TestUser-LastName”, “UserInput”: “Hi, I would like a new account please!”}

  • If the test response is SUCCESSFUL, you should see the following JSON as a return:

{"response": "Hi TestUser-Name, what account type do you want? Production or Sandbox?","initial-params": "{\"UserEmail\": \"[email protected]\",\"UserName\":\"TestUser-Name\",\"UserLastname\": \"TestUser-LastName\",\"UserInput\": \"Hi, I would like a new account please!\"}"}

4.5 Test the “confirm” action. To confirm the Account vending request, you can easily execute the /confirm API, which is similar to if you would confirm the action through the e-mail confirmation that you receive via Amazon SNS.

Make sure you change the following sections in Postman (Production-Confirm-API) and use the ApiApproval-apiID that has the /confirm path.

https://apiId.execute-api.eu-west-1.amazonaws.com/Prod/account/confirm

  1. STEP 4: Slack Integration Example

We will demonstrate you how to integrate with a Slack channel but any other request portal (Jira), Website or App that allows REST API integrations (e.g. Amazon Chime) could be used for this.

5.1 Use the attached YAML slack App manifest file to create a new Slack Application within your Organization. Go to “https://api.slack.com/apps?new_app=1” and choose “Create New App”.

5.2 Choose the “From an app manifest” to create a new Slack App and paste the sample code from the /test folder slack-app-manifest.yml .

  • Note: Make sure you first overwrite the request_url parameter for your Slack App that will point to the Production API Gateway endpoint.

request_url: https://apiId.execute-api.us-east-1.amazonaws.com/Prod/account"

5.3 Choose to deploy and re-install the Slack App to your workspace and then access the ChatBot Application within your Slack workspace. If everything is successful, you can see a working Serverless ChatBot as shown below.

Slack Example

Conclusion and Cleanup

Conclusion

In this blog post, you have learned how to create a multi-environment CICD pipeline that builds a fully Serverless AWS account vending solution using an AI powered Amazon Lex bot integrated with AWS Control Tower Account Factory. This solution will help you enable standardized account vending on AWS through an easy way by exposing a ChatBot to your AWS consumers coming from various channels. This solution can be extended with AWS ServiceCatalog to allow to launch not just AWS accounts, but almost any AWS Service by using IaC (CloudFormation) templates provided through the CCoE Ops and Architecture teams.

Cleanup

For a proper cleanup, you can just go into AWS CloudFormation and choose the deployed Stacks and choose to “delete Stack”. If you incur issues while deleting, see below troubleshooting solutions for a fix. Also make sure you delete your integration Apps (e.g. Slack) for a full cleanup.

Troubleshooting

  1. An error occurred (BucketAlreadyOwnedByYou) when calling the CreateBucket operation: Your previous request to create the named bucket succeeded and you already own it.
    Solution: Make sure you use a distinct name for the S3 bucket used in this project, for the Amazon Lex Bot and the CICD pipeline
  2. When you delete and rollback of the CloudFormation stacks and you get an error (Code: 409; Error Code: BucketNotEmpty).
    Solution: Delete the S3 build bucket and its content “delete permanently” and then delete the associated CloudFormation stack that has created the CICD pipeline.

Field Notes: Perform Automations in Ungoverned Regions During Account Launch Using AWS Control Tower Lifecycle Events

Post Syndicated from Amit Kumar original https://aws.amazon.com/blogs/architecture/field-notes-perform-automations-in-ungoverned-regions-during-account-launch-using-aws-control-tower-lifecycle-events/

This post was co-authored by Amit Kumar; Partner Solutions Architect at AWS, Pavan Kumar Alladi; Senior Cloud Architect at Tech Mahindra, and Thooyavan Arumugam; Senior Cloud Architect at Tech Mahindra.

Organizations use AWS Control Tower to set up and govern secure, multi-account AWS environments. Frequently enterprises with a global presence want to use AWS Control Tower to perform automations during the account creation including in AWS Regions where AWS Control Tower service is not available. To review the current list of Regions where AWS Control Tower is available, visit the AWS Regional Services List.

This blog post shows you how we can use AWS Control Tower lifecycle events, AWS Service Catalog, and AWS Lambda to perform automation in the Region where AWS Control Tower service is unavailable. This solution depicts the scenario for a single Region and the solution need to be changed to work with a multi-Regions scenario.

We use an AWS CloudFormation template to create a virtual private cloud (VPC) with subnet and internet gateway as an example and use it in shared service catalog products at the organization level to make it available in child accounts. Every time AWS Control Tower lifecycle events related to account creation occurs, a Lambda function is initiated to perform automation activities in AWS Regions that are not governed by AWS Control Tower.

The solution in this blog post uses the following AWS services:

Figure 1. Solution architecture

Figure 1. Solution architecture

Prerequisites

For this walkthrough, you need the following prerequisites:

  • AWS Control Tower configured with AWS Organizations defined and registered within AWS Control Tower. For this blog post, AWS Control Tower is deployed in AWS Mumbai Region and with an AWS Organizations structure as depicted in Figure 2.
  • Working knowledge of AWS Control Tower.
Figure 2. AWS Organizations structure

Figure 2. AWS Organizations structure

Create an AWS Service Catalog product and portfolio, and share at the AWS Organizations level

  1. Sign in to AWS Control Tower management account as an administrator, and select an AWS Region which is not governed by AWS Control Tower (for this blog post, we will use AWS us-west-1 (N. California) as the Region because at this time it is unavailable in AWS Control Tower).
  2. In the AWS Service Catalog console, in the left navigation menu, choose Products.
  3. Choose upload new product. For Product Name enter customvpcautomation, and for Owner enter organizationabc. For method, choose Use a template file.
  4. In Upload a template file, select Choose file, and then select the CloudFormation template you are going to use for automation. In this example, we are going to use a CloudFormation template which creates a VPC with CIDR 10.0.0.0/16, Public Subnet, and Internet Gateway.
Figure 3. AWS Service Catalog product

Figure 3. AWS Service Catalog product

CloudFormation template: save this as a YAML file before selecting this in the console.

AWSTemplateFormatVersion: 2010-09-09
Description: Template to create a VPC with CIDR 10.0.0.0/16 with a Public Subnet and Internet Gateway. 

Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: VPC

  IGW:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: IGW

  VPCtoIGWConnection:
    Type: AWS::EC2::VPCGatewayAttachment
    DependsOn:
      - IGW
      - VPC
    Properties:
      InternetGatewayId: !Ref IGW
      VpcId: !Ref VPC

  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    DependsOn: VPC
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: Public Route Table

  PublicRoute:
    Type: AWS::EC2::Route
    DependsOn:
      - PublicRouteTable
      - VPCtoIGWConnection
    Properties:
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref IGW
      RouteTableId: !Ref PublicRouteTable

  PublicSubnet:
    Type: AWS::EC2::Subnet
    DependsOn: VPC
    Properties:
      VpcId: !Ref VPC
      MapPublicIpOnLaunch: true
      CidrBlock: 10.0.0.0/24
      AvailabilityZone: !Select
        - 0
        - !GetAZs
          Ref: AWS::Region
      Tags:
        - Key: Name
          Value: Public Subnet

  PublicRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    DependsOn:
      - PublicRouteTable
      - PublicSubnet
    Properties:
      RouteTableId: !Ref PublicRouteTable
      SubnetId: !Ref PublicSubnet

Outputs:

 PublicSubnet:
    Description: Public subnet ID
    Value:
      Ref: PublicSubnet
    Export:
      Name:
        'Fn::Sub': '${AWS::StackName}-SubnetID'

 VpcId:
    Description: The VPC ID
    Value:
      Ref: VPC
    Export:
      Name:
        'Fn::Sub': '${AWS::StackName}-VpcID'
  1. After the CloudFormation template is selected, choose Review, and then choose Create Product.
Figure 4. AWS Service Catalog product

Figure 4. AWS Service Catalog product

  1. In the AWS Service Catalog console, in the left navigation menu, choose Portfolios, and then choose Create portfolio.
  2. For Portfolio name, enter customvpcportfolio, for Owner, enter organizationabc, and then choose Create.
Figure 5. AWS Service Catalog portfolio

Figure 5. AWS Service Catalog portfolio

  1. After the portfolio is created, select customvpcportfolio. In the actions dropdown, select Add product to portfolio. Then select customvpcautomation product, and choose Add Product to Portfolio.
  2. Navigate back to customvpcportfolio, and select the portfolio name to see all the details. On the portfolio details page, expand the Groups, roles, and users tab, and choose Add groups, roles, users. Next, select the Roles tab and search for AWSControlTowerAdmin role, and choose Add access.
Figure 6. AWS Service Catalog portfolio role selection

Figure 6. AWS Service Catalog portfolio role selection

  1. Navigate to the Share section in portfolio details, and choose Share option. Select AWS Organization, and choose Share.

Note: If you get a warning stating “AWS Organizations sharing is not enabled”, then choose Enable and select the organizational unit (OU) where you want this portfolio to be shared. In this case, we have shared at Workload OU where all workload account is created.

Figure 7. AWS Service Catalog portfolio sharing

Figure 7. AWS Service Catalog portfolio sharing

Create an AWS Identity and Access Management (IAM) role

  1. Sign in to AWS Control Tower management account as an administrator and navigate to IAM Service.
  2. In the IAM console, choose Policies in the navigation pane, then choose Create Policy.
  3. Click on Choose a service, and select STS. In the Actions menu, choose All STS Actions, in Resources, choose All resources, and then choose Next: Tags.
  4. Skip the Tag section, go to the Review section, and for Name enter lambdacrossaccountSTS, and then choose Create policy.
  5. In the navigation pane of the IAM console, choose Roles, and then choose Create role. For the use case, select Lambda, and then choose Next: Permissions.
  6. Select AWSServiceCatalogAdminFullAccess and AmazonSNSFullAccess, then choose Next: Tags (skip tag screen if needed), then choose Next: Review.
  7. For Role name, enter Automationnongovernedregions, and then choose Create role.
Figure 8. AWS IAM role permissions

Figure 8. AWS IAM role permissions

Create an Amazon Simple Notification Service (Amazon SNS) topic

  1. Sign in to AWS Control Tower management account as an administrator and select AWS Mumbai Region (Home Region for AWS CT). Navigate to Amazon SNS Service, and on the navigation panel, choose Topics.
  2. On the Topics page, Choose Create topic. On the Create topic page, in the Details section, for Type select Standard, and for Name enter ControlTowerNotifications. Keep default for other options, and then choose Create topic.
  3. In the Details section, in the left navigation pane, choose Subscriptions.
  4. On the Subscriptions page, choose Create subscription. For Protocol, choose Email and for Endpoint mention the email id where notification need to come and choose Create Subscription.

You will receive an email stating that the subscription is in pending status. Follow the email instructions to confirm the subscription. Check in the Amazon SNS Service console to verify subscription confirmation.

Figure 9. Amazon SNS topic creation and subscription

Figure 9. Amazon SNS topic creation and subscription

Create an AWS Lambda function

  1. Sign in to AWS Control Tower management account as an administrator and select AWS Mumbai Region (Home Region for AWS Control Tower). Open the Functions page on the Lambda console, and choose Create function.
  2.  In the Create function section, choose Author from scratch.
  3. In the Basic information section:
    1. For Function name, enter NonGovernedCrossAccountAutomation.
    2. For Runtime, choose Python 3.8.
    3. For Role, select Choose an existing role.
    4. For Existing role, select the Lambda role that you created earlier.
  1. Choose Create function.
  2. Copy and paste the following code in to the Lambda editor (replace the existing code).
  3. In the File menu, choose Save.

Lambda function code: The Lambda function is developed to initiate the AWS Service Catalog product, shared at Organizations level from AWS Control Tower management account, onto all member accounts in a hub and spoke model. Key activities performed by the Lambda function are:

    • Assume role – Provides the mechanism to assume AWSControlTowerExecution role in the child account.
    • Launch product – Launch the AWS Service Catalog product shared in the non-governed Region with the member account.
    • Email notification – Send notifications to the subscribed recipients.

When this Lambda function is invoked by the AWS Control Tower lifecycle event, it performs the activity of provisioning the AWS Service Catalog products in the Region which is not governed by AWS Control Tower.

#####################################################################################
# Decription:This Lambda used execute service catalog products in unmanaged ControlTower 
# regions while creation of AWS accounts
# Environment: Control Tower Env
# Version 1.0
#####################################################################################

import boto3
import os
import time

SSM_Master = boto3.client('ssm')
STS_Master = boto3.client('sts')
SC_Master = boto3.client('servicecatalog',region_name = 'us-west-1')
SNS_Master = boto3.client('sns')

def lambda_handler(event, context):
    
    if event['detail']['serviceEventDetails']['createManagedAccountStatus']['state'] == 'SUCCEEDED':
        
        account_name = event['detail']['serviceEventDetails']['createManagedAccountStatus']['account']['accountName']
        account_id = event['detail']['serviceEventDetails']['createManagedAccountStatus']['account']['accountId']
        ##Assume role to member account
        assume_role(account_id)  
        
        try:
        
            print("-- Executing Service Catalog Procduct in the account: ", account_name)
            ##Launch Product in member account
            launch_product(os.environ['ProductName'], SC_Member)
            sendmail(f'-- Product Launched successfully ')

        except Exception as err:
        
            print(f'-- Error in Executing Service Catalog Procduct in the account: : {err}')
            sendmail(f'-- Error in Executing Service Catalog Procduct in the account: : {err}')   
    
 ##Function to Assume Role and create session in the Member account.                       
def assume_role(account_id):
    
    global SC_Member, IAM_Member, role_arn
    
    ## Assume the Member account role to execute the SC product.
    role_arn = "arn:aws:iam::$ACCOUNT_NUMBER$:role/AWSControlTowerExecution".replace("$ACCOUNT_NUMBER$", account_id)
    
    ##Assuming Member account Service Catalog.
    Assume_Member_Acc = STS_Master.assume_role(RoleArn=role_arn,RoleSessionName="Member_acc_session")
    aws_access_key_id=Assume_Member_Acc['Credentials']['AccessKeyId']
    aws_secret_access_key=Assume_Member_Acc['Credentials']['SecretAccessKey']
    aws_session_token=Assume_Member_Acc['Credentials']['SessionToken']

    #Session to Connect to IAM and Service Catalog in Member Account                          
    IAM_Member = boto3.client('iam',aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key,aws_session_token=aws_session_token)
    SC_Member = boto3.client('servicecatalog', aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key,aws_session_token=aws_session_token,region_name = "us-west-1")
    
    ##Accepting the portfolio share in the Member account.
    print("-- Accepting the portfolio share in the Member account.")
    
    length = 0
    
    while length == 0:
        
        try:
            
            search_product = SC_Member.search_products()
            length = len(search_product['ProductViewSummaries'])
        
        except Exception as err:
            
            print(err)
        
        if length == 0:
        
            print("The shared product is still not available. Hence waiting..")
            time.sleep(10)
            ##Accept portfolio share in member account
            Accept_portfolio = SC_Member.accept_portfolio_share(PortfolioId=os.environ['portfolioID'],PortfolioShareType='AWS_ORGANIZATIONS')
            Associate_principal = SC_Member.associate_principal_with_portfolio(PortfolioId=os.environ['portfolioID'],PrincipalARN=role_arn, PrincipalType='IAM')
        
        else:
            
            print("The products are listed in account.")
    
    print("-- The portfolio share has been accepted and has been assigned the IAM Role principal.")
    
    return SC_Member

##Function to execute product in the Member account.    
def launch_product(ProductName, session):
  
    describe_product = SC_Master.describe_product_as_admin(Name=ProductName)
    created_time = []
    version_ID = []
    
    for version in describe_product['ProvisioningArtifactSummaries']:
        
        describe_provisioning_artifacts = SC_Master.describe_provisioning_artifact(ProvisioningArtifactId=version['Id'],Verbose=True,ProductName=ProductName,)
        
        if describe_provisioning_artifacts['ProvisioningArtifactDetail']['Active'] == True:
            
            created_time.append(describe_provisioning_artifacts['ProvisioningArtifactDetail']['CreatedTime'])
            version_ID.append(describe_provisioning_artifacts['ProvisioningArtifactDetail']['Id'])
    
    latest_version = dict(zip(created_time, version_ID))
    latest_time = max(created_time)
    launch_provisioned_product = session.provision_product(ProductName=ProductName,ProvisionedProductName=ProductName,ProvisioningArtifactId=latest_version[latest_time],ProvisioningParameters=[
        {
            'Key': 'string',
            'Value': 'string'
        },
    ])
    print("-- The provisioned product ID is : ", launch_provisioned_product['RecordDetail']['ProvisionedProductId'])
    
    return(launch_provisioned_product)   
    
def sendmail(message):
    
     sendmail = SNS_Master.publish(
     TopicArn=os.environ['SNSTopicARN'],
     Message=message,
     Subject="Alert - Attention Required",
MessageStructure='string')
  1. Choose Configuration, then choose Environment variables.
  2. Choose Edit, and then choose Add environment variable for each of the following:
    1. Variable 1: Key as ProductName, and Value as “customvpcautomation” (name of the product created in the previous step).
    2. Variable 2: Key as SNSTopicARN, and Value as “arn:aws:sns:ap-south-1:<accountid>:ControlTowerNotifications” (ARN of the Amazon SNS topic created in the previous step).
    3. Variable 3: Key as portfolioID, and Value as “port-tbmq6ia54yi6w” (ID for the portfolio which was created in the previous step).
Figure 10. AWS Lambda function environment variable

Figure 10. AWS Lambda function environment variable

  1. Choose Save.
  2. On the function configuration page, on the General configuration pane, choose Edit.
  3. Change the Timeout value to 5 min.
  4. Go to Code Section, and choose the Deploy option to deploy all the changes.

Create an Amazon EventBridge rule and initiate with a Lambda function

  1. Sign in to AWS Control Tower management account as an administrator, and select AWS Mumbai Region (Home Region for AWS Control Tower).
  2. On the navigation bar, choose Services, select Amazon EventBridge, and in the left navigation pane, select Rules.
  3. Choose Create rule, and for Name enter NonGovernedRegionAutomation.
  4. Choose Event pattern, and then choose Pre-defined pattern by service.
  5. For Service provider, choose AWS.
  6. For Service name, choose Control Tower.
  7. For Event type, choose AWS Service Event via CloudTrail.
  8. Choose Specific event(s) option, and select CreateManagedAccount.
  9. In Select targets, for Target, choose Lambda. Select the Lambda function which was created earlier named as NonGovernedCrossAccountAutomation in Function dropdown.
  10. Choose Create.
Figure 11. Amazon EventBridge rule initiated with AWS Lambda

Figure 11. Amazon EventBridge rule initiated with AWS Lambda

Solution walkthrough

    1. Sign in to AWS Control Tower management account as an administrator, and select AWS Mumbai Region (Home Region for AWS Control Tower).
    2. Navigate to the AWS Control Tower Account Factory page, and select Enroll account.
    3. Create a new account and complete the Account Details section. Enter the Account email, Display name, AWS SSO email, and AWS SSO user name, and select the Organizational Unit dropdown. Choose Enroll account.
Figure 12. AWS Control Tower new account creation

Figure 12. AWS Control Tower new account creation

      1. Wait for account creation and enrollment to succeed.
Figure 13. AWS Control Tower new account enrollment

Figure 13. AWS Control Tower new account enrollment

      1. Sign out of the AWS Control Tower management account, and log in to the new account. Select the AWS us-west-1 (N. California) Region. Navigate to AWS Service Catalog and then to Provisioned products. Select the Access filter as Account and you will observe that one provisioned product is created and available.
Figure 14. AWS Service Catalog provisioned product

Figure 14. AWS Service Catalog provisioned product

      1. Go to VPC service to verify if a new VPC is created by the AWS Service Catalog product with a CIDR of 10.0.0.0/16.
Figure 15. AWS VPC creation validation

Figure 15. AWS VPC creation validation

      1. Step 4 and Step 5 validates that you are able to perform the automation during account creation through the AWS Control Tower lifecycle events in non-governed Regions.

Cleaning up

To avoid incurring future charges, clean up the resources created as part of this blog post.

  • Delete the AWS Service Catalog product and portfolio you created.
  • Delete the IAM role, Amazon SNS topic, Amazon EventBridge rule, and AWS Lambda function you created.
  • Delete the AWS Control Tower setup (if created).

Conclusion

In this blog post, we demonstrated how to use AWS Control Tower lifecycle events to perform automation tasks during account creation in Regions not governed by AWS Control Tower. AWS Control Tower provides a way to set up and govern a secure, multi-account AWS environment. With this solution, customers can use AWS Control Tower to automate various tasks during account creation in Regions regardless if AWS Control Tower is available in that Region.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.
Daniel Cordes

Pavan Kumar Alladi

Pavan Kumar Alladi is a Senior Cloud Architect with Tech Mahindra and is based out of Chennai, India. He is working on AWS technologies from past 10 years as a specialist in designing and architecting solutions on AWS Cloud. He is ardent in learning and implementing cloud based cutting edge solutions and is extremely zealous about applying cloud services to resolve complex real world business problems. Currently, he leads customer engagements to deliver solutions for Platform Engineering, Cloud Migrations, Cloud Security and DevOps.

Gaurav Jain

Thooyavan Arumugam

Thooyavan Arumugam is a Senior Cloud Architect at Tech Mahindra’s AWS Practice team. He has over 16 years of industry experience in Cloud infrastructure, network, and security. He is passionate about learning new technologies and helping customers solve complex technical problems by providing solutions using AWS products and services. He provides advisory services to customers and solution design for Cloud Infrastructure (Security, Network), new platform design and Cloud Migrations.

Field Notes: Building a Multi-Region Architecture for SQL Server using FCI and Distributed Availability Groups

Post Syndicated from Yogi Barot original https://aws.amazon.com/blogs/architecture/field-notes-building-a-multi-region-architecture-for-sql-server-using-fci-and-distributed-availability-groups/

A multiple-Region architecture for Microsoft SQL Server is often a topic of interest that comes up when working with our customers. The main reasons customers adopt a multiple-Region architecture approach for SQL Server deployments are:

  • Business continuity and disaster recovery (DR)
  • Geographically distributed customer base, and improved latency for end users

We will explain the architecture patterns that you can follow to effectively design a highly available SQL Server deployment, which spans two or more AWS Regions. You will also learn how to use the multiple-Region approach to scale out the read workloads, and improve the latency for your globally distributed end users.

This blog post explores SQL Server DR architecture using SQL Server Failover Cluster with Amazon FSx for Windows File Server, for primary site and secondary DR site, and describes how to set up a multiple-Region Always On distributed availability group.

Architecture overview

The architecture diagram in Figure 1 depicts two SQL Server clusters (multiple Availability Zones) in two separate Regions, and uses distributed availability group for replication and DR. This will also serve as the reference architecture for this solution.

Figure 1. Two SQL Server clusters (multiple Availability Zones) in two separate Regions

Figure 1. Two SQL Server clusters (multiple Availability Zones) in two separate Regions

In Figure 1, there are two separate clusters in different Regions. The primary cluster in Region_01 is initially configured with SQL Server Failover Cluster Instance (FCI) using Amazon FSx for its shared storage. Always On is enabled on both nodes, and is configured to use FCI SQL Network Name (SQLFCI01) as the single replica for local Availability Group (AG01). Region_02 has an identical configuration to Region_01, but with different hostnames, listeners, and SQL Network Name to avoid possible collisions.

Highlighted in Figure 1, the Always On distributed availability group is then configured to use both listener endpoints (AG01 and AG02). Depending on what type of authentication infrastructure you have, you can either use certificates (no domain and trust dependency), or just AWS Directory Service for Microsoft Active Directory authentication to build the local mirroring endpoint that will be used by the distributed availability group.

With Amazon FSx, you get a fully managed shared file storage solution, that automatically replicates the underlying storage synchronously across multiple Availability Zones. Amazon FSx provides high availability with automatic failure detection, and automatic failover if there are any hardware or storage issues. The service fully supports continuously available shares, a feature that allows SQL Server uninterrupted access to shared file data.

There is an asynchronous replication setup using a distributed availability group from Region A to Region B. In this type of configuration, because there is only one availability group replica, it also serves as the forwarder for the local FCI cluster. The concept of a forwarder is new, and it’s one of the core functionalities for the distributed availability group. Because Windows Failover Cluster1 and Windows Failover Cluster2 are standalone and independent clusters, don’t need to open a large set of ports, thus minimizing security risk.

In this solution, because FCI is our primary high availability solution, users and applications should then connect through FCI SQL Server Network Name with the latest supported drivers and key parameters (such as, MultiSubNetFailover=True – if supported) to facilitate the failover and make sure that the applications seamlessly connect to the new replica without any errors or timeouts.

Prerequisites

Walkthrough

Following are the steps required to configure SQL Server DR using SQL Server Failover Cluster with Amazon FSx for Windows File Server for primary site and secondary DR site. We also show how to set up a multiple-Region Always On distributed availability group.

Assumed Variables

Region_01:

WSFC Cluster Name: SQLCluster1
FCI Virtual Network Name: SQLFCI01
Local Availability Group: SQLAG01

Region_02:

WSFC Cluster Name: SQLCluster2
FCI Virtual Network Name: SQLFCI02
Local Availability Group: SQLAG02

  • Make sure to configure network connectivity between your clusters. In this solution, we are using two VPCs in two separate Regions.
    • VPC peering is configured to enable network traffic on both VPCs.
    • The domain controller (AWS Managed Microsoft AD) on both VPCs are configured with forest trust and conditional forwarding (this enables DNS resolution between the two VPCs).
  • Create a local availability group, using FCI SQL Network Name as the replica. Because we will be setting up a domain-independent distributed availability group between the two clusters, we will be setting up certificates to authenticate between the two separate clusters.
  1. Create master key and endpoint for SQLCluster1

use master
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>'
GO
CREATE CERTIFICATE [SQLAG01-Cert]
with SUBJECT = 'SQLAG01 Endpoint Cert'
GO
 
BACKUP CERTIFICATE [SQLAG01-Cert]
TO FILE = N'\\<FileShare>\SQLAG01-Cert.crt'
GO
 
CREATE ENDPOINT [SQLAG01-Endpoint]
STATE = STARTED
AS TCP
(
LISTENER_PORT = 5022
)
FOR DATABASE_MIRRORING
(
AUTHENTICATION = CERTIFICATE [SQLAG01-Cert],
ROLE = ALL,
ENCRYPTION = REQUIRED ALGORITHM AES
)
GO
  1. Create master key and endpoint for SQLCluster2

use master
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>'
GO
CREATE CERTIFICATE [SQLAG02-Cert]
with SUBJECT = 'SQLAG02 Endpoint Cert'
GO
 
BACKUP CERTIFICATE [SQLAG02-Cert]
TO FILE = N'\\<Fileshare>\SQLAG02-Cert.crt'
GO
 
CREATE ENDPOINT [SQLAG02-Endpoint]
STATE = STARTED
AS TCP
(
LISTENER_PORT = 5022
)
FOR DATABASE_MIRRORING
(
AUTHENTICATION = CERTIFICATE [SQLAG02-Cert],
ROLE = ALL,
ENCRYPTION = REQUIRED ALGORITHM AES
)
GO
    • Make sure to place all exported certificates in a location that you can easily access from each FCI instance.
    • Create a SQL Server login and user in the master database on each FCI instance.
  1. Create database login in SQLCluster1

use master
CREATE LOGIN [SQLAG02_DAG] WITH PASSWORD = '<password>'
GO
CREATE USER [SQLAG02_DAG] FOR LOGIN [SQLAG02_DAG] 
GO
CREATE CERTIFICATE [SQLAG02-Cert]
AUTHORIZATION [SQLAG02_DAG]
FROM FILE = N'\\<Fileshare>\SQLAG02-Cert.crt'
GO
  1. Create database login in SQLCluster2

use master
CREATE LOGIN [SQLAG01_DAG] WITH PASSWORD = '<password>'
GO
CREATE USER [SQLAG01_DAG] FOR LOGIN [SQLAG01_DAG] 
GO
CREATE CERTIFICATE [SQLAG01-Cert]
AUTHORIZATION [SQLAG01_DAG]
FROM FILE = N'\\<Fileshare>\SQLAG01-Cert.crt'
GO
    • Now grant the newly created user endpoint access to the local mirroring endpoint in each FCI instance.
  1. Grant permission on endpoint – SQLCluster1

GRANT CONNECT ON ENDPOINT::[SQLAG01-Endpoint] TO [SQLAG02_DAG]
GO
  1. Grant permission on endpoint – SQLCluster2

GRANT CONNECT ON ENDPOINT::[SQLAG02-Endpoint] TO [SQLAG01_DAG]
GO
  1. Create distributed Always On availability group on SQLCluster1

Next, create the distributed availability group on the primary cluster.

CREATE AVAILABILITY GROUP [SQLFCIDAG]  
   WITH (DISTRIBUTED)   
   AVAILABILITY GROUP ON  
    'SQLAG01' WITH    
        (   
            LISTENER_URL = 'tcp://SQLFCI01.DEMOSQL.COM:5022',    
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
            FAILOVER_MODE = MANUAL,   
            SEEDING_MODE = AUTOMATIC
        ),   
    'SQLAG02' WITH    
        (   
            LISTENER_URL = 'tcp://SQLFCI02.SQLDEMO.COM:5022',   
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
            FAILOVER_MODE = MANUAL,   
            SEEDING_MODE = AUTOMATIC
      );   
    • Note that we are using the SQL Network Name of the FCI cluster as our listener URL.
    • Now, join our secondary WSFC FCI cluster to the distributed availability group.
  1. Join secondary cluster on SQLCluster2 to distributed availability group

ALTER AVAILABILITY GROUP [SQLFCIDAG]   
   JOIN   
   AVAILABILITY GROUP ON  
      'SQLAG01' WITH    
      (   
         LISTENER_URL = 'tcp://SQLFCI01.DEMOSQL.COM:5022',    
         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
         FAILOVER_MODE = MANUAL,   
         SEEDING_MODE = AUTOMATIC   
      ),   
      'SQLAG02' WITH    
      (   
         LISTENER_URL = 'tcp://SQLFCI02.SQLDEMO.COM:5022',   
         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
         FAILOVER_MODE = MANUAL,   
         SEEDING_MODE = AUTOMATIC   
      );    
GO  
    • After you run the join script, you should be able to see the database from the primary FCI cluster’s local availability group populate the secondary FCI cluster.
    • To do a distributed availability group failover, it is best practice to synchronize both clusters first.
  1. Synchronize primary cluster

ALTER AVAILABILITY GROUP [SQLFCIDAG] 
 MODIFY AVAILABILITY GROUP ON 
 'SQLAG01' 
WITH 
( 
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT 
), 
'SQLAG02'
WITH
( 
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT 
);
    • You can verify synchronization lag and verify state displays as “SYNCHRONIZED”:
SELECT ag.name
       , drs.database_id
       , db_name(drs.database_id) as database_name
       , drs.group_id
       , drs.replica_id
       , drs.synchronization_state_desc
       , drs.last_hardened_lsn  
FROM sys.dm_hadr_database_replica_states drs 
INNER JOIN sys.availability_groups ag on drs.group_id = ag.group_id;
  1. Perform failover at primary cluster

    After everything is ready, perform failover by first changing the DAG role on the global primary.

ALTER AVAILABILITY GROUP [SQLFCIDAG] SET (ROLE = SECONDARY);
  1. Perform failover at secondary cluster

After which, initiate the actual failover by running this script on the secondary cluster.

ALTER AVAILABILITY GROUP [SQLFCIDAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;
  1. Change sync mode on primary and secondary clusters

    Then make sure to change Sync mode on both clusters back to Asynchronous:

 ALTER AVAILABILITY GROUP [SQLFCIDAG] 
 MODIFY AVAILABILITY GROUP ON 
'SQLAG01' 
WITH 
( 
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT 
), 
'SQLAG02'
WITH
( 
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT 
);

Conclusion

A multiple-Region strategy for your mission critical SQL Server deployments is key for business continuity and disaster recovery. This blog post focused on how to achieve that optimally by using distributed availability groups. You also learned about other benefits such as read scale outs by using distributed availability groups.

To learn more, check out Simplify your Microsoft SQL Server high availability deployments using Amazon FSx for Windows File Server.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Building Multi-Region and Multi-Account Tools with AWS Organizations

Post Syndicated from Cody Penta original https://aws.amazon.com/blogs/architecture/field-notes-building-multi-region-and-multi-account-tools-with-aws-organizations/

It’s common to start with a single AWS account when you are beginning your cloud journey with AWS. Running operations such as creating, reading, updating, and deleting resources in a single AWS account can be straightforward with AWS application program interfaces (APIs). Because an organization grows, so does their account strategy, often splitting workloads across multiple accounts. Fortunately, AWS customers can use AWS Organizations to group these accounts into logical units, also known as organizational units (OUs), to apply common policies and deploy standard infrastructure. However, this will result in an increased difficulty to run an API against all accounts, moreover, every Region that account could use. How does an organization answer these questions:

  • What is every Amazon FSx backup I own?
  • How can I do an on-demand batch job that will apply to my entire organization?
  • What is every internet access point across my organization?

This blog post shows us how we can use Organizations, AWS Single Sign-On (AWS SSO), AWS CloudFormation StackSets, and various AWS APIs to effectively build multi-account and multi-region tools that can address use cases like the ones above.

Running an AWS API sequentially across hundreds of accounts—potentially, many Regions—could take hours, depending on the API you call. An important aspect we will cover throughout this solution is the importance of concurrency for these types of tools.

Overview of solution

For this solution, we have created a fictional organization called Tavern that is set up with multiple organizational units (OUs), accounts, and Regions, to reflect a real-world scenario.

Figure 1. Organization configuration example

Figure 1. Organization configuration example

We will set up a user with multi-factor authentication (MFA) enabled so we can sign-in and access an admin user in the root account. Using this admin user, we will deploy a stack set across the organization that enables this user to assume limited permissions into each child account.

Next, we will use the Go programming language because of its native concurrency capabilities. More specifically, we will implement the pipeline concurrency pattern to build a multi-account and multi-region tool that will run APIs across our entire AWS footprint.

Additionally, we will add two common edge cases:

  • We block mass API actions to an account in a suspended OU (not pictured) and the root account.
  • We block API actions in disabled regions.

This will show us how to implement robust error handling in equally powerful tooling.

 Walkthrough

Let us separate the solution into distinct steps:

  • Create an automation user through AWS SSO.
    • This user can optionally be an IAM user or role assumed into by a third-party identity provider (such as, Azure Active Directory). Note the ARN of this identity because that is the key piece of information we will use for crafting a policy document.
  • Deploy a CloudFormation stack set across the organization that enables this user to assume limited access into each account.
    • For this blog post, we will deploy an organization-wide role with `ec2:DescribeRouteTables` permissions. Feel free to expand or change the permission set based on the type of tool you build.
  • Using Go, AWS Command Line Interface (CLI) v2, and AWS SDK for Go v2:
    1. Authenticate using AWS SSO.
    2. List every account in the organization.
    3. Assume permissions into that account.
    4.  Run an API across every Region in that account.
    5. Aggregate results for every Region.
    6. Aggregate results for every account.
    7. Report back the result.

For additional context, review this GitHub repository that contains all code and assets for this blog post.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • Multiple AWS accounts
  • AWS Organizations
  • AWS SSO (optional)
  • AWS SDK for Go v2
  • AWS CLI v2
  • Go programming knowledge (preferred), especially Go’s concurrency model
  • General programming knowledge

Create an automation user in AWS SSO

The first thing we need to do is create an identity to sign into. This can either be an AWS Identity and Access Management (IAM) user, an IAM role integrated with a third-party identity provider, or—in this case—an AWS SSO user.

  1. Log into the AWS SSO user console.
  2. Press Add user button.
  3. Fill in the appropriate information.
Figure 2.AWS SSO create user

Figure 2. AWS SSO create user

  1. Assign the user to the appropriate group. In this case, we will assign this user to AWSControlTowerAdmins.
Figure 3.Assigning SSO user to a group

Figure 3. Assigning SSO user to a group

  1. Verify the user was created. (Optionally: enable MFA).
Figure 4.Verifying User Creation and MFA

Figure 4. Verifying User Creation and MFA

Deploy a stack set across your organization

To effectively run any API across the organization, we need to deploy a common role that our AWS SSO user can assume across every account. We can use AWS CloudFormation StackSets to deploy this role at scale.

  1. Write the IAM role and associated policy document. The following is an example AWS Cloud Development Kit (AWS CDK) code for such a role. Note that orgAccount, roleName, and ssoUser in the below code will have to be replaced with your own values.
    const role = new iam.Role(this, 'TavernAutomationRole', {
      roleName: 'TavernAutomationRole',
      assumedBy: new iam.ArnPrincipal(`arn:aws:sts::${orgAccount}:assumed-role/${roleName}/${ssoUser}`),
    })
    role.addToPolicy(new PolicyStatement({
      actions: ['ec2:DescribeRouteTables'],
      resources: ['*']
    }))
  1. Log into the CloudFormation StackSets console.
  2. Press Create StackSet button.
  3. Upload the CloudFormation template containing the common role to be deployed to the organization by the preferred method.
  4. Specify name and optional description.
  5. Add any standard organization tags, and choose Service-managed permissions option.
  6. Choose Deploy to organization, and decide whether to disable or enable automatic deployment and appropriate account removal behavior. For this blog post, we choose to enable automatic deployment and accounts should remove the stack with removed from the target OU.
  7. For Specify regions, choose US East (N.Virginia). Note, because this stack contains only an IAM role, and IAM is a global service, region choice has no effect.
  8. For Maximum concurrent accounts, choose Percent, and enter 100 (this stack is not dependent on order).
  9. For Failure tolerance, choose Number, and enter 5, account deployment failures before a total rollback happens.
  10. For Region Concurrency, choose Sequential.
  11. Review your choices, note the deployment target (should be r-*), and acknowledge that CloudFormation might create IAM resources with custom names.
  12. Press the Submit button to deploy the stack.

Configure AWS SSO for the AWS CLI

To use our organization tools, we must first configure AWS SSO locally. With the AWS CLI v2, we can run:

aws configure sso

To configure credentials:

  1. Run the preceding command in your terminal.
  2. Follow the prompted steps.
    1. Specify your AWS SSO Start URL:
    2. AWS SSO Region:
  1. Authenticate through the pop-up browser window.
  2. Navigate back to the CLI, and choose the root account (this is where our principle for IAM originates).
  3. Specify the default client region.
  4. Specify the default output format.

Note the CLI profile name. Regardless if you choose to go with the autogenerated one or the custom one, we need this profile name for our upcoming code.

Start coding to utilize the AWS SSO shared profile

After AWS SSO is configured, we can start coding the beginning part of our multi-account tool. Our first step is to list every account belonging to our organization.

var (
    stsc    *sts.Client
    orgc    *organizations.Client
    ec2c    *ec2.Client
    regions []string
)

// init initializes common AWS SDK clients and pulls in all enabled regions
func init() {
    cfg, err := config.LoadDefaultConfig(context.TODO(), config.WithSharedConfigProfile("tavern-automation"))
    if err != nil {
        log.Fatal("ERROR: Unable to resolve credentials for tavern-automation: ", err)
    }

    stsc = sts.NewFromConfig(cfg)
    orgc = organizations.NewFromConfig(cfg)
    ec2c = ec2.NewFromConfig(cfg)

    // NOTE: By default, only describes regions that are enabled in the root org account, not all Regions
    resp, err := ec2c.DescribeRegions(context.TODO(), &ec2.DescribeRegionsInput{})
    if err != nil {
        log.Fatal("ERROR: Unable to describe regions", err)
    }

    for _, region := range resp.Regions {
        regions = append(regions, *region.RegionName)
    }
    fmt.Println("INFO: Listing all enabled regions:")
    fmt.Println(regions)
}

// main constructs a concurrent pipeline that pushes every account ID down
// the pipeline, where an action is concurrently run on each account and
// results are aggregated into a single json file
func main() {
    var accounts []string

    paginator := organizations.NewListAccountsPaginator(orgc, &organizations.ListAccountsInput{})
    for paginator.HasMorePages() {
        resp, err := paginator.NextPage(context.TODO())
        if err != nil {
            log.Fatal("ERROR: Unable to list accounts in this organization: ", err)
        }

        for _, account := range resp.Accounts {
            accounts = append(accounts, *account.Id)
        }
    }
    fmt.Println(accounts)

Implement concurrency into our code

With a slice of every AWS account, it’s time to concurrently run an API across all accounts. We will use some familiar Go concurrency patterns, as well as fan-out and fan-in.

// ... continued in main

    // Begin pipeline by calling gen with a list of every account
    in := gen(accounts...)

    // Fan out and create individual goroutines handling the requested action (getRoute)
    var out []<-chan models.InternetRoute
    for range accounts {
        c := getRoute(in)
        out = append(out, c)
    }

    // Fans in and collect the routing information from all go routines
    var allRoutes []models.InternetRoute
    for n := range merge(out...) {
        allRoutes = append(allRoutes, n)
    }

In the preceding code, we called a gen() function that started construction of our pipeline. Let’s take a deeper look into this function.

// gen primes the pipeline, creating a single separate goroutine
// that will sequentially put a single account id down the channel
// gen returns the channel so that we can plug it in into the next
// stage
func gen(accounts ...string) <-chan string {
    out := make(chan string)
    go func() {
        for _, account := range accounts {
            out <- account
        }
        close(out)
    }()
    return out
}

We see that gen just initializes the pipeline, and then starts pushing account ID’s down the pipeline one by one.

The next two functions are where all the heavy lifting is done. First, let’s investigate `getRoute()`.

// getRoute queries every route table in an account, including every enabled region, for a
// 0.0.0.0/0 (i.e. default route) to an internet gateway
func getRoute(in <-chan string) <-chan models.InternetRoute {
    out := make(chan models.InternetRoute)
    go func() {
        for account := range in {
            role := fmt.Sprintf("arn:aws:iam::%s:role/TavernAutomationRole", account)
            creds := stscreds.NewAssumeRoleProvider(stsc, role)

            for _, region := range regions {
                localCfg := aws.Config{
                    Region:      region,
                    Credentials: aws.NewCredentialsCache(creds),
                }

                localEc2Client := ec2.NewFromConfig(localCfg)

                paginator := ec2.NewDescribeRouteTablesPaginator(localEc2Client, &ec2.DescribeRouteTablesInput{})
                for paginator.HasMorePages() {
                    resp, err := paginator.NextPage(context.TODO())
                    if err != nil {
                        fmt.Println("WARNING: Unable to retrieve route tables from account: ", account, err)
                        out <- models.InternetRoute{Account: account}
                        close(out)
                        return
                    }

                    for _, routeTable := range resp.RouteTables {
                        for _, r := range routeTable.Routes {
                            if r.GatewayId != nil && strings.Contains(*r.GatewayId, "igw-") {
                                fmt.Println(
                                    "Account: ", account,
                                    " Region: ", region,
                                    " DestinationCIDR: ", *r.DestinationCidrBlock,
                                    " GatewayId: ", *r.GatewayId,
                                )
    
                                out <- models.InternetRoute{
                                    Account:         account,
                                    Region:          region,
                                    Vpc:             routeTable.VpcId,
                                    RouteTable:      routeTable.RouteTableId,
                                    DestinationCidr: r.DestinationCidrBlock,
                                    InternetGateway: r.GatewayId,
                                }
                            }
                        }
                    }
                }
            }

        }
        close(out)
    }()
    return out
}

A couple of key points to highlight are as follows:

for account := range in

When iterating over a channel, the current goroutine blocks, meaning we wait here until we get an account ID passed to us before continuing. We’ll keep doing this until our upstream closes the channel. In our case, our upstream closes the channel once it pushes every account ID down the channel.

role := fmt.Sprintf("arn:aws:iam::%s:role/TavernAutomationRole", account)
creds := stscreds.NewAssumeRoleProvider(stsc, role)

Here, we can reference our existing role that we deployed to every account and assume into that role with AWS Security Token Service (STS).

for _, region := range regions {

Lastly, when we have credentials into that account, we need to iterate over every region in that account to ensure we are capturing the entire global presence.

These three key areas are how we build organization-level tools. The remaining code is calling the desired API and delivering the result down to the next stage in our pipeline, where we merge all of the results.

// merge takes every go routine and "plugs" it into a common out channel
// then blocks until every input channel closes, signally that all goroutines
// are done in the previous stage
func merge(cs ...<-chan models.InternetRoute) <-chan models.InternetRoute {
    var wg sync.WaitGroup
    out := make(chan models.InternetRoute)

    output := func(c <-chan models.InternetRoute) {
        for n := range c {
            out <- n
        }
        wg.Done()
    }

    wg.Add(len(cs))
    for _, c := range cs {
        go output(c)
    }

    go func() {
        wg.Wait()
        close(out)
    }()
    return out
}

At the end of the main function, we take our in-memory data structures representing our internet entry points and marshal it into a JSON file.

    // ... continued in main

    savedRoutes, err := json.MarshalIndent(allRoutes, "", "\t")
    if err != nil {
        fmt.Println("ERROR: Unable to marshal internet routes to JSON: ", err)
    }
    ioutil.WriteFile("routes.json", savedRoutes, 0644)

With the code in place, we can run the code with `go run main.go` inside of your preferred terminal. The command will generate results like the following:

    // ... routes.json
    {
        "Account": "REDACTED",
        "Region": "eu-north-1",
        "Vpc": "vpc-1efd6c77",
        "RouteTable": "rtb-1038a979",
        "DestinationCidr": "0.0.0.0/0",
        "InternetGateway": "igw-c1b125a8"
    },
    {
        "Account": " REDACTED ",
        "Region": "eu-north-1",
        "Vpc": "vpc-de109db7",
        "RouteTable": "rtb-e042ce89",
        "DestinationCidr": "0.0.0.0/0",
        "InternetGateway": "igw-cbd457a2"
    },
    // ...

Cleaning up

To avoid incurring future charges, delete the following resources:

  • Stack set through the CloudFormation console
  • AWS SSO user (if you created one)

Conclusion

Creating organization tools that answer difficult questions such as, “show me every internet entry point in our organization,” are possible using Organizations APIs and CloudFormation StackSets. We also learned how to use Go’s native concurrency features to build these tools that scale across hundreds of accounts.

Further steps you might explore include:

  • Visiting the Github Repo to capture the full picture.
  • Taking our sequential solution for iterating over Regions and making it concurrent.
  • Exploring the possibility of accepting functions and interfaces in stages to generalize specific pipeline features.

Thanks for taking the time to read, and feel free to leave comments.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

How to use domain with Amazon SES in multiple accounts or regions

Post Syndicated from Leonardo Azize original https://aws.amazon.com/blogs/messaging-and-targeting/how-to-use-domain-with-amazon-ses-in-multiple-accounts-or-regions/

Sometimes customers want to use their email domain with Amazon Simples Email Service (Amazon SES) across multiple accounts, or the same account but across multiple regions.

For example, AnyCompany is an insurance company with marketing and operations business units. The operations department sends transactional emails every time customers perform insurance simulations. The marketing department sends email advertisements to existing and prospective customers. Since they are different organizations inside AnyCompany, they want to have their own Amazon SES billing. At the same time, they still want to use the same AnyCompany domain.

Other use-cases include customers who want to setup multi-region redundancy, need to satisfy data residency requirements, or need to send emails on behalf of several different clients. In all of these cases, customers can use different regions, in the same or across different accounts.

This post shows how to verify and configure your domain on Amazon SES across multiple accounts or multiple regions.

Overview of solution

You can use the same domain with Amazon SES across multiple accounts or regions. Your options are: different accounts but the same region, different accounts and different regions, and the same account but different regions.

In all of these scenarios, you will have two SES instances running, each sending email for example.com domain – let’s call them SES1 and SES2. Every time you configure a domain in Amazon SES it will generate a series of DNS records you will have to add on your domain authoritative DNS server, which is unique for your domain. Those records are different for each SES instance.

You will need to modify your DNS to add one TXT record, with multiple values, for domain verification. If you decide to use DomainKeys Identified Mail (DKIM), you will modify your DNS to add six CNAME records, three records from each SES instance.

When you configure a domain on Amazon SES, you can also configure a MAIL FROM domain. If you decide to do so, you will need to modify your DNS to add one TXT record for Sender Policy Framework (SPF) and one MX record for bounce and complaint notifications that email providers send you.

Furthermore, your domain can be configured to support DMAC for email spoofing detection. It will rely on SPF or DKIM configured above. Below we walk you through these steps.

  • Verify domain
    You will take TXT values from both SES1 and SES2 instances and add them in DNS, so SES can validate you own the domain
  • Complying with DMAC
    You will add a TXT value with DMAC policy that applies to your domain. This is not tied to any specific SES instance
  • Custom MAIL FROM Domain and SPF
    You will take TXT and MX records related from your MAIL FROM domain from both SES1 and SES2 instances and add them in DNS, so SES can comply with DMARC

Here is a sample matrix of the various configurations:

Two accounts, same region Two accounts, different regions One account, two regions
TXT records for domain verification*

1 record with multiple values

_amazonses.example.com = “VALUE FROM SES1”
“VALUE FROM SES2”

CNAMES for DKIM verification

6 records, 3 from each SES instance

record1-SES1._domainkey.example.com = VALUE FROM SES1
record2-SES1._domainkey.example.com = VALUE FROM SES1
record3-SES1._domainkey.example.com = VALUE FROM SES1
record1-SES2._domainkey.example.com = VALUE FROM SES2
record2-SES2._domainkey.example.com = VALUE FROM SES2
record3-SES2._domainkey.example.com = VALUE FROM SES2

TXT record for DMARC

1 record. It is not related to SES instance or region

_dmarc.example.com = DMARC VALUE

MAIL FROM MX record to define message sender for SES

1 record for entire region

mail.example.com = 10 feedback-smtp.us-east-1.amazonses.com

2 records, one for each region

mail1.example.com = 10 feedback-smtp.us-east-1.amazonses.com
mail2.example.com = 10 feedback-smtp.eu-west-1.amazonses.com

MAIL FROM TXT record for SPF

1 record for entire region

mail.example.com = “v=spf1 include:amazonses.com ~all”

2 records, one for each region

mail1.example.com = “v=spf1 include:amazonses.com ~all”
mail2.example.com = “v=spf1 include:amazonses.com ~all”

* Considering your DNS supports multiple values for a TXT record

Setup SES1 and SES2

In this blog, we call SES1 your primary or existing SES instance. We assume that you have already setup SES, but if not, you can still follow the instructions and setup both at the same time. The settings on SES2 will differ slightly, and therefore you will need to add new DNS entries to support the two-instance setup.

In this document we will use configurations from the “Verification,” “DKIM,” and “Mail FROM Domain” sections of the SES Domains screen and configure SES2 and setup DNS correctly for the two-instance configuration.

Verify domain

Amazon SES requires that you verify, in DNS, your domain, to confirm that you own it and to prevent others from using it. When you verify an entire domain, you are verifying all email addresses from that domain, so you don’t need to verify email addresses from that domain individually.

You can instruct multiple SES instances, across multiple accounts or regions to verify your domain.  The process to verify your domain requires you to add some records in your DNS provider. In this post I am assuming Amazon Route 53 is an authoritative DNS server for example.com domain.

Verifying a domain for SES purposes involves initiating the verification in SES console, and adding DNS records and values to confirm you have ownership of the domain. SES will automatically check DNS to complete the verification process. We assume you have done this step for SES1 instance, and have a _amazonses.example.com TXT record with one value already in your DNS. In this section you will add a second value, from SES2, to the TXT record. If you do not have SES1 setup in DNS, complete these steps twice, once for SES1 and again for SES2. This will prove to both SES instances that you own the domain and are entitled to send email from them.

Initiate Verification in SES Console

Just like you have done on SES1, in the second SES instance (SES2) initiate a verification process for the same domain; in our case example.com

  1. Sign in to the AWS Management Console and open the Amazon SES console.
  2. In the navigation pane, under Identity Management, choose Domains.
  3. Choose Verify a New Domain.
  4. In the Verify a New Domain dialog box, enter the domain name (i.e. example.com).
  5. If you want to set up DKIM signing for this domain, choose Generate DKIM Settings.
  6. Click on Verify This Domain.
  7. In the Verify a New Domain dialog box, you will see a Domain Verification Record Set containing a Name, a Type, and a Value. Copy Name and Value and store them for the step below, where you will add this value to DNS.
    (This information is also available by choosing the domain name after you close the dialog box.)

To complete domain verification, add a TXT record with the displayed Name and Value to your domain’s DNS server. For information about Amazon SES TXT records and general guidance about how to add a TXT record to a DNS server, see Amazon SES domain verification TXT records.

Add DNS Values for SES2

To complete domain verification for your second account, edit current _amazonses TXT record and add the Value from the SES2 to it. If you do not have an _amazonses TXT record create it, and add the Domain Verification values from both SES1 and SES2 to it. We are showing how to add record to Route 53 DNS, but the steps should be similar in any DNS management service you use.

  1. Sign in to the AWS Management Console and open the Amazon Route 53 console.
  2. In the navigation pane, choose Hosted zones.
  3. Choose the domain name you are verifying.
  4. Choose the _amazonses TXT record you created when you verified your domain for SES1.
  5. Under Record details, choose Edit record.
  6. In the Value box, go to the end of the existing attribute value, and then press Enter.
  7. Add the attribute value for the additional account or region.
  8. Choose Save.
  9. To validate, run the following command:
    dig TXT _amazonses.example.com +short
  10. You should see the two values returned:
    "4AjLMzUu4nSjrz4QVqDD8rXq8X2AHr+JhGSl4foiMmU="
    "abcde12345Sjrz4QVqDD8rXq8X2AHr+JhGSl4foiMmU="

Please note:

  1. if your DNS provider does not allow underscores in record names, you can omit _amazonses from the Name.
  2. to help you easily identify this record within your domain’s DNS settings, you can optionally prefix the Value with “amazonses:”.
  3. some DNS providers automatically append the domain name to DNS record names. To avoid duplication of the domain name, you can add a period to the end of the domain name in the DNS record. This indicates that the record name is fully qualified and the DNS provider need not append an additional domain name.
  4. if your DNS server does not support two values for a TXT record, you can have one record named _amazonses.example.com and another one called example.com.

Finally, after some time SES will complete its validation of the domain name and you should see the “pending validation” change to “verified”.

Verify DKIM

DomainKeys Identified Mail (DKIM) is a standard that allows senders to sign their email messages with a cryptographic key. Email providers then use these signatures to verify that the messages weren’t modified by a third party while in transit.

An email message that is sent using DKIM includes a DKIM-Signature header field that contains a cryptographically signed representation of the message. A provider that receives the message can use a public key, which is published in the sender’s DNS record, to decode the signature. Email providers then use this information to determine whether messages are authentic.

When you enable DKIM it generates CNAME records you need to add into your DNS. As it generates different values for each SES instance, you can use DKIM with multiple accounts and regions.

To complete the DKIM verification, copy the three (3) DKIM Names and Values from SES1 and three (3) from SES2 and add them to your DNS authoritative server as CNAME records.

You will know you are successful because, after some time SES will complete the DKIM verification and the “pending verification” will change to “verified”.

Configuring for DMARC compliance

Domain-based Message Authentication, Reporting and Conformance (DMARC) is an email authentication protocol that uses Sender Policy Framework (SPF) and/or DomainKeys Identified Mail (DKIM) to detect email spoofing. In order to comply with DMARC, you need to setup a “_dmarc” DNS record and either SPF or DKIM, or both. The DNS record for compliance with DMARC is setup once per domain, but SPF and DKIM require DNS records for each SES instance.

  1. Setup “_dmarc” record in DNS for your domain; one time per domain. See instructions here
  2. To validate it, run the following command:
    dig TXT _dmarc.example.com +short
    "v=DMARC1;p=quarantine;pct=25;rua=mailto:[email protected]"
  3. For DKIM and SPF follow the instructions below

Custom MAIL FROM Domain and SPF

Sender Policy Framework (SPF) is an email validation standard that’s designed to prevent email spoofing. Domain owners use SPF to tell email providers which servers are allowed to send email from their domains. SPF is defined in RFC 7208.

To comply with Sender Policy Framework (SPF) you will need to use a custom MAIL FROM domain. When you enable MAIL FROM domain in SES console, the service generates two records you need to configure in your DNS to document who is authorized to send messages for your domain. One record is MX and another TXT; see screenshot for mail.example.com. Save these records and enter them in your DNS authoritative server for example.com.

Configure MAIL FROM Domain for SES2

  1. Open the Amazon SES console at https://console.aws.amazon.com/ses/.
  2. In the navigation pane, under Identity Management, choose Domains.
  3. In the list of domains, choose the domain and proceed to the next step.
  4. Under MAIL FROM Domain, choose Set MAIL FROM Domain.
  5. On the Set MAIL FROM Domain window, do the following:
    • For MAIL FROM domain, enter the subdomain that you want to use as the MAIL FROM domain. In our case mail.example.com.
    • For Behavior if MX record not found, choose one of the following options:
      • Use amazonses.com as MAIL FROM – If the custom MAIL FROM domain’s MX record is not set up correctly, Amazon SES will use a subdomain of amazonses.com. The subdomain varies based on the AWS Region in which you use Amazon SES.
      • Reject message – If the custom MAIL FROM domain’s MX record is not set up correctly, Amazon SES will return a MailFromDomainNotVerified error. Emails that you attempt to send from this domain will be automatically rejected.
    • Click Set MAIL FROM Domain.

You will need to complete this step on SES1, as well as SES2. The MAIL FROM records are regional and you will need to add them both to your DNS authoritative server.

Set MAIL FROM records in DNS

From both SES1 and SES2, take the MX and TXT records provided by the MAIL FROM configuration and add them to the DNS authoritative server. If SES1 and SES2 are in the same region (us-east-1 in our example) you will publish exactly one MX record (mail.example.com in our example) into DNS, pointing to endpoint for that region. If SES1 and SES2 are in different regions, you will create two different records (mail1.example.com and mail2.example.com) into DNS, each pointing to endpoint for specific region.

Verify MX record

Example of MX record where SES1 and SES2 are in the same region

dig MX mail.example.com +short
10 feedback-smtp.us-east-1.amazonses.com.

Example of MX records where SES1 and SES2 are in different regions

dig MX mail1.example.com +short
10 feedback-smtp.us-east-1.amazonses.com.

dig MX mail2.example.com +short
10 feedback-smtp.eu-west-1.amazonses.com.

Verify if it works

On both SES instances (SES1 and SES2), check that validations are complete. In the SES Console:

  • In Verification section, Status should be “verified” (in green color)
  • In DKIM section, DKIM Verification Status should be “verified” (in green color)
  • In MAIL FROM Domain section, MAIL FROM domain status should be “verified” (in green color)

If you have it all verified on both accounts or regions, it is correctly configured and ready to use.

Conclusion

In this post, we explained how to verify and use the same domain for Amazon SES in multiple account and regions and maintaining the DMARC, DKIM and SPF compliance and security features related to email exchange.

While each customer has different necessities, Amazon SES is flexible to allow customers decide, organize, and be in control about how they want to uses Amazon SES to send email.

Author bio

Leonardo Azize Martins is a Cloud Infrastructure Architect at Professional Services for Public Sector.

His background is on development and infrastructure for web applications, working on large enterprises.

When not working, Leonardo enjoys time with family, read technical content, watch movies and series, and play with his daughter.

Contributor

Daniel Tet is a senior solutions architect at AWS specializing in Low-Code and No-Code solutions. For over twenty years, he has worked on projects for Franklin Templeton, Blackrock, Stanford Children’s Hospital, Napster, and Twitter. He has a Bachelor of Science in Computer Science and an MBA. He is passionate about making technology easy for common people; he enjoys camping and adventures in nature.

 

Align with best practices while creating infrastructure using CDK Aspects

Post Syndicated from Om Prakash Jha original https://aws.amazon.com/blogs/devops/align-with-best-practices-while-creating-infrastructure-using-cdk-aspects/

Organizations implement compliance rules for cloud infrastructure to ensure that they run the applications according to their best practices. They utilize AWS Config to determine overall compliance against the configurations specified in their internal guidelines. This is determined after the creation of cloud resources in their AWS account. This post will demonstrate how to use AWS CDK Aspects to check and align with best practices before the creation of cloud resources in your AWS account.

The AWS Cloud Development Kit (CDK) is an open-source software development framework that lets you define your cloud application resources using familiar programming languages, such as TypeScript, Python, Java, and .NET. The expressive power of programming languages to define infrastructure accelerates the development process and improves the developer experience.

AWS Config is a service that enables you to assess, audit, and evaluate your AWS resource configurations. Config continuously monitors and records your AWS resource configurations, as well as lets you automate the evaluation of recorded configurations against desired configurations. React to non-compliant resources and change their state either automatically or manually.

AWS Config helps customers run their workloads on AWS in a compliant manner. Some customers want to detect it up front, and then only provision compliant resources. Some configurations are important for the customers, so they might not provision resources without having them compliant from the beginning. The following are examples of such configurations:

  • Amazon S3 bucket must not be created with public access
  • Amazon S3 bucket encryption must be enabled
  • Database deletion protection must be enabled

CDK Aspects

CDK Aspects are a way to apply an operation to every construct in a given scope. The aspect could verify something about the state of the constructs, such as ensuring that all buckets are encrypted, or it could modify the constructs, such as by adding tags.

An aspect is a class that implements the IAspect interface shown below. Aspects employ visitor patter, which allows them to add a new operation to existing object structures without modifying the structures. In object-oriented programming and software engineering, the visitor design pattern is a method for separating an algorithm from an object structure on which it operates.

interface IAspect {
   visit(node: IConstruct): void;
}

An AWS CDK app goes through the following lifecycle phases when you call cdk deploy. These phases are also shown in the diagram below. Learn more about the CDK application lifecycle at this page.

  1. Construction
  2. Preparation
  3. Validation
  4. Synthesis
  5. Deployment

Understanding the CDK Deploy

CDK Aspects become relevant during the Prepare phase, where it makes the final modifications round in the constructs to setup their final state. This Prepare phase happens automatically. All constructs have their internal list of Aspects which are called and applied during the Prepare phase. Add your custom aspects in a scope by calling the following method:

Aspects.of(myConstruct).add(new SomeAspect(...));

When you call the method above, constructs add the custom aspects to the list of internal aspects. When CDK application goes through the Prepare phase, then AWS CDK calls the visit method of the object for the constructs and all of its children in top-down order. The visit method is free to change anything in the construct.

How to align with or check configuration compliance using CDK Aspects

In the following sections, you will see how to implement CDK Aspects for some common use cases when provisioning the cloud resources. CDK Aspects are extensible, and you can extend it for any suitable use cases in order to implement additional rules. Apply CDK Aspects not only to verify against the best practices, but also to update the state before resource creation.

The code below creates the cloud resources to be verified against the best practices or to be updated using Aspects in the following sections.

export class AwsCdkAspectsStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    //Create a VPC with 3 availability zones
    const vpc = new ec2.Vpc(this, 'MyVpc', {
      maxAzs: 3,
    });

    //Create a security group
    const sg = new ec2.SecurityGroup(this, 'mySG', {
      vpc: vpc,
      allowAllOutbound: true
    })

    //Add ingress rule for SSH from the public internet
    sg.addIngressRule(ec2.Peer.anyIpv4(), ec2.Port.tcp(22), 'SSH access from anywhere')

    //Launch an EC2 instance in private subnet
    const instance = new ec2.Instance(this, 'MyInstance', {
      vpc: vpc,
      machineImage: ec2.MachineImage.latestAmazonLinux(),
      instanceType: new ec2.InstanceType('t3.small'),
      vpcSubnets: {subnetType: ec2.SubnetType.PRIVATE},
      securityGroup: sg
    })

    //Launch MySQL rds database instance in private subnet
    const database = new rds.DatabaseInstance(this, 'MyDatabase', {
      engine: rds.DatabaseInstanceEngine.mysql({
        version: rds.MysqlEngineVersion.VER_5_7
      }),
      vpc: vpc,
      vpcSubnets: {subnetType: ec2.SubnetType.PRIVATE},
      deletionProtection: false
    })

    //Create an s3 bucket
    const bucket = new s3.Bucket(this, 'MyBucket')
  }
}

In the first section, you will see the use cases and code where Aspects are used to verify the resources against the best practices.

  1. VPC CIDR range must start with specific CIDR IP
  2. Security Group must not have public ingress rule
  3. EC2 instance must use approved AMI
//Verify VPC CIDR range
export class VPCCIDRAspect implements IAspect {
    public visit(node: IConstruct): void {
        if (node instanceof ec2.CfnVPC) {
            if (!node.cidrBlock.startsWith('192.168.')) {
                Annotations.of(node).addError('VPC does not use standard CIDR range starting with "192.168."');
            }
        }
    }
}

//Verify public ingress rule of security group
export class SecurityGroupNoPublicIngressAspect implements IAspect {
    public visit(node: IConstruct) {
        if (node instanceof ec2.CfnSecurityGroup) {
            checkRules(Stack.of(node).resolve(node.securityGroupIngress));
        }

        function checkRules (rules :Array<IngressProperty>) {
            if(rules) {
                for (const rule of rules.values()) {
                    if (!Tokenization.isResolvable(rule) && (rule.cidrIp == '0.0.0.0/0' || rule.cidrIp == '::/0')) {
                        Annotations.of(node).addError('Security Group allows ingress from public internet.');
                    }
                }
            }
        }
    }
}

//Verify AMI of EC2 instance
export class EC2ApprovedAMIAspect implements IAspect {
    public visit(node: IConstruct) {
        if (node instanceof ec2.CfnInstance) {
            if (node.imageId != 'approved-image-id') {
                Annotations.of(node).addError('EC2 Instance is not using approved AMI.');
            }
        }
    }
}

In the second section, you will see the use cases and code where Aspects are used to update the resources in order to make them compliant before creation.

  1. S3 bucket encryption must be enabled. If not, then enable
  2. S3 bucket versioning must be enabled. If not, then enable
  3. RDS instance must have deletion protection enabled. If not, then enable
//Enable versioning of bucket if not enabled
export class BucketVersioningAspect implements IAspect {
    public visit(node: IConstruct): void {
        if (node instanceof s3.CfnBucket) {
            if (!node.versioningConfiguration
                || (!Tokenization.isResolvable(node.versioningConfiguration)
                    && node.versioningConfiguration.status !== 'Enabled')) {
                Annotations.of(node).addInfo('Enabling bucket versioning configuration.');
                node.addPropertyOverride('VersioningConfiguration', {'Status': 'Enabled'})
            }
        }
    }
}

//Enable server side encryption for the bucket if no encryption is enabled
export class BucketEncryptionAspect implements IAspect {
    public visit(node: IConstruct): void {
        if (node instanceof s3.CfnBucket) {
            if (!node.bucketEncryption) {
                Annotations.of(node).addInfo('Enabling default S3 server side encryption.');
                node.addPropertyOverride('BucketEncryption', {
                        "ServerSideEncryptionConfiguration": [
                            {
                                "ServerSideEncryptionByDefault": {
                                    "SSEAlgorithm": "AES256"
                                }
                            }
                        ]
                    }
                )
            }
        }
    }
}

//Enable deletion protection of DB instance if not already enabled
export class RDSDeletionProtectionAspect implements IAspect {
    public visit(node: IConstruct) {
        if (node instanceof rds.CfnDBInstance) {
            if (! node.deletionProtection) {
                Annotations.of(node).addInfo('Enabling deletion protection of DB instance.');
                node.addPropertyOverride('DeletionProtection', true);
            }
        }
    }
}

Once you create the aspects, add them in a particular scope. That scope can be App, Stack, or Construct. In the example below, all aspects are added in the scope of Stack.

const app = new cdk.App();

const stack = new AwsCdkAspectsStack(app, 'MyApplicationStack');

Aspects.of(stack).add(new VPCCIDRAspect());
Aspects.of(stack).add(new SecurityGroupNoPublicIngressAspect());
Aspects.of(stack).add(new EC2ApprovedAMIAspect());
Aspects.of(stack).add(new RDSDeletionProtectionAspect());
Aspects.of(stack).add(new BucketEncryptionAspect());
Aspects.of(stack).add(new BucketVersioningAspect());

app.synth();

Once you call cdk deploy for the above code with aspects added, you will see the output below. The deployment will not continue until you resolve the errors and modifications conducted to make some of the resources compliant.

Screenshot displaying CDK errors.

Also utilize Aspects to make general modifications to the resources regardless of any compliance checks. For example, use it apply mandatory tags to every taggable resource. Tags is an example of implementing CDK Aspects in order to achieve this functionality. Utilizing the code below, add or remove a tag from all taggable resources and their children in the scope of a Construct.

Tags.of(myConstruct).add('key', 'value');
Tags.of(myConstruct).remove('key');

Below is an example of adding the Department tag to every resource created in the scope of Stack.

Tags.of(stack).add('Department', 'Finance');

CDK Aspects are ways for developers to align with and check best practices in their infrastructure configurations using the programming language of choice. AWS CloudFormation Guard (cfn-guard) provides compliance administrators with a simple, policy-as-code language to author policies and apply them to enforce best practices. Aspects are applied before generation of the CloudFormation template in Prepare phase, but cfn-guard is applied after generation of the CloudFormation template and before the Deploy phase. Developers can use Aspects or cfn-guard or both as part of a CI/CD pipeline to stop deployment of non-compliant resources, but CloudFormation Guard is the way to go when you want to enforce compliances and prevent deployment of non-compliant resources.

Conclusion

If you are utilizing AWS CDK to provision your infrastructure, then you can start using Aspects to align with best practices before resources are created. If you are utilizing CloudFormation template to manage your infrastructure, then you can read this blog to learn how to migrate the CloudFormation template to AWS CDK. After the migration, utilize CDK Aspects not only to evaluate compliance of your resources against the best practices, but also modify their state to make them compliant before they are created.

About the Authors

Om Prakash Jha

Om Prakash Jha is a Solutions Architect at AWS. He helps customers build well-architected applications on AWS in the retail industry vertical. He has more than a decade of experience in developing, designing, and architecting mission critical applications. His passion is DevOps and application modernization. Outside of his work, he likes to read books, watch movies, and explore part of the world with his family.

Target cross-platform Go builds with AWS CodeBuild Batch builds

Post Syndicated from Russell Sayers original https://aws.amazon.com/blogs/devops/target-cross-platform-go-builds-with-aws-codebuild-batch-builds/

Many different operating systems and architectures could end up as the destination for our applications. By using a AWS CodeBuild batch build, we can run builds for a Go application targeted at multiple platforms concurrently.

Cross-compiling Go binaries for different platforms is as simple as setting two environment variables $GOOS and $GOARCH, regardless of the build’s host platform. For this post we will build all of the binaries on Linux/x86 containers. You can run the command go tool dist list to see the Go list of supported platforms. We will build binaries for six platforms: Windows+ARM, Windows+AMD64, Linux+ARM64, Linux+AMD64, MacOS+ARM64, and Mac+AMD64. Note that AMD64 is a 64-bit architecture based on the Intel x86 instruction set utilized on both AMD and Intel hardware.

This post demonstrates how to create a single AWS CodeBuild project by using a batch build and a single build spec to create concurrent builds for the six targeted platforms. Learn more about batch builds in AWS CodeBuild in the documentation: Batch builds in AWS CodeBuild

Solution Overview

AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces ready-to-deploy software packages. A batch build is utilized to run concurrent and coordinated builds. Let’s summarize the 3 different batch builds:

  • Build graph: defines dependencies between builds. CodeBuild utilizes the dependencies graph to run builds in an order that satisfies the dependencies.
  • Build list: utilizes a list of settings for concurrently run builds.
  • Build matrix: utilizes a matrix of settings to create a build for every combination.

The requirements for this project are simple – run multiple builds with a platform pair of $GOOS and $GOARCH environment variables. For this, a build list can be utilized. The buildspec for the project contains a batch/build-list setting containing every environment variable for the six builds.

batch:
  build-list:
    - identifier: build1
      env:
        variables:
          GOOS: darwin
          GOARCH: amd64
    - identifier: build2
      env:
        variables:
          GOOS: darwin
          GOARCH: arm64
    - ...

The batch build project will launch seven builds. See the build sequence in the diagram below.

  • Step 1 – A build downloads the source.
  • Step 2 – Six concurrent builds configured with six sets of environment variables from the batch/build-list setting.
  • Step 3 – Concurrent builds package a zip file and deliver to the artifacts Amazon Simple Storage Service (Amazon S3) bucket.

build sequence

The supplied buildspec file includes commands for the install and build phases. The install phase utilizes the phases/install/runtime-versions phase to set the version of Go is used in the build container.

The build phase contains commands to replace source code placeholders with environment variables set by CodeBuild. The entire list of environment variables is documented at Environment variables in build environments. This is followed by a simple go build to build the binaries. The application getting built is an AWS SDK for Go sample that will list the contents of an S3 bucket.

  build:
    commands:
      - mv listObjects.go listObjects.go.tmp
      - cat listObjects.go.tmp | envsubst | tee listObjects.go
      - go build listObjects.go

The artifacts sequence specifies the build outputs that we want packaged and delivered to the artifacts S3 bucket. The name setting creates a name for ZIP artifact. And the name combines the operating system, architecture environment variables, as well as the git commit hash. We use Shell command language to expand the environment variables, as well as command substitution to take the first seven characters of the git hash.

artifacts:
  files:
    - 'listObjects'
    - 'listObjects.exe'
    - 'README.md'
  name: listObjects_${GOOS}_${GOARCH}_$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7).zip 

Let’s walk through the CodeBuild project setup process. When the builds complete, we’ll see zip files containing the builds for each platform in an artifacts bucket.

Here is what we will create:

  • Create an S3 bucket to host the built artifacts.
  • Create the CodeBuild project, which needs to know:
    • Where the source is
    • The environment – a docker image and a service role
    • The location of the build spec
    • Batch configuration, a service role used to launch batch build groups
    • The artifact S3 bucket location

The code that is built is available on github here: https://github.com/aws-samples/cross-platform-go-builds-with-aws-codebuild. For an alternative to the manual walkthrough steps, a CloudFormation template that will build all of the resources is available in the git repository.

Prerequisites

For this walkthrough, you must have the following prerequisites:

Create the Artifacts S3 Bucket

An S3 bucket will be the destination of the build artifacts.

  • In the Amazon S3 console, create a bucket with a unique name. This will be the destination of your build artifacts.
    creating an s3 bucket

Create the AWS CodeBuild Project

  • In the AWS CodeBuild console, create a project named multi-arch-build.
    codebuild project name
  • For Source Provider, choose GitHub. Choose the Connect to GitHub button and follow the authorization screens. For repository, enter https://github.com/aws-samples/cross-platform-go-builds-with-aws-codebuild
    coldbuild select source
  • For Environment image, choose Managed Image. Choose the most recent Ubuntu image. This image contains every tool needed for the build. If you are interested in the contents of the image, then you can see the Dockerfile used to build the image in the GitHub repository here: https://github.com/aws/aws-codebuild-docker-images/tree/master/ubuntu
    Select environment
  • For Service Role, keep the suggested role name.
    Service role
  • For Build specifications, leave Buildspec name empty. This will use the default location buildspec.yml in the source root.
  • Under Batch configuration, enable Define batch configuration. For the Role Name, enter a name for the role: batch-multi-arch-build-service-role. There is an option here to combine artifacts. CodeBuild can combine the artifacts from batch builds into a single location. This isn’t needed for this build, as we want a zip to be created for each platform.
    Batch configuration
  • Under Artifacts, for Type, choose Amazon S3. For Bucket name, select the S3 bucket created earlier. This is where we want the build artifacts delivered. Choose the checkbox for Enable semantic versioning. This will tell CodeBuild to use the artifact name that was specified in the buildspec file.
    Artifacts
  • For Artifacts Packaging, choose Zip for CodeBuild to create a compressed zip from the build artifacts.
    Artifacts packaging
  • Create the build project, and start a build. You will see the DOWNLOAD_SOURCE build complete, followed by the six concurrent builds for each combination of OS and architecture.
    Builds in batch

Run the Artifacts

The builds have completed, and each packaged artifact has been delivered to the S3 bucket. Remember the name of the ZIP archive that was built using the buildspec file setting. This incorporated a combination of the operating system, architecture, and git commit hash.

Run the artifacts

Below, I have tested the artifact by downloading the zip for the operating system and architecture combination on three different platforms: MacOS/AMD64, an AWS Graviton2 instance, and a Microsoft Windows instance. Note the system information, unzipping the platform artifact, and the build specific information substituted into the Go source code.

Window1 Window2 Window3

Cleaning up

To avoid incurring future charges, delete the resources:

  • On the Amazon S3 console, choose the artifacts bucket created, and choose Empty. Confirm the deletion by typing ‘permanently delete’. Choose Empty.
  • Choose the artifacts bucket created, and Delete.
  • On the IAM console, choose Roles.
  • Search for batch-multi-arch-build-service-role and Delete. Search for codebuild-multi-arch-build-service-role and Delete.
  • Go to the CodeBuild console. From Build projects, choose multi-arch-build, and choose Delete build project.

Conclusion

This post utilized CodeBuild batch builds to build and package binaries for multiple platforms concurrently. The build phase used a small amount of scripting to replace placeholders in the code with build information CodeBuild makes available in environment variables. By overriding the artifact name using the buildspec setting, we created zip files built from information about the build. The zip artifacts were downloaded and tested on three platforms: Intel MacOS, a Graviton ARM based EC2 instance, and Microsoft Windows.

Features like this let you build on CodeBuild, a fully managed build service – and not have to worry about maintaining your own build servers.

Field Notes: How to Build an AWS Glue Workflow using the AWS Cloud Development Kit

Post Syndicated from Michael Hamilton original https://aws.amazon.com/blogs/architecture/field-notes-how-to-build-an-aws-glue-workflow-using-the-aws-cloud-development-kit/

Many customers use AWS Glue workflows to build and orchestrate their ETL (extract-transform-load) pipelines directly in the AWS Glue console using the visual tool to author workflows. This can be time consuming, harder to version control, and error prone due to manual configurations, when compared to managing your workflows as code. To improve your operational excellence, consider deploying the entire AWS Glue ETL pipeline using the AWS Cloud Development Kit (AWS CDK).

In this blog post, you will learn how to build an AWS Glue workflow using Amazon Simple Storage Service (Amazon S3), various components of AWS Glue, AWS Secrets Manager, Amazon Redshift, and the AWS CDK.

Architecture overview

In this architecture, you will use the AWS CDK to deploy your data sources, ETL scripts, AWS Glue workflow components, and an Amazon Redshift cluster for analyzing the transformed data.

AWS Glue workflow architecture

Figure 1. AWS Glue workflow architecture

It is common for customers to pre-aggregate data before sending it downstream to analytical engines, like Amazon Redshift, because table joins and aggregations are computationally expensive. The AWS Glue workflow will join COVID-19 case data, and COVID-19 hiring data together on their date columns in order to run correlation analysis on the final dataset. The datasets may seem arbitrary, but we wanted to offer a way to better understand the impacts COVID-19 had on jobs in the United States. The takeaway here is to use this as a blueprint for automating the deployment of data analytic pipelines for the data of interest to your business.

After the AWS CDK application is deployed, it will begin creating all of the resources required to build the complete workflow. When it completes, the components in the architecture will be created, and the AWS Glue workflow will be ready to start. In this blog post, you start workflows manually, but they can be configured to start on a scheduled time or from a workflow trigger.

The workflow is programmed to dynamically pull the raw data from the Registry of Open Data on AWS where you can find the Covid-19 case data and the Hiring Data respectively.

Prerequisites

This blog post uses an AWS CDK stack written in TypeScript and AWS Glue jobs written in Python. Follow the instructions in the AWS CDK Getting Started guide to set up your environment, before you proceed to deployment.

In addition to setting up your environment, you need to clone the Git repository, which contains the AWS CDK scripts and Python ETL scripts used by AWS Glue. The ETL scripts will be deployed to Amazon S3 by the AWS CDK stack as assets, and referenced by the AWS Glue jobs as part of the AWS Glue Workflow.

You should have the following prerequisites:

Deployment

After you have cloned the repository, navigate to the glue-cdk-blog/lib folder and open the blog-glue-workflow-stack.ts file. This is the AWS CDK script used to deploy all necessary resources to build your AWS Glue workflow. The blog-redshift-vpc-stack.ts contains the necessary resources to deploy the Amazon Redshift cluster, connections, and permissions. The glue-cdk-blog/lib/assets folder also contains the AWS Glue job scripts. These files are uploaded to Amazon S3 by the AWS CDK when you bootstrap.

You won’t review the individual lines of code in the script in this blog post, but if you are unfamiliar with any of the AWS CDK level 1 or level 2 constructs used in the sample, you can review what each construct does with the AWS CDK documentation. Familiarize yourself with the script you cloned and anticipate what resources will be deployed. Then, deploy both stacks and verify your initial findings.

After your environment is configured, and the packages and modules installed, deploy the AWS CDK stack and assets in two commands.

  1. Bootstrap the AWS CDK stack to create an S3 bucket in the predefined account that will contain the assets.

cdk bootstrap

  1. Deploy the AWS CDK stacks.

cdk deploy --all

Verify that both of these commands have completed successfully, and remediate any failures returned. Upon successful completion, you’re ready to start the AWS Glue workflow that was just created. You can find the AWS CDK commands reference in the AWS CDK Toolkit commands documentation, and help with Troubleshooting common AWS CDK issues you may encounter.

Walkthrough

Prior to initiating the AWS Glue workflow, explore the resources the AWS CDK stacks just deployed to your account.

  1. Log in to the AWS Management Console and the AWS CDK account.
  2. Navigate to Amazon S3 in the AWS console (you should see an S3 bucket with the name prefix of cdktoolkit-stagingbucket-xxxxxxxxxxxx).
  3. Review the objects stored in the bucket in the assets folder. These are the .py files used by your AWS Glue jobs. They were uploaded to the bucket when you issued the AWS CDK bootstrap command, and referenced within the AWS CDK script as the scripts to use for the AWS Glue jobs. When retrieving data from multiple sources, you cannot always control the naming convention of the sourced files. To solve this and create better standardization, you will use a job within the AWS Glue workflow to copy these scripts to another folder and rename them with a more meaningful name.
  4. Navigate to Amazon Redshift in the AWS console and verify your new cluster. You can use the Amazon Redshift Query Editor within the console to connect to the cluster and see that you have an empty database called db-covid-hiring. The Amazon Redshift cluster and networking resources were created by the redshift_vpc_stack which are listed here:
    • VPC, subnet and security group for Amazon Redshift
    • Secrets Manager secret
    • AWS Glue connection and S3 endpoint
    • Amazon Redshift cluster
  1. Navigate to AWS Glue in the AWS console and review the following new resources created by the workflow_stack CDK stack:
    • Two crawlers to crawl the data in S3
    • Three AWS Glue jobs used within the AWS Glue workflow
    • Five triggers to initiate AWS Glue jobs and crawlers
    • One AWS Glue workflow to manage the ETL orchestration
  1. All of these resources could have been deployed within a single stack, but this is intended to be a simple example on how to share resources across multiple stacks. The AWS Identity and Access Management (IAM) role that AWS Glue uses to run the ETL jobs in the workflow_stack, is also used by Secrets Manager for Amazon Redshift in the redshift_vpc_stack. Inspect the /bin/blog-glue-workflow-stack.ts file to further understand cross stack resource sharing.

By performing these steps, you have deployed all of the AWS Glue resources necessary to perform common ETL tasks. You then combined the resources to create an orchestration of tasks using an AWS Glue workflow. All of this was done using IaC with AWS CDK. Your workflow should look like Figure 2.

AWS Glue console showing the workflow created by the CDK

Figure 2. AWS Glue console showing the workflow created by the CDK

As mentioned earlier, you could have started your workflow using a scheduled cron trigger, but you initiated the workflow manually so you had time to review the resources the workflow_stack CDK deployed, prior to initiation of the workflow. Now that you have reviewed the resources, validate your workflow by initiating it and verifying it runs successfully.

  1. From within the AWS Glue console, select Workflows under ETL.
  2. Select the workflow named glue-workflow, and then select Run from the actions listbox.
  3. You can verify the status of the workflow by viewing the run details under the History tab.
  4. Your job will take approximately 15 minutes to successfully complete, and your history should look like Figure 3.
AWS Glue console showing the workflow as completed after the run

Figure 3. AWS Glue console showing the workflow as completed after the run

The workflow performs the following tasks:

  1. Prepares the ETL scripts by copying the files in the S3 asset bucket to a new folder and renames them with a more relevant name.
  2. Initiates a crawler to crawl the raw source data as csv files and adds tables to the Glue Data Catalog.
  3. Runs a Python script to perform some ETL tasks on the .csv files and converts them to parquet files.
  4. Crawls the parquet files and adds them to the Glue Data Catalog.
  5. Loads the parquet files into a DynamicFrame and runs an Amazon Redshift COPY command to load the data into the Amazon Redshift database.

After the workflow completes, you can query and perform analytics on the data that was populated in Amazon Redshift. Open the Amazon Redshift Query Editor and run a simple SELECT statement to query the covid_hiring_table which is the joined Covid-19 case data and hiring data (see Figure 4).

Amazon Redshift query editor showing the data that the workflow loaded into the Redshift tables

Figure 4. Amazon Redshift query editor showing the data that the workflow loaded into the Redshift tables

Cleaning up

Some resources, like S3 buckets and Amazon DynamoDB tables, must be manually emptied and deleted through the console to be fully removed. To clean up the deployment, delete all objects in the AWS CDK asset bucket in Amazon S3 by using the AWS console to empty the bucket, and then run cdk destroy –all to delete the resources the AWS CDK stacks created in your account. Finally, if you don’t plan on using AWS CloudFormation assets in this account in the future, you will need to delete the AWS CDK asset stack within the CloudFormation console to remove the AWS CDK asset bucket.

Conclusion

In this blog post, you learned how to automate the deployment of AWS Glue workflows using the AWS CDK. This further enhances your continuous integration and delivery (CI/CD) data pipelines by automating the deployment of the ETL jobs and AWS Glue workflow orchestration, providing an efficient, fast, and repeatable way to build and deploy AWS Glue workflows at scale.

Although AWS CDK primarily supports level 1 constructs for most AWS Glue resources, new constructs are added continually. See the AWS CDK API Reference for updates, prior to authoring your stacks, for AWS Glue level 2 construct support. You can find the code used in this blog post in this GitHub repository, and the AWS CDK in TypeScript reference to the AWS CDK namespace.

We hope this blog post helps enrich your work through the skills gained of automating the creation of Glue Workflows, enabling you to quickly build and deploy your own ETL pipelines and run analytical models that power your business.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Simulated location data with Amazon Location Service

Post Syndicated from Aaron Sempf original https://aws.amazon.com/blogs/devops/simulated-location-data-with-amazon-location-service/

Modern location-based applications require the processing and storage of real-world assets in real-time. The recent release of Amazon Location Service and its Tracker feature makes it possible to quickly and easily build these applications on the AWS platform. Tracking real-world assets is important, but at some point when working with Location Services you will need to demo or test location-based applications without real-world assets.

Applications that track real-world assets are difficult to test and demo in a realistic setting, and it can be hard to get your hands on large amounts of real-world location data. Furthermore, not every company or individual in the early stages of developing a tracking application has access to a large fleet of test vehicles from which to derive this data.

Location data can also be considered highly sensitive, because it can be easily de-anonymized to identify individuals and movement patterns. Therefore, only a few openly accessible datasets exist and are unlikely to exhibit the characteristics required for your particular use-case.

To overcome this problem, the location-based services community has developed multiple openly available location data simulators. This blog will demonstrate how to connect one of those simulators to Amazon Location Service Tracker to test and demo your location-based services on AWS.

Walk-through

Part 1: Create a tracker in Amazon Location Service

This walkthrough will demonstrate how to get started setting up simulated data into your tracker.

Amazon location Service console
Step 1: Navigate to Amazon Location Service in the AWS Console and select “Trackers“.

Step 2: On the “Trackers” screen click the orange “Create tracker“ button.

Select Create Tracker

Create a Tracker form

Step 3: On the “Create tracker” screen, name your tracker and make sure to reply “Yes” to the question asking you if you will only use simulated or sample data. This allows you to use the free-tier of the service.

Next, click “Create tracker” to create you tracker.

Confirm create tracker

Done. You’ve created a tracker. Note the “Name” of your tracker.

Generate trips with the SharedStreets Trip-simulator

A common option for simulating trip data is the shared-steets/trip-simulator project.

SharedStreets maintains an open-source project on GitHub – it is a probabilistic, multi-agent GPS trajectory simulator. It even creates realistic noise, and thus can be used for testing algorithms that must work under real-world conditions. Of course, the generated data is fake, so privacy is not a concern.

The trip-simulator generates files with a single GPS measurement per line. To playback those files to the Amazon Location Service Tracker, you must use a tool to parse the file; extract the GPS measurements, time measurements, and device IDs of the simulated vehicles; and send them to the tracker at the right time.

Before you start working with the playback program, the trip-simulator requires a map to simulate realistic trips. Therefore, you must download a part of OpenStreetMap (OSM). Using GeoFabrik you can download extracts at the size of states or selected cities based on the area within which you want to simulate your data.

This blog will demonstrate how to simulate a small fleet of cars in the greater Munich area. The example will be written for OS-X, but it generalizes to Linux operating systems. If you have a Windows operating system, I recommend using Windows Subsystem for Linux (WSL). Alternatively, you can run this from a Cloud9 IDE in your AWS account.

Step 1: Download the Oberbayern region from download.geofabrik.de

Prerequisites:

curl https://download.geofabrik.de/europe/germany/bayern/oberbayern-latest.osm.pbf -o oberbayern-latest.osm.pbf

Step 2: Install osmium-tool

Prerequisites:

brew install osmium-tool

Step 3: Extract Munich from the Oberbayern map

osmium extract -b "11.5137,48.1830,11.6489,48.0891" oberbayern-latest.osm.pbf -o ./munich.osm.pbf -s "complete_ways" --overwrite

Step 4: Pre-process the OSM map for the vehicle routing

Prerequisites:

docker run -t -v $(pwd):/data osrm/osrm-backend:v5.25.0 osrm-extract -p /opt/car.lua /data/munich.osm.pbf
docker run -t -v $(pwd):/data osrm/osrm-backend:v5.25.0 osrm-contract /data/munich.osrm

Step 5: Install the trip-simulator

Prerequisites:

npm install -g trip-simulator

Step 6: Run a 10 car, 30 minute car simulation

trip-simulator \
  --config car \
  --pbf munich.osm.pbf \
  --graph munich.osrm \
  --agents 10 \
  --start 1563122921000 \
  --seconds 1800 \
  --traces ./traces.json \
  --probes ./probes.json \
  --changes ./changes.json \
  --trips ./trips.json

The probes.json file is the file containing the GPS probes we will playback to Amazon Location Service.

Part 2: Playback trips to Amazon Location Service

Now that you have simulated trips in the probes.json file, you can play them back in the tracker created earlier. For this, you must write only a few lines of Python code. The following steps have been neatly separated into a series of functions that yield an iterator.

Prerequisites:

Step 1: Load the probes.json file and yield each line

import json
import time
import datetime
import boto3

def iter_probes_file(probes_file_name="probes.json"):
    """Iterates a file line by line and yields each individual line."""
    with open(probes_file_name) as probes_file:
        while True:
            line = probes_file.readline()
            if not line:
                break
            yield line

Step 2: Parse the probe on each line
To process the probes, you parse the JSON on each line and extract the data relevant for the playback. Note that the coordinates order is longitude, latitude in the probes.json file. This is the same order that the Location Service expects.

def parse_probes_trip_simulator(probes_iter):
    """Parses a file witch contains JSON document, one per line.
    Each line contains exactly one GPS probe. Example:
    {"properties":{"id":"RQQ-7869","time":1563123002000,"status":"idling"},"geometry":{"type":"Point","coordinates":[-86.73903753135207,36.20418779626351]}}
    The function returns the tuple (id,time,status,coordinates=(lon,lat))
    """
    for line in probes_iter:
        probe = json.loads(line)
        props = probe["properties"]
        geometry = probe["geometry"]
        yield props["id"], props["time"], props["status"], geometry["coordinates"]

Step 3: Update probe record time

The probes represent historical data. Therefore, when you playback you will need to normalize the probes recorded time to sync with the time you send the request in order to achieve the effect of vehicles moving in real-time.

This example is a single threaded playback. If the simulated playback lags behind the probe data timing, then you will be provided a warning through the code detecting the lag and outputting a warning.

The SharedStreets trip-simulator generates one probe per second. This frequency is too high for most applications, and in real-world applications you will often see frequencies of 15 to 60 seconds or even less. You must decide if you want to add another iterator for sub-sampling the data.

def update_probe_record_time(probes_iter):
    """
    Modify all timestamps to be relative to the time this function was called.
    I.e. all timestamps will be equally spaced from each other but in the future.
    """
    new_simulation_start_time_utc_ms = datetime.datetime.now().timestamp() * 1000
    simulation_start_time_ms = None
    time_delta_recording_ms = None
    for i, (_id, time_ms, status, coordinates) in enumerate(probes_iter):
        if time_delta_recording_ms is None:
            time_delta_recording_ms = new_simulation_start_time_utc_ms - time_ms
            simulation_start_time_ms = time_ms
        simulation_lag_sec = (
            (
                datetime.datetime.now().timestamp() * 1000
                - new_simulation_start_time_utc_ms
            )
            - (simulation_start_time_ms - time_ms)
        ) / 1000
        if simulation_lag_sec > 2.0 and i % 10 == 0:
            print(f"Playback lags behind by {simulation_lag_sec} seconds.")
        time_ms += time_delta_recording_ms
        yield _id, time_ms, status, coordinates

Step 4: Playback probes
In this step, pack the probes into small batches and introduce the timing element into the simulation playback. The reason for placing them in batches is explained below in step 6.

def sleep(time_elapsed_in_batch_sec, last_sleep_time_sec):
    sleep_time = max(
        0.0,
        time_elapsed_in_batch_sec
        - (datetime.datetime.now().timestamp() - last_sleep_time_sec),
    )
    time.sleep(sleep_time)
    if sleep_time > 0.0:
        last_sleep_time_sec = datetime.datetime.now().timestamp()
    return last_sleep_time_sec


def playback_probes(
    probes_iter,
    batch_size=10,
    batch_window_size_sec=2.0,
):
    """
    Replays the probes in live mode.
    The function assumes, that the probes returned by probes_iter are sorted
    in ascending order with respect to the probe timestamp.
    It will either yield batches of size 10 or smaller batches if the timeout is reached.
    """
    last_probe_record_time_sec = None
    time_elapsed_in_batch_sec = 0
    last_sleep_time_sec = datetime.datetime.now().timestamp()
    batch = []
    # Creates two second windows and puts all the probes falling into
    # those windows into a batch. If the max. batch size is reached it will yield early.
    for _id, time_ms, status, coordinates in probes_iter:
        probe_record_time_sec = time_ms / 1000
        if last_probe_record_time_sec is None:
            last_probe_record_time_sec = probe_record_time_sec
        time_to_next_probe_sec = probe_record_time_sec - last_probe_record_time_sec
        if (time_elapsed_in_batch_sec + time_to_next_probe_sec) > batch_window_size_sec:
            last_sleep_time_sec = sleep(time_elapsed_in_batch_sec, last_sleep_time_sec)
            yield batch
            batch = []
            time_elapsed_in_batch_sec = 0
        time_elapsed_in_batch_sec += time_to_next_probe_sec
        batch.append((_id, time_ms, status, coordinates))
        if len(batch) == batch_size:
            last_sleep_time_sec = sleep(time_elapsed_in_batch_sec, last_sleep_time_sec)
            yield batch
            batch = []
            time_elapsed_in_batch_sec = 0
        last_probe_record_time_sec = probe_record_time_sec
    if len(batch) > 0:
        last_sleep_time_sec = sleep(time_elapsed_in_batch_sec, last_sleep_time_sec)
        yield batch

Step 5: Create the updates for the tracker

LOCAL_TIMEZONE = (
    datetime.datetime.now(datetime.timezone(datetime.timedelta(0))).astimezone().tzinfo
)

def convert_to_tracker_updates(probes_batch_iter):
    """
    Converts batches of probes in the format (id,time_ms,state,coordinates=(lon,lat))
    into batches ready for upload to the tracker.
    """
    for batch in probes_batch_iter:
        updates = []
        for _id, time_ms, _, coordinates in batch:
            # The boto3 location service client expects a datetime object for sample time
            dt = datetime.datetime.fromtimestamp(time_ms / 1000, LOCAL_TIMEZONE)
            updates.append({"DeviceId": _id, "Position": coordinates, "SampleTime": dt})
        yield updates

Step 6: Send the updates to the tracker
In the update_tracker function, you use the batch_update_device_position function of the Amazon Location Service Tracker API. This lets you send batches of up to 10 location updates to the tracker in one request. Batching updates is much more cost-effective than sending one-by-one. You pay for each call to batch_update_device_position. Therefore, batching can lead to a 10x cost reduction.

def update_tracker(batch_iter, location_client, tracker_name):
    """
    Reads tracker updates from an iterator and uploads them to the tracker.
    """
    for update in batch_iter:
        response = location_client.batch_update_device_position(
            TrackerName=tracker_name, Updates=update
        )
        if "Errors" in response and response["Errors"]:
            for error in response["Errors"]:
                print(error["Error"]["Message"])

Step 7: Putting it all together
The follow code is the main section that glues every part together. When using this, make sure to replace the variables probes_file_name and tracker_name with the actual probes file location and the name of the tracker created earlier.

if __name__ == "__main__":
    location_client = boto3.client("location")
    probes_file_name = "probes.json"
    tracker_name = "my-tracker"
    iterator = iter_probes_file(probes_file_name)
    iterator = parse_probes_trip_simulator(iterator)
    iterator = update_probe_record_time(iterator)
    iterator = playback_probes(iterator)
    iterator = convert_to_tracker_updates(iterator)
    update_tracker(
        iterator, location_client=location_client, tracker_name=tracker_name
    )

Paste all of the code listed in steps 1 to 7 into a file called trip_playback.py, then execute

python3 trip_playback.py

This will start the playback process.

Step 8: (Optional) Tracking a device’s position updates
Once the playback is running, verify that the updates are actually written to the tracker repeatedly querying the tracker for updates for a single device. Here, you will use the get_device_position function of the Amazon Location Service Tracker API to receive the last known device position.

import boto3
import time

def get_last_vehicle_position_from_tracker(
    device_id, tracker_name="your-tracker", client=boto3.client("location")
):
    response = client.get_device_position(DeviceId=device_id, TrackerName=tracker_name)
    if response["ResponseMetadata"]["HTTPStatusCode"] != 200:
        print(str(response))
    else:
        lon = response["Position"][0]
        lat = response["Position"][1]
        return lon, lat, response["SampleTime"]
        
if __name__ == "__main__":   
    device_id = "my-device"     
    tracker_name = "my-tracker"
    while True:
        lon, lat, sample_time = get_last_vehicle_position_from_tracker(
            device_id=device_id, tracker_name=tracker_name
        )
        print(f"{lon}, {lat}, {sample_time}")
        time.sleep(10)

In the example above, you must replace the tracker_name with the name of the tracker created earlier and the device_id with the ID of one of the simulation vehicles. You can find the vehicle IDs in the probes.json file created by the SharedStreets trip-simulator. If you run the above code, then you should see the following output.

location probes data

AWS IoT Device Simulator

As an alternative, if you are familiar with AWS IoT, AWS has its own vehicle simulator that is part of the IoT Device Simulator solution. It lets you simulate a vehicle fleet moving on a road network. This has been described here. The simulator sends the location data to an Amazon IoT endpoint. The Amazon Location Service Developer Guide shows how to write and set-up a Lambda function to connect the IoT topic to the tracker.

The AWS IoT Device Simulator has a GUI and is a good choice for simulating a small number of vehicles. The drawback is that only a few trips are pre-packaged with the simulator and changing them is somewhat complicated. The SharedStreets Trip-simulator has much more flexibility, allowing simulations of fleets made up of a larger number of vehicles, but it has no GUI for controlling the playback or simulation.

Cleanup

You’ve created a Location Service Tracker resource. It does not incur any charges if it isn’t used. If you want to delete it, you can do so on the Amazon Location Service Tracker console.

Conclusion

This blog showed you how to use an open-source project and open-source data to generate simulated trips, as well as how to play those trips back to the Amazon Location Service Tracker. Furthermore, you have access to the AWS IoT Device Simulator, which can also be used for simulating vehicles.

Give it a try and tell us how you test your location-based applications in the comments.

About the authors

Florian Seidel

Florian is a Solutions Architect in the EMEA Automotive Team at AWS. He has worked on location based services in the automotive industry for the last three years and has many years of experience in software engineering and software architecture.

Aaron Sempf

Aaron is a Senior Partner Solutions Architect, in the Global Systems Integrators team. When not working with AWS GSI partners, he can be found coding prototypes for autonomous robots, IoT devices, and distributed solutions.

Field Notes: Build a Cross-Validation Machine Learning Model Pipeline at Scale with Amazon SageMaker

Post Syndicated from Wei Teh original https://aws.amazon.com/blogs/architecture/field-notes-build-a-cross-validation-machine-learning-model-pipeline-at-scale-with-amazon-sagemaker/

When building a machine learning algorithm, such as a regression or classification algorithm, a common goal is to produce a generalized model. This is so that it performs well on new data that the model has not seen before. Overfitting and underfitting are two fundamental causes of poor performance for machine learning models. A model is overfitted when it performs well on known data, but generalizes poorly on new data. However, an underfit model performs poorly on both trained and new data. A reliable model validation technique helps provide better assessment for predicting model performance in practice, and provides insight for training models to achieve the best accuracy.

Cross-validation is a standard model validation technique commonly used for assessing performance of machine learning algorithms. In general, it works by first sampling the dataset into groups of similar sizes, where each group contains a subset of data dedicated for training and model evaluation. After the data has been grouped, a machine learning algorithm will fit and score a model using the data in each group independently. The final score of the model is defined by the average score across all the trained models for performance metric representation.

There are few cross-validation methods commonly used, including k-fold, stratified k-fold, and leave-p-out, to name a few. Although there are well-defined data science frameworks that can help simplify cross-validation processes, such as Python scikit-learn library, these frameworks are designed to work in a monolithic, single compute environment. When it comes to training machine learning algorithms with large volume of data, these frameworks become bottlenecked with limited scalability and reliability.

In this blog post, we are going to walk through the steps for building a highly scalable, high-accuracy, machine learning pipeline, with the k-fold cross-validation method, using Amazon Simple Storage Service (Amazon S3), Amazon SageMaker Pipelines, SageMaker automatic model tuning, and SageMaker training at scale.

Overview of solution

To operate the k-fold cross validation training pipeline at scale, we built an end to end machine learning pipeline using SageMaker native features. This solution implements the k-fold data processing, model training, and model selection processes as individual components to maximize parallellism. The pipeline is orchestrated through SageMaker Pipelines in distributed manner to achieve scalability and performance efficiency. Let’s dive into the high-level architecture of the solution in the following section.

Figure 1. Solution architecture

Figure 1. Solution architecture

The overall solution architecture is shown in Figure 1. There are four main building blocks in the k-fold cross-validation model pipeline:

  1. Preprocessing – Sample and split the entire dataset into k groups.
  2. Model training – Fit the SageMaker training jobs in parallel with hyperparameters optimized through the SageMaker automatic model tuning job.
  3. Model selection – Fit a final model, using the best hyperparameters obtained in step 2, with the entire dataset.
  4. Model registration – Register the final model with SageMaker Model Registry, for model lifecycle management and deployment.

The final output from the pipeline is a model that represents best performance and accuracy for the given dataset. The pipeline can be orchestrated easily using a workflow management tool, such as Pipelines.

Amazon SageMaker is a fully managed service that enables data scientists and developers to quickly develop, train, tune, and deploy machine learning quickly and at scale. When it comes to choosing the right machine learning and data processing frameworks to solve problems, SageMaker gives you the flexibility to use prebuilt containers bundled with the supported common machine learning frameworks—such as Tensorflow, Pytorch, and MxNet—or to bring your own container images with custom scripts and libraries that fit your use cases to train on the highly available SageMaker model training environment. Additionally, Pipelines enables users to develop complete machine learning workflows using python SDK, and manage these workflows in SageMaker Studio.

For simplicity, we will use the public Iris flower data as the train and test dataset to build a multivariate classification model using linear algorithm (SVM). The pipeline architecture is agnostic to the data and model; hence, it can be modified to adopt a different dataset or algorithm.

Prerequisites

To deploy the solution, you require the following:

  • SageMaker Studio
  • A Command Line (Terminal) that supports building Docker images (or instance, AWS Cloud9)

Solution walkthrough

In this section, we are going to walk through the steps to create a cross-validation model training pipeline using Pipelines. The main components are as follows.

  1. Pipeline parameters
    Pipelines parameters are introduced as variables that allow the predefined values to be overridden at runtime. Pipelines supports the following parameters types: String, Integer, and Float (expressed as ParameterString, ParameterInteger, and ParameterFloat). The following are some examples of the parameters used in the cross-validation model training pipeline:
    • K-Fold – Value of k to be used in k-fold cross-validation
    • ProcessingInstanceCount – Number of instances for SageMaker processing job
    • ProcessingInstanceType – Instance type used for SageMaker processing job
    • TrainingInstanceType – Instance type used for SageMaker training job
    • TrainingInstanceCount – Number of instances for SageMaker training job
  1. Preprocessing

In this step, the original dataset is split into k equal-sized samples. One of the k samples is retained as the validation data for model evaluation, with the remaining k-1 samples to be used as training data. This process is repeated k times, with each of the k samples used as the validation set only one time. The k sample collections are uploaded to an S3 bucket, with the prefix corresponding to an index (0 – k-1) to be identified as the input path to the specified training jobs in the next step of the pipeline. The cross-validation split is submitted as a SageMaker processing job orchestrated through the Pipelines processing step. The processing flow is shown in Figure 2.

Figure 2. K-fold cross-validation: original data is split into k equal-sized samples uploaded to S3 bucket

Figure 2. K-fold cross-validation: original data is split into k equal-sized samples uploaded to S3 bucket

The following code snippet splits the k-fold dataset in the preprocessing script:

def save_kfold_datasets(X, y, k):
    """ Splits the datasets (X,y) k folds and saves the output from 
    each fold into separate directories.

    Args:
        X : numpy array represents the features
        y : numpy array represetns the target
        k : int value represents the number of folds to split the given datasets
    """

    # Shuffles and Split dataset into k folds. 
    kf = KFold(n_splits=k, random_state=23, shuffle=True)

    fold_idx = 0
    for train_index, test_index in kf.split(X, y=y, groups=None):    
       X_train, X_test = X[train_index], X[test_index]
       y_train, y_test = y[train_index], y[test_index]
       os.makedirs(f'{base_dir}/train/{fold_idx}', exist_ok=True)
       np.savetxt(f'{base_dir}/train/{fold_idx}/train_x.csv', X_train, delimiter=',')
       np.savetxt(f'{base_dir}/train/{fold_idx}/train_y.csv', y_train, delimiter=',')

       os.makedirs(f'{base_dir}/test/{fold_idx}', exist_ok=True)
       np.savetxt(f'{base_dir}/test/{fold_idx}/test_x.csv', X_test, delimiter=',')
       np.savetxt(f'{base_dir}/test/{fold_idx}/test_y.csv', y_test, delimiter=',')
       fold_idx += 1
  1.  Cross-validation training with SageMaker automatic model tuning

In a typical cross-validation training scenario, a chosen algorithm is trained for k times with specific training and a validation dataset sampled through the k-fold technique, mentioned in the previous step. Traditionally, the cross-validation model training process is performed sequentially on the same server. This method is inefficient and doesn’t scale well for models with large volumes of data. Because all the samples are uploaded to an S3 bucket, we can now run k training jobs in parallel. Each training job will consume input samples in the specified bucket location correspond to the index (ranged between 0 – k-1) given to the training job. Additionally, the hyperparameter values must be the same for all k jobs because cross validation estimates the true out-of-sample performance of a model trained with this specific set of hyperparameters.

Although the cross-validation technique helps generalize the models, hyperparameter tuning for the model is typically performed manually. In this blog post, we are going to take a heuristic approach of finding the most optimized hyperparameters using SageMaker automatic model tuning.

We start by defining a training script that accepts the hyperparameters as input for the specified model algorithm, and then implement the model training and evaluation steps.

The steps involved in the training script are summarized as follows:

    1. Parse hyperparameters from the input.
    2. Fit the model using the parsed hyperparameters.
    3. Evaluate model performance (score).
    4. Save the trained model.
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--c', type=float, default=1.0)
    parser.add_argument('--gamma', type=float)
    parser.add_argument('--kernel', type=str)
    # Sagemaker specific arguments. Defaults are set in the environment variables.
    parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    args = parser.parse_args()
    model = train(train=args.train, test=args.test)
    evaluate(test=args.test, model=model)
    dump(model, os.path.join(args.model_dir, "model.joblib"))

Next, we create a python script that performs cross-validation model training by submitting k SageMaker training jobs in parallel with given hyperparameters. Additionally, the script monitors the progress of the training jobs, and calculates the objective metrics by averaging the scores across the completed jobs.

Now we create a python script that uses a SageMaker automatic model tuning job to find the optimal hyperparameters for the trained models. The hyperparameter tuner works by running a specified number of training jobs using the ranges of hyperparameters specified. The number of training jobs and ranges of hyperparameters are given in the input parameter to the script. After the tuning job completes, the objective metrics, as well as the hyperparameters from the best cross-validation model training job, are captured, formatted in JSON format, respectively, to be used in the next steps of the workflow. Figure 3 illustrates cross-validation training with automatic model tuning.

Figure 3. In cross-validation training step, a SageMaker HyperparameterTuner job invokes n training jobs. The metrics and hyperparameters are captured for downstream processes.

Figure 3. In cross-validation training step, a SageMaker HyperparameterTuner job invokes n training jobs. The metrics and hyperparameters are captured for downstream processes.

Finally, the training and cross-validation scripts are packaged and built as a custom container image, available for the SageMaker automatic model tuning job for submission. The following code snippet is for building the custom image:

FROM python:3.7
RUN apt-get update && pip install sagemaker boto3 numpy sagemaker-training
COPY cv.py /opt/ml/code/train.py
COPY scikit_learn_iris.py /opt/ml/code/scikit_learn_iris.py
ENV SAGEMAKER_PROGRAM train.py
  1. Model evaluation
    The objective metrics in the cross-validation training and tuning steps define the model quality. To evaluate the model performance, we created a conditional step that compares the metrics against a baseline to determine the next step in the workflow. The following code snippet illustrates the conditional step in detail. Specifically, this step first extracts the objective metrics based on the evaluation report uploaded in previous step, and then compares the value with baseline_model_objective_value provided in the pipeline job. The workflow continues if the model objective metric is greater than or equal to the baseline value, and stops otherwise.
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.condition_step import (
    ConditionStep,
    JsonGet,
)
cond_gte = ConditionGreaterThanOrEqualTo(
    left=JsonGet(
        step=step_cv_train_hpo,
        property_file=evaluation_report,
        json_path="multiclass_classification_metrics.accuracy.value",
    ),
    right=baseline_model_objective_value,
)
step_cond = ConditionStep(
    name="ModelEvaluationStep",
    conditions=[cond_gte],
    if_steps=[step_model_selection, step_register_model],
    else_steps=[],
)
  1. Model Selection
    At this stage of the pipeline, we’ve completed cross-validation and hyperparameter optimization steps to identify the best performing model trained with the specific hyperparameter values. In this step, we are going to fit a model using the same algorithm used in cross-validation training by providing the entire dataset and the hyperparameters from the best model. The trained model will be used for serving predictions for downstream applications. The following code snippet illustrates a Pipelines training step for model selection:
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep
from sagemaker.sklearn.estimator import SKLearn
sklearn_estimator = SKLearn("scikit_learn_iris.py", 
                           framework_version=framework_version, 
                           instance_type=training_instance_type,
                           py_version='py3', 
                           source_dir="code",
                           output_path=s3_bucket_base_path_output,
                           role=role)
step_model_selection = TrainingStep(
    name="ModelSelectionStep",
    estimator=sklearn_estimator,
    inputs={
        "train": TrainingInput(
            s3_data=f'{step_process.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]}/all',
            content_type="text/csv"
        ),
        "jobinfo": TrainingInput(
            s3_data=f"{s3_bucket_base_path_jobinfo}",
            content_type="application/json"
        )
    }
)
  1. Model registration
    Because the cross-validation model training pipeline evolves, it’s important to have a mechanism for managing the version of model artifacts over time, so that the team responsible for the project can manage the model lifecycle, including track, deploy, or rollback a model based on the version. Building your own model registry, with lifecycle management capabilities, can be complicated and challenging to maintain and operate. SageMaker Model Registry simplifies model lifecycle management by enabling model catalog, versioning, metrics association, model approval workflow, and model deployment automation.

In the final step of the pipeline, we are going to register the trained model with Model Registry by associating model objective metrics, the model artifact location on S3 bucket, the estimator object used in the model selection step, model training and inference metadata, and approval status. The following code snippet illustrates the model registry step using ModelMetrics and RegisterModel.

from sagemaker.model_metrics import MetricsSource, ModelMetrics
from sagemaker.workflow.step_collections import RegisterModel
model_metrics = ModelMetrics(
    model_statistics=MetricsSource(
        s3_uri="{}/evaluation.json".format(
            step_cv_train_hpo.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
        ),
        content_type="application/json",
    )
)
step_register_model = RegisterModel(
    name="RegisterModelStep",
    estimator=sklearn_estimator,
    model_data=step_model_selection.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
    model_metrics=model_metrics,

Figure 4 shows a model version registered in SageMaker Model Registry upon a successful pipeline job through Studio.

Figure 4. Model version registered successfully in SageMaker

  1. Putting everything together
    Now that we’ve defined a cross-validation training pipeline, we can track, visualize, and manage the pipeline job directly from within Studio. The following code snippet and Figure 5 depicts our pipeline definition:
from sagemaker.workflow.pipeline_experiment_config import PipelineExperimentConfig
from sagemaker.workflow.execution_variables import ExecutionVariables
pipeline_name = f"CrossValidationTrainingPipeline"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_count,
        processing_instance_type,
        training_instance_type,
        training_instance_count,
        inference_instance_type,
        hpo_tuner_instance_type,
        model_approval_status,
        role,
        default_bucket,
        baseline_model_objective_value,
        bucket_prefix,
        image_uri,
        k,
        max_jobs,
        max_parallel_jobs,
        min_c,
        max_c,
        min_gamma,
        max_gamma,
        gamma_scaling_type
    ],    
    pipeline_experiment_config=PipelineExperimentConfig(
      ExecutionVariables.PIPELINE_NAME,
      ExecutionVariables.PIPELINE_EXECUTION_ID),
    steps=[step_process, step_cv_train_hpo, step_cond],
Figure 5. SageMaker Pipelines definition shown in SageMaker Studio

Figure 5. SageMaker Pipelines definition shown in SageMaker Studio

Finally, to kick off the pipeline, invoke the pipeline.start() function, with optional parameters specific to the job run:

execution = pipeline.start(
    parameters=dict(
        BaselineModelObjectiveValue=0.8,
        MinimumC=0,
        MaximumC=1
    ))

You can track the pipeline job from within Studio, or use SageMaker application programming interfaces (APIs). Figure 6 shows a screenshot of a pipeline job in progress from Studio.

Figure 6. SageMaker Pipelines job progress shown in SageMaker Studio

Figure 6. SageMaker Pipelines job progress shown in SageMaker Studio

Conclusion

In this blog post, we showed you an architecture that orchestrates a complete workflow for cross-validation model training. We implemented the workflow using SageMaker Pipelines that incorporates preprocessing, hyperparameter tuning, model evaluation, model selection, and model registration. The solution addresses the common challenge of orchestrating cross-validation model pipeline at scale. The entire pipeline implementation, including a jupyter notebook that defines the pipeline, a Dockerfile and python scripts described in this blog post, can be found in the GitHub project.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Building Modern Applications with Amazon EKS on Amazon Outposts

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/building-modern-applications-with-amazon-eks-on-amazon-outposts/

This post is written by Brad Kirby, Principal Outposts Specialist, and Chris Lunsford, Senior Outposts SA. 

Customers are modernizing applications by deconstructing monolithic architectures and migrating application components into container–based, service-oriented, and microservices architectures. Modern applications improve scalability, reliability, and development efficiency by allowing services to be owned by smaller, more focused teams.

This post shows how you can combine Amazon Elastic Kubernetes Service (Amazon EKS) with AWS Outposts to deploy managed Kubernetes application environments on-premises alongside your existing data and applications.

For a brief introduction to the subject matter, the watch this video, which demonstrates:

  • The benefits of application modernization
  • How containers are an ideal enabling technology for microservices architectures
  • How AWS Outposts combined with Amazon container services enables you to unwind complex service interdependencies and modernize on-premises applications with low latency, local data processing, and data residency requirements

Understanding the Amazon EKS on AWS Outposts architecture

Amazon EKS

Many organizations chose Kubernetes as their container orchestration platform because of its openness, flexibility, and a growing pool of Kubernetes literate IT professionals. Amazon EKS enables you to run Kubernetes clusters on AWS without needing to install and manage the Kubernetes control plane. The control plane, including the API servers, scheduler, and cluster store services, runs within a managed environment in the AWS Region. Kubernetes clients (like kubectl) and cluster worker nodes communicate with the managed control plane via public and private EKS service endpoints.

AWS Outposts

The AWS Outposts service delivers AWS infrastructure and services to on-premises locations from the AWS Global Cloud Infrastructure. An Outpost functions as an extension of the Availability Zone (AZ) where it is anchored. AWS operates, monitors, and manages AWS Outposts infrastructure as part of its parent Region. Each Outpost connects back to its anchor AZ via a Service Link connection (a set of encrypted VPN tunnels). AWS Outposts extend Virtual Private Cloud (VPC) environments from the Region to on-premises locations and enables you to deploy VPC subnets to Outposts in your data center and co-location spaces. The Outposts Local Gateway (LGW) routes traffic between Outpost VPC subnets and the on-premises network.

Amazon EKS on AWS Outposts

You deploy Amazon EKS worker nodes to Outposts using self-managed node groups. The worker nodes run on Outposts and register with the Kubernetes control plane in the AWS Region. The worker nodes, and containers running on the nodes, can communicate with AWS services and resources running on the Outpost and in the region (via the Service Link) and with on-premises networks (via the Local Gateway).

 

You use the same AWS and Kubernetes tools and APIs to work with EKS on Outposts nodes that you use to work with EKS nodes in the Region. You can use eksctl, the AWS Management Console, AWS CLI, or infrastructure as code (IaC) tools like AWS CloudFormation or HashiCorp Terraform to create self-managed node groups on AWS Outposts.

Amazon EKS self-managed node groups on AWS Outposts

Like many customers, you might use managed node groups when you deploy EKS worker nodes in your Region, and you may be wondering, “what are self-managed node groups?”

Self-managed node groups, like managed node groups, use standard Amazon Elastic Compute Cloud (Amazon EC2) instances in EC2 Auto Scaling groups to deploy, scale-up, and scale-down EKS worker nodes using Amazon EKS optimized Amazon Machine Images (AMIs). Amazon configures the EKS optimized AMIs to work with the EKS service. The images include Docker, kubelet, and the AWS IAM Authenticator. The AMIs also contain a specialized bootstrap script /etc/eks/bootstrap.sh that allows worker nodes to discover and connect to your cluster control plane and add Kubernetes labels that identify the nodes in the cluster.

What makes the node groups self-managed? Managed node groups have additional features that simplify updating nodes in the node group. With self-managed nodes, you must implement a process to update or replace your node group when you want to update the nodes to use a new Amazon EKS optimized AMI.

You create self-managed node groups on AWS Outposts using the same process and resources that you use to deploy EC2 instances using EC2 Auto Scaling groups. All instances in a self-managed node group must:

  • Be the same Instance type
  • Be running the same Amazon Machine Image (AMI)
  • Use the same Amazon EKS node IAM role
  • Be tagged with the io/cluster/<cluster-name>=owned tag

Additionally, to deploy on AWS Outposts, instances must:

  • Use encrypted EBS volumes
  • Launch in Outposts subnets

Kubernetes authenticates all API requests to a cluster. You must configure an EKS cluster to allow nodes from a self-managed node group to join the cluster. Self-managed nodes use the node IAM role to authenticate with an EKS cluster. Amazon EKS clusters use the AWS IAM Authenticator for Kubernetes to authenticate requests and Kubernetes native Role Based Access Control (RBAC) to authorize requests. To enable self-managed nodes to register with a cluster, configure the AWS IAM Authenticator to recognize the node IAM role and assign the role to the system:bootstrappers and system:nodes RBAC groups.

In the following tutorial, we take you through the steps required to deploy EKS worker nodes on an Outpost and register those nodes with an Amazon EKS cluster running in the Region. We created a sample Terraform module aws-eks-self-managed-node-group to help you get started quickly. If you are interested, you can dive into the module sample code to see the detailed configurations for self-managed node groups.

Deploying Amazon EKS on AWS Outposts (Terraform)

To deploy Amazon EKS nodes on an AWS Outposts deployment, you must complete two high-level steps:

Step 1: Create a self-managed node group

Step 2: Configure Kubernetes to allow the nodes to register

We use Terraform in this tutorial to provide visibility (through the sample code) into the details of the configurations required to deploy self-managed node groups on AWS Outposts.

Prerequisites

To follow along with this tutorial, you should have the following prerequisites:

  • An AWS account.
  • An operational AWS Outposts deployment (Outpost) associated with your AWS account.
  • AWS Command Line Interface (CLI) version 1.25 or later installed and configured on your workstation.
  • HashiCorp Terraform version 14.6 or later installed on your workstation.
  • Familiarity with Terraform HashiCorp Configuration Language (HCL) syntax and experience using Terraform to deploy AWS resources.
  • An existing VPC that meets the requirements for an Amazon EKS cluster. For more information, see Cluster VPC considerations. You can use the Getting started with Amazon EKS guide to walk you through creating a VPC that meets the requirements.
  • An existing Amazon EKS cluster. You can use the Creating an Amazon EKS cluster guide to walk you through creating the cluster.
  • Tip: We recommend creating and using a new VPC and Amazon EKS cluster to complete this tutorial. Procedures like modifying the aws-auth ConfigMap on the cluster may impact other nodes, users, and workloads on the cluster. Using a new VPC and Amazon EKS cluster will help minimize the risk of impacting other applications as you complete the tutorial.
  • Note: Do not reference subnets deployed on AWS Outposts when creating Amazon EKS clusters in the Region. The Amazon EKS cluster control plane runs in the Region and attaches to VPC subnets in the Region availability zones.
  • A symmetric KMS key to encrypt the EBS volumes of the EKS worker nodes. You can use the alias/aws/ebs AWS managed key for this prerequisite.

Using the sample code

The source code for the amazon-eks-self-managed-node-group Terraform module is available in AWS-Samples on GitHub.

Setup

To follow along with this tutorial, complete the following steps to setup your workstation:

  1. Open your terminal.
  2. Make a new directory to hold your EKS on Outposts Terraform configurations.
  3. Change to the new directory.
  4. Create a new file named providers.tf with the following contents:
    terraform {
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 3.27"
        }
    
        kubernetes = {
          source  = "hashicorp/kubernetes"
          version = "~> 1.13.3"
        }
      }
    }

  5. Keep your terminal open and do all the work in this tutorial from this directory.

Step 1: Create a self-managed node group

To create a self-managed node group on your Outpost

  1. Create a new Terraform configuration file named self-managed-node-group.tf.
  2. Configure the aws Terraform provider with your AWS CLI profile and Outpost parent Region:
    provider "aws" {
      region  = "us-west-2"
      profile = "default"
    }
    

  3. Configure the aws-eks-self-managed-node-group module with the following (minimum) arguments:
    • eks_cluster_name the name of your EKS cluster
    • instance_type an instance type supported on your AWS Outposts deployment
    • desired_capacity, min_size, and max_size as desired to control the number of nodes in your node group (ensure that your Outpost has sufficient resources available to run the desired number nodes of the specified instance type)
    • subnets the subnet IDs of the Outpost subnets where you want the nodes deployed
    • (Optional) Kubernetes node_labels to apply to the nodes
    • Allow ebs_encrypted and configure the ebs_kms_key_arn with KMS key you want to use to encrypt the nodes’ EBS volumes (required for Outposts deployments)

Example:

module "eks_self_managed_node_group" {
  source = "github.com/aws-samples/amazon-eks-self-managed-node-group"

  eks_cluster_name = "cmluns-eks-cluster"
  instance_type    = "m5.2xlarge"
  desired_capacity = 1
  min_size         = 1
  max_size         = 1
  subnets          = ["subnet-0afb721a5cc5bd01f"]

  node_labels = {
    "node.kubernetes.io/outpost"    = "op-0d4579457ff2dc345"
    "node.kubernetes.io/node-group" = "node-group-a"
  }

  ebs_encrypted   = true
  ebs_kms_key_arn = "arn:aws:kms:us-west-2:799838960553:key/0e8f15cc-d3fc-4da4-ae03-5fadf45cc0fb"
}

You may configure other optional arguments as appropriate for your use case – see the module README for details.

Step 2: Configure Kubernetes to allow the nodes to register

Use the following procedures to configure Terraform to manage the AWS IAM Authenticator (aws-auth) ConfigMap on the cluster. This adds the node-group IAM role to the IAM authenticator and Kubernetes RBAC groups.

Configure the Terraform Kubernetes Provider to allow Terraform to interact with the Kubernetes control plane.

Note: If you add a node group to a cluster with existing node groups, mapped IAM roles, or mapped IAM users, the aws-auth ConfigMap may already be configured on your cluster. If the ConfigMap exists, you must download, edit, and replace the ConfigMap on the cluster using kubectl. We do not provide a procedure for this operation as it may affect workloads running on your cluster. Please see the section Managing users or IAM roles for your cluster in the Amazon EKS User Guide for more information.

To check if your cluster has the aws-auth ConfigMap configured

  1. Run the aws eks --region <region> update-kubeconfig --name <cluster-name> command to update your workstation’s ~/.kube/config with the information needed to connect to your cluster, substituting your <region> and <cluster-name>.
❯ aws eks --region us-west-2 update-kubeconfig --name cmluns-eks-cluster
Updated context arn:aws:eks:us-west-2:799838960553:cluster/cmluns-eks-cluster in ~/.kube/config
  1. Run the kubectl describe configmap -n kube-system aws-auth
  • If you receive an error stating, Error from server (NotFound): configmaps "aws-auth" not found, then proceed with the following procedures to use Terraform to apply the ConfigMap.
❯ kubectl describe configmap -n kube-system aws-auth
Error from server (NotFound): configmaps "aws-auth" not found
  • If you do not receive the preceding error, and kubectl returns an aws-auth ConfigMap, then you should not use Terraform to manage the ConfigMap.

To configure the Terraform Kubernetes provider

  1. Create a new Terraform configuration file named aws-auth-config-map.tf.
  2. Add the aws_eks_cluster Terraform data source, and configure it to look up your cluster by name.
data "aws_eks_cluster" "selected" {
  name = "cmluns-eks-cluster"
}
  1. Add the aws_eks_cluster_auth Terraform data source, and configure it to look up your cluster by name.
data "aws_eks_cluster_auth" "selected" {
  name = "cmluns-eks-cluster"
}
  1. Configure the kubernetes provider with your cluster host (endpoint address), cluster_ca_certificate, and the token from the aws_eks_cluster and aws_eks_cluster_auth data sources.
provider "kubernetes" {
  load_config_file       = false
  host                   = data.aws_eks_cluster.selected.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.selected.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.selected.token
}

To configure the AWS IAM Authenticator on the cluster

  1. Open the aws-auth-config-map.tf Terraform configuration file you created in the last step.
  2. Add a kubernetes_config_map Terraform resource to add the aws-auth ConfigMap to the kube-system
  3. Configure the data argument with a YAML format heredoc string that adds the Amazon Resource Name (ARN) for your IAM role to the mapRoles
resource "kubernetes_config_map" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = <<-EOT
      - rolearn: ${module.eks_self_managed_node_group.role_arn}
        username: system:node:{{EC2PrivateDNSName}}
        groups:
          - system:bootstrappers
          - system:nodes
    EOT
  }
}

Apply the configuration and verify node registration

You have created the Terraform configurations to deploy an EKS self-managed node group to your Outpost and configure Kubernetes to authenticate the nodes. Now you apply the configurations and verify that the nodes register with your Kubernetes cluster.

To apply the Terraform configurations

  1. Run the terraform init command to download the providers, self-managed node group module, and prepare the directory for use.
  2. Run the terraform apply
  3. Review the resources that will be created.
  4. Enter yes to confirm that you want to create the resources.
    ❯ terraform apply
    
    Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
      + create
    
    Terraform will perform the following actions:
    
    <-- Output omitted for brevity -->
    
    Plan: 9 to add, 0 to change, 0 to destroy.
    
    Do you want to perform these actions?
      Terraform will perform the actions described above.
      Only 'yes' will be accepted to approve.
    
      Enter a value: yes
    

  5. Press Enter.
    <-- Output omitted for brevity -->
    
    Apply complete! Resources: 9 added, 0 changed, 0 destroyed.
    

  6. Run the aws eks --region <region> update-kubeconfig --name <cluster-name> command to update your workstation’s ~/.kube/config with the information required to connect to your cluster – substituting your <region> and <cluster-name>.
    ❯ aws eks --region us-west-2 update-kubeconfig --name cmluns-eks-cluster
    Updated context arn:aws:eks:us-west-2:799838960553:cluster/cmluns-eks-cluster in ~/.kube/config
    

  7. Run the kubectl get nodes command to view the status of your cluster nodes.
  8. Verify your new nodes show up in the list with a STATUS of Ready.
    ❯ kubectl get nodes
    NAME                                          STATUS   ROLES    AGE   VERSION
    ip-10-253-46-119.us-west-2.compute.internal   Ready    <none>   37s   v1.18.9-eks-d1db3c
    

  9. (Optional) If your nodes are registering, and their STATUS does not show Ready, run the kubectl get nodes --watch command to watch them come online.
  10. (Optional) Run the kubectl get nodes --show-labels command to view the node list with the labels assigned to each node. The nodes created by your AWS Outposts node group will have the labels you assigned in Step 1.

To verify the Kubernetes system pods deploy on the worker nodes

  1. Run the kubectl get pods --namespace kube-system
  2. Verify that each pod shows READY 1/1 with a STATUS of Running.
❯ kubectl get pods --namespace kube-system
NAME                       READY   STATUS    RESTARTS   AGE
aws-node-84xlc             1/1     Running   0          2m16s
coredns-559b5db75d-jxqbp   1/1     Running   0          5m36s
coredns-559b5db75d-vdccc   1/1     Running   0          5m36s
kube-proxy-fspw4           1/1     Running   0          2m16s

The nodes in your AWS Outposts node group should be registered with the EKS control plane in the Region and the Kubernetes system pods should successfully deploy on the new nodes.

Clean up

One of the nice things about using infrastructure as code tools like Terraform is that they automate stack creation and deletion. Use the following procedure to remove the resources you created in this tutorial.

To clean up the resources created in this tutorial

  1. Run the terraform destroy
  2. Review the resources that will be destroyed.
  3. Enter yes to confirm that you want to destroy the resources.
❯ terraform destroy

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

Plan: 0 to add, 0 to change, 9 to destroy.

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes
  1. Press Enter.

Destroy complete! Resources: 9 destroyed.

  1. Clean up any resources you created for the prerequisites.

Conclusion

In this post, we discussed how containers sit at the heart of the application modernization process, making it easy to adopt microservices architectures that improve application scalability, availability, and performance. We also outlined the challenges associated with modernizing on-premises applications with low latency, local data processing, and data residency requirements. In these cases, AWS Outposts brings cloud services, like Amazon EKS, close to the workloads being refactored. We also walked you through deploying Amazon EKS worker nodes on-premises on AWS Outposts.

Now that you have deployed Amazon EKS worker nodes on in a test VPC using Terraform, you should adapt the Terraform module(s) and resources to prepare and deploy your production Kubernetes clusters on-premises on Outposts. If you don’t have an Outpost and you want to get started modernizing your on-premises applications with Amazon EKS and AWS Outposts, contact your local AWS account Solutions Architect (SA) to help you get started.

Field Notes: How to Scale Your Networks on Amazon Web Services

Post Syndicated from Androski Spicer original https://aws.amazon.com/blogs/architecture/field-notes-how-to-scale-your-networks-on-amazon-web-services/

As AWS adoption increases throughout an organization, the number of networks and virtual private clouds (VPCs) to support them also increases. Customers can see growth upwards of tens, hundreds, or in the case of the enterprise, thousands of VPCs.

Generally, this increase in VPCs is driven by the need to:

  • Simplify routing, connectivity, and isolation boundaries
  • Reduce network infrastructure cost
  • Reduce management overhead

Overview of solution

This blog post discusses the guidance customers require to achieve their desired outcomes. Guidance is provided through a series of real-world scenarios customers encounter on their journey to building a well-architected network environment on AWS. These challenges range from the need to centralize networking resources, to reduce complexity and cost, to implementing security techniques that help workloads to meet industry and customer specific operational compliance.

The scenarios presented here form the foundation and starting point from which the intended guidance is provided. These scenarios start as simple, but gradually increase in complexity. Each scenario tackles different questions customers ask AWS solutions architects, service teams, professional services, and other AWS professionals, on a daily basis.

Some of these questions are:

  • What does centralized DNS look like on AWS, and how should I approach and implement it?
  • How do I reduce the cost and complexity associated with Amazon Virtual Private Cloud (Amazon VPC) interface endpoints for AWS services by centralizing that is spread across many AWS accounts?
  • What does centralized packet inspection look like on AWS, and how should we approach it?

This blog post will answer these questions, and more.

Prerequisites

This blog post assumes that the reader has some understanding of AWS networking basics outlined in the blog post One to Many: Evolving VPC Design. It also assumes that the reader understands industry-wide networking basics.

Simplify routing, connectivity, and isolation boundaries

Simplification in routing starts with selecting the correct layer 3 technology. In the past, customers used a combination of VPC peering, Virtual Gateway configurations, and the Transit VPC Solution to achieve inter–VPC routing, and routing to on-premises resources. These solutions presented challenges in configuration and management complexity, as well as security and scaling.

To solve these challenges, AWS introduced AWS Transit Gateway. Transit Gateway is a regional virtual router that customers can attach their VPCs, site-to-site virtual private networks (VPNs), Transit Gateway Connect, AWS Direct Connect gateways, and cross-region transit gateway peering connections, and configure routing between them. Transit Gateway scales up to 5,000 attachments; so, a customer can start with one VPC attachment, and scale up to thousands of attachments across thousands of accounts. Each VPC, Direct Connect gateway, and peer transit gateway connection receives up to 50 Gbps of bandwidth.

Routing happens at layer 3 through a transit gateway. Transit Gateway come with a default route table to which all default attachment association happens. If route propagation and association is enabled at transit gateway creation time, AWS will create a transit gateway with a default route table to which attachments are automatically associated and their routes automatically propagated. This creates a network where all attachments can route to each other.

Adding VPN or Direct Connect gateway attachments to on-premises networks will allow all attached VPCs and networks to easily route to on-premises networks. Some customers require isolation boundaries between routing domains. This can be achieved with Transit Gateway.

Let’s review a use case where a customer with two spoke VPCs and a shared services VPC (shared-services-vpc-A) would like to:

  • Allow all spoke VPCs to access the shared services VPC
  • Disallow access between spoke VPCs

Figure 1. Transit Gateway Deployment

To achieve this, the customer needs to:

  1. Create a transit gateway with the name tgw-A and two route tables with the names spoke-tgw-route-table and shared-services-tgw-route-table.
    1. When creating the transit gateway, disable automatic association and propagation to the default route table.
    2. Enable equal-cost multi-path routing (ECMP) and use a unique Border Gateway Protocol (BGP) autonomous system number (ASN).
  1. Associate all spoke VPCs with the spoke-tgw-route-table.
    1. Their routes should not be propagated.
    2. Propagate their routes to the shared-services-tgw-route-table.
  1. Associate the shared services VPC with the shared-services-tgw-route-table and its routes should be propagated or statically added to the spoke-tgw-route-table.
  2. Add a default and summarized route with a next hop of the transit gateway to the shared services and spoke VPCs route table.

After successfully deploying this configuration, the customer decides to:

  1. Allow all VPCs access to on-premises resources through AWS site-to-site VPNs.
  2. Require an aggregated bandwidth of 10 Gbps across this VPN.
Figure 2. Transit Gateway hub and spoke architecture, with VPCs and multiple AWS site-to-site VPNs

Figure 2. Transit Gateway hub and spoke architecture, with VPCs and multiple AWS site-to-site VPNs

To achieve this, the customer needs to:

  1. Create four site-to-site VPNs between the transit gateway and the on-premises routers with BGP as the routing protocol.
    1. AWS site-to-site VPN has two VPN tunnels. Each tunnel has a dedicated bandwidth of 1.25 Gbps.
    2. Read more on how to configure ECMP for site-to-site VPNs.
  1. Create a third transit gateway route table with the name WAN-connections-route-table.
  2. Associate all four VPNs with the WAN-connections-route-table.
  3. Propagate the routes from the spoke and shared services VPCs to WAN-connections-route-table.
  4. Propagate VPN attachment routes to the spoke-tgw-route-table and shared-services-tgw-route-table.

Building on this progress, the customer has decided to deploy another transit gateway and shared services VPC in another AWS Region. They would like both shared service VPCs to be connected.

Transit Gateway peering connection architecture

Figure 3. Transit Gateway peering connection architecture

To accomplish these requirements, the customer needs to:

  1. Create a transit gateway with the name tgw-B in the new region.
  2. Create a transit gateway peering connection between tgw-A and tgw-B. Ensure peering requests are accepted.
  3. Statically add a route to the shared-services-tgw-route-table in region A that has the transit-gateway-peering attachment as the next for hop traffic destined to the VPC Classless Inter-Domain Routing (CIDR) range for shared-services-vpc-B. Then, in region B, add a route to the shared-services-tgw-route-table that has the transit-gateway-peering attachment as the next for hop traffic destined to the VPC CIDR range for shared-services-vpc-A.

Reduce network infrastructure cost

It is important to design your network to eliminate unnecessary complexity and management overhead, as well as cost optimization. To achieve this, use centralization. Instead of creating network infrastructure that is needed by every VPC inside each VPC, deploy these resources in a type of shared services VPC and share them throughout your entire network. This results in the creation of this infrastructure only one time, which reduces the cost and management overhead.

Some VPC components that can be centralized are network address translation (NAT) gateways, VPC interface endpoints, and AWS Network Firewall. Third-party firewalls can also be centralized.

Let’s take a look at a few use cases that build on the previous use cases.

Figure 4. Centralized interface endpoint architecture

Figure 4. Centralized interface endpoint architecture

The customer has made the decision to allow access to AWS Key Management Service (AWS KMS) and AWS Secrets Manager from their VPCs.

The customer should employ the strategy of centralizing their VPC interface endpoints to reduce the potential proliferation of cost, management overhead, and complexity that can occur when working with this VPC feature.

To centralize these endpoints, the customer should:

  1. Deploy AWS VPC interface endpoints for AWS KMS and Secrets Manager inside shared-services-vpc-A and shared-services-vpc-B.
    1. Disable each Private DNS.

Figure 5. Centralized interface endpoint step-by-step guide (Step 1)

  1. Use the AWS default DNS name for AWS KMS and Secrets Manager to create an Amazon Route 53 private hosted zone (PHZ) for each of these services. These are:
    1. kms.<region>.amazonaws.com
    2. secretsmanager.<region>.amazonaws.com
Figure 6. Centralized interface endpoint step-by-step guide (Step 2)

Figure 6. Centralized interface endpoint step-by-step guide (Step 2)

  1. Authorize each spoke VPC to associate with the PHZ in their respective region. This can be done from the AWS Command Line Interface (AWS CLI) by using the command aws route53 create-vpc-association-authorization –hosted-zone-id <hosted-zone-id> –vpc VPCRegion=<region>,VPCId=<vpc-id> –region <AWS-REGION>.
  2. Create an A record for each PHZ. In the creation process, for the Route to option, select the VPC Endpoint Alias. Add the respective VPC interface endpoint DNS hostname that is not Availability Zone specific (for example, vpce-0073b71485b9ad255-mu7cd69m.ssm.ap-south-1.vpce.amazonaws.com).
Figure 7. Centralized interface endpoint step-by-step guide (Step 3)

Figure 7. Centralized interface endpoint step-by-step guide (Step 3)

  1. Associate each spoke VPC with the available PHZs. Use the CLI command aws route53 associate-vpc-with-hosted-zone –hosted-zone-id <hosted-zone-id> –vpc VPCRegion=<region>,VPCId=<vpc-id> –region <AWS-REGION>.

This concludes the configuration for centralized VPC interface endpoints for AWS KMS and Secrets Manager. You can learn more about cross-account PHZ association configuration.

After successfully implementing centralized VPC interface endpoints, the customer has decided to centralize:

  1. Internet access.
  2. Packet inspection for East-West and North-South internet traffic using a pair of firewalls that support the Geneve protocol.

To achieve this, the customer should use the AWS Gateway Load Balancer (GWLB), Amazon VPC endpoint services, GWLB endpoints, and transit gateway route table configurations.

Figure 8. Illustrated security-egress VPC infrastructures and route table configuration

Figure 8. Illustrated security-egress VPC infrastructures and route table configuration

To accomplish these centralization requirements, the customer should create:

  1. A VPC with the name security-egress VPC.
  2. A GWLB, an autoscaling group with at least two instance of the customer’s firewall which are evenly distributed across multiple private subnets in different Availability Zones.
  3. A target group for use with the GWLB. Associate the autoscaling group with this target group.
  4. An AWS endpoint service using the GWLB as the entry point. Then create AWS interface endpoints for this endpoint service inside the same set of private subnets or create a /28 set of subnets for interface endpoints.
  5. Two AWS NAT gateways spread across two public subnets in multiple Availability Zones.
  6. A transit gateway attachment request from the security-egress VPC and ensure that:
    1. Transit gateway appliance mode is enabled for this attachment as it ensures bidirectional traffic forwarding to the same transit gateway attachments.
    2. Transit gateway–specific subnets are used to host the attachment interfaces.
  1. In the security-egress VPC, configure the route tables accordingly.
    1. Private subnet route table.
      1. Add default route to the NAT gateway.
      2. Add summarized routes with a next-hop of Transit Gateway for all networks you intend to route to that are connected to the Transit Gateway.
    1. Public subnet route table.
      1. Add default route to the internet gateway.
      2. Add summarized routes with a next-hop of the GWLB endpoints you intend to route to for all private networks.

Transit Gateway configuration

  1. Create a new transit gateway route table with the name transit-gateway-egress-route-table.
    1. Propagate all spoke and shared services VPCs routes to it.
    2. Associate the security-egress VPC with this route table.
  1. Add a default route to the spoke-tgw-route-table and shared-services-tgw-route-table that points to the security-egress VPC attachment, and remove all VPC attachment routes respectively from both route tables.
Illustrated routing configuration for the transit gateway route tables and VPC route tables

Figure 9. Illustrated routing configuration for the transit gateway route tables and VPC route tables

Illustrated North-South traffic flow from spoke VPC to the internet

Figure 10. Illustrated North-South traffic flow from spoke VPC to the internet

Figure 11. Illustrated East-West traffic flow between spoke VPC and shared services VPC

Figure 11. Illustrated East-West traffic flow between spoke VPC and shared services VPC

Conclusion

In this blog post, we went on a network architecture journey that started with a use case of routing domain isolation. This is a scenario most customers confront when getting started with Transit Gateway. Gradually, we built upon this use case and exponentially increased its complexity by exploring other real-world scenarios that customers confront when designing multiple region networks across multiple AWS accounts.

Regardless of the complexity, these use cases were accompanied by guidance that helps customers achieve a reduction in cost and complexity throughout their entire network on AWS.

When designing your networks, design for scale. Use AWS services that let you achieve scale without the complexity of managing the underlying infrastructure.

Also, simplify your network through the technique of centralizing repeatable resources. If more than one VPC requires access to the same resource, then find ways to centralize access to this resource which reduces the proliferation of these resources. DNS, packet inspection, and VPC interface endpoints are good examples of things that should be centralized.

Thank you for reading. Hopefully you found this blog post useful.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: How to Prepare Large Text Files for Processing with Amazon Translate and Amazon Comprehend

Post Syndicated from Veeresh Shringari original https://aws.amazon.com/blogs/architecture/field-notes-how-to-prepare-large-text-files-for-processing-with-amazon-translate-and-amazon-comprehend/

Biopharmaceutical manufacturing is a highly regulated industry where deviation documents are used to optimize manufacturing processes. Deviation documents in biopharmaceutical manufacturing processes are geographically diverse, spanning multiple countries and languages. The document corpus is complex, with additional requirements for complete encryption. Therefore, to reduce downtime and increase process efficiency, it is critical to automate the ingestion and understanding of deviation documents. For this workflow, a large biopharma customer needed to translate and classify documents at their manufacturing site.

The customer’s challenge included translation and classification of paragraph-sized text documents into statement types. First, the tokenizer previously used was failing for certain languages. Second, post-tokenization, big paragraphs were needed to be sliced into sizes smaller than 5,000 bytes to facilitate consumption into Amazon Translate and Amazon Comprehend. Because each sentence and paragraphs were of differing sizes, the customer needed to slice them so that each sentence and paragraph did not lose their context and meaning.

This blog post describes a solution to tokenize text documents into appropriate-sized chunks for easy consumption by Amazon Translate and Amazon Comprehend.

Overview of solution

The solution is divided into the following steps. Text data coming from the AWS Glue output is transformed and stored in Amazon Simple Storage Service (Amazon S3) in a .txt file. This transformed data is passed into the sentence tokenizer with slicing and encryption using AWS Key Management Service (AWS KMS). This data is now ready to be fed into Amazon Translate and Amazon Comprehend, and then to a Bidirectional Encoder Representations from Transformers (BERT) model for clustering. All of the models are developed and managed in Amazon SageMaker.

Prerequisites

For this walkthrough, you should have the following prerequisites:

The architecture in Figure 1 shows a complete document classification and clustering workflow running the sentence tokenizer solution (step 4) as an input to Amazon Translate and Amazon Comprehend. The complete architecture also uses AWS Glue crawlers, Amazon Athena, Amazon S3 , AWS KMS, and SageMaker.

Figure 1. Higher level architecture describing use of the tokenizer in the system

Figure 1. Higher level architecture describing use of the tokenizer in the system

Solution steps

  1. Ingest the streaming data from the daily pharma supply chain incidents from the AWS Glue crawlers and Athena-based view tables. AWS Glue is used for ETL (extract, transform, and load), while Athena helps to analyze the data in Amazon S3 for its integrity.
  2. Ingest the streaming data into Amazon S3, which is AWS KMS encrypted. This limits any unauthorized access to the secured files, as required for the healthcare domain.
  3. Enable the CloudWatch logs. CloudWatch logs help to store, monitor, and access error messages logged by SageMaker.
  4. Open the SageMaker notebook using AWS console, and navigate to the integrated development environment (IDE) with Python notebook.

Solution description

Initialize the Amazon S3 client, and enable the get_execution role.

Figure 2. Code sample to initialize Amazon S3 Client execution roles

Figure 3 shows the code for tokenizing large paragraphs into sentences. This helps to feed a sentence of 5,000 byte chunks to Amazon Translate and Amazon Comprehend. Additionally, in the regulated environment, data at rest and in transition, is encrypted using AWS KMS (using S3 IO object) before chunking into 5,000-byte size files using last-in-first-out (LIFO) process.

Figure 3. Code sample with file chunking function and AWS KMS encryption

Figure 3. Code sample with file chunking function and AWS KMS encryption

Figure 4 shows the function for writing the file chunks to objects in Amazon S3, and objects are AWS KMS encrypted.

Figure 4. Code sample for writing chunked 5,000-byte sized data to Amazon S3

Code sample

The following example code details the tokenizer and chunking tool which we subsequently run through SageMaker:
https://gitlab.aws.dev/shringv/aws-samples-aws-tokenizer-slicing-file

Cleaning up

To avoid incurring future charges, delete the resources (like S3 objects) used for the practice files after you have completed implementation of the solution.

Conclusion

In this blog post, we presented a solution which incorporates sentence-level tokenization with rules governing expected sentence size. The solution includes automation scripts to reduce bigger files into smaller chunked sizes of 5,000 bytes to facilitate Amazon Translate and Amazon Comprehend. The solution is effective for tokenizing and chunking complex environments with multi-language files. Furthermore, the solution uses file exchange security by using AWS KMS, as required by regulated industries.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.