
Best practices for managing Terraform State files in AWS CI/CD Pipeline

Post Syndicated from Arun Kumar Selvaraj original https://aws.amazon.com/blogs/devops/best-practices-for-managing-terraform-state-files-in-aws-ci-cd-pipeline/

Introduction

Today customers want to reduce manual operations for deploying and maintaining their infrastructure. The recommended method to deploy and manage infrastructure on AWS is to follow the Infrastructure as Code (IaC) model using tools like AWS CloudFormation, AWS Cloud Development Kit (AWS CDK) or Terraform.

One of the critical components of terraform is managing the state file, which keeps track of your configuration and resources. When you run terraform in an AWS CI/CD pipeline, the state file has to be stored in a secure, shared location that the pipeline can access. You also need a mechanism to lock the state file when multiple developers on the team want to access it at the same time.

In this blog post, we will explain how to manage terraform state files in AWS, cover best practices for configuring them, and walk through an example of how you can manage them efficiently in a Continuous Integration pipeline in AWS using AWS Developer Tools such as AWS CodeCommit and AWS CodeBuild. This blog post assumes you have a basic knowledge of terraform, AWS Developer Tools and AWS CI/CD pipelines. Let’s dive in!

Challenges with handling state files

By default, the state file is stored locally where terraform runs, which is not a problem if you are a single developer working on the deployment. If you are not, storing state files locally is not ideal, as you may run into the following problems:

  • When working in teams or collaborative environments, multiple people need access to the state file
  • Data in the state file is stored in plain text which may contain secrets or sensitive information
  • Local files can get lost, corrupted, or deleted

Best practices for handling state files

The recommended practice for managing state files is to use terraform’s built-in support for remote backends. These are:

Remote backend on Amazon Simple Storage Service (Amazon S3): You can configure terraform to store state files in an Amazon S3 bucket, which provides a durable and scalable storage solution. Storing state on Amazon S3 also enables collaboration by allowing you to share the state file with others.

Remote backend on Amazon S3 with Amazon DynamoDB: In addition to using an Amazon S3 bucket for managing the files, you can use an Amazon DynamoDB table to lock the state file. This will allow only one person to modify a particular state file at any given time. It will help to avoid conflicts and enable safe concurrent access to the state file.

There are other options available as well, such as a remote backend on Terraform Cloud and third-party backends. Ultimately, the best method for managing terraform state files on AWS will depend on your specific requirements.

When deploying terraform on AWS, the preferred choice of managing state is using Amazon S3 with Amazon DynamoDB.

AWS configurations for managing state files

  1. Create an Amazon S3 bucket using terraform. Implement security measures for the Amazon S3 bucket by creating an AWS Identity and Access Management (AWS IAM) policy or Amazon S3 bucket policy. You can then restrict access, configure object versioning for data protection and recovery, and enable server-side encryption (AES256 with SSE-S3, or SSE-KMS for more control over the encryption keys).
  2. Next, create an Amazon DynamoDB table using terraform with the primary key set to LockID. You can also set any additional configuration options such as read/write capacity units. Once the table is created, you will configure the terraform backend to use it for state locking by specifying the table name in the terraform block of your configuration (see the sketch after this list).
  3. For a single AWS account with multiple environments and projects, you can use a single Amazon S3 bucket. If you have multiple applications in multiple environments across multiple AWS accounts, you can create one Amazon S3 bucket for each account. In that Amazon S3 bucket, you can create appropriate folders for each environment, storing project state files with specific prefixes.
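
As an illustration, a minimal Terraform sketch of such a state bucket and lock table is shown below. The names reuse the example values that appear later in this post (tfbackend-bucket and tfstate-lock), the split versioning and encryption resources assume AWS provider v4 or later, and the repository's actual main.tf may differ.

resource "aws_s3_bucket" "tf_state" {
  bucket = "tfbackend-bucket"   # placeholder; use your own globally unique bucket name
}

resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"          # keep prior state versions for recovery
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"  # or "aws:kms" with a kms_master_key_id for SSE-KMS
    }
  }
}

resource "aws_dynamodb_table" "tf_lock" {
  name         = "tfstate-lock" # placeholder; the LockID hash key is required for state locking
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}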

Now that you know how to handle terraform state files on AWS, let’s look at an example of how you can configure them in a Continuous Integration pipeline in AWS.

Architecture


Figure 1: Example architecture on how to use terraform in an AWS CI pipeline

This diagram outlines the workflow implemented in this blog:

  1. The AWS CodeCommit repository contains the application code
  2. The AWS CodeBuild job contains the buildspec files and references the source code in AWS CodeCommit
  3. The AWS Lambda function contains the application code created after running terraform apply
  4. Amazon S3 contains the state file created after running terraform apply. Amazon DynamoDB locks the state file present in Amazon S3

Implementation

Pre-requisites

Before you begin, you must complete the following prerequisites:

Setting up the environment

  1. You need an AWS access key ID and secret access key to configure AWS CLI. To learn more about configuring the AWS CLI, follow these instructions.
  2. Clone the repo for complete example: git clone https://github.com/aws-samples/manage-terraform-statefiles-in-aws-pipeline
  3. After cloning, you will see the following folder structure:

Figure 2: AWS CodeCommit repository structure

Let’s break down the terraform code into 2 parts – one for preparing the infrastructure and another for preparing the application.

Preparing the Infrastructure

  1. The main.tf file is the core component that does the following:
      • It creates an Amazon S3 bucket to store the state file. We configure the bucket ACL, bucket versioning and encryption so that the state file is secure.
      • It creates an Amazon DynamoDB table which will be used to lock the state file.
      • It creates two AWS CodeBuild projects, one for ‘terraform plan’ and another for ‘terraform apply’.

    Note – It also has the code block (commented out by default) to create the AWS Lambda function which you will use at a later stage.

  2. The AWS CodeBuild projects should be able to access Amazon S3, Amazon DynamoDB, AWS CodeCommit and AWS Lambda. So, the AWS IAM role with the permissions required to access these resources is created via the iam.tf file (a minimal sketch of such a role follows this list).
  3. Next you will find two buildspec files named buildspec-plan.yaml and buildspec-apply.yaml that will execute the terraform commands – terraform plan and terraform apply respectively.
  4. Modify the AWS Region in the provider.tf file.
  5. Update the Amazon S3 bucket name, Amazon DynamoDB table name, AWS CodeBuild compute types, and AWS Lambda role and policy names to the required values using the variable.tf file. You can also use this file to easily customize parameters for different environments.
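
As referenced in step 2 above, the following is a minimal sketch of what a CodeBuild service role for this pipeline could look like. The role name, policy name, and the broad action and resource lists are illustrative assumptions rather than the repository's actual iam.tf; in practice, scope the permissions down to the specific bucket, table, repository, and function ARNs.

data "aws_iam_policy_document" "codebuild_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["codebuild.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "codebuild_role" {
  name               = "tf-codebuild-role"   # illustrative name
  assume_role_policy = data.aws_iam_policy_document.codebuild_assume.json
}

resource "aws_iam_role_policy" "codebuild_permissions" {
  name = "tf-codebuild-permissions"
  role = aws_iam_role.codebuild_role.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "s3:GetObject", "s3:PutObject", "s3:ListBucket",               # read/write the state file
        "dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem", # acquire and release the state lock
        "codecommit:GitPull",                                          # check out the source repository
        "logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents", # CodeBuild logging
        "lambda:*"                                                     # deploy the sample function
      ]
      Resource = "*"   # scope down to specific ARNs in a real setup
    }]
  })
}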

With this, the infrastructure setup is complete.

You can use your local terminal to execute the below commands in the same order to deploy the above-mentioned resources in your AWS account.

terraform init
terraform validate
terraform plan
terraform apply

Once the apply is successful and all the above resources have been deployed in your AWS account, proceed with deploying your application.

Preparing the Application

  1. In the cloned repository, use the backend.tf file to configure your own Amazon S3 backend to store the state file. By default, it has the below values. You can override them with your required values; a sketch of the full backend block follows these values.
bucket = "tfbackend-bucket" 
key    = "terraform.tfstate" 
region = "eu-central-1"
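
These values belong inside a terraform backend block. The sketch below shows one way the complete block could look; the dynamodb_table and encrypt arguments are assumptions added here to illustrate how the lock table created earlier fits in, so refer to the repository's backend.tf for the exact contents.

terraform {
  backend "s3" {
    bucket         = "tfbackend-bucket"
    key            = "terraform.tfstate"
    region         = "eu-central-1"
    dynamodb_table = "tfstate-lock"   # assumption: the lock table created by main.tf
    encrypt        = true             # assumption: encrypt the state object at rest
  }
}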
  2. The repository has sample Python code stored in main.py that returns a simple message when invoked.
  3. In the main.tf file, you can find the below block of code to create and deploy the Lambda function that uses the main.py code (uncomment these code blocks). A filled-in sketch of what these blocks typically contain follows the snippet below.
data "archive_file" "lambda_archive_file" {
    ……
}

resource "aws_lambda_function" "lambda" {
    ……
}
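
For illustration only, the elided blocks above typically contain something like the following. The handler name, runtime, and role reference are assumptions; the authoritative definitions are the commented-out blocks in the repository's main.tf.

data "archive_file" "lambda_archive_file" {
  type        = "zip"
  source_file = "${path.module}/main.py"     # assumption: package the sample Python handler
  output_path = "${path.module}/lambda.zip"
}

resource "aws_lambda_function" "lambda" {
  function_name    = "tf-codebuild"                  # matches the function name used in the test step below
  role             = aws_iam_role.lambda_role.arn    # assumption: role defined in iam.tf
  handler          = "main.lambda_handler"           # assumption: handler entry point in main.py
  runtime          = "python3.9"                     # assumption: any supported Python runtime works
  filename         = data.archive_file.lambda_archive_file.output_path
  source_code_hash = data.archive_file.lambda_archive_file.output_base64sha256
}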
  4. Now you can deploy the application using AWS CodeBuild instead of running terraform commands locally, which is the main advantage of using AWS CodeBuild here.
  5. Run the two AWS CodeBuild projects to execute terraform plan and terraform apply again.
  6. Once successful, you can verify your deployment by testing the code in AWS Lambda. To test the Lambda function (console):
    • Open the AWS Lambda console and select your function “tf-codebuild”
    • In the navigation pane, in the Code section, click Test to create a test event
    • Provide your required name, for example “test-lambda”
    • Accept the default values and click Save
    • Click Test again to trigger your test event “test-lambda”

It should return the sample message you provided in your main.py file. In the default case, it will display the “Hello from AWS Lambda !” message, as shown below.


Figure 3: Sample AWS Lambda function response

  7. To verify your state file, go to the Amazon S3 console and select the backend bucket created (tfbackend-bucket). It will contain your state file.

Figure 4: Amazon S3 bucket with terraform state file

  8. Open the Amazon DynamoDB console and check your table tfstate-lock; it will have an entry with a LockID.

Figure 5: Amazon DynamoDB table with LockID

Thus, you have securely stored and locked your terraform state file using terraform backend in a Continuous Integration pipeline.

Cleanup

To delete all the resources created as part of the repository, run the below command from your terminal.

terraform destroy

Conclusion

In this blog post, we explored the fundamentals of terraform state files, discussed best practices for their secure storage within AWS environments, and covered mechanisms for locking these files to prevent conflicting concurrent changes by team members. Finally, we showed you an example of how you can manage them efficiently in a Continuous Integration pipeline in AWS.

You can apply the same methodology to manage state files in a Continuous Delivery pipeline in AWS. For more information, see CI/CD pipeline on AWS, Terraform backends types, Purpose of terraform state.

Arun Kumar Selvaraj

Arun Kumar Selvaraj is a Cloud Infrastructure Architect with AWS Professional Services. He loves building world class capability that provides thought leadership, operating standards and platform to deliver accelerated migration and development paths for his customers. His interests include Migration, CCoE, IaC, Python, DevOps, Containers and Networking.

Manasi Bhutada

Manasi Bhutada is an ISV Solutions Architect based in the Netherlands. She helps customers design and implement well architected solutions in AWS that address their business problems. She is passionate about data analytics and networking. Beyond work she enjoys experimenting with food, playing pickleball, and diving into fun board games.

Enforce fine-grained access control on Open Table Formats via Amazon EMR integrated with AWS Lake Formation

Post Syndicated from Raymond Lai original https://aws.amazon.com/blogs/big-data/enforce-fine-grained-access-control-on-open-table-formats-via-amazon-emr-integrated-with-aws-lake-formation/

With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta Lake. This allows you to simplify security and governance over transactional data lakes by providing table-, column-, and row-level access controls for your Apache Spark jobs. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making. You can build a lakehouse architecture using Amazon EMR integrated with Lake Formation for FGAC. This combination of services allows you to conduct data analysis on your transactional data lake while ensuring secure and controlled access.

The Amazon EMR record server component supports table-, column-, row-, cell-, and nested attribute-level data filtering functionality. It extends support to Hive, Apache Hudi, Apache Iceberg, and Delta Lake formats for both read operations (including time travel and incremental queries) and write operations (DML statements such as INSERT). Additionally, with version 6.15, Amazon EMR introduces access control protection for its application web interfaces, such as the on-cluster Spark History Server, YARN Timeline Server, and YARN Resource Manager UI.

In this post, we demonstrate how to implement FGAC on Apache Hudi tables using Amazon EMR integrated with Lake Formation.

Transaction data lake use case

Amazon EMR customers often use Open Table Formats to support their ACID transaction and time travel needs in a data lake. By preserving historical versions, data lake time travel provides benefits such as auditing and compliance, data recovery and rollback, reproducible analysis, and data exploration at different points in time.

Another popular transaction data lake use case is incremental query. Incremental query refers to a query strategy that focuses on processing and analyzing only the new or updated data within a data lake since the last query. The key idea behind incremental queries is to use metadata or change tracking mechanisms to identify the new or modified data since the last query. By identifying these changes, the query engine can optimize the query to process only the relevant data, significantly reducing the processing time and resource requirements.

Solution overview

In this post, we demonstrate how to implement FGAC on Apache Hudi tables using Amazon EMR on Amazon Elastic Compute Cloud (Amazon EC2) integrated with Lake Formation. Apache Hudi is an open source transactional data lake framework that greatly simplifies incremental data processing and the development of data pipelines. The new FGAC feature supports all OTFs; we demonstrate with Hudi here and will cover the other OTF table formats in follow-up posts. We use notebooks in Amazon SageMaker Studio to read and write Hudi data via different user access permissions through an EMR cluster. This reflects real-world data access scenarios. For example, an engineering user may need full data access to troubleshoot issues on a data platform, whereas data analysts may only need access to a subset of that data that doesn’t contain personally identifiable information (PII). Integrating with Lake Formation via the Amazon EMR runtime role further enables you to improve your data security posture and simplifies data control management for Amazon EMR workloads. This solution ensures a secure and controlled environment for data access, meeting the diverse needs and security requirements of different users and roles in an organization.

The following diagram illustrates the solution architecture.

Solution architecture

We conduct a data ingestion process to upsert (update and insert) a Hudi dataset to an Amazon Simple Storage Service (Amazon S3) bucket, and persist or update the table schema in the AWS Glue Data Catalog. With zero data movement, we can query the Hudi table governed by Lake Formation via various AWS services, such as Amazon Athena, Amazon EMR, and Amazon SageMaker.

When users submit a Spark job through any EMR cluster endpoints (EMR Steps, Livy, EMR Studio, and SageMaker), Lake Formation validates their privileges and instructs the EMR cluster to filter out sensitive data such as PII data.

This solution has three different types of users with different levels of permissions to access the Hudi data:

  • hudi-db-creator-role – This is used by the data lake administrator, who has privileges to carry out DDL operations such as creating, modifying, and deleting database objects. They can define data filtering rules on Lake Formation for row-level and column-level data access control. These FGAC rules ensure that the data lake is secured and fulfills the required data privacy regulations.
  • hudi-table-pii-role – This is used by engineering users. The engineering users are capable of carrying out time travel and incremental queries on both Copy-on-Write (CoW) and Merge-on-Read (MoR) tables. They also have the privilege to access PII data as of any timestamp.
  • hudi-table-non-pii-role – This is used by data analysts. Data analysts’ data access rights are governed by FGAC rules controlled by data lake administrators. They do not have visibility into columns containing PII data, such as names and addresses. Additionally, they can’t access rows of data that don’t fulfill certain conditions. For example, users can only access data rows that belong to their own country.

Prerequisites

You can download the three notebooks used in this post from the GitHub repo.

Before you deploy the solution, make sure you have the following:

Complete the following steps to set up your permissions:

  1. Log in to your AWS account with your admin IAM user.

Make sure you are in the us-east-1 Region.

  2. Create an S3 bucket in the us-east-1 Region (for example, emr-fgac-hudi-us-east-1-<ACCOUNT ID>).

Next, we enable Lake Formation by changing the default permission model.

  1. Sign in to the Lake Formation console as the administrator user.
  2. Choose Data Catalog settings under Administration in the navigation pane.
  3. Under Default permissions for newly created databases and tables, deselect Use only IAM access control for new databases and Use only IAM access control for new tables in new databases.
  4. Choose Save.

Data Catalog settings

Alternatively, if you started Lake Formation with the default option, you need to revoke IAMAllowedPrincipals on the resources (databases and tables) that you create.

Finally, we create a key pair for Amazon EMR.

  1. On the Amazon EC2 console, choose Key pairs in the navigation pane.
  2. Choose Create key pair.
  3. For Name, enter a name (for example, emr-fgac-hudi-keypair).
  4. Choose Create key pair.

Create key pair

The generated key pair (for this post, emr-fgac-hudi-keypair.pem) will be saved to your local computer.

Next, we create an AWS Cloud9 interactive development environment (IDE).

  1. On the AWS Cloud9 console, choose Environments in the navigation pane.
  2. Choose Create environment.
  3. For Name, enter a name (for example, emr-fgac-hudi-env).
  4. Keep the other settings as default.

Cloud9 environment

  5. Choose Create.
  6. When the IDE is ready, choose Open to open it.

cloud9 environment

  7. In the AWS Cloud9 IDE, on the File menu, choose Upload Local Files.

Upload local file

  8. Upload the key pair file (emr-fgac-hudi-keypair.pem).
  9. Choose the plus sign and choose New Terminal.

new terminal

  10. In the terminal, input the following command lines:
#Create encryption certificates for EMR in transit encryption
openssl req -x509 \
-newkey rsa:1024 \
-keyout privateKey.pem \
-out certificateChain.pem \
-days 365 \
-nodes \
-subj '/C=US/ST=Washington/L=Seattle/O=MyOrg/OU=MyDept/CN=*.compute.internal'
cp certificateChain.pem trustedCertificates.pem

# Zip certificates
zip -r -X my-certs.zip certificateChain.pem privateKey.pem trustedCertificates.pem

# Upload the certificates zip file to S3 bucket
# Replace <ACCOUNT ID> with your AWS account ID
aws s3 cp ./my-certs.zip s3://emr-fgac-hudi-us-east-1-<ACCOUNT ID>/my-certs.zip

Note that the example code is a proof of concept for demonstration purposes only. For production systems, use a trusted certification authority (CA) to issue certificates. Refer to Providing certificates for encrypting data in transit with Amazon EMR encryption for details.

Deploy the solution via AWS CloudFormation

We provide an AWS CloudFormation template that automatically sets up the following services and components:

  • An S3 bucket for the data lake. It contains the sample TPC-DS dataset.
  • An EMR cluster with security configuration and public DNS enabled.
  • EMR runtime IAM roles with Lake Formation fine-grained permissions:
    • <STACK-NAME>-hudi-db-creator-role – This role is used to create Apache Hudi database and tables.
    • <STACK-NAME>-hudi-table-pii-role – This role provides permission to query all columns of Hudi tables, including columns with PII.
    • <STACK-NAME>-hudi-table-non-pii-role – This role provides permission to query Hudi tables that have filtered out PII columns by Lake Formation.
  • SageMaker Studio execution roles that allow the users to assume their corresponding EMR runtime roles.
  • Networking resources such as VPC, subnets, and security groups.

Complete the following steps to deploy the resources:

  1. Choose Quick create stack to launch the CloudFormation stack.
  2. For Stack name, enter a stack name (for example, rsv2-emr-hudi-blog).
  3. For Ec2KeyPair, enter the name of your key pair.
  4. For IdleTimeout, enter an idle timeout for the EMR cluster to avoid paying for the cluster when it’s not being used.
  5. For InitS3Bucket, enter the S3 bucket name you created to save the Amazon EMR encryption certificate .zip file.
  6. For S3CertsZip, enter the S3 URI of the Amazon EMR encryption certificate .zip file.

CloudFormation template

  7. Select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  8. Choose Create stack.

The CloudFormation stack deployment takes around 10 minutes.

Set up Lake Formation for Amazon EMR integration

Complete the following steps to set up Lake Formation:

  1. On the Lake Formation console, choose Application integration settings under Administration in the navigation pane.
  2. Select Allow external engines to filter data in Amazon S3 locations registered with Lake Formation.
  3. Choose Amazon EMR for Session tag values.
  4. Enter your AWS account ID for AWS account IDs.
  5. Choose Save.

LF - Application integration settings

  1. Choose Databases under Data Catalog in the navigation pane.
  2. Choose Create database.
  3. For Name, enter default.
  4. Choose Create database.

LF - create database

  1. Choose Data lake permissions under Permissions in the navigation pane.
  2. Choose Grant.
  3. Select IAM users and roles.
  4. Choose your IAM roles.
  5. For Databases, choose default.
  6. For Database permissions, select Describe.
  7. Choose Grant.

LF - Grant data permissions

Copy Hudi JAR file to Amazon EMR HDFS

To use Hudi with Jupyter notebooks, you need to complete the following steps for the EMR cluster, which includes copying a Hudi JAR file from the Amazon EMR local directory to its HDFS storage, so that you can configure a Spark session to use Hudi:

  1. Authorize inbound SSH traffic (port 22).
  2. Copy the value for Primary node public DNS (for example, ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com) from the EMR cluster Summary section.

EMR cluster summary

  3. Go back to the previous AWS Cloud9 terminal you used to create the EC2 key pair.
  4. Run the following command to SSH into the EMR primary node. Replace the placeholder with your EMR DNS hostname:
chmod 400 emr-fgac-hudi-keypair.pem
ssh -i emr-fgac-hudi-keypair.pem hadoop@<EMR-primary-node-public-DNS>
  5. Run the following command to copy the Hudi JAR file to HDFS:
hdfs dfs -mkdir -p /apps/hudi/lib
hdfs dfs -copyFromLocal /usr/lib/hudi/hudi-spark-bundle.jar /apps/hudi/lib/hudi-spark-bundle.jar

Create the Hudi database and tables in Lake Formation

Now we’re ready to create the Hudi database and tables with FGAC enabled by the EMR runtime role. The EMR runtime role is an IAM role that you can specify when you submit a job or query to an EMR cluster.

Grant database creator permission

First, let’s grant the Lake Formation database creator permission to <STACK-NAME>-hudi-db-creator-role:

  1. Log in to your AWS account as an administrator.
  2. On the Lake Formation console, choose Administrative roles and tasks under Administration in the navigation pane.
  3. Confirm that your AWS login user has been added as a data lake administrator.
  4. In the Database creator section, choose Grant.
  5. For IAM users and roles, choose <STACK-NAME>-hudi-db-creator-role.
  6. For Catalog permissions, select Create database.
  7. Choose Grant.

Register the data lake location

Next, let’s register the S3 data lake location in Lake Formation:

  1. On the Lake Formation console, choose Data lake locations under Administration in the navigation pane.
  2. Choose Register location.
  3. For Amazon S3 path, choose Browse and choose the data lake S3 bucket (<STACK_NAME>-s3bucket-XXXXXXX) created from the CloudFormation stack.
  4. For IAM role, choose <STACK-NAME>-hudi-db-creator-role.
  5. For Permission mode, select Lake Formation.
  6. Choose Register location.

LF - Register location

Grant data location permission

Next, we need to grant <STACK-NAME>-hudi-db-creator-role the data location permission:

  1. On the Lake Formation console, choose Data locations under Permissions in the navigation pane.
  2. Choose Grant.
  3. For IAM users and roles, choose <STACK-NAME>-hudi-db-creator-role.
  4. For Storage locations, enter the S3 bucket (<STACK_NAME>-s3bucket-XXXXXXX).
  5. Choose Grant.

LF - Grant permissions

Connect to the EMR cluster

Now, let’s use a Jupyter notebook in SageMaker Studio to connect to the EMR cluster with the database creator EMR runtime role:

  1. On the SageMaker console, choose Domains in the navigation pane.
  2. Choose the domain <STACK-NAME>-Studio-EMR-LF-Hudi.
  3. On the Launch menu next to the user profile <STACK-NAME>-hudi-db-creator, choose Studio.

SM - Domain details

  1. Download the notebook rsv2-hudi-db-creator-notebook.
  2. Choose the upload icon.

SM Studio - Upload

  1. Choose the downloaded Jupyter notebook and choose Open.
  2. Open the uploaded notebook.
  3. For Image, choose SparkMagic.
  4. For Kernel, choose PySpark.
  5. Leave the other configurations as default and choose Select.

SM Studio - Change environment

  1. Choose Cluster to connect to the EMR cluster.

SM Studio - connect EMR cluster

  1. Choose the EMR on EC2 cluster (<STACK-NAME>-EMR-Cluster) created with the CloudFormation stack.
  2. Choose Connect.
  3. For EMR execution role, choose <STACK-NAME>-hudi-db-creator-role.
  4. Choose Connect.

Create database and tables

Now you can follow the steps in the notebook to create the Hudi database and tables. The major steps are as follows:

  1. When you start the notebook, configure "spark.sql.catalog.spark_catalog.lf.managed": "true" to inform Spark that spark_catalog is protected by Lake Formation.
  2. Create Hudi tables using the following Spark SQL.
%%sql 
CREATE TABLE IF NOT EXISTS ${hudi_catalog}.${hudi_db}.${cow_table_name_sql}(
    c_customer_id string,
    c_birth_country string,
    c_customer_sk integer,
    c_email_address string,
    c_first_name string,
    c_last_name string,
    ts bigint
) USING hudi
LOCATION '${cow_table_location_sql}'
OPTIONS (
  type = 'cow',
  primaryKey = '${hudi_primary_key}',
  preCombineField = '${hudi_pre_combined_field}'
 ) 
PARTITIONED BY (${hudi_partitioin_field});

  3. Insert data from the source table to the Hudi tables.
%%sql
INSERT OVERWRITE ${hudi_catalog}.${hudi_db}.${cow_table_name_sql}
SELECT 
    c_customer_id ,  
    c_customer_sk,
    c_email_address,
    c_first_name,
    c_last_name,
    unix_timestamp(current_timestamp()) AS ts,
    c_birth_country
FROM ${src_df_view}
WHERE c_birth_country = 'HONG KONG' OR c_birth_country = 'CHINA' 
LIMIT 1000
  4. Insert data again into the Hudi tables.
%%sql
INSERT INTO ${hudi_catalog}.${hudi_db}.${cow_table_name_sql}
SELECT 
    c_customer_id ,  
    c_customer_sk,
    c_email_address,
    c_first_name,
    c_last_name,
    unix_timestamp(current_timestamp()) AS ts,
    c_birth_country
FROM ${insert_into_view}

Query the Hudi tables via Lake Formation with FGAC

After you create the Hudi database and tables, you’re ready to query the tables using fine-grained access control with Lake Formation. We have created two types of Hudi tables: Copy-On-Write (COW) and Merge-On-Read (MOR). The COW table stores data in a columnar format (Parquet), and each update creates a new version of files during a write. This means that for every update, Hudi rewrites the entire file, which can be more resource-intensive but provides faster read performance. MOR, on the other hand, is introduced for cases where COW may not be optimal, particularly for write- or change-heavy workloads. In a MOR table, each time there is an update, Hudi writes only the row for the changed record, which reduces cost and enables low-latency writes. However, the read performance might be slower compared to COW tables.

Grant table access permission

We use the IAM role <STACK-NAME>-hudi-table-pii-role to query Hudi COW and MOR tables containing PII columns. We first grant the table access permission via Lake Formation:

  1. On the Lake Formation console, choose Data lake permissions under Permissions in the navigation pane.
  2. Choose Grant.
  3. Choose <STACK-NAME>-hudi-table-pii-role for IAM users and roles.
  4. Choose the rsv2_blog_hudi_db_1 database for Databases.
  5. For Tables, choose the four Hudi tables you created in the Jupyter notebook.

LF - Grant data permissions

  6. For Table permissions, select Select.
  7. Choose Grant.

LF - table permissions

Query PII columns

Now you’re ready to run the notebook to query the Hudi tables. Let’s follow similar steps to the previous section to run the notebook in SageMaker Studio:

  1. On the SageMaker console, navigate to the <STACK-NAME>-Studio-EMR-LF-Hudi domain.
  2. On the Launch menu next to the <STACK-NAME>-hudi-table-reader user profile, choose Studio.
  3. Upload the downloaded notebook rsv2-hudi-table-pii-reader-notebook.
  4. Open the uploaded notebook.
  5. Repeat the notebook setup steps and connect to the same EMR cluster, but use the role <STACK-NAME>-hudi-table-pii-role.

At the current stage, an FGAC-enabled EMR cluster needs to query Hudi’s commit time column to perform incremental queries and time travel. It does not support Spark’s “timestamp as of” syntax or Spark.read(). We are actively working on incorporating support for both in future Amazon EMR releases with FGAC enabled.

You can now follow the steps in the notebook. The following are some highlighted steps:

  1. Run a snapshot query.
%%sql 
SELECT c_birth_country, count(*) FROM ${hudi_catalog}.${hudi_db}.${cow_table_name_sql} GROUP BY c_birth_country;
  2. Run an incremental query.
incremental_df = spark.sql(f"""
SELECT * FROM {HUDI_CATALOG}.{HUDI_DATABASE}.{COW_TABLE_NAME_SQL} WHERE _hoodie_commit_time >= {commit_ts[-1]}
""")

incremental_df.createOrReplaceTempView("incremental_view")
%%sql
SELECT 
    c_birth_country, 
    count(*) 
FROM incremental_view
GROUP BY c_birth_country;
  3. Run a time travel query.
%%sql
SELECT
    c_birth_country, COUNT(*) as count
FROM ${hudi_catalog}.${hudi_db}.${cow_table_name_sql}
WHERE _hoodie_commit_time IN
(
    SELECT DISTINCT _hoodie_commit_time FROM ${hudi_catalog}.${hudi_db}.${cow_table_name_sql} ORDER BY _hoodie_commit_time LIMIT 1 
)
GROUP BY c_birth_country
  4. Run MOR read-optimized and real-time table queries.
%%sql
SELECT
    a.email_label,
    count(*)
FROM (
    SELECT
        CASE
            WHEN c_email_address = 'UNKNOWN' THEN 'UNKNOWN'
            ELSE 'NOT_UNKNOWN'
        END AS email_label
    FROM ${hudi_catalog}.${hudi_db}.${mor_table_name_sql}_ro
    WHERE c_birth_country = 'HONG KONG'
) a
GROUP BY a.email_label;
%%sql
SELECT *  
FROM ${hudi_catalog}.${hudi_db}.${mor_table_name_sql}_ro
WHERE 
    c_birth_country = 'INDIA' OR c_first_name = 'MASKED'

Query the Hudi tables with column-level and row-level data filters

We use the IAM role <STACK-NAME>-hudi-table-non-pii-role to query Hudi tables. This role is not allowed to query any columns containing PII. We use the Lake Formation column-level and row-level data filters to implement fine-grained access control:

  1. On the Lake Formation console, choose Data filters under Data Catalog in the navigation pane.
  2. Choose Create new filter.
  3. For Data filter name, enter customer-pii-filter.
  4. Choose rsv2_blog_hudi_db_1 for Target database.
  5. Choose rsv2_blog_hudi_mor_sql_dl_customer_1 for Target table.
  6. Select Exclude columns and choose the c_customer_id, c_email_address, and c_last_name columns.
  7. Enter c_birth_country != 'HONG KONG' for Row filter expression.
  8. Choose Create filter.

LF - create data filter

  1. Choose Data lake permissions under Permissions in the navigation pane.
  2. Choose Grant.
  3. Choose <STACK-NAME>-hudi-table-non-pii-role for IAM users and roles.
  4. Choose rsv2_blog_hudi_db_1 for Databases.
  5. Choose rsv2_blog_hudi_mor_sql_dl_tpc_customer_1 for Tables.
  6. Choose customer-pii-filter for Data filters.
  7. For Data filter permissions, select Select.
  8. Choose Grant.

LF - Grant data permissions

Let’s follow similar steps to run the notebook in SageMaker Studio:

  1. On the SageMaker console, navigate to the Studio-EMR-LF-Hudi domain.
  2. On the Launch menu for the hudi-table-reader user profile, choose Studio.
  3. Upload the downloaded notebook rsv2-hudi-table-non-pii-reader-notebook and choose Open.
  4. Repeat the notebook setup steps and connect to the same EMR cluster, but select the role <STACK-NAME>-hudi-table-non-pii-role.

You can now follow the steps in the notebook. From the query results, you can see that FGAC via the Lake Formation data filter has been applied. The role can’t see the PII columns c_customer_id, c_last_name, and c_email_address. Also, the rows from HONG KONG have been filtered.

filtered query result

Clean up

After you’re done experimenting with the solution, we recommend cleaning up resources with the following steps to avoid unexpected costs:

  1. Shut down the SageMaker Studio apps for the user profiles.

The EMR cluster will be automatically deleted after the idle timeout value.

  2. Delete the Amazon Elastic File System (Amazon EFS) volume created for the domain.
  3. Empty the S3 buckets created by the CloudFormation stack.
  4. On the AWS CloudFormation console, delete the stack.

Conclusion

In this post, we used Apache Hudi, one type of OTF table, to demonstrate this new feature to enforce fine-grained access control on Amazon EMR. You can define granular permissions in Lake Formation for OTF tables and apply them via Spark SQL queries on EMR clusters. You can also use transactional data lake features such as snapshot queries, incremental queries, time travel, and DML queries. Please note that this new feature covers all OTF tables.

This feature is launched starting from Amazon EMR release 6.15 in all Regions where Amazon EMR is available. With the Amazon EMR integration with Lake Formation, you can confidently manage and process big data, unlocking insights and facilitating informed decision-making while upholding data security and governance.

To learn more, refer to Enable Lake Formation with Amazon EMR and feel free to contact your AWS Solutions Architects, who can be of assistance alongside your data journey.


About the Author

Raymond Lai is a Senior Solutions Architect who specializes in catering to the needs of large enterprise customers. His expertise lies in assisting customers with migrating intricate enterprise systems and databases to AWS, constructing enterprise data warehousing and data lake platforms. Raymond excels in identifying and designing solutions for AI/ML use cases, and he has a particular focus on AWS Serverless solutions and Event Driven Architecture design.

Bin Wang, PhD, is a Senior Analytic Specialist Solutions Architect at AWS, boasting over 12 years of experience in the ML industry, with a particular focus on advertising. He possesses expertise in natural language processing (NLP), recommender systems, diverse ML algorithms, and ML operations. He is deeply passionate about applying ML/DL and big data techniques to solve real-world problems.

Aditya Shah is a Software Development Engineer at AWS. He is interested in Databases and Data warehouse engines and has worked on performance optimisations, security compliance and ACID compliance for engines like Apache Hive and Apache Spark.

Melody Yang is a Senior Big Data Solution Architect for Amazon EMR at AWS. She is an experienced analytics leader working with AWS customers to provide best practice guidance and technical advice in order to assist their success in data transformation. Her areas of interest are open-source frameworks and automation, data engineering and DataOps.

Meet the final cohort of AWS Heroes this year – November 2023

Post Syndicated from Taylor Jacobsen original https://aws.amazon.com/blogs/aws/meet-the-final-cohort-of-aws-heroes-this-year-november-2023/

As 2023 comes to an end, we’re celebrating our final Heroes cohort launch of the year! These technical experts are passionate about helping their local communities build faster on AWS. They’re focused on sharing best practices, solving problems, and even more. We’re thrilled to have them join the AWS Heroes program, and to recognize them for their contributions to the greater AWS community.

Please meet our newest Heroes!

Emin Alemdar – Izmir, Turkey

Container Hero Emin Alemdar is a Solutions Architect at Spacelift where he produces solutions related to Kubernetes, Cloud technologies, and Cloud Native Transformation. In general, his work focuses on these services and technologies, and he shares best practices with the AWS community. Additionally, Emin is a CNCF Ambassador and is part of the HashiCorp Ambassador Program within the open source community.

Richard Fan – Hong Kong

Security Hero Richard Fan is a Security Engineer at ExpressVPN. He is dedicated to helping builders easily adopt AWS, and shares best practices around streamlining cloud governance. Richard has also developed different tools to simplify the experience with AWS security services, such as his nitro-enclave-python-demo project, which helps builders get started on AWS Nitro Enclaves and has been adopted by some AWS workshops. Furthermore, Richard promotes the concept and use cases of enclave technology by partnering with multiple companies to review their AWS Nitro Enclaves offerings.

Takuya Tachibana – Misawa, Japan

Community Hero Takuya Tachibana is the CEO of Heptagon inc. and Director of DigitalCube Co. Ltd. Since 2012, he has been contributing to the Japan AWS User Group (JAWS-UG), and he was the leader from 2016-2017 for all of the JAWS-UGs, representing and overseeing every chapter in Japan. Tachibana has been a speaker at over 100 community and cloud events in Japan and abroad, including AWS Summit Seoul, AWS Summit Beijing, and AWS Community Day APAC.

Learn More

To learn more about the AWS Heroes program or to connect with a Hero near you, please visit the AWS Heroes website.

Taylor

Announcing the latest AWS Heroes – June 2023

Post Syndicated from Taylor Jacobsen original https://aws.amazon.com/blogs/aws/announcing-the-latest-aws-heroes-june-2023/

AWS Heroes dedicate their time to help others build better and faster on AWS. Heroes support and give back to the community in a variety of ways: contributing to open source projects, organizing AWS Community Days, speaking at conferences, leading workshops, mentoring builders, hosting meetups, and much more.

Please welcome and say hello to our newest AWS Heroes!

AJ Stuyvenberg – Boston, USA

Serverless Hero AJ Stuyvenberg is a Staff Engineer at Datadog, and has been a member of the serverless community since early 2017. His work focuses on serverless and distributed system observability. AJ is an open source author and maintains several projects, which improve the serverless developer experience. He has also spoken at multiple conferences, including AWS re:Invent and AWS Summits, and frequently writes about serverless topics on his blog.

Danielle Heberling – Hillsboro, USA

Serverless Hero Danielle Heberling is a software engineer with a background that includes being a musician, teaching at a K-8 public school, and working in technical support. She’s passionate about building things that make the world a better place, whether that be through social change or a good laugh. When she’s not coding or talking about serverless, you can often find her reaching back to her teaching roots by mentoring folks from underrepresented groups that would like to make a career switch into tech.

Dominik Grzywaczewski – Lublin, Poland

Community Hero Dominik Grzywaczewski is a Senior Cloud Site Reliability Engineer at Chaos Gears with more than 15 years of experience in IT. His primary objective is to assist companies in gaining a deeper understanding of Cloud Computing technologies, and effectively leveraging them to drive faster and more secure innovation. Dominik shares his passion by organizing technical meetups and workshops, and consistently collaborates with AWS community members. He also founded the AWS User Group in Lublin (Poland) and co-organizes the AWS Community Day conference in Warsaw (Poland).

Johannes Koch – Hessen, Germany

DevTools Hero Johannes Koch is a Sr. DevOps Engineer, Developer Experience, GTS at FICO where he contributes to the FICO®️ Platform. He shares his best practices related to Continuous Integration and Continuous Deployment (CI/CD) on his YouTube channel: cicdonaws. Johannes also founded the AWS User Group Bergstrasse, helped to start the AWS Community DACH Förderverein, and is part of the team that organizes the AWS Community Day in the DACH region.

Michael Walmsley – Melbourne, Australia

Serverless Hero Michael Walmsley is a Lead Technology Architect in the myWizard®️ Automation Group at Accenture, where he is focused on building event-driven products in the cloud. He is excited by the AWS Lambda Powertools open-source projects, and has been using and actively contributing to them since 2020. Michael is also a passionate AWS community member in Australia, supporting local meetups and conferences. He helps organize and run the AWS Programming and Tools Meetup in Melbourne, which focuses on running monthly hands-on training workshops that are open to everyone.

Mikey Fan – Beijing, China

Community Hero Mikey Fan is a Cloud-native Application Architect and SDN Developer. Since 2020, he has been actively exploring how to build innovative applications based on AWS EKS, Private 5G, and SD-WAN technology, and then applying them to 5G Edge Computing scenarios. Mikey is also a cloud-computing technology evangelist and an open-source enthusiast. He enjoys contributing code to open-source projects, such as Kubernetes and Tungsten Fabric, and he likes to demo how these open-source technologies can be combined with AWS cloud computing to create greater value.

Ran Isenberg – Kfar Saba, Israel

Serverless Hero Ran Isenberg is a principal software architect at CyberArk, where he designs and builds serverless services. He is passionate about CI/CD and AWS CDK, and has contributed several utilities to the AWS Lambda Powertools open-source project. Ran also maintains numerous serverless related open-source projects on his GitHub account, such as the AWS Lambda cookbook – a serverless service template that gets you started in the serverless world with all of the best practices in seconds.

Sabiha Ali – Dubai, United Arab Emirates

Community Hero Sabiha Ali is a Solutions Architect at ScaleCapacity. She specializes in Amazon Connect, architecting resilient and secure systems in the cloud. As an Amazon Connect Ambassador, she helps businesses enhance their customer experiences. Her unwavering passion for learning has earned her numerous AWS certifications (9X), solidifying her expertise in the field. She became an AWS User Group Leader in Dubai after starting out as an active AWS Community Builder. Sabiha is also committed to empowering women in the tech industry, making her a valued professional and an advocate for change.

Tomasz Dudek – Wroclaw, Poland

Machine Learning Hero Tomasz Dudek works as a Data & AI Team Lead and a Solutions Architect at Chaos Gears. He guides customers on how leveraging machine learning powered solutions can help their businesses thrive. He also designs AWS architectures and manages a data-focused team. Additionally, Tomasz co-organizes the AWS Community Day Poland and also hosts the AWS User Group in his hometown Wroclaw. He often conducts workshops, such as SageMaker Immersion Days, speaks at conferences, and shares his knowledge in the form of short posts on LinkedIn, and longer ones on his blog, ‘MLOps and how you tame it.’

Wojciech Dąbrowski – Katowice, Poland

Community Hero Wojciech Dąbrowski is Head of Cloud Architecture at DTiQ, where he leads the team responsible for the architecture of cloud solutions and the cloud adaptation strategy in the organization. He has been an AWS User Group Silesia leader since 2019, and has managed to organize multiple online and offline meetups. In addition, Wojciech leads workshops and presents cloud computing and software engineering topics at various events.

Learn More

If you’d like to learn more about the new Heroes or connect with a Hero near you, please visit the AWS Heroes website or browse the AWS Heroes Content Library.

Taylor

A Guide to Maintaining a Healthy Email Database

Post Syndicated from nnatri original https://aws.amazon.com/blogs/messaging-and-targeting/guide-to-maintaining-healthy-email-database/

Introduction

In the digital age, email remains a powerful tool for businesses to communicate with their customers. Whether it’s for marketing campaigns, customer service updates, or important announcements, a well-maintained email database is crucial for ensuring that your messages reach their intended recipients. However, managing an email database is not just about storing email addresses. It involves keeping the database healthy, which means it’s up-to-date, accurate, and filled with engaged subscribers.

Amazon Simple Email Service (SES) offers robust features that help businesses manage their email environments effectively. Trusted by customers such as Amazon.com, Netflix, Duolingo and Reddit, SES helps customers deliver high-volume email campaigns of hundreds of billions of emails per year. Introduced in 2020, the list and subscription management feature of Amazon SES has added a new dimension to email database management, thereby reducing effort and time-to-value of managing a subscription list by allowing you to manage your list of contacts via its REST API, SDK or AWS CLI.

In this blog post, we will delve into the world of email database management in Amazon SES. You will explore two ways to manage your email database: building out your own email database functionality and using the built-in list and subscription management service. You will also learn the pros and cons of each approach, along with examples of customer use cases that would benefit from each one. Regardless of the approach you ultimately decide to take, the blog will also share updated strategies for email database management to help with improving deliverability and customer engagement.

This guide is designed to help you navigate the complexities of email database management and make informed decisions that best suit your business needs. So, whether you’re new to Amazon SES or looking to optimize your existing email database management practices, this guide is for you. Let’s get started!

Email Database Management in Amazon SES

Amazon Simple Email Service (SES) offers two primary ways to manage your email database: building out your own email database functionality and using the built-in list and subscription management service. Each approach has its own set of advantages and potential drawbacks, and the best choice depends on your specific use case and business needs.

Building Out Your Email Database Functionality

When you choose to build out your own email database functionality, you have the flexibility to customize the database to suit your specific needs and to leverage SES’s scalability as an email channel to send email at high volumes to your customers. Depending on the business requirements, the customizations could involve creating custom fields for subscriber data, implementing complex logic for categorizing and segmenting users, or integrating with other systems in your tech stack.

Using the Built-in List and Subscription Management Service

Alternatively, you can look at Amazon SES’s built-in list and subscription management service, which offers a ready-made solution for managing your email database. It handles tasks such as managing subscriptions to different topics and maintaining your customer email database through contact lists. Additionally, you can insert up to two links per email to the subscription preference page, which allow users to manage their topic preferences within Amazon SES.

SubscriptionPage

The non-configurable subscription page automatically populates the customer’s current topic subscriptions and allows setting granular preferences per topic. More information on how to configure that can be found here.

The following comparison should serve as a guideline to help you with deciding your approach for Email Database Management.

Building Your Own Email Database Functionality

Pros

  • Customization: Full control over the database structure and functionality, allowing for tailoring to specific needs. This includes creating custom fields for subscriber data, implementing your own algorithms for handling bounces and complaints, and integrating with other systems in the tech stack.
  • Integration: Flexible flow of data across the business due to the ability to integrate the email database with other systems in the tech stack. If you’ve already built your own email database, or have one in mind which supports querying, building that database external to Amazon SES would make for a more customizable implementation.
  • Data Ownership: When you manage your own database, you have full ownership and control over your data. This can be important for businesses with strict data governance or regulatory requirements.

Cons

  • Time and Resources: Building your own email database functionality requires a significant investment of time and resources. This includes the initial setup of the database, designing the schema, setting up the servers, and configuring the database software. Additionally, you’ll need to develop the functionality for managing subscriptions and for database cleanup upon receiving bounces and complaints. Databases require ongoing maintenance to ensure they remain operational and efficient. This includes tasks like updating the database software, managing backups, optimizing queries, and scaling the database as your subscriber base grows.
  • Complexity: As your subscriber base grows, managing your own email database can become increasingly complex. You’ll need to handle more data, which can slow down queries and make the database more difficult to manage. You’ll also need to deal with more complex issues like data integrity, redundancy, and normalization. Additionally, as you add more features to your email database functionality, the codebase can become more complex, making it harder to maintain and debug.
  • Security: When you manage your own email database, you’re responsible for its security. This includes protecting the data from unauthorized access, ensuring the confidentiality of your subscribers’ information, and complying with data protection regulations. You’ll need to implement security measures like encryption, access controls, and regular security audits. If your database is compromised, it could lead to data loss or a breach of your subscribers’ privacy, which could damage your reputation and potentially lead to legal consequences.

Customer Use Cases

Best suited for businesses with specific needs that aren’t met by standard list management services, or those who wish to integrate their email database with other systems. For example, a large e-commerce company might choose to build out their own email database functionality to integrate with their customer relationship management (CRM) and inventory systems.

Using the Built-in List and Subscription Management Service

Pros

  • Ease of Use: The built-in service provides readily available APIs to create, update and delete contacts. These operations are available via the REST API, AWS CLI and SDKs. Once you’ve set up the subscription topics and contact lists, you can leverage the preference center to allow your customers to easily subscribe to or unsubscribe from different topics.
  • Cost-Effective: More cost-effective than building your own functionality, as it requires less time and resources. The built-in service is also available free of charge, unlike building out your own infrastructure, which would require ongoing infrastructure service costs.

Cons

  • Limited Customization: The built-in service may not offer the same level of customization as building your own functionality. It may not meet all needs if there are specific requirements. For example, the preference center management page cannot be customized.
  • Dependence: Using the built-in service means you’re reliant on Amazon SES for your email database management. If the service experiences downtime or issues, it could impact your ability to manage your email database. This could potentially disrupt your email campaigns and affect your relationship with your subscribers. Furthermore, if you decide to switch to a different email service provider in the future, migrating your email database from the built-in service could be a complex and time-consuming process. Additionally, if your email database needs to be accessed or manipulated by other systems in your tech stack, this dependency on Amazon SES could complicate the integration process and limit your flexibility.

Customer Use Cases

Ideal for small to medium-sized businesses that need a straightforward, cost-effective solution for managing their email database. It’s also a good fit for businesses without the resources or technical expertise to build their own email database functionality.

Strategies for Email Database Management with Amazon Simple Email Service

Once you’ve made the decision on whether to manage your email database within Amazon SES or build your own, that’s only half of the equation. It’s important to recognize that your email database will only serve the business well when you have processes in place to maintain it. In this section, let’s go through some of the best practices on how to do so.

  • Maintaining email list hygiene:
    • Both Amazon SES and a custom-built email database require maintaining a healthy email list. This involves regularly cleaning your list to remove invalid email addresses, hard bounces, and unengaged subscribers. With Amazon SES, the process to handle hard bounces and complaints is automated.
    • With a custom-built email database, you have more control over how and when this cleaning occurs. Rather than focusing only on email addresses that either hard bounced or complained, you can remove unengaged users. Every business will have its own definition of an unengaged user based on business needs. Regardless, you will need to store an engagement attribute (e.g. days since last interaction). This will be simpler to architect in an external database which supports querying and bulk modification.
  • Managing Subscriptions:
    • With Amazon SES, you can easily manage subscriptions using the built-in functionality. This includes adding new subscribers, removing unsubscribed users, and updating user topic preferences. However, you will not be able to customize the look-and-feel of your subscription preference pages.
    • If you build your own email database, you’ll need to create your own system for managing subscriptions, which could require significant time and resources. The trade-off is that you can fully customize your subscription management system to showcase your branding on the subscription preference page and also handle custom logic for subscription/unsubscription.
  • Encouraging Engagement: Low engagement rates can indicate that your recipients are not interested in your content. To stimulate action, you can include a survey in the email, ask for feedback, or run a giveaway. You can then filter out inactive subscribers who still aren’t interacting with your emails. For engaged subscribers, you can segment these audiences into sub-groups by preference and send tailored email marketing campaigns. Before removing less active subscribers, consider what other kinds of content you could provide that might be more appealing. Unengaged subscribers can sometimes be re-engaged with the right offer, such as a free gift, a special perk, or exclusive content.
  • Renewing Opt-In: For your disengaged subscribers, send a re-optin campaign and remove them if they don’t re-subscribe. Be transparent! Notify inactive subscribers that you’ve noticed their lack of engagement and let them know that you don’t want to clutter their inbox if they’re not interested. Ask them if they want to continue to receive emails with a clear call-to-action button that will re-sign them up for future emails.
  • Making It Easy to Unsubscribe: Including an easy-to-find unsubscribe button and a one-step opt-out process won’t encourage subscribers to leave if you’re giving them a reason to stay. If recipients feel like they can’t leave, they’ll just mark your emails as spam, which counts as a big strike against your sender reputation.
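
If you store engagement data in your own database, a periodic job can identify unengaged subscribers for a re-opt-in campaign or removal. The following is a minimal sketch using the AWS SDK for Java 2.x; it assumes a hypothetical DynamoDB table named subscribers with an email attribute and a numeric lastEngagedAt attribute holding epoch seconds. The table name, attribute names, and the 90-day threshold are illustrative assumptions only, and pagination and error handling are omitted for brevity.

import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ScanRequest;
import software.amazon.awssdk.services.dynamodb.model.ScanResponse;

public class UnengagedSubscriberScan {
    public static void main(String[] args) {
        DynamoDbClient dynamoDb = DynamoDbClient.create();

        // Treat anyone with no engagement in the last 90 days as unengaged (example threshold).
        long cutoff = Instant.now().minus(90, ChronoUnit.DAYS).getEpochSecond();

        ScanRequest scan = ScanRequest.builder()
                .tableName("subscribers") // hypothetical table name
                .filterExpression("lastEngagedAt < :cutoff")
                .expressionAttributeValues(Map.of(":cutoff",
                        AttributeValue.builder().n(Long.toString(cutoff)).build()))
                .build();

        ScanResponse response = dynamoDb.scan(scan);

        // Feed these addresses into a re-opt-in campaign or a bulk removal job.
        response.items().forEach(item -> System.out.println(item.get("email").s()));
    }
}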

Remember, effective email database management is a continuous process that requires regular attention and maintenance. By following these best practices, you can maximize the effectiveness of your email marketing efforts and build strong relationships with your subscribers.

Conclusion

In conclusion, maintaining a healthy email database is a critical aspect of successful email marketing. Whether you choose to build out your own email database functionality or use Amazon SES’s built-in list and subscription management service, it’s important to understand the pros and cons of each approach and align your decision with your business needs.

Building your own email database functionality offers the advantage of customization and integration with other systems in your tech stack. However, it requires significant time, resources, and technical expertise. On the other hand, Amazon SES’s built-in service is easy to use, cost-effective, and handles many complexities of email database management, but it may not offer the same level of customization.

Regardless of the approach you choose, following best practices for email database management is essential. This includes handling bounces and complaints, managing subscriptions, encouraging engagement, sending re-engagement email campaigns, renewing opt-ins, and making it easy to unsubscribe.

These practices will help you maintain a healthy email list, improve engagement rates, and ultimately, enhance the effectiveness of your email marketing efforts. It’s important to stay updated with the latest trends and strategies in email database management. So, keep exploring, learning, and implementing the best practices that suit your business needs.

For more information on Amazon SES and its features, visit the Amazon SES Documentation. Here, you’ll find comprehensive guides, tutorials, and API references to help you make the most of Amazon SES.

Introducing the Enhanced Document API for DynamoDB in the AWS SDK for Java 2.x

Post Syndicated from John Viegas original https://aws.amazon.com/blogs/devops/introducing-the-enhanced-document-api-for-dynamodb-in-the-aws-sdk-for-java-2-x/

We are excited to announce that the AWS SDK for Java 2.x now offers the Enhanced Document API for DynamoDB, providing an enhanced way of working with Amazon DynamoDB items.
This post covers using the Enhanced Document API for DynamoDB with the DynamoDB Enhanced Client. By using the Enhanced Document API, you can create an EnhancedDocument instance to represent an item with no fixed schema, and then use the DynamoDB Enhanced Client to read and write to DynamoDB.
Furthermore, unlike the Document APIs of aws-sdk-java 1.x, which provided arguments and return types that were not type-safe, the EnhancedDocument provides strongly-typed APIs for working with documents. This interface simplifies the development process and ensures that the data is correctly typed.

Prerequisites

Before getting started, ensure you are using an up-to-date version of the AWS Java SDK dependency with all the latest released bug-fixes and features. For Enhanced Document API support, you must use version 2.20.33 or later. See our “Set up an Apache Maven project” guide for details on how to manage the AWS Java SDK dependency in your project.

Add the dynamodb-enhanced dependency to your pom.xml:

<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>dynamodb-enhanced</artifactId>
<version>2.20.33</version>
</dependency>

Quick walk-through for using the Enhanced Document API to interact with DynamoDB

Step 1 : Create a DynamoDB Enhanced Client

Create an instance of the DynamoDbEnhancedClient class, which provides a high-level interface for Amazon DynamoDB that simplifies working with DynamoDB tables.

DynamoDbEnhancedClient enhancedClient = DynamoDbEnhancedClient.builder()
                                               .dynamoDbClient(DynamoDbClient.create())
                                               .build();

Step 2 : Create a DynamoDbTable resource object with Document table schema

To execute commands against a DynamoDB table using the Enhanced Document API, you must associate the table with your Document table schema to create a DynamoDbTable resource object. The Document table schema builder requires the primary index key and attribute converter providers. Use AttributeConverterProvider.defaultProvider() to convert document attributes of default types. An optional secondary index key can be added to the builder.


DynamoDbTable<EnhancedDocument> documentTable = enhancedClient.table("my_table",
                                              TableSchema.documentSchemaBuilder()
                                                         .addIndexPartitionKey(TableMetadata.primaryIndexName(),"hashKey", AttributeValueType.S)
                                                         .addIndexSortKey(TableMetadata.primaryIndexName(), "sortKey", AttributeValueType.N)
                                                         .attributeConverterProviders(AttributeConverterProvider.defaultProvider())
                                                         .build());
                                                         
// call documentTable.createTable() if "my_table" does not exist in DynamoDB

Step 3 : Write a DynamoDB item using an EnhancedDocument

The EnhancedDocument class has static factory methods along with a builder method to add attributes to a document. The following snippet demonstrates the type safety provided by EnhancedDocument when you construct a document item.

EnhancedDocument simpleDoc = EnhancedDocument.builder()
 .attributeConverterProviders(defaultProvider())
 .putString("hashKey", "sampleHash")
 .putNull("nullKey")
 .putNumber("sortKey", 1.0)
 .putBytes("byte", SdkBytes.fromUtf8String("a"))
 .putBoolean("booleanKey", true)
 .build();
 
documentTable.putItem(simpleDoc);

Step 4 : Read a DynamoDB item as an EnhancedDocument

Attributes of documents retrieved from a DynamoDB table can be accessed with getter methods.

EnhancedDocument docGetItem = documentTable.getItem(r -> r.key(k -> k.partitionValue("sampleHash").sortValue(1)));

docGetItem.getString("hashKey");
docGetItem.isNull("nullKey");
docGetItem.getNumber("sortKey").floatValue();
docGetItem.getBytes("byte");
docGetItem.getBoolean("booleanKey");

AttributeConverterProviders for accessing document attributes as custom objects

You can provide a custom AttributeConverterProvider instance to an EnhancedDocument to convert document attributes to a specific object type.
These providers can be set on either DocumentTableSchema or EnhancedDocument to read or write attributes as custom objects.

TableSchema.documentSchemaBuilder()
           .attributeConverterProviders(CustomClassConverterProvider.create(), defaultProvider())
           .build();
    
// Insert a custom class instance into an EnhancedDocument as attribute 'customMapOfAttribute'.
EnhancedDocument customAttributeDocument =
EnhancedDocument.builder().put("customMapOfAttribute", customClassInstance, CustomClass.class).build();

// Retrieve attribute 'customMapOfAttribute' as CustomClass object.
CustomClass customClassObject = customAttributeDocument.get("customMapOfAttribute", CustomClass.class);

Convert Documents to JSON and vice-versa

The Enhanced Document API allows you to convert a JSON string to an EnhancedDocument and vice-versa.

// Enhanced document created from JSON string using defaultConverterProviders.
EnhancedDocument documentFromJson = EnhancedDocument.fromJson("{\"key\": \"Value\"}");
                                              
// Converting an EnhancedDocument to JSON string "{\"key\": \"Value\"}"                                                 
String jsonFromDocument = documentFromJson.toJson();

Define a Custom Attribute Converter Provider

Custom attribute converter providers are implementations of AttributeConverterProvider that provide converters for custom classes.
Below is an example of a CustomClassForDocumentAPI, which has a single field stringAttribute of type String, and its corresponding AttributeConverterProvider implementation.

public class CustomClassForDocumentAPI {
    private final String stringAttribute;

    public CustomClassForDocumentAPI(Builder builder) {
        this.stringAttribute = builder.stringAttribute;
    }
    public static Builder builder() {
        return new Builder();
    }
    public String stringAttribute() {
        return stringAttribute;
    }
    public static final class Builder {
        private String stringAttribute;
        private Builder() {
        }
        public Builder stringAttribute(String stringAttribute) {
            this.stringAttribute = stringAttribute;
            return this;
        }
        public CustomClassForDocumentAPI build() {
            return new CustomClassForDocumentAPI(this);
        }
    }
}
import java.util.Map;
import software.amazon.awssdk.enhanced.dynamodb.AttributeConverter;
import software.amazon.awssdk.enhanced.dynamodb.AttributeConverterProvider;
import software.amazon.awssdk.enhanced.dynamodb.EnhancedType;
import software.amazon.awssdk.utils.ImmutableMap;

public class CustomAttributeForDocumentConverterProvider implements AttributeConverterProvider {
    private final Map<EnhancedType<?>, AttributeConverter<?>> converterCache = ImmutableMap.of(
        EnhancedType.of(CustomClassForDocumentAPI.class), new CustomClassForDocumentAttributeConverter());
        // Different types of converters can be added to this map.

    public static CustomAttributeForDocumentConverterProvider create() {
        return new CustomAttributeForDocumentConverterProvider();
    }

    @Override
    public <T> AttributeConverter<T> converterFor(EnhancedType<T> enhancedType) {
        return (AttributeConverter<T>) converterCache.get(enhancedType);
    }
}

A custom attribute converter is an implementation of AttributeConverter that converts a custom class to and from a map of attribute values, as shown below.

import java.util.LinkedHashMap;
import java.util.Map;
import software.amazon.awssdk.enhanced.dynamodb.AttributeConverter;
import software.amazon.awssdk.enhanced.dynamodb.AttributeValueType;
import software.amazon.awssdk.enhanced.dynamodb.EnhancedType;
import software.amazon.awssdk.enhanced.dynamodb.internal.converter.attribute.EnhancedAttributeValue;
import software.amazon.awssdk.enhanced.dynamodb.internal.converter.attribute.StringAttributeConverter;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;

public class CustomClassForDocumentAttributeConverter implements AttributeConverter<CustomClassForDocumentAPI> {
    public static CustomClassForDocumentAttributeConverter create() {
        return new CustomClassForDocumentAttributeConverter();
    }
    @Override
    public AttributeValue transformFrom(CustomClassForDocumentAPI input) {
        Map<String, AttributeValue> attributeValueMap = new LinkedHashMap<>();
        if (input.stringAttribute() != null) {
            attributeValueMap.put("stringAttribute", AttributeValue.fromS(input.stringAttribute()));
        }
        return EnhancedAttributeValue.fromMap(attributeValueMap).toAttributeValue();
    }

    @Override
    public CustomClassForDocumentAPI transformTo(AttributeValue input) {
        Map<String, AttributeValue> customAttr = input.m();
        CustomClassForDocumentAPI.Builder builder = CustomClassForDocumentAPI.builder();
        if (customAttr.get("stringAttribute") != null) {
            builder.stringAttribute(StringAttributeConverter.create().transformTo(customAttr.get("stringAttribute")));
        }
        return builder.build();
    }
    @Override
    public EnhancedType<CustomClassForDocumentAPI> type() {
        return EnhancedType.of(CustomClassForDocumentAPI.class);
    }
    @Override
    public AttributeValueType attributeValueType() {
        return AttributeValueType.M;
    }
}

Attribute Converter Provider for EnhancedDocument Builder

When working outside of a DynamoDB table context, make sure to set the attribute converter providers explicitly on the EnhancedDocument builder. When used within a DynamoDB table context, the table schema’s converter provider will be used automatically for the EnhancedDocument.
The code snippet below shows how to set an AttributeConverterProvider using the EnhancedDocument builder method.

// Enhanced document created from JSON string using custom AttributeConverterProvider.
EnhancedDocument documentFromJson = EnhancedDocument.builder()
                                                    .attributeConverterProviders(CustomClassConverterProvider.create())
                                                    .json("{\"key\": \"Values\"}")
                                                    .build();
                                                    
CustomClassForDocumentAPI customClass = documentFromJson.get("key", CustomClassForDocumentAPI.class);

Conclusion

In this blog post, we showed you how to set up and begin using the Enhanced Document API with the DynamoDB Enhanced Client and standalone with the EnhancedDocument class. The enhanced client is open-source and resides in the same repository as the AWS SDK for Java 2.x.
We hope you’ll find this new feature useful. You can always share your feedback on our GitHub issues page.

Meet the Newest AWS Heroes – March 2023

Post Syndicated from Taylor Jacobsen original https://aws.amazon.com/blogs/aws/meet-the-newest-aws-heroes-march-2023/

The AWS Heroes are passionate AWS experts who are dedicated to sharing their in-depth knowledge within the community. They inspire, uplift, and motivate the global AWS community, and today, we’re excited to announce and recognize the newest Heroes in 2023!

Aidan Steele – Melbourne, Australia

Serverless Hero Aidan Steele is a Senior Engineer at Nightvision. He is an avid AWS user and has been using the platform and Amazon EC2 since 2008. Fifteen years later, EC2 still has a special place in his heart, but his interests are in containers and serverless functions, and blurring the distinction between them wherever possible. He enjoys finding novel uses for AWS services, especially when they have a security or network focus. This is best demonstrated through his open source contributions on GitHub, where he shares interesting use cases via hands-on projects.

Ananda Dwi Rahmawati – Yogyakarta, Indonesia

Container Hero Ananda Dwi Rahmawati is a Sr. Cloud Infrastructure Engineer, specializing in system integration between cloud infrastructure, CI/CD workflows, and application modernization. She implements solutions using powerful services provided by AWS, such as Amazon Elastic Kubernetes Service (EKS), combined with open source tools to achieve the goal of creating reliable, highly available, and scalable systems. She is a regular technical speaker who delivers presentations using real-world case studies at several local community meetups and conferences, such as Kubernetes and OpenInfra Days Indonesia, AWS Community Day Indonesia, AWS Summit ASEAN, and many more.

Wendy Wong – Sydney, Australia

Data Hero Wendy Wong is a Business Performance Analyst at Service NSW, building data pipelines with AWS Analytics and agile projects in AI. As a teacher at heart, she enjoys sharing her passion as a Data Analytics Lead Instructor for General Assembly Sydney, writing technical blogs on dev.to. She is both an active speaker for AWS analytics and an advocate of diversity and inclusion, presenting at a number of events: AWS User Group Malaysia, Women Who Code, AWS Summit Australia 2022, AWS BuildHers, AWS Innovate Modern Applications, and many more.

Learn More

If you’d like to learn more about the new Heroes or connect with a Hero near you, please visit the AWS Heroes website or browse the AWS Heroes Content Library.

Taylor

Enable federation to Amazon QuickSight accounts with Ping One

Post Syndicated from Srikanth Baheti original https://aws.amazon.com/blogs/big-data/enable-federation-to-amazon-quicksight-accounts-with-ping-one/

Amazon QuickSight is a scalable, serverless, embeddable, machine learning (ML)-powered business intelligence (BI) service built for the cloud that supports identity federation in both Standard and Enterprise editions. Organizations are working towards centralizing their identity and access strategy across all of their applications, including on-premises, third-party, and applications on AWS. Many organizations use Ping One to control and manage user authentication and authorization centrally. If your organization uses Ping One for cloud applications, you can enable federation to all of your QuickSight accounts without needing to create and manage users in QuickSight. This authorizes users to access QuickSight assets—analyses, dashboards, folders, and datasets—through centrally managed Ping One.

In this post, we go through the steps to configure federated single sign-on (SSO) between a Ping One instance and a QuickSight account. We demonstrate registering an SSO application in Ping One, creating groups, and mapping to an AWS Identity and Access Management (IAM) role that translates to QuickSight user license types (admin, author, and reader). These QuickSight roles represent three different personas supported in QuickSight. Administrators can publish the QuickSight app in Ping One to enable users to perform SSO to QuickSight using their Ping credentials.

Prerequisites

To complete this walkthrough, you must have the following prerequisites:

  • A Ping One subscription
  • One or more QuickSight account subscriptions

Solution overview

The walkthrough includes the following steps:

  1. Create groups in Ping One for each of the QuickSight user license types.
  2. Register an AWS application in Ping One.
  3. Add Ping One as your SAML identity provider (IdP) in AWS.
  4. Configure an IAM policy.
  5. Configure an IAM role.
  6. Configure your AWS application in Ping One.
  7. Test the application from Ping One.

Create groups in Ping One for each of the QuickSight roles

To create groups in Ping One, complete the following steps:

  1. Sign in to the Ping One portal using an administrator account.
  2. Under Identities, choose Groups.
  3. Choose the plus sign to add a group.
  4. For Group Name, enter QuickSightReaders.
  5. Choose Save.
  6. Repeat these steps to create the groups QuickSightAdmins and QuickSightAuthors.

Register an AWS application in Ping One

To configure the integration of an AWS application in Ping One, you need to add AWS to your list of managed software as a service (SaaS) apps.

  1. Sign in to the Ping One portal using an administrator account.
  2. Under Connections, choose Application Catalog.
  3. In the search box, enter amazon web services.
  4. Choose Amazon Web Services – AWS from the results to add the application.
  5. For Name, enter Amazon QuickSight.
  6. Choose Next.
    Under Map Attributes, there should be four attributes.
  7. Delete the attribute related to SessionDuration.
  8. Choose Username as the value for all the remaining attributes for now.
    We update these values in later steps.
  9. Choose Next.
  10. In the Select Groups section, add the QuickSightAdmins, QuickSightAuthors, and QuickSightReaders groups you created.
  11. Choose Save.
  12. After the application is created, choose the application again and download the federation metadata XML.

You use this in the next step.

Add Ping One as your SAML IdP in AWS

To configure Ping One as your SAML IdP, complete the following steps:

  1. Open a new tab in your browser.
  2. Sign in to the IAM console in your AWS account with admin permissions.
  3. On the IAM console, under Access Management in the navigation pane, choose Identity providers.
  4. Choose Add provider.
  5. For Provider name, enter PingOne.
  6. Choose file to upload the metadata document you downloaded earlier.
  7. Choose Add provider.
  8. In the banner message that appears, choose View provider.
  9. Copy the IdP ARN to use in a later step.

Configure an IAM policy

In this step, you create an IAM policy to map three different roles with permissions in QuickSight.

Use the following steps to set up QuickSightUserCreationPolicy. This policy grants privileges in QuickSight to the federated user based on the assigned groups in Ping One.

  1. On the IAM console, choose Policies.
  2. Choose Create policy.
  3. On the JSON tab, replace the existing text with the following code:
    {
       "Version": "2012-10-17",
        "Statement": [ 
             {  
                "Sid": "VisualEditor0", 
                 "Effect": "Allow", 
                 "Action": "quicksight:CreateAdmin", 
                 "Resource": "*", 
                 "Condition": { 
                     "StringEquals": { 
                         "aws:PrincipalTag/user-role": "QuickSightAdmins" 
     
                    } 
                 } 
             }, 
             { 
                 "Sid": "VisualEditor1", 
                 "Effect": "Allow", 
                 "Action": "quicksight:CreateUser", 
                 "Resource": "*", 
                 "Condition": { 
                     "StringEquals": { 
                         "aws:PrincipalTag/user-role": "QuickSightAuthors" 
                     } 
                 } 
             }, 
             { 
                 "Sid": "VisualEditor2", 
                 "Effect": "Allow", 
                 "Action": "quicksight:CreateReader", 
                 "Resource": "*", 
                 "Condition": { 
                     "StringEquals": { 
                         "aws:PrincipalTag/user-role": "QuickSightReaders" 
                     } 
                 } 
             } 
         ] 
     } 
  4. Choose Review policy.
  5. For Name, enter QuickSightUserCreationPolicy.
  6. Choose Create policy.

Configure an IAM role

Next, create the role that Ping One users assume when federating into QuickSight. Use the following steps to set up the federated role:

  1. On the IAM console, choose Roles.
  2. Choose Create role.
  3. For Trusted entity type, select SAML 2.0 federation.
  4. For SAML 2.0-based provider, choose the provider you created earlier (PingOne).
  5. Select Allow programmatic and AWS Management Console access.
  6. For Attribute, choose SAML:aud.
  7. For Value, enter https://signin.aws.amazon.com/saml.
  8. Choose Next.
  9. Under Permissions policies, select the QuickSightUserCreationPolicy IAM policy you created in the previous step.
  10. Choose Next.
  11. For Role name, enter QSPingOneFederationRole.
  12. Choose Create role.
  13. On the IAM console, in the navigation pane, choose Roles.
  14. Choose the QSPingOneFederationRole role you created to open the role’s properties.
  15. Copy the role ARN to use in later steps.
  16. On the Trust relationships tab, under Trusted entities, verify that the IdP you created is listed.
  17. Under Condition in the policy code, verify that SAML:aud with a value of https://signin.aws.amazon.com/saml is present.
  18. Choose Edit trust policy to add an additional condition.
  19. Under Condition, add the following code:
    "StringLike": {
    "aws:RequestTag/user-role": "*"
    }

  20. Under Action, add the following code:
      "sts:TagSession"


  21. Choose Update policy to save changes.

Configure an AWS application in Ping One

To configure your AWS application, complete the following steps:

  1. Sign in to the Ping One portal using a Ping One administrator account.
  2. Under Connections, choose Application.
  3. Choose the Amazon QuickSight application you created earlier.
  4. On the Profile tab, choose Enable Advanced Configuration.
  5. Choose Enable in the pop-up window.
  6. On the Configuration tab, choose the pencil icon to edit the configuration.
  7. Under SIGNING KEY, select Sign Assertion & Response.
  8. Under SLO BINDING, for Assertion Validity Duration In Seconds, enter a duration, such as 900.
  9. For Target Application URL, enter https://quicksight.aws.amazon.com/.
  10. Choose Save.
    On the Attribute Mappings tab, you now add or update the attributes as in the following table.
Attribute Name | Value
saml_subject | Username
https://aws.amazon.com/SAML/Attributes/RoleSessionName | Username
https://aws.amazon.com/SAML/Attributes/Role | 'arn:aws:iam::xxxxxxxxxx:role/QSPingOneFederationRole,arn:aws:iam::xxxxxxxxxx:saml-provider/PingOne'
https://aws.amazon.com/SAML/Attributes/PrincipalTag:user-role | user.memberOfGroupNames[0]
  1. Enter https://aws.amazon.com/SAML/Attributes/PrincipalTag:user-role for the attribute name and use the corresponding value from the table for the expression.
  2. Choose Save.
  3. If you have more than one QuickSight user role (for this post, QuickSightAdmins, QuickSightAuthors, and QuickSightReaders), you can add all the appropriate role names as follows:
    #data.containsAny(user.memberOfGroupNames,{'QuickSightAdmins'}) ? 'QuickSightAdmins' :
    #data.containsAny(user.memberOfGroupNames,{'QuickSightAuthors'}) ? 'QuickSightAuthors' :
    #data.containsAny(user.memberOfGroupNames,{'QuickSightReaders'}) ? 'QuickSightReaders' : null

  4. To edit the role attribute, choose the gear icon next to the role.
  5. Populate the corresponding expression from the table and choose Save.

The format of the expression is the role ARN (copied in the role creation step) followed by the IdP ARN (copied in the IdP creation step) separated by a comma.

Test the application

In this section, you test your Ping One SSO configuration by adding users to the groups and signing in through the Ping One application portal.

  1. In the Ping One portal, under Identities, choose Groups.
  2. Choose a group and choose Add Users Individually.
  3. From the list of users, add the appropriate users to the group by choosing the plus sign.
  4. Choose Save.
  5. To test the connectivity, under Environment, choose Properties, then copy the URL under APPLICATION PORTAL URL.
  6. Browse to the URL in a private browsing window.
  7. Enter your user credentials and choose Sign On.
    Upon a successful sign-in, you’re redirected to the All Applications page with a new application called Amazon QuickSight.
  8. Choose the Amazon QuickSight application to be redirected to the QuickSight console.

After you sign in, note that the user name at the top of the page shows as the Ping One federated user.

Summary

This post provided step-by-step instructions to configure federated SSO between Ping One and the QuickSight console. We also discussed how to create policies and roles in IAM and map groups in Ping One to IAM roles for secure access to the QuickSight console.

For additional discussions and help getting answers to your questions, check out the QuickSight Community.


About the authors

Srikanth Baheti is a Specialized World Wide Sr. Solution Architect for Amazon QuickSight. He started his career as a consultant and worked for multiple private and government organizations. Later he worked for PerkinElmer Health and Sciences & eResearch Technology Inc, where he was responsible for designing and developing high traffic web applications, highly scalable and maintainable data pipelines for reporting platforms using AWS services and Serverless computing.

Raji Sivasubramaniam is a Sr. Solutions Architect at AWS, focusing on Analytics. Raji is specialized in architecting end-to-end Enterprise Data Management, Business Intelligence and Analytics solutions for Fortune 500 and Fortune 100 companies across the globe. She has in-depth experience in integrated healthcare data and analytics with wide variety of healthcare datasets including managed market, physician targeting and patient analytics.

Raj Jayaraman is a Senior Specialist Solutions Architect for Amazon QuickSight. Raj focuses on helping customers develop sample dashboards, embed analytics and adopt BI design patterns and best practices.

Introducing Amazon Simple Queue Service dead-letter queue redrive to source queues

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-amazon-simple-queue-service-dead-letter-queue-redrive-to-source-queues/

This blog post is written by Mark Richman, a Senior Solutions Architect for SMB.

Today AWS is launching a new capability to enhance the dead-letter queue (DLQ) management experience for Amazon Simple Queue Service (SQS). DLQ redrive to source queues allows SQS to manage the lifecycle of unconsumed messages stored in DLQs.

SQS is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications. Using Amazon SQS, you can send, store, and receive messages between software components at any volume without losing messages or requiring other services to be available.

To use SQS, a producer sends messages to an SQS queue, and a consumer pulls the messages from the queue. Sometimes, messages can’t be processed due to a number of possible issues. These can include logic errors in consumers that cause message processing to fail, network connectivity issues, or downstream service failures. This can result in unconsumed messages remaining in the queue.
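
For reference, the following minimal sketch with the AWS SDK for Java 2.x shows the basic producer and consumer pattern described above. The queue URL is a placeholder, and a real consumer would process the message body and delete the message only after successful processing.

import java.util.List;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.Message;

public class SqsProducerConsumerSketch {
    public static void main(String[] args) {
        SqsClient sqs = SqsClient.create();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/MySourceQueue"; // placeholder

        // Producer: send a message to the queue.
        sqs.sendMessage(b -> b.queueUrl(queueUrl).messageBody("order-12345"));

        // Consumer: poll for messages, process them, then delete them on success.
        List<Message> messages = sqs.receiveMessage(b -> b.queueUrl(queueUrl)
                .maxNumberOfMessages(10)
                .waitTimeSeconds(20)).messages();
        for (Message message : messages) {
            System.out.println("Processing: " + message.body());
            // Deleting the message tells SQS it was consumed successfully; if this is skipped,
            // the message becomes visible again and its ReceiveCount increases.
            sqs.deleteMessage(b -> b.queueUrl(queueUrl).receiptHandle(message.receiptHandle()));
        }
    }
}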

Understanding SQS dead-letter queues (DLQs)

SQS allows you to manage the life cycle of the unconsumed messages using dead-letter queues (DLQs).

A DLQ is a separate SQS queue to which one or many source queues can send messages that can’t be processed or consumed. DLQs allow you to debug your application by letting you isolate messages that can’t be processed correctly to determine why their processing didn’t succeed. Use a DLQ to handle message consumption failures gracefully.

When you create a source queue, you can specify a DLQ and the condition under which SQS moves messages from the source queue to the DLQ. This is called the redrive policy. The redrive policy condition specifies the maxReceiveCount. When a producer places messages on an SQS queue, the ReceiveCount tracks the number of times a consumer tries to process the message. When the ReceiveCount for a message exceeds the maxReceiveCount for a queue, SQS moves the message to the DLQ. The original message ID is retained.

For example, suppose a source queue has a redrive policy with maxReceiveCount set to 5. If the consumer of the source queue receives a message 6 times without successfully consuming it, SQS moves the message to the dead-letter queue.
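
As a reference for configuring this programmatically, the sketch below (AWS SDK for Java 2.x) sets a redrive policy with a maxReceiveCount of 5 on an existing source queue. The queue URL and DLQ ARN are placeholders you would replace with your own values.

import java.util.Map;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.QueueAttributeName;

public class ConfigureRedrivePolicySketch {
    public static void main(String[] args) {
        SqsClient sqs = SqsClient.create();
        String sourceQueueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/MySourceQueue"; // placeholder
        String dlqArn = "arn:aws:sqs:us-east-1:123456789012:MyDLQ"; // placeholder

        // After 5 failed receives, SQS moves the message to the DLQ.
        String redrivePolicy = "{\"deadLetterTargetArn\":\"" + dlqArn + "\",\"maxReceiveCount\":\"5\"}";

        sqs.setQueueAttributes(b -> b.queueUrl(sourceQueueUrl)
                .attributes(Map.of(QueueAttributeName.REDRIVE_POLICY, redrivePolicy)));
    }
}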

You can configure an alarm to alert you when any messages are delivered to a DLQ. You can then examine logs for exceptions that might have caused them to be delivered to the DLQ. You can analyze the message contents to diagnose consumer application issues. Once the issue has been resolved and the consumer application recovers, these messages can be redriven from the DLQ back to the source queue to process them successfully.
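
One way to set up such an alarm, sketched here with the AWS SDK for Java 2.x, is to alarm on the ApproximateNumberOfMessagesVisible metric of the DLQ. The alarm name, queue name, SNS topic ARN, and thresholds below are placeholder assumptions you would adapt to your environment.

import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.ComparisonOperator;
import software.amazon.awssdk.services.cloudwatch.model.Dimension;
import software.amazon.awssdk.services.cloudwatch.model.Statistic;

public class DlqAlarmSketch {
    public static void main(String[] args) {
        CloudWatchClient cloudWatch = CloudWatchClient.create();

        // Alarm as soon as any message lands in the DLQ.
        cloudWatch.putMetricAlarm(b -> b
                .alarmName("MyDLQ-has-messages")
                .namespace("AWS/SQS")
                .metricName("ApproximateNumberOfMessagesVisible")
                .dimensions(Dimension.builder().name("QueueName").value("MyDLQ").build())
                .statistic(Statistic.MAXIMUM)
                .period(60)
                .evaluationPeriods(1)
                .threshold(0.0)
                .comparisonOperator(ComparisonOperator.GREATER_THAN_THRESHOLD)
                .alarmActions("arn:aws:sns:us-east-1:123456789012:dlq-alerts")); // placeholder topic
    }
}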

Previously, this required dedicated operational cycles to review and redrive these messages back to their source queue.

DLQ redrive to source queues

DLQ redrive to source queues enables SQS to manage the second part of the lifecycle of unconsumed messages that are stored in DLQs. Once the consumer application is available to consume the failed messages, you can redrive the messages from the DLQ back to the source queue. You can optionally review a sample of the available messages in the DLQ. You redrive the messages using the Amazon SQS console. This allows you to more easily recover from application failures.

Using redrive to source queues

To show how to use the new functionality, this walkthrough uses an existing standard source SQS queue called MySourceQueue.

SQS does not create DLQs automatically. You must first create an SQS queue and then use it as a DLQ. The DLQ must be in the same region as the source queue.

Create DLQ

  1. Navigate to the SQS Management Console and create a standard SQS queue for the DLQ called MyDLQ. Use the default configuration. Refer to the SQS documentation for instructions on creating a queue.
  2. Navigate to MySourceQueue and choose Edit.
  3. Navigate to the Dead-letter queue section and choose Enabled.
  4. Select the Amazon Resource Name (ARN) of the MyDLQ queue you created previously.
  5. You can configure the number of times that a message can be received before being sent to a DLQ by setting Set Maximum receives to a value between 1 and 1,000. For this demo enter a value of 1 to immediately drive messages to the DLQ.
  6. Choose Save.
Configure source queue with DLQ

The console displays the Details page for the queue. Within the Dead-letter queue tab, you can see the Maximum receives value and DLQ ARN.

DLQ configuration

Send and receive test messages

You can send messages to test the functionality in the SQS console.

  1. Navigate to MySourceQueue and choose Send and receive messages
  2. Send a number of test messages by entering the message content in Message body and choosing Send message.
    Send and receive messages

  3. Navigate to the Receive messages section where you can see the number of messages available.
  4. Choose Poll for messages. The Maximum message count is set to 10 by default. If you sent more than 10 test messages, poll multiple times to receive all the messages.
Poll for messages

All the received messages are sent to the DLQ because the maxReceiveCount is set to 1. At this stage you would normally review the messages. You would determine why their processing didn’t succeed and resolve the issue.

Redrive messages to source queue

Navigate to the list of all queues and filter if required to view the DLQ. The queue displays the approximate number of messages available in the DLQ. For standard queues, the result is approximate because of the distributed architecture of SQS. In most cases, the count should be close to the actual number of messages in the queue.

Messages available in DLQ

  1. Select the DLQ and choose Start DLQ redrive.
    DLQ redrive

    SQS allows you to redrive messages either to their source queue(s) or to a custom destination queue.

  2. Choose to Redrive to source queue(s), which is the default.
  3. Redrive has two velocity control settings:

  • System optimized sends messages back to the source queue as fast as possible
  • Custom max velocity allows SQS to redrive messages with a custom maximum rate of messages per second. This feature is useful for minimizing the impact to normal processing of messages in the source queue.

You can optionally inspect messages prior to redrive.

  4. To redrive the messages back to the source queue, choose DLQ redrive.

    DLQ redrive

    The Dead-letter queue redrive status panel shows the status of the redrive and percentage processed. You can refresh the display or cancel the redrive.

    Dead-letter queue redrive status

    Once the redrive is complete, which takes a few seconds in this example, the status reads Successfully completed.

    Redrive status completed

  5. Navigate back to the source queue and you can see all the messages are redriven back from the DLQ to the source queue.

    Messages redriven from DLQ to source queue

    Conclusion

    Dead-letter queue redrive to source queues allows you to effectively manage the life cycle of unconsumed messages stored in dead-letter queues. You can build applications with the confidence that you can easily examine unconsumed messages, recover from errors, and reprocess failed messages.

    You can redrive messages from their DLQs to their source queues using the Amazon SQS console.

    Dead-letter queue redrive to source queues is available in all commercial regions, and coming soon to GovCloud.

    To get started, visit https://aws.amazon.com/sqs/

    For more serverless learning resources, visit Serverless Land.

    Forwarding emails automatically based on content with Amazon Simple Email Service

    Post Syndicated from Murat Balkan original https://aws.amazon.com/blogs/messaging-and-targeting/forwarding-emails-automatically-based-on-content-with-amazon-simple-email-service/

    Introduction

    Email is one of the most popular channels consumers use to interact with support organizations. In its most basic form, consumers send their email to a catch-all email address, where it is further dispatched to the correct support group. Often, this requires a person to inspect content manually. Some IT organizations even have a dedicated support group that handles triaging the incoming emails before assigning them to specialized support teams. Triaging each email can be challenging, and delays in email routing and support processes can reduce customer satisfaction. By utilizing Amazon Simple Email Service’s deep integration with Amazon S3, AWS Lambda, and other AWS services, you can automate the task of categorizing and routing emails. This automation results in increased operational efficiencies and reduced costs.

    This blog post shows you how to build a serverless application that receives emails with Amazon SES and delivers them to an Amazon S3 bucket. The application uses Amazon Comprehend to identify the dominant language from the message body. It then looks up the detected language in an Amazon DynamoDB table to find the email address of the support group that specializes in that language. As the last step, it forwards the email via Amazon SES to its destination. Archiving incoming emails to Amazon S3 also enables further processing or auditing.

    Architecture

    By completing the steps in this post, you will create a system that uses the architecture illustrated in the following image:

    Architecture showing how to forward emails by content using Amazon SES

    The flow of events starts when a customer sends an email to the generic support email address, such as info@YOUR_DOMAIN_NAME_HERE. This email is received by Amazon SES and matched by a receipt rule. As per the rule, incoming messages are written to a specified Amazon S3 bucket with a given prefix.

    This bucket and prefix are configured with S3 Events to trigger a Lambda function on object creation events. The Lambda function reads the email object, parses the contents, and sends them to Amazon Comprehend for language detection.

    The Lambda function looks up the detected language code in an Amazon DynamoDB table, which contains the mappings between language codes and support group email addresses for those languages. One support group could answer English emails, while another support group answers French emails. The Lambda function determines the destination address and re-sends the same email by performing an email forward operation. If the lookup does not return a destination address, or the language could not be detected, the email is forwarded to a catch-all email address specified during the application deployment.
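
    The routing part of that Lambda logic can be sketched as follows with the AWS SDK for Java 2.x. This is a simplified illustration, not the function shipped in the sample repository: the table name, key and attribute names, and the catch-all address follow the conventions described in this post, and email parsing plus error handling are omitted.

    import java.util.Map;
    import software.amazon.awssdk.services.comprehend.ComprehendClient;
    import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue;

    public class LanguageRoutingSketch {
        private static final String CATCH_ALL = "catchall@YOUR_DOMAIN_NAME_HERE";

        public static String resolveDestination(String emailBody) {
            ComprehendClient comprehend = ComprehendClient.create();
            DynamoDbClient dynamoDb = DynamoDbClient.create();

            // Detect the dominant language of the message body (for example "en" or "fr").
            String languageCode = comprehend
                    .detectDominantLanguage(b -> b.text(emailBody))
                    .languages().get(0).languageCode();

            // Look up the support group address for that language in the language-lookup table.
            Map<String, AttributeValue> item = dynamoDb.getItem(b -> b
                    .tableName("language-lookup")
                    .key(Map.of("language", AttributeValue.builder().s(languageCode).build()))).item();

            // Fall back to the catch-all address when no mapping exists.
            return (item == null || item.isEmpty()) ? CATCH_ALL : item.get("destination").s();
        }
    }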

    In this example, Amazon SES hosts the destination email addresses used for forwarding, but this is not a requirement. External email servers can also receive the forwarded emails.

    Prerequisites

    To use Amazon SES for receiving email messages, you need to verify a domain that you own. Refer to the documentation to verify your domain with the Amazon SES console. If you do not have a domain name, you can register one with Amazon Route 53.

    Deploying the Sample Application

    Clone this GitHub repository to your local machine and install and configure AWS SAM with a test AWS Identity and Access Management (IAM) user.

    You will use AWS SAM to deploy the remaining parts of this serverless architecture.

    The AWS SAM template creates the following resources:

    • An Amazon DynamoDB mapping table (language-lookup) contains information about language codes and associates them with destination email addresses.
    • An AWS Lambda function (BlogEmailForwarder) that reads the email content, parses it, detects the language, looks up the forwarding destination email address, and sends the email.
    • An Amazon S3 bucket, which will store the incoming emails.
    • IAM roles and policies.

    To start the AWS SAM deployment, navigate to the root directory of the repository you downloaded, where the template.yaml AWS SAM template resides. AWS SAM also requires you to specify an Amazon Simple Storage Service (Amazon S3) bucket to hold the deployment artifacts. If you haven’t already created a bucket for this purpose, create one now. Refer to the documentation to learn how to create an Amazon S3 bucket. The AWS Identity and Access Management (IAM) user you deploy with should have read and write access to the bucket.

    At the command line, enter the following command to package the application:

    sam package --template template.yaml --output-template-file output_template.yaml --s3-bucket BUCKET_NAME_HERE

    In the preceding command, replace BUCKET_NAME_HERE with the name of the Amazon S3 bucket that should hold the deployment artifacts.

    AWS SAM packages the application and copies it into this Amazon S3 bucket.

    When the AWS SAM package command finishes running, enter the following command to deploy the package:

    sam deploy --template-file output_template.yaml --stack-name blogstack --capabilities CAPABILITY_IAM --parameter-overrides FromEmailAddress=info@YOUR_DOMAIN_NAME_HERE CatchAllEmailAddress=catchall@YOUR_DOMAIN_NAME_HERE

    In the preceding command, replace YOUR_DOMAIN_NAME_HERE with the domain name you validated with Amazon SES. This domain also applies to other commands and configurations that will be introduced later.

    This example uses “blogstack” as the stack name; you can change this to any other name you want. When you run this command, AWS SAM shows the progress of the deployment.

    Configure the Sample Application

    Now that you have deployed the application, you will configure it.

    Configuring Receipt Rules

    To deliver incoming messages to Amazon S3 bucket, you need to create a Rule Set and a Receipt rule under it.

    Note: This blog uses Amazon SES console to create the rule sets. To create the rule sets with AWS CloudFormation, refer to the documentation.

    1. Navigate to the Amazon SES console. From the left navigation choose Rule Sets.
    2. Choose the Create a Receipt Rule button in the right pane.
    3. Add info@YOUR_DOMAIN_NAME_HERE as the first recipient address by entering it into the text box and choosing Add Recipient.

     

     

    Choose the Next Step button to move on to the next step.

    4. On the Actions page, select S3 from the Add action drop-down to reveal the S3 action’s details. Select the S3 bucket that was created by the AWS SAM template. It is in the format of your_stack_name-inboxbucket-randomstring. You will find the exact name in the outputs section of the AWS SAM deployment under the key name InboxBucket or by visiting the AWS CloudFormation console. Set the Object key prefix to info/. This tells Amazon SES to add this prefix to all messages destined to this recipient address. This way, you can re-use the same bucket for different recipients.

    Choose the Next Step button to move on to the next step.

    5. In the Rule Details page, give this rule a name in the Rule name field. This example uses the name info-recipient-rule. Leave the rest of the fields with their default values.

    Choose the Next Step button to move on to the next step.

    6. Review your settings on the Review page and finalize rule creation by choosing Create Rule.

    7. In this example, you will be hosting the destination email addresses in Amazon SES rather than forwarding the messages to an external email server. This way, you will be able to see the forwarded messages in your Amazon S3 bucket under different prefixes. To host the destination email addresses, you need to create different rules under the default rule set. Create three additional rules for the catchall@YOUR_DOMAIN_NAME_HERE, english@YOUR_DOMAIN_NAME_HERE, and french@YOUR_DOMAIN_NAME_HERE email addresses by repeating steps 2 to 5. For the Amazon S3 prefixes, use catchall/, english/, and french/ respectively.

     

    Configuring Amazon DynamoDB Table

    To configure the Amazon DynamoDB table that is used by the sample application

    1. Navigate to the Amazon DynamoDB console and open the tables view. Inspect the table created by the AWS SAM application.

    The language-lookup table is where languages and their support group mappings are kept. You need to create an item for each language, and an item that will hold the default destination email address that will be used in case no language match is found. Amazon Comprehend supports more than 60 different languages. You can visit the documentation for the supported languages and add their language codes to this lookup table to enhance this application.

    2. To start inserting items, choose the language-lookup table to open the table overview page.
    3. Select the Items tab and choose Create item. From the dropdown, select Text. Add the following JSON content and choose Save to create your first mapping object. While adding the following object, replace the destination attribute’s value with an email address you own. The email messages will be forwarded to that address.

    {
      "language": "en",
      "destination": "english@YOUR_DOMAIN_NAME_HERE"
    }

    Lastly, create an item for French language support.

    {
      "language": "fr",
      "destination": "french@YOUR_DOMAIN_NAME_HERE"
    }

    Testing

    Now that the application is deployed and configured, you will test it.

    1. Use your favorite email client to send the following email to the info@YOUR_DOMAIN_NAME_HERE address.

    Subject: I need help

    Body:

    Hello, I’d like to return the shoes I bought from your online store. How can I do this?

    After the email is sent, navigate to the Amazon S3 console to inspect the contents of the Amazon S3 bucket that is backing the Amazon SES Rule Sets. You can also check the AWS Lambda logs in the Amazon CloudWatch console to confirm that the Lambda function was triggered and ran successfully. You should receive an email with the same content at the address you defined for the English language.

    2. Next, send another email with the same content, this time in French.

    Subject: j’ai besoin d’aide

    Body:

    Bonjour, je souhaite retourner les chaussures que j’ai achetées dans votre boutique en ligne. Comment puis-je faire ceci?

     

    If a message is not matched to a language in the lookup table, the Lambda function forwards it to the catchall email address that you provided during the AWS SAM deployment.

    Inspect the new email objects under the english/, french/, and catchall/ prefixes to observe the forwarding behavior.

    Continue experimenting with the sample application by sending different email contents to the info@YOUR_DOMAIN_NAME_HERE address or by adding other language codes and email address combinations into the mapping table. You can find the available languages and their codes in the documentation. When adding support for a new language, don’t forget to associate a new email address and Amazon S3 bucket prefix by defining a new rule.

    Cleanup

    To clean up the resources you used in your account,

    1. Navigate to the Amazon S3 console and delete the inbox bucket’s contents. You will find the name of this bucket in the outputs section of the AWS SAM deployment under the key name InboxBucket or by visiting the AWS CloudFormation console.
    2. Navigate to AWS CloudFormation console and delete the stack named “blogstack”.
    3. After the stack is deleted, remove the domain from Amazon SES. To do this, navigate to the Amazon SES Console and choose Domains from the left navigation. Select the domain you want to remove and choose Remove button to remove it from Amazon SES.
    4. From the Amazon SES Console, navigate to the Rule Sets from the left navigation. On the Active Rule Set section, choose View Active Rule Set button and delete all the rules you have created, by selecting the rule and choosing Action, Delete.
    5. On the Rule Sets page choose Disable Active Rule Set button to disable listening for incoming email messages.
    6. On the Rule Sets page, Inactive Rule Sets section, delete the only rule set, by selecting the rule set and choosing Action, Delete.
    7. Navigate to CloudWatch console and from the left navigation choose Logs, Log groups. Find the log group that belongs to the BlogEmailForwarderFunction resource and delete it by selecting it and choosing Actions, Delete log group(s).
    8. Finally, delete the Amazon S3 bucket you used for packaging and deploying the AWS SAM application.

     

    Conclusion

    This solution shows how to use Amazon SES to classify email messages by the dominant content language and forward them to the respective support groups. You can use the same techniques to implement similar scenarios. For example, you can forward emails based on custom key entities, like product codes, or remove PII information from emails before forwarding, using Amazon Comprehend.

    With its native integrations with AWS services, Amazon SES allows you to enhance your email applications with different AWS Cloud capabilities easily.

    To learn more about email forwarding with Amazon SES, visit the documentation and AWS blogs.

    Better performance for less: AWS continues to beat Azure on SQL Server price/performance

    Post Syndicated from Fred Wurden original https://aws.amazon.com/blogs/compute/sql-server-runs-better-on-aws/

    By Fred Wurden, General Manager, AWS Enterprise Engineering (Windows, VMware, RedHat, SAP, Benchmarking)

    AWS R5b.8xlarge delivers better performance at lower cost than Azure E64_32s_v4 for a SQL Server workload

    In this blog, we will review a recent benchmark that Principled Technologies published on 2/25. The benchmark found that an Amazon Elastic Compute Cloud (Amazon EC2) R5b.8xlarge instance delivered better performance for a SQL Server workload at a lower cost when directly tested against an Azure E64_32s_v4 VM.

    Behind the study: Understanding how SQL Server performed better, for a lower cost with an AWS EC2 R5b instance

    Principled Technologies tested an online transaction processing (OLTP) workload for SQL Server 2019 on both an R5b instance on Amazon EC2 with Amazon Elastic Block Store (EBS) as storage and Azure E64_32s_v4. This particular Azure VM was chosen as an equivalent to the R5b instance, as both instances have comparable specifications for input/output operations per second (IOPS) performance, use Intel Xeon processors from the same generation (Cascade Lake), and offer the same number of cores (32). For storage, Principled Technologies mirrored storage configurations across the Azure VM and the EC2 instance (which used Amazon Elastic Block Store (EBS)), maxing out the IOPS specs on each while offering a direct comparison between instances.

    Test Configurations

    Source: Principled Technologies

    When benchmarking, Principled Technologies ran a TPC-C-like OLTP workload from HammerDB v3.3 on both instances, testing against new orders per minute (NOPM) performance. NOPM shows the number of new-order transactions completed in one minute as part of a serialized business workload. HammerDB claims that because NOPM is “independent of any particular database implementation [it] is the recommended primary metric to use.”

    The results: SQL Server on the AWS EC2 R5b instance delivered 2x the performance of the Azure VM and was 62% less expensive

    Graphs that show AWS instance outperformed the Azure instance

    Source: Principled Technologies

    These test results from the Principled Technologies report show the price/performance and performance comparisons. The performance metric is New Orders Per Minute (NOPM); faster is better. The price/performance calculations are based on the cost of on-demand, License Included SQL Server instances and storage to achieve 1,000 NOPM performance, smaller is better.

    An EC2 r5b.8xlarge instance powered by an Intel Xeon Scalable processor delivered better SQL Server NOPM performance on the HammerDB benchmark and a lower price per 1,000 NOPM than an Azure E64_32s_v4 VM powered by similar Intel Xeon Scalable processors.

    On top of that, AWS’s storage price-performance exceeded Azure’s. The Azure managed disks offered 53 percent more storage than the EBS storage, but the EC2 instance with EBS storage cost 24 percent less than the Azure VM with managed disks. Even after adjusting for the difference in storage capacity, something customers cannot actually do, EBS would have cost 13 percent less per GB of storage than the Azure managed disks.

    Why AWS is the best cloud to run your Windows and SQL Server workloads

    To us, these results aren’t surprising. In fact, they’re in line with the success that customers find running Windows on AWS for over 12 years. Customers like Pearson and Expedia have all found better performance and enhanced cost savings by moving their Windows, SQL Server, and .NET workloads to AWS. In fact, RepricerExpress migrated its Windows and SQL Server environments from Azure to AWS to slash outbound bandwidth costs while gaining agility and performance.

    Not only do we offer better price-performance for your Windows workloads, but we also offer better ways to run Windows in the cloud. Whether you want to rehost your databases to EC2, move to managed with Amazon Relational Database Service (RDS) for SQL Server, or even modernize to cloud-native databases, AWS stands ready to help you get the most out of the cloud.

     


    To learn more on migrating Windows Server or SQL Server, visit Windows on AWS. For more stories about customers who have successfully migrated and modernized SQL Server workloads with AWS, visit our Customer Success page. Contact us to start your migration journey today.

    Creating a cross-region Active Directory domain with AWS Launch Wizard for Microsoft Active Directory

    Post Syndicated from AWS Admin original https://aws.amazon.com/blogs/compute/creating-a-cross-region-active-directory-domain-with-aws-launch-wizard-for-microsoft-active-directory/

    AWS Launch Wizard is a console-based service to quickly and easily size, configure, and deploy third-party applications, such as Microsoft SQL Server Always On and HANA-based SAP systems, on AWS without the need to identify and provision individual AWS resources. AWS Launch Wizard offers an easy way to deploy enterprise applications and optimize costs. Instead of selecting and configuring separate infrastructure services, you go through a few steps in AWS Launch Wizard and it deploys a ready-to-use application on your behalf. It reduces the time you need to spend investigating how to provision, cost, and configure your application on AWS.

    You can now use AWS Launch Wizard to deploy and configure self-managed Microsoft Windows Server Active Directory Domain Services running on Amazon Elastic Compute Cloud (EC2) instances. With Launch Wizard, you can have fully-functioning, production-ready domain controllers within a few hours—all without having to manually deploy and configure your resources.

    You can use AWS Directory Service to run Microsoft Active Directory (AD) as a managed service, without the hassle of managing your own infrastructure. If you need to run your own AD infrastructure, you can use AWS Launch Wizard to simplify the deployment and configuration process.

    In this post, I walk through creation of a cross-region Active Directory domain using Launch Wizard. First, I deploy a single Active Directory domain spanning two regions. Then, I configure Active Directory Sites and Services to match the network topology. Finally, I create a user account to verify replication of the Active Directory domain.


    Figure 1: Diagram of resources deployed in this post

    Prerequisites

    1. You must have a VPC in your home region and in each remote region, and their CIDRs must not overlap with each other. If you need to create VPCs and subnets that do not overlap, please refer here.
    2. Each subnet used must have outbound internet connectivity. Feel free to either use a NAT Gateway or Internet Gateway.
    3. The VPCs must be peered in order to complete the steps in this post. For information on creating a VPC peering connection between regions, please refer here; a scripted sketch of the peering call follows this list.
    4. If you choose to deploy your Domain Controllers to a private subnet, you must have an RDP jump / bastion instance setup to allow you to RDP to your instance.
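
    If you prefer to script the peering prerequisite rather than use the console, the following is a minimal sketch using the AWS SDK for JavaScript v3 (TypeScript). The VPC IDs are placeholders, and you still need to add routes between the VPC CIDRs and allow the traffic in your security groups.

    import {
      EC2Client,
      CreateVpcPeeringConnectionCommand,
      AcceptVpcPeeringConnectionCommand,
    } from "@aws-sdk/client-ec2";

    const homeRegion = "us-east-1";
    const remoteRegion = "us-west-2";

    async function peerVpcs(homeVpcId: string, remoteVpcId: string): Promise<string | undefined> {
      const homeEc2 = new EC2Client({ region: homeRegion });
      const remoteEc2 = new EC2Client({ region: remoteRegion });

      // Request the peering connection from the home region
      const request = await homeEc2.send(
        new CreateVpcPeeringConnectionCommand({
          VpcId: homeVpcId,
          PeerVpcId: remoteVpcId,
          PeerRegion: remoteRegion,
        })
      );
      const peeringId = request.VpcPeeringConnection?.VpcPeeringConnectionId;

      // Accept the request from the remote region (it can take a few seconds
      // to become visible there, so a short retry loop is reasonable in practice)
      await remoteEc2.send(
        new AcceptVpcPeeringConnectionCommand({ VpcPeeringConnectionId: peeringId })
      );
      return peeringId;
    }

    // Placeholder VPC IDs; substitute your home and remote VPCs
    peerVpcs("vpc-0aaaaaaaaaaaaaaaa", "vpc-0bbbbbbbbbbbbbbbb").then(console.log).catch(console.error);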

    Deploy Your Domain Controllers in the Home Region using Launch Wizard

    In this section, I deploy the first set of domain controllers into us-east-1, the home region, using Launch Wizard. I refer to us-east-1 as the home region, and us-west-2 as the remote region.

    1. In the AWS Launch Wizard Console, select Active Directory in the navigation pane on the left.
    2. Select Create deployment.
    3. In the Review Permissions page, select Next.
    4. In the Configure application settings page set the following:
      • General:
        • Deployment name: UsEast1AD
      • Active Directory (AD) installation
        • Installation type: Active Directory on EC2
      • Domain Settings:
        • Number of domain controllers: 2
        • AMI installation type: License-included AMI
      • License-included AMI: ami-################# | Windows_Server-2019-English-Full-Base-202#-##-##
      • Connection type: Create new Active Directory
      • Domain DNS name: corp.example.com
      • Domain NetBIOS Name: CORP
      • Connectivity:
        • Key Pair Name: Choose an existing key pair or create a new one.
        • Virtual Private Cloud (VPC): Select Virtual Private Cloud (VPC)
      • VPC: Select your home region VPC
      • Availability Zone (AZ) and private subnets:
        • Select 2 Availability Zones
        • Choose the proper subnet in each Availability Zone
        • Assign a Controller IP address for each domain controller
      • Remote Desktop Gateway preferences: Disregard for now, this is set up later.
      • Check the I confirm that a public subnet has been set up. Each of the selected private subnets have outbound connectivity enabled check box.
    1. Select Next.
    2. In the Define infrastructure requirements page, set the following inputs.
      • Storage and compute: Based on infrastructure requirements
      • Number of AD users: Up to 5000 users
    3. Select Next.
    4. In the Review and deploy page, review your selections. Then, select Deploy.

    Note that it may take up to 2 hours for your domain to be deployed. Once the status has changed to Completed, you can proceed to the next section, where I prepare Active Directory Sites and Services for the second set of domain controllers in the other region.

    Configure Active Directory Sites and Services

    In this section, I configure the Active Directory Sites and Services topology to match my network topology. This step ensures proper Active Directory replication routing so that domain clients can find the closest domain controller. For more information on Active Directory Sites and Services, please refer here.

    Retrieve your Administrator Credentials from Secrets Manager

    1. From the AWS Secrets Manager Console in us-east-1, select the Secret that begins with LaunchWizard-UsEast1AD.
    2. In the middle of the Secret page, select Retrieve secret value.
      1. This will display the username and password key with their values.
      2. You need these credentials when you RDP into one of the domain controllers in the next steps.

    Rename the Default First Site

    1. Log in to one of the domain controllers in us-east-1.
    2. Select Start, type dssite and hit Enter on your keyboard.
    3. The Active Directory Sites and Services MMC should appear.
      1. Expand Sites. There is a site named Default-First-Site-Name.
      2. Right click on Default-First-Site-Name select Rename.
      3. Enter us-east-1 as the name.
    4. Leave the Active Directory Sites and Services MMC open for the next set of steps.

    Create a New Site and Subnet Definition for US-West-2

    1. Using the Active Directory Sites and Services MMC from the previous steps, right click on Sites.
    2. Select New Site… and enter the following inputs:
      • Name: us-west-2
      • Select DEFAULTIPSITELINK.
    3.  Select OK.
    4. A pop-up will appear noting that some additional configuration is needed. Select OK.
    5. Expand Sites and right click on Subnets and select New Subnet.
    6. Enter the following information:
      • Prefix: the CIDR of your us-west-2 VPC. An example would be 10.1.0.0/24
      • Site: select us-west-2
    7. Select OK.
    8. Leave the Active Directory Sites and Services MMC open for the following set of steps.

    Configure Site Replication Settings

    Using the Active Directory Sites and Services MMC from the previous steps, expand Sites, Inter-Site Transports, and select IP. You should see an object named DEFAULTIPSITELINK,

    1. Right click on DEFAULTIPSITELINK.
    2. Select Properties, then set or verify the replication schedule settings on the General tab.
    3. Select Apply.
    4. In the DEFAULTIPSITELINK Properties, select the Attribute Editor tab and modify the following:
      • Scroll down, double-click the attribute to modify, enter 1 for the Value, then select OK twice.
        • For more information on these settings, please refer here.
    5. Close the Active Directory Sites and Services MMC, as it is no longer needed.

    Prepare Your Home Region Domain Controllers Security Group

    In this section, I modify the Domain Controllers Security Group in us-east-1. This allows the domain controllers that will be deployed in us-west-2 to communicate with the domain controllers in us-east-1. A scripted alternative to the console steps is sketched after this list.

    1. From the Amazon Elastic Compute Cloud (Amazon EC2) console, select Security Groups under the Network & Security navigation section.
    2. Select the Domain Controllers Security Group that was created with Launch Wizard Active Directory.
    3. Select Edit inbound rules. The Security Group should start with LaunchWizard-UsEast1AD-.
    4. Choose Add rule and enter the following:
      • Type: Select All traffic
      • Protocol: All
      • Port range: All
      • Source: Select Custom
      • Enter the CIDR of your remote VPC. An example would be 10.1.0.0/24
    5. Select Save rules.
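
    If you would rather script this change, here is a minimal sketch using the AWS SDK for JavaScript v3 (TypeScript). The security group ID and CIDR are placeholders that you would replace with your LaunchWizard-UsEast1AD security group and your remote VPC CIDR.

    import {
      EC2Client,
      AuthorizeSecurityGroupIngressCommand,
    } from "@aws-sdk/client-ec2";

    const ec2 = new EC2Client({ region: "us-east-1" });

    async function allowRemoteVpcTraffic(): Promise<void> {
      await ec2.send(
        new AuthorizeSecurityGroupIngressCommand({
          GroupId: "sg-0123456789abcdef0", // the LaunchWizard-UsEast1AD-... security group
          IpPermissions: [
            {
              IpProtocol: "-1", // all protocols and all ports
              IpRanges: [{ CidrIp: "10.1.0.0/24", Description: "Remote region VPC CIDR" }],
            },
          ],
        })
      );
    }

    allowRemoteVpcTraffic().catch(console.error);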

    Create a Copy of Your Administrator Secret in Your Remote Region

    In this section, I create a secret in Secrets Manager in the remote region that contains the Administrator credentials created with the home region deployment. A scripted alternative using the AWS SDK is sketched after these steps.

    1. Find the Secret that begins with LaunchWizard-UsEast1AD in the AWS Secrets Manager Console in us-east-1.
    2. In the middle of the Secret page, select Retrieve secret value.
      • This displays the username and password key with their values. Make note of these keys and values, as we need them for the next steps.
    3. From the AWS Secrets Manager Console, change the region to us-west-2.
    4. Select Store a new secret. Then, enter the following inputs:
      • Select secret type: Other type of secrets
      • Add the username key and its value as the first key/value pair
      • Select Add row and add the password key and its value as the second key/value pair
    5. Select Next, then enter the following inputs.
      • Secret name: UsWest2AD
      • Select Next twice
      • Select Store
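
    A scripted version of this copy looks roughly like the following sketch (AWS SDK for JavaScript v3, TypeScript). The source secret ID is a placeholder for the secret whose name begins with LaunchWizard-UsEast1AD.

    import {
      SecretsManagerClient,
      GetSecretValueCommand,
      CreateSecretCommand,
    } from "@aws-sdk/client-secrets-manager";

    async function copyAdminSecret(sourceSecretId: string): Promise<void> {
      const source = new SecretsManagerClient({ region: "us-east-1" });
      const target = new SecretsManagerClient({ region: "us-west-2" });

      // Read the username/password values stored by the home region deployment
      const { SecretString } = await source.send(
        new GetSecretValueCommand({ SecretId: sourceSecretId })
      );

      // Store the same key/value pairs in the remote region under the name
      // used later in this post
      await target.send(new CreateSecretCommand({ Name: "UsWest2AD", SecretString }));
    }

    // Placeholder: pass the full name or ARN of the LaunchWizard-UsEast1AD secret
    copyAdminSecret("LaunchWizard-UsEast1AD-XXXXXXXX").catch(console.error);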

    Deploy Your Domain Controllers in the Remote Region using Launch Wizard

    In this section, I deploy the second set of domain controllers into the us-west-2 region using Launch Wizard.

    1. In the AWS Launch Wizard Console, select Active Directory in the navigation pane on the left.
    2. Select Create deployment.
    3. In the Review Permissions page, select Next.
    4. In the Configure application settings page, set the following inputs.
      • General
        • Deployment name: UsWest2AD
      • Active Directory (AD) installation
        • Installation type: Active Directory on EC2
      • Domain Settings:
        • Number of domain controllers: 2
        • AMI installation type: License-included AMI
        • License-included AMI: ami-################# | Windows_Server-2019-English-Full-Base-202#-##-##
      • Connection type: Add domain controllers to existing Active Directory
      • Domain DNS name: corp.example.com
      • Domain NetBIOS Name: CORP
      • Domain Administrator secret name: Select the secret you created above.
      • Add permission to secret
        • After you verify that the Secret you created above has the policy listed, check the checkbox confirming the secret has the required policy.
      • Domain DNS IP address for resolution: The private IP of either domain controller in your home region
      • Connectivity:
        • Key Pair Name: Choose an existing Key pair
        • Virtual Private Cloud (VPC): Select Virtual Private Cloud (VPC)
      • VPC: Select your remote region VPC
      • Availability Zone (AZ) and private subnets:
        • Select 2 Availability Zones
        • Choose the proper subnet in each Availability Zone
        • Assign a Controller IP address for each domain controller
      • Remote Desktop Gateway preferences: disregard for now, as I set this later.
      • Check the I confirm that a public subnet has been set up. Each of the selected private subnets have outbound connectivity enabled check box
    1. In the Define infrastructure requirements page set the following:
      • Storage and compute: Based on infrastructure requirements
      • Number of AD users: Up to 5000 users
    2. In the Review and deploy page, review your selections. Then, select Deploy.

    Note that it may take up to 2 hours to deploy the domain controllers. Once the status has changed to Completed, proceed to the next section, where I modify the security group for the domain controllers in the remote region.

    Prepare Your Remote Region Domain Controllers Security Group

    In this section, I modify the Domain Controllers Security Group in us-west-2. This allows the domain controllers deployed in us-east-1 to communicate with the domain controllers in us-west-2.

    1. From the Amazon Elastic Compute Cloud (Amazon EC2) console, select Security Groups under the Network & Security navigation section.
    2. Select the Domain Controllers Security Group that was created by your Launch Wizard Active Directory.
    3. Select Edit inbound rules. The Security Group should start with LaunchWizard-UsWest2AD-EC2ADStackExistingVPC-
    4. Choose Add rule and enter the following:
      • Type: Select All traffic
      • Protocol: All
      • Port range: All
      • Source: Select Custom
      • Enter the CIDR of your home region VPC. An example would be 10.0.0.0/24
    5. Choose Save rules.

    Create an AD User and Verify Replication

    In this section, I create a user in one region and verify that it replicated to the other region. I also use AD replication diagnostics tools to verify that replication is working properly.

    Create a Test User Account

    1. Log in to one of the domain controllers in us-east-1.
    2. Select Start, type dsa and press Enter on your keyboard. The Active Directory Users and Computers MMC should appear.
    3. Right click on the Users container and select New > User.
    4. Enter the following inputs:
      • First name: John
      • Last name: Doe
      • User logon name: jdoe and select Next
      • Password and Confirm password: Your choice of complex password
      • Uncheck User must change password at next logon
    5. Select Next.
    6. Select Finish.

    Verify Test User Account Has Replicated

    1. Log in to one of the domain controllers in us-west-2.
    2. Select Start and type dsa.
    3. Then, press Enter on your keyboard. The Active Directory Users and Computers MMC should appear.
    4. Select Users. You should see a user object named John Doe.

    Note that if the user is not present, it may not have been replicated yet. Replication should not take longer than 60 seconds from when the item was created.

    Summary

    Congratulations, you have created a cross-region Active Directory! In this post you:

    1. Launched a new Active Directory forest in us-east-1 using AWS Launch Wizard.
    2. Configured Active Directory Sites and Services for a multi-region configuration.
    3. Launched a set of new domain controllers in the us-west-2 region using AWS Launch Wizard.
    4. Created a test user and verified replication.

    This post only touches on a couple of the features available in the AWS Launch Wizard Active Directory deployment. AWS Launch Wizard also automates the creation of a single-tier PKI infrastructure and trust creation. One of the prime benefits of this solution is the simplicity of deploying a fully functional Active Directory environment in just a few clicks. You no longer need to do the undifferentiated heavy lifting required to deploy Active Directory. For more information, please refer to the AWS Launch Wizard documentation.

    Rapid and flexible Infrastructure as Code using the AWS CDK with AWS Solutions Constructs

    Post Syndicated from Biff Gaut original https://aws.amazon.com/blogs/devops/rapid-flexible-infrastructure-with-solutions-constructs-cdk/

    Introduction

    As workloads move to the cloud and all infrastructure becomes virtual, infrastructure as code (IaC) becomes essential to leverage the agility of this new world. JSON and YAML are the powerful, declarative modeling languages of AWS CloudFormation, allowing you to define complex architectures using IaC. Just as higher level languages like BASIC and C abstracted away the details of assembly language and made developers more productive, the AWS Cloud Development Kit (AWS CDK) provides a programming model above the native template languages, a model that makes developers more productive when creating IaC. When you instantiate CDK objects in your Typescript (or Python, Java, etc.) application, those objects “compile” into a YAML template that the CDK deploys as an AWS CloudFormation stack.

    AWS Solutions Constructs take this simplification a step further by providing a library of common service patterns built on top of the CDK. These multi-service patterns allow you to deploy multiple resources with a single object, resources that follow best practices by default – both independently and throughout their interaction.

    Comparison of an Application stack with Assembly Language, 4th generation language and Object libraries such as Hibernate with an IaC stack of CloudFormation, AWS CDK and AWS Solutions Constructs

    Application Development Stack vs. IaC Development Stack

    Solution overview

    To demonstrate how using Solutions Constructs can accelerate the development of IaC, in this post you will create an architecture that ingests and stores sensor readings using Amazon Kinesis Data Streams, AWS Lambda, and Amazon DynamoDB.

    An architecture diagram showing sensor readings being sent to a Kinesis data stream. A Lambda function will receive the Kinesis records and store them in a DynamoDB table.

    Prerequisite – Setting up the CDK environment

    Tip – If you want to try this example but are concerned about the impact of changing the tools or versions on your workstation, try running it on AWS Cloud9. An AWS Cloud9 environment is launched with an AWS Identity and Access Management (AWS IAM) role and doesn’t require configuring with an access key. It uses the current region as the default for all CDK infrastructure.

    To prepare your workstation for CDK development, confirm the following:

    • Node.js 10.3.0 or later is installed on your workstation (regardless of the language used to write CDK apps).
    • You have configured credentials for your environment. If you’re running locally you can do this by configuring the AWS Command Line Interface (AWS CLI).
    • TypeScript 2.7 or later is installed globally (npm -g install typescript)

    Before creating your CDK project, install the CDK toolkit using the following command:

    npm install -g aws-cdk

    Create the CDK project

    1. First create a project folder called stream-ingestion with these two commands:

    mkdir stream-ingestion
    cd stream-ingestion

    1. Now create your CDK application using this command:

    npx aws-cdk@1.68.0 init app --language=typescript

    Tip – This example will be written in TypeScript – you can also specify other languages for your projects.

    At this time, you must use the same version of the CDK and Solutions Constructs. We’re using version 1.68.0 of both based upon what’s available at publication time, but you can update this with a later version for your projects in the future.

    Let’s explore the files in the application this command created:

    • bin/stream-ingestion.ts – This is the module that launches the application. The key line of code is:

    new StreamIngestionStack(app, 'StreamIngestionStack');

    This creates the actual stack, and it’s in StreamIngestionStack that you will write the CDK code that defines the resources in your architecture.

    • lib/stream-ingestion-stack.ts – This is the important class. In the constructor of StreamIngestionStack you will add the constructs that will create your architecture.

    During the deployment process, the CDK uploads your Lambda function to an Amazon S3 bucket so it can be incorporated into your stack.

    1. To create that S3 bucket and any other infrastructure the CDK requires, run this command:

    cdk bootstrap

    The CDK uses the same supporting infrastructure for all projects within a region, so you only need to run the bootstrap command once in any region in which you create CDK stacks.

    1. To install the required Solutions Constructs packages for our architecture, run these two commands from the command line:

    npm install @aws-solutions-constructs/aws-kinesisstreams-lambda@1.68.0
    npm install @aws-solutions-constructs/aws-lambda-dynamodb@1.68.0

    Write the code

    First you will write the Lambda function that processes the Kinesis data stream messages.

    1. Create a folder named lambda under stream-ingestion
    2. Within the lambda folder save a file called lambdaFunction.js with the following contents:
    var AWS = require("aws-sdk");
    
    // Create the DynamoDB service object
    var ddb = new AWS.DynamoDB({ apiVersion: "2012-08-10" });
    
    AWS.config.update({ region: process.env.AWS_REGION });
    
    // We will configure our construct to 
    // look for the .handler function
    exports.handler = async function (event) {
      try {
        // Kinesis will deliver records 
        // in batches, so we need to iterate through
        // each record in the batch
        for (let record of event.Records) {
          const reading = parsePayload(record.kinesis.data);
          await writeRecord(record.kinesis.partitionKey, reading);
        };
      } catch (err) {
        console.log(`Write failed, err:\n${JSON.stringify(err, null, 2)}`);
        throw err;
      }
      return;
    };
    
    // Write the provided sensor reading data to the DynamoDB table
    async function writeRecord(partitionKey, reading) {
    
      var params = {
        // Notice that Constructs automatically sets up 
        // an environment variable with the table name.
        TableName: process.env.DDB_TABLE_NAME,
        Item: {
          partitionKey: { S: partitionKey },  // sensor Id
          timestamp: { S: reading.timestamp },
          value: { N: reading.value}
        },
      };
    
      // Call DynamoDB to add the item to the table
      await ddb.putItem(params).promise();
    }
    
    // Decode the payload and extract the sensor data from it
    function parsePayload(payload) {
    
      const decodedPayload = Buffer.from(payload, "base64").toString(
        "ascii"
      );
    
      // Our CLI command will send the records to Kinesis
      // with the values delimited by '|'
      const payloadValues = decodedPayload.split("|", 2)
      return {
        value: payloadValues[0],
        timestamp: payloadValues[1]
      }
    }
    

    We won’t spend a lot of time explaining this function – it’s pretty straightforward and heavily commented. It receives an event with one or more sensor readings, and for each reading it extracts the pertinent data and saves it to the DynamoDB table.

    You will use two Solutions Constructs to create your infrastructure:

    The aws-kinesisstreams-lambda construct deploys an Amazon Kinesis data stream and a Lambda function.

    • aws-kinesisstreams-lambda creates the Kinesis data stream and Lambda function that subscribes to that stream. To support this, it also creates other resources, such as IAM roles and encryption keys.

    The aws-lambda-dynamodb construct deploys a Lambda function and a DynamoDB table.

    • aws-lambda-dynamodb creates an Amazon DynamoDB table and a Lambda function with permission to access the table.
    1. To deploy the first of these two constructs, replace the code in lib/stream-ingestion-stack.ts with the following code:
    import * as cdk from "@aws-cdk/core";
    import * as lambda from "@aws-cdk/aws-lambda";
    import { KinesisStreamsToLambda } from "@aws-solutions-constructs/aws-kinesisstreams-lambda";
    
    import * as ddb from "@aws-cdk/aws-dynamodb";
    import { LambdaToDynamoDB } from "@aws-solutions-constructs/aws-lambda-dynamodb";
    
    export class StreamIngestionStack extends cdk.Stack {
      constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
        super(scope, id, props);
    
        const kinesisLambda = new KinesisStreamsToLambda(
          this,
          "KinesisLambdaConstruct",
          {
            lambdaFunctionProps: {
              // Where the CDK can find the lambda function code
              runtime: lambda.Runtime.NODEJS_10_X,
              handler: "lambdaFunction.handler",
              code: lambda.Code.fromAsset("lambda"),
            },
          }
        );
    
        // Next Solutions Construct goes here
      }
    }
    

    Let’s explore this code:

    • It instantiates a new KinesisStreamsToLambda object. This Solutions Construct will launch a new Kinesis data stream and a new Lambda function, setting up the Lambda function to receive all the messages in the Kinesis data stream. It will also deploy all the additional resources and policies required for the architecture to follow best practices.
    • The third argument to the constructor is the properties object, where you specify overrides of default values or any other information the construct needs. In this case you provide properties for the encapsulated Lambda function that informs the CDK where to find the code for the Lambda function that you stored as lambda/lambdaFunction.js earlier.
    1. Now you’ll add the second construct that connects the Lambda function to a new DynamoDB table. In the same lib/stream-ingestion-stack.ts file, replace the line // Next Solutions Construct goes here with the following code:
        // Define the primary key for the new DynamoDB table
        const primaryKeyAttribute: ddb.Attribute = {
          name: "partitionKey",
          type: ddb.AttributeType.STRING,
        };
    
        // Define the sort key for the new DynamoDB table
        const sortKeyAttribute: ddb.Attribute = {
          name: "timestamp",
          type: ddb.AttributeType.STRING,
        };
    
        const lambdaDynamoDB = new LambdaToDynamoDB(
          this,
          "LambdaDynamodbConstruct",
          {
            // Tell construct to use the Lambda function in
            // the first construct rather than deploy a new one
            existingLambdaObj: kinesisLambda.lambdaFunction,
            tablePermissions: "Write",
            dynamoTableProps: {
              partitionKey: primaryKeyAttribute,
              sortKey: sortKeyAttribute,
              billingMode: ddb.BillingMode.PROVISIONED,
              removalPolicy: cdk.RemovalPolicy.DESTROY
            },
          }
        );
    
        // Add autoscaling
        const readScaling = lambdaDynamoDB.dynamoTable.autoScaleReadCapacity({
          minCapacity: 1,
          maxCapacity: 50,
        });
    
        readScaling.scaleOnUtilization({
          targetUtilizationPercent: 50,
        });
    

    Let’s explore this code:

    • The first two const objects define the names and types for the partition key and sort key of the DynamoDB table.
    • The LambdaToDynamoDB construct instantiated creates a new DynamoDB table and grants access to your Lambda function. The key to this call is the properties object you pass in the third argument.
      • The first property sent to LambdaToDynamoDB is existingLambdaObj – by setting this value to the Lambda function created by KinesisStreamsToLambda, you’re telling the construct to not create a new Lambda function, but to grant the Lambda function in the other Solutions Construct access to the DynamoDB table. This illustrates how you can chain many Solutions Constructs together to create complex architectures.
      • The second property sent to LambdaToDynamoDB tells the construct to limit the Lambda function’s access to the table to write only.
      • The third property sent to LambdaToDynamoDB is actually a full properties object defining the DynamoDB table. It provides the two attribute definitions you created earlier as well as the billing mode. It also sets the RemovalPolicy to DESTROY. This policy setting ensures that the table is deleted when you delete this stack – in most cases you should accept the default setting to protect your data.
    • The last two lines of code show how you can use statements to modify a construct outside the constructor. In this case we set up auto scaling on the new DynamoDB table, which we can access with the dynamoTable property on the construct we just instantiated.

    That’s all it takes to create all the resources needed to deploy your architecture.

    1. Save all the files, then compile the Typescript into a CDK program using this command:

    npm run build

    1. Finally, launch the stack using this command:

    cdk deploy

    (Enter “y” in response to Do you wish to deploy all these changes (y/n)?)

    You will see some warnings where you override CDK default values. Because you are doing this intentionally you may disregard these, but it’s always a good idea to review these warnings when they occur.

    Tip – Many mysterious CDK project errors stem from mismatched versions. If you get stuck on an inexplicable error, check package.json and confirm that all CDK and Solutions Constructs libraries have the same version number (with no leading caret ^). If necessary, correct the version numbers, delete the package-lock.json file and node_modules tree and run npm install. Think of this as the “turn it off and on again” first response to CDK errors.

    You have now deployed the entire architecture for the demo – open the CloudFormation stack in the AWS Management Console and take a few minutes to explore all 12 resources that the program deployed (and the 380-line template generated to create them).

    Feed the Stream

    Now use the CLI to send some data through the stack.

    Go to the Kinesis Data Streams console and copy the name of the data stream. Replace the stream name in the following command and run it from the command line.

    aws kinesis put-records \
    --stream-name StreamIngestionStack-KinesisLambdaConstructKinesisStreamXXXXXXXX-XXXXXXXXXXXX \
    --records \
    PartitionKey=1301,'Data=15.4|2020-08-22T01:16:36+00:00' \
    PartitionKey=1503,'Data=39.1|2020-08-22T01:08:15+00:00'

    Tip – If you are using the AWS CLI v2, the previous command will result in an “Invalid base64…” error because v2 expects the inputs to be Base64 encoded by default. Adding the argument --cli-binary-format raw-in-base64-out will fix the issue.
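
    If you prefer the AWS SDK to the CLI, the following TypeScript sketch (AWS SDK for JavaScript v3) sends the same two sample readings; the stream name placeholder is the one created by your stack, and the region is taken from your environment configuration.

    import { KinesisClient, PutRecordsCommand } from "@aws-sdk/client-kinesis";

    const kinesis = new KinesisClient({}); // region resolved from your environment
    const encoder = new TextEncoder();

    async function feedStream(streamName: string): Promise<void> {
      // Each record carries "value|timestamp", matching what the Lambda function parses
      await kinesis.send(
        new PutRecordsCommand({
          StreamName: streamName,
          Records: [
            { PartitionKey: "1301", Data: encoder.encode("15.4|2020-08-22T01:16:36+00:00") },
            { PartitionKey: "1503", Data: encoder.encode("39.1|2020-08-22T01:08:15+00:00") },
          ],
        })
      );
    }

    feedStream("StreamIngestionStack-KinesisLambdaConstructKinesisStreamXXXXXXXX-XXXXXXXXXXXX")
      .catch(console.error);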

    To confirm that the messages made it through the service, open the DynamoDB console – you should see the two records in the table.

    Now that you’ve got it working, pause to think about what you just did. You deployed a system that can ingest and store sensor readings and scale to handle heavy loads. You did that by instantiating two objects – well under 60 lines of code. Experiment with changing some property values and deploying the changes by running npm run build and cdk deploy again.
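
    For example, one small tweak (a sketch based on my reading of the 1.68.0 construct props, not something from the walkthrough above) is to pass kinesisStreamProps to KinesisStreamsToLambda in lib/stream-ingestion-stack.ts so the stream gets two shards instead of the default. The file already contains the imports this snippet needs.

    const kinesisLambda = new KinesisStreamsToLambda(
      this,
      "KinesisLambdaConstruct",
      {
        lambdaFunctionProps: {
          runtime: lambda.Runtime.NODEJS_10_X,
          handler: "lambdaFunction.handler",
          code: lambda.Code.fromAsset("lambda"),
        },
        // Assumed override: kinesisStreamProps is passed through to the
        // underlying Kinesis stream, so shardCount sets the number of shards
        kinesisStreamProps: {
          shardCount: 2,
        },
      }
    );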

    Cleanup

    To clean up the resources in the stack, run this command:

    cdk destroy

    Conclusion

    Just as languages like BASIC and C allowed developers to write programs at a higher level of abstraction than assembly language, the AWS CDK and AWS Solutions Constructs allow us to create CloudFormation stacks in TypeScript, Java, or Python instead of JSON or YAML. Just as there will always be a place for assembly language, there will always be situations where we want to write CloudFormation templates manually – but for most situations, we can now use the AWS CDK and AWS Solutions Constructs to create complex and complete architectures in a fraction of the time with very little code.

    AWS Solutions Constructs can currently be used in CDK applications written in TypeScript, JavaScript, Java, and Python, and will be available in C# applications soon.

    About the Author

    Biff Gaut has been shipping software since 1983, from small startups to large IT shops. Along the way he has contributed to 2 books, spoken at several conferences and written many blog posts. He is now a Principal Solutions Architect at AWS working on the AWS Solutions Constructs team, helping customers deploy better architectures more quickly.

    Event-driven architecture for using third-party Git repositories as source for AWS CodePipeline

    Post Syndicated from Kirankumar Chandrashekar original https://aws.amazon.com/blogs/devops/event-driven-architecture-for-using-third-party-git-repositories-as-source-for-aws-codepipeline/

    In the post Using Custom Source Actions in AWS CodePipeline for Increased Visibility for Third-Party Source Control, we demonstrated using custom actions in AWS CodePipeline and a worker that periodically polls for jobs and processes further to get the artifact from the Git repository. In this post, we discuss using an event-driven architecture to trigger an AWS CodePipeline pipeline that has a third-party Git repository within the source stage that is part of a custom action.

    Instead of using a worker to periodically poll for available jobs across all pipelines, we can define a custom source action on a particular pipeline to trigger an Amazon CloudWatch Events rule when the webhook on CodePipeline receives an event and puts it into an In Progress state. This works exactly like how CodePipeline works with natively supported Git repositories like AWS CodeCommit or GitHub as a source.

    Solution architecture

    The following diagram shows how you can use an event-driven architecture with a custom source stage that is associated with a third-party Git repository that isn’t supported by CodePipeline natively. For our use case, we use GitLab, but you can use any Git repository that supports Git webhooks.

    Figure: Event-driven architecture with a custom source stage for a third-party Git repository

    The architecture includes the following steps:

    1. A user commits code to a Git repository.

    2. The commit invokes a Git webhook.

    3. This invokes a CodePipeline webhook.

    4. The CodePipeline source stage is put into In Progress status.

    5. The source stage action triggers a CloudWatch Events rule that indicates the stage started.

    6. The CloudWatch event triggers an AWS Lambda function.

    7. The function polls for the job details of the custom action.

    8. The function also triggers AWS CodeBuild and passes all the job-related information.

    9. CodeBuild gets the public SSH key stored in AWS Secrets Manager (or user name and password, if using HTTPS Git access).

    10. CodeBuild clones the repository for a particular branch.

    11. CodeBuild zips and uploads the archive to the CodePipeline artifact store Amazon Simple Storage Service (Amazon S3) bucket.

    12. A Lambda function sends a success message to the CodePipeline source stage so it can proceed to the next stage.

    Similarly, with the same setup, if you chose a release change for the pipeline that has custom source stage, a CloudWatch event is triggered, which triggers a Lambda function, and the same process repeats until it gets the artifact from the Git repository.
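
    To make steps 6 through 8 more concrete, here is a heavily simplified TypeScript sketch of that handoff (AWS SDK for JavaScript v3). The solution itself ships this logic as lambda/lambda_function.py; the action type values below match the stack parameters used later in this post, while the CodeBuild project name and environment variable name are placeholders of my own.

    import {
      CodePipelineClient,
      PollForJobsCommand,
      AcknowledgeJobCommand,
    } from "@aws-sdk/client-codepipeline";
    import { CodeBuildClient, StartBuildCommand } from "@aws-sdk/client-codebuild";

    const codepipeline = new CodePipelineClient({});
    const codebuild = new CodeBuildClient({});

    export const handler = async (): Promise<void> => {
      // Step 7: poll for a pending job on the custom source action type
      const { jobs } = await codepipeline.send(
        new PollForJobsCommand({
          actionTypeId: {
            category: "Source",
            owner: "Custom",
            provider: "CustomSourceForGit", // SourceActionProvider stack parameter
            version: "1",                   // SourceActionVersion stack parameter
          },
          maxBatchSize: 1,
        })
      );
      if (!jobs || jobs.length === 0) return;

      const job = jobs[0];
      await codepipeline.send(
        new AcknowledgeJobCommand({ jobId: job.id!, nonce: job.nonce! })
      );

      // Step 8: hand the job details to CodeBuild, which clones the repository,
      // zips it, and uploads the archive to the pipeline artifact bucket
      await codebuild.send(
        new StartBuildCommand({
          projectName: "third-party-git-clone-project", // placeholder project name
          environmentVariablesOverride: [
            { name: "CODEPIPELINE_JOB_ID", value: job.id!, type: "PLAINTEXT" },
          ],
        })
      );
      // Step 12: when CodeBuild finishes, the solution reports back to CodePipeline
      // with PutJobSuccessResult (or PutJobFailureResult on errors)
    };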

    Solution overview

    To set up the solution, you complete the following steps:

    1. Create an SSH key pair for authenticating to the Git repository.

    2. Publish the key to Secrets Manager.

    3. Launch the AWS CloudFormation stack to provision resources.

    4. Deploy a sample CodePipeline and test the custom action type.

    5. Retrieve the webhook URL.

    6. Create a webhook and add the webhook URL.

    Creating an SSH key pair
    You first create an SSH key pair to use for authenticating to the Git repository using ssh-keygen on your terminal. See the following code:

    ssh-keygen -t rsa -b 4096 -C "your_email@example.com"

    Follow the prompt from ssh-keygen and give a name for the key, for example codepipeline_git_rsa. This creates two new files in the current directory: codepipeline_git_rsa and codepipeline_git_rsa.pub.

    Make a note of the contents of codepipeline_git_rsa.pub and add it as an authorized key for your Git user. For instructions, see Adding an SSH key to your GitLab account.

    Publishing the key
    Publish this key to Secrets Manager using the AWS Command Line Interface (AWS CLI):

    export SecretsManagerArn=$(aws secretsmanager create-secret --name codepipeline_git \
    --secret-string file://codepipeline_git_rsa --query ARN --output text)
    

    Make a note of the ARN, which is required later.

    Alternatively, you can create a secret on the Secrets Manager console.

    Make sure that the line breaks in the private key codepipeline_git are preserved when the value is added to the secret.

    Launching the CloudFormation stack

    Clone the git repository aws-codepipeline-third-party-git-repositories, which contains the AWS CloudFormation templates and AWS Lambda function code, using the following command:

    git clone https://github.com/aws-samples/aws-codepipeline-third-party-git-repositories.git .

    Now you should have the following files in the cloned repository:

    cfn/
    |--sample_pipeline_custom.yaml
    `--third_party_git_custom_action.yaml
    lambda/
    `--lambda_function.py

    Launch the CloudFormation stack using the template third_party_git_custom_action.yaml from the cfn directory. The main resources created by this stack are:

    1. CodePipeline Custom Action Type. ResourceType: AWS::CodePipeline::CustomActionType
    2. Lambda Function. ResourceType: AWS::Lambda::Function
    3. CodeBuild Project. ResourceType: AWS::CodeBuild::Project
    4. Lambda Execution Role. ResourceType: AWS::IAM::Role
    5. CodeBuild Service Role. ResourceType: AWS::IAM::Role

    These resources implement the logic for connecting to the Git repository, which for this post is GitLab.

    Upload the Lambda function code to any S3 bucket in the same Region where the stack is being deployed. To create a new S3 bucket, use the following code (make sure to provide a unique name):

    export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
    export S3_BUCKET_NAME=codepipeline-git-custom-action-${ACCOUNT_ID} 
    aws s3 mb s3://${S3_BUCKET_NAME} --region us-east-1

    Then zip the contents of the function and upload to the S3 bucket (substitute the appropriate bucket name):

    export ZIP_FILE_NAME="codepipeline_git.zip"
    zip -jr ${ZIP_FILE_NAME} ./lambda/lambda_function.py && \
    aws s3 cp codepipeline_git.zip \
    s3://${S3_BUCKET_NAME}/${ZIP_FILE_NAME}
    

    If you don’t have a VPC and subnets that Lambda and CodeBuild require, you can create those by launching the following CloudFormation stack.

    Run the following AWS CLI command to deploy the third-party Git source solution stack:

    export vpcId="vpc-123456789"
    export subnetId1="subnet-12345"
    export subnetId2="subnet-54321"
    export GIT_SOURCE_STACK_NAME="thirdparty-codepipeline-git-source"
    aws cloudformation create-stack \
    --stack-name ${GIT_SOURCE_STACK_NAME} \
    --template-body file://$(pwd)/cfn/third_party_git_custom_action.yaml \
    --parameters ParameterKey=SourceActionVersion,ParameterValue=1 \
    ParameterKey=SourceActionProvider,ParameterValue=CustomSourceForGit \
    ParameterKey=GitPullLambdaSubnet,ParameterValue=${subnetId1}\\,${subnetId2} \
    ParameterKey=GitPullLambdaVpc,ParameterValue=${vpcId} \
    ParameterKey=LambdaCodeS3Bucket,ParameterValue=${S3_BUCKET_NAME} \
    ParameterKey=LambdaCodeS3Key,ParameterValue=${ZIP_FILE_NAME} \
    --capabilities CAPABILITY_IAM

    Alternatively, launch the stack by choosing Launch Stack:


    For more information about the VPC requirements for Lambda and CodeBuild, see Internet and service access for VPC-connected functions and Use AWS CodeBuild with Amazon Virtual Private Cloud, respectively.

    A custom source action type is now available on the account where you deployed the stack. You can check this on the CodePipeline console by attempting to create a new pipeline. You can see your source type listed under Source provider.


    Testing the pipeline

    We now deploy a sample pipeline and test the custom action type using the template sample_pipeline_custom.yaml from the cfn directory. You can run the following AWS CLI command to deploy the CloudFormation stack:

    Note: Set the SSH_URL environment variable to a GitLab repository URL that you have access to, or create a new GitLab project and repository. The example URL "git@gitlab.com:kirankumar15/test.git" is for illustration purposes only.

    export SSH_URL="git@gitlab.com:kirankumar15/test.git"
    export SAMPLE_STACK_NAME="third-party-codepipeline-git-source-test"
    aws cloudformation create-stack \
    --stack-name ${SAMPLE_STACK_NAME} \
    --template-body file://$(pwd)/cfn/sample_pipeline_custom.yaml \
    --parameters ParameterKey=Branch,ParameterValue=master \
    ParameterKey=GitUrl,ParameterValue=${SSH_URL} \
    ParameterKey=SourceActionVersion,ParameterValue=1 \
    ParameterKey=SourceActionProvider,ParameterValue=CustomSourceForGit \
    ParameterKey=CodePipelineName,ParameterValue=sampleCodePipeline \
    ParameterKey=SecretsManagerArnForSSHPrivateKey,ParameterValue=${SecretsManagerArn} \
    ParameterKey=GitWebHookIpAddress,ParameterValue=34.74.90.64/28 \
    --capabilities CAPABILITY_IAM

    Alternatively, choose Launch stack:


    Retrieving the webhook URL
    When the stack creation is complete, retrieve the CodePipeline webhook URL from the stack outputs. Use the following AWS CLI command:

    aws cloudformation describe-stacks \
    --stack-name ${SAMPLE_STACK_NAME} \
    --output text \
    --query "Stacks[].Outputs[?OutputKey=='CodePipelineWebHookUrl'].OutputValue"

    For more information about stack outputs, see Outputs.

    Creating a webhook

    You can use an existing GitLab repository or create a new one, and then follow these steps to add a webhook to it.
    To create your webhook, complete the following steps:

    1. Navigate to the Webhooks Settings section on the GitLab console for the repository that you want to have as a source for CodePipeline.

    2. For URL, enter the CodePipeline webhook URL you retrieved in the previous step.

    3. Select Push events and optionally enter a branch name.

    4. Select Enable SSL verification.

    5. Choose Add webhook.


    For more information about webhooks, see Webhooks.

    We’re now ready to test the solution.

    Testing the solution

    To test the solution, we make changes to the branch that we passed as the branch parameter in the GitLab repository. This should trigger the pipeline. On the CodePipeline console, you can see the Git Commit ID on the source stage of the pipeline when it succeeds.

    Note: Use a GitLab repository URL that you have access to, or create a new GitLab repository, and make sure that it contains a buildspec.yml for the AWS CodeBuild project to run in the Build stage. The example URL "git@gitlab.com:kirankumar15/test.git" is for illustration purposes only.

    Enter the following code to clone your repository:

    git clone git@gitlab.com:kirankumar15/test.git .

    Add a sample file named sample_text_file.txt to the repository, then commit and push it:

    echo "adding a sample file" >> sample_text_file.txt
    git add ./
    git commit -m "added sample_text_file.txt to the repository"
    git push -u origin master

    The pipeline should show the status In Progress.


    After a few minutes, it changes to the Succeeded status and you see the Git commit message on the source stage.


    You can also view the Git Commit message by choosing the execution ID of the pipeline, navigating to the timeline section, and choosing the source action. You should see the Commit message and Commit ID that correlates with the Git repository.


    Troubleshooting

    If the pipeline fails, check the Lambda function logs for the function named GitLab-CodePipeline-Source-${ACCOUNT_ID}. For instructions on checking logs, see Accessing Amazon CloudWatch logs for AWS Lambda.

    If the Lambda logs contain a CodeBuild build ID, check the CodeBuild run logs for that build ID for errors. For instructions, see View detailed build information.
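
    If you want to pull those Lambda logs from a script rather than the console, a minimal sketch with the AWS SDK for JavaScript v3 (TypeScript) looks like the following; the log group name follows the standard /aws/lambda/<function-name> convention and the account ID is a placeholder.

    import {
      CloudWatchLogsClient,
      FilterLogEventsCommand,
    } from "@aws-sdk/client-cloudwatch-logs";

    const logs = new CloudWatchLogsClient({});

    async function tailGitSourceLambda(accountId: string): Promise<void> {
      const { events } = await logs.send(
        new FilterLogEventsCommand({
          logGroupName: `/aws/lambda/GitLab-CodePipeline-Source-${accountId}`,
          startTime: Date.now() - 15 * 60 * 1000, // last 15 minutes
          limit: 50,
        })
      );
      for (const event of events ?? []) {
        console.log(event.message);
      }
    }

    tailGitSourceLambda("123456789012").catch(console.error);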

    Cleaning up

    Delete the CloudFormation stacks that you created. You can use the following AWS CLI commands:

    aws cloudformation delete-stack --stack-name ${SAMPLE_STACK_NAME}
    
    aws cloudformation delete-stack --stack-name ${GIT_SOURCE_STACK_NAME}

    Alternatively, delete the stack on the AWS CloudFormation console.

    Additionally, empty and delete the S3 bucket you created for the Lambda function code, as well as the artifact bucket created by the ${SAMPLE_STACK_NAME} stack. For the Lambda code bucket, you can use the following AWS CLI command:

    aws s3 rb s3://${S3_BUCKET_NAME} --force

    You can also delete and empty the bucket on the Amazon S3 console.

    Conclusion

    You can use the architecture in this post for any Git repository that supports webhooks. This solution also works if the repository is reachable only from on premises, and if the endpoints can be accessed from a VPC. This event-driven architecture works just like using any natively supported source for CodePipeline.

     

    About the Author

    Kirankumar Chandrashekar is a DevOps consultant at AWS Professional Services. He focuses on leading customers in architecting DevOps technologies. Kirankumar is passionate about Infrastructure as Code and DevOps. He enjoys music, as well as cooking and travelling.

     

    Learn why AWS is the best cloud to run Microsoft Windows Server and SQL Server workloads

    Post Syndicated from Fred Wurden original https://aws.amazon.com/blogs/compute/learn-why-aws-is-the-best-cloud-to-run-microsoft-windows-server-and-sql-server-workloads/

    Fred Wurden, General Manager, AWS Enterprise Engineering (Windows, VMware, RedHat, SAP, Benchmarking)

    For companies that rely on Windows Server but find it daunting to move those workloads to the cloud, there is no easier way to run Windows in the cloud than AWS. Customers as diverse as Expedia, Pearson, Seven West Media, and RepricerExpress have chosen AWS over other cloud providers to unlock the Microsoft products they currently rely on, including Windows Server and SQL Server. The reasons are several: by embracing AWS, they’ve achieved cost savings through forthright pricing options and expanded breadth and depth of capabilities. In this blog, we break down these advantages to understand why AWS is the simplest, most popular and secure cloud to run your business-critical Windows Server and SQL Server workloads.

    AWS lowers costs and increases choice with flexible pricing options

    Customers expect accurate and transparent pricing so you can make the best decisions for your business. When assessing which cloud to run your Windows workloads, customers look at the total cost of ownership (TCO) of workloads.

    Not only does AWS provide cost-effective ways to run Windows and SQL Server workloads, we also regularly lower prices to make it even more affordable. Since launching in 2006, AWS has reduced prices 85 times. In fact, we recently dropped pricing by an average of 25% for Amazon RDS for SQL Server Enterprise Edition database instances in the Multi-AZ configuration, for both On-Demand Instance and Reserved Instance types on the latest generation hardware.

    The AWS pricing approach makes it simple to understand your costs, even as we actively help you pay AWS less now and in the future. For example, AWS Trusted Advisor provides real-time guidance to provision your resources more efficiently. This means that you spend less money with us. We do this because we know that if we aren’t creating more and more value for you each year, you’ll go elsewhere.

    In addition, we have several other industry-leading initiatives to help lower customer costs, including AWS Compute Optimizer, Amazon CodeGuru, and AWS Windows Optimization and Licensing Assessments (AWS OLA). AWS Compute Optimizer recommends optimal AWS Compute resources for your workloads by using machine learning (ML) to analyze historical utilization metrics. Customers who use Compute Optimizer can save up to 25% on applications running on Amazon Elastic Compute Cloud (Amazon EC2). Machine learning also plays a key role in Amazon CodeGuru, which provides intelligent recommendations for improving code quality and identifying an application’s most expensive lines of code. Finally, AWS OLA helps customers to optimize licensing and infrastructure provisioning based on actual resource consumption (ARC) to offer cost-effective Windows deployment options.

    Cloud pricing shouldn’t be complicated

    Other cloud providers bury key pricing information when making comparisons to other vendors, thereby incorrectly touting pricing advantages. Often those online “pricing calculators” that purport to clarify pricing neglect to include hidden fees, complicating costs through licensing rules (e.g., you can run this workload “for free” if you pay us elsewhere for “Software Assurance”). At AWS, we believe such pricing and licensing tricks are contrary to the fundamental promise of transparent pricing for cloud computing.

    By contrast, AWS makes it straightforward for you to run Windows Server applications where you want. With our End-of-Support Migration Program (EMP) for Windows Server, you can easily move your legacy Windows Server applications—without needing any code changes. The EMP technology decouples the applications from the underlying OS. This enables AWS Partners or AWS Professional Services to migrate critical applications from legacy Windows Server 2003, 2008, and 2008 R2 to newer, supported versions of Windows Server on AWS. This allows you to avoid extra charges for extended support that other cloud providers charge.

    Other cloud providers also may limit your ability to Bring-Your-Own-License (BYOL) for SQL Server to your preferred cloud provider. Meanwhile, AWS improves the BYOL experience using EC2 Dedicated Hosts and AWS License Manager. With EC2 Dedicated Hosts, you can save costs by moving existing Windows Server and SQL Server licenses that do not have Software Assurance to AWS. AWS License Manager simplifies how you manage your software licenses from software vendors such as Microsoft, SAP, Oracle, and IBM across AWS and on-premises environments. We also work hard to help our customers spend less.

    How AWS helps customers save money on Windows Server and SQL Server workloads

    The first way AWS helps customers save money is by delivering the most reliable global cloud infrastructure for your Windows workloads. Any downtime costs customers in terms of lost revenue, diminished customer goodwill, and reduced employee productivity.

    With respect to pricing, AWS offers multiple pricing options to help our customers save. First, we offer AWS Savings Plans that provide you with a flexible pricing model to save up to 72 percent on your AWS compute usage. You can sign up for Savings Plans for a 1- or 3-year term. Our Savings Plans help you easily manage your plans by taking advantage of recommendations, performance reporting and budget alerts in AWS Cost Explorer, which is a unique benefit only AWS provides. Not only that, but we also offer Amazon EC2 Spot Instances that help you save up to 90 percent on your compute costs vs. On-Demand Instance pricing.

    Customers don’t need to walk this migration path alone. In fact, AWS customers often make the most efficient use of cloud resources by working with assessment partners like Cloudamize, CloudChomp, or Migration Evaluator (formerly TSO Logic), which is now part of AWS. By running detailed assessments of their environments with Migration Evaluator before migration, customers can achieve an average of 36 percent savings using AWS over three years. So how do you get from an on-premises Windows deployment to the cloud? AWS makes it simple.

    AWS has support programs and tools to help you migrate to the cloud

    Though AWS Migration Acceleration Program (MAP) for Windows is a great way to reduce the cost of migrating Windows Server and SQL Server workloads, MAP is more than a cost savings tool. As part of MAP, AWS offers a number of resources to support and sustain your migration efforts. This includes an experienced APN Partner ecosystem to execute migrations, our AWS Professional Services team to provide best practices and prescriptive advice, and a training program to help IT professionals understand and carry out migrations successfully. We help you figure out which workloads to move first, then leverage the combined experience of our Professional Services and partner teams to guide you through the process. For customers who want to save even more (up to 72% in some cases) we are the leaders in helping customers transform legacy systems to modernized managed services.

    Again, we are always available to help guide you in your Windows journey to the cloud. We guide you through our technologies like AWS Launch Wizard, which provides a guided way of sizing, configuring, and deploying AWS resources for Microsoft applications like Microsoft SQL Server Always On, or through our comprehensive ecosystem of tens of thousands of partners and third-party solutions, including many with deep expertise with Windows technologies.

    Why run Windows Server and SQL Server anywhere else but AWS?

    Not only does AWS offer significantly more services than any other cloud, with over 48 services without comparable equivalents on other clouds, but AWS also provides better ways to use Microsoft products than any other cloud. This includes Active Directory as a managed service and FSx for Windows File Server, the only fully managed file storage service for Windows. If you’re interested in learning more about how AWS improves the Windows experience, please visit this article on our Modernizing with AWS blog.

    Bring your Windows Server and SQL Server workloads to AWS for the most secure, reliable, and performant cloud, providing you with the depth and breadth of capabilities at the lowest cost. To learn more, visit Windows on AWS. Contact us today to learn more on how we can help you move your Windows to AWS or innovate on open source solutions.

    About the Author
    Fred Wurden is the GM of Enterprise Engineering (Windows, VMware, Red Hat, SAP, benchmarking) working to make AWS the most customer-centric cloud platform on Earth. Prior to AWS, Fred worked at Microsoft for 17 years and held positions, including: EU/DOJ engineering compliance for Windows and Azure, interoperability principles and partner engagements, and open source engineering. He lives with his wife and a few four-legged friends since his kids are all in college now.